
Quickstart

Get from zero to working ingest and retrieve in under 10 minutes.

This guide walks you through the complete setup from a fresh project to your first successful retrieval. By the end, you'll have content stored in Postgres and be able to query it with semantic search.

Initialize Unrag

Run the CLI with your preferences. For this quickstart, we'll use Drizzle as the store adapter:

bunx unrag@latest init --yes --store drizzle --dir lib/unrag --alias @unrag

This creates your configuration file, installs the vendored module code, adds the necessary dependencies to your package.json, and automatically runs your package manager to install them.
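Assuming the flags above, your project ends up with roughly this layout (the exact files depend on the options you chose):

your-app/
  unrag.config.ts    # engine configuration (you'll edit this later in the guide)
  lib/unrag/         # the vendored Unrag module source
  package.json       # updated with the new dependencies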

Set environment variables

Unrag needs two environment variables to function. Add them to your .env file (or however you manage secrets):

# Your Postgres connection string
DATABASE_URL="postgresql://user:password@localhost:5432/mydb"

# API key for embedding (required by the AI SDK)
AI_GATEWAY_API_KEY="your-ai-gateway-api-key"

The AI_GATEWAY_API_KEY is used by the default embedding provider, which calls OpenAI's text-embedding-3-small model. If you want to use a different model, you can also set AI_GATEWAY_MODEL:

AI_GATEWAY_MODEL="openai/text-embedding-3-large"
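If you want your scripts to fail fast when these variables are missing, a small guard like this can run before you create the engine (a hypothetical helper, not part of Unrag):

// check-env.ts: hypothetical startup guard, not part of Unrag
const required = ["DATABASE_URL", "AI_GATEWAY_API_KEY"];

for (const name of required) {
  if (!process.env[name]) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
}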

Create the database schema

Connect to your Postgres database and run the schema creation SQL. If you generated docs with unrag init --with-docs, the complete schema is in lib/unrag/unrag.md. Here's the quick version:

-- Enable pgvector
create extension if not exists vector;

-- Create tables
create table documents (
  id uuid primary key,
  source_id text not null unique,
  content text not null,
  metadata jsonb,
  created_at timestamp default now()
);

create table chunks (
  id uuid primary key,
  document_id uuid not null references documents(id) on delete cascade,
  source_id text not null,
  idx integer not null,
  content text not null,
  token_count integer not null,
  metadata jsonb,
  created_at timestamp default now()
);

create table embeddings (
  chunk_id uuid primary key references chunks(id) on delete cascade,
  embedding vector,
  embedding_dimension integer,
  created_at timestamp default now()
);

-- Add indexes for common queries
create index if not exists chunks_source_id_idx on chunks(source_id);
create index if not exists documents_source_id_idx on documents(source_id);
-- Recommended for vector search performance (pgvector 0.5.0+).
-- Note: pgvector can only build this index on a column with a fixed
-- dimension (e.g. vector(1536)); skip it if you keep the untyped column.
create index if not exists embeddings_hnsw_idx
on embeddings using hnsw (embedding vector_cosine_ops);
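You can apply this SQL however you normally run migrations, for example with psql or programmatically. As one sketch, assuming you saved it as schema.sql and have the pg package installed:

// apply-schema.ts: one way to apply the schema (assumes the "pg" package)
import { readFileSync } from "node:fs";
import { Client } from "pg";

const client = new Client({ connectionString: process.env.DATABASE_URL });
await client.connect();
await client.query(readFileSync("schema.sql", "utf8"));
await client.end();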

You can verify your database setup is correct by running bunx unrag doctor --db. This checks that pgvector is enabled, tables exist, and indexes are present.

Ingest your first content

Create a simple script to test ingestion. This can be a standalone file or part of your application:

import { createUnragEngine } from "@unrag/config";

async function ingestDemo() {
  const engine = createUnragEngine();

  // Ingest a piece of content
  const result = await engine.ingest({
    sourceId: "docs:quickstart",
    content: `
      Unrag is a RAG installer for TypeScript projects. It installs a small,
      composable module into your codebase as vendored source files. Instead
      of depending on an external SDK, you own the code and can modify it
      to fit your needs.
      
      The two core operations are ingest and retrieve. Ingestion takes content,
      splits it into chunks, generates embeddings for each chunk, and stores
      everything in Postgres with pgvector. Retrieval takes a query, embeds it,
      and finds the most similar chunks using vector similarity search.
    `,
    metadata: { 
      section: "getting-started",
      language: "en" 
    },
  });

  console.log("Ingested document:", result.documentId);
  console.log("Created chunks:", result.chunkCount);
  console.log("Embedding model:", result.embeddingModel);
  console.log("Timings:", result.durations);
}

ingestDemo().catch(console.error);

Run this script, and you should see output showing the document was chunked and stored. The durations object tells you how long each phase took—chunking, embedding, and storage—which is helpful for understanding performance characteristics.

Retrieve content

Now query the content you just ingested:

import { createUnragEngine } from "@unrag/config";

async function retrieveDemo() {
  const engine = createUnragEngine();

  const result = await engine.retrieve({
    query: "What are the core operations in Unrag?",
    topK: 5,
  });

  console.log("Found", result.chunks.length, "chunks");
  console.log("Embedding model:", result.embeddingModel);
  console.log("Timings:", result.durations);
  
  for (const chunk of result.chunks) {
    console.log("\n---");
    console.log("Score:", chunk.score);
    console.log("Source:", chunk.sourceId);
    console.log("Content:", chunk.content.substring(0, 200) + "...");
  }
}

retrieveDemo().catch(console.error);

The score field represents the distance between the query embedding and each chunk's embedding. Lower scores mean higher similarity (the chunks are "closer" to your query in the embedding space).
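Because the index created earlier uses vector_cosine_ops, the score is presumably a cosine distance. If you prefer a familiar 0-to-1 similarity, you can convert it yourself (a convenience sketch under that assumption, not an Unrag API):

// Assumes the score is pgvector's cosine distance (1 - cosine similarity),
// which matches the vector_cosine_ops index created earlier.
function toSimilarity(score: number): number {
  return 1 - score;
}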

Understanding the results

When you run retrieval, you get back an object containing the following (a type sketch follows the list):

  • chunks: An array of the most relevant chunks, each with its content, metadata, source information, and similarity score
  • embeddingModel: The model used to embed the query (useful for debugging if you have multiple models)
  • durations: Timing breakdown showing how long embedding and retrieval took
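Put together, the result roughly matches this shape (a sketch inferred from the fields above; the authoritative types live in your vendored lib/unrag source):

// Approximate shape of the retrieve result; check lib/unrag for the real types.
type RetrieveResult = {
  chunks: Array<{
    content: string;                     // the chunk text
    sourceId: string;                    // which document it came from
    score: number;                       // distance; lower = more similar
    metadata?: Record<string, unknown>;  // metadata passed at ingest time
  }>;
  embeddingModel: string;                // model used to embed the query
  durations: Record<string, number>;     // per-phase timings
};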

The chunks are sorted by score ascending, so the first chunk is the most similar to your query. You can use these chunks to build context for an LLM prompt, populate a search results page, or power any other retrieval-based feature.
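For example, here is one minimal way to turn retrieved chunks into prompt context. The prompt format is entirely up to you; only engine.retrieve comes from Unrag:

import { createUnragEngine } from "@unrag/config";

const engine = createUnragEngine();
const question = "What are the core operations in Unrag?";
const { chunks } = await engine.retrieve({ query: question, topK: 5 });

// Join the retrieved chunks into a numbered context block for an LLM prompt.
const context = chunks
  .map((chunk, i) => `[${i + 1}] (${chunk.sourceId})\n${chunk.content}`)
  .join("\n\n");

const prompt = `Answer the question using only the context below.

Context:
${context}

Question: ${question}`;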

Where to customize

Open unrag.config.ts to adjust settings. The most common customizations are listed below, with a sketch of the relevant fields after the list:

  1. Chunking parameters: Adjust chunkSize and chunkOverlap (in tokens) to change how documents are split. Smaller chunks give more precise retrieval but cost more to embed. Default: 512 tokens with a 50-token overlap.

  2. Default topK: Change how many chunks are returned by default when you don't specify topK in your retrieve call.

  3. Embedding model: Switch to a different model by changing the model field. Larger models like text-embedding-3-large may give better results at higher cost.

  4. Database connection: If you already have a database pool or client configured elsewhere in your app, you can modify createUnragEngine() to use that instead of creating a new one.
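As a rough sketch, the relevant parts of unrag.config.ts might look something like this (the field names here are illustrative; defer to the file the initializer generated):

// unrag.config.ts (illustrative sketch; your generated file is authoritative)
export const config = {
  chunking: {
    chunkSize: 512,   // tokens per chunk
    chunkOverlap: 50, // tokens shared between adjacent chunks
  },
  retrieval: {
    topK: 5,          // default number of chunks returned
  },
  embedding: {
    model: "openai/text-embedding-3-small", // or text-embedding-3-large
  },
};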

Going beyond text

Your quickstart used plain text content, but Unrag can handle more. If you didn't enable rich media during init, you can add it now by re-running the initializer:

bunx unrag@latest init --rich-media

This presents a list of extractors to install (PDF, image, audio, video, files) and configures your unrag.config.ts with extractors registered and the right assetProcessing flags enabled. Note that multimodal embeddings are configured separately—see Multimodal Embeddings if you want to embed images directly.

If you prefer to configure things manually, the generated unrag.config.ts includes assetProcessing settings you can edit directly.

Speeding up development with AI

If you're using an AI coding assistant like Claude Code, Cursor, or Windsurf, you can give it deep knowledge of Unrag's API to help you write correct code faster. Out of the box, AI assistants often hallucinate method names or configuration options because Unrag wasn't heavily represented in their training data. By installing the Unrag Agent Skill, your AI gets access to complete type definitions, all twelve embedding providers, extractors, connectors, and production patterns—so it can produce working code instead of plausible-looking guesses.

This is especially helpful as you move beyond the basics and start configuring specific embedding providers, adding extractors for different file types, or building production search endpoints. Instead of context-switching between your editor and documentation, you can ask your AI assistant and get accurate answers.

When you're ready to ingest content with embedded PDFs or images, explore AI-assisted workflows, or dive deeper, continue with the rest of the documentation.
