ContextEngine

The central class that coordinates ingestion, retrieval, deletion, and reranking.

The ContextEngine class is what you interact with day-to-day. It's a small coordinator that ties together your embedding provider, store adapter, and configuration options into a cohesive interface.

Creating an engine

You typically don't construct ContextEngine directly. Instead, you use the createUnragEngine() function from your generated config file:

import { createUnragEngine } from "@unrag/config";

const engine = createUnragEngine();

This function is generated when you run unrag@latest init. It creates the embedding provider, initializes the database connection, constructs the store adapter, and assembles everything into a working engine. You can open unrag.config.ts to see exactly how it's built.

If you need multiple engine instances with different configurations (for example, different embedding models for different content types), you can create them by defining multiple configs with defineUnragConfig() and calling unrag.createEngine(...):

import { defineUnragConfig } from "@unrag/core";
import { createDrizzleVectorStore } from "@unrag/store/drizzle";

const unrag = defineUnragConfig({
  defaults: {
    chunking: { chunkSize: 768, chunkOverlap: 75 },  // tokens, not words
    retrieval: { topK: 8 },
  },
  embedding: {
    provider: "ai",
    config: { model: "openai/text-embedding-3-large" },
  },
  engine: {},
} as const);

// `db` is your Drizzle database instance, created elsewhere in your app
const customEngine = unrag.createEngine({ store: createDrizzleVectorStore(db) });

Using the engine

The engine exposes four methods that handle all the heavy lifting:

ingest() takes content and stores it as searchable chunks:

const result = await engine.ingest({
  sourceId: "docs:architecture",
  content: "Your document text here...",
  metadata: { category: "technical", author: "alice" },
  chunking: { chunkSize: 256 }, // Optional per-call override (in tokens)
});

The sourceId is a string identifier for the logical document. Use consistent, meaningful identifiers—like docs:getting-started or article:12345—so you can update content by re-ingesting with the same ID.

The metadata object is stored alongside the document and its chunks. You can use it for filtering, display, or analytics. It's stored as JSON, so stick to serializable values.

The optional chunking parameter lets you override the default chunk size and overlap for this specific document. This is useful when different content types need different chunking strategies.
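
These levers come together when you keep content fresh: updating a document is just another ingest() call with the same sourceId, and different content types can pass their own chunking overrides. A sketch, where updatedGuideText and changelogText are placeholders for content loaded from your own source system:

// Re-ingesting with the same sourceId updates the logical document
await engine.ingest({
  sourceId: "docs:getting-started",
  content: updatedGuideText,
  metadata: { category: "guide", revision: 2 },
});

// Short, self-contained entries can use a smaller chunk size with no overlap
await engine.ingest({
  sourceId: "changelog:2024-06",
  content: changelogText,
  chunking: { chunkSize: 128, chunkOverlap: 0 },
});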

retrieve() searches for chunks similar to a query:

const result = await engine.retrieve({
  query: "How do I configure authentication?",
  topK: 10,
  scope: { sourceId: "docs:" }, // Optional: only search docs
});

The query is the search string. It gets embedded using the same model that embedded your chunks, then compared against stored embeddings to find the most similar matches.

topK controls how many results you get back. The default is 8, which is usually a good starting point.

The scope parameter filters results. When you provide { sourceId: "docs:" }, only chunks whose sourceId starts with "docs:" are considered. This is useful for searching within specific collections or tenants.
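
Prefix scoping maps naturally onto multi-tenant setups. A sketch, assuming you namespace sourceIds per tenant (the tenant:acme: convention here is illustrative, not something Unrag enforces):

// Only chunks whose sourceId starts with the tenant prefix are considered
const tenantResults = await engine.retrieve({
  query: "billing settings",
  topK: 8,
  scope: { sourceId: "tenant:acme:" },
});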

rerank() improves retrieval precision by reordering candidates with a more expensive relevance model:

// Retrieve more candidates than you need
const retrieved = await engine.retrieve({
  query: "How do I configure authentication?",
  topK: 30,
});

// Rerank to get the most relevant results
const reranked = await engine.rerank({
  query: "How do I configure authentication?",
  candidates: retrieved.chunks,
  topK: 8,
});

Reranking is optional and requires installing the reranker battery. It adds latency but significantly improves precision for many use cases. See the Reranker documentation for setup and usage details.
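
In practice the two calls are often wrapped in a small helper that over-fetches candidates and then reranks them down. A sketch; the 3x over-fetch factor is a heuristic, not a library default:

// Two-stage retrieval: over-fetch with vector search, then rerank down to topK
async function retrieveReranked(query: string, topK = 8) {
  const retrieved = await engine.retrieve({ query, topK: topK * 3 });
  return engine.rerank({
    query,
    candidates: retrieved.chunks,
    topK,
  });
}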

delete() removes stored content by logical identity:

// Delete one logical document (exact match)
await engine.delete({ sourceId: "docs:architecture" });

// Delete an entire namespace (prefix match)
await engine.delete({ sourceIdPrefix: "tenant:acme:" });

Deletion removes the matching document rows and relies on cascading deletes to clean up dependent chunks and embeddings.

What the methods return

Ingest returns information about what was stored:

{
  documentId: "550e8400-e29b-41d4-a716-446655440000",
  chunkCount: 12,
  embeddingModel: "ai-sdk:openai/text-embedding-3-small",
  durations: {
    totalMs: 1523,
    chunkingMs: 2,
    embeddingMs: 1456,
    storageMs: 65
  }
}

The documentId is the UUID assigned to this document in the database. The chunkCount tells you how many chunks were created. The durations object helps you understand where time is being spent—usually embedding dominates.
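
If you are keeping an eye on ingestion performance, the durations breakdown drops straight into your logging. A minimal sketch against the result shape above, where content is a placeholder for your document text:

const result = await engine.ingest({ sourceId: "docs:architecture", content });

// Log where ingestion time went; embedding usually dominates
const { totalMs, chunkingMs, embeddingMs, storageMs } = result.durations;
console.log(
  `ingested ${result.chunkCount} chunks in ${totalMs}ms ` +
    `(chunking ${chunkingMs}ms, embedding ${embeddingMs}ms, storage ${storageMs}ms)`
);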

Retrieve returns the matching chunks and metadata:

{
  chunks: [
    {
      id: "550e8400-e29b-41d4-a716-446655440001",
      documentId: "550e8400-e29b-41d4-a716-446655440000",
      sourceId: "docs:auth",
      index: 2,
      content: "To configure authentication, first...",
      tokenCount: 47,
      metadata: { category: "technical" },
      score: 0.234
    },
    // ... more chunks
  ],
  embeddingModel: "ai-sdk:openai/text-embedding-3-small",
  durations: {
    totalMs: 234,
    embeddingMs: 189,
    retrievalMs: 45
  }
}

Each chunk includes its content, the document it came from, any metadata, and a score. When the store measures similarity with cosine distance, lower scores mean the chunk is closer to the query.
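
If you want to drop weak matches, filter on the score. A sketch; the 0.5 cutoff is an arbitrary illustration to tune against your own data:

// Keep only the closest matches: lower cosine distance means more similar
const closeMatches = result.chunks.filter((chunk) => chunk.score < 0.5);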

If you set storage.storeChunkContent: false in your engine config, chunk.content will be an empty string in retrieval results, and you'll need to resolve the original content from your source system using the returned IDs and metadata.
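
In that setup you hydrate the text yourself after retrieval. A sketch, where loadContentBySourceId is a hypothetical helper against your own source of truth:

// Re-attach content from your source system when chunk content is not stored
const hydrated = await Promise.all(
  result.chunks.map(async (chunk) => ({
    ...chunk,
    content: await loadContentBySourceId(chunk.sourceId, chunk.index),
  }))
);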

Configuration options

When constructing an engine (either through createUnragEngine() or directly), you can configure:

  1. embedding: The provider that turns text into vectors
  2. store: The adapter that handles database operations
  3. storage: Whether Unrag persists chunks.content and/or documents.content
  4. defaults: Default chunking parameters (chunkSize and chunkOverlap)
  5. chunker: A custom function for splitting documents (optional)
  6. idGenerator: A custom function for generating UUIDs (optional, defaults to crypto.randomUUID())

Most projects only customize the first three. The defaults in unrag.config.ts work well for general-purpose text content.
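
If you do reach for the less common options, they are passed alongside the store when the engine is created. A sketch only, building on the defineUnragConfig example above; the exact placement of these options is an assumption, so check your generated unrag.config.ts:

const engine = unrag.createEngine({
  store: createDrizzleVectorStore(db),
  // Assumed placement: skip persisting chunk text, supply an explicit ID generator
  storage: { storeChunkContent: false },
  idGenerator: () => crypto.randomUUID(),
});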

Thread safety and instance reuse

The engine is designed to be created once and reused. The store adapter maintains a database connection pool, and the embedding provider is stateless. You can safely use the same engine instance across multiple concurrent requests.

In Next.js or similar frameworks with hot reloading, the generated createUnragEngine() uses a singleton pattern (via globalThis) to prevent connection pool exhaustion during development. In production, the engine is created once and reused for the lifetime of the process.
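
The generated code follows the familiar globalThis caching pattern. A simplified sketch, not the literal generated file; buildEngine stands in for the assembly code in unrag.config.ts:

// Cache the engine on globalThis so hot reloads reuse the same instance
const globalForUnrag = globalThis as unknown as {
  unragEngine?: ReturnType<typeof buildEngine>;
};

export function createUnragEngine() {
  globalForUnrag.unragEngine ??= buildEngine(); // assembles provider, db, and store
  return globalForUnrag.unragEngine;
}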
