Introduction
What Unrag is, how it works, and why you might want to use it.
Most RAG solutions fall into two categories: hosted services where you upload documents and query an API, or heavyweight frameworks that abstract away the vector storage and retrieval logic behind layers of configuration. Unrag takes a third path.
When you install Unrag, you're not adding a dependency that phones home or hides its implementation. You're copying a small, self-contained TypeScript module into your project. This module lives in your repository, gets reviewed in your pull requests, and ships with your application. If you want to understand exactly what happens when you call ingest() or retrieve(), you can open the source files and read them. If you need to modify the chunking algorithm, change the similarity function, or add custom filters to your queries, you edit the code directly.
What Unrag installs
After running bunx unrag@latest init, your project will contain:
unrag.config.ts at your project root is your configuration hub—the single file where you wire together your database client, embedding provider, and default settings.
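As a rough sketch, the config file might wire things together like this. All field names below (database, embedding, chunking) are illustrative assumptions, not Unrag's actual schema:

```typescript
// Illustrative sketch only; the real unrag.config.ts options may differ.
// Every field name here is an assumption made for illustration.
const config = {
  database: {
    // Read server-side from the environment; never expose to browser code.
    url: process.env.DATABASE_URL ?? "postgres://localhost:5432/app",
  },
  embedding: {
    provider: "openai",              // which vendored provider to dispatch to
    model: "text-embedding-3-small",
    dimensions: 1536,                // must match the vector(N) column width
  },
  chunking: {
    maxChars: 2000,                  // upper bound on chunk size
    overlap: 200,                    // characters shared between neighbors
  },
};

export default config;
```

The point is that this is ordinary TypeScript in your repository: changing a default is an edit and a code review, not a dashboard setting.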
lib/unrag/core/ has the heart of the system. The ContextEngine class orchestrates all operations: ingestion, retrieval, deletion, and reranking. The ingest.ts file handles chunking documents, generating embeddings for each chunk, and storing everything in your database. The retrieve.ts file embeds your query and runs a similarity search. The delete.ts file handles document removal by source ID. The rerank.ts file provides two-stage retrieval via cross-encoder rerankers.
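The chunking step in ingest.ts can be pictured as a sliding window with overlap. This is a simplified character-based stand-in; Unrag's actual algorithm and parameters may differ:

```typescript
// Simplified sliding-window chunker with overlap. Unrag's real splitter
// may be token-aware; this only illustrates the idea.
function chunkText(text: string, maxChars = 200, overlap = 40): string[] {
  if (overlap >= maxChars) {
    throw new RangeError("overlap must be smaller than maxChars");
  }
  if (text.length <= maxChars) return [text];
  const chunks: string[] = [];
  const step = maxChars - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChars));
    if (start + maxChars >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Overlap matters because a sentence cut at a chunk boundary would otherwise be unretrievable by either half; the shared region keeps boundary-spanning context in at least one chunk.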
lib/unrag/store/ contains your chosen database adapter—whether that's Drizzle, Prisma, or raw SQL. Each adapter implements the same interface: upsert() to write documents and their chunks, and query() to find the most similar chunks.
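The adapter contract described above can be sketched as an interface. Method names mirror the prose (upsert, query), but the exact signatures in Unrag's generated adapters may differ; the in-memory implementation is a toy for illustration, where real adapters issue SQL instead:

```typescript
// Sketch of the adapter contract; signatures are illustrative.
interface StoredChunk {
  sourceId: string;
  content: string;
  embedding: number[];
}

interface StoreAdapter {
  upsert(sourceId: string, chunks: StoredChunk[]): Promise<void>;
  query(embedding: number[], limit: number): Promise<StoredChunk[]>;
}

// Toy in-memory adapter, useful for tests; real adapters talk to Postgres.
class MemoryStore implements StoreAdapter {
  private chunks = new Map<string, StoredChunk[]>();

  async upsert(sourceId: string, chunks: StoredChunk[]): Promise<void> {
    this.chunks.set(sourceId, chunks); // re-ingesting a sourceId replaces it
  }

  async query(embedding: number[], limit: number): Promise<StoredChunk[]> {
    const dot = (a: number[], b: number[]) =>
      a.reduce((sum, x, i) => sum + x * b[i], 0);
    return [...this.chunks.values()]
      .flat()
      .sort((a, b) => dot(b.embedding, embedding) - dot(a.embedding, embedding))
      .slice(0, limit);
  }
}
```

Because every adapter implements the same two methods, swapping Drizzle for Prisma (or either for raw SQL) does not ripple into the core engine code.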
lib/unrag/embedding/ contains the selected embedding provider implementation plus a dispatcher. The default uses the Vercel AI SDK, but you can switch providers by changing a config option and re-running unrag init --full (to vendor all providers) or by re-running init with a different provider selection.
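The provider-plus-dispatcher arrangement can be sketched as follows. The interface shape, registry, and the deterministic fake provider are all illustrative assumptions, not the files unrag init actually generates:

```typescript
// Illustrative provider shape plus a dispatcher that picks one by name.
interface EmbeddingProvider {
  embed(texts: string[]): Promise<number[][]>;
}

// A deterministic fake provider, handy for offline tests: maps each text
// to a 2-d vector of [character count, vowel count].
const fakeProvider: EmbeddingProvider = {
  async embed(texts) {
    return texts.map((t) => [t.length, (t.match(/[aeiou]/gi) ?? []).length]);
  },
};

const providers: Record<string, EmbeddingProvider> = {
  fake: fakeProvider,
  // real providers (e.g. one backed by the Vercel AI SDK) are wired in here
};

function getProvider(name: string): EmbeddingProvider {
  const provider = providers[name];
  if (!provider) throw new Error(`Unknown embedding provider: ${name}`);
  return provider;
}
```

A fake provider like this is also a practical way to test ingestion and retrieval logic without spending embedding-API tokens.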
lib/unrag/unrag.md is optional. If you want a generated README with setup notes (schema, env vars, adapter tips), run unrag init --with-docs.
What Unrag assumes
Unrag is opinionated in a few ways that keep it simple:
Postgres with pgvector. Vector databases are proliferating, but most teams already have Postgres. Adding the pgvector extension gives you vector similarity search without introducing a new system. Unrag's adapters generate the SQL to create tables, insert embeddings, and query by cosine distance—all using standard Postgres tooling.
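For intuition, cosine distance (what pgvector's <=> operator computes) is 1 minus the cosine similarity of the two vectors, so identical directions score 0 and orthogonal ones score 1:

```typescript
// Cosine distance, as pgvector's <=> operator computes it:
// 1 - (a . b) / (|a| * |b|)
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Note that cosine distance ignores vector magnitude, which is why it is the usual choice for text embeddings: only the direction of the vector carries meaning.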
Server-side only. Unrag reads your database credentials and embedding API keys from environment variables. It's designed to run in Route Handlers, Server Actions, API routes, or backend scripts—never in browser code. This also means edge runtimes (Cloudflare Workers, Vercel Edge) aren't supported out of the box. See Supported Runtimes for the full picture on where Unrag runs.
You manage migrations. Unrag doesn't run migrations for you. It gives you the SQL schema you need, and you apply it however your team manages database changes (raw SQL files, Drizzle migrations, Prisma migrations, or any other approach).
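To make "the SQL schema you need" concrete, the migration you apply looks broadly like the following. This is a sketch, not the exact SQL Unrag generates: table and column names are illustrative, and the vector width must match your embedding model's dimensions:

```typescript
// Illustrative DDL held as a string; table and column names are assumed
// for illustration, and vector(1536) must match your embedding dimensions.
const exampleSchema = `
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS rag_chunks (
  id          bigserial    PRIMARY KEY,
  source_id   text         NOT NULL,
  chunk_index integer      NOT NULL,
  content     text         NOT NULL,
  embedding   vector(1536) NOT NULL
);

-- Approximate nearest-neighbor index for cosine-distance queries.
CREATE INDEX IF NOT EXISTS rag_chunks_embedding_idx
  ON rag_chunks USING hnsw (embedding vector_cosine_ops);
`;
```

Because the schema is just SQL, it drops into whatever migration workflow you already run, and you can extend it (extra columns, row-level security policies) without fighting the library.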
What Unrag is not
Unrag focuses narrowly on the storage and retrieval layer. It's intentionally minimal.
It doesn't include:
- A hosted vector database. Your vectors live in your Postgres instance.
- A chat framework. Unrag returns chunks; you decide how to build prompts, stream responses, or integrate with chat interfaces.
- A crawler or ETL pipeline. You bring content to Unrag. Whether that content comes from uploaded files, scraped web pages, database records, or API responses is outside Unrag's scope.
- Permission or authentication logic. Unrag provides basic scoping by sourceId, but anything more sophisticated (row-level security, tenant isolation, access control lists) is something you implement in your application code.

The mental model
The system provides four core operations:
Ingestion takes a piece of content (a document, an article, a code file, whatever), splits it into chunks, generates an embedding vector for each chunk, and stores everything in Postgres. The sourceId you provide acts as a logical identifier—if you ingest the same sourceId again, you're updating that document.
Retrieval takes a query string, generates an embedding for it, and asks Postgres for the chunks whose embeddings are most similar to your query embedding. You get back the chunks, their similarity scores, and timing information.
Deletion removes documents by their sourceId or by a sourceId prefix. This is useful for removing outdated content or for honoring a user's request to delete their data.
Reranking (optional) takes retrieved chunks and reorders them using a cross-encoder model for higher precision. This two-stage retrieval pattern—fast vector search followed by expensive but accurate reranking—is common in production systems.
Everything else—how you turn those chunks into a prompt, how you call an LLM, how you handle streaming, how you surface results in your UI—is your application's concern. Unrag gives you the retrieval primitives and gets out of your way.
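The operations above can be made concrete with a toy end-to-end flow. Word-overlap scoring stands in for real embeddings, and the method names mirror the prose rather than Unrag's exact API:

```typescript
// Toy ingest -> retrieve -> delete flow. Word overlap stands in for
// embeddings; class and method names are illustrative, not Unrag's API.
type Doc = { sourceId: string; content: string };

class ToyEngine {
  private docs = new Map<string, string>();

  ingest(doc: Doc): void {
    this.docs.set(doc.sourceId, doc.content); // same sourceId = update
  }

  retrieve(query: string, limit = 3): { sourceId: string; score: number }[] {
    const queryWords = new Set(query.toLowerCase().split(/\s+/));
    return [...this.docs.entries()]
      .map(([sourceId, content]) => {
        const words = content.toLowerCase().split(/\s+/);
        const hits = words.filter((w) => queryWords.has(w)).length;
        return { sourceId, score: hits / words.length };
      })
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
  }

  delete(sourceIdPrefix: string): void {
    for (const id of this.docs.keys()) {
      if (id.startsWith(sourceIdPrefix)) this.docs.delete(id);
    }
  }
}
```

Swap the Map for Postgres, the word overlap for embedding similarity, and add an optional reranking pass over the retrieved set, and you have the real system's shape.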
Beyond text: Multimodal ingestion
Real-world content isn't just text. Your Notion pages have embedded PDFs. Your documentation includes diagrams. Unrag's asset processing system handles this:
PDFs can be processed through LLM extraction—Unrag sends the PDF to an LLM (Gemini by default) which extracts the text content. That text is then chunked and embedded like any other document.
Images can be embedded directly if you're using a multimodal embedding model, or Unrag falls back to embedding their captions.
Connectors like Notion and Google Drive automatically extract these assets from the content they sync, so you don't have to build the asset list yourself.
Asset processing is opt-in and configurable. The library defaults to skipping expensive operations, and rich media settings are only enabled when you choose extractors during init (or add them later). See Multimodal Ingestion for the full picture.
New to RAG?
If you're new to retrieval-augmented generation or want a deeper understanding of the concepts, tradeoffs, and failure modes involved in production RAG systems, check out our comprehensive RAG Handbook. It covers everything from first principles to production operations.
