unrag.config.ts Reference
The generated configuration file and how to customize it.
The unrag.config.ts file is the central place to configure Unrag. It's generated when you run unrag@latest init and contains everything needed to construct a working engine: database connection, embedding provider, and default settings.
Structure overview
The generated config is minimal by default and grows as you add modules. Here's what a basic text-only setup looks like (the default when you don't enable rich media):
import { defineUnragConfig } from "./lib/unrag/core";
import { createDrizzleVectorStore } from "./lib/unrag/store/drizzle";
import { drizzle } from "drizzle-orm/node-postgres";
import { Pool } from "pg";

export const unrag = defineUnragConfig({
  defaults: {
    chunking: {
      chunkSize: 512,
      chunkOverlap: 50,
    },
    retrieval: {
      topK: 8,
    },
  },
  embedding: {
    provider: "ai",
    config: {
      type: "text",
      model: "openai/text-embedding-3-small",
      timeoutMs: 15_000,
    },
  },
  engine: {
    storage: {
      storeChunkContent: true,
      storeDocumentContent: true,
    },
    extractors: [],
  },
} as const);

export function createUnragEngine() {
  const databaseUrl = process.env.DATABASE_URL;
  if (!databaseUrl) throw new Error("DATABASE_URL is required");

  const pool = (globalThis as any).__unragPool ?? new Pool({ connectionString: databaseUrl });
  (globalThis as any).__unragPool = pool;

  const db = (globalThis as any).__unragDrizzleDb ?? drizzle(pool);
  (globalThis as any).__unragDrizzleDb = db;

  const store = createDrizzleVectorStore(db);
  return unrag.createEngine({ store });
}

Notice that assetProcessing is omitted entirely in the minimal config. The library uses sensible defaults internally, and you only need to add assetProcessing overrides when you install extractors.
Config grows with modules
When you run unrag add extractor <name>, the CLI automatically:
- Adds the extractor import
- Registers it in the extractors array
- Adds minimal assetProcessing overrides (only the enabled: true flags needed for that extractor)
For example, after running unrag add extractor pdf-text-layer, your config will include:
import { createPdfTextLayerExtractor } from "./lib/unrag/extractors/pdf-text-layer";
// ...

export const unrag = defineUnragConfig({
  // ...
  engine: {
    // ...
    extractors: [
      createPdfTextLayerExtractor(),
    ],
    assetProcessing: {
      pdf: {
        textLayer: {
          enabled: true,
        },
      },
    },
  },
} as const);

The config only includes the minimal overrides needed, not the full verbose assetProcessing tree. If you ran init --rich-media and selected extractors, the same minimal overrides are generated during initialization. Note that multimodal embedding is configured separately; see Multimodal Embeddings to enable it.
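To picture how a minimal override can sit on top of a larger default tree, here is a small deep-merge sketch. It is an illustration only: the merge function, the placeholder defaults, and the exampleLimit field are hypothetical and are not taken from Unrag's source, which may combine overrides differently.

```typescript
type Plain = { [key: string]: unknown };

function isPlainObject(value: unknown): value is Plain {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively overlay `overrides` on `defaults` without mutating either input.
function mergeOverrides(defaults: Plain, overrides: Plain): Plain {
  const merged: Plain = { ...defaults };
  for (const [key, value] of Object.entries(overrides)) {
    const base = merged[key];
    merged[key] =
      isPlainObject(base) && isPlainObject(value) ? mergeOverrides(base, value) : value;
  }
  return merged;
}

// Placeholder defaults for illustration; NOT Unrag's real default tree.
const internalDefaults: Plain = {
  pdf: { textLayer: { enabled: false, exampleLimit: 100 } },
};

// The minimal override written into the config by the CLI.
const fromConfig: Plain = { pdf: { textLayer: { enabled: true } } };

const effective = mergeOverrides(internalDefaults, fromConfig);
console.log(JSON.stringify(effective));
// → {"pdf":{"textLayer":{"enabled":true,"exampleLimit":100}}}
```

Only the enabled flag changes; every sibling value keeps its default, which is why the generated config can stay short.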
The unrag config
This object holds your default settings. Changing values here affects all operations that use these defaults.
defaults.chunking
Controls how documents are split into chunks. Unrag uses token-based recursive chunking with the o200k_base tokenizer (same as GPT-5, GPT-4o). See Chunking for details on strategies and plugin chunkers.
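To build intuition for how chunkSize and chunkOverlap interact, the sketch below slides a fixed window over an array of placeholder tokens. It is deliberately simpler than Unrag's recursive chunker and does not use the o200k_base tokenizer; the windowChunks function is a hypothetical helper written for this example.

```typescript
// Simplified sliding-window chunker over pre-tokenized input.
// NOT Unrag's recursive chunker: it only illustrates chunkSize/chunkOverlap.
function windowChunks<T>(tokens: T[], chunkSize: number, chunkOverlap: number): T[][] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be >= 0 and smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap; // how far each window advances
  const chunks: T[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize));
    if (start + chunkSize >= tokens.length) break; // this window reached the end
  }
  return chunks;
}

// 1,000 placeholder "tokens" with the generated defaults (512 / 50).
const tokens = Array.from({ length: 1000 }, (_, i) => i);
const chunks = windowChunks(tokens, 512, 50);
console.log(chunks.map((c) => c.length)); // → [512, 512, 76]
```

Each window starts 462 tokens after the previous one, so consecutive chunks share 50 tokens. A larger overlap preserves more context across boundaries at the cost of more chunks to embed and store.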
defaults.retrieval
Convenience defaults you can use in your own engine.retrieve() calls/helpers (the engine default is topK: 8 when omitted).
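One way to use these defaults is a small wrapper that falls back to the configured topK when the caller does not pass one. This is a sketch under assumptions: the structural types below stand in for the real engine and config types, and reading the value from config.defaults.retrieval.topK assumes the config object exposes its defaults in that shape.

```typescript
// Structural stand-ins so the sketch compiles on its own; in a real project
// the engine and config come from your generated Unrag code.
type RetrieveInput = { query: string; topK?: number };

interface RetrievingEngine<R> {
  retrieve(input: RetrieveInput): Promise<R>;
}

type ConfigWithRetrievalDefaults = { defaults: { retrieval: { topK: number } } };

// Hypothetical helper: use the caller's topK when given, else the config default.
function retrieveWithDefaults<R>(
  engine: RetrievingEngine<R>,
  config: ConfigWithRetrievalDefaults,
  query: string,
  topK?: number,
): Promise<R> {
  return engine.retrieve({ query, topK: topK ?? config.defaults.retrieval.topK });
}

// Demo with a fake engine that echoes the input it received.
const demoEngine: RetrievingEngine<RetrieveInput> = {
  retrieve: async (input) => input,
};
const demoConfig: ConfigWithRetrievalDefaults = { defaults: { retrieval: { topK: 8 } } };

retrieveWithDefaults(demoEngine, demoConfig, "refund policy").then((input) => {
  console.log(input.topK); // the configured default is used
});
```

Keeping the fallback in one helper means a change to defaults.retrieval.topK reaches every call site that does not set its own value.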
defaults.embedding
Controls embedding batching and concurrency during ingestion. This helps improve throughput and reduce rate-limit risk (especially when your provider supports embedMany()).
You can set these under defaults.embedding (recommended). If you need a per-engine override, you can also set engine.embeddingProcessing (it overrides these defaults).
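The general idea behind batching with bounded concurrency can be sketched as follows. This is not Unrag's ingestion code: embedInBatches is a hypothetical helper, the option names batchSize and concurrency are assumptions chosen for the example, and embedMany here is a caller-supplied function standing in for a provider call.

```typescript
// Split items into consecutive batches of at most batchSize.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize <= 0) throw new Error("batchSize must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Embed all texts, keeping at most `concurrency` batch requests in flight.
// Results come back in the same order as the input texts.
async function embedInBatches(
  texts: string[],
  embedMany: (batch: string[]) => Promise<number[][]>,
  opts: { batchSize: number; concurrency: number },
): Promise<number[][]> {
  const batches = toBatches(texts, opts.batchSize);
  const results: number[][][] = new Array(batches.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed batch.
  async function worker(): Promise<void> {
    while (next < batches.length) {
      const index = next++; // no race: JavaScript runs this synchronously
      results[index] = await embedMany(batches[index]);
    }
  }

  const workerCount = Math.max(1, Math.min(opts.concurrency, batches.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return ([] as number[][]).concat(...results);
}

// Demo with a fake provider that returns a one-dimensional "embedding".
const fakeEmbedMany = async (batch: string[]) => batch.map((text) => [text.length]);

embedInBatches(["a", "bb", "ccc", "dddd", "eeeee"], fakeEmbedMany, {
  batchSize: 2,
  concurrency: 2,
}).then((vectors) => console.log(vectors.length)); // one vector per input text
```

Larger batches mean fewer requests, while the concurrency cap limits how many requests hit the provider at once, which is the lever that matters for rate limits.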
chunking
Configures the chunking method and options at the top level. This is an alternative to defaults.chunking that also lets you specify which chunking method to use (built-in, plugin, or custom).
export default defineUnragConfig({
  chunking: {
    method: "recursive", // or "markdown", "code", "semantic", etc.
    options: {
      chunkSize: 512,
      chunkOverlap: 50,
      minChunkSize: 24,
    },
  },
  // ...
});
Plugin chunkers must be installed via CLI before use:
bunx unrag add chunker:markdown # For documentation
bunx unrag add chunker:code # For source code (uses tree-sitter)
bunx unrag add chunker:semantic # LLM-guided semantic boundaries
bunx unrag add chunker:hierarchical # Section-first with header context
bunx unrag add chunker:agentic # LLM-powered, highest quality

storage
Controls what Unrag persists to your database.
embedding
Configuration for the embedding provider. Unrag supports twelve built-in providers, each with its own configuration options. See Providers for detailed setup instructions for each provider.
The embedding field accepts an object with a provider field and an optional config object:
embedding: {
  provider: "openai", // or "google", "voyage", "ollama", etc.
  config: {
    model: "text-embedding-3-small",
    timeoutMs: 15_000,
    // Provider-specific options
  },
},
Common configuration options available on most providers include model and timeoutMs, both shown in the example above.
For multimodal embeddings (embedding images alongside text), use the Voyage provider with type: "multimodal". See Multimodal Embeddings for details.
assetProcessing
Controls how PDFs, images, and other rich media are processed during ingestion. For complete type definitions, see Asset Processing Reference.
For minimal installs, assetProcessing is omitted entirely and the library uses internal defaults. When you run init --rich-media and select extractors, or when you run unrag add extractor <name>, the CLI automatically adds minimal assetProcessing overrides with only the enabled: true flags needed for your selected extractors. If no extractor is registered for an enabled flag, ingestion emits warnings so you don't miss content silently.
extractors
The extractors array holds extractor module instances that process rich media assets. Extractors are installed via the CLI and automatically registered in your config.
import { createPdfLlmExtractor } from "./lib/unrag/extractors/pdf-llm";

export const unrag = defineUnragConfig({
  // ...
  engine: {
    // ...
    extractors: [
      createPdfLlmExtractor(),
      // Add more extractors as you install them
    ],
  },
} as const);

Installing extractors
Use the CLI to install extractor modules:
bunx unrag@latest add extractor pdf-llm

This:
- Copies the extractor source to lib/unrag/extractors/pdf-llm/
- Adds dependencies to your package.json
- Automatically patches unrag.config.ts to:
  - Add the extractor import
  - Register it in the extractors array
  - Enable the corresponding assetProcessing flags (minimal overrides only)
Available extractors
| Module | Extractor name | Description |
|---|---|---|
| pdf-text-layer | pdf:text-layer | Fast/cheap PDF text-layer extraction |
| pdf-llm | pdf:llm | Extract text from PDFs using an LLM (Gemini by default) |
| pdf-ocr | pdf:ocr | OCR PDFs by rasterizing pages (worker-only) |
| image-ocr | image:ocr | OCR images into text chunks |
| image-caption-llm | image:caption-llm | Generate image captions via LLM |
| audio-transcribe | audio:transcribe | Transcribe audio into text chunks |
| video-transcribe | video:transcribe | Transcribe video audio track into text chunks |
| video-frames | video:frames | Sample frames + extract text per frame (worker-only) |
| file-text | file:text | Decode text-ish attachments |
| file-docx | file:docx | Extract raw text from .docx |
| file-pptx | file:pptx | Extract slide text from .pptx |
| file-xlsx | file:xlsx | Extract sheet content from .xlsx |
Image handling (image:embed, image:caption) is built into the core engine—no extractor module needed. Configure via your embedding provider's type setting. Installable image extractors (image:ocr, image:caption-llm) can generate additional text chunks when enabled.
See Extractors Overview for details on all available extractors and how to create custom ones.
The createUnragEngine function
This function assembles all the pieces into a working engine. You'll typically call it at the start of your request handlers or scripts.
The generated version includes:
- Embedding provider wiring (derived from unrag.embedding)
- Database connection with a singleton pattern to prevent connection exhaustion
- Store adapter creation using your chosen adapter type
- Engine construction via unrag.createEngine({ store })
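The singleton pattern mentioned in the list can be isolated into a small generic helper. The sketch below is not part of the generated file: globalSingleton and the __examplePool key are hypothetical names, and a plain object stands in for a real connection pool. The point is that anything cached on globalThis survives repeated calls (and, in setups where a dev server re-evaluates modules, repeated module loads), so only one pool is ever opened.

```typescript
// Return the value cached on globalThis under `key`, creating it on first use.
function globalSingleton<T>(key: string, create: () => T): T {
  const registry = globalThis as unknown as Record<string, unknown>;
  if (!(key in registry)) {
    registry[key] = create();
  }
  return registry[key] as T;
}

// Stand-in for an expensive resource such as a database connection pool.
let created = 0;
const makePool = () => ({ id: ++created });

const first = globalSingleton("__examplePool", makePool);
const second = globalSingleton("__examplePool", makePool);

console.log(first === second, created); // → true 1
```

The generated createUnragEngine() applies the same idea inline with the __unragPool and __unragDrizzleDb keys.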
Customizing database connection
The generated code uses globalThis singletons for connection reuse. You can replace this with your own connection management:
// Use an existing pool from elsewhere in your app
import { pool } from "@/lib/db";
import { drizzle } from "drizzle-orm/node-postgres";

export function createUnragEngine() {
  const db = drizzle(pool);
  const store = createDrizzleVectorStore(db);
  // ... rest of function
}

For different database providers, adjust the connection setup:
// Neon (serverless HTTP driver)
import { neon } from "@neondatabase/serverless";
import { drizzle } from "drizzle-orm/neon-http";

const sql = neon(process.env.DATABASE_URL!);
const db = drizzle(sql);

// Supabase (standard pg pool)
import { Pool } from "pg";

const pool = new Pool({
  connectionString: process.env.SUPABASE_DB_URL,
});

// Any other Postgres database (standard pg pool)
import { Pool } from "pg";

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
});

Adding custom helpers
Extend the config file with application-specific helpers:
// Tenant-scoped retrieval
export async function retrieveForTenant(tenantId: string, query: string) {
  const engine = createUnragEngine();
  return engine.retrieve({
    query,
    scope: { sourceId: `tenant:${tenantId}:` },
  });
}

// Ingest with validation
export async function ingestDocument(
  sourceId: string,
  content: string,
  metadata: Record<string, unknown>
) {
  if (!sourceId || !content) {
    throw new Error("sourceId and content are required");
  }
  const engine = createUnragEngine();
  return engine.ingest({ sourceId, content, metadata });
}

Environment variables
The config file expects these environment variables. The embedding-related variables depend on which provider you're using.
Required for all setups
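DATABASE_URL: connection string for your Postgres database. The generated createUnragEngine() throws if it is not set.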
Provider-specific environment variables
Each provider requires its own API key and optionally accepts a model override. See the Providers documentation for details on each provider's requirements. Common examples:
| Provider | API Key Variable | Model Override Variable |
|---|---|---|
| OpenAI | OPENAI_API_KEY | OPENAI_EMBEDDING_MODEL |
| Google AI | GOOGLE_GENERATIVE_AI_API_KEY | GOOGLE_GENERATIVE_AI_EMBEDDING_MODEL |
| Voyage | VOYAGE_API_KEY | VOYAGE_MODEL |
| Cohere | COHERE_API_KEY | COHERE_EMBEDDING_MODEL |
| Mistral | MISTRAL_API_KEY | MISTRAL_EMBEDDING_MODEL |
| Ollama | (none required) | OLLAMA_EMBEDDING_MODEL |
| AI Gateway | AI_GATEWAY_API_KEY | AI_GATEWAY_MODEL |
Azure, Vertex, and Bedrock use their respective cloud authentication mechanisms (Azure credentials, GCP Application Default Credentials, AWS credentials) rather than API keys.
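If you would rather fail fast at startup than on the first request, a small check along these lines can help. It is a sketch, not something the generated config includes: missingEnv and assertEnv are hypothetical helpers, and the list of required names depends on your provider.

```typescript
type Env = Record<string, string | undefined>;

// Return the names of required variables that are missing or blank.
function missingEnv(env: Env, required: string[]): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}

// Throw one error that lists every missing variable.
function assertEnv(env: Env, required: string[]): void {
  const missing = missingEnv(env, required);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}

// Example environment for an OpenAI-backed setup (values are placeholders).
const exampleEnv: Env = {
  DATABASE_URL: "postgres://user:pass@localhost:5432/app",
  OPENAI_API_KEY: "",
};

console.log(missingEnv(exampleEnv, ["DATABASE_URL", "OPENAI_API_KEY"]));
// → ["OPENAI_API_KEY"]
```

In a real project you would pass process.env as the first argument, for example near the top of createUnragEngine().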
Server-only
Keep unrag.config.ts server-only. It imports database drivers, reads secrets from environment variables, and should never be bundled into client code. In Next.js, the file naturally stays server-side when imported only from Route Handlers and Server Actions.
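As an extra safety net, you can add a runtime guard that throws if the module is ever evaluated in a browser-like environment. The assertServerSide helper below is a hypothetical sketch, not part of Unrag; it only checks for the window and document globals.

```typescript
// Throw if the given global scope looks like a browser (has window and document).
function assertServerSide(scope: object = globalThis): void {
  if ("window" in scope && "document" in scope) {
    throw new Error("unrag.config.ts must only be imported from server code");
  }
}

// Demo against fake scopes so the behavior is visible in any runtime.
const serverLike = {};
const browserLike = { window: {}, document: {} };

assertServerSide(serverLike); // passes silently

try {
  assertServerSide(browserLike);
} catch (error) {
  console.log((error as Error).message);
}
```

Calling assertServerSide() with no arguments at the top of the config file checks the real global scope. In Next.js, importing the server-only package gives a build-time version of the same protection.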
