Chunking
How documents are split into chunks and why it matters for retrieval quality.
Before your documents can be searched semantically, they need to be embedded—turned into vectors that represent their meaning. But embedding models work best with focused pieces of text, not sprawling documents. A 10,000-word article embedded as a single vector becomes a vague point in semantic space, losing the nuance of individual sections and paragraphs.
Chunking is how Unrag splits documents into those focused pieces. Each chunk gets its own embedding, its own position in vector space, its own chance to match user queries. The quality of your retrieval depends significantly on how well your chunking strategy matches your content.
The default chunker
When you create a new Unrag project, you get token-based recursive chunking out of the box. This algorithm tries to split text at natural boundaries—paragraphs first, then sentences, then clauses—while respecting token limits. It uses the o200k_base tokenizer, the same encoding used by GPT-5, GPT-4o, and current OpenAI models.
The default settings work well for most content:
- chunkSize: 512 tokens
- chunkOverlap: 50 tokens
- minChunkSize: 24 tokens
These values balance precision (chunks focused enough to match specific queries) with context (chunks large enough to be self-contained). The overlap ensures that ideas spanning chunk boundaries appear in both adjacent chunks, reducing the chance of missing relevant content.
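If you prefer to pin these values explicitly rather than rely on the implicit defaults, they map onto the unrag.config.ts shape shown in the next section. A minimal sketch; the method name "recursive" and the minChunkSize option name are assumptions based on the description above:

export default defineUnragConfig({
  chunking: {
    method: "recursive", // assumed name for the default method
    options: {
      chunkSize: 512, // tokens per chunk
      chunkOverlap: 50, // tokens shared between adjacent chunks
      minChunkSize: 24, // drop fragments smaller than this
    },
  },
  // ...
});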
Choosing a chunking method
Different content types benefit from different approaches. Unrag provides several chunking methods:
Recursive chunking (the default) works for general prose. It's fast, predictable, and handles mixed content gracefully.
Semantic chunking uses an LLM to identify where topics actually shift, rather than relying on formatting cues. It costs more but produces more coherent chunks.
Markdown chunking understands markdown structure—headings, code blocks, horizontal rules. It keeps code blocks intact and splits at section boundaries.
Code chunking parses source code with tree-sitter and splits at function and class boundaries, keeping complete definitions together.
Hierarchical chunking prepends section headers to every chunk, so each chunk carries context about where it fits in the document.
Agentic chunking uses an LLM to optimize chunks specifically for retrieval, considering what users might search for.
Custom chunking lets you implement your own logic when built-in options don't fit; a minimal sketch follows below.
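To make the custom option concrete, here is a hand-rolled chunker that splits on blank lines. Unrag's actual chunker interface isn't spelled out on this page, so the Chunk shape and function signature are illustrative assumptions; a chunker like this would be passed per ingest via the chunker parameter covered later on this page.

type Chunk = { content: string };

// Hypothetical custom chunker: one chunk per blank-line-separated
// paragraph. The return shape is an assumption; check the real
// chunker interface before adopting this.
export function paragraphChunker(text: string): Chunk[] {
  return text
    .split(/\n{2,}/)
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((content) => ({ content }));
}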
You can configure your preferred method in unrag.config.ts:
export default defineUnragConfig({
chunking: {
method: "markdown",
options: {
chunkSize: 512,
chunkOverlap: 50,
},
},
// ...
});

Per-ingest overrides
The configured chunking method becomes your default, but you're not locked into it. You can override chunking behavior for individual engine.ingest() calls in two ways.
Override just the options when you want to keep the same algorithm but adjust parameters:
await engine.ingest({
sourceId: "specs:detailed-design",
content: designDoc,
chunking: { chunkSize: 768, chunkOverlap: 75 },
});

This uses your configured chunker but with larger chunks and more overlap—useful for dense technical content where you want more context per chunk.
Override the chunker itself when different content needs a different algorithm:
import { codeChunker } from "@unrag/chunking/code";
await engine.ingest({
sourceId: "src/utils/helpers.ts",
content: sourceCode,
chunker: codeChunker, // Use code chunking for this file
});

The per-ingest chunker parameter takes precedence over your configured method. This means you can handle heterogeneous content—documentation, code, articles—with a single engine instance, applying the appropriate chunking strategy to each.
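As a sketch of that pattern, you could route each file to a chunker by extension before ingesting. The @unrag/chunking/markdown import path is an assumption by analogy with the code chunker's path, and files is a placeholder list of { path, text } records:

import { codeChunker } from "@unrag/chunking/code";
import { markdownChunker } from "@unrag/chunking/markdown"; // assumed path

// Pick a chunker by file extension; return undefined to fall back
// to the configured default method.
function chunkerFor(path: string) {
  if (path.endsWith(".ts") || path.endsWith(".py")) return codeChunker;
  if (path.endsWith(".md")) return markdownChunker;
  return undefined;
}

for (const file of files) {
  const chunker = chunkerFor(file.path);
  await engine.ingest({
    sourceId: file.path,
    content: file.text,
    ...(chunker ? { chunker } : {}), // omit the key to use the default
  });
}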
Token counting
Unrag measures chunk sizes in tokens, not characters or words. This matters because embedding models process tokens, and their context limits are defined in tokens. A 512-token chunk is guaranteed to fit in any modern embedding model's context window.
If you're building custom logic or want to understand your content better, Unrag exports a countTokens utility:
import { countTokens } from "unrag";
const tokens = countTokens("Hello world"); // 2
const docSize = countTokens(myDocument); // exact count

This uses the same tokenizer as the chunker, so counts are consistent.
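One practical use is estimating work before ingesting. As a rough sketch under the default settings (512-token chunks, 50-token overlap), each chunk after the first advances by about 462 new tokens:

import { countTokens } from "unrag";

// Rough chunk-count estimate under the default settings.
function estimateChunkCount(text: string): number {
  const stride = 512 - 50; // chunk size minus overlap
  return Math.max(1, Math.ceil(countTokens(text) / stride));
}

const expectedChunks = estimateChunkCount(myDocument); // budget embedding calls up front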
