Reranker
Improve retrieval precision with second-stage reranking using Cohere or custom models.
Vector similarity search is fast but imprecise. When you retrieve the top 30 chunks by embedding distance, the best match isn't always first—it might be third, or seventh, or twentieth. The initial ranking is based on how close vectors are in embedding space, which is a useful approximation but not a perfect measure of relevance to the specific query.
Reranking solves this by adding a second stage. You retrieve a larger set of candidates using fast vector search, then run those candidates through a more expensive relevance model that directly scores each candidate against the query. The reranker sees both the query text and the candidate text, so it can make much more nuanced relevance judgments than embedding distance alone.
This two-stage pattern—fast retrieval followed by precise reranking—is how most production search systems work. Unrag's reranker battery makes it straightforward to add this capability to your RAG pipeline.
When reranking helps
Reranking is most valuable when your initial retrieval returns good candidates in the wrong order. This happens more often than you might expect. Embedding models are trained on general semantic similarity, not query-document relevance. A chunk that's semantically related to your query might rank higher than a chunk that directly answers it.
Consider a query like "how do I reset my password?" Your vector search might return chunks about:
- Account security best practices (mentions passwords, so it ranks first by similarity)
- User authentication architecture (technically related, ranks second)
- The password reset process (directly relevant, but ranks third)
A reranker trained on query-document relevance will recognize that the password reset chunk directly answers the question and promote it to the top.
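The password example can be sketched with a toy second stage. This is only an illustration: `overlapScore` is a hypothetical stand-in for a real relevance model, the `Candidate` shape is assumed, and the similarity scores are made up.

```typescript
// Toy stand-in for a relevance model (an assumption for illustration):
// it counts how many distinct query tokens also appear in the document.
type Candidate = { id: string; content: string; score: number };

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function overlapScore(query: string, document: string): number {
  const documentTokens = new Set(tokenize(document));
  const queryTokens = Array.from(new Set(tokenize(query)));
  return queryTokens.filter((token) => documentTokens.has(token)).length;
}

// Second stage: rescore every candidate against the query, keep the best topK.
function rerankTopK(query: string, candidates: Candidate[], topK: number): Candidate[] {
  return candidates
    .map((candidate) => ({ candidate, relevance: overlapScore(query, candidate.content) }))
    .sort((a, b) => b.relevance - a.relevance)
    .slice(0, topK)
    .map((entry) => entry.candidate);
}

// Candidates in vector-search order; the similarity scores are made up.
const vectorResults: Candidate[] = [
  {
    id: "security",
    content: "Account security best practices: use strong passwords and enable two-factor authentication.",
    score: 0.84,
  },
  {
    id: "auth",
    content: "User authentication architecture: sessions, tokens, and password hashing.",
    score: 0.81,
  },
  {
    id: "reset",
    content: "To reset your password, open Settings, choose Reset password, and follow the emailed link.",
    score: 0.79,
  },
];

const topTwo = rerankTopK("how do I reset my password?", vectorResults, 2);
console.log(topTwo.map((candidate) => candidate.id)); // reset first, then auth
```

The second stage reads the query and the candidate text together, so the chunk that vector search ranked third moves to the top.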
Reranking also helps when you need to retrieve more results. If you're building context for an LLM, you might want 10-15 highly relevant chunks. Retrieving 10 with vector search alone often includes some marginal matches. Retrieving 30 and reranking to the top 10 gives you better precision.
That said, reranking adds latency and cost. The Cohere reranker (the default implementation) takes 100-300ms per call and has per-request pricing. For simple use cases where vector search already gives good results, reranking might not be worth it. Start without it, measure your retrieval quality, and add reranking when you need the precision boost.
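One way to measure retrieval quality before and after adding a reranker is mean reciprocal rank over a small labeled set. The sketch below assumes you have, for each test query, the ranked chunk IDs your pipeline returned and the ID of the chunk a human judged correct. The `LabeledResult` type and the sample data are hypothetical.

```typescript
// One labeled example: the ranked chunk IDs a pipeline returned for a query,
// plus the ID of the chunk judged to be the correct answer.
type LabeledResult = { rankedIds: string[]; relevantId: string };

// Reciprocal rank: 1 for a hit at position 1, 1/2 at position 2, 0 if absent.
function reciprocalRank(result: LabeledResult): number {
  const position = result.rankedIds.indexOf(result.relevantId);
  return position === -1 ? 0 : 1 / (position + 1);
}

function meanReciprocalRank(results: LabeledResult[]): number {
  if (results.length === 0) return 0;
  const total = results.reduce((sum, result) => sum + reciprocalRank(result), 0);
  return total / results.length;
}

// Made-up rankings for the same two queries, without and with reranking.
const vectorOnly: LabeledResult[] = [
  { rankedIds: ["a", "b", "c"], relevantId: "c" },
  { rankedIds: ["d", "e", "f"], relevantId: "d" },
];
const withReranking: LabeledResult[] = [
  { rankedIds: ["c", "a", "b"], relevantId: "c" },
  { rankedIds: ["d", "e", "f"], relevantId: "d" },
];

const mrrVectorOnly = meanReciprocalRank(vectorOnly);
const mrrWithReranking = meanReciprocalRank(withReranking);
console.log({ mrrVectorOnly, mrrWithReranking });
```

If the two numbers are close on your own data, the extra latency and cost of reranking may not be justified.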
Installing the reranker battery
Install the reranker battery using the CLI:
bunx unrag@latest add battery reranker
This copies the reranker module into your project at lib/unrag/rerank/, adds the required dependencies (ai and @ai-sdk/cohere), and updates your unrag.json to track the installed battery.
Setting up the Cohere API key
The default reranker uses Cohere's rerank-v3.5 model, which requires an API key. Add it to your environment:
COHERE_API_KEY=your-cohere-api-key
You can get an API key from the Cohere dashboard. They offer a free tier that's sufficient for development and testing.
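A missing key usually surfaces only when the first rerank call fails, so it can help to check for it at startup. The helper below is a hypothetical sketch and not part of Unrag. It takes the environment as a plain record so it can be used with process.env or tested in isolation.

```typescript
// Hypothetical startup guard; not part of Unrag.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In a Node.js or Bun app you would typically pass process.env:
// const cohereKey = requireEnv(process.env, "COHERE_API_KEY");
```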
Wiring the reranker into your config
After installing, you need to tell the engine about your reranker. Open unrag.config.ts and add the reranker configuration:
import { defineUnragConfig } from "./lib/unrag/core";
import { createCohereReranker } from "./lib/unrag/rerank";
export const unrag = defineUnragConfig({
// ... your existing config
engine: {
// ... other engine config
reranker: createCohereReranker(),
},
} as const);
The createCohereReranker() function returns a reranker that uses Cohere's rerank-v3.5 model by default. You can customize it:
reranker: createCohereReranker({
model: "rerank-english-v2.0", // Use a different model
maxDocuments: 500, // Limit batch size (default: 1000)
})
Using reranking in your application
Once configured, call engine.rerank() after retrieval:
import { createUnragEngine } from "@unrag/config";
const engine = createUnragEngine();
// Step 1: Retrieve a larger set of candidates
const retrieved = await engine.retrieve({
query: "how do I reset my password?",
topK: 30, // Retrieve more than you need
});
// Step 2: Rerank to get the most relevant results
const reranked = await engine.rerank({
query: "how do I reset my password?",
candidates: retrieved.chunks,
topK: 8, // Return only the top 8 after reranking
});
// Use the reranked results
for (const chunk of reranked.chunks) {
console.log(chunk.content);
}
The rerank() method takes the same query you used for retrieval (this matters, because the reranker scores candidates against this query) and the chunks from your retrieval results. It returns a new set of chunks in reranked order.
API Reference
This section documents the types for reranking inputs and outputs.
RerankInput
The input to engine.rerank():
type RerankInput = {
query: string;
candidates: RerankCandidate[];
topK?: number;
onMissingReranker?: "throw" | "skip";
onMissingText?: "throw" | "skip";
resolveText?: (candidate: RerankCandidate) => string | Promise<string>;
};
RerankResult
The output from engine.rerank():
type RerankResult = {
chunks: RerankCandidate[];
ranking: RerankRankingItem[];
meta: { rerankerName: string; model?: string };
durations: { rerankMs: number; totalMs: number };
warnings: string[];
};
RerankRankingItem
Each item in the ranking array:
type RerankRankingItem = {
index: number;
rerankScore?: number;
};
| Field | Type | Description |
|---|---|---|
| index | number | Original index into the candidates array |
| rerankScore | number \| undefined | Score assigned by the reranker (if available). Higher scores indicate more relevance. |
The ranking array contains the complete ranking of all candidates, not just the top K. This is useful for debugging and evaluation—you can see exactly how the reranker reordered your results:
for (const item of result.ranking) {
console.log(`Candidate ${item.index}: score ${item.rerankScore}`);
}
RerankCandidate
A chunk with its retrieval score, used as input to reranking:
type RerankCandidate = Chunk & { score: number };
This is the same type returned by engine.retrieve(), so you can pass retrieval results directly to rerank.
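Because each ranking item stores an index into the candidates array, the two can be joined to show how far every candidate moved. The helper below is a hypothetical sketch. It assumes candidates expose an `id` field and that ranking is ordered best first, and it redeclares `RerankRankingItem` locally so the snippet is self-contained.

```typescript
// Redeclared here so the snippet stands alone; mirrors the type documented above.
type RerankRankingItem = { index: number; rerankScore?: number };

type RankMove = { id: string; from: number; to: number; rerankScore: number | undefined };

// Pair each ranking entry with its candidate to show the position change.
function describeMoves(candidates: { id: string }[], ranking: RerankRankingItem[]): RankMove[] {
  const moves: RankMove[] = [];
  ranking.forEach((item, newPosition) => {
    const candidate = candidates[item.index];
    if (candidate === undefined) return; // ignore indices outside the candidates array
    moves.push({ id: candidate.id, from: item.index, to: newPosition, rerankScore: item.rerankScore });
  });
  return moves;
}

// Made-up data for illustration.
const sampleCandidates = [{ id: "security" }, { id: "auth" }, { id: "reset" }];
const sampleRanking: RerankRankingItem[] = [
  { index: 2, rerankScore: 0.98 },
  { index: 1, rerankScore: 0.41 },
  { index: 0, rerankScore: 0.12 },
];

for (const move of describeMoves(sampleCandidates, sampleRanking)) {
  console.log(`${move.id}: position ${move.from} -> ${move.to} (score ${move.rerankScore})`);
}
```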
Reranker
The interface that reranker implementations must satisfy:
type Reranker = {
name: string;
rerank: (args: RerankerRerankArgs) => Promise<RerankerRerankResult>;
};
type RerankerRerankArgs = {
query: string;
documents: string[];
};
type RerankerRerankResult = {
order: number[];
scores?: number[];
model?: string;
};
CohereRerankerConfig
Configuration for the default Cohere reranker:
type CohereRerankerConfig = {
model?: string;
apiKey?: string;
baseUrl?: string;
maxDocuments?: number;
};
Handling edge cases
Reranking requires text content to score candidates against the query. If you've configured Unrag with storage.storeChunkContent: false (storing only embeddings, not text), the reranker won't have access to the chunk content it needs.
By default, engine.rerank() throws an error if it encounters a candidate with empty content:
// This will throw if any candidate has empty content
const result = await engine.rerank({
query: "test",
candidates: retrieved.chunks,
});
// Error: Candidate 3 (id=...) has empty content. Enable 'storeChunkContent'
// in engine config, provide 'resolveText' hook, or use 'onMissingText: "skip"'.
You have three options for handling this:
Option 1: Store chunk content (recommended)
The simplest solution is to ensure storage.storeChunkContent: true in your engine config. This is the default, so unless you explicitly disabled it, your chunks should have content.
Option 2: Provide a resolveText hook
If you store content externally (in S3, a CMS, etc.), provide a function that fetches the text:
const result = await engine.rerank({
query: "password reset",
candidates: retrieved.chunks,
resolveText: async (candidate) => {
// Fetch from your external store
const doc = await contentStore.get(candidate.documentId);
return doc.sections[candidate.index].text;
},
});
The resolveText hook is only called for candidates that have empty content. If a candidate already has content, the hook is skipped.
Option 3: Skip candidates with missing text
If some candidates might legitimately have no text (like image-only chunks), you can skip them:
const result = await engine.rerank({
query: "password reset",
candidates: retrieved.chunks,
onMissingText: "skip", // Skip instead of throwing
});
// Check warnings to see what was skipped
if (result.warnings.length > 0) {
console.log("Skipped candidates:", result.warnings);
}
Skipped candidates are appended to the end of the ranking in their original order. If you're taking the top K, they typically won't appear in your final results unless most candidates were skipped.
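The skip behavior described above can be pictured as a partition step. The sketch below illustrates that ordering rule and is not Unrag's actual implementation. `rankWithSkips`, the `TextCandidate` shape, and the warning wording are all hypothetical.

```typescript
type TextCandidate = { id: string; content: string };

// rankScorable receives only the non-empty texts and returns indices
// into that filtered list, best first.
function rankWithSkips(
  candidates: TextCandidate[],
  rankScorable: (texts: string[]) => number[],
): { order: number[]; warnings: string[] } {
  const scorable: { index: number; content: string }[] = [];
  const skipped: number[] = [];
  const warnings: string[] = [];

  candidates.forEach((candidate, index) => {
    if (candidate.content.length === 0) {
      skipped.push(index);
      warnings.push(`Skipped candidate ${index} (id=${candidate.id}): empty content`);
    } else {
      scorable.push({ index, content: candidate.content });
    }
  });

  // Map positions in the filtered list back to original candidate indices.
  const ranked: number[] = [];
  for (const localIndex of rankScorable(scorable.map((entry) => entry.content))) {
    const entry = scorable[localIndex];
    if (entry !== undefined) ranked.push(entry.index);
  }

  // Skipped candidates go last, in their original order.
  return { order: [...ranked, ...skipped], warnings };
}

// Example with a fake ranker that simply reverses the scorable texts.
const skipDemo = rankWithSkips(
  [
    { id: "a", content: "reset steps" },
    { id: "b", content: "" },
    { id: "c", content: "security tips" },
  ],
  (texts) => texts.map((_, i) => texts.length - 1 - i),
);
console.log(skipDemo.order, skipDemo.warnings);
```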
When no reranker is configured
If you call engine.rerank() without configuring a reranker, it throws by default:
// Throws: Reranker not configured. Install the reranker battery...
const result = await engine.rerank({
query: "test",
candidates: retrieved.chunks,
});
For graceful degradation (useful during development or gradual rollout), pass onMissingReranker: "skip":
const result = await engine.rerank({
query: "test",
candidates: retrieved.chunks,
onMissingReranker: "skip", // Return original order instead of throwing
});
// result.chunks is in the same order as input
// result.meta.rerankerName is "none"
// result.warnings includes a note about the missing reranker
This lets you write reranking code that works regardless of whether a reranker is configured, which is helpful for shared code paths or feature flags.
Building a custom reranker
The Cohere reranker works well out of the box, but you might want to use a different service or model. The reranker battery includes a createCustomReranker function for this:
import { createCustomReranker } from "./lib/unrag/rerank";
const myReranker = createCustomReranker({
name: "my-reranker",
rerank: async ({ query, documents }) => {
// Call your reranking service
const response = await myRerankerApi.rerank({
query,
documents,
});
// Return indices in relevance order (best first)
// and optionally the scores
return {
order: response.ranking.map(r => r.documentIndex),
scores: response.ranking.map(r => r.score),
model: "my-model-v1",
};
},
});
The reranker interface is simple: you receive the query and an array of document texts, and you return the reordered indices. The order array should contain indices into the original documents array, sorted by relevance (most relevant first).
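When wrapping a third-party service, it can be worth checking the returned order before handing it back. The helper below is a hypothetical sketch, not part of the battery. It assumes that scores, when present, line up one-to-one with order (as in the examples in this document), and it redeclares `RerankerRerankResult` locally so the snippet is self-contained.

```typescript
// Redeclared here so the snippet stands alone; mirrors the type documented above.
type RerankerRerankResult = { order: number[]; scores?: number[]; model?: string };

// Returns a list of human-readable problems; an empty list means the result looks sane.
function findOrderProblems(result: RerankerRerankResult, documentCount: number): string[] {
  const problems: string[] = [];
  const seen = new Set<number>();

  for (const index of result.order) {
    if (!Number.isInteger(index) || index < 0 || index >= documentCount) {
      problems.push(`index ${index} is outside 0..${documentCount - 1}`);
    } else if (seen.has(index)) {
      problems.push(`index ${index} appears more than once`);
    }
    seen.add(index);
  }

  if (result.scores !== undefined && result.scores.length !== result.order.length) {
    problems.push(`scores has ${result.scores.length} entries but order has ${result.order.length}`);
  }
  return problems;
}

console.log(findOrderProblems({ order: [2, 0, 1] }, 3)); // no problems reported
console.log(findOrderProblems({ order: [0, 0, 5], scores: [1] }, 3)); // three problems reported
```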
Using an LLM for reranking
You can even use a language model as a reranker. This is more expensive but can be very effective for specialized domains:
import { createCustomReranker } from "./lib/unrag/rerank";
import { generateText } from "ai";
const llmReranker = createCustomReranker({
name: "llm-reranker",
rerank: async ({ query, documents }) => {
// Score each document with the LLM
const scores = await Promise.all(
documents.map(async (doc, i) => {
const result = await generateText({
model: "openai/gpt-4o-mini",
messages: [{
role: "user",
content: `Rate how well this document answers the query on a scale of 0-10.
Query: ${query}
Document: ${doc}
Reply with just the number.`,
}],
});
return { index: i, score: parseFloat(result.text) || 0 };
})
);
// Sort by score descending
scores.sort((a, b) => b.score - a.score);
return {
order: scores.map(s => s.index),
scores: scores.map(s => s.score),
model: "gpt-4o-mini-reranker",
};
},
});
This is slower and more expensive than Cohere, but it gives you complete control over the relevance judgment. You can customize the prompt for your domain, add few-shot examples, or use a fine-tuned model.
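Language models do not always reply with a bare number. A plain parseFloat call returns NaN for a reply such as "Score: 7" and accepts out-of-range values such as 42. The sketch below is a hypothetical, more defensive parser that could replace the `parseFloat(result.text) || 0` expression. The name `parseScore` and the choice to fall back to the minimum are assumptions.

```typescript
// Extract the first number in the reply and clamp it to the expected scale.
// Falls back to `min` when no number is present.
function parseScore(reply: string, min = 0, max = 10): number {
  const match = reply.match(/-?\d+(?:\.\d+)?/);
  if (match === null) return min;
  const value = Number.parseFloat(match[0]);
  if (Number.isNaN(value)) return min;
  return Math.min(max, Math.max(min, value));
}

console.log(parseScore("8")); // 8
console.log(parseScore("Score: 7.5/10")); // 7.5
console.log(parseScore("42")); // 10 (clamped to the top of the scale)
console.log(parseScore("not sure")); // 0 (no number found)
```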
Performance considerations
Reranking adds latency to your retrieval pipeline. The Cohere reranker typically takes 100-300ms depending on the number of candidates and their lengths. Here are some ways to manage this:
Tune your candidate count. Retrieving 30 candidates and reranking to 10 gives good results for most use cases. Retrieving 100+ candidates increases rerank time without proportional quality improvement—the best results are usually in the initial top 30.
Consider async reranking. If your UI can show initial results and update them, you can return vector search results immediately and rerank in the background:
// Return fast initial results
const initial = await engine.retrieve({ query, topK: 10 });
res.write(JSON.stringify({ chunks: initial.chunks, isReranked: false }) + "\n");
// Rerank in background and send update
const reranked = await engine.rerank({
query,
candidates: initial.chunks,
topK: 10,
});
// Each payload is newline-delimited so the client can parse the two updates separately
res.write(JSON.stringify({ chunks: reranked.chunks, isReranked: true }) + "\n");
res.end();
Cache rerank results. If the same queries appear frequently, cache the reranked results. The cache key should include both the query and the candidate IDs (since candidates change as content is updated).
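A cache key along these lines could be built as follows. This is a minimal sketch: `rerankCacheKey` is a hypothetical helper, and collapsing whitespace in the query and sorting the IDs are design choices rather than requirements.

```typescript
// Build a key from the query plus the set of candidate IDs.
// Sorting the IDs makes the key independent of retrieval order.
function rerankCacheKey(query: string, candidateIds: string[]): string {
  const normalizedQuery = query.trim().replace(/\s+/g, " ");
  const sortedIds = [...candidateIds].sort();
  return JSON.stringify([normalizedQuery, sortedIds]);
}

// A plain in-memory map stands in for whatever cache you actually use.
const rerankCache = new Map<string, string[]>();
const cacheKey = rerankCacheKey("  reset   password ", ["chunk-b", "chunk-a"]);
rerankCache.set(cacheKey, ["chunk-a", "chunk-b"]); // cached reranked chunk IDs

console.log(rerankCache.has(rerankCacheKey("reset password", ["chunk-a", "chunk-b"]))); // true
```

When a chunk is added or removed upstream, the set of candidate IDs changes, so the key changes and the stale entry is never read.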
Skip reranking for simple queries. If your analytics show that certain query patterns get good results from vector search alone, you can skip reranking for those cases. This is especially useful for very specific queries where the top result is usually correct.
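One way to express such a rule is a small predicate that runs before the rerank call. The patterns below are invented for illustration and should be replaced with ones your own analytics support. `shouldRerank` is a hypothetical helper.

```typescript
// Hypothetical patterns for queries where vector search alone tends to suffice.
const skipRerankPatterns: RegExp[] = [
  /^[A-Z]{2,}-\d+$/, // exact codes such as "ERR-1042"
  /^"[^"]+"$/, // fully quoted exact-phrase queries
];

// Returns false when the query matches any skip pattern.
function shouldRerank(query: string, patterns: RegExp[] = skipRerankPatterns): boolean {
  const trimmed = query.trim();
  return !patterns.some((pattern) => pattern.test(trimmed));
}

console.log(shouldRerank("ERR-1042")); // false
console.log(shouldRerank("how do I reset my password?")); // true
```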
Typical workflow
Here's a complete example of a search endpoint with reranking:
// app/api/search/route.ts
import { createUnragEngine } from "@unrag/config";
import { NextResponse } from "next/server";
export async function GET(request: Request) {
const { searchParams } = new URL(request.url);
const query = searchParams.get("q");
if (!query) {
return NextResponse.json({ error: "Missing query" }, { status: 400 });
}
const engine = createUnragEngine();
// Retrieve more candidates than we need
const retrieved = await engine.retrieve({
query,
topK: 30,
});
// Rerank to get the best results
const reranked = await engine.rerank({
query,
candidates: retrieved.chunks,
topK: 8,
onMissingReranker: "skip", // Graceful fallback
});
return NextResponse.json({
results: reranked.chunks.map(chunk => ({
content: chunk.content,
sourceId: chunk.sourceId,
score: chunk.score,
})),
meta: {
reranked: reranked.meta.rerankerName !== "none",
model: reranked.meta.model,
timings: {
retrieveMs: retrieved.durations.totalMs,
rerankMs: reranked.durations.rerankMs,
},
},
});
}
Next steps
Once you have reranking working, consider:
