Reranker

Improve retrieval precision with second-stage reranking using Cohere or custom models.

Vector similarity search is fast but imprecise. When you retrieve the top 30 chunks by embedding distance, the best match isn't always first—it might be third, or seventh, or twentieth. The initial ranking is based on how close vectors are in embedding space, which is a useful approximation but not a perfect measure of relevance to the specific query.

Reranking solves this by adding a second stage. You retrieve a larger set of candidates using fast vector search, then run those candidates through a more expensive relevance model that directly scores each candidate against the query. The reranker sees both the query text and the candidate text, so it can make much more nuanced relevance judgments than embedding distance alone.

This two-stage pattern—fast retrieval followed by precise reranking—is how most production search systems work. Unrag's reranker battery makes it straightforward to add this capability to your RAG pipeline.
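The two-stage pattern can be illustrated without any external service. The sketch below is a toy, not Unrag's implementation: `cheapScore` stands in for vector similarity and `preciseScore` for a reranking model. Both are hypothetical scoring functions chosen only to make the example runnable.

```typescript
type Doc = { id: string; text: string };

// Split text into lowercase alphanumeric tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Stage 1 stand-in: counts every occurrence of a query word in the document.
// Fast, but it rewards repetition rather than relevance.
function cheapScore(query: string, doc: Doc): number {
  const queryTerms = new Set(tokenize(query));
  return tokenize(doc.text).filter((t) => queryTerms.has(t)).length;
}

// Stage 2 stand-in: fraction of distinct query terms the document covers.
function preciseScore(query: string, doc: Doc): number {
  const docTerms = new Set(tokenize(doc.text));
  const queryTerms = tokenize(query);
  if (queryTerms.length === 0) return 0;
  return queryTerms.filter((t) => docTerms.has(t)).length / queryTerms.length;
}

// Retrieve `candidateCount` documents cheaply, then reorder them precisely
// and keep the best `topK`.
function twoStageSearch(query: string, docs: Doc[], candidateCount: number, topK: number): Doc[] {
  const candidates = docs
    .slice()
    .sort((a, b) => cheapScore(query, b) - cheapScore(query, a))
    .slice(0, candidateCount);
  return candidates
    .sort((a, b) => preciseScore(query, b) - preciseScore(query, a))
    .slice(0, topK);
}

const docs: Doc[] = [
  { id: "security", text: "Password security: password rules and password rotation" },
  { id: "reset", text: "How to reset your password" },
  { id: "billing", text: "Billing and invoices" },
];

// Stage 1 alone puts "security" first; stage 2 promotes "reset".
const results = twoStageSearch("reset password", docs, 2, 1);
console.log(results.map((d) => d.id));
```

The first stage only has to get the right document into the candidate set. The second stage is responsible for putting it on top.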

When reranking helps

Reranking is most valuable when your initial retrieval returns good candidates in the wrong order. This happens more often than you might expect. Embedding models are trained on general semantic similarity, not query-document relevance. A chunk that's semantically related to your query might rank higher than a chunk that directly answers it.

Consider a query like "how do I reset my password?" Your vector search might return chunks in this order:

Account security best practices (mentions passwords, ranks first by similarity)
User authentication architecture (technically related, ranks second)
The password reset process (directly relevant, but ranks third)

A reranker trained on query-document relevance will recognize that the password reset chunk directly answers the question and promote it to the top.

Reranking also helps when you need several results rather than a single best match. If you're building context for an LLM, you might want 10-15 highly relevant chunks. Retrieving 10 with vector search alone often includes some marginal matches. Retrieving 30 and reranking to the top 10 gives you better precision.

That said, reranking adds latency and cost. The Cohere reranker (the default implementation) takes 100-300ms per call and has per-request pricing. For simple use cases where vector search already gives good results, reranking might not be worth it. Start without it, measure your retrieval quality, and add reranking when you need the precision boost.
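One way to measure retrieval quality before and after adding a reranker is mean reciprocal rank (MRR) over a small labeled set of queries. The sketch below assumes you have, for each query, the ranked chunk IDs and the ID of the chunk a human judged to be the right answer. The `LabeledResult` shape and the sample data are hypothetical.

```typescript
type LabeledResult = { rankedIds: string[]; relevantId: string };

// 1 / position of the relevant chunk, or 0 if it was not returned at all.
function reciprocalRank(result: LabeledResult): number {
  const position = result.rankedIds.indexOf(result.relevantId);
  return position === -1 ? 0 : 1 / (position + 1);
}

// Average reciprocal rank across all labeled queries.
function meanReciprocalRank(results: LabeledResult[]): number {
  if (results.length === 0) return 0;
  const total = results.reduce((sum, r) => sum + reciprocalRank(r), 0);
  return total / results.length;
}

// Illustrative data: the same two queries, with and without reranking.
const vectorOnly: LabeledResult[] = [
  { rankedIds: ["a", "b", "c"], relevantId: "c" },
  { rankedIds: ["d", "e"], relevantId: "d" },
];
const withRerank: LabeledResult[] = [
  { rankedIds: ["c", "a", "b"], relevantId: "c" },
  { rankedIds: ["d", "e"], relevantId: "d" },
];

console.log(meanReciprocalRank(vectorOnly)); // (1/3 + 1) / 2
console.log(meanReciprocalRank(withRerank)); // 1
```

If the two numbers are close on your own data, the extra latency and cost of reranking may not be justified.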

Installing the reranker battery

Install the reranker battery using the CLI:

bunx unrag@latest add battery reranker

This copies the reranker module into your project at lib/unrag/rerank/, adds the required dependencies (ai and @ai-sdk/cohere), and updates your unrag.json to track the installed battery.

Setting up the Cohere API key

The default reranker uses Cohere's rerank-v3.5 model, which requires an API key. Add it to your environment:

COHERE_API_KEY=your-cohere-api-key

You can get an API key from the Cohere dashboard. Cohere offers a free tier that's sufficient for development and testing.

Wiring the reranker into your config

After installing, you need to tell the engine about your reranker. Open unrag.config.ts and add the reranker configuration:

import { defineUnragConfig } from "./lib/unrag/core";
import { createCohereReranker } from "./lib/unrag/rerank";

export const unrag = defineUnragConfig({
  // ... your existing config
  engine: {
    // ... other engine config
    reranker: createCohereReranker(),
  },
} as const);

The createCohereReranker() function returns a reranker that uses Cohere's rerank-v3.5 model by default. You can customize it:

reranker: createCohereReranker({
  model: "rerank-english-v2.0",  // Use a different model
  maxDocuments: 500,             // Limit batch size (default: 1000)
})

Using reranking in your application

Once configured, call engine.rerank() after retrieval:

import { createUnragEngine } from "@unrag/config";

const engine = createUnragEngine();

// Step 1: Retrieve a larger set of candidates
const retrieved = await engine.retrieve({
  query: "how do I reset my password?",
  topK: 30,  // Retrieve more than you need
});

// Step 2: Rerank to get the most relevant results
const reranked = await engine.rerank({
  query: "how do I reset my password?",
  candidates: retrieved.chunks,
  topK: 8,  // Return only the top 8 after reranking
});

// Use the reranked results
for (const chunk of reranked.chunks) {
  console.log(chunk.content);
}

The rerank() method takes the same query you used for retrieval (important—the reranker scores candidates against this query) and the chunks from your retrieval results. It returns a new set of chunks in reranked order.
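If the reranked chunks feed an LLM prompt, a small helper can join them into a context string while respecting a size budget. This is a sketch under assumptions: `buildContext` is a hypothetical helper, not part of Unrag, and it relies only on each chunk having a `content` string as in the loop above. The character budget is a crude stand-in for a token budget.

```typescript
type ContextChunk = { content: string };

// Concatenate chunks in ranked order, numbering each one, and stop before
// the combined length would exceed `maxChars`.
function buildContext(chunks: ContextChunk[], maxChars: number): string {
  const parts: string[] = [];
  let used = 0;
  for (let i = 0; i < chunks.length; i++) {
    const block = `[${i + 1}] ${chunks[i].content.trim()}`;
    // Every block after the first is preceded by a two-character "\n\n" separator.
    const cost = block.length + (parts.length > 0 ? 2 : 0);
    if (used + cost > maxChars) break;
    parts.push(block);
    used += cost;
  }
  return parts.join("\n\n");
}

const context = buildContext(
  [{ content: "Open Settings." }, { content: "Choose Reset password." }],
  200,
);
console.log(context);
```

Because the chunks arrive best-first, truncating from the end drops the least relevant material.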

API Reference

This section documents the types for reranking inputs and outputs.

RerankInput

The input to engine.rerank():

type RerankInput = {
  query: string;
  candidates: RerankCandidate[];
  topK?: number;
  onMissingReranker?: "throw" | "skip";
  onMissingText?: "throw" | "skip";
  resolveText?: (candidate: RerankCandidate) => string | Promise<string>;
};

RerankResult

The output from engine.rerank():

type RerankResult = {
  chunks: RerankCandidate[];
  ranking: RerankRankingItem[];
  meta: { rerankerName: string; model?: string };
  durations: { rerankMs: number; totalMs: number };
  warnings: string[];
};

RerankRankingItem

Each item in the ranking array:

type RerankRankingItem = {
  index: number;
  rerankScore?: number;
};

| Field | Type | Description |
| --- | --- | --- |
| `index` | `number` | Original index into the candidates array |
| `rerankScore` | `number \| undefined` | Score assigned by the reranker (if available). Higher scores indicate more relevance. |

The ranking array contains the complete ranking of all candidates, not just the top K. This is useful for debugging and evaluation—you can see exactly how the reranker reordered your results:

for (const item of result.ranking) {
  console.log(`Candidate ${item.index}: score ${item.rerankScore}`);
}
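To see how far each candidate moved, you can compare its position in `ranking` with its original `index`. The sketch below redeclares a minimal `RerankRankingItem` so it runs standalone. `rankMovements` is a hypothetical helper, and it assumes the candidates were passed to `rerank()` in retrieval order.

```typescript
type RerankRankingItem = { index: number; rerankScore?: number };

type Movement = { originalRank: number; newRank: number; delta: number };

// For each ranked item, report its 1-based rank before and after reranking.
// A positive delta means the candidate moved up.
function rankMovements(ranking: RerankRankingItem[]): Movement[] {
  return ranking.map((item, newPosition) => ({
    originalRank: item.index + 1,
    newRank: newPosition + 1,
    delta: item.index - newPosition,
  }));
}

// Illustrative ranking: the candidate retrieved third is now first.
const movements = rankMovements([
  { index: 2, rerankScore: 0.91 },
  { index: 0, rerankScore: 0.4 },
  { index: 1, rerankScore: 0.12 },
]);

for (const m of movements) {
  console.log(`rank ${m.originalRank} -> ${m.newRank} (delta ${m.delta})`);
}
```

Large positive deltas near the top of the list are a sign that the reranker is doing useful work on that query.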

RerankCandidate

A chunk with its retrieval score, used as input to reranking:

type RerankCandidate = Chunk & { score: number };

This is the same type returned by engine.retrieve(), so you can pass retrieval results directly to rerank.

Reranker

The interface that reranker implementations must satisfy:

type Reranker = {
  name: string;
  rerank: (args: RerankerRerankArgs) => Promise<RerankerRerankResult>;
};

type RerankerRerankArgs = {
  query: string;
  documents: string[];
};

type RerankerRerankResult = {
  order: number[];
  scores?: number[];
  model?: string;
};
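As an illustration of the interface, here is a sketch of a dependency-free reranker that orders documents by how many distinct query terms they contain. The types are redeclared from above so the block stands alone. `keywordOrder` and `keywordReranker` are hypothetical names, and term overlap is a toy relevance signal, not a substitute for a trained model.

```typescript
type RerankerRerankArgs = { query: string; documents: string[] };
type RerankerRerankResult = { order: number[]; scores?: number[]; model?: string };
type Reranker = {
  name: string;
  rerank: (args: RerankerRerankArgs) => Promise<RerankerRerankResult>;
};

// Distinct lowercase alphanumeric terms in a piece of text.
function terms(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0));
}

// Synchronous core: score each document, then sort best-first.
// Ties keep their original relative order.
function keywordOrder({ query, documents }: RerankerRerankArgs): RerankerRerankResult {
  const queryTerms = Array.from(terms(query));
  const scored = documents.map((doc, index) => {
    const docTerms = terms(doc);
    return { index, score: queryTerms.filter((t) => docTerms.has(t)).length };
  });
  scored.sort((a, b) => b.score - a.score || a.index - b.index);
  return {
    order: scored.map((s) => s.index),
    scores: scored.map((s) => s.score), // parallel to `order`
    model: "keyword-overlap",
  };
}

// Wrap the core in the async shape the Reranker type requires.
const keywordReranker: Reranker = {
  name: "keyword-overlap",
  rerank: async (args) => keywordOrder(args),
};

const example = keywordOrder({
  query: "reset password",
  documents: ["Security tips for your password", "Reset your password here", "Billing"],
});
console.log(example.order); // [1, 0, 2]
```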

CohereRerankerConfig

Configuration for the default Cohere reranker:

type CohereRerankerConfig = {
  model?: string;
  apiKey?: string;
  baseUrl?: string;
  maxDocuments?: number;
};

Handling edge cases

Reranking requires text content to score candidates against the query. If you've configured Unrag with storage.storeChunkContent: false (storing only embeddings, not text), the reranker won't have access to the chunk content it needs.

By default, engine.rerank() throws an error if it encounters a candidate with empty content:

// This will throw if any candidate has empty content
const result = await engine.rerank({
  query: "test",
  candidates: retrieved.chunks,
});
// Error: Candidate 3 (id=...) has empty content. Enable 'storeChunkContent' 
// in engine config, provide 'resolveText' hook, or use 'onMissingText: "skip"'.

You have three options for handling this:

Option 1: Store chunk content (recommended)

The simplest solution is to ensure storage.storeChunkContent: true in your engine config. This is the default, so unless you explicitly disabled it, your chunks should have content.

Option 2: Provide a resolveText hook

If you store content externally (in S3, a CMS, etc.), provide a function that fetches the text:

const result = await engine.rerank({
  query: "password reset",
  candidates: retrieved.chunks,
  resolveText: async (candidate) => {
    // Fetch from your external store
    const doc = await contentStore.get(candidate.documentId);
    return doc.sections[candidate.index].text;
  },
});

The resolveText hook is only called for candidates that have empty content. If a candidate already has content, the hook is skipped.

Option 3: Skip candidates with missing text

If some candidates might legitimately have no text (like image-only chunks), you can skip them:

const result = await engine.rerank({
  query: "password reset",
  candidates: retrieved.chunks,
  onMissingText: "skip",  // Skip instead of throwing
});

// Check warnings to see what was skipped
if (result.warnings.length > 0) {
  console.log("Skipped candidates:", result.warnings);
}

Skipped candidates are appended to the end of the ranking in their original order. If you're taking the top K, they typically won't appear in your final results unless most candidates were skipped.
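The ordering rule described above can be sketched as follows. This illustrates the documented behavior under simplifying assumptions and is not Unrag's actual implementation. `rankWithSkips` is a hypothetical helper, and `rankDocuments` stands in for whatever reranker is configured.

```typescript
// Rank only candidates that have text, then append the skipped ones
// in their original order.
function rankWithSkips(
  texts: string[],
  rankDocuments: (documents: string[]) => number[],
): { order: number[]; skipped: number[] } {
  const scorable: number[] = [];
  const skipped: number[] = [];
  texts.forEach((text, i) => {
    if (text.trim().length > 0) scorable.push(i);
    else skipped.push(i);
  });

  // The ranker sees only scorable texts, so its indices are local to that
  // filtered array and must be mapped back to candidate indices.
  const localOrder = rankDocuments(scorable.map((i) => texts[i]));
  const ranked = localOrder.map((local) => scorable[local]);

  return { order: ranked.concat(skipped), skipped };
}

// Toy ranker for the example: longer documents first.
const byLength = (documents: string[]): number[] =>
  documents
    .map((d, i) => ({ i, n: d.length }))
    .sort((a, b) => b.n - a.n)
    .map((x) => x.i);

const merged = rankWithSkips(["short", "", "a much longer text", "mid text"], byLength);
console.log(merged.order);   // [2, 3, 0, 1]
console.log(merged.skipped); // [1]
```

The index mapping is the part that is easy to get wrong: once candidates are filtered, positions in the reranker's output no longer match positions in the original array.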

When no reranker is configured

If you call engine.rerank() without configuring a reranker, it throws by default:

// Throws: Reranker not configured. Install the reranker battery...
const result = await engine.rerank({
  query: "test",
  candidates: retrieved.chunks,
});

For graceful degradation (useful during development or gradual rollout), pass onMissingReranker: "skip":

const result = await engine.rerank({
  query: "test",
  candidates: retrieved.chunks,
  onMissingReranker: "skip",  // Return original order instead of throwing
});

// result.chunks is in the same order as input
// result.meta.rerankerName is "none"
// result.warnings includes a note about the missing reranker

This lets you write reranking code that works regardless of whether a reranker is configured, which is helpful for shared code paths or feature flags.

Building a custom reranker

The Cohere reranker works well out of the box, but you might want to use a different service or model. The reranker battery includes a createCustomReranker function for this:

import { createCustomReranker } from "./lib/unrag/rerank";

const myReranker = createCustomReranker({
  name: "my-reranker",
  rerank: async ({ query, documents }) => {
    // Call your reranking service
    const response = await myRerankerApi.rerank({
      query,
      documents,
    });
    
    // Return indices in relevance order (best first)
    // and optionally the scores
    return {
      order: response.ranking.map(r => r.documentIndex),
      scores: response.ranking.map(r => r.score),
      model: "my-model-v1",
    };
  },
});

The reranker interface is simple: you receive the query and an array of document texts, and you return the reordered indices. The order array should contain indices into the original documents array, sorted by relevance (most relevant first).
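Since `order` must point at valid positions in `documents`, it can be worth checking a third-party service's response before returning it from a custom reranker. The sketch below is a hypothetical guard (`isValidOrder` is not part of Unrag) that verifies every index is an in-range integer and appears at most once.

```typescript
// Returns true when every entry of `order` is an integer in
// [0, documentCount) and no index is repeated.
function isValidOrder(order: number[], documentCount: number): boolean {
  const seen = new Set<number>();
  for (const index of order) {
    if (!Number.isInteger(index) || index < 0 || index >= documentCount) {
      return false;
    }
    if (seen.has(index)) {
      return false;
    }
    seen.add(index);
  }
  return true;
}

console.log(isValidOrder([2, 0, 1], 3)); // true
console.log(isValidOrder([0, 0, 1], 3)); // false: duplicate index
console.log(isValidOrder([0, 3], 3));    // false: index out of range
```

A check like this turns a silent misordering into an explicit failure that you can log or fall back from.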

Using an LLM for reranking

You can even use a language model as a reranker. This is more expensive but can be very effective for specialized domains:

import { createCustomReranker } from "./lib/unrag/rerank";
import { generateText } from "ai";

const llmReranker = createCustomReranker({
  name: "llm-reranker",
  rerank: async ({ query, documents }) => {
    // Score each document with the LLM
    const scores = await Promise.all(
      documents.map(async (doc, i) => {
        const result = await generateText({
          model: "openai/gpt-4o-mini",
          messages: [{
            role: "user",
            content: `Rate how well this document answers the query on a scale of 0-10.
                Query: ${query}
                Document: ${doc}
                Reply with just the number.`,
          }],
        });
        return { index: i, score: parseFloat(result.text) || 0 };
      })
    );
    
    // Sort by score descending
    scores.sort((a, b) => b.score - a.score);
    
    return {
      order: scores.map(s => s.index),
      scores: scores.map(s => s.score),
      model: "gpt-4o-mini-reranker",
    };
  },
});

This is slower and more expensive than Cohere, but it gives you complete control over the relevance judgment. You can customize the prompt for your domain, add few-shot examples, or use a fine-tuned model.
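Model replies are not always a bare number, so `parseFloat(result.text) || 0` scores a reply such as "Score: 8/10" as 0. A slightly more defensive parser, sketched here with the hypothetical name `parseRelevanceScore`, extracts the first number in the reply and clamps it to the 0-10 range the prompt asks for.

```typescript
// Extract the first number from an LLM reply and clamp it to [min, max].
// Replies with no number at all fall back to `min`.
function parseRelevanceScore(reply: string, min = 0, max = 10): number {
  const match = reply.match(/-?\d+(?:\.\d+)?/);
  if (!match) return min;
  const value = parseFloat(match[0]);
  return Math.min(max, Math.max(min, value));
}

console.log(parseRelevanceScore("7"));             // 7
console.log(parseRelevanceScore("Score: 8/10"));   // 8
console.log(parseRelevanceScore("8.5 out of 10")); // 8.5
console.log(parseRelevanceScore("no idea"));       // 0
console.log(parseRelevanceScore("42"));            // 10 (clamped)
```

Taking the first number is a heuristic. If your prompt lets the model restate the scale before its answer, tighten the prompt or the pattern.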

Performance considerations

Reranking adds latency to your retrieval pipeline. The Cohere reranker typically takes 100-300ms depending on the number of candidates and their lengths. Here are some ways to manage this:

Tune your candidate count. Retrieving 30 candidates and reranking to 10 gives good results for most use cases. Retrieving 100+ candidates increases rerank time without proportional quality improvement—the best results are usually in the initial top 30.

Consider async reranking. If your UI can show initial results and update them, you can return vector search results immediately and rerank in the background:

// Send fast initial results as the first newline-delimited JSON message
// (assumes `res` is a Node.js-style writable response stream)
const initial = await engine.retrieve({ query, topK: 10 });
res.write(JSON.stringify({ chunks: initial.chunks, isReranked: false }) + "\n");

// Rerank the same candidates, then send the update and close the stream
const reranked = await engine.rerank({
  query,
  candidates: initial.chunks,
  topK: 10,
});
res.write(JSON.stringify({ chunks: reranked.chunks, isReranked: true }) + "\n");
res.end();

Cache rerank results. If the same queries appear frequently, cache the reranked results. The cache key should include both the query and the candidate IDs (since candidates change as content is updated).
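A cache key along those lines might be built as in the sketch below. Everything here is an assumption for illustration: `rerankCacheKey` and `getOrCompute` are hypothetical helpers, the cache is a plain in-memory `Map`, and `compute` is synchronous only to keep the example short. A real implementation would await `engine.rerank()` and likely add expiry.

```typescript
// Build a key from the query and the set of candidate IDs.
function rerankCacheKey(query: string, candidateIds: string[]): string {
  // Collapse whitespace so trivially different spellings share an entry.
  const normalizedQuery = query.trim().replace(/\s+/g, " ");
  // Sort a copy so the key does not depend on retrieval order.
  const sortedIds = candidateIds.slice().sort();
  return JSON.stringify([normalizedQuery, sortedIds]);
}

// Maps a cache key to chunk IDs in reranked order.
const rerankCache = new Map<string, string[]>();

// Return the cached ordering if present; otherwise compute and store it.
function getOrCompute(
  query: string,
  candidateIds: string[],
  compute: () => string[],
): string[] {
  const key = rerankCacheKey(query, candidateIds);
  const hit = rerankCache.get(key);
  if (hit !== undefined) return hit;
  const value = compute();
  rerankCache.set(key, value);
  return value;
}

let computeCalls = 0;
const compute = () => {
  computeCalls += 1;
  return ["b", "a"];
};

getOrCompute("reset  password", ["a", "b"], compute);
getOrCompute(" reset password", ["b", "a"], compute); // same key, served from cache
console.log(computeCalls); // 1
```

Storing chunk IDs rather than full chunks keeps entries small. When a document is re-ingested and its chunk IDs change, the old key simply stops matching.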

Skip reranking for simple queries. If your analytics show that certain query patterns get good results from vector search alone, you can skip reranking for those cases. This is especially useful for very specific queries where the top result is usually correct.

Typical workflow

Here's a complete example of a search endpoint with reranking:

// app/api/search/route.ts
import { createUnragEngine } from "@unrag/config";
import { NextResponse } from "next/server";

export async function GET(request: Request) {
  const { searchParams } = new URL(request.url);
  const query = searchParams.get("q");
  
  if (!query) {
    return NextResponse.json({ error: "Missing query" }, { status: 400 });
  }
  
  const engine = createUnragEngine();
  
  // Retrieve more candidates than we need
  const retrieved = await engine.retrieve({
    query,
    topK: 30,
  });
  
  // Rerank to get the best results
  const reranked = await engine.rerank({
    query,
    candidates: retrieved.chunks,
    topK: 8,
    onMissingReranker: "skip",  // Graceful fallback
  });
  
  return NextResponse.json({
    results: reranked.chunks.map(chunk => ({
      content: chunk.content,
      sourceId: chunk.sourceId,
      score: chunk.score,
    })),
    meta: {
      reranked: reranked.meta.rerankerName !== "none",
      model: reranked.meta.model,
      timings: {
        retrieveMs: retrieved.durations.totalMs,
        rerankMs: reranked.durations.rerankMs,
      },
    },
  });
}


Next steps

Performance

Build a Search Endpoint

RAG Handbook: Reranking
