Reranking and context optimization

Introduction

Turn candidates into useful context: reranking, compression, packing, citations, and conflict handling.

Welcome to the Reranking and Context Module

Retrieval gives you candidates. What you do with those candidates determines whether your RAG system produces good answers. This module covers the critical stage between retrieval and generation: reranking candidates to surface the most relevant ones, compressing context to fit token budgets, packing information for effective LLM consumption, and handling the messy realities of conflicting or stale sources.
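
To make that flow concrete, here is a deliberately naive, self-contained sketch of the whole stage: candidates in, packed context out. The overlap scoring and character budget are placeholder heuristics of our own invention, not the module's techniques; later chapters replace each step with the real thing.

```python
def build_context(query: str, candidates: list[str], budget_chars: int = 4000) -> str:
    """Rerank candidates, then pack the best ones into a budgeted context."""
    # "Rerank": naive lexical overlap with the query stands in for a real reranker.
    q_terms = set(query.lower().split())
    ranked = sorted(candidates,
                    key=lambda c: len(q_terms & set(c.lower().split())),
                    reverse=True)
    # "Compress/pack": keep top candidates that fit the budget, with citation tags.
    packed, used = [], 0
    for i, chunk in enumerate(ranked, start=1):
        if used + len(chunk) > budget_chars:
            break
        packed.append(f"[{i}] {chunk}")
        used += len(chunk)
    return "\n\n".join(packed)
```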

The techniques in this module directly impact answer quality. A well-reranked, properly compressed context yields better answers than raw retrieval results dumped straight into a prompt. This is the stage where you separate signal from noise.

What you'll learn in this module

By the end of this module, you will understand:

  • Two-stage retrieval architecture: Why fast retrieval followed by expensive reranking outperforms either alone.
  • Cross-encoder rerankers: How they work, when to use them, and the latency/quality tradeoffs (a minimal sketch follows this list).
  • LLM-based reranking: Using language models for relevance judgments when cross-encoders aren't available or sufficient.
  • Context compression: Extractive and abstractive techniques for fitting more signal into limited token budgets (an extractive sketch also follows this list).
  • Context packing and citations: Structuring context for LLM consumption and making answers traceable to sources.
  • Handling conflicts and staleness: Policies for when sources disagree or information is outdated.
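
As a preview of the cross-encoder chapters, the sketch below reranks first-stage candidates with the sentence-transformers CrossEncoder API. The model name is just a common public checkpoint, and the candidates are assumed to come from whatever fast retriever you already have.

```python
# Two-stage sketch: a fast retriever supplies candidates, and a cross-encoder
# scores each (query, candidate) pair to produce the final ranking.
from sentence_transformers import CrossEncoder

# A widely used public checkpoint; any cross-encoder model works here.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_k: int = 5) -> list[str]:
    scores = model.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]
```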

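And as a preview of the compression chapters, here is one extractive approach in miniature: score each sentence by lexical overlap with the query and keep only the top few, in their original order. Real systems use embedding- or model-based scorers instead of raw overlap, but the shape of the technique is the same.

```python
import re

def compress_extractive(query: str, passage: str, max_sentences: int = 3) -> str:
    """Keep the sentences most related to the query, preserving original order."""
    q_terms = set(query.lower().split())
    sentences = re.split(r"(?<=[.!?])\s+", passage)
    top = sorted(sentences,
                 key=lambda s: len(q_terms & set(s.lower().split())),
                 reverse=True)[:max_sentences]
    keep = set(top)
    return " ".join(s for s in sentences if s in keep)
```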

Ready to begin?

Let's start by understanding why two-stage retrieval—fast candidate generation followed by expensive relevance scoring—is the foundation of high-quality RAG systems.
