RAG Handbook

A practical guide to building retrieval-augmented generation systems that work in production.

This guide is for beginners learning to build RAG systems and for teams deploying production-ready RAG applications at scale. It is framework- and tooling-agnostic, focusing on the ideas and principles behind RAG systems rather than on any particular stack.

Most teams building with LLMs eventually need to ground the model's responses in their own data. The standard approach is Retrieval-Augmented Generation: you search your content for relevant information, inject it into the model's context, and generate an answer informed by what you actually know rather than relying solely on what the model learned during training.
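To make the three steps (search, inject, generate) concrete, here is a minimal, framework-agnostic sketch in Python. It assumes a toy bag-of-words "embedding" with cosine similarity in place of a real embedding model and vector store, and it stops at prompt assembly instead of calling an LLM. The names `embed`, `retrieve`, and `build_prompt` are hypothetical and are not part of Unrag or any other library.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model (an assumption for this sketch):
    # a bag-of-words vector of lowercase term counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Step 1: search your content for the k chunks most similar to the query.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    # Step 2: inject the retrieved chunks into the model's context.
    sources = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return (
        "Answer using only the sources below. "
        "If they don't contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\nAnswer:"
    )


# Step 3 (generation) would send the prompt to an LLM. That call is
# provider-specific, so this sketch leaves it out.

docs = [
    "Refunds are available within 30 days of purchase.",
    "The office is closed on public holidays.",
    "Shipping to Canada takes 5 to 7 business days.",
]
question = "Are refunds available after 30 days?"
top = retrieve(question, docs, k=1)
print(top)  # → ['Refunds are available within 30 days of purchase.']
print(build_prompt(question, top))
```

Every later decision in this handbook replaces one of these toy pieces with something sturdier: real chunking instead of whole sentences, a learned embedding model instead of word counts, reranking after the first search, and a carefully designed prompt.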

The concept is simple. The execution is where things get interesting.

This handbook walks through RAG from first principles to production operations. It covers the decisions you'll make (chunking strategies, embedding models, retrieval algorithms, reranking, prompt design), the tradeoffs behind those decisions (quality vs latency vs cost), and the failure modes you'll encounter in practice (false positives, hallucinations, stale data, permission leaks).

The goal is to help you understand RAG deeply enough to build systems that actually work, debug them when they don't, and improve them over time.

Who this is for

This handbook assumes you're comfortable reading code and have some familiarity with LLMs, but it doesn't assume prior RAG experience. If you're building your first retrieval system, start from the beginning. If you're debugging a production system that's misbehaving, jump to the module that matches your problem.

The material progresses from foundational concepts through increasingly production-focused topics. Early modules explain what RAG is and how the pieces fit together. Later modules cover evaluation, security, cost control, and operational patterns that matter when real users depend on your system.

How to use this guide

The handbook is organized as a linear progression: Module 0 through Module 8, plus an appendix of reference material. Each module builds on concepts from earlier modules, so reading in order works well if you're learning RAG from scratch.

If you're looking for specific guidance, here are two common paths:

Learning path: Start with Module 0 (Orientation) and Module 1 (Foundations) to understand the core concepts. Then work through Module 2 (Data and Ingestion) and Module 3 (Chunking) to understand how content gets into the system. Module 4 (Retrieval) covers how to get it back out. You can skim the later modules and return when you need them.

Production debugging path: If you're already running a RAG system and hitting problems, start with Module 0 to calibrate vocabulary, then jump to Module 7 (Evaluation) to set up measurement. From there, use the evaluation results to identify which module addresses your specific issues (chunking, retrieval, reranking, or generation).

About the examples

When examples appear, they're written to illustrate the general concept rather than any specific framework. When an example uses Unrag specifically, it's marked as one concrete implementation of the pattern. You should be able to apply the same ideas with any RAG stack.

Building with Unrag? Check out the practical examples and guides alongside the handbook.

Module overview

Module 0: Orientation
Module 1: Foundations
Module 2: Data and Ingestion
Module 3: Chunking
Module 4: Retrieval
Module 5: Reranking
Module 6: Generation
Module 7: Evaluation
Module 8: Production
Appendix