RAG architecture: multiple-choice review

Multiple-choice synthesis across the RAG unit — chunking, embedding cost, wide-then-narrow retrieval, lost-in-the-middle, and the confident-hallucination failure mode.

Senior · 13 min

Six questions that cut across the whole pipeline. None is a definition to recite — each is a decision you make while a RAG system is silently returning confident, wrong answers in production.

Goal

Confirm you can connect chunking, embedding cost, two-stage retrieval, context assembly, and the retrieval-driven failure mode into one diagnosis — the synthesis the unit built toward.

Quiz

A support bot confidently reports a specific but wrong Q3 churn number; the answer survived review for a month. Where does the fault most often lie, and why does it stay invisible?

Quiz

A policy doc states a rule whose exception lives one paragraph later. Queries about the exception return half-truths. Which chunking change is the right first move?

Quiz

Index search p99 is too high at 3072 dims and storage cost is climbing. What's the senior trade before changing infrastructure?
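To put numbers on the cost side of this question, here is a minimal sketch in Python. It assumes float32 vectors and a hypothetical index of one million vectors, and it counts raw vector bytes only (no index overhead). The helper names are illustrative. Truncating to the leading dimensions is only reasonable for embedding models trained so that those dimensions carry most of the signal, so treat that step as an assumption to check against your own retrieval evaluation set.

```python
import math

BYTES_PER_FLOAT32 = 4


def index_bytes(num_vectors: int, dims: int) -> int:
    """Raw vector storage in bytes, ignoring index overhead and metadata."""
    return num_vectors * dims * BYTES_PER_FLOAT32


def truncate_and_renormalize(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components, then rescale to unit length.

    Renormalizing keeps cosine and dot-product scores comparable
    after the tail of the vector has been dropped.
    """
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    return [x / norm for x in head]


# Hypothetical corpus size, chosen only to make the arithmetic concrete.
NUM_VECTORS = 1_000_000

full_size = index_bytes(NUM_VECTORS, 3072)   # 12_288_000_000 bytes
small_size = index_bytes(NUM_VECTORS, 1024)  # 4_096_000_000 bytes

# Toy 3-dimensional vector cut down to 2 dimensions.
short_vec = truncate_and_renormalize([3.0, 4.0, 12.0], 2)
```

Storage falls in direct proportion to the dimension count (3x here), and distance computations get cheaper in the same proportion. What the sketch cannot tell you is how much recall the shorter vectors give up; that has to be measured.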

Quiz

Why is 'retrieve wide (k=20–50) then rerank narrow to 3–8 with a cross-encoder' better than a single embedding top-3?
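The two-stage shape named in this question can be sketched as follows. This is an illustration under stated assumptions, not a production retriever: `token_overlap` is a toy stand-in for embedding similarity, and `phrase_aware` is a toy stand-in for a cross-encoder (it reads the query and the document together, which the first stage cannot do). All function names and the sample documents are hypothetical.

```python
from typing import Callable, Sequence

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    docs: Sequence[str],
    cheap_score: Scorer,
    pair_score: Scorer,
    wide_k: int = 20,
    narrow_k: int = 3,
) -> list[str]:
    """Stage 1 favors recall with a cheap scorer; stage 2 favors precision."""
    # Stage 1: score everything cheaply and keep a wide candidate set.
    candidates = sorted(docs, key=lambda d: cheap_score(query, d), reverse=True)
    candidates = candidates[:wide_k]
    # Stage 2: run the expensive pairwise scorer only on the candidates.
    reranked = sorted(candidates, key=lambda d: pair_score(query, d), reverse=True)
    return reranked[:narrow_k]


def token_overlap(query: str, doc: str) -> float:
    """Toy stand-in for embedding similarity: fraction of query tokens in doc."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0


def phrase_aware(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: rewards the query appearing as a phrase."""
    bonus = 1.0 if query.lower() in doc.lower() else 0.0
    return token_overlap(query, doc) + bonus


docs = [
    "refund window policy for annual plans",
    "the policy on refund abuse and window cleaning",
    "annual plans: the refund window is 30 days",
    "shipping times for hardware",
]

top = retrieve_then_rerank(
    "refund window", docs, token_overlap, phrase_aware, wide_k=3, narrow_k=2
)
```

In this toy data the first three documents tie on the cheap score, so a single cheap top-k has no basis for choosing among them. The pairwise scorer separates the two documents that actually discuss the refund window from the one that merely shares its words, and it only has to run on `wide_k` candidates rather than the whole corpus.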

Quiz

You have a 128k-token window, so you stuff all 40 retrieved chunks in retrieval order and put the decisive one near the middle. What does the literature predict?

Quiz

A RAG bot that worked last week now returns last quarter's policy with no code change and no error. Most likely cause, and the structural fix?

Recap

The unit’s through-line is one diagnosis: retrieval, not generation, is where production RAG fails. Chunking sets the ceiling: size chunks to the answer-bearing unit, and overlap them so content at the seams survives. Embedding dimensionality is a real cost lever, because vectors from models that support it can be truncated to fewer dimensions. The recall-versus-precision tension is resolved by retrieving wide, then reranking narrow. Context ordering must dodge lost-in-the-middle by placing the best evidence at the edges. Finally, stale or poisoned indexes turn a silent miss into a confident wrong answer, which you defend against with a score gate, freshness checks, and an instruction to answer only from context or say “I don’t know.”
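The context-assembly part of the recap (score gate, then best evidence at the edges) can be sketched as below. This is a minimal illustration, assuming each chunk already carries a reranker score where higher is better. The `Chunk` type, the `min_score` threshold of 0.5, and the cap of 8 chunks are hypothetical values for the example, not recommendations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    text: str
    score: float  # assumed reranker relevance; higher is better


def order_for_edges(chunks: list[Chunk]) -> list[Chunk]:
    """Put the strongest chunks at the start and end, the weakest in the middle."""
    ranked = sorted(chunks, key=lambda c: c.score, reverse=True)
    front: list[Chunk] = []
    back: list[Chunk] = []
    # Alternate ranks between the front and the back of the context.
    for i, chunk in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


def build_context(
    chunks: list[Chunk], min_score: float = 0.5, max_chunks: int = 8
) -> Optional[list[Chunk]]:
    """Apply the score gate, cap the count, then order for the edges.

    Returns None when nothing clears the gate, so the caller can reply
    "I don't know" instead of generating from weak evidence.
    """
    kept = [c for c in chunks if c.score >= min_score]
    kept = sorted(kept, key=lambda c: c.score, reverse=True)[:max_chunks]
    if not kept:
        return None
    return order_for_edges(kept)


retrieved = [
    Chunk("a", 0.9),
    Chunk("b", 0.8),
    Chunk("c", 0.7),
    Chunk("d", 0.6),
    Chunk("e", 0.2),  # below the gate, dropped
]
context = build_context(retrieved)
ordered_scores = [c.score for c in context] if context else []
# ordered_scores == [0.9, 0.7, 0.6, 0.8]
```

Returning `None` rather than an empty list makes the "no usable evidence" case explicit at the call site. A freshness check would fit the same gate (for example, filtering on a last-updated timestamp), but it is left out here because the lesson does not specify how freshness is tracked.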
