Crux Read real RAG pipeline code — a chunker, a cosine-similarity search, a retrieve-then-rerank stage, and a context assembler — and pick the highest-leverage fix.
Your altitude — climbing toward senior
ZeroJuniorMiddleSenior
You are at senior altitude — in orbit
◷ 14 min
RAG bugs hide in code that runs without error and returns plausible text. Read each snippet the way you’d read it in review, then pick the defect a senior catches before it ships a confident wrong answer.
Goal
Practise the review loop for a RAG pipeline: spot the chunking flaw, the similarity-math bug, the missing rerank stage, and the context-assembly mistake that quietly tanks answer quality.
Snippet 1 — the chunker
def chunk(text, size=1000): # split on a fixed character count, no overlap return [text[i:i + size] for i in range(0, len(text), size)]
Quiz
Completed
What is the highest-leverage problem with this chunker for a prose corpus, and the first fix?
Heads-up One giant chunk dilutes the embedding so badly that narrow-query recall collapses — one vector now averages the whole document. Bigger isn't the fix; respecting semantic boundaries with overlap is.
Heads-up Splitting on raw bytes can break multi-byte characters and makes things worse. The real defect is severing meaning at arbitrary offsets with no overlap, not the character-vs-byte unit.
Heads-up Zero overlap guarantees boundary facts get split; ~10–15% overlap is the cheap insurance the unit calls for, and splitting on semantic units preserves whole ideas.
Snippet 2 — the similarity score
import numpy as npdef score(query_vec, chunk_vec): # rank candidates by this value return np.dot(query_vec, chunk_vec)
Quiz
Completed
Ranking by raw dot product instead of cosine similarity is risky when the embeddings are not normalised. Why, and what's the fix?
Heads-up Dot product is well-defined at any dimensionality. The issue is that without normalisation it conflates magnitude with relevance, not the number of dimensions.
Heads-up They agree only when all vectors share the same norm. On un-normalised embeddings, magnitude leaks into the score and can reorder results; normalise first to make them equivalent.
Heads-up Dot product is the cheap operation ANN indexes are built on. The defect is correctness on un-normalised vectors, and the fix is normalisation/cosine — not swapping the metric for speed.
Snippet 3 — retrieve and rerank
def answer(query): qv = embed(query) candidates = vector_store.search(qv, k=3) # top-3 by cosine context = "\n".join(c.text for c in candidates) return llm(f"Answer using:\n{context}\n\nQ: {query}")
Quiz
Completed
A senior flags this for low answer quality on hard queries. Which change has the most leverage?
Heads-up That fixes recall but wrecks precision: 50 chunks bloat tokens and cost and bury the decisive one in the low-attention middle. Wide retrieval needs a narrowing rerank stage.
Heads-up The bottleneck is what k=3 retrieves, not generation. If the right chunk wasn't in the top-3, a bigger model and window can't reason over text that was never fetched.
Heads-up k=1 maximises brittleness — a single ANN miss yields zero relevant context and a guaranteed hallucination. The pattern is retrieve wide, then rerank narrow, not retrieve narrow.
Snippet 4 — context assembly
def assemble(reranked, query): # reranked is sorted best-first blocks = [c.text for c in reranked] # best chunk first, rest after context = "\n\n".join(blocks) return f"{context}\n\nQuestion: {query}"
Quiz
Completed
Given a long assembled context, what does 'lost in the middle' say about this ordering, and how should you assemble instead?
Heads-up Reranking sets which chunks you keep, but position still matters: attention is U-shaped, so even relevant text in the middle gets under-used. Order best evidence at the edges.
Heads-up The center is the least-attended region. Dead-center placement is the worst spot for the decisive chunk; the edges are best.
Heads-up Models attend well to the start, so you'd be wasting the strongest position on the worst chunk. Strong evidence belongs at the start and end, not the weakest.
Recap
Every RAG bug is read in code: fixed-size zero-overlap chunking severs facts at boundaries; ranking by raw dot product on un-normalised vectors lets magnitude beat meaning, so use cosine; a single k=3 retrieve with no rerank is one ANN miss away from a hallucination, so retrieve wide then rerank narrow; and assembly order matters because attention is U-shaped — keep few chunks and place the strongest evidence at the edges. Fix the pipeline stage with the most leverage, then re-evaluate on a held-out set to confirm.