
RAG architecture: code and pipeline reading

Read real RAG pipeline code — a chunker, a cosine-similarity search, a retrieve-then-rerank stage, and a context assembler — and pick the highest-leverage fix.

AI · Senior · 14 min


RAG bugs hide in code that runs without error and returns plausible text. Read each snippet the way you’d read it in review, then pick the defect a senior engineer would catch before the pipeline ships a confidently wrong answer.

Goal

Practise the review loop for a RAG pipeline: spot the chunking flaw, the similarity-math bug, the missing rerank stage, and the context-assembly mistake that quietly tanks answer quality.

Snippet 1 — the chunker

def chunk(text, size=1000):
    # split on a fixed character count, no overlap
    return [text[i:i + size] for i in range(0, len(text), size)]

Quiz

What is the highest-leverage problem with this chunker for a prose corpus, and the first fix?
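A minimal sketch of the usual first fix, assuming plain prose input: pack whole sentences into chunks of roughly `size` characters and carry a short tail of sentences into the next chunk, so a fact that straddles a boundary survives intact in at least one chunk. The regex sentence splitter and the `overlap` parameter are illustrative assumptions, not part of the original pipeline; a production splitter would also handle abbreviations, headings and code.

```python
import re

def chunk(text, size=1000, overlap=200):
    """Pack whole sentences into chunks of roughly `size` characters, seeding
    each new chunk with trailing sentences (up to ~`overlap` chars) from the last."""
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption;
    # swap in a proper sentence/paragraph splitter for real corpora).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sent in sentences:
        # Close the current chunk if adding this sentence would overflow it.
        if current and len(" ".join(current + [sent])) > size:
            chunks.append(" ".join(current))
            # Carry the last few sentences forward as overlap.
            tail, tail_len = [], 0
            for prev in reversed(current):
                if tail_len + len(prev) > overlap:
                    break
                tail.insert(0, prev)
                tail_len += len(prev) + 1  # +1 for the joining space
            current = tail
        current.append(sent)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

A single sentence longer than `size` still becomes its own oversized chunk; splitting those further is left out to keep the sketch short.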

Snippet 2 — the similarity score

import numpy as np

def score(query_vec, chunk_vec):
    # rank candidates by this value
    return np.dot(query_vec, chunk_vec)

Quiz

Ranking by raw dot product instead of cosine similarity is risky when the embeddings are not normalised. Why, and what's the fix?
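To see why magnitude matters, here is a sketch that scores by cosine similarity (the dot product of L2-normalised vectors), plus an alternative that normalises stored vectors once at index time so a plain dot product equals cosine at query time. The function names `cosine_score` and `normalise_rows` are hypothetical, chosen for illustration.

```python
import numpy as np

def cosine_score(query_vec, chunk_vec):
    """Cosine similarity: dot product divided by the product of the L2 norms."""
    q = np.asarray(query_vec, dtype=float)
    c = np.asarray(chunk_vec, dtype=float)
    denom = np.linalg.norm(q) * np.linalg.norm(c)
    if denom == 0.0:
        return 0.0  # a zero vector has no direction; treat as no similarity
    return float(np.dot(q, c) / denom)

def normalise_rows(matrix):
    """L2-normalise each row once at index time, so dot product == cosine later."""
    m = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # leave all-zero rows as zeros instead of dividing by zero
    return m / norms
```

With query `[1, 0]`, an on-topic chunk `[1, 0]` scores 1.0 either way, while an off-topic but longer chunk `[3, 3]` scores 3.0 on raw dot product and wins; under cosine it drops to about 0.707 and the on-topic chunk ranks first.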

Snippet 3 — retrieve and rerank

def answer(query):
    qv = embed(query)
    candidates = vector_store.search(qv, k=3)   # top-3 by cosine
    context = "\n".join(c.text for c in candidates)
    return llm(f"Answer using:\n{context}\n\nQ: {query}")

Quiz

A senior flags this for low answer quality on hard queries. Which change has the most leverage?
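A sketch of retrieve wide, then rerank narrow, assuming small in-memory numpy arrays instead of a real vector store. `rerank_score` is a hypothetical callable standing in for a cross-encoder or hosted rerank model, and `k_wide=50` / `k_final=5` are illustrative defaults, not tuned values.

```python
import numpy as np

def retrieve_then_rerank(query, query_vec, chunk_texts, chunk_vecs, rerank_score,
                         k_wide=50, k_final=5):
    """Stage 1: cheap cosine recall over many candidates.
    Stage 2: a slower, more accurate scorer re-orders only that shortlist."""
    vecs = np.asarray(chunk_vecs, dtype=float)
    q = np.asarray(query_vec, dtype=float)
    # Cosine similarity against every chunk (exact search here; a real
    # vector store would do this step with an ANN index).
    sims = (vecs @ q) / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q) + 1e-12)
    wide = np.argsort(-sims)[:k_wide]
    # Score each (query, chunk) pair jointly and keep only the best few.
    reranked = sorted(wide, key=lambda i: rerank_score(query, chunk_texts[i]), reverse=True)
    return [chunk_texts[i] for i in reranked[:k_final]]
```

The reranker costs one call per wide candidate, which is why it only sees the shortlist rather than the whole corpus. With a toy word-overlap scorer, a relevant chunk that cosine ranks second is recovered when `k_wide` is larger than 1, whereas a one-candidate first stage drops it before the reranker ever sees it.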

Snippet 4 — context assembly

def assemble(reranked, query):
    # reranked is sorted best-first
    blocks = [c.text for c in reranked]          # best chunk first, rest after
    context = "\n\n".join(blocks)
    return f"{context}\n\nQuestion: {query}"

Quiz

Given a long assembled context, what does 'lost in the middle' say about this ordering, and how should you assemble instead?
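One way to assemble for a U-shaped position curve, sketched under the assumption that `reranked` is a best-first list of plain strings (the snippet above uses objects with a `.text` attribute). It caps the number of chunks and alternates them between the front and the back, so rank 1 opens the context, rank 2 sits just before the question, and the weakest kept chunk lands in the middle. `edge_order` and `max_chunks` are hypothetical names.

```python
def edge_order(reranked, max_chunks=5):
    """Reorder best-first chunks so the strongest sit at both edges of the
    context and the weakest kept chunk ends up in the middle."""
    kept = reranked[:max_chunks]  # fewer chunks means a shorter "middle"
    front, back = [], []
    for rank, text in enumerate(kept):
        # Alternate placement: rank 0 -> front, rank 1 -> back, rank 2 -> front, ...
        (front if rank % 2 == 0 else back).append(text)
    return front + back[::-1]  # reverse the back half so rank 1 ends up last

def assemble(reranked, query, max_chunks=5):
    context = "\n\n".join(edge_order(reranked, max_chunks))
    return f"{context}\n\nQuestion: {query}"
```

For five best-first chunks `a` through `e`, the order becomes `a, c, e, d, b`.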

Recap

Each of these bugs is visible in code review. Fixed-size, zero-overlap chunking severs facts at chunk boundaries. Ranking by raw dot product on un-normalised vectors lets vector magnitude outweigh meaning, so rank by cosine similarity. A single k=3 retrieve with no rerank is one ANN miss away from a hallucination, so retrieve wide, then rerank narrow. Assembly order matters because models use information at the start and end of a long context more reliably than information in the middle (the U-shaped "lost in the middle" curve), so keep few chunks and place the strongest evidence at the edges. Fix the stage with the most leverage first, then re-evaluate on a held-out set to confirm the fix actually helped.
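For the retrieval stages, a small recall@k check over labelled held-out queries is often enough to confirm a fix. This sketch assumes each held-out query has one known relevant chunk id and that `retrieve` is a hypothetical hook into your pipeline returning chunk ids best-first; it measures retrieval only, not final answer quality.

```python
def recall_at_k(retrieve, held_out, k=5):
    """Fraction of held-out queries whose known-relevant chunk appears in the top k.

    retrieve: callable(query) -> list of chunk ids, best first (hypothetical hook).
    held_out: list of (query, relevant_chunk_id) pairs labelled in advance.
    """
    if not held_out:
        raise ValueError("held_out must contain at least one labelled query")
    hits = sum(1 for query, relevant in held_out if relevant in retrieve(query)[:k])
    return hits / len(held_out)
```

Run it before and after changing a single stage, against the same held-out set, so any difference can be attributed to that change.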
