Data Engineering DATA · 07 · 10

Vector search: catch the silent recall drop

Hands-on project — build a pgvector RAG search, expose its silent recall drop with an exact baseline, then tune ef_search, fix post-filtering, and add hybrid search, proving each step with recall numbers.

DATA Senior ◷ 240 min

Level

FoundationsJuniorMiddleSenior

Reading about silent recall is not the same as catching it. Build a small pgvector search, measure its recall against an exact baseline, then watch the number move as you turn ef_search, fix a selective filter, and bolt on hybrid search — with evidence at every step.

Goal

Turn the unit’s mental model into a reproducible engineering loop: build vector search, measure recall@k against ground truth, expose the silent failure modes, and fix them with the right lever, proving each change with before/after recall.

Project

0 of 7

Objective

Build a pgvector-backed semantic search over a real document corpus, then prove with measured recall@10 that you can detect and fix the silent failure modes — low ef_search, post-filtered selective filters, and missing exact-match support — without ever relying on row count or latency to tell you something is wrong.

Requirements

Acceptance criteria

A before/after table of recall@10 and p99 latency at the default ef_search vs the tuned value, measured against the exact baseline, not estimated.
Evidence that the filtered-query recall collapsed under post-filtering and was restored by iterative scan, with the latency cost recorded.
A short list of exact-token queries where hybrid RRF beats pure vector search on recall@10, with the numbers.
A one-paragraph write-up stating the recall SLO, the ef_search you chose, and why recall@k against an exact baseline — not latency or row count — is the metric you alert on.

Senior stretch

Add an on-call runbook: how to detect a silent recall drop, the recall@k harness command, the ef_search / probes / iterative_scan levers, and a verification checklist.
Compare HNSW against IVFFlat on the same corpus: measure recall drift after inserting 20% more rows without reindexing, showing IVF centroids go stale while HNSW holds.
Add IVF-PQ (or an external store like Qdrant/Faiss) and measure the recall loss from quantization, then recover it with a full-precision rerank step on the top candidates.
Wire a CI gate that runs the recall@10 harness on a canary and fails the build if recall drops more than 2 points against main.

Recap

This is the loop you will run in every real vector-search incident: build the index with a matching metric, measure recall@k against an exact baseline, expose the silent failures (default ef_search, post-filtered selective filters, missing exact-match support), and fix each with the right lever — dial ef_search, switch on iterative scan, add hybrid RRF — verifying every change with measured recall, never row count or latency. Doing it once on a real corpus makes the production version muscle memory.

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.