
Data Engineering

Vector search: catch the silent recall drop

Crux: Hands-on project. Build a pgvector RAG search, expose its silent recall drop with an exact baseline, then tune ef_search, fix post-filtering, and add hybrid search, proving each step with recall numbers.
◷ 240 min

Reading about a silent recall drop is not the same as catching one. Build a small pgvector search, measure its recall against an exact baseline, then watch the number move as you turn ef_search, fix a selective filter, and bolt on hybrid search, with evidence at every step.

Goal

Turn the unit’s mental model into a reproducible engineering loop: build vector search, measure recall@k against ground truth, expose the silent failure modes, and fix them with the right lever, proving each change with before/after recall.
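The recall@k measurement at the center of this loop can be pinned down with a small helper. This is a minimal sketch, not the unit's own harness: the names `recall_at_k` and `mean_recall_at_k` are hypothetical, and it assumes each query yields two ID lists, one from the approximate index and one from an exact scan of the same table.

```python
from typing import Hashable, Iterable, Sequence, Tuple


def recall_at_k(approx_ids: Sequence[Hashable],
                exact_ids: Sequence[Hashable],
                k: int = 10) -> float:
    """Fraction of the exact top-k that the approximate top-k also returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact baseline returned no rows")
    hits = len(truth.intersection(approx_ids[:k]))
    # Divide by the size of the ground truth, which can be below k
    # when a filter leaves fewer than k matching rows.
    return hits / len(truth)


def mean_recall_at_k(pairs: Iterable[Tuple[Sequence[Hashable], Sequence[Hashable]]],
                     k: int = 10) -> float:
    """Average recall@k over (approx_ids, exact_ids) pairs, one pair per query."""
    scores = [recall_at_k(approx, exact, k) for approx, exact in pairs]
    if not scores:
        raise ValueError("no queries to score")
    return sum(scores) / len(scores)


# Illustrative IDs only: the index returned 7 of the 10 true neighbours.
exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
approx = [1, 2, 3, 4, 5, 6, 7, 11, 12, 13]
print(recall_at_k(approx, exact))  # 0.7
```

Note that both queries in this example return ten rows, so row count alone would not reveal the gap; only the comparison against the exact list does.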

Project
Objective

Build a pgvector-backed semantic search over a real document corpus, then prove with measured recall@10 that you can detect and fix the silent failure modes — low ef_search, post-filtered selective filters, and missing exact-match support — without ever relying on row count or latency to tell you something is wrong.
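To see why a selective filter fails silently under post-filtering, a toy model helps. The sketch below is not pgvector's implementation: it stands in for the index with a perfectly sorted list, and the names `post_filter_search`, `iterative_search`, and `max_scan` are hypothetical. It only illustrates the mechanism, assuming the index hands back a fixed number of candidates before the filter runs.

```python
def post_filter_search(rows, predicate, k=10, ef_search=40):
    """Take the ef_search nearest candidates first, then apply the filter."""
    candidates = sorted(rows, key=lambda r: r["dist"])[:ef_search]
    return [r["id"] for r in candidates if predicate(r)][:k]


def iterative_search(rows, predicate, k=10, ef_search=40, max_scan=20000):
    """Keep pulling candidate batches until k rows pass the filter."""
    ordered = sorted(rows, key=lambda r: r["dist"])
    limit = min(max_scan, len(ordered))
    hits, scanned = [], 0
    while len(hits) < k and scanned < limit:
        batch = ordered[scanned:min(scanned + ef_search, limit)]
        scanned += len(batch)
        hits.extend(r["id"] for r in batch if predicate(r))
    return hits[:k]


def is_rare(row):
    return row["tag"] == "rare"


# Toy corpus: row i sits at distance i from the query; 1 row in 100 is "rare".
rows = [
    {"id": i, "dist": float(i), "tag": "rare" if i % 100 == 0 else "common"}
    for i in range(1000)
]

# Ground truth: filter first, then take the 10 nearest.
exact = [r["id"] for r in sorted(rows, key=lambda r: r["dist"]) if is_rare(r)][:10]
post = post_filter_search(rows, is_rare)
fixed = iterative_search(rows, is_rare)

print(len(post), len(set(post) & set(exact)) / len(exact))    # 1 0.1
print(len(fixed), len(set(fixed) & set(exact)) / len(exact))  # 10 1.0
```

In this toy the post-filtered query returns a single row without any error, and the iterative version pays for its recall by scanning 920 rows instead of 40, which is the latency cost the project asks you to record.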

Requirements
Acceptance criteria
  • A before/after table of recall@10 and p99 latency at the default ef_search vs the tuned value, measured against the exact baseline, not estimated.
  • Evidence that the filtered-query recall collapsed under post-filtering and was restored by iterative scan, with the latency cost recorded.
  • A short list of exact-token queries where hybrid RRF beats pure vector search on recall@10, with the numbers.
  • A one-paragraph write-up stating the recall SLO, the ef_search you chose, and why recall@k against an exact baseline — not latency or row count — is the metric you alert on.
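For the hybrid criterion, Reciprocal Rank Fusion can be sketched in a few lines. This is an illustration under assumptions: the function name `rrf_fuse` and the document IDs are hypothetical, `k=60` is the constant commonly used with RRF rather than a value taken from this unit, and the two input rankings stand in for a vector query and a keyword (full-text) query.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=10):
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to a document's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    ordered = sorted(scores, key=lambda d: (-scores[d], str(d)))
    return ordered[:top_n]


# Hypothetical exact-token query: the keyword ranking puts the literal match
# "d4" first, while the vector ranking misses it entirely.
vector_hits = ["d7", "d2", "d9"]
keyword_hits = ["d4", "d7", "d1"]
print(rrf_fuse([vector_hits, keyword_hits]))
# ['d7', 'd4', 'd2', 'd1', 'd9']
```

Because RRF uses only ranks, it needs no score normalization between cosine distance and text-search scores, which is one reason it is a convenient first fusion method to measure.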
Senior stretch
  • Add an on-call runbook: how to detect a silent recall drop, the recall@k harness command, the ef_search / probes / iterative_scan levers, and a verification checklist.
  • Compare HNSW against IVFFlat on the same corpus: measure recall drift after inserting 20% more rows without reindexing, showing IVF centroids go stale while HNSW holds.
  • Add IVF-PQ (or an external engine such as Qdrant or Faiss) and measure the recall loss from quantization, then recover it with a full-precision rerank step on the top candidates.
  • Wire a CI gate that runs the recall@10 harness on a canary and fails the build if recall drops more than 2 points against main.
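The CI gate in the last item reduces to one comparison. A minimal sketch, assuming the harness reports recall@10 as a fraction between 0 and 1 for both main and the canary; the name `recall_gate` and its parameters are hypothetical.

```python
def recall_gate(baseline_recall, canary_recall, max_drop_points=2.0):
    """Return (passed, drop_in_points) for recalls given as fractions in [0, 1]."""
    for name, value in (("baseline", baseline_recall), ("canary", canary_recall)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} recall must be in [0, 1], got {value}")
    # Round away floating-point noise so a drop of exactly 2 points still passes.
    drop_points = round((baseline_recall - canary_recall) * 100.0, 6)
    return drop_points <= max_drop_points, drop_points


# Illustrative numbers only, not measured results.
print(recall_gate(0.95, 0.94))  # (True, 1.0)
print(recall_gate(0.95, 0.92))  # (False, 3.0)
```

In a pipeline, the boolean would typically drive the process exit code so that a failed gate fails the build.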
Recap

This is the loop you will run in every real vector-search incident: build the index with a matching metric, measure recall@k against an exact baseline, expose the silent failures (default ef_search, post-filtered selective filters, missing exact-match support), and fix each with the right lever — dial ef_search, switch on iterative scan, add hybrid RRF — verifying every change with measured recall, never row count or latency. Doing it once on a real corpus makes the production version muscle memory.
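The two session-level levers and the exact baseline can be written down as the SQL a harness would send. This is a sketch under assumptions: it only builds statement strings and leaves execution to whatever driver you use, the `documents` table and its `id` and `embedding` columns are hypothetical, and it assumes a pgvector release recent enough to provide the `hnsw.iterative_scan` setting. The `<=>` operator is cosine distance, so the index is assumed to have been built with `vector_cosine_ops` to match.

```python
ALLOWED_ITERATIVE_SCAN = {"off", "strict_order", "relaxed_order"}

# Hypothetical table and column names; %s is the query-vector placeholder.
TOP_K_SQL = "SELECT id FROM documents ORDER BY embedding <=> %s LIMIT 10;"


def approximate_session(ef_search=40, iterative_scan="off"):
    """Statements to run before the approximate (HNSW) query."""
    if int(ef_search) < 1:
        raise ValueError("ef_search must be at least 1")
    if iterative_scan not in ALLOWED_ITERATIVE_SCAN:
        raise ValueError(f"unknown iterative_scan mode: {iterative_scan}")
    return [
        f"SET hnsw.ef_search = {int(ef_search)};",
        f"SET hnsw.iterative_scan = {iterative_scan};",
    ]


def exact_session():
    """Statements that steer the planner away from the index for ground truth."""
    # SET LOCAL lasts only until the surrounding transaction ends.
    return ["BEGIN;", "SET LOCAL enable_indexscan = off;"]


for statement in approximate_session(ef_search=100, iterative_scan="strict_order"):
    print(statement)
# SET hnsw.ef_search = 100;
# SET hnsw.iterative_scan = strict_order;
```

Running the same `TOP_K_SQL` under both sessions gives the approximate and exact ID lists that recall@k compares; checking the plan with EXPLAIN is a reasonable way to confirm the exact run really skipped the index.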

