Data Engineering
Vector search: catch the silent recall drop
Reading about a silent recall drop is not the same as catching one. Build a small pgvector search, measure its recall against an exact baseline, then watch the number move as you tune ef_search, fix a selective filter, and bolt on hybrid search, with evidence at every step.
Turn the unit’s mental model into a reproducible engineering loop: build vector search, measure recall@k against ground truth, expose the silent failure modes, and fix them with the right lever, proving each change with before/after recall.
Build a pgvector-backed semantic search over a real document corpus, then prove with measured recall@10 that you can detect and fix three silent failure modes: a low ef_search, a selective filter applied as a post-filter, and missing exact-match support. At no point should you rely on row count or latency to tell you something is wrong.
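The recall@10 measurement this brief keeps returning to can be sketched in a few lines. This is a minimal illustration, not a prescribed harness: the function names and the choice to divide by the size of the exact top-k (so a selective filter that has fewer than k true matches is still scored fairly) are assumptions made here.

```python
from typing import Hashable, Iterable, Sequence, Tuple


def recall_at_k(approx_ids: Sequence[Hashable],
                exact_ids: Sequence[Hashable],
                k: int = 10) -> float:
    """Fraction of the exact top-k that the approximate top-k also returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact baseline returned no rows for this query")
    found = set(approx_ids[:k])
    # Divide by the size of the ground truth, not by k, so queries with
    # fewer than k true matches are not penalised.
    return len(truth & found) / len(truth)


def mean_recall_at_k(pairs: Iterable[Tuple[Sequence[Hashable], Sequence[Hashable]]],
                     k: int = 10) -> float:
    """Average recall@k over (approximate_ids, exact_ids) pairs, one per query."""
    scores = [recall_at_k(approx, exact, k) for approx, exact in pairs]
    if not scores:
        raise ValueError("no queries to score")
    return sum(scores) / len(scores)
```

A post-filtered query that returns only two of four true neighbours scores 0.5 here even though it returns rows quickly, which is why row count and latency alone cannot reveal the drop.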
- A before/after table of recall@10 and p99 latency at the default ef_search vs the tuned value, measured against the exact baseline, not estimated.
- Evidence that the filtered-query recall collapsed under post-filtering and was restored by iterative scan, with the latency cost recorded.
- A short list of exact-token queries where hybrid RRF beats pure vector search on recall@10, with the numbers.
- A one-paragraph write-up stating the recall SLO, the ef_search you chose, and why recall@k against an exact baseline — not latency or row count — is the metric you alert on.
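The hybrid RRF deliverable above fuses a vector ranking with a keyword ranking. A minimal sketch of reciprocal rank fusion follows, assuming each input is a list of document ids ordered best first with no duplicates inside a list. The function name is hypothetical, and the constant k=60 is the conventional default rather than anything this brief specifies.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[Hashable]],
             k: int = 60,
             top_n: int = 10) -> List[Hashable]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by the id's string form
    # so the output is deterministic.
    ordered = sorted(scores, key=lambda d: (-scores[d], str(d)))
    return ordered[:top_n]
```

For example, fusing a vector ranking `["a", "b", "c"]` with a keyword ranking `["d", "a", "e"]` puts `"a"` first because it appears in both lists, then `"d"`, the exact-token hit that pure vector search missed entirely.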
- Add an on-call runbook: how to detect a silent recall drop, the recall@k harness command, the ef_search / probes / iterative_scan levers, and a verification checklist.
- Compare HNSW against IVFFlat on the same corpus: measure recall drift after inserting 20% more rows without reindexing, to check whether the IVFFlat centroids go stale while HNSW holds.
- Add IVF-PQ (or an external store like Qdrant/Faiss) and measure the recall loss from quantization, then recover it with a full-precision rerank step on the top candidates.
- Wire a CI gate that runs the recall@10 harness on a canary and fails the build if recall drops more than 2 points against main.
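The CI gate in the last item can be sketched as a small comparison that a pipeline step turns into an exit code. The sketch assumes that "2 points" means 2 percentage points of recall@10, and that both recall values arrive as fractions between 0 and 1. The function names are hypothetical.

```python
import sys
from typing import List


def recall_gate(main_recall: float,
                canary_recall: float,
                max_drop_points: float = 2.0) -> bool:
    """Return True if the canary's recall is within the allowed drop from main."""
    for value in (main_recall, canary_recall):
        if not 0.0 <= value <= 1.0:
            raise ValueError("recall must be a fraction between 0 and 1")
    # Convert the difference to percentage points; rounding keeps float
    # noise from flipping a result that sits on the threshold.
    drop_points = round((main_recall - canary_recall) * 100.0, 6)
    return drop_points <= max_drop_points


def main(argv: List[str]) -> int:
    """Usage: gate.py MAIN_RECALL CANARY_RECALL. Exit code 1 fails the build."""
    main_recall, canary_recall = float(argv[1]), float(argv[2])
    if recall_gate(main_recall, canary_recall):
        print(f"recall gate passed: main={main_recall:.3f} canary={canary_recall:.3f}")
        return 0
    print(f"recall gate FAILED: main={main_recall:.3f} canary={canary_recall:.3f}")
    return 1


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

An improvement on the canary always passes, since the drop is then negative.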
This is the loop you will run in every real vector-search incident. Build the index with an operator class that matches the query's distance metric, measure recall@k against an exact baseline, and expose the silent failures: the default ef_search, a selective filter applied as a post-filter, and missing exact-match support. Fix each with the right lever (tune ef_search, switch on iterative scan, add hybrid RRF) and verify every change with measured recall, never row count or latency. Doing it once on a real corpus makes the production version muscle memory.
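The levers in this loop correspond to pgvector session settings. The helper below only builds the SQL strings and does not connect to a database. It assumes a pgvector release that supports iterative index scans, and that the statements run inside a transaction, since `SET LOCAL` lasts only until commit. The table `docs`, the columns `id` and `embedding`, and the function names are hypothetical.

```python
from typing import List, Optional

# Exact baseline: with index scans disabled for the transaction, the same
# ORDER BY query is answered by a full scan and returns the true neighbours.
EXACT_BASELINE_SETUP = "SET LOCAL enable_indexscan = off"

# Hypothetical schema; <=> is pgvector's cosine distance operator, so the
# index should have been built with vector_cosine_ops to match it.
TOP_K_QUERY = "SELECT id FROM docs ORDER BY embedding <=> %s::vector LIMIT %s"

_ITERATIVE_MODES = {"off", "strict_order", "relaxed_order"}


def lever_statements(ef_search: Optional[int] = None,
                     probes: Optional[int] = None,
                     iterative_scan: Optional[str] = None) -> List[str]:
    """Build the SET LOCAL statements for the levers that were requested."""
    statements: List[str] = []
    if ef_search is not None:
        statements.append(f"SET LOCAL hnsw.ef_search = {int(ef_search)}")
    if probes is not None:
        statements.append(f"SET LOCAL ivfflat.probes = {int(probes)}")
    if iterative_scan is not None:
        if iterative_scan not in _ITERATIVE_MODES:
            raise ValueError(f"unknown iterative scan mode: {iterative_scan}")
        statements.append(f"SET LOCAL hnsw.iterative_scan = {iterative_scan}")
    return statements
```

Running the same `TOP_K_QUERY` once after `EXACT_BASELINE_SETUP` and once after the lever statements gives the two id lists that a recall@k comparison needs.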