Crux Read real pgvector DDL and queries — predict the recall/latency behaviour and pick the highest-leverage fix for HNSW params, distance operators, and filtered search.
◷ 14 min
Vector-search bugs live in the DDL and the query, not in an exception. Read the SQL, predict the recall and latency it produces, then choose the fix a senior engineer makes first.
Goal
Practise the loop you run on every recall incident: read the index definition and the query, predict where recall leaks or latency spikes, and reach for the highest-leverage fix.
Snippet 1 — the HNSW index and the default knob
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- query path, run per request
SELECT id, body FROM docs
ORDER BY embedding <=> $1
LIMIT 10;
Quiz
The index builds fine and queries are fast, but measured recall@10 is only about 80%. What is the highest-leverage fix?
Heads-up A higher m raises recall and memory but requires a full rebuild and still leaves ef_search at 40. The cheap, immediate lever is the runtime ef_search, not a costly rebuild.
Heads-up <=> is cosine distance and <-> is L2; the index was built with vector_cosine_ops, so changing the operator at query time mismatches the opclass. The recall gap is from ef_search, not the operator.
Heads-up Adding data does not raise recall on existing queries — it enlarges the search space. Recall@10 is governed by ef_search and graph quality, not corpus size.
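The runtime lever described above can be sketched as follows, assuming the docs table and HNSW index from Snippet 1. The value 100 is an illustrative starting point rather than a tuned recommendation; the right value depends on the recall you measure afterwards.

```sql
-- Session-wide: applies to every later query on this connection.
SET hnsw.ef_search = 100;

-- Alternative: scope the change to one transaction, so pooled
-- connections fall back to the default of 40 afterwards.
BEGIN;
SET LOCAL hnsw.ef_search = 100;
SELECT id, body FROM docs
ORDER BY embedding <=> $1
LIMIT 10;
COMMIT;
```

A larger ef_search makes each query visit more graph candidates, so latency rises with recall. Re-measure both after each change instead of jumping straight to a large value.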
Snippet 2 — the distance operator vs the opclass
-- embeddings stored from a model trained on cosine similarity
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);

SELECT id FROM items
ORDER BY embedding <=> $1  -- cosine operator
LIMIT 10;
Quiz
The query is slow even though an HNSW index exists on the column. What is wrong here?
Heads-up The operators have comparable cost; the slowness is the index being unusable for this operator, forcing a sequential scan over every row.
Heads-up pgvector supports cosine (<=>), L2 (<->), and inner product (<#>). The defect is the mismatch between the query operator and the index opclass, not missing support.
Heads-up LIMIT does not gate index usage. The index is skipped because its opclass does not match the query's distance operator.
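One way to repair the mismatch is to build an index whose opclass matches the operator the query already uses, then check the plan. This is a sketch under assumptions: the index name items_embedding_cosine_idx is hypothetical, and the literal query vector is a placeholder that only works for a 3-dimensional column, so substitute a vector of the real dimension.

```sql
-- Build an index whose opclass matches the <=> (cosine) operator.
-- The index name is a hypothetical choice for illustration.
CREATE INDEX items_embedding_cosine_idx
  ON items USING hnsw (embedding vector_cosine_ops);

-- Check the plan: an Index Scan on the new index is the goal,
-- rather than a Seq Scan feeding a Sort.
-- The vector literal is a placeholder for a 3-dimensional column.
EXPLAIN
SELECT id FROM items
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 10;
```

On a very small table the planner may still prefer a sequential scan because it is cheaper, so check the plan against a realistically sized dataset. Once nothing queries with <->, the old vector_l2_ops index only adds write overhead and can be dropped.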
Snippet 3 — IVFFlat probes
CREATE INDEX ON docs USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 1000);

-- query
SELECT id FROM docs ORDER BY embedding <=> $1 LIMIT 10;
-- ivfflat.probes left at its default
Quiz
With lists = 1000 and probes left at the default, recall is far below expectation. Why, and what is the first move?
Heads-up A larger lists value gives finer clusters and can help recall once probes is raised; with probes = 1 the symptom is scanning one bucket, not too many buckets.
Heads-up IVFFlat supports cosine via vector_cosine_ops, used here. The recall loss is from probes = 1 scanning a single cluster.
Heads-up Reindexing is for centroid drift over time, not per query. The immediate fix is raising probes so more clusters are scanned.
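Raising probes can be sketched like this, assuming the IVFFlat index from Snippet 3. The value 32 follows the common starting heuristic of roughly sqrt(lists), which pgvector's documentation suggests; treat it as a first guess to validate against measured recall, not a fixed answer.

```sql
-- Starting heuristic: probes close to sqrt(lists); sqrt(1000) is about 32.
-- Applies to the current session only.
SET ivfflat.probes = 32;

SELECT id FROM docs
ORDER BY embedding <=> $1
LIMIT 10;
```

Each extra probe scans one more cluster, so latency grows roughly in step with probes. Setting probes equal to lists makes the search exact, at which point the index no longer saves any work.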
Snippet 4 — measuring recall against ground truth
-- exact ground truth (no index hint, brute force)
SELECT id FROM docs ORDER BY embedding <=> $1 LIMIT 10;  -- with index disabled

-- ANN result
SET hnsw.ef_search = 100;
SELECT id FROM docs ORDER BY embedding <=> $1 LIMIT 10;  -- with HNSW

-- recall@10 = |exact_ids ∩ ann_ids| / 10
Quiz
A teammate computes recall by checking the ANN result's average distance against a fixed threshold instead of the overlap with an exact top-10. Why is that wrong?
Heads-up A distance threshold measures how close the returned items are, not whether they are the right items. Two different vectors can both sit under a threshold while one is a true neighbor and one is not.
Heads-up Latency and recall are different axes. The whole point is that low recall is latency-invisible, so you must measure recall directly via ID overlap.
Heads-up Recall is measured by comparing the ANN ID set to an exact ORDER BY scan in the same database; no model call is needed for the ground-truth comparison.
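The measurement loop can be sketched end to end in SQL. The sketch assumes the docs table from Snippet 4, and it disables index scans per transaction to obtain the exact baseline. The two id arrays in the last statement are made-up placeholders standing in for the ids returned by the first two queries.

```sql
-- Step 1: exact top-10. Index scans are off for this transaction only,
-- so the planner falls back to a brute-force scan and sort.
BEGIN;
SET LOCAL enable_indexscan = off;
SELECT id FROM docs ORDER BY embedding <=> $1 LIMIT 10;
COMMIT;

-- Step 2: ANN top-10 through the HNSW index.
SET hnsw.ef_search = 100;
SELECT id FROM docs ORDER BY embedding <=> $1 LIMIT 10;

-- Step 3: recall@10 as the overlap of the two id sets.
-- Both arrays are placeholder ids, not real results.
SELECT cardinality(ARRAY(
         SELECT unnest(ARRAY[1,2,3,4,5,6,7,8,9,10])    -- exact ids
         INTERSECT
         SELECT unnest(ARRAY[1,2,3,4,5,6,7,8,11,12])   -- ANN ids
       ))::float / 10 AS recall_at_10;  -- 8 shared ids out of 10
```

A single query vector gives a noisy estimate. Averaging recall@10 over a sample of representative queries gives a number that is safer to act on.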
Recap
Every recall incident is read in the DDL and the query: ef_search defaults to 40 and is the runtime recall dial; the query’s distance operator must match the index opclass or the planner falls back to a full scan; IVFFlat’s probes defaults to 1 and scans a single cluster until you raise it; and recall@k is the ID-set overlap against an exact baseline, never a distance threshold. Read the SQL, find the leak, turn the cheapest dial first, then re-measure recall to confirm.