Data Engineering DATA · 06 · 08

Full-text search: free-recall review

Free-recall prompts across the search unit. Answer each in your own words first, then reveal the model answer and compare against the unit's spine.

DATA Senior ◷ 13 min

Level

FoundationsJuniorMiddleSenior

Retrieval beats re-reading. For each prompt, say or write a full answer from memory before you open the model answer — the effort of recall is what turns the unit’s ideas into something you can apply under pressure.

Goal

Reconstruct the unit’s core mechanisms — the inverted index, the analysis pipeline and its parity rule, BM25 and its knobs, the Postgres-vs-engine decision, and the operational traps — without looking back at the lesson.

Recall before you leave

01
Why can LIKE '%term%' never be search, and what two distinct problems does full-text search solve in its place?
02
Describe the inverted index and what makes a query fast on it regardless of corpus size.
03
What does an analyzer do, and why must the same analyzer run at index time and query time?
04
Explain why search moved from TF-IDF to BM25, and what the k1 and b knobs control.
05
When is Postgres tsvector/GIN the right default, what pushes you to a dedicated engine, and how do you choose GIN vs GiST inside Postgres?
06
What does 'near-real-time' mean for a dedicated engine, and why must you design behind a read alias from day one?

Recap

If you could reconstruct each answer from memory, you hold the unit’s spine: LIKE fails at both finding and ranking, the inverted index makes finding a dictionary lookup, the analysis pipeline decides what a term is and its parity rule is non-negotiable, BM25 saturates term frequency and normalizes length so the useful docs surface, Postgres GIN is the right default until facets/fuzziness/scale push you to a dedicated engine, and near-real-time refresh plus the immutable-tokens reindex are why you build behind a read alias from the start. Now when a search feature goes wrong in production, you can trace the failure to its layer — tokenization, index structure, scoring, or consistency — without guessing.

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.