Databases
Index design exercise: full-text search strategy
The team ships a search box. Users type in it. The query is WHERE title ILIKE '%invoice%'. In staging with 10k tasks this is fine. In production with 50M tasks it takes 12 seconds. ILIKE with a leading wildcard cannot use a B-tree. The fix is not “add an index on title.” It is choosing the right index type for the question being asked. That choice depends on whether users want keyword match, fuzzy match, or semantic match — and this lesson works through all three.
Why ILIKE cannot scale
WHERE title ILIKE '%term%' has a leading wildcard. B-tree indexes require a known prefix; they cannot answer “does this string contain the term anywhere.” Every row must be evaluated — this is always O(n).
The three scalable alternatives:
- GIN on tsvector — word-level inverted index; answers “which documents contain this word or phrase”; understands stemming and language rules.
- pg_trgm GIN — trigram decomposition; answers “which strings contain this substring or are similar to this string”; handles typos and partial input.
- pgvector HNSW — graph-based approximate nearest-neighbour on embeddings; answers “which documents are semantically similar to this query”; requires a model inference pipeline.
GIN on tsvector: the Postgres-native choice
-- Add a generated tsvector column (Postgres 12+)
ALTER TABLE tasks
ADD COLUMN tsv_search TSVECTOR
GENERATED ALWAYS AS (
to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
) STORED;
-- GIN index on the generated column
CREATE INDEX CONCURRENTLY idx_tasks_search
ON tasks USING GIN (tsv_search);
-- Query
SELECT id, title, ts_rank(tsv_search, query) AS rank
FROM tasks, to_tsquery('english', 'invoice & payment') AS query
WHERE tsv_search @@ query
ORDER BY rank DESC
LIMIT 20;Trade: GIN indexes are 2-5x the size of the column data. Each insert updates GIN posting lists for every lexeme in the document. For documents with 200 words, each insert touches ~200 GIN entries. fastupdate=on (the default) defers these updates — writes are fast, but the deferred buffer must eventually flush, causing periodic latency spikes. Use fastupdate=off for write-heavy tables where consistent latency matters more than peak throughput.
Does not handle: typos (invoce), substring matches inside words (inv), semantic similarity.
pg_trgm: fuzzy and substring search
The pg_trgm extension breaks strings into trigrams (3-character windows) and builds a GIN or GiST index over them. This enables:
LIKE '%term%'andILIKE '%term%'with index support- Similarity search (
% operator) for typo tolerance <->similarity distance ordering
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX CONCURRENTLY idx_tasks_title_trgm
ON tasks USING GIN (title gin_trgm_ops);
-- Now these use the index:
SELECT * FROM tasks WHERE title ILIKE '%invoice%';
SELECT * FROM tasks WHERE title % 'invoce'; -- similarity, handles typoTrade: trigram indexes are large (similar to GIN on tsvector or larger for short strings), and similarity queries are slower than exact-match GIN searches. Best for short strings (usernames, SKUs, titles) where fuzzy matching is the dominant use case.
pgvector HNSW: semantic search
Embeddings represent meaning numerically. Two semantically similar documents have embeddings that are close in vector space. HNSW (Hierarchical Navigable Small World) is a graph-based index for approximate nearest-neighbour search on high-dimensional vectors.
CREATE EXTENSION IF NOT EXISTS vector;
ALTER TABLE tasks ADD COLUMN embedding VECTOR(1536); -- e.g., OpenAI text-embedding-3-small
CREATE INDEX CONCURRENTLY idx_tasks_embedding_hnsw
ON tasks USING HNSW (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Semantic search: find tasks semantically similar to a query embedding
SELECT id, title, 1 - (embedding <=> $1::vector) AS similarity
FROM tasks
ORDER BY embedding <=> $1::vector
LIMIT 20;Trade: HNSW indexes are large (vector dimension × row count; a 1,536-dimension embedding on 10M rows = ~60 GB). Writes are slow — graph rebalancing per insert. The index is approximate: recall@10 is typically 95-99%. An embedding pipeline (model inference per document and query) is required.
In 2026 production AI/ML systems, pgvector + HNSW is the canonical choice for semantic search when Postgres is the primary store. For scale or advanced feature requirements beyond pgvector’s reach (hundreds of millions of vectors, real-time filtering), dedicated vector databases (Pinecone, Weaviate, Milvus) are alternatives.
Hybrid: combining approaches
For most product search requirements, a hybrid approach combines exact-match and semantic results:
-- GIN tsvector for exact keyword ranking
-- HNSW for semantic similarity ranking
-- Application layer: merge and re-rank results
SELECT id, title, ts_rank(tsv_search, query) AS kw_rank, NULL AS sem_rank
FROM tasks, to_tsquery('english', $1) AS query
WHERE tsv_search @@ query
UNION ALL
SELECT id, title, NULL AS kw_rank, 1 - (embedding <=> $2::vector) AS sem_rank
FROM tasks
ORDER BY embedding <=> $2::vector
LIMIT 50;
-- Merge by id, sum ranks, re-sort, take top 20This is more complex to implement but provides both exact-match recall and semantic relevance.
A new full-text search feature on a 'documents' table (50M rows) needs an index. Pick the right strategy.
Which Postgres version introduced the INCLUDE clause in CREATE INDEX, enabling covering indexes without the included columns affecting the sort key?
A team adds an FK with ON DELETE CASCADE from comments(post_id) to posts(id) but does NOT index comments(post_id). What happens on DELETE FROM posts WHERE id = 42 if comments has 100M rows?
Design the complete index set for a ticketing system. Table: tasks (id BIGSERIAL PK, workspace_id BIGINT, project_id BIGINT, assignee_user_id BIGINT, status TEXT, priority SMALLINT, title TEXT, body TEXT, ticket_id TEXT, created_at TIMESTAMPTZ). Scale: 100M tasks; 80% done (cold), 15% open (hot), 5% in_progress (hot). Budget: total index storage under 20% of table size. Five hot queries listed below.
- Query A: list open/in_progress tasks in a project, ordered by priority then created_at.
- Query B: list open/in_progress tasks assigned to a specific user across all projects in a workspace.
- Query C: find task by ticket_id (unique per workspace).
- Query D: full-text search across task titles and bodies.
- Query E: find tasks created in the last 24 hours in a workspace.
- Partial WHERE status IN ('open','in_progress') cuts index size by 80% and makes writes to done tasks cheaper.
- INCLUDE columns cover the typical projection without bloating the sort key.
- A UNIQUE index on (workspace_id, ticket_id) doubles as both the constraint enforcement and the lookup index.
- GIN on a STORED GENERATED tsvector column is the Postgres-native full-text answer — no extra infrastructure.
- Budget accounting: sum index sizes; verify they are under 20% of table size; verify write overhead is acceptable.
- After design, drop redundant prefix indexes that composites now cover.
Why this works
Why does Postgres need six index types when most other databases default to one? Because the fundamental data structures are incompatible. B-tree requires a total order. JSONB documents have no total order. Geometric shapes require spatial predicates. Text search requires word-level inverted lists. Embedding similarity requires high-dimensional graph navigation. A single universal index structure would either be astronomically expensive or unable to express the right operations. The design trade — multiple specialized types, each fit for its data shape — keeps query performance practical at production scale.
- 01Explain why GIN tsvector is the default full-text search choice in Postgres, and what its limitations are.
- 02Design an index for a query that: filters by workspace_id and status, sorts by created_at DESC, and projects only id and title. Explain every choice.
- 03When would you choose pgvector HNSW over GIN tsvector for search, and what are the operational costs?
WHERE title ILIKE '%term%' is always O(n) — leading wildcards defeat B-tree. Scalable alternatives: GIN on a tsvector STORED GENERATED column for keyword full-text (stemming, ts_rank, no extra infrastructure); pg_trgm GIN for fuzzy and substring matching (typo tolerance, ILIKE-anywhere); pgvector HNSW for semantic embedding search (natural language, requires model inference pipeline).
For a ticketing system at 100M rows, the deliberate index set: two partial composites (WHERE status IN (‘open’,‘in_progress’)) for the hot 20% of data, covering the open-tasks dashboard queries; a UNIQUE composite for per-tenant ticket_id; a GIN tsvector for full-text; a full-table composite for the recent-24h feed — total under 20% of table size, all within write-overhead budget.
The INCLUDE clause (Postgres 11+) adds projection columns to index leaves without affecting the sort key, enabling index-only scans for typical list projections. Partial indexes cut size proportionally to the selectivity of the WHERE clause — the most underused performance lever in production Postgres schemas.
appears again in174
- The journey of a request: seven stops from socket to responsejunior
- Accept and parse: from kernel queue to a typed requestmiddle
- Routing and middleware: choosing what runs, and in what ordermiddle
- Handler and response: from business logic to bytes on the wiremiddle
- Streaming and backpressure: when the client reads slower than you writesenior
- Timeouts and tail latency: budgets, deadlines, and the fan-out trapsenior
- Middleware and DI: the two patterns that shape every backendjunior
- Writing middleware: signatures, next(), and the three framework modelsmiddle
- Inversion of control: how dependencies reach a classmiddle
- DI scopes and lifecycles: singleton, request, transientmiddle
- DI as a testing seam: fakes, mocks, and the boundary that matterssenior
- DI containers in production: resolution graphs, circular deps, and when not tosenior
- Blocking vs non-blocking I/O: two ways to waitjunior
- The event loop: one thread, ordered phasesmiddle
- What blocks the loop: CPU work and sync callsmiddle
- Offloading CPU work: worker threads and the libuv poolmiddle
- Backpressure and bounded concurrencysenior
- Throughput under load: tail latency and saturationsenior
- Why pool: the cost of creating a connectionjunior
- Pool sizing: why bigger is not fastermiddle
- Acquisition and timeouts: the wait queue is the real latency dialmiddle
- Retry strategies: backoff, jitter, and thundering herdmiddle
- Observability, production failures, and global-scale designsenior
- Tasks, microtasks, and scheduler.yield()middle
- Timer accuracy, throttling, and idle workmiddle
- Node.js event loop: phases, nextTick, and loop lagsenior
- Rendering strategies: SSG, SSR, ISR, streaming, and hydrationjunior
- SSG, SSR, ISR, streaming, and RSC — how each worksmiddle
- Hydration cost: selective, progressive, islands, resumabilitymiddle
- Core Web Vitals: what LCP, INP, and CLS measurejunior
- LCP: four phases, one dominant costmiddle
- INP: input delay, processing, presentationmiddle
- Lab vs field: why the two disagree and how to use eachmiddle
- Metric tradeoffs, RUM attribution, and the CI+field loopsenior
- The full picture: URL to LCP to INP as a relay racejunior
- Eight layers traced: from the service worker to the second navigationmiddle
- Five canonical breaks: where production reliably diessenior
- The three-track method: reading traces and building a monitored systemsenior
- Bits on the wirejunior
- Latency mathmiddle
- Bufferbloat and congestionsenior
- The physical frontiersenior
- Sequence numbers and connection statemiddle
- Flow control and congestion controlmiddle
- BBR, production observability, and beyond TCPsenior
- CDN: putting content next doorjunior
- Anycast and GeoDNS: routing to the nearest edgemiddle
- Tiered cache and Cache-Controlmiddle
- Vary header and cache keysmiddle
- Stale-while-revalidate and cache stampedesenior
- Edge workers and edge-side compositionsenior
- CDN operations and observabilitysenior
- WebSocket: the HTTP upgrade handshakejunior
- WebSocket vs SSE vs long-polling: choosing the right transportmiddle
- WebSocket backpressure: when clients can''''t keep upmiddle
- Reconnection: jittered backoff, thundering herd, message resumptionsenior
- WebSocket at scale: HTTP/2 multiplexing, permessage-deflate, C10Msenior
- WebSocket in production: proxies, security, and distributed architecturesenior
- What reverse proxies dojunior
- Balancing algorithms: round-robin to power-of-two-choicesmiddle
- L4 vs L7 load balancing and client-IP preservationmiddle
- Health checks, connection draining, and slow startmiddle
- Retry storms, circuit breakers, and load sheddingsenior
- Resilient LB architecture: anycast, zone-aware routing, and observabilitysenior
- Why QUIC and not TCP+TLSjunior
- QUIC streams and head-of-line blockingjunior
- Integrated handshake and 1-RTTmiddle
- Connection IDs and network migrationmiddle
- Loss detection and congestion controlmiddle
- 0-RTT resumption and packet encryptionsenior
- Deployment tradeoffs and CPU costsenior
- DDoS: what it is and why it worksjunior
- Amplification attacks and state exhaustionmiddle
- Rate limiting: algorithms and architecturemiddle
- WAFs, firewalls, mTLS, and HSTSmiddle
- DNS cache poisoning and BGP hijackingsenior
- Defense-in-depth architecture and attack economicssenior
- The twelve layers: one URL, seven actorsjunior
- DNS, TCP, TLS in sequence: where the milliseconds gomiddle
- Critical render path and Core Web Vitalsmiddle
- Proxy intercepts and security gates: rate limiters, WAF, mTLSmiddle
- Alternate paths: QUIC 0-RTT, WebSocket upgrade, connection migrationmiddle
- Observability: distributed traces, USE/RED, and samplingsenior
- Resilience: cascading retries, circuit breakers, and error budgetssenior
- What the three signals are: logs, metrics, and tracesjunior
- Metrics and cardinality: the cost model of a time-series databasemiddle
- Logs and volume: the cost model of structured loggingmiddle
- Traces and sampling: the cost model of distributed tracingmiddle
- Join keys and exemplars: making the three signals composemiddle
- Observability 2.0: wide events and the cost shiftsenior
- Failure modes and engineering practice: cardinality budgets, PII, and samplingsenior
- Why structured logs exist: the diary vs the spreadsheetjunior
- The production log schema: fields every line must carrymiddle
- Log levels and alert routingmiddle
- Sampling strategies and log costmiddle
- PII redaction and log injectionsenior
- Trace context propagation in logssenior
- OTel Logs Data Model and audit logs as a subsystemsenior
- OTel signals, Semantic Conventions, and the OTLP wire formatmiddle
- Auto-instrumentation and manual spans: the 80/20 of OTelmiddle
- The OTel Collector: receivers, processors, exporters, and deployment patternsmiddle
- Sampling strategies: head, tail, and parent-basedmiddle
- Vendor neutrality, eBPF instrumentation, the Operator, and browser/serverless OTelsenior
- Operating the OTel Collector: reliability, version skew, failure modes, and governancesenior
- RED and USE: two checklists, one triage disciplinejunior
- Instrumenting RED in Prometheus: counters, histograms, and cardinality disciplinemiddle
- USE on Linux: CPU, memory, disk, network, and PSImiddle
- Golden signals, dashboard layout, and service mesh auto-REDmiddle
- Cardinality as a cost driver: labels, PII, exemplars, and samplingmiddle
- Native histograms, SLO tie-in, and production failure patternsmiddle
- Choosing SLIs and SLO targets: ratios, not feelingsmiddle
- Multi-window multi-burn-rate alerting: why AND beats ORmiddle
- Error budget policy, latency SLOs, and composite journeysmiddle
- Iceberg SLIs, composite SLO math, and SLA vs SLOsenior
- Flame graphs: reading the picture that shows where time goesjunior
- Sampling vs instrumentation profiling: why 99 Hz wins in productionmiddle
- Profile types: CPU, memory, off-CPU, mutex — which one to reach formiddle
- Continuous profiling: always-on flame graphs with eBPF and trace-id correlationmiddle
- How flame graphs are built from samples, and the production workflows that use themmiddle
- Linux perf, eBPF internals, PGO, and the limits of samplingsenior
- Profiling in production: security, war stories, OTel profiles, and the infrastructure designsenior
- The debugging funnel: SLO → RED → trace → profilejunior
- OTel architecture: one SDK, four signals, one wire formatmiddle
- Cost discipline: keeping observability under 5% of infra spendmiddle
- Scale, security, and the ROI of observable systemssenior
- Why profile first: measure where time actually goesjunior
- Amdahl''''s law and self-time: the ceiling on every speedup you can shipmiddle
- The measurement loop: microbench, macrobench, prod profile, observer effectmiddle
- Reading flame graphs: shapes, per-language profilers, and the 60-second scanmiddle
- Statistical baselines: why one run is not a measurementmiddle
- Profiler history and microbenchmark pitfalls: Knuth to GWPsenior
- Hardware counters, cold-start profiles, and profile securitysenior
- Continuous profiling at scale: costs, CI gates, trace correlation, and anti-patternssenior
- What makes a hot path: symptom vs causejunior
- Five shapes of hotspot: CPU, alloc, cache, lock, syscallmiddle
- Reading parent and child chains: where to apply the fixmiddle
- JIT deopt, the fix-and-verify loop, and PR-time profilingmiddle
- Hardware counters and Intel TMA: sub-category diagnosissenior
- False sharing and native-bridge hot pathssenior
- Hot paths in production: security, tail latency, and tooling lineagesenior
- Memory hierarchy: why the same O(N) loop can be 17x slowerjunior
- Row-major vs column-major: access order and the 9x gapjunior
- Branch prediction and branchless codemiddle
- Hardware prefetcher, TLB, and memory-level parallelismsenior
- GC basics: what the runtime taxes you forjunior
- GC algorithms: generational, concurrent, and per-runtimemiddle
- GC tradeoffs: pause, throughput, heap — and object poolingmiddle
- GC tuning: pacing, heap shape, and allocation observabilitymiddle
- GC internals: tri-color invariant, write barriers, and per-runtime deep-divessenior
- GC in production: observability, security, edge cases, and fleet governancesenior
- N+1: one logical operation, many round-tripsjunior
- Fix families: JOIN, IN, preload, and DataLoadermiddle
- Detecting N+1: query logs, APM traces, and CI gatesmiddle
- DataLoader: batching across resolver treesmiddle
- Cross-protocol N+1: HTTP fan-out and Redis MGETmiddle
- N+1 at scale: pool exhaustion, plan changes, and denormalisationsenior
- Batching: amortize fixed cost per operationjunior
- The batching window: size and wait timemiddle
- Batching in Kafka and Postgresmiddle
- io_uring and observability of batchingmiddle
- From Nagle to io_uring: evolution of batchingmiddle
- Backpressure, failure isolation, and batch security in productionsenior
- What a bundle actually costs: download, parse, compile, executejunior
- Core Web Vitals: LCP, INP, and CLSmiddle
- Code splitting: route-level, component-level, vendor splittingmiddle
- Tree shaking and compression: removing what you don''''t usemiddle
- Third-party scripts: the silent budget killermiddle
- CI enforcement and RUM: making budgets stickmiddle
- V8 JIT pipeline, HTTP priorities, and bundle securitysenior
- The performance loop: discipline, not a projectjunior
- Classify and fix: matching bottleneck families to remediesmiddle
- Observability stack and CI gates: catching regressions before they shipmiddle
- Incident to enforcement: SLO burn to verified fix in 35 minutesmiddle
- Culture, economics, and org-scale performancesenior