Databases
The leading-column rule and composite index design
A team creates an index on (workspace_id, status, created_at) for a dashboard. Queries filter by workspace_id alone — fast. Queries filter by workspace_id and status — still fast. A new analytics query filters by status alone — sequential scan, 8 seconds. The index exists and is correct. The query is wrong for this index. The difference is the leading-column rule.
B-tree internals: why order matters
Postgres’s default index type is a B-tree (balanced search tree). Each internal node is a disk page (8KB) containing a sorted list of keys and pointers to child pages. Leaf nodes contain the actual key values and pointers (TIDs) to the heap rows.
A B-tree on a single column (a) is sorted globally by a. To answer WHERE a = 42, Postgres walks the tree from the root, taking at most ~4 hops for a 100M-row table (B-tree depth ≈ 4 levels with fanout ~200–400 keys per page), then reads the matching leaf entries.
A multi-column B-tree on (a, b, c) is sorted primarily by a, then by b within equal a values, then by c within equal (a, b) pairs. It accelerates:
- Queries filtering on
aalone: scan the leading subtree. - Queries filtering on
aandb: narrow further. - Queries filtering on
a, b, c: narrowest possible, best case. - Queries with
aplus a range onb(e.g.,WHERE a = 1 AND b > 100): still uses the index. - Queries with
aplusORDER BY b: the index already provides that sort order.
It does NOT accelerate:
- Queries filtering only on
b, only onc, or only onb, c— the index provides no ordering entry point for these; a full scan of the index is required, which is almost always worse than a seq scan.
| Query filter | Index (a, b, c) helps? | Why |
|---|---|---|
| WHERE a = ? | Yes | Leading column — narrows by a |
| WHERE a = ? AND b = ? | Yes | Prefix (a, b) |
| WHERE a = ? AND b > ? | Yes | Range on second column after equality on first |
| WHERE a = ? ORDER BY b | Yes | Index already sorted by b within a |
| WHERE b = ? | No | b is not the leading column |
| WHERE b = ? AND c = ? | No | Neither leading |
Designing composite indexes for real query patterns
The leading-column rule means composite index design follows the queries, not the table. The discipline:
- List the top-N hot queries for a table.
- Group them by which column is always present in the filter.
- For each group, design a composite with that always-present column as the leading column, then add secondary columns in decreasing selectivity order.
- Use
ORDER BYcolumns as trailing key columns — they serve both the filter and the sort without an extra Sort step.
Example: a dashboard query WHERE workspace_id = $1 AND status = 'pending' ORDER BY created_at DESC LIMIT 50. The always-present column is workspace_id (every query is tenant-scoped). The secondary filter is status. The sort is created_at DESC.
Best composite: (workspace_id, created_at DESC) WHERE status = 'pending' — the leading column matches the always-present filter; the sort column is second; the partial WHERE clause is handled separately (covered in lesson 03). An index (workspace_id, status, created_at) would also work but is a different shape.
One composite vs two single-column indexes
A composite (a, b) is one physical structure. It costs roughly 30–50% more than (a) alone in size and write overhead. Two single-column indexes (a) and (b) cost roughly the sum of their sizes and write overheads separately.
A composite wins when the dominant query always filters by the leading column. Two single-column indexes win when queries filter by a often AND by b often, never together — in that case no composite serves both, and the planner can combine them via a Bitmap Index Scan.
Senior rule: design the composite for the dominant query first. Add a second index only if a separate hot query needs a different leading column and the Bitmap And plan is not fast enough.
A team migrates a legacy events table to add proper indexes. Walk the decisions.
A table has an index on (region, status). Which query can use this index?
A query filters on (region, status) where region has 5 values and status has 4. Which composite index order is better?
Order the composite index design steps:
- 1 List the top-N hot queries for the table
- 2 Identify which column is always present in the WHERE clause
- 3 Use that always-present column as the leading column
- 4 Add secondary filter columns in selectivity order
- 5 Add ORDER BY columns as trailing key columns to avoid extra Sort steps
- 6 Verify with EXPLAIN ANALYZE that the planner uses the index and eliminates Sort nodes
- 01Explain in detail why the leading-column rule exists, and what production patterns work around it.
- 02When does a composite index win over two single-column indexes, and when do two single-column indexes win?
- 03A query is SELECT * FROM events WHERE project_id = $1 ORDER BY created_at DESC LIMIT 50. Design the best single index.
A composite B-tree on (a, b, c) is sorted by a globally, b within each a-group, c within each (a, b)-group. The engine navigates from the root using this sort order, so the leading column must appear in the filter — otherwise the index offers no starting point. This is the most violated rule in production indexing. Design composites by listing hot queries, identifying the always-present filter column as the leader, adding secondary columns in selectivity order, and matching ORDER BY with trailing key columns. A well-designed composite typically replaces 3–5 single-column indexes at lower total write cost.
appears again in174
- The journey of a request: seven stops from socket to responsejunior
- Accept and parse: from kernel queue to a typed requestmiddle
- Routing and middleware: choosing what runs, and in what ordermiddle
- Handler and response: from business logic to bytes on the wiremiddle
- Streaming and backpressure: when the client reads slower than you writesenior
- Timeouts and tail latency: budgets, deadlines, and the fan-out trapsenior
- Middleware and DI: the two patterns that shape every backendjunior
- Writing middleware: signatures, next(), and the three framework modelsmiddle
- Inversion of control: how dependencies reach a classmiddle
- DI scopes and lifecycles: singleton, request, transientmiddle
- DI as a testing seam: fakes, mocks, and the boundary that matterssenior
- DI containers in production: resolution graphs, circular deps, and when not tosenior
- Blocking vs non-blocking I/O: two ways to waitjunior
- The event loop: one thread, ordered phasesmiddle
- What blocks the loop: CPU work and sync callsmiddle
- Offloading CPU work: worker threads and the libuv poolmiddle
- Backpressure and bounded concurrencysenior
- Throughput under load: tail latency and saturationsenior
- Why pool: the cost of creating a connectionjunior
- Pool sizing: why bigger is not fastermiddle
- Acquisition and timeouts: the wait queue is the real latency dialmiddle
- Retry strategies: backoff, jitter, and thundering herdmiddle
- Observability, production failures, and global-scale designsenior
- Tasks, microtasks, and scheduler.yield()middle
- Timer accuracy, throttling, and idle workmiddle
- Node.js event loop: phases, nextTick, and loop lagsenior
- Rendering strategies: SSG, SSR, ISR, streaming, and hydrationjunior
- SSG, SSR, ISR, streaming, and RSC — how each worksmiddle
- Hydration cost: selective, progressive, islands, resumabilitymiddle
- Core Web Vitals: what LCP, INP, and CLS measurejunior
- LCP: four phases, one dominant costmiddle
- INP: input delay, processing, presentationmiddle
- Lab vs field: why the two disagree and how to use eachmiddle
- Metric tradeoffs, RUM attribution, and the CI+field loopsenior
- The full picture: URL to LCP to INP as a relay racejunior
- Eight layers traced: from the service worker to the second navigationmiddle
- Five canonical breaks: where production reliably diessenior
- The three-track method: reading traces and building a monitored systemsenior
- Bits on the wirejunior
- Latency mathmiddle
- Bufferbloat and congestionsenior
- The physical frontiersenior
- Sequence numbers and connection statemiddle
- Flow control and congestion controlmiddle
- BBR, production observability, and beyond TCPsenior
- CDN: putting content next doorjunior
- Anycast and GeoDNS: routing to the nearest edgemiddle
- Tiered cache and Cache-Controlmiddle
- Vary header and cache keysmiddle
- Stale-while-revalidate and cache stampedesenior
- Edge workers and edge-side compositionsenior
- CDN operations and observabilitysenior
- WebSocket: the HTTP upgrade handshakejunior
- WebSocket vs SSE vs long-polling: choosing the right transportmiddle
- WebSocket backpressure: when clients can''''t keep upmiddle
- Reconnection: jittered backoff, thundering herd, message resumptionsenior
- WebSocket at scale: HTTP/2 multiplexing, permessage-deflate, C10Msenior
- WebSocket in production: proxies, security, and distributed architecturesenior
- What reverse proxies dojunior
- Balancing algorithms: round-robin to power-of-two-choicesmiddle
- L4 vs L7 load balancing and client-IP preservationmiddle
- Health checks, connection draining, and slow startmiddle
- Retry storms, circuit breakers, and load sheddingsenior
- Resilient LB architecture: anycast, zone-aware routing, and observabilitysenior
- Why QUIC and not TCP+TLSjunior
- QUIC streams and head-of-line blockingjunior
- Integrated handshake and 1-RTTmiddle
- Connection IDs and network migrationmiddle
- Loss detection and congestion controlmiddle
- 0-RTT resumption and packet encryptionsenior
- Deployment tradeoffs and CPU costsenior
- DDoS: what it is and why it worksjunior
- Amplification attacks and state exhaustionmiddle
- Rate limiting: algorithms and architecturemiddle
- WAFs, firewalls, mTLS, and HSTSmiddle
- DNS cache poisoning and BGP hijackingsenior
- Defense-in-depth architecture and attack economicssenior
- The twelve layers: one URL, seven actorsjunior
- DNS, TCP, TLS in sequence: where the milliseconds gomiddle
- Critical render path and Core Web Vitalsmiddle
- Proxy intercepts and security gates: rate limiters, WAF, mTLSmiddle
- Alternate paths: QUIC 0-RTT, WebSocket upgrade, connection migrationmiddle
- Observability: distributed traces, USE/RED, and samplingsenior
- Resilience: cascading retries, circuit breakers, and error budgetssenior
- What the three signals are: logs, metrics, and tracesjunior
- Metrics and cardinality: the cost model of a time-series databasemiddle
- Logs and volume: the cost model of structured loggingmiddle
- Traces and sampling: the cost model of distributed tracingmiddle
- Join keys and exemplars: making the three signals composemiddle
- Observability 2.0: wide events and the cost shiftsenior
- Failure modes and engineering practice: cardinality budgets, PII, and samplingsenior
- Why structured logs exist: the diary vs the spreadsheetjunior
- The production log schema: fields every line must carrymiddle
- Log levels and alert routingmiddle
- Sampling strategies and log costmiddle
- PII redaction and log injectionsenior
- Trace context propagation in logssenior
- OTel Logs Data Model and audit logs as a subsystemsenior
- OTel signals, Semantic Conventions, and the OTLP wire formatmiddle
- Auto-instrumentation and manual spans: the 80/20 of OTelmiddle
- The OTel Collector: receivers, processors, exporters, and deployment patternsmiddle
- Sampling strategies: head, tail, and parent-basedmiddle
- Vendor neutrality, eBPF instrumentation, the Operator, and browser/serverless OTelsenior
- Operating the OTel Collector: reliability, version skew, failure modes, and governancesenior
- RED and USE: two checklists, one triage disciplinejunior
- Instrumenting RED in Prometheus: counters, histograms, and cardinality disciplinemiddle
- USE on Linux: CPU, memory, disk, network, and PSImiddle
- Golden signals, dashboard layout, and service mesh auto-REDmiddle
- Cardinality as a cost driver: labels, PII, exemplars, and samplingmiddle
- Native histograms, SLO tie-in, and production failure patternsmiddle
- Choosing SLIs and SLO targets: ratios, not feelingsmiddle
- Multi-window multi-burn-rate alerting: why AND beats ORmiddle
- Error budget policy, latency SLOs, and composite journeysmiddle
- Iceberg SLIs, composite SLO math, and SLA vs SLOsenior
- Flame graphs: reading the picture that shows where time goesjunior
- Sampling vs instrumentation profiling: why 99 Hz wins in productionmiddle
- Profile types: CPU, memory, off-CPU, mutex — which one to reach formiddle
- Continuous profiling: always-on flame graphs with eBPF and trace-id correlationmiddle
- How flame graphs are built from samples, and the production workflows that use themmiddle
- Linux perf, eBPF internals, PGO, and the limits of samplingsenior
- Profiling in production: security, war stories, OTel profiles, and the infrastructure designsenior
- The debugging funnel: SLO → RED → trace → profilejunior
- OTel architecture: one SDK, four signals, one wire formatmiddle
- Cost discipline: keeping observability under 5% of infra spendmiddle
- Scale, security, and the ROI of observable systemssenior
- Why profile first: measure where time actually goesjunior
- Amdahl''''s law and self-time: the ceiling on every speedup you can shipmiddle
- The measurement loop: microbench, macrobench, prod profile, observer effectmiddle
- Reading flame graphs: shapes, per-language profilers, and the 60-second scanmiddle
- Statistical baselines: why one run is not a measurementmiddle
- Profiler history and microbenchmark pitfalls: Knuth to GWPsenior
- Hardware counters, cold-start profiles, and profile securitysenior
- Continuous profiling at scale: costs, CI gates, trace correlation, and anti-patternssenior
- What makes a hot path: symptom vs causejunior
- Five shapes of hotspot: CPU, alloc, cache, lock, syscallmiddle
- Reading parent and child chains: where to apply the fixmiddle
- JIT deopt, the fix-and-verify loop, and PR-time profilingmiddle
- Hardware counters and Intel TMA: sub-category diagnosissenior
- False sharing and native-bridge hot pathssenior
- Hot paths in production: security, tail latency, and tooling lineagesenior
- Memory hierarchy: why the same O(N) loop can be 17x slowerjunior
- Row-major vs column-major: access order and the 9x gapjunior
- Branch prediction and branchless codemiddle
- Hardware prefetcher, TLB, and memory-level parallelismsenior
- GC basics: what the runtime taxes you forjunior
- GC algorithms: generational, concurrent, and per-runtimemiddle
- GC tradeoffs: pause, throughput, heap — and object poolingmiddle
- GC tuning: pacing, heap shape, and allocation observabilitymiddle
- GC internals: tri-color invariant, write barriers, and per-runtime deep-divessenior
- GC in production: observability, security, edge cases, and fleet governancesenior
- N+1: one logical operation, many round-tripsjunior
- Fix families: JOIN, IN, preload, and DataLoadermiddle
- Detecting N+1: query logs, APM traces, and CI gatesmiddle
- DataLoader: batching across resolver treesmiddle
- Cross-protocol N+1: HTTP fan-out and Redis MGETmiddle
- N+1 at scale: pool exhaustion, plan changes, and denormalisationsenior
- Batching: amortize fixed cost per operationjunior
- The batching window: size and wait timemiddle
- Batching in Kafka and Postgresmiddle
- io_uring and observability of batchingmiddle
- From Nagle to io_uring: evolution of batchingmiddle
- Backpressure, failure isolation, and batch security in productionsenior
- What a bundle actually costs: download, parse, compile, executejunior
- Core Web Vitals: LCP, INP, and CLSmiddle
- Code splitting: route-level, component-level, vendor splittingmiddle
- Tree shaking and compression: removing what you don''''t usemiddle
- Third-party scripts: the silent budget killermiddle
- CI enforcement and RUM: making budgets stickmiddle
- V8 JIT pipeline, HTTP priorities, and bundle securitysenior
- The performance loop: discipline, not a projectjunior
- Classify and fix: matching bottleneck families to remediesmiddle
- Observability stack and CI gates: catching regressions before they shipmiddle
- Incident to enforcement: SLO burn to verified fix in 35 minutesmiddle
- Culture, economics, and org-scale performancesenior