The leading-column rule and composite index design

A multi-column B-tree is sorted by its first column first. Queries that skip the leading column cannot use it efficiently. Designing composites around this rule is the central skill in Postgres index engineering.

A team creates an index on (workspace_id, status, created_at) for a dashboard. Queries filter by workspace_id alone — fast. Queries filter by workspace_id and status — still fast. A new analytics query filters by status alone — sequential scan, 8 seconds. The index exists and is correct. The query is wrong for this index. The difference is the leading-column rule.

B-tree internals: why order matters

Before you can design a composite index correctly, you need to understand what “sorted by (a, b, c)” actually means on disk — because the physical layout is what makes the rule inescapable, not an arbitrary Postgres decision.

Postgres’s default index type is a B-tree (balanced search tree). Each internal node is a disk page (8KB) containing a sorted list of keys and pointers to child pages. Leaf nodes contain the actual key values and pointers (TIDs) to the heap rows.

A B-tree on a single column (a) is sorted globally by a. To answer WHERE a = 42, Postgres walks the tree from the root, taking at most ~4 hops for a 100M-row table (B-tree depth ≈ 4 levels with fanout ~200–400 keys per page), then reads the matching leaf entries.
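The "about 4 hops" figure can be sanity-checked with a rough calculation. The sketch below assumes every page holds exactly `fanout` keys and ignores fill factor and key width, so it is an estimate, not how Postgres sizes a real index. The function name `btree_levels` is made up for illustration.

```python
def btree_levels(rows: int, fanout: int) -> int:
    """Smallest number of levels whose capacity (fanout ** levels) covers `rows`.

    Simplifying assumption: every page is full and holds exactly `fanout` keys.
    """
    levels = 1
    capacity = fanout
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels


# 100M rows at both ends of the fanout range quoted in the text.
levels_low_fanout = btree_levels(100_000_000, 200)
levels_high_fanout = btree_levels(100_000_000, 400)
```

Under these assumptions both ends of the 200 to 400 fanout range give the same depth for 100M rows, which is why the hop count stays small even for large tables.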

A multi-column B-tree on (a, b, c) is sorted primarily by a, then by b within equal a values, then by c within equal (a, b) pairs. It accelerates:

Queries filtering on a alone: scan the leading subtree.
Queries filtering on a and b: narrow further.
Queries filtering on a, b, c: narrowest possible, best case.
Queries with a plus a range on b (e.g., WHERE a = 1 AND b > 100): still uses the index.
Queries with a plus ORDER BY b: the index already provides that sort order.

It does NOT accelerate:

Queries filtering only on b, only on c, or only on b and c. The index gives these no sorted entry point, so the only way to use it is to read every index entry, which rarely beats a sequential scan of the table.

Query filter	Index (a, b, c) helps?	Why
WHERE a = ?	Yes	Leading column — narrows by a
WHERE a = ? AND b = ?	Yes	Prefix (a, b)
WHERE a = ? AND b > ?	Yes	Range on second column after equality on first
WHERE a = ? ORDER BY b	Yes	Index already sorted by b within a
WHERE b = ?	No	b is not the leading column
WHERE b = ? AND c = ?	No	Neither b nor c is the leading column
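The rows of this table can be reproduced with a toy model. The sketch below is not Postgres code: it stands in for the leaf level of an index on (a, b, c) with a sorted Python list of tuples and uses binary search where a real B-tree would descend from the root. All values are made up, and every column is an integer so that `inf` can serve as an upper-bound sentinel.

```python
from bisect import bisect_left, bisect_right
from math import inf

# Toy stand-in for the leaf entries of an index on (a, b, c), in sorted order.
entries = sorted(
    (a, b, c) for a in (1, 2, 3) for b in (100, 200, 300) for c in (1, 2)
)


def prefix_slice(prefix):
    """Binary-search the contiguous run of entries that start with `prefix`."""
    lo = bisect_left(entries, prefix)            # first entry >= prefix
    hi = bisect_right(entries, prefix + (inf,))  # first entry past the prefix group
    return lo, hi


# WHERE a = 2: one contiguous run.
lo, hi = prefix_slice((2,))
a_rows = entries[lo:hi]

# WHERE a = 2 AND b = 200: a narrower run inside the a = 2 group.
lo, hi = prefix_slice((2, 200))
ab_rows = entries[lo:hi]

# WHERE b = 200 (no a): matches are scattered, so every entry has to be checked.
b_positions = [i for i, e in enumerate(entries) if e[1] == 200]
```

In this model the prefix lookups touch one contiguous slice, while the matches for b alone sit in separate places inside each a-group, so there is no single range to jump to.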

Designing composite indexes for real query patterns

The leading-column rule means composite index design follows the queries, not the table. The discipline:

1 List the top-N hot queries for a table.
2 Group them by which column is always present in the filter.
3 For each group, design a composite with that always-present column as the leading column, then add the secondary filter columns, most selective first.
4 Add ORDER BY columns as trailing key columns, so one index serves both the filter and the sort without an extra Sort step.

Together these four steps mean you are building the index around the access pattern, not the schema. Skip step 2 and you will have a composite that helps some queries and silently fails others — the leading-column violation is almost never obvious until you read an EXPLAIN plan.
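Finding the always-present column can be done mechanically once the hot queries are written down. The sketch below is a minimal illustration, assuming each query has already been reduced by hand to the set of columns it filters on. The query names and both helper functions are hypothetical.

```python
# Hypothetical hot queries: the set of columns each one filters on.
hot_queries = {
    "dashboard_pending": {"workspace_id", "status"},
    "workspace_feed": {"workspace_id"},
    "workspace_recent": {"workspace_id", "created_at"},
}


def always_present(queries):
    """Columns that appear in every query's filter: leading-column candidates."""
    return set.intersection(*queries.values())


def violates_leading_column(index_columns, filter_columns):
    """True if the query does not filter on the index's first column."""
    return index_columns[0] not in filter_columns


leader_candidates = always_present(hot_queries)

# The analytics query from the opening scenario filters on status alone.
index_columns = ("workspace_id", "status", "created_at")
analytics_is_violation = violates_leading_column(index_columns, {"status"})
```

A check like the second function only flags candidates for review; the EXPLAIN plan remains the authority on what the planner actually does.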

Example: a dashboard query WHERE workspace_id = $1 AND status = 'pending' ORDER BY created_at DESC LIMIT 50. The always-present column is workspace_id (every query is tenant-scoped). The secondary filter is status. The sort is created_at DESC.

Best composite: (workspace_id, created_at DESC) WHERE status = 'pending' — the leading column matches the always-present filter; the sort column is second; the partial WHERE clause is handled separately (covered in lesson 03). An index (workspace_id, status, created_at) would also work but is a different shape.
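Why this shape removes the Sort step can be seen in a small model. The sketch below is an illustration in Python, not Postgres internals: it keeps only the pending rows, sorts them by (workspace_id, created_at descending), and answers the dashboard query by reading the first entries of one run. The rows, the integer timestamps, and the function `newest_pending` are made up, and descending order is modelled by negating the timestamp.

```python
from bisect import bisect_left

# Hypothetical rows: (workspace_id, status, created_at as an integer timestamp).
rows = [
    (1, "pending", 10), (1, "done", 11), (1, "pending", 12),
    (2, "pending", 13), (1, "pending", 14), (2, "done", 15),
]

# Model of the partial index: pending rows only, keyed by (workspace_id, -created_at)
# so that newer rows sort first inside each workspace.
partial_index = sorted((ws, -ts) for ws, status, ts in rows if status == "pending")


def newest_pending(workspace_id, limit):
    """Read the first `limit` entries of the workspace's run; no sort is needed."""
    start = bisect_left(partial_index, (workspace_id,))
    out = []
    for ws, neg_ts in partial_index[start:start + limit]:
        if ws != workspace_id:
            break  # ran past the end of this workspace's run
        out.append(-neg_ts)
    return out


latest_two = newest_pending(1, 2)
```

The lookup jumps to the start of the workspace's run and stops after `limit` entries, which mirrors how an ordered index scan can satisfy LIMIT 50 without reading or sorting the rest.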

One composite vs two single-column indexes

A composite (a, b) is one physical structure. It costs roughly 30–50% more than (a) alone in size and write overhead. Two single-column indexes (a) and (b) cost roughly the sum of their sizes and write overheads separately.

A composite wins when the dominant query always filters by the leading column. Two single-column indexes win when queries often filter by a alone and often by b alone, but rarely by both together. No single composite serves both patterns, and for the occasional query that does filter on both columns the planner can still combine the two indexes with a BitmapAnd over two Bitmap Index Scans.

Senior rule: design the composite for the dominant query first. Add a second index only if a separate hot query needs a different leading column and the BitmapAnd plan is not fast enough.

One composite wins for a dominant leading-column query at lower write cost; two single-column indexes win only when each column is hot alone and never together.
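The idea behind combining two single-column indexes can be sketched with sets of row ids. This is a simplified model, assuming each index maps a value to the set of matching row ids; Postgres works with bitmaps of heap locations, and the heap contents and the helper name below are hypothetical.

```python
from collections import defaultdict

# Hypothetical heap: row id -> (a, b).
heap = {0: (1, "x"), 1: (1, "y"), 2: (2, "x"), 3: (1, "x"), 4: (2, "y")}


def build_single_column_index(column):
    """Map each value of one column to the set of row ids that hold it."""
    index = defaultdict(set)
    for tid, row in heap.items():
        index[row[column]].add(tid)
    return index


index_a = build_single_column_index(0)
index_b = build_single_column_index(1)

# WHERE a = 1 uses index_a alone; WHERE b = 'x' uses index_b alone.
only_a = index_a[1]
only_b = index_b["x"]

# WHERE a = 1 AND b = 'x': intersect the two sets of row ids.
both = index_a[1] & index_b["x"]
```

Each index serves its own column directly, and the combined filter costs two lookups plus an intersection. That extra work is what the composite avoids when both columns are usually filtered together.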

Trace it

A team migrates a legacy events table to add proper indexes. Walk the decisions.

Step 1: how do you identify which indexes are needed?

Step 2: how do you design composites?

Step 3: how do you deploy without downtime?

Quiz

A table has an index on (region, status). Which query can use this index?

A query filters on (region, status) where region has 5 values and status has 4. Which composite index order is better?

Order the steps

Order the composite index design steps:

1 List the top-N hot queries for the table
2 Identify which column is always present in the WHERE clause
3 Use that always-present column as the leading column
4 Add secondary filter columns in selectivity order
5 Add ORDER BY columns as trailing key columns to avoid extra Sort steps
6 Verify with EXPLAIN ANALYZE that the planner uses the index and eliminates Sort nodes

Diagram: key order in an index on (a, b, c)
a: primary sort key, the entry point
b: sorted within each equal a; needs a first
c: sorted within each equal (a, b); needs a and b first

The index is one ordered sequence: a globally, b inside each a-group, c inside each (a, b)-group. A query that skips a has no entry point. That is the leading-column rule.

Recall before you leave

01 Explain in detail why the leading-column rule exists, and what production patterns work around it.
02 When does a composite index win over two single-column indexes, and when do two single-column indexes win?
03 A query is SELECT * FROM events WHERE project_id = $1 ORDER BY created_at DESC LIMIT 50. Design the best single index.

Recap

A composite B-tree on (a, b, c) is sorted by a globally, b within each a-group, c within each (a, b)-group. The engine navigates from the root using this sort order, so the leading column must appear in the filter — otherwise the index offers no starting point. This is the most violated rule in production indexing. Design composites by listing hot queries, identifying the always-present filter column as the leader, adding secondary columns in selectivity order, and matching ORDER BY with trailing key columns. A well-designed composite typically replaces 3–5 single-column indexes at lower total write cost.

Now when you see a slow query that “should” be using an index, check the leading column first. If the query filters by the second or third column without the first, you have found the bug — and you know exactly how to fix it.
