Databases DB · 02 · 03

Partial, expression, and covering indexes

Three index modifiers that change what a B-tree costs and what it can answer: partial restricts to a row subset, expression indexes a derived value, INCLUDE adds projection columns without affecting the sort key.

DB Middle ◷ 14 min

Level

FoundationsJuniorMiddleSenior

A tasks table has 100M rows. 80% have status = 'done' and are never queried by the dashboard. The full B-tree on (workspace_id, created_at) is 2 GB and every insert touches it. A partial index WHERE status IN ('open','in_progress') is 400 MB, 5× cheaper to maintain, and the dashboard queries run just as fast — because it only indexes the rows the dashboard actually needs.

Partial indexes

A partial index is a B-tree (or any index type) built only over rows matching a WHERE predicate.

CREATE INDEX ON orders(user_id) WHERE status = 'pending';

This indexes only pending orders — typically a small fraction of the table. The result: smaller index, faster lookup on the hot subset, cheaper write overhead (the index is only touched when status = 'pending').

For the planner to use a partial index, the query’s WHERE clause must logically imply the index’s predicate. A query WHERE user_id = 42 AND status = 'pending' can use the index. A query WHERE user_id = 42 alone cannot (it would need to match all statuses, but the index only covers pending rows).

Production use cases:

Soft-delete pattern: WHERE deleted_at IS NULL — only active rows are indexed; historical deleted rows are excluded.
Hot-subset filters: a task table where 95% of rows are status = 'done' — index only 'open' and 'in_progress'.
Feature flags: WHERE feature_x_enabled = true — index only rows where the flag is active.

All three patterns share the same insight: when you index every row indiscriminately, writes to cold rows pay the same cost as writes to hot rows. A partial index breaks that coupling — the cold majority pays nothing.

Indexing only the ~20% hot subset cuts the index from 2 GB to 400 MB — 5x less to store and 5x cheaper to maintain on every write.

Partial unique indexes are a powerful variant. CREATE UNIQUE INDEX ON users (email) WHERE deleted_at IS NULL enforces “email is unique among non-deleted users.” This lets you soft-delete a user and later re-register the same email — impossible with a simple UNIQUE constraint on the email column.

Expression indexes

An expression index indexes a function of one or more columns, not the columns themselves.

CREATE INDEX ON users (LOWER(email));

This lets WHERE LOWER(email) = 'alice@x.com' use the index. Without it, LOWER(email) is evaluated on every row, defeating any index on the email column.

Common patterns:

Case-insensitive search: LOWER(email), UPPER(code).
Date truncation: DATE_TRUNC('day', created_at) for group-by-day queries.
JSON path extraction: (data->>'event_type') — indexes one field inside a JSONB column.
Computed values: COALESCE(first_name, '') || ' ' || COALESCE(last_name, '') for name-search.

The query must use the exact expression for the planner to recognize the index. WHERE LOWER(email) = ? uses the index. WHERE email = ? does not (unless the email is already lowercase and a regular index exists).

Covering indexes with INCLUDE

A regular B-tree index stores key columns in both internal nodes and leaf pages. The INCLUDE clause (Postgres 11+) adds extra columns to the leaf pages only — they are not part of the sort key, but they are stored in the leaf, available for the query.

When a query’s SELECT projection needs only columns that are present in the index (either as key columns or INCLUDE columns), Postgres can answer the query without fetching the row from the heap at all — this is an index-only scan. It is 10–100× faster than an index scan for hot reads because it eliminates the heap fetch entirely.

CREATE INDEX CONCURRENTLY idx_orders_workspace_pending_recent
ON orders (workspace_id, created_at DESC)
WHERE status = 'pending'
INCLUDE (id, total_cents);

This index:

Filters by workspace_id (leading column, tenant scope).
Sorts by created_at DESC (serves the ORDER BY without a Sort step).
Partial WHERE status = 'pending' — only covers ~20% of the table.
INCLUDE (id, total_cents) — the typical dashboard projection is covered, no heap fetch needed.

Indexes: size, build, and write cost

B-tree size, 8-byte key, 100M rows: ~1 GB
B-tree depth, 100M rows: ~4 levels
B-tree lookup time: ~5-50 μs
Index build (CREATE INDEX CONCURRENTLY): ~10-30 min / 100M rows
Index write overhead per INSERT/UPDATE: ~5-50 μs per index touched
Typical index overhead vs table size: ~10-30% per index
Index-only scan vs index scan speedup: ~10-100x on hot reads
Partial index size, 5%-matching predicate: ~5% of full B-tree

Modifier	What it does	When to use
Partial (WHERE)	Index only rows matching predicate	Hot subset is <20% of table; or unique constraint on non-deleted rows
Expression	Index a computed value	Query filters on LOWER(col), DATE_TRUNC, JSON field, or other function
INCLUDE	Store extra columns in leaf pages without affecting sort	Query projects columns beyond the key; enables index-only scan

▸Why this works

Why are partial indexes underused in practice? Because they require the developer to know the predicate before writing the query. Many teams add indexes by looking at slow queries after the fact, not by thinking about data distribution at schema design time. The discipline: when a new hot query always filters by a specific status, flag, or soft-delete condition, build the partial index at the same time the feature ships. The cost is near-zero at feature time; the benefit is 10–20× cheaper index maintenance across the table’s lifetime.

Pick the right index for a hot dashboard query

1/3

Quiz

A query: WHERE LOWER(email) = 'alice@x.com'. Which index does the planner use?

Quiz

A table has 10M rows, 95% with status = 'done'. A dashboard queries only WHERE status = 'open'. Which is the best index strategy?

The leaf entry carries both the sort key and the INCLUDE payload. When the query needs only those columns, the index-only scan returns from the leaf and the heap fetch is skipped (10-100x faster).

Recall before you leave

01
Why can a partial unique index enforce a constraint that a plain UNIQUE constraint on a column cannot?
02
What two conditions must hold for Postgres to use an index-only scan?
03
A query filters on DATE_TRUNC('day', created_at) = '2026-01-15'. There is a plain index on created_at. Does the planner use it?

Recap

Three modifiers turn a plain B-tree into a targeted tool. Partial indexes (WHERE predicate) restrict coverage to matching rows — 5% coverage means 5% of full-index size and write cost. Expression indexes cover computed values like LOWER(email) or (data->>'field') — the query must use the exact same expression. INCLUDE columns (Postgres 11+) store projection-only columns in the leaf pages without affecting the sort key, enabling index-only scans that never touch the heap. Used together — partial + composite + INCLUDE — a well-designed index can cover an entire hot dashboard query with near-zero heap I/O.

Now when you review a migration that adds an index, ask yourself: does this really need to cover all rows, or is there a hot subset that deserves its own partial index? That question alone can save gigabytes and cut write overhead by an order of magnitude.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 5 done

Connected lessons

builds on

The leading-column rule and composite index designmiddle

unlocks

Index types: GIN, GiST, BRIN, Hash, Bloom, and HOT updatesmiddle

deepens into

appears again in177

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.

Apply this

Put this lesson to work on a real build.

Grounded RAG ServiceA RAG demo that answers from a corpus is easy; a RAG service you'd trust in front of users is not. The hard part isn't retrieval, it's grounding: making the model say only what the retrieved text supports, attaching citations the reader can check, and proving with an eval set that the answers don't drift into confident fiction. You'll build the whole loop — chunk, embed, store, retrieve top-k, ground, cite, score — and feel exactly where it leaks.Job schedulerA cron + backoff job runner with at-least-once delivery, idempotent handlers, and visibility timeouts — so no job is silently lost even when workers crash mid-execution.