Databases DB · 02 · 05

Index-only scans, the Visibility Map, and INCLUDE

Index-only scans return data straight from the index without touching the heap — but only when the Visibility Map confirms row visibility. This lesson covers the VM, VACUUM, autovacuum tuning, and the INCLUDE clause for covering indexes.

DB Senior ◷ 16 min

Level

FoundationsJuniorMiddleSenior

EXPLAIN ANALYZE shows “Index Only Scan” on one query and “Index Scan” on another — identical index, different tables. The difference is not the index. It is the Visibility Map. One table was VACUUMed yesterday; the other has 12% dead tuples. Understanding why this matters is the difference between a 1 ms query and a 10 ms query on the same data.

What an index scan vs index-only scan costs

A regular index scan for a query like SELECT id, status FROM orders WHERE user_id = 42:

Walk the B-tree to the matching leaf entries — O(log n), a few page reads.
For each matching TID, fetch the heap row to get id and status — one heap page read per row if the rows are scattered.

Step 2 is the bottleneck at scale. For 10,000 matching rows across 10,000 scattered heap pages, that is 10,000 random I/O operations.

An index-only scan eliminates step 2 entirely. If the index contains all columns the query needs, Postgres reads only the index leaf pages. For dense ranges (1,000 rows on 10 index leaf pages), the I/O drops from thousands to tens.

Same index, same rows — only the Visibility Map state changes. A current VM collapses heap I/O from ~1,000 to ~10; a stale VM erases the entire index-only-scan advantage.

Scan type	I/O pattern	When used
Index scan	Index walk + heap fetch per row	Columns not all in index
Bitmap heap scan	Index walk + sorted heap read (batched)	Many rows; reduces random I/O
Index only scan	Index leaves only — no heap	All needed columns in index AND VM bits set
Sequential scan	Full heap read	No usable index, or low selectivity

The Visibility Map: why index-only scans need it

Postgres uses MVCC — each row may have multiple versions (old and new tuples from concurrent updates). To know whether a row is visible to the current transaction, Postgres normally reads the row’s system columns (xmin, xmax, transaction status bits). These columns are in the heap, not the index.

An index-only scan does not read the heap. So how does it know if a row is visible?

The Visibility Map (VM) is a per-table bitmap with one bit per heap page. The bit is set to 1 (“all visible”) when every tuple on that page is visible to every current and future transaction — meaning no concurrent updates or deletes are pending. VACUUM sets these bits after cleaning dead tuples. Any write that touches the page clears the bit.

When Postgres executes an index-only scan and encounters a matching TID:

If the VM bit for that page is set: the row is known-visible without reading the heap. No heap fetch.
If the VM bit is clear: Postgres must fetch the heap page to check visibility. This heap fetch is counted in EXPLAIN ANALYZE as “Heap Fetches: N”.

A high “Heap Fetches” count in an index-only scan means the VM is not current — the “index-only” scan is actually doing heap fetches for most rows.

VACUUM keeps the VM current

VACUUM reclaims dead tuples and sets VM bits. Autovacuum runs it automatically (triggered when dead tuples exceed autovacuum_vacuum_scale_factor × table size, default 20%). For tables that depend on index-only scans:

20% dead tuples is far too tolerant. Set autovacuum_vacuum_scale_factor = 0.05 and autovacuum_analyze_scale_factor = 0.02.
Long-running transactions prevent VACUUM from reclaiming tuples — a single transaction open for hours can block VM updates for that entire duration.
Monitor pg_stat_user_tables.n_dead_tup as the leading indicator of VM freshness.

-- Tune autovacuum per-table on hot OLTP tables
ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.05,
  autovacuum_analyze_scale_factor = 0.02
);

The INCLUDE clause: covering indexes

Postgres 11+ supports CREATE INDEX ... INCLUDE (cols) — the INCLUDE columns are stored in the index leaf pages but are not part of the sort key. The result: the index is “covering” for queries that need those extra columns in the SELECT projection.

Without INCLUDE, to make a query covering you would have to add the projection columns to the key — which changes the sort order and may harm other queries that rely on that index’s sort for ORDER BY.

-- Key columns: workspace_id, created_at DESC (sort key for ORDER BY)
-- INCLUDE: id, total_cents (payload for the projection — not in sort key)
CREATE INDEX CONCURRENTLY idx_orders_ws_recent_covering
  ON orders (workspace_id, created_at DESC)
  INCLUDE (id, total_cents);

Query SELECT id, total_cents FROM orders WHERE workspace_id = $1 ORDER BY created_at DESC LIMIT 50 now hits only the index. No heap fetch. 50 leaf entries read in order.

The trade: INCLUDE columns increase index size (they are stored in every leaf entry). Large INCLUDE payloads can make the index approach table size. Keep INCLUDE columns to the projection columns actually needed by the hot query, not everything.

When index-only scans work — and when they do not

Works well: insert-only or append-mostly tables (events, logs, audit trails). VM bits stay set because there are no updates clearing them. VACUUM rarely needed for visibility purposes.

Degrades: heavily-updated tables. Every update clears the VM bit for the updated page. Until VACUUM runs and sets the bit again, every index-only scan for that page must do a heap fetch. On a table with 50,000 updates/second, VM bits are cleared faster than autovacuum can set them — the “index-only scan” becomes an “index scan with extra steps.”

Diagnosis: EXPLAIN (ANALYZE, BUFFERS) on the query. Look for:

Index Only Scan with Heap Fetches: 0 — ideal.
Index Only Scan with Heap Fetches: N (high N) — VM not current, tune autovacuum.
Index Scan instead of Index Only Scan — columns not all in the index; add INCLUDE.

Concurrency and index build timing

CREATE INDEX CONCURRENTLY on a 100M-row table takes 10-30 minutes. During the build:

Postgres scans the table twice.
Between phases, it waits for all current transactions to finish (to ensure the index is consistent with all recent writes).
The index is replicated to standbys via WAL.

Implications:

Long-running analytic queries on the same table can stall the build for hours (the build waits for them).
Multiple concurrent builds on the same database compete for I/O — schedule large index builds in low-traffic windows.
After the build, check pg_indexes.indisvalid — a failed build leaves an INVALID index that is invisible to the planner but still uses storage.

-- Check for invalid indexes after a CONCURRENTLY build
SELECT indexrelid::regclass, indisvalid
FROM pg_index
WHERE NOT indisvalid;

Recovery from INVALID: DROP INDEX CONCURRENTLY idx_name; then retry.

Index-only scan and VACUUM numbers

Index-only scan benefit vs index scan: 10-100x on hot reads
Heap Fetches: 0 in EXPLAIN ANALYZE: ideal — VM is current
autovacuum_vacuum_scale_factor default: 20% dead tuples
Recommended scale_factor for IOS-dependent tables: 5%
INCLUDE supported since: Postgres 11 (2018)
Long-running transaction impact on VM: blocks VM bit-setting for pages it touched
CREATE INDEX CONCURRENTLY on 100M rows: ~10-30 min
INVALID index after failed CONCURRENTLY build: invisible to planner; still uses storage
Monitor: n_dead_tup in pg_stat_user_tables: leading indicator of VM freshness

Trace it

1/4

An analytics query on an orders table shows Index Only Scan but Heap Fetches: 12,000. Diagnose and fix.

Step 1 of 4

Step 1: what does Heap Fetches: 12,000 mean?

Locked

Step 2: how do you check VM health?

Locked

Step 3: what is the fix?

Locked

Step 4: how do you verify the fix is durable?

Quiz

A query is showing 'Index Only Scan' but Heap Fetches: 8,000 in EXPLAIN ANALYZE. What is the most likely cause?

Quiz

What is the purpose of the INCLUDE clause in CREATE INDEX?

▸lesson.inset.warning

The EXPLAIN plan label “Index Only Scan” is misleading when Heap Fetches is high. The plan node name reflects the intended scan type — the actual behaviour depends on VM state at runtime. Always check Heap Fetches in EXPLAIN ANALYZE output, not just the scan type label. A query that “works perfectly in staging” can degrade in production if the production table has a worse autovacuum schedule.

VACUUM sets the VM bit (all tuples visible); any write clears it. Bit set means Heap Fetches: 0 — the ideal. Bit clear forces a heap fetch, so a stale VM degrades an index-only scan into heap-reading work.

Recall before you leave

01
Articulate why index-only scans need the Visibility Map and what operational discipline keeps them firing correctly.
02
When should you use INCLUDE vs adding columns to the sort key in a composite index?
03
What happens when CREATE INDEX CONCURRENTLY fails, and how do you recover?

Recap

An index-only scan reads column values directly from the index leaves, skipping the heap entirely — cutting I/O by 10-100x for queries that access only indexed columns. The Visibility Map enables this: a per-table bitmap where a set bit means all tuples on that page are visible without a heap check. VACUUM sets VM bits; writes clear them. Stale VM causes index-only scans to degrade: “Index Only Scan” with high Heap Fetches is a symptom of insufficient vacuuming.

Use the INCLUDE clause to add projection columns to index leaves without making them part of the sort key. This creates a “covering index” that answers queries entirely from the index when combined with current VM bits.

Operational rules: tune autovacuum_vacuum_scale_factor to 0.05 on tables that depend on index-only scans; monitor n_dead_tup in pg_stat_user_tables; kill long-running transactions that block VACUUM. After CREATE INDEX CONCURRENTLY, verify indisvalid = true in pg_index before considering the migration complete.

Now when you see “Index Only Scan” in EXPLAIN with a surprisingly high execution time, your first check is not the index — it is “Heap Fetches: N”. If N is large, you are not actually doing an index-only scan; you are doing a heap scan with an index as a directory. VACUUM is the fix, not the index redesign.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 5 done

Connected lessons

builds on

Index types: GIN, GiST, BRIN, Hash, Bloom, and HOT updatesmiddle

unlocks

Production failure modes and the index audit playbooksenior

appears again in287

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.