awesome-everything RU
↑ Back to the climb

Databases

Scan types: Seq, Index, Bitmap, Index-Only

Crux Postgres picks among four scan types based on selectivity and I/O cost. Reading the cost line tells you what the planner assumed and whether BUFFERS confirms it.
Your altitude — climbing toward senior
ZeroJuniorMiddleSenior
You are at middle altitude — in the sky
◷ 14 min

EXPLAIN shows Seq Scan on your 10M-row table even though an index exists on the filter column. The planner is not broken — it calculated that a sequential scan is cheaper for this selectivity. Understanding when and why the planner picks each scan type is what separates guessing from diagnosing.

Reading the cost line

Every plan node prints:

(cost=startup..total rows=N width=B)
  • startup — cost paid before the first row can be emitted. Zero for Seq Scan; non-zero for Sort (must read everything first).
  • total — cost to run the node to completion.
  • rows — estimated rows emitted by this node.
  • width — average row width in bytes.

Cost is in arbitrary units where 1.0 ≈ one sequential page fetch. Only ratios matter; the absolute numbers are meaningless without context. With ANALYZE, every node also gets (actual time=startup..total rows=N loops=L) — real wall time in milliseconds.

Interpretation rule: if the node is inside a Nested Loop’s inner side, divide actual time by loops to get per-execution cost.

The four scan types

Scan typeI/O patternWins when
Seq ScanFull heap, sequentialPredicate matches many rows, or no usable index
Index ScanIndex walk + heap fetch per row (random I/O)Few rows match (<5-15% of table) and rows are not clustered
Bitmap Heap ScanIndex walk → bitmap of TIDs → sorted heap readMany scattered rows (hundreds to millions); turns random I/O into sequential
Index Only ScanIndex leaves only — no heap fetchAll needed columns are in the index AND Visibility Map says pages are all-visible

Seq Scan

Cost formula: relpages × seq_page_cost + reltuples × cpu_tuple_cost. Default seq_page_cost = 1.0, cpu_tuple_cost = 0.01. For a 5,000-page table: 5000 × 1.0 + 1M × 0.01 = 15,000 units. Optimal when the predicate is unselective (most rows match) — reading the whole heap sequentially is faster than hopping randomly to each matching row.

Index Scan

Cost grows with random_page_cost × matching_rows. Default random_page_cost = 4.0 (HDD-era). For 100 matching rows: 100 × 4.0 = 400 — much cheaper than the 15,000 Seq Scan. For 100,000 matching rows: 100,000 × 4.0 = 400,000 — Seq Scan wins. The crossover is roughly when the predicate matches 5–15% of the table.

SSD tuning note: on NVMe SSDs, random reads are only 1.5–2× slower than sequential — not 4×. Set random_page_cost = 1.1 in postgresql.conf for SSD systems. The planner will then prefer Index Scan for a much larger fraction of queries.

Bitmap Heap Scan

A two-step process: first a Bitmap Index Scan builds an in-memory bitmap of all matching TIDs from the index; then Bitmap Heap Scan sorts the TIDs by physical page address and reads the heap in order. This converts scattered random reads into a sequential heap traversal. It wins when many rows match but they are scattered across the heap — the middle ground between Index Scan and Seq Scan.

The bitmap can be OR-ed across multiple indexes (BitmapOr node) — useful when an OR predicate has an index per column.

Index Only Scan

Reads only the index leaf pages — no heap fetch — when two conditions are both true: (1) all columns needed by the query are stored in the index (either as key columns or via INCLUDE), and (2) the Visibility Map confirms the page is “all-visible” (every tuple visible to all transactions). When the VM bit is clear, Postgres must fall back to a heap fetch for that page. In EXPLAIN ANALYZE output, Heap Fetches: N reveals how often this happened — high N means the VM is stale and VACUUM has not run recently.

BUFFERS: the I/O picture

EXPLAIN (ANALYZE, BUFFERS) adds buffer accounting per node:

Seq Scan on orders  (actual time=0.1..450 rows=1000000 loops=1)
  Buffers: shared hit=4800 read=200
  • shared hit — page found in Postgres’s shared_buffers cache (no disk I/O)
  • read — page fetched from OS or disk
  • dirtied — page modified by this node
  • written — page flushed to disk

High read with low hit means the working set is larger than shared_buffers, or the query is touching cold data. This diagnoses “is this slow because of CPU or I/O”: high reads = I/O-bound; low reads but high actual time = CPU-bound (often a poor join or heavy sort).

Scan type cost constants (Postgres defaults)
seq_page_cost (default)
1.0
random_page_cost (default, HDD-tuned)
4.0
random_page_cost (SSD-tuned)
1.1
cpu_tuple_cost
0.01
cpu_index_tuple_cost
0.005
Index Scan vs Seq Scan crossover (typical)
~5-15% of table rows
Index Only Scan speedup vs Index Scan
~10-100x on hot reads
Quiz

A 1M-row table has an index on `status`. A query `WHERE status = 'active'` returns 400,000 rows (40% of the table). What scan type will the planner most likely choose?

Quiz

EXPLAIN ANALYZE shows `Index Only Scan ... Heap Fetches: 12000`. What does this mean?

Quiz

On an NVMe SSD server you leave `random_page_cost` at the default 4.0. What production effect does this have?

Recall before you leave
  1. 01
    Explain why the planner picks Seq Scan on a 1M-row table when an index exists on the filter column.
  2. 02
    When does Bitmap Heap Scan win over both Index Scan and Seq Scan?
  3. 03
    What is the BUFFERS output in EXPLAIN ANALYZE and how do you use it to diagnose I/O vs CPU bottlenecks?
Recap

Postgres picks among four scan types based on cost: Seq Scan reads every heap page sequentially and wins when most rows match; Index Scan walks the B-tree and fetches heap rows individually, winning when the predicate is selective (roughly <10% of rows); Bitmap Heap Scan builds a sorted TID bitmap and reads the heap in order, winning in the middle range of selectivity; Index Only Scan skips the heap entirely when all needed columns are in the index and the Visibility Map is current. The planner uses cost constants — seq_page_cost, random_page_cost, cpu_tuple_cost — to compare options. On NVMe SSD, set random_page_cost = 1.1 so the planner correctly prefers Index Scan for selective predicates. Use EXPLAIN (ANALYZE, BUFFERS) to see whether a query is I/O-bound (high page reads) or CPU-bound (high actual time with few reads).

Practice

Do these to turn recognition into skill.

Connected lessons
appears again in174
Continue the climb ↑Join algorithms and the row-estimate cascade
shortcuts expand
search
K
prev piece
k
next piece
j
cycle tier
t
this menu
?
sources3
expand
  1. 01
  2. 02
  3. 03

Trademarks belong to their respective owners. Editorial reference only.