awesome-everything RU
↑ Back to the climb

Databases

EXPLAIN and execution plans: what the planner decides and why

Crux An execution plan is the tree of operations Postgres picks for a query. EXPLAIN shows the plan; EXPLAIN ANALYZE runs it and shows real timings. Rows-estimated vs rows-actual is the most diagnostic number in the output.
Your altitude — climbing toward senior
ZeroJuniorMiddleSenior
You are at junior altitude — the surface
◷ 12 min

A dashboard query takes 6 seconds in production. A colleague asks: “did you EXPLAIN it?” You run EXPLAIN ANALYZE and in two seconds you see a Seq Scan on orders ... actual rows=80000000. One index. 200ms. That is what EXPLAIN is for.

What an execution plan is

SQL is declarative — you write which rows you want, not how to get them. The Postgres planner bridges that gap. It reads your SQL, consults pg_statistic to estimate how many rows each condition will produce, enumerates possible combinations of scan types and join algorithms, costs each option, and picks the cheapest one. The result is a plan — a tree of operators with row estimates attached.

A query goes through four stages before any row is returned:

  1. Parse — text is converted to a parse tree
  2. Rewrite — views and rules are applied
  3. Plan — the planner builds the cheapest execution tree
  4. Execute — the engine runs the tree

The planner is the only stage you tune. Everything else is automatic.

CommandRuns the query?Shows
EXPLAINNoPlanner’s estimate: costs, row counts, plan shape
EXPLAIN ANALYZEYesEstimates + actual timings, actual rows, loops
EXPLAIN (ANALYZE, BUFFERS)YesAll the above + page cache hit/read counts

Reading the output

A simple plan looks like this:

Index Scan using idx_orders_workspace on orders
  (cost=0.43..14.2 rows=42 width=120)
  (actual time=0.08..1.1 rows=42 loops=1)
  Index Cond: (workspace_id = 42)
  • cost=0.43..14.2 — startup cost (first row) and total cost in arbitrary units. Not milliseconds. Ratios matter; absolute values do not.
  • rows=42 — planner’s estimate of rows this node will emit
  • actual rows=42 — rows actually emitted when the query ran
  • loops=1 — how many times this node executed (important inside joins)

The diagnostic rule: compare rows (estimate) to actual rows (reality). A 10× gap is suspicious. A 1000× gap is why your query takes 8 minutes instead of 50ms.

The GPS metaphor

EXPLAIN is the GPS route preview before you drive. EXPLAIN ANALYZE is the dashcam from the actual trip. The preview says “this route is 12 minutes” — that is an estimate from map data. The dashcam says “the trip took 47 minutes because of traffic the map did not know about.” The gap between estimate and reality is exactly where you intervene: bad map data (stale statistics) gives bad route choices; bad sensors (no index for this condition) make the trip slower than the route promised.

A concrete scenario: finding the bottleneck

A team’s search query takes 200ms on staging, 8 seconds in production. EXPLAIN ANALYZE reveals:

Hash Join  (cost=2400..55000 rows=50000)
            (actual time=7800..8050 rows=42)
  Hash Batches: 64  Memory Usage: 2.1GB

Batches: 64 means the hash table spilled to disk 64 times — the result of work_mem being too small for the join’s actual size. Fix: SET work_mem = '64MB'. Same query, 220ms. One EXPLAIN, one config change.

Why this works

EXPLAIN ANALYZE actually runs the query. On a SELECT that is always safe. On an UPDATE or DELETE, wrap it in a transaction and roll back: BEGIN; EXPLAIN ANALYZE DELETE ...; ROLLBACK;. This runs the statement (so you see real timings) but undoes the change.

Order the steps

Order the steps a developer should take when a query is unexpectedly slow:

  1. 1 Reproduce the slowness with a representative query and parameters
  2. 2 Run EXPLAIN ANALYZE on it (in a transaction if it modifies data)
  3. 3 Identify the plan node taking most of the actual time
  4. 4 Compare rows estimated vs actual on each node — large gaps mean stale statistics
  5. 5 Decide on a fix: ANALYZE the table, add or modify an index, rewrite the query
  6. 6 Apply the fix and re-run EXPLAIN ANALYZE to confirm improvement
  7. 7 Add a regression test or assertion so the slow query does not return
Quiz

What is the difference between EXPLAIN and EXPLAIN ANALYZE?

Quiz

In EXPLAIN ANALYZE output you see `rows=10 ... actual rows=1240000`. What does this mean?

Complete the analogy

Fill in the blank: EXPLAIN is to a SQL query what a flight ________ is to a journey — the planned route, durations, and connections, printed before you actually take the trip.

Recall before you leave
  1. 01
    In two sentences, what is an execution plan and why does Postgres need one?
  2. 02
    What does 'actual rows' vs 'rows' mean in EXPLAIN ANALYZE output, and why does a large discrepancy matter?
  3. 03
    When should you NOT run EXPLAIN ANALYZE on a query, and how do you work around the restriction?
Recap

An execution plan is the tree of operations Postgres builds to answer a SQL query — picking among scan types, join algorithms, and aggregation strategies using cost estimates from table statistics. EXPLAIN prints the plan without running the query; EXPLAIN ANALYZE runs the query and adds real timings, real row counts, and real loop counts. The single most diagnostic number is the gap between rows (estimate) and actual rows (reality) at each node: a 1000× underestimate propagates upward and causes every join and sort decision above it to be wrong. Every slow-query investigation starts with EXPLAIN ANALYZE and the rows-estimated vs actual comparison.

Practice

Do these to turn recognition into skill.

Connected lessons
appears again in174
Continue the climb ↑Scan types: Seq, Index, Bitmap, Index-Only
shortcuts expand
search
K
prev piece
k
next piece
j
cycle tier
t
this menu
?
sources3
expand
  1. 01
  2. 02
  3. 03

Trademarks belong to their respective owners. Editorial reference only.