Databases DB · 03 · 01

EXPLAIN and execution plans: what the planner decides and why

An execution plan is the tree of operations Postgres picks for a query. EXPLAIN shows the plan; EXPLAIN ANALYZE runs it and shows real timings. Rows-estimated vs rows-actual is the most diagnostic number in the output.

DB Junior ◷ 12 min

Level

FoundationsJuniorMiddleSenior

Already know this unit? Take a 1-minute quick check →

A dashboard query takes 6 seconds in production. A colleague asks: “did you EXPLAIN it?” You run EXPLAIN ANALYZE and in two seconds you see a Seq Scan on orders ... actual rows=80000000. One index. 200ms. That is what EXPLAIN is for. In the next ten minutes you will see exactly how to read that output and why the gap between estimated and actual rows is everything.

What an execution plan is

SQL is declarative — you write which rows you want, not how to get them. The Postgres planner bridges that gap. It reads your SQL, consults pg_statistic to estimate how many rows each condition will produce, enumerates possible combinations of scan types and join algorithms, costs each option, and picks the cheapest one. The result is a plan — a tree of operators with row estimates attached.

A query goes through four stages before any row is returned:

Parse — text is converted to a parse tree
Rewrite — views and rules are applied
Plan — the planner builds the cheapest execution tree
Execute — the engine runs the tree

The planner is the only stage you tune. Everything else is automatic. Together these four stages mean that every SQL query is a two-part contract: you declare what, the planner decides how — and EXPLAIN is how you audit that decision before it costs you 8 minutes.

Command	Runs the query?	Shows
`EXPLAIN`	No	Planner’s estimate: costs, row counts, plan shape
`EXPLAIN ANALYZE`	Yes	Estimates + actual timings, actual rows, loops
`EXPLAIN (ANALYZE, BUFFERS)`	Yes	All the above + page cache hit/read counts

Reading the output

A simple plan looks like this:

Index Scan using idx_orders_workspace on orders
  (cost=0.43..14.2 rows=42 width=120)
  (actual time=0.08..1.1 rows=42 loops=1)
  Index Cond: (workspace_id = 42)

cost=0.43..14.2 — startup cost (first row) and total cost in arbitrary units. Not milliseconds. Ratios matter; absolute values do not.
rows=42 — planner’s estimate of rows this node will emit
actual rows=42 — rows actually emitted when the query ran
loops=1 — how many times this node executed (important inside joins)

The diagnostic rule: compare rows (estimate) to actual rows (reality). A 10× gap is suspicious. A 1000× gap is why your query takes 8 minutes instead of 50ms.

Same query, three estimate qualities. The error in rows is not linear in latency: a 1000x underestimate doesn't make the query 1000x slower — it makes it ~10,000x slower, because a bad estimate flips the planner to the wrong scan and join. That non-linearity is why the rows gap is the first thing you read.

The GPS metaphor

EXPLAIN is the GPS route preview before you drive. EXPLAIN ANALYZE is the dashcam from the actual trip. The preview says “this route is 12 minutes” — that is an estimate from map data. The dashcam says “the trip took 47 minutes because of traffic the map did not know about.” The gap between estimate and reality is exactly where you intervene: bad map data (stale statistics) gives bad route choices; bad sensors (no index for this condition) make the trip slower than the route promised.

A concrete scenario: finding the bottleneck

A team’s search query takes 200ms on staging, 8 seconds in production. EXPLAIN ANALYZE reveals:

Hash Join  (cost=2400..55000 rows=50000)
            (actual time=7800..8050 rows=42)
  Hash Batches: 64  Memory Usage: 2.1GB

Batches: 64 means the hash table spilled to disk 64 times — the result of work_mem being too small for the join’s actual size. Fix: SET work_mem = '64MB'. Same query, 220ms. One EXPLAIN, one config change.

▸Why this works

EXPLAIN ANALYZE actually runs the query. On a SELECT that is always safe. On an UPDATE or DELETE, wrap it in a transaction and roll back: BEGIN; EXPLAIN ANALYZE DELETE ...; ROLLBACK;. This runs the statement (so you see real timings) but undoes the change.

Order the steps

Order the steps a developer should take when a query is unexpectedly slow:

1 Reproduce the slowness with a representative query and parameters
2 Run EXPLAIN ANALYZE on it (in a transaction if it modifies data)
3 Identify the plan node taking most of the actual time
4 Compare rows estimated vs actual on each node — large gaps mean stale statistics
5 Decide on a fix: ANALYZE the table, add or modify an index, rewrite the query
6 Apply the fix and re-run EXPLAIN ANALYZE to confirm improvement
7 Add a regression test or assertion so the slow query does not return

Quiz

What is the difference between EXPLAIN and EXPLAIN ANALYZE?

Quiz

In EXPLAIN ANALYZE output you see `rows=10 ... actual rows=1240000`. What does this mean?

Complete the analogy

Fill in the blank: EXPLAIN is to a SQL query what a flight ________ is to a journey — the planned route, durations, and connections, printed before you actually take the trip.

Only the Plan stage is tunable — the planner costs candidate trees from statistics and picks the cheapest. EXPLAIN prints that tree; EXPLAIN ANALYZE also runs Execute and reports real timings.

Recall before you leave

01
In two sentences, what is an execution plan and why does Postgres need one?
02
What does 'actual rows' vs 'rows' mean in EXPLAIN ANALYZE output, and why does a large discrepancy matter?
03
When should you NOT run EXPLAIN ANALYZE on a query, and how do you work around the restriction?

Recap

An execution plan is the tree of operations Postgres builds to answer a SQL query — picking among scan types, join algorithms, and aggregation strategies using cost estimates from table statistics. EXPLAIN prints the plan without running the query; EXPLAIN ANALYZE runs the query and adds real timings, real row counts, and real loop counts. The single most diagnostic number is the gap between rows (estimate) and actual rows (reality) at each node: a 1000× underestimate propagates upward and causes every join and sort decision above it to be wrong. Now when you see a slow query, your first move is EXPLAIN ANALYZE — scan the plan for the node where the estimate diverges from reality, and that is where you fix.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 5 done

Connected lessons

builds on

What an index is and how it speeds up queriesjunior

unlocks

deepens into

appears again in177

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.

Apply this

Put this lesson to work on a real build.

Query plan visualizerPaste an EXPLAIN (ANALYZE, FORMAT JSON) and render the plan tree with per-node timing and row-estimate error, so a bad join jumps out visually.Crash-safe key-value store with a WALBuild a tiny on-disk KV store that survives a kill -9 mid-write by appending to a write-ahead log before touching the main file.