Databases DB · 03 · 06

Plan cache, cost-constant tuning, and planner internals

Prepared statements switch to a generic plan after 5 executions — a trap for skewed parameters. SSD systems need random_page_cost = 1.1. GEQO handles large joins. JIT helps CPU-bound queries. Each mechanism has a specific failure mode and fix.

DB Senior ◷ 18 min

Level

FoundationsJuniorMiddleSenior

A parameterised query has been fast for months. One day it starts showing P99 latency of 4 seconds. Mean is still 3ms. The same query, different parameters — the distribution is skewed. After 5 executions Postgres switched to a generic plan that does not know the parameter values. One ALTER ROLE line fixes it.

The planner’s internal structure

Before you can reason about why the planner picked a surprising plan for a complex query, you need to understand what it actually compares — paths, not plans.

For each query the planner builds paths — lightweight, partial plans for portions of the query (single-table scans, joins of small relation sets). Paths are compared by cost; only the cheapest path per “relation set + sort order” is retained (the rest are pruned). For multi-relation joins, it considers all combinations using dynamic programming — exhaustively up to geqo_threshold (default 12) relations, then switches to the Genetic Query Optimizer (heuristic search) above that threshold.

The senior mental model: when you see a surprising plan, ask “what other paths did the planner consider, and why were they pruned?” The answer is almost always a cost difference driven by row estimates.

Generic vs custom plans (prepared statements)

Every parameterised query from a driver ($1, $2, …) and every PL/pgSQL function with parameters goes through plan caching. The lifecycle:

First 5 executions: Postgres plans each with the actual parameter values — custom plans, optimal for that specific parameter.
After 5th execution: Postgres computes the average cost of the 5 custom plans and compares it to the cost of a generic plan (parameter-agnostic).
If generic cost is within ~10% of the average custom cost: Postgres switches permanently to the generic plan for the lifetime of the prepared statement.

All three steps together mean the switch is automatic, invisible, and permanent — until the connection closes or you intervene. The switch is invisible to the application. For uniformly distributed parameters, the generic plan is usually fine. For skewed distributions (workspace_id where some workspaces have 8% of rows and others have 0.01%), the generic plan optimises for a “typical” parameter and is catastrophically wrong for outliers.

The generic-plan trap does not raise the mean — it splits latency into two modes. The 3.84ms mean hides a 4,290ms tail, which is why stddev_exec_time, not mean, is the diagnostic signal.

Symptoms: high stddev_exec_time in pg_stat_statements (bimodal latency — fast for common parameters, slow for outliers).

Diagnosis (PG 16+): EXPLAIN (GENERIC_PLAN) on the prepared form — shows exactly what plan all executions after the switch get.

Fix:

ALTER ROLE app SET plan_cache_mode = 'force_custom_plan';

Every execution replans with the actual parameter. Planning cost: ~0.4–2ms per execution — negligible when the alternative is 4-second tail latency.

Debug this

A prepared statement that was fast is now slow — diagnose

log

# pg_stat_statements row for the offending query:
query: "SELECT * FROM orders WHERE workspace_id = $1 AND status = $2 ORDER BY created_at DESC LIMIT 50"
calls: 482,910
total_exec_time: 1,852,400 ms
mean_exec_time: 3.84 ms
stddev_exec_time: 412.6 ms   # HUGE stddev -- symptom of plan switching
min_exec_time: 0.8 ms
max_exec_time: 4,290 ms

# EXPLAIN (ANALYZE, BUFFERS) on the SAME query with workspace_id=42 (high-volume):
Limit (cost=0..14.2 rows=50) (actual time=2.1..2.3 rows=50 loops=1)
-> Index Scan using idx_orders_workspace_status_created on orders
     (cost=0..14000 rows=50000) (actual time=2.1..2.3 rows=50 loops=1)
     Index Cond: ((workspace_id=42) AND (status='pending'::text))
     Buffers: shared hit=12
Planning Time: 0.4 ms
Execution Time: 2.4 ms

# EXPLAIN (GENERIC_PLAN, BUFFERS) on the prepared form (PG 16+):
Limit (cost=0.43..28.5 rows=50)
-> Index Scan using idx_orders_created on orders
     (cost=0.43..15000.0 rows=27000)
     Filter: ((workspace_id = $1) AND (status = $2))
Planning Time: 0.2 ms

# Context:
# workspace_id distribution: workspace 1 has 0.01% of rows; workspace 42 has 8%.
# status distribution: 'pending' 15%, 'shipped' 80%, 'cancelled' 5%.

Why does the same prepared statement show 3.84ms mean but 412ms stddev? What is the immediate fix?

SSD cost-constant tuning

Postgres ships with random_page_cost = 4.0 and seq_page_cost = 1.0 — calibrated for spinning HDDs where random reads are 4× slower than sequential. On NVMe SSDs the ratio is 1.5–2×; on working-set-fits-in-RAM systems it approaches 1.0.

-- In postgresql.conf for SSD-backed systems:
random_page_cost = 1.1
seq_page_cost    = 1.0

-- Tell the planner how much data is cached by the OS:
effective_cache_size = '24GB'  -- ~75% of total RAM

Effect: the planner becomes more willing to use Index Scan and Index Only Scan over Seq Scan — which is the right bias for modern hardware. This is the single highest-leverage planner setting on any SSD-backed Postgres.

effective_cache_size is not a memory allocation — it is a hint telling the planner how much OS page-cache + shared_buffers is available. Setting it correctly makes the cost difference between cached index pages and uncached heap pages more accurate.

GEQO and large joins

For queries joining more than geqo_threshold (default 12) relations, the planner switches from exhaustive dynamic programming to the Genetic Query Optimizer — a heuristic that is fast but not guaranteed optimal. The heuristic is randomised — same query can produce different plans on different runs.

For analytic queries with 15–30 joined tables, GEQO can produce inconsistent plans. Workarounds:

-- Force exhaustive search (slower planning, stable plans):
SET geqo = off;

-- Lock the JOIN order to the SQL's explicit order:
SET from_collapse_limit = 1;
SET join_collapse_limit = 1;

The from_collapse_limit = 1 trick is useful when you have hand-tuned a complex query’s JOIN order and want it to stay tuned across data changes.

JIT compilation

PG 11+ supports JIT compilation of expression evaluation (filter predicates, projections, aggregation). JIT kicks in when a plan’s total cost exceeds jit_above_cost (default 100,000).

JIT helps CPU-bound queries on big scans (millions of rows, complex expressions) by 10–30%. For short OLTP queries, it adds overhead — compilation time exceeds saved execution time.

-- For OLTP workloads where JIT hurts more than it helps:
SET jit_above_cost = 500000;
-- Or disable entirely for OLTP roles:
SET jit = off;

Use EXPLAIN (ANALYZE, JIT) to see JIT timings per phase and decide whether to tune the thresholds.

Plan cache and planner tuning numbers

Custom plans before generic kicks in: 5
Generic plan cost tolerance vs custom avg: ~10%
geqo_threshold (default): 12 relations
jit_above_cost (default): 100,000
jit_inline_above_cost (default): 500,000
from_collapse_limit / join_collapse_limit: 8 / 8
Planning time per execution (typical OLTP): 0.4-2 ms
random_page_cost (SSD-tuned): 1.1
effective_cache_size recommendation: ~75% of total RAM

Quiz

A prepared statement has been running fast for 5 executions. On the 6th execution it becomes slow. What is the most likely cause?

Quiz

Which PostgreSQL 16 feature lets you inspect what plan all prepared-statement executions after the 6th will receive, without actually running 5 executions?

Quiz

You set random_page_cost = 1.1 on an SSD-backed Postgres. What happens to plans for queries with selective predicates?

First 5 executions replan with real parameters (custom plans); after the 5th, a cached generic plan can be reused, skipping the planner.

Recall before you leave

01
Walk the full lifecycle of how Postgres picks a plan for a parameterised query, from first execution to the generic-plan switch.
02
Why is effective_cache_size important for the planner even though it does not allocate any memory?

Recap

Postgres plan caching has a specific failure mode: after 5 executions of a prepared statement, Postgres compares the average custom-plan cost to a generic plan and may switch permanently to the generic plan. For skewed parameter distributions, the generic plan is optimal for a “typical” parameter and catastrophically slow for outliers — producing bimodal latency (low mean, high stddev). Diagnose with stddev_exec_time in pg_stat_statements and EXPLAIN (GENERIC_PLAN) (PG 16+). Fix with plan_cache_mode = 'force_custom_plan' for the role. For SSD-backed systems, set random_page_cost = 1.1 — the planner will correctly prefer Index Scan over Seq Scan for selective predicates. Now when you see a prepared statement that is fast most of the time but has wild tail latency, look at stddev_exec_time before anything else — that asymmetry is the generic-plan trap’s signature.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 5 done

Connected lessons

builds on

Extended statistics: fixing correlated-column estimate failuressenior

unlocks

Production failure modes and plan stabilitysenior

appears again in177

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.

Apply this

Put this lesson to work on a real build.

Query plan visualizerPaste an EXPLAIN (ANALYZE, FORMAT JSON) and render the plan tree with per-node timing and row-estimate error, so a bad join jumps out visually.