awesome-everything RU
↑ Back to the climb

Databases

Plan cache, cost-constant tuning, and planner internals

Crux Prepared statements switch to a generic plan after 5 executions — a trap for skewed parameters. SSD systems need random_page_cost = 1.1. GEQO handles large joins. JIT helps CPU-bound queries. Each mechanism has a specific failure mode and fix.
Your altitude — climbing toward senior
ZeroJuniorMiddleSenior
You are at senior altitude — in orbit
◷ 18 min

A parameterised query has been fast for months. One day it starts showing P99 latency of 4 seconds. Mean is still 3ms. The same query, different parameters — the distribution is skewed. After 5 executions Postgres switched to a generic plan that does not know the parameter values. One ALTER ROLE line fixes it.

The planner’s internal structure

For each query the planner builds paths — lightweight, partial plans for portions of the query (single-table scans, joins of small relation sets). Paths are compared by cost; only the cheapest path per “relation set + sort order” is retained (the rest are pruned). For multi-relation joins, it considers all combinations using dynamic programming — exhaustively up to geqo_threshold (default 12) relations, then switches to the Genetic Query Optimizer (heuristic search) above that threshold.

The senior mental model: when you see a surprising plan, ask “what other paths did the planner consider, and why were they pruned?” The answer is almost always a cost difference driven by row estimates.

Generic vs custom plans (prepared statements)

Every parameterised query from a driver ($1, $2, …) and every PL/pgSQL function with parameters goes through plan caching. The lifecycle:

  1. First 5 executions: Postgres plans each with the actual parameter values — custom plans, optimal for that specific parameter.
  2. After 5th execution: Postgres computes the average cost of the 5 custom plans and compares it to the cost of a generic plan (parameter-agnostic).
  3. If generic cost is within ~10% of the average custom cost: Postgres switches permanently to the generic plan for the lifetime of the prepared statement.

The switch is invisible to the application. For uniformly distributed parameters, the generic plan is usually fine. For skewed distributions (workspace_id where some workspaces have 8% of rows and others have 0.01%), the generic plan optimises for a “typical” parameter and is catastrophically wrong for outliers.

Symptoms: high stddev_exec_time in pg_stat_statements (bimodal latency — fast for common parameters, slow for outliers).

Diagnosis (PG 16+): EXPLAIN (GENERIC_PLAN) on the prepared form — shows exactly what plan all executions after the switch get.

Fix:

ALTER ROLE app SET plan_cache_mode = 'force_custom_plan';

Every execution replans with the actual parameter. Planning cost: ~0.4–2ms per execution — negligible when the alternative is 4-second tail latency.

Debug this

A prepared statement that was fast is now slow — diagnose

log
# pg_stat_statements row for the offending query:
query: "SELECT * FROM orders WHERE workspace_id = $1 AND status = $2 ORDER BY created_at DESC LIMIT 50"
calls: 482,910
total_exec_time: 1,852,400 ms
mean_exec_time: 3.84 ms
stddev_exec_time: 412.6 ms   # HUGE stddev -- symptom of plan switching
min_exec_time: 0.8 ms
max_exec_time: 4,290 ms

# EXPLAIN (ANALYZE, BUFFERS) on the SAME query with workspace_id=42 (high-volume):
Limit (cost=0..14.2 rows=50) (actual time=2.1..2.3 rows=50 loops=1)
-> Index Scan using idx_orders_workspace_status_created on orders
     (cost=0..14000 rows=50000) (actual time=2.1..2.3 rows=50 loops=1)
     Index Cond: ((workspace_id=42) AND (status='pending'::text))
     Buffers: shared hit=12
Planning Time: 0.4 ms
Execution Time: 2.4 ms

# EXPLAIN (GENERIC_PLAN, BUFFERS) on the prepared form (PG 16+):
Limit (cost=0.43..28.5 rows=50)
-> Index Scan using idx_orders_created on orders
     (cost=0.43..15000.0 rows=27000)
     Filter: ((workspace_id = $1) AND (status = $2))
Planning Time: 0.2 ms

# Context:
# workspace_id distribution: workspace 1 has 0.01% of rows; workspace 42 has 8%.
# status distribution: 'pending' 15%, 'shipped' 80%, 'cancelled' 5%.

Why does the same prepared statement show 3.84ms mean but 412ms stddev? What is the immediate fix?

SSD cost-constant tuning

Postgres ships with random_page_cost = 4.0 and seq_page_cost = 1.0 — calibrated for spinning HDDs where random reads are 4× slower than sequential. On NVMe SSDs the ratio is 1.5–2×; on working-set-fits-in-RAM systems it approaches 1.0.

-- In postgresql.conf for SSD-backed systems:
random_page_cost = 1.1
seq_page_cost    = 1.0

-- Tell the planner how much data is cached by the OS:
effective_cache_size = '24GB'  -- ~75% of total RAM

Effect: the planner becomes more willing to use Index Scan and Index Only Scan over Seq Scan — which is the right bias for modern hardware. This is the single highest-leverage planner setting on any SSD-backed Postgres.

effective_cache_size is not a memory allocation — it is a hint telling the planner how much OS page-cache + shared_buffers is available. Setting it correctly makes the cost difference between cached index pages and uncached heap pages more accurate.

GEQO and large joins

For queries joining more than geqo_threshold (default 12) relations, the planner switches from exhaustive dynamic programming to the Genetic Query Optimizer — a heuristic that is fast but not guaranteed optimal. The heuristic is randomised — same query can produce different plans on different runs.

For analytic queries with 15–30 joined tables, GEQO can produce inconsistent plans. Workarounds:

-- Force exhaustive search (slower planning, stable plans):
SET geqo = off;

-- Lock the JOIN order to the SQL's explicit order:
SET from_collapse_limit = 1;
SET join_collapse_limit = 1;

The from_collapse_limit = 1 trick is useful when you have hand-tuned a complex query’s JOIN order and want it to stay tuned across data changes.

JIT compilation

PG 11+ supports JIT compilation of expression evaluation (filter predicates, projections, aggregation). JIT kicks in when a plan’s total cost exceeds jit_above_cost (default 100,000).

JIT helps CPU-bound queries on big scans (millions of rows, complex expressions) by 10–30%. For short OLTP queries, it adds overhead — compilation time exceeds saved execution time.

-- For OLTP workloads where JIT hurts more than it helps:
SET jit_above_cost = 500000;
-- Or disable entirely for OLTP roles:
SET jit = off;

Use EXPLAIN (ANALYZE, JIT) to see JIT timings per phase and decide whether to tune the thresholds.

Plan cache and planner tuning numbers
Custom plans before generic kicks in
5
Generic plan cost tolerance vs custom avg
~10%
geqo_threshold (default)
12 relations
jit_above_cost (default)
100,000
jit_inline_above_cost (default)
500,000
from_collapse_limit / join_collapse_limit
8 / 8
Planning time per execution (typical OLTP)
0.4-2 ms
random_page_cost (SSD-tuned)
1.1
effective_cache_size recommendation
~75% of total RAM
Quiz

A prepared statement has been running fast for 5 executions. On the 6th execution it becomes slow. What is the most likely cause?

Quiz

Which PostgreSQL 16 feature lets you inspect what plan all prepared-statement executions after the 6th will receive, without actually running 5 executions?

Quiz

You set random_page_cost = 1.1 on an SSD-backed Postgres. What happens to plans for queries with selective predicates?

Recall before you leave
  1. 01
    Walk the full lifecycle of how Postgres picks a plan for a parameterised query, from first execution to the generic-plan switch.
  2. 02
    Why is effective_cache_size important for the planner even though it does not allocate any memory?
Recap

Postgres plan caching has a specific failure mode: after 5 executions of a prepared statement, Postgres compares the average custom-plan cost to a generic plan and may switch permanently to the generic plan. For skewed parameter distributions, the generic plan is optimal for a “typical” parameter and catastrophically slow for outliers — producing bimodal latency (low mean, high stddev). Diagnose with stddev_exec_time in pg_stat_statements and EXPLAIN (GENERIC_PLAN) (PG 16+). Fix with plan_cache_mode = 'force_custom_plan' for the role. For SSD-backed systems, set random_page_cost = 1.1 — the planner will correctly prefer Index Scan over Seq Scan for selective predicates. Set effective_cache_size to ~75% of total RAM. Disable or raise JIT thresholds for OLTP workloads. Use join_collapse_limit = 1 to pin the explicit JOIN order for complex analytic queries that GEQO might destabilise.

Practice

Do these to turn recognition into skill.

Connected lessons
appears again in174
Continue the climb ↑Production failure modes and plan stability
shortcuts expand
search
K
prev piece
k
next piece
j
cycle tier
t
this menu
?
sources3
expand
  1. 01
  2. 02
  3. 03

Trademarks belong to their respective owners. Editorial reference only.