Databases DB · 06 · 03

The lock-queue failure mode: why instant DDL can freeze the database

A waiting ALTER TABLE blocks every subsequent SELECT on the table — a FIFO lock queue turns one slow query into a full-table outage.

DB Middle ◷ 12 min

Level

FoundationsJuniorMiddleSenior

A team runs ALTER TABLE users ADD COLUMN status TEXT DEFAULT 'active' at 14:00. On PG 16, this statement takes 8 ms. Yet the users table is unavailable for 90 seconds, connection pool exhausts, and the site returns 503. The migration itself was instant. So what happened? This lesson explains the exact mechanism — and the two-line fix that prevents it.

The Postgres lock matrix

Before you can reason about why migrations cause outages, you need to know which operations conflict with which. Every table operation takes a lock. There are eight lock modes from weakest to strongest:

ACCESS SHARE → ROW SHARE → ROW EXCLUSIVE → SHARE UPDATE EXCLUSIVE → SHARE → SHARE ROW EXCLUSIVE → EXCLUSIVE → ACCESS EXCLUSIVE

Key facts:

SELECT takes ACCESS SHARE (weakest).
Most DDL (ALTER TABLE, DROP TABLE, TRUNCATE) takes ACCESS EXCLUSIVE (strongest — conflicts with everything).
CREATE INDEX CONCURRENTLY and VALIDATE CONSTRAINT take SHARE UPDATE EXCLUSIVE — does not block normal DML.

The lock matrix decides what can run in parallel. ACCESS EXCLUSIVE conflicts with every other lock mode, including ACCESS SHARE. That means any DDL with ACCESS EXCLUSIVE must wait for all current readers to finish — and blocks all new readers while it waits.

The FIFO lock queue — the actual failure mode

Postgres grants locks in arrival order (FIFO per object). This creates the canonical migration incident:

Time	Event	Lock state on users
T=0s	Slow analytics SELECT starts (expected: 60s)	ACCESS SHARE granted
T=10s	Migration runs ALTER TABLE (default lock_timeout = 0)	ACCESS EXCLUSIVE waiting — conflicts with held ACCESS SHARE
T=10s+	New SELECTs arrive (API requests, background jobs)	ACCESS SHARE waiting — queued BEHIND the ALTER’s ACCESS EXCLUSIVE request
T=70s	Slow SELECT finishes	ALTER acquires lock, runs in 8 ms, commits. Queue drains.
Total impact	60s of no-new-queries on users. Connection pool exhausted. 503s.	—

The critical insight: new SELECTs queue behind the waiting ALTER because granting them out of order would starve the ALTER. The ALTER’s ACCESS EXCLUSIVE request is in the queue; the lock manager refuses to grant lighter locks to later arrivals that conflict with it. The table is frozen for every new query for the full 60 seconds of the slow SELECT — even though the migration itself runs in 8 ms.

The fix: SET lock_timeout

-- At the top of every migration session:
SET lock_timeout = '2s';
SET statement_timeout = '30s';

With lock_timeout = '2s':

The migration waits at most 2 seconds for the lock.
If not acquired, the migration aborts cleanly and releases its position in the queue.
New SELECTs immediately get their lighter locks — table is unfrozen.
The migration tool retries with exponential backoff (1s, 2s, 4s, 8s…).
Eventually the slow SELECT finishes or a quiet window opens, and the migration succeeds.

Without lock_timeout (Postgres default is 0 — wait forever): one slow query can freeze a table for its entire duration.

The migration is 8 ms either way; the freeze is set by how long the ALTER waits in the queue. lock_timeout = 2s turns a 60-second table outage into a 2-second blip, then the tool retries.

Lock-queue incident numbers

Postgres default lock_timeout: 0 (wait forever)
Recommended lock_timeout for migrations: 1–5 s
Recommended statement_timeout for DDL: 30–60 s
Migration retry attempts: 3–5 with exponential backoff
Table-freeze duration without lock_timeout: = duration of blocking query
Table-freeze with lock_timeout = 2s: 2 s, then queue clears

▸Why this works

Why doesn’t Postgres let lighter locks skip ahead of a waiting ACCESS EXCLUSIVE? Because doing so could starve the DDL statement indefinitely — on a busy table, ACCESS SHARE locks arrive continuously and would never all be released simultaneously. FIFO ensures the DDL eventually gets its turn. The operational consequence is that you must limit how long it waits.

Quiz

A migration runs ALTER TABLE on a busy table. A long SELECT is in progress. What happens to new SELECTs that arrive?

Quiz

You set `SET lock_timeout = '2s'` on the migration session. The migration cannot acquire the lock within 2 seconds. What happens?

Quiz

Which lock does CREATE INDEX CONCURRENTLY take, and why is it safer for production?

The ALTER cannot jump the queue and new SELECTs cannot jump the ALTER: one slow query plus one instant migration freezes the table for the slow query's full duration. lock_timeout aborts the ALTER and unblocks the queue.

Recall before you leave

01
Walk through exactly why an instant ALTER TABLE can freeze a production table for 60+ seconds.
02
Explain the exact role of lock_timeout in preventing the lock-queue freeze, and what the migration tool does after a timeout.
03
What is the difference between lock_timeout and statement_timeout, and why does a migration session need both?

Recap

The Postgres lock matrix assigns ACCESS EXCLUSIVE to most DDL and ACCESS SHARE to SELECT. These conflict, so a waiting ALTER TABLE blocks all new SELECT statements on the same table via the FIFO lock queue — even if the migration itself would run in milliseconds. On a busy table with a slow analytical query in flight, this can freeze the table for the entire duration of that query. The mandatory fix is SET lock_timeout = '2s' on every migration session: if the lock is not available within 2 seconds, the migration aborts cleanly, the queue unblocks, and the migration tool retries with backoff. CREATE INDEX CONCURRENTLY and VALIDATE CONSTRAINT take SHARE UPDATE EXCLUSIVE instead of ACCESS EXCLUSIVE and are therefore DML-safe for production. Now when you see a migration stuck in CI or causing a table freeze on-call, the first thing to check is whether lock_timeout was set — and whether a long-running query was holding the table.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 5 done

Connected lessons

builds on

ADD COLUMN: instant in PG 11+ vs rewrite in older Postgresjunior

unlocks

deepens into

appears again in287

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.