awesome-everything RU
↑ Back to the climb

Databases

The lock-queue failure mode: why instant DDL can freeze the database

Crux A waiting ALTER TABLE blocks every subsequent SELECT on the table — a FIFO lock queue turns one slow query into a full-table outage.
Your altitude — climbing toward senior
ZeroJuniorMiddleSenior
You are at middle altitude — in the sky
◷ 12 min

A team runs ALTER TABLE users ADD COLUMN status TEXT DEFAULT 'active' at 14:00. On PG 16, this statement takes 8 ms. Yet the users table is unavailable for 90 seconds, connection pool exhausts, and the site returns 503. The migration itself was instant. So what happened?

The Postgres lock matrix

Every table operation takes a lock. There are eight lock modes from weakest to strongest:

ACCESS SHAREROW SHAREROW EXCLUSIVESHARE UPDATE EXCLUSIVESHARESHARE ROW EXCLUSIVEEXCLUSIVEACCESS EXCLUSIVE

Key facts:

  • SELECT takes ACCESS SHARE (weakest).
  • Most DDL (ALTER TABLE, DROP TABLE, TRUNCATE) takes ACCESS EXCLUSIVE (strongest — conflicts with everything).
  • CREATE INDEX CONCURRENTLY and VALIDATE CONSTRAINT take SHARE UPDATE EXCLUSIVE — does not block normal DML.

The lock matrix decides what can run in parallel. ACCESS EXCLUSIVE conflicts with every other lock mode, including ACCESS SHARE. That means any DDL with ACCESS EXCLUSIVE must wait for all current readers to finish — and blocks all new readers while it waits.

The FIFO lock queue — the actual failure mode

Postgres grants locks in arrival order (FIFO per object). This creates the canonical migration incident:

TimeEventLock state on users
T=0sSlow analytics SELECT starts (expected: 60s)ACCESS SHARE granted
T=10sMigration runs ALTER TABLE (default lock_timeout = 0)ACCESS EXCLUSIVE waiting — conflicts with held ACCESS SHARE
T=10s+New SELECTs arrive (API requests, background jobs)ACCESS SHARE waiting — queued BEHIND the ALTER’s ACCESS EXCLUSIVE request
T=70sSlow SELECT finishesALTER acquires lock, runs in 8 ms, commits. Queue drains.
Total impact60s of no-new-queries on users. Connection pool exhausted. 503s.

The critical insight: new SELECTs queue behind the waiting ALTER because granting them out of order would starve the ALTER. The ALTER’s ACCESS EXCLUSIVE request is in the queue; the lock manager refuses to grant lighter locks to later arrivals that conflict with it. The table is frozen for every new query for the full 60 seconds of the slow SELECT — even though the migration itself runs in 8 ms.

The fix: SET lock_timeout

-- At the top of every migration session:
SET lock_timeout = '2s';
SET statement_timeout = '30s';

With lock_timeout = '2s':

  • The migration waits at most 2 seconds for the lock.
  • If not acquired, the migration aborts cleanly and releases its position in the queue.
  • New SELECTs immediately get their lighter locks — table is unfrozen.
  • The migration tool retries with exponential backoff (1s, 2s, 4s, 8s…).
  • Eventually the slow SELECT finishes or a quiet window opens, and the migration succeeds.

Without lock_timeout (Postgres default is 0 — wait forever): one slow query can freeze a table for its entire duration.

Lock-queue incident numbers
Postgres default lock_timeout
0 (wait forever)
Recommended lock_timeout for migrations
1–5 s
Recommended statement_timeout for DDL
30–60 s
Migration retry attempts
3–5 with exponential backoff
Table-freeze duration without lock_timeout
= duration of blocking query
Table-freeze with lock_timeout = 2s
2 s, then queue clears
Why this works

Why doesn’t Postgres let lighter locks skip ahead of a waiting ACCESS EXCLUSIVE? Because doing so could starve the DDL statement indefinitely — on a busy table, ACCESS SHARE locks arrive continuously and would never all be released simultaneously. FIFO ensures the DDL eventually gets its turn. The operational consequence is that you must limit how long it waits.

Quiz

A migration runs ALTER TABLE on a busy table. A long SELECT is in progress. What happens to new SELECTs that arrive?

Quiz

You set `SET lock_timeout = '2s'` on the migration session. The migration cannot acquire the lock within 2 seconds. What happens?

Quiz

Which lock does CREATE INDEX CONCURRENTLY take, and why is it safer for production?

Recall before you leave
  1. 01
    Walk through exactly why an instant ALTER TABLE can freeze a production table for 60+ seconds.
  2. 02
    Explain the exact role of lock_timeout in preventing the lock-queue freeze, and what the migration tool does after a timeout.
  3. 03
    What is the difference between lock_timeout and statement_timeout, and why does a migration session need both?
Recap

The Postgres lock matrix assigns ACCESS EXCLUSIVE to most DDL and ACCESS SHARE to SELECT. These conflict, so a waiting ALTER TABLE blocks all new SELECT statements on the same table via the FIFO lock queue — even if the migration itself would run in milliseconds. On a busy table with a slow analytical query in flight, this can freeze the table for the entire duration of that query. The mandatory fix is SET lock_timeout = '2s' on every migration session: if the lock is not available within 2 seconds, the migration aborts cleanly, the queue unblocks, and the migration tool retries with backoff. CREATE INDEX CONCURRENTLY and VALIDATE CONSTRAINT take SHARE UPDATE EXCLUSIVE instead of ACCESS EXCLUSIVE and are therefore DML-safe for production.

Connected lessons
appears again in258
Continue the climb ↑Safe DDL patterns: NOT VALID, CONCURRENTLY, and unsafe-op fixes
shortcuts expand
search
K
prev piece
k
next piece
j
cycle tier
t
this menu
?
sources3
expand
  1. 01
  2. 02
  3. 03

Trademarks belong to their respective owners. Editorial reference only.