Databases DB · 06 · 04

Safe DDL patterns: NOT VALID, CONCURRENTLY, and unsafe-op fixes

Every unsafe DDL has a safe multi-step equivalent; the discipline is to know the mapping and use it instead of the textbook one-liner.

DB Middle ◷ 14 min

Level

FoundationsJuniorMiddleSenior

A team runs CREATE INDEX ON orders(user_id) at 10:00 AM. On a 200M-row table this holds ACCESS EXCLUSIVE for 20 minutes — reads and writes blocked. The same index built with CREATE INDEX CONCURRENTLY takes 45 minutes but blocks nothing. Same index, same data, wildly different production impact.

CREATE INDEX CONCURRENTLY

Regular CREATE INDEX takes ACCESS EXCLUSIVE for the full build duration. On large tables this blocks reads and writes for minutes.

CREATE INDEX CONCURRENTLY builds the index in two phases, both under SHARE UPDATE EXCLUSIVE (does not block DML):

Phase 1: scan the table and build a draft index, while normal traffic continues.
Phase 2: catch up on writes that happened during phase 1; mark index valid.

Trade-offs:

Takes 2–3× longer than non-concurrent.
Cannot run inside a transaction block — if your migration tool wraps migrations in BEGIN..COMMIT, you need to annotate this migration as non-transactional.
Can fail mid-build and leave an INVALID index (visible in pg_indexes where indisvalid = false).

CONCURRENTLY trades 2.25x the wall-clock build time for zero DML downtime — the longer column is the safe one.

Production runbook for CONCURRENT index:

Always use CREATE INDEX CONCURRENTLY in production.
After the migration, verify: SELECT indexname, indisvalid FROM pg_indexes JOIN pg_class ... WHERE indisvalid = false — should return nothing.
If you find an INVALID index: DROP INDEX CONCURRENTLY (also non-blocking), then retry.

Operation	Lock	Blocks DML	In transaction?
`CREATE INDEX`	ACCESS EXCLUSIVE	Yes	Yes
`CREATE INDEX CONCURRENTLY`	SHARE UPDATE EXCLUSIVE	No	No — must run outside BEGIN
`DROP INDEX CONCURRENTLY`	SHARE UPDATE EXCLUSIVE	No	No

ADD CONSTRAINT NOT VALID + VALIDATE CONSTRAINT

Why does adding a constraint block production? Because by default Postgres must prove that every existing row satisfies it — scanning the full table while holding an exclusive lock. The two-step pattern breaks that into an instant enforcement for new rows and a DML-safe scan for existing ones. Adding a CHECK or FOREIGN KEY constraint normally acquires ACCESS EXCLUSIVE and scans the entire table.

The two-step pattern:

-- Step 1: add constraint, enforce only for NEW rows (instant, brief ACCESS EXCLUSIVE)
ALTER TABLE orders
  ADD CONSTRAINT orders_status_valid
  CHECK (status IN ('pending', 'active', 'closed')) NOT VALID;

-- Step 2: verify existing rows (SHARE UPDATE EXCLUSIVE — does not block DML)
ALTER TABLE orders VALIDATE CONSTRAINT orders_status_valid;

NOT VALID means: enforce for new rows from this moment; existing rows are not checked. VALIDATE CONSTRAINT then scans all rows under SHARE UPDATE EXCLUSIVE, which does not block INSERT, UPDATE, or DELETE. The scan can run while traffic is live.

This pattern applies to:

CHECK constraints
FOREIGN KEY constraints: ADD CONSTRAINT ... FOREIGN KEY ... NOT VALID + VALIDATE CONSTRAINT
NOT NULL via CHECK (col IS NOT NULL) NOT VALID (see lesson 02)

Unsafe operations and their safe equivalents

Unsafe DDL → safe replacement

ADD COLUMN volatile DEFAULT → ADD no default + backfill + SET DEFAULT: Fix
ALTER COLUMN TYPE (non-coercible) → add col + dual-write + backfill + swap + drop: Fix
ADD NOT NULL on existing nullable → NOT VALID + VALIDATE + SET NOT NULL: Fix
ADD FOREIGN KEY → ADD CONSTRAINT NOT VALID + VALIDATE CONSTRAINT: Fix
CREATE INDEX → CREATE INDEX CONCURRENTLY: Fix
DROP COLUMN (code still reading) → deploy code first, drain, then DROP: Fix
RENAME COLUMN → full expand-contract (lesson 05): Fix

ALTER COLUMN TYPE: the binary-coercible exception

Most type changes rewrite the table. Exceptions are binary-coercible types — the same physical representation:

varchar(50) → varchar(100) (wider, same bytes)
varchar → text (equivalent storage)
text → varchar (same)

Postgres detects these and skips the rewrite — brief ACCESS EXCLUSIVE, no rows written. Non-coercible changes (int → bigint, text → int, json → jsonb) require either a full rewrite or the shadow-column expand-contract pattern (lesson 05).

▸Why this works

Why does VALIDATE CONSTRAINT use SHARE UPDATE EXCLUSIVE instead of ACCESS EXCLUSIVE? VALIDATE is read-only — it scans rows to confirm they pass the constraint, but does not change any row. Only DDL operations that modify the table structure or rewrite rows need ACCESS EXCLUSIVE. Read-only validation needs only enough isolation to prevent concurrent DDL from changing the constraint definition mid-scan.

Quiz

CREATE INDEX CONCURRENTLY fails halfway on a 300M-row table. What state is the index in, and what is the correct fix?

Quiz

Which is the right way to add a CHECK constraint to a 100M-row table without blocking writes?

Order the steps

Order the unsafe-DDL fixes from most to least common in production:

1 ADD COLUMN with volatile DEFAULT → ADD COLUMN without default + backfill in batches + SET DEFAULT
2 ALTER COLUMN TYPE (non-coercible) → add new column + dual-write + backfill + swap reads + drop old
3 ADD NOT NULL → ADD CHECK NOT VALID + VALIDATE + SET NOT NULL + DROP CHECK
4 ADD FOREIGN KEY → ADD CONSTRAINT NOT VALID + VALIDATE CONSTRAINT
5 CREATE INDEX → CREATE INDEX CONCURRENTLY (+ verify indisvalid after)
6 DROP COLUMN (code still reading) → deploy code first to stop reading, drain old pods, then DROP
7 RENAME COLUMN → full expand-contract over days (lesson 05)

Top row: the unsafe one-shot blocks all traffic for the scan. Bottom row: NOT VALID enforces for new rows instantly, then VALIDATE scans existing rows under a lighter lock that does not block DML.

Recall before you leave

01
What are the two failure modes of CREATE INDEX CONCURRENTLY and how do you handle each?
02
Why must CREATE INDEX CONCURRENTLY run outside a transaction block, and how do migration tools handle this?
03
Explain the difference between ADD CONSTRAINT and ADD CONSTRAINT NOT VALID, and why VALIDATE CONSTRAINT does not block DML.

Recap

Every unsafe DDL has a production-safe equivalent. CREATE INDEX CONCURRENTLY takes SHARE UPDATE EXCLUSIVE instead of ACCESS EXCLUSIVE, leaving DML unblocked during the build — but it cannot run inside a transaction and leaves INVALID indexes on failure (fix: DROP INDEX CONCURRENTLY + retry). ADD CONSTRAINT ... NOT VALID instantly enforces a constraint for new rows without scanning existing ones; VALIDATE CONSTRAINT then scans under SHARE UPDATE EXCLUSIVE, blocking only concurrent DDL, not DML. Binary-coercible type changes (varchar(50) → varchar(100)) skip the rewrite; non-coercible ones require the shadow-column expand-contract pattern. The Squawk linter flags unsafe forms in CI so teams catch them before merge. Now when you review a migration PR that adds an index or a constraint, you know to ask: is it CONCURRENT? Is it NOT VALID? If not, it is a production incident waiting to happen.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 5 done

Connected lessons

builds on

The lock-queue failure mode: why instant DDL can freeze the databasemiddle

unlocks

Expand-contract: zero-downtime for breaking schema changesmiddle

deepens into

appears again in287

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.