Databases DB · 01 · 06

Schema integrity: deferral, versioning, and production failure modes

Temporal invariants and state-transition enforcement, constraint deferral for complex transactions, schema-as-code with expand-then-contract, and the five failure modes that recur across production schemas.

DB Senior ◷ 15 min

Level

FoundationsJuniorMiddleSenior

A team runs a batch migration — 3 million rows relinked to new parents. A foreign key violation fires at row 2,847,221. The transaction aborts. All three million updates roll back. The migration runs again at 3 AM. The same violation fires. No one thought to check for orphan rows before starting.

State-transition enforcement

SQL CHECK constraints are evaluated on every write and must be deterministic. They can reference other columns of the same row but not other rows. This means pure-column CHECK can enforce “status must be one of these values” but cannot enforce “completed orders cannot transition back to pending.”

For state-transition invariants, the right tools are:

Row-level triggers. A BEFORE UPDATE trigger that raises an exception if the transition is disallowed:

CREATE OR REPLACE FUNCTION enforce_order_transitions()
RETURNS TRIGGER AS $$
BEGIN
  IF OLD.status = 'completed' AND NEW.status != 'completed' THEN
    RAISE EXCEPTION 'order % cannot leave completed state', OLD.id;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_order_status
  BEFORE UPDATE ON orders
  FOR EACH ROW EXECUTE FUNCTION enforce_order_transitions();

Generated columns with CHECK. A GENERATED ALWAYS AS (status IN ('completed','cancelled')) STORED column named is_final, plus a trigger or application check that refuses updates where OLD.is_final = true. The generated column documents the finality rule in the schema.

Row-level security. Postgres RLS policies can encode access rules that depend on row state — for example, USING (status != 'archived') on UPDATE prevents updating archived rows without requiring application-level checks.

The discipline: business rules that can be expressed declaratively (in CHECK or RLS) live in the schema. Rules that require procedural logic (state transitions, cross-row checks) live in triggers. Application-level-only enforcement is the weakest — every consumer must know the rules.

Constraint deferral

By default, Postgres checks constraints after each individual statement. For some complex multi-table operations, intermediate states would violate a constraint even though the final state is valid.

-- A circular FK: A references B, B references A
-- Inserting A requires B to exist; inserting B requires A to exist
-- Neither can be inserted first with IMMEDIATE constraints

ALTER TABLE a ALTER CONSTRAINT fk_a_to_b DEFERRABLE INITIALLY DEFERRED;
ALTER TABLE b ALTER CONSTRAINT fk_b_to_a DEFERRABLE INITIALLY DEFERRED;

BEGIN;
  INSERT INTO a (id, b_id) VALUES (1, 1); -- b_id=1 does not exist yet; deferred: OK
  INSERT INTO b (id, a_id) VALUES (1, 1); -- a_id=1 exists now; deferred check: consistent
COMMIT; -- Both FK checks run here; both pass

DEFERRABLE INITIALLY DEFERRED means the check is deferred until COMMIT. DEFERRABLE INITIALLY IMMEDIATE declares the constraint deferrable but immediate by default; you can switch it per-transaction with SET CONSTRAINTS ... DEFERRED.

Use deferral sparingly. Deferred constraints are checked at COMMIT. A failing deferred constraint aborts the entire transaction with no in-flight indication — unlike an immediate constraint, which fails the specific statement. This makes debugging harder. Production hygiene: use DEFERRABLE only for FKs with genuine circular or multi-step dependencies, and only in transactions where the temporary violation is well-understood.

Mode	When checked	Failure point	Use case
IMMEDIATE (default)	After each statement	At the violating statement	All standard constraints
DEFERRABLE INITIALLY IMMEDIATE	After each statement (unless overridden)	At the violating statement	Conditionally deferrable; rare
DEFERRABLE INITIALLY DEFERRED	At COMMIT	Transaction aborts at COMMIT	Circular FK, graph relinks

Schema versioning: expand-then-contract

A relational schema is a contract between the database and every consumer. Adding a nullable column is non-breaking. Adding a NOT NULL column without a default breaks every existing INSERT. Dropping a column breaks every reader. Renaming breaks both.

Senior teams manage schema as code — migration files in version control (Flyway, Liquibase, golang-migrate, Prisma Migrate, sqlx) — and follow the expand-then-contract pattern for breaking changes:

Expand: add the new shape alongside the old. New column nullable, no change to existing code.
Deploy code that writes both. Application writes to both old and new column. Old column still readable.
Backfill the new column for existing rows (in batches to avoid long locks).
Deploy code that reads from new. Application reads from new column; falls back to old during transition.
Validate parity. A check query compares old and new values; zero mismatches before proceeding.
Contract: remove the old shape in a subsequent migration.

The pattern applies to column renames, type changes, and table splits. It adds one deployment cycle per breaking change; the alternative is downtime or silent data loss.

▸Why this works

Why not just run a migration in a maintenance window? For small tables, a maintenance window is fine. For a 100M-row table, ALTER TABLE can hold an exclusive lock for hours while it rewrites the table. Modern Postgres supports ALTER TABLE ... ADD COLUMN ... DEFAULT ... without a rewrite (as of Postgres 11), but column-type changes still require a rewrite. Online schema change tools (pg_repack, pgroll, schema-change-as-migration patterns) let you make breaking changes with zero downtime at the cost of operational complexity. Expand-then-contract is the code-level version of the same discipline.

Five production failure modes

What unites all five? Each one is a place where the schema or query machinery quietly diverges from what the developer assumed — and the divergence is invisible until it hits production at scale. These failures recur regardless of how carefully the schema was designed.

1. Implicit type coercion killing an index. A query WHERE user_id = '42' (string literal instead of integer) triggers an implicit cast on every row. Postgres cannot use the integer B-tree index because the comparison is between the indexed integer and a text value. EXPLAIN ANALYZE shows a sequential scan. Fix: use typed query builders or ORMs that bind the correct parameter type.

2. NULL semantics in joins causing silent row drops. A LEFT JOIN where the joined column contains NULL silently drops those rows in downstream inner JOINs. NULL != NULL (NULL is not equal to anything, including itself). Senior pattern: use IS DISTINCT FROM for null-safe comparison and explicit COALESCE for null-tolerant logic in WHERE clauses.

3. Cascade cascade cascade — unbounded transactions. A DELETE on a heavily-cascaded root row creates a multi-million-row transaction that holds locks for minutes. All writes to cascaded tables are blocked. Fix: soft-delete the root row (deleted_at = now()), then batch-clean the descendants with explicit DELETE loops limited to 10,000 rows at a time.

4. Schema drift via ad-hoc ALTER. A developer runs ALTER TABLE directly in production outside the migration system. The migration tool no longer matches reality; future migrations fail mysteriously or produce inconsistent state. Fix: every schema change goes through the migration system, period. CI enforces the rule.

5. JSONB columns that grew into relational shape. A “metadata” JSONB column accumulated 30 typed fields over two years. Each new feature adds IF metadata->>'feature_x' = .... The column now has schema-on-read with no enforcement. The right move is to extract columns — but the migration is expensive and keeps getting postponed. Fix: the “promote-to-column” rule: the moment a field appears in a WHERE clause, schedule a column-extraction migration.

Each failure is a silent divergence between the schema and the developer's assumption, invisible until production scale — and each has one concrete fix.

Quiz

A team needs to delete a workspace and all its data (projects, tasks, comments). There are 5 million comments cascaded from tasks. What is wrong with `DELETE FROM workspaces WHERE id = X` with CASCADE FKs?

Pick the best fit

A column needs to be renamed from `user_name` to `display_name` on a 50M-row table with zero downtime. Pick the approach.

Quiz

A query `WHERE user_id = '42'` (string literal) on a column declared as BIGINT runs a sequential scan on a 30M-row table despite a B-tree index on user_id. Why?

The old shape stays readable through every step until parity is confirmed, so no consumer breaks before it has been updated — the price is one extra deployment cycle per breaking change instead of downtime.

Recall before you leave

01
Describe the expand-then-contract pattern and explain why it is required for renaming a column on a large shared-schema table.
02
What is the specific risk of DEFERRABLE INITIALLY DEFERRED constraints compared to IMMEDIATE constraints?
03
Name three of the five production failure modes and give the fix for each.

Recap

State-transition enforcement belongs in triggers or RLS when the rule requires procedural logic; pure CHECK handles per-row invariants. DEFERRABLE INITIALLY DEFERRED delays FK checks until COMMIT — useful for circular dependencies, dangerous because violations abort the whole transaction. Schema versioning uses migration files in version control; breaking changes use expand-then-contract to avoid downtime. The five production failure modes (implicit coercion killing indexes, NULL semantics in joins, unbounded cascade transactions, schema drift via ad-hoc ALTER, JSONB sprawl) each have specific fixes that are cheaper than discovering them in production. Schema decisions are sticky — the cost of the wrong design is paid at migration time. Now when you see a string literal in a WHERE clause comparing to an integer column, you know to check the query plan before it reaches production.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 5 done

Connected lessons

builds on

Heap storage, TOAST, and column alignmentsenior

unlocks

deepens into

Relational vs document, wide-column, graph, and key-valuesenior

appears again in190

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.