awesome-everything RU
↑ Back to the climb

Databases

Normal forms, denormalization, and why schemas stick

Crux 1NF through BCNF in plain terms, when to deliberately denormalize, the ORM traps that quietly break schema discipline, and naming conventions that pay back across years.
Your altitude — climbing toward senior
ZeroJuniorMiddleSenior
You are at middle altitude — in the sky
◷ 16 min

A junior engineer stores the customer’s city on every order row — “it’s convenient for reports.” Two years later, the customer moves cities. Now half the historical orders show the old city and half show the new one. The same fact lives in two places and they disagree. Normalization is the discipline that prevents this.

Normal forms in plain terms

Normalization is a series of progressively stricter rules for splitting tables so no fact is stored twice.

1NF — atomic cells. Every cell must hold a single value. No comma-separated lists inside a column, no repeating groups, no arrays disguised as column names (tag1, tag2, tag3). If you can parse a cell with split(','), the table violates 1NF.

2NF — no partial key dependency. Every non-key column must depend on the whole primary key, not part of it. Only matters for composite primary keys. Example violation: a table order_items(order_id, product_id, product_name)product_name depends on product_id alone, not on the composite key. Fix: move product_name to a products table.

3NF — no transitive dependency. Every non-key column must depend only on the primary key, not on another non-key column. Example violation: orders(order_id, customer_id, customer_city)customer_city depends on customer_id, not on order_id. Fix: move customer_city to a customers (or addresses) table.

BCNF — every determinant is a candidate key. Stricter than 3NF; matters when a table has multiple overlapping candidate keys. Rare in production; common in academic examples. Violation example: two candidate keys share an attribute and one is not a superkey. In practice, target 3NF; go to BCNF only if you can state the specific overlapping-candidate-key scenario.

Normal formRuleViolation shapeFix
1NFAtomic cells, no repeating groupsComma-separated list in one columnSeparate table or typed array
2NFFull key dependencyNon-key col depends on part of composite PKMove to the owning table
3NFNo transitive dependencyNon-key col depends on another non-key colExtract to its own table
BCNFEvery determinant is a candidate keyMultiple overlapping candidate keysDecompose; rare in practice

The point is not following rules — it is eliminating the failure mode where two rows can disagree about the same fact. If you can update Anna’s email in three tables, two will be wrong within a year.

Denormalization: when and why

Normalization optimises for write correctness. It can be expensive on the read side because answering a question requires joining many tables. Denormalization deliberately duplicates data to avoid joins on hot read paths.

Stored aggregates. An order_total_cents column on orders that caches SUM(order_items.unit_price_cents * quantity). Refreshed by a trigger or application code on each line-item write. Reporting hits a single column instead of aggregating millions of rows.

Wide read models. A user_profile table that combines name, email, current address, latest order count — fed by triggers or application code from the normalised source tables. Dashboard queries hit one row.

Materialised paths. Storing the full ancestry path of a tree node as a text column (/root/section/subsection/) to avoid recursive joins when reading hierarchies. Classic pattern for comment trees or org charts.

The discipline: denormalize on the read side, keep the authoritative facts normalised, and accept that the duplicates must be reconciled by something — a trigger, a scheduled job, application code. The failure mode is denormalize-everywhere with no reconciliation plan. Every column eventually disagrees with every other.

Why this works

Why 3NF and not 4NF, 5NF, or DKNF? Normal forms above 3NF address multi-valued dependencies and join dependencies that appear in highly academic schemas. Real production tables almost never exhibit those shapes. 3NF is the point of diminishing returns: beyond it, you split tables that do not need splitting, add joins that do not add correctness, and slow reads for schemas that had no integrity problem to begin with.

ORM traps that quietly break schema discipline

ORMs abstract away SQL but can introduce schema-level mistakes.

N+1 queries. An ORM loads a list of 100 orders, then for each order fetches the customer separately. Result: 101 queries where 1 join would do. In many ORMs this is the default unless you declare an eager-load (include, with, select_related). The schema implication: every FK relation must be annotated with the correct default loading strategy; leaving it to implicit lazy-load accumulates invisible N+1s.

Implicit schema generation. Many ORMs can generate the schema from the model definition. The generated schema often lacks: CHECK constraints, composite indexes, index on the FK column, named constraints. The result is a syntactically valid schema that violates the discipline of this lesson — no constraints, no format enforcement, generic index strategy. Treat ORM-generated schemas as a starting point, not the finished artifact; add constraints and indexes manually or via migration files.

Missing updated_at refresh. ORMs frequently auto-manage created_at but not updated_at. A missing updated_at makes incremental replication and auditing impossible. Enforce via a trigger (BEFORE UPDATE ... SET updated_at = now()) or ORM lifecycle hook; never trust the application to remember.

Naming conventions that pay back

These look pedantic; they pay back across every dashboard, every migration, every cross-team query.

  • Plural table names (users, orders, order_items) — singular is also defensible; pick one and enforce it everywhere.
  • snake_case — Postgres folds unquoted identifiers to lower case; snake_case is the production default.
  • id as the primary key name on every table; foreign keys named <referenced_table_singular>_id (e.g., user_id, order_id).
  • created_at, updated_at, deleted_at for temporal columns. deleted_at IS NOT NULL is the soft-delete pattern.
  • Boolean columns prefixed is_, has_, can_ or named as a clear property (active, not flag).
  • CHECK constraints naming — name every constraint: CONSTRAINT chk_orders_status CHECK (status IN (...)). Unnamed constraints produce cryptic error messages on violation.

Mature platforms CI-lint these conventions so they are enforced across teams, not remembered by individuals.

Order the steps

Order these schema evolution actions from cheapest to most expensive:

  1. 1 Add a CHECK constraint on an existing column
  2. 2 Add a UNIQUE index (CREATE INDEX CONCURRENTLY)
  3. 3 Add a FOREIGN KEY (requires validation pass over existing rows)
  4. 4 Change a column type (TEXT → INTEGER) — table rewrite
  5. 5 Split a JSONB column into typed columns — backfill migration
  6. 6 Split one table into two with FK — dual-write window, parity validation
  7. 7 Change a primary key — cascades through every FK and index; almost never done
Quiz

A table stores `orders(order_id, customer_id, customer_city)`. Which normal form does it violate and why?

Pick the best fit

A reporting dashboard needs the total revenue per workspace, recalculated every 5 minutes. Pick the implementation.

Quiz

An ORM-generated schema has no named CHECK constraints and no index on FK columns. What are the two concrete failure modes this causes in production?

Recall before you leave
  1. 01
    Explain in your own words why 3NF is the production-default normal form, and when teams legitimately go above (BCNF) or below (deliberate denormalization).
  2. 02
    Name three concrete ORM traps that quietly violate schema discipline and the fix for each.
  3. 03
    What is the materialised-path pattern and when does it beat recursive CTEs for tree queries?
Recap

Normal forms eliminate redundant facts: 1NF (atomic cells), 2NF (full-key dependency), 3NF (no transitive dependency), BCNF (every determinant is a candidate key). Target 3NF for production schemas. Denormalize deliberately — stored aggregates, wide read models, materialised paths — with explicit reconciliation; never denormalize without a plan for keeping the copies consistent. ORM-generated schemas skip constraints and indexes; add them. Naming conventions (snake_case, plural tables, id PK, created_at/updated_at/deleted_at, named CHECK constraints) are enforced by CI. Schema decisions are sticky; the cost of the wrong normal form is paid at migration time, not design time.

Connected lessons
appears again in140
Continue the climb ↑JSONB, arrays, and when a side table wins
shortcuts expand
search
K
prev piece
k
next piece
j
cycle tier
t
this menu
?
sources3
expand
  1. 01
  2. 02
  3. 03

Trademarks belong to their respective owners. Editorial reference only.