Constraints, keys, and Postgres data types

The five constraint kinds in depth, surrogate vs natural keys, SQL's gap from relational algebra, and how picking the narrowest Postgres type is itself a constraint.

DB Middle ◷ 16 min


A team stores money as REAL. After a year of accumulated transactions they find cents-level discrepancies they cannot explain. The type was the bug: binary floating point cannot represent most decimal fractions exactly, so a small rounding error is added with every sum. The fix is a schema change that touches every row.
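
The drift can be reproduced outside the database. The sketch below is an illustration, not Postgres itself: it assumes a 4-byte REAL behaves like an IEEE 754 single-precision float (emulated here with `struct`) and uses Python's `Decimal` as a stand-in for NUMERIC. The helper names `to_real`, `sum_as_real`, and `sum_as_numeric` are hypothetical.

```python
import struct
from decimal import Decimal


def to_real(x: float) -> float:
    """Round a Python float to single precision, like a 4-byte REAL value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_as_real(amount: str, n: int) -> float:
    """Add `amount` n times, rounding to single precision after every step."""
    step = to_real(float(amount))
    total = 0.0
    for _ in range(n):
        total = to_real(total + step)
    return total


def sum_as_numeric(amount: str, n: int) -> Decimal:
    """Add `amount` n times with exact decimal arithmetic."""
    total = Decimal("0")
    for _ in range(n):
        total += Decimal(amount)
    return total


# 100,000 payments of 0.10 should total exactly 10000.00.
real_total = sum_as_real("0.10", 100_000)
exact_total = sum_as_numeric("0.10", 100_000)

print("REAL-like total:   ", real_total)
print("NUMERIC-like total:", exact_total)
```

The decimal total is exact, while the single-precision total is off by far more than a cent. Storing integer cents in a BIGINT avoids the problem in the same way, because integer addition is exact.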

Codd’s relational model vs SQL

Edgar Codd’s 1970 paper formalised relations as sets of tuples drawn from typed domains, and defined a closed algebra of operations (selection, projection, join, union, intersection, difference). It is closed because each operation takes relations as input and produces a relation as output. SQL is a non-strict implementation of that algebra: it adds NULL with three-valued logic (absent from the 1970 paper), row ordering (the rows of a relation are conceptually unordered), and duplicate rows (a relation is a set, so it has none). Knowing the gap explains the rough edges:

NULL = NULL is NULL, not true — three-valued logic.
ORDER BY is required to guarantee row order — the engine may return any order without it.
DISTINCT exists because the engine keeps duplicate rows unless you ask it to remove them.

Treat the gap as “SQL = relational algebra plus practical compromises” and the surprises stop being surprising.
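
The three rough edges listed above can be observed directly. This is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in engine; SQLite follows the same three-valued logic for these queries, but it is not Postgres. The `tags` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL evaluates to NULL (unknown), which Python receives as None.
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]
# IS NULL is the two-valued test that actually answers the question.
null_is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]

conn.execute("CREATE TABLE tags (name TEXT)")
conn.executemany("INSERT INTO tags VALUES (?)", [("b",), ("a",), ("b",)])

# Duplicates are kept unless DISTINCT is requested; ORDER BY pins the order.
all_rows = conn.execute("SELECT name FROM tags ORDER BY name").fetchall()
distinct_rows = conn.execute(
    "SELECT DISTINCT name FROM tags ORDER BY name"
).fetchall()

# WHERE keeps only rows whose predicate is TRUE, so "= NULL" matches nothing.
conn.execute("INSERT INTO tags VALUES (NULL)")
eq_null_count = conn.execute(
    "SELECT count(*) FROM tags WHERE name = NULL"
).fetchone()[0]
is_null_count = conn.execute(
    "SELECT count(*) FROM tags WHERE name IS NULL"
).fetchone()[0]

print(null_eq_null, null_is_null)
print(all_rows, distinct_rows)
print(eq_null_count, is_null_count)
```

The practical rule that follows: test for missing values with `IS NULL`, never with `=`.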

The five constraint kinds

Why bother memorising all five? Because each one catches a different class of bug at write time — before bad data ever reaches your application. When you see a production data inconsistency, ask which of these five would have prevented it.

Constraint	What it enforces	Key detail
PRIMARY KEY	Unique non-null identifier per row	One per table; implicitly creates a unique B-tree index
UNIQUE	No duplicate values in this column set	Multiple NULLs allowed (standard SQL); opt into `UNIQUE NULLS NOT DISTINCT` (SQL:2023 / Postgres 15+) to block that
NOT NULL	Column always has a value	Per-column; the first line of data quality
FOREIGN KEY	Column references an existing PK/UNIQUE key in another table	ON DELETE / ON UPDATE clauses: NO ACTION, RESTRICT, CASCADE, SET NULL, SET DEFAULT
CHECK	Arbitrary boolean expression on each row at write time; the row passes when the result is TRUE or NULL	`CHECK (amount >= 0)`, `CHECK (status IN ('open','closed'))`; can reference other columns of the same row
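
To see each constraint reject a bad write, the following sketch declares all five on two hypothetical tables (`users`, `invoices`) and then attempts one violating insert per constraint. It runs against SQLite through Python's `sqlite3` module as a stand-in for Postgres; the DDL is close to what Postgres accepts, but details differ (for example, SQLite needs `PRAGMA foreign_keys = ON`, while Postgres always enforces foreign keys). The `rejected` helper is also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE invoices (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    amount  INTEGER NOT NULL CHECK (amount >= 0),
    status  TEXT NOT NULL CHECK (status IN ('open', 'closed'))
);
INSERT INTO users (id, email) VALUES (1, 'ada@example.com');
""")


def rejected(sql: str) -> bool:
    """Return True when the engine refuses the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


results = {
    "PRIMARY KEY": rejected(
        "INSERT INTO users (id, email) VALUES (1, 'bob@example.com')"),
    "UNIQUE": rejected(
        "INSERT INTO users (id, email) VALUES (2, 'ada@example.com')"),
    "NOT NULL": rejected(
        "INSERT INTO users (id, email) VALUES (3, NULL)"),
    "FOREIGN KEY": rejected(
        "INSERT INTO invoices (id, user_id, amount, status) "
        "VALUES (1, 99, 10, 'open')"),
    "CHECK": rejected(
        "INSERT INTO invoices (id, user_id, amount, status) "
        "VALUES (2, 1, -5, 'open')"),
}

# A row that satisfies every constraint is accepted.
valid_rejected = rejected(
    "INSERT INTO invoices (id, user_id, amount, status) "
    "VALUES (3, 1, 10, 'open')")

for name, was_rejected in results.items():
    print(f"{name}: rejected={was_rejected}")
```

Each violating statement fails at write time, before any application code sees the row.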

Surrogate vs natural keys

A natural key is data that already exists in the business domain (user email, product SKU, order number). A surrogate key is database-generated, opaque, and meaningful only inside the database (BIGSERIAL, UUID).

Production default in 2026: surrogate key as the primary key, plus a UNIQUE NOT NULL constraint on the business-natural key. Why: natural keys change (a customer changes their email), and a primary key change cascades through every foreign key referencing it — operationally expensive and often impossible at scale. Surrogate keys never change.

The exception: pure join tables (favourites: user_id, item_id) often use the composite of foreign keys as the PK — the relationship itself is the identity, no surrogate needed.
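
Both patterns fit in one small schema. The sketch below assumes hypothetical `users`, `items`, and `favourites` tables and again uses SQLite via `sqlite3` in place of Postgres, with `INTEGER PRIMARY KEY` playing the role of a BIGSERIAL surrogate. It changes a user's email and shows that the referencing row is untouched, then shows the composite primary key rejecting a repeated pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,     -- surrogate key
    email TEXT NOT NULL UNIQUE     -- natural key, still enforced
);
CREATE TABLE items (
    id  INTEGER PRIMARY KEY,       -- surrogate key
    sku TEXT NOT NULL UNIQUE       -- natural key, still enforced
);
CREATE TABLE favourites (
    user_id INTEGER NOT NULL REFERENCES users(id),
    item_id INTEGER NOT NULL REFERENCES items(id),
    PRIMARY KEY (user_id, item_id) -- the pair itself is the identity
);
INSERT INTO users VALUES (1, 'ada@example.com');
INSERT INTO items VALUES (10, 'SKU-10');
INSERT INTO favourites VALUES (1, 10);
""")

# The natural key changes; only the single users row is rewritten.
conn.execute("UPDATE users SET email = 'ada@new.example' WHERE id = 1")
fav_emails = conn.execute(
    "SELECT u.email FROM favourites f JOIN users u ON u.id = f.user_id"
).fetchall()

# The composite primary key refuses the same (user, item) pair twice.
try:
    conn.execute("INSERT INTO favourites VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(fav_emails, duplicate_rejected)
```

Had `email` been the primary key, the same update would have needed to rewrite every `favourites` row that referenced it.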

UUID vs BIGSERIAL. UUIDs are globally unique (good for distributed inserts, multi-region, offline-first clients) but bigger (16 bytes vs 8) and worse for index locality (random UUIDv4 fragments B-trees). UUIDv7 (time-ordered, RFC 9562) fixes the locality issue and is the modern default where UUID is wanted. BIGSERIAL is smaller, sequential, and cache-friendly — pick it when globally-unique IDs are not needed.

The surrogate-key tradeoff in one number: UUID buys global uniqueness at 2x the key bytes of BIGSERIAL (16 vs 8), plus worse B-tree locality unless you use the time-ordered UUIDv7.
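
The size and locality claims can be checked with a few lines of Python. This is a sketch under assumptions: `uuid7_like` is a hypothetical helper that hand-builds a value with the RFC 9562 version 7 bit layout (48-bit millisecond timestamp first, random bits after) rather than calling a library generator, and the timestamps are made-up values chosen to be one millisecond apart.

```python
import os
import struct
import uuid


def uuid7_like(unix_ms: int) -> uuid.UUID:
    """Build a UUID laid out like RFC 9562 version 7 (timestamp-first)."""
    rand = int.from_bytes(os.urandom(10), "big")           # 80 random bits
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | rand     # timestamp on top
    value = (value & ~(0xF << 76)) | (0x7 << 76)           # version field = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)           # variant bits = 10
    return uuid.UUID(int=value)


# Key sizes: an 8-byte integer versus a 16-byte UUID.
bigint_key_bytes = len(struct.pack(">q", 1))
uuid_key_bytes = len(uuid.uuid4().bytes)

# Time-ordered IDs sort in generation order, so new keys land at the
# right-hand edge of a B-tree instead of at random positions.
base_ms = 1_767_225_600_000  # hypothetical starting timestamp
ordered_ids = [uuid7_like(base_ms + i) for i in range(50)]
random_ids = [uuid.uuid4() for _ in range(50)]

print(bigint_key_bytes, uuid_key_bytes)
print("v7-like already sorted:", ordered_ids == sorted(ordered_ids))
print("v4 already sorted:     ", random_ids == sorted(random_ids))
```

Within the same millisecond the ordering of such values depends on the random bits, so the guarantee here is only across distinct timestamps.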

Postgres data types: pick the narrowest that fits

Postgres has one of the richest type systems of any mainstream database. The type is the first line of constraint: a correctly typed column rejects a large class of bad data before any CHECK constraint runs.

Category	Production defaults	Avoid
Integers	BIGINT (8 bytes) for IDs; INTEGER (4 bytes) when the domain is bounded below ~2.1 billion	SMALLINT unless you know the domain never exceeds 32,767
Strings	TEXT (no length limit, no padding)	CHAR(n) — pads to length, trailing-space surprises; VARCHAR(n) adds a check but no storage benefit
Money	NUMERIC(p,s) or BIGINT cents — exact arithmetic	REAL or DOUBLE PRECISION — IEEE 754 loses cents
Time	TIMESTAMPTZ (stores UTC, displays in session timezone); DATE for date-only values	TIMESTAMP (no zone), a footgun because the instant it denotes depends on who reads it
IDs	UUID native type (16 bytes)	UUID as TEXT — wastes bytes, loses type enforcement
Booleans	BOOLEAN	SMALLINT or TEXT for booleans — semantically wrong
Semi-structured	JSONB (binary, supports GIN indexes)	JSON (stored as text and reparsed on each access; no GIN index support)
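
The Time row is the one that most often needs a concrete case. The sketch below is an analogy in Python, not Postgres behaviour: a naive `datetime` stands in for TIMESTAMP and an aware `datetime` stands in for TIMESTAMPTZ. The date and the UTC+9 reader are made-up values.

```python
from datetime import datetime, timedelta, timezone

utc = timezone.utc
tokyo = timezone(timedelta(hours=9))  # fixed offset, hypothetical reader

# Like TIMESTAMP: wall-clock fields only, no zone attached.
naive = datetime(2026, 3, 1, 9, 0)

# Two readers assume different zones and end up with different instants.
as_utc = naive.replace(tzinfo=utc)
as_tokyo = naive.replace(tzinfo=tokyo)
gap = as_utc - as_tokyo

# Like TIMESTAMPTZ: one unambiguous instant, rendered per reader.
instant = datetime(2026, 3, 1, 9, 0, tzinfo=utc)
shown_in_tokyo = instant.astimezone(tokyo)

print("disagreement between readers:", gap)
print("same instant, Tokyo rendering:", shown_in_tokyo.isoformat())
```

The naive value silently shifts by nine hours depending on the reader, while the aware value stays the same instant and only its rendering changes.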

Key and type numbers

Codd's paper: 1970
BIGSERIAL key size: 8 bytes
UUID key size: 16 bytes
UUIDv7 vs UUIDv4 index locality: ordered vs random
FK constraint check overhead: ~5-50 μs / row
JSONB GIN index size vs B-tree: ~5-20x larger
Typical column storage overhead: ~1-4 bytes / column
Composite PK index entry size: ~24-48 bytes

Design a minimal e-commerce schema (users, products, orders)


Quiz

A team stores money as REAL (single-precision float) and notices that a year of accumulated transactions has cents-level discrepancies. The fix?

Quiz

Which is the strongest argument for using a surrogate primary key (BIGSERIAL or UUID) over a natural key (email)?

ON DELETE is chosen per relationship: RESTRICT refuses deleting a parent that still has children; CASCADE deletes the owned children. The composite PK on order_items makes the relationship its own identity.
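
A minimal sketch of such a schema follows, run against SQLite through Python's `sqlite3` as a stand-in for Postgres. The column names and the choice of which relationship gets RESTRICT or CASCADE are assumptions made for illustration: here a user with orders cannot be deleted, while deleting an order removes the line items it owns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE products (
    id          INTEGER PRIMARY KEY,
    sku         TEXT NOT NULL UNIQUE,
    price_cents INTEGER NOT NULL CHECK (price_cents >= 0)
);
CREATE TABLE orders (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE RESTRICT
);
CREATE TABLE order_items (
    order_id   INTEGER NOT NULL REFERENCES orders(id)   ON DELETE CASCADE,
    product_id INTEGER NOT NULL REFERENCES products(id) ON DELETE RESTRICT,
    quantity   INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO users VALUES (1, 'ada@example.com');
INSERT INTO products VALUES (7, 'SKU-7', 1999);
INSERT INTO orders VALUES (100, 1);
INSERT INTO order_items VALUES (100, 7, 2);
""")

# RESTRICT: the user still has an order, so the delete is refused.
try:
    conn.execute("DELETE FROM users WHERE id = 1")
    user_delete_refused = False
except sqlite3.IntegrityError:
    user_delete_refused = True

# CASCADE: deleting the order also deletes the line items it owns.
conn.execute("DELETE FROM orders WHERE id = 100")
remaining_items = conn.execute(
    "SELECT count(*) FROM order_items").fetchone()[0]
remaining_products = conn.execute(
    "SELECT count(*) FROM products").fetchone()[0]

print(user_delete_refused, remaining_items, remaining_products)
```

The product row survives because the cascade travels from the order to its line items, not from a line item to the product it references.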

Recall before you leave

01. Why is NULL = NULL not TRUE in SQL, and what does it evaluate to?
02. Name the ON DELETE options for a foreign key and when you use each.
03. What is the production default for storing currency in Postgres, and why not REAL?

Recap

SQL is relational algebra plus practical compromises: NULL, ordering, and duplicates. The five constraint kinds (PRIMARY KEY, FOREIGN KEY, NOT NULL, UNIQUE, CHECK) encode business rules the engine refuses to break. The production default for primary keys is a surrogate (BIGSERIAL or UUIDv7) plus a UNIQUE NOT NULL on the business natural key — natural keys change, surrogate keys never do. Postgres types are the first line of constraint: NUMERIC for money, TIMESTAMPTZ for timestamps, TEXT for strings, JSONB (not JSON) for semi-structured data. Now when you see a money column declared as REAL or a timestamp stored without a timezone, you will know exactly what drift accumulates and which types to reach for instead. Lesson 3 covers normalization — the discipline for removing redundancy from a schema.
