Databases
Constraints, keys, and Postgres data types
A team stores money as REAL. After a year of accumulated transactions they find cents-level discrepancies they cannot explain. The type was the bug — float arithmetic loses precision. The fix is a schema change that touches every row.
Codd’s relational model vs SQL
Edgar Codd’s 1970 paper formalised relations as sets of tuples drawn from typed domains, and defined a closed algebra of operations (selection, projection, join, union, intersection, difference) — closed because each operation takes relations as input and produces a relation as output. SQL is a non-strict implementation of that algebra; it adds NULL (which Codd disliked), ORDER (rows are conceptually unordered), and duplicate rows (relations have no duplicates). Knowing the gap explains the rough edges:
NULL = NULLis NULL, not true — three-valued logic.ORDER BYis required to guarantee row order — the engine may return any order without it.DISTINCTexists because the engine must keep duplicates unless you ask.
Treat the gap as “SQL = relational algebra plus practical compromises” and the surprises stop being surprising.
The five constraint kinds
| Constraint | What it enforces | Key detail |
|---|---|---|
| PRIMARY KEY | Unique non-null identifier per row | One per table; implicitly creates a unique B-tree index |
| UNIQUE | No duplicate values in this column set | Multiple NULLs allowed (standard SQL); opt into UNIQUE NULLS NOT DISTINCT (SQL:2023 / Postgres 15+) to block that |
| NOT NULL | Column always has a value | Per-column; the first line of data quality |
| FOREIGN KEY | Column references an existing PK/UNIQUE key in another table | ON DELETE / ON UPDATE clauses: NO ACTION, RESTRICT, CASCADE, SET NULL, SET DEFAULT |
| CHECK | Arbitrary boolean expression on each row at write time | CHECK (amount >= 0), CHECK (status IN (‘open’,‘closed’)) — can reference other columns of the same row |
Surrogate vs natural keys
A natural key is data that already exists in the business domain (user email, product SKU, order number). A surrogate key is database-generated, opaque, and meaningful only inside the database (BIGSERIAL, UUID).
Production default in 2026: surrogate key as the primary key, plus a UNIQUE NOT NULL constraint on the business-natural key. Why: natural keys change (a customer changes their email), and a primary key change cascades through every foreign key referencing it — operationally expensive and often impossible at scale. Surrogate keys never change.
The exception: pure join tables (favourites: user_id, item_id) often use the composite of foreign keys as the PK — the relationship itself is the identity, no surrogate needed.
UUID vs BIGSERIAL. UUIDs are globally unique (good for distributed inserts, multi-region, offline-first clients) but bigger (16 bytes vs 8) and worse for index locality (random UUIDv4 fragments B-trees). UUIDv7 (time-ordered, RFC 9562) fixes the locality issue and is the modern default where UUID is wanted. BIGSERIAL is smaller, sequential, and cache-friendly — pick it when globally-unique IDs are not needed.
Postgres data types: pick the narrowest that fits
Postgres has the richest type system of any mainstream database. The type is the first line of constraint — a correctly-typed column catches 80% of bad data before any CHECK constraint runs.
| Category | Production defaults | Avoid |
|---|---|---|
| Integers | BIGINT (8B) for IDs; INTEGER (4B) when domain is bounded < ~2B | SMALLINT unless you know domain is < 32,767 |
| Strings | TEXT (no length limit, no padding) | CHAR(n) — pads to length, trailing-space surprises; VARCHAR(n) adds a check but no storage benefit |
| Money | NUMERIC(p,s) or BIGINT cents — exact arithmetic | REAL or DOUBLE PRECISION — IEEE 754 loses cents |
| Time | TIMESTAMPTZ (stores UTC, displays in session timezone) | TIMESTAMP (no zone) — a footgun; DATE for date-only |
| IDs | UUID native type (16 bytes) | UUID as TEXT — wastes bytes, loses type enforcement |
| Booleans | BOOLEAN | SMALLINT or TEXT for booleans — semantically wrong |
| Semi-structured | JSONB (binary, indexable) | JSON (text only, not indexable) |
- Codd's paper
- 1970
- BIGSERIAL per index entry
- 8 bytes
- UUID per index entry
- 16 bytes
- UUIDv7 vs UUIDv4 index locality
- ordered vs random
- FK constraint check overhead
- ~5-50 μs / row
- JSONB GIN index size vs B-tree
- ~5-20x larger
- Typical column storage overhead
- ~1-4 bytes / column
- Composite PK index entry size
- ~24-48 bytes
Design a minimal e-commerce schema (users, products, orders)
1/3A team stores money as REAL (single-precision float) and notices that a year of accumulated transactions has cents-level discrepancies. The fix?
Which is the strongest argument for using a surrogate primary key (BIGSERIAL or UUID) over a natural key (email)?
- 01Why is NULL = NULL not TRUE in SQL, and what does it equal?
- 02Name the ON DELETE options for a foreign key and when you use each.
- 03What is the production default for storing currency in Postgres, and why not REAL?
SQL is relational algebra plus practical compromises: NULL, ordering, and duplicates. The five constraint kinds (PRIMARY KEY, FOREIGN KEY, NOT NULL, UNIQUE, CHECK) encode business rules the engine refuses to break. The production default for primary keys is a surrogate (BIGSERIAL or UUIDv7) plus a UNIQUE NOT NULL on the business natural key — natural keys change, surrogate keys never do. Postgres types are the first line of constraint: NUMERIC for money, TIMESTAMPTZ for timestamps, TEXT for strings, JSONB (not JSON) for semi-structured data. Lesson 3 covers normalization — the discipline for removing redundancy from a schema.
- Heap storage, TOAST, and column alignmentsenior
- Schema integrity: deferral, versioning, and production failure modessenior
- Relational vs document, wide-column, graph, and key-valuesenior
- Relational model: design and migrate a multi-tenant schemasenior
- Relational model: multiple-choice reviewsenior
- Relational model: free-recall reviewsenior
appears again in164
- Why GraphQL gets N+1junior
- DataLoader mechanics: tick-boundary batchingmiddle
- Batch function contracts: ordering, shapes, errorsmiddle
- Federation and lookahead: batching beyond DataLoadermiddle
- Query complexity defences: depth, cost, persisted queriesmiddle
- Senior GraphQL API: scheduling contract, tenant isolation, observabilitysenior
- Why idempotency: making retries safejunior
- Server-side state machine: four states of an idempotency keymiddle
- Outbox and inbox: effectively-once across the dual-write boundarymiddle
- Concurrency and cache architecture for idempotency at scalesenior
- Observability, production failures, and global-scale designsenior
- The event loop: one thread, three queuesjunior
- Tasks, microtasks, and scheduler.yield()middle
- Microtask starvation, Long Tasks, and LoAFsenior
- Node.js event loop: phases, nextTick, and loop lagsenior
- React, Vue, and INP observability in productionsenior
- The render pipeline: six stages from bytes to pixelsjunior
- Stage costs and the renderer process modelmiddle
- Invalidation, dirty bits, and containmiddle
- Compositor layers: promotion, overlap, and GPU memorymiddle
- DevTools flame strip and the frame lifecyclemiddle
- Layout thrash: forced synchronous layoutsenior
- BeginMainFrame, compositor-driven animations, and GPU memorysenior
- Production observability: LoAF, INP, and the full attack surfacesenior
- What V8 is and why performance varies 100×junior
- V8''''s four-tier JIT pipeline and profile-guided tieringmiddle
- Hidden classes, transition trees, and memory layoutmiddle
- Inline caches, IC states, and deoptimizationmiddle
- Orinoco GC: parallel scavenger, concurrent marking, and write barriersmiddle
- TurboFan''''s speculative engine and the deopt-loop trapsenior
- V8 in production: isolates, pointer compression, and real failuressenior
- What workers are and why they existjunior
- Web worker mechanics: dedicated, shared, and OffscreenCanvasmiddle
- Structured clone and transferablesmiddle
- Service worker lifecycle and cache strategiesmiddle
- SharedArrayBuffer, Atomics, and cross-origin isolationsenior
- Service worker edge cases: version skew, durability, and navigation trapssenior
- Worker pools, Comlink, and production observabilitysenior
- What the reconciler does: render vs commitjunior
- The fiber object and the double-buffer treemiddle
- Render phase purity and commit phase sub-stepsmiddle
- Reconciliation: diffing heuristics and the key trapmiddle
- Priority lanes, time-slicing, and useTransitionmiddle
- Bailout, memoisation, and tearingsenior
- React Profiler, the Compiler, and production observabilitysenior
- Rendering strategies: SSG, SSR, ISR, streaming, and hydrationjunior
- SSG, SSR, ISR, streaming, and RSC — how each worksmiddle
- Hydration cost: selective, progressive, islands, resumabilitymiddle
- Hydration mismatch: causes, detection, and the determinism rulesenior
- RSC, per-route strategy, and production observabilitysenior
- Core Web Vitals: what LCP, INP, and CLS measurejunior
- CLS: why layout shifts happen and how to stop themmiddle
- Metric tradeoffs, RUM attribution, and the CI+field loopsenior
- The full picture: URL to LCP to INP as a relay racejunior
- Eight layers traced: from the service worker to the second navigationmiddle
- Five canonical breaks: where production reliably diessenior
- The three-track method: reading traces and building a monitored systemsenior
- What is a cache stampede and why it makes things worsejunior
- Lock and single-flight: bounding concurrent rebuildsmiddle
- XFetch: coordination-free probabilistic early expirationmiddle
- Stale-while-revalidate and CDN request coalescingmiddle
- Detecting stampedes and designing TTL for productionmiddle
- Metastable failure, fencing tokens, and production postmortemssenior
- Raft roles, terms, and why majority quorums prevent split brainjunior
- How Raft replicates a log entry and decides it is safe to commitmiddle
- Raft leader election: timeouts, voting rules, and the four safety propertiesmiddle
- Raft in the real world: partitions, slow disks, and client routingmiddle
- Raft extensions: pre-vote, learners, snapshots, and linearizable readssenior
- Raft in production: membership changes, Multi-Raft, and observabilitysenior
- Where data fetching happens — and why it decides LCPjunior
- Fetch waterfalls — diagnosis and the Promise.all curemiddle
- React Server Components and Suspense streamingmiddle
- Client-side cache: TanStack Query, SWR, and stale-while-revalidatemiddle
- LCP, prefetch, and race conditions in interactive fetchingmiddle
- Senior internals: RSC payload, caching layers, and production failure modessenior
- The IP envelopejunior
- Reading the IP headermiddle
- The three-way handshakejunior
- Sequence numbers and connection statemiddle
- DNS: what it does and why it existsjunior
- The resolver walk: referrals, record types, and gluemiddle
- TTL, caching, and DNS propagationmiddle
- What TLS does and why it existsjunior
- The 1-RTT handshake: key shares and ECDHEmiddle
- Session resumption and 0-RTTmiddle
- Key schedule, SNI, ALPN, and extensionssenior
- 0-RTT defenses, ECH, hybrid PQ, and production TLSsenior
- WebSocket: the HTTP upgrade handshakejunior
- WebSocket frame format: opcodes, masking, fragmentationmiddle
- WebSocket backpressure: when clients can''''t keep upmiddle
- Reconnection: jittered backoff, thundering herd, message resumptionsenior
- WebSocket at scale: HTTP/2 multiplexing, permessage-deflate, C10Msenior
- WebSocket in production: proxies, security, and distributed architecturesenior
- What reverse proxies dojunior
- Health checks, connection draining, and slow startmiddle
- Session affinity, consistent hashing, and the right fixmiddle
- Retry storms, circuit breakers, and load sheddingsenior
- Resilient LB architecture: anycast, zone-aware routing, and observabilitysenior
- Why QUIC and not TCP+TLSjunior
- Connection IDs and network migrationmiddle
- 0-RTT resumption and packet encryptionsenior
- DDoS: what it is and why it worksjunior
- Amplification attacks and state exhaustionmiddle
- Rate limiting: algorithms and architecturemiddle
- WAFs, firewalls, mTLS, and HSTSmiddle
- DNS cache poisoning and BGP hijackingsenior
- Defense-in-depth architecture and attack economicssenior
- The twelve layers: one URL, seven actorsjunior
- DNS, TCP, TLS in sequence: where the milliseconds gomiddle
- Proxy intercepts and security gates: rate limiters, WAF, mTLSmiddle
- Alternate paths: QUIC 0-RTT, WebSocket upgrade, connection migrationmiddle
- Observability: distributed traces, USE/RED, and samplingsenior
- Resilience: cascading retries, circuit breakers, and error budgetssenior
- What the three signals are: logs, metrics, and tracesjunior
- Why structured logs exist: the diary vs the spreadsheetjunior
- The production log schema: fields every line must carrymiddle
- PII redaction and log injectionsenior
- OTel Logs Data Model and audit logs as a subsystemsenior
- What is OpenTelemetry: API, SDK, Collector, OTLPjunior
- OTel signals, Semantic Conventions, and the OTLP wire formatmiddle
- The OTel Collector: receivers, processors, exporters, and deployment patternsmiddle
- Vendor neutrality, eBPF instrumentation, the Operator, and browser/serverless OTelsenior
- Operating the OTel Collector: reliability, version skew, failure modes, and governancesenior
- SLI, SLO, and the error budget: reliability by the numbersjunior
- Error budget policy, latency SLOs, and composite journeysmiddle
- Production SLO failures, self-observability, security, and the big picturesenior
- What is trace propagation and why broken propagation is worse than nonejunior
- traceparent and tracestate: the W3C header format in fullmiddle
- Baggage and async boundaries: carrying context across queues and callbacksmiddle
- Async context per language, service mesh, B3 migration, and securitysenior
- Production propagation failures, span links, and platform designsenior
- The debugging funnel: SLO → RED → trace → profilejunior
- OTel architecture: one SDK, four signals, one wire formatmiddle
- The incident loop: from pager to postmortem to preventionmiddle
- Scale, security, and the ROI of observable systemssenior
- Cache lines, struct layout, and false sharingmiddle
- SIMD, SoA vs AoS, and memory bandwidthmiddle
- Cache-oblivious algorithms, PGO, and production failuressenior
- GC in production: observability, security, edge cases, and fleet governancesenior
- Batching: amortize fixed cost per operationjunior
- The batching window: size and wait timemiddle
- Batching in Kafka and Postgresmiddle
- io_uring and observability of batchingmiddle
- From Nagle to io_uring: evolution of batchingmiddle
- Backpressure, failure isolation, and batch security in productionsenior
- CI enforcement and RUM: making budgets stickmiddle
- V8 JIT pipeline, HTTP priorities, and bundle securitysenior
- The performance loop: discipline, not a projectjunior
- Classify and fix: matching bottleneck families to remediesmiddle
- Observability stack and CI gates: catching regressions before they shipmiddle
- Incident to enforcement: SLO burn to verified fix in 35 minutesmiddle
- Culture, economics, and org-scale performancesenior
- At-most-once, at-least-once, exactly-once: the three delivery contractsjunior
- The three failure legs — where duplicates and losses actually happenmiddle
- Consumer-side dedup: the cheapest path to exactly-once processingmiddle
- Kafka exactly-once semantics: idempotent producer and transactionsmiddle
- SQS visibility timeout, DLQ, and the outbox patternmiddle
- Exactly-once in production: impossibility proof, hybrid patterns, and real incidentssenior
- What OAuth is and why passwords are not the answerjunior
- Authorization code flow with PKCEmiddle
- ID token validation and JWKS cache managementmiddle
- Refresh token rotation and scope-based least privilegemiddle
- Sender-constrained tokens: DPoP and mTLSsenior
- OAuth in production: audience attacks, observability, and real failuressenior