Row versions and snapshots: the on-disk mechanics
You know that MVCC gives each transaction a snapshot. The question is: what is a snapshot, exactly, and how does Postgres decide — for every single row on every single page scan — whether that row belongs in your snapshot or not?
Every row carries a header
When Postgres stores a tuple on a heap page, it prefixes the user columns with a 23-byte system header containing:
- t_xmin: the transaction id that inserted this version
- t_xmax: the transaction id that deleted or updated this version (zero if still live)
- t_cid: the command id within a transaction
- t_ctid: a pointer to the next version of this row (used for chaining updates)
- t_infomask / t_infomask2: bitmaps flagging commit status, lock state, freeze state, and HOT bits
These fields are how visibility decisions get made on every single tuple Postgres looks at.
| Field | Size | Meaning |
|---|---|---|
| t_xmin | 4 B | Transaction id that created this version |
| t_xmax | 4 B | Transaction id that deleted/updated; 0 = still live |
| t_ctid | 6 B | Pointer to next version of this row |
| t_infomask | 2 B | Commit status hint bits, lock flags |
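To make the 23-byte figure concrete, here is a minimal Python sketch that packs and unpacks a header with the fields listed above. The field order follows `HeapTupleHeaderData` in the Postgres source; `t_hoff` (a 1-byte offset to the user data) is not discussed in this lesson and is included only so the sizes add up. The helper names and the sample values are hypothetical, and the `t_ctid` encoding is simplified.

```python
import struct

# 4 + 4 + 4 + 6 + 2 + 2 + 1 = 23 bytes; "<" means little-endian, no padding.
HEADER_FORMAT = "<III6sHHB"
FIELDS = ("t_xmin", "t_xmax", "t_cid", "t_ctid",
          "t_infomask2", "t_infomask", "t_hoff")


def pack_header(xmin, xmax, cid, block, offset,
                infomask2=0, infomask=0, hoff=24):
    # Simplification: t_ctid is packed as a 4-byte block number plus a
    # 2-byte offset. Real Postgres splits the block number into two halves.
    ctid = struct.pack("<IH", block, offset)
    return struct.pack(HEADER_FORMAT, xmin, xmax, cid, ctid,
                       infomask2, infomask, hoff)


def unpack_header(raw):
    header = dict(zip(FIELDS, struct.unpack(HEADER_FORMAT, raw)))
    header["t_ctid"] = struct.unpack("<IH", header["t_ctid"])
    return header


raw = pack_header(xmin=731, xmax=0, cid=0, block=0, offset=1)
print(len(raw))              # 23
print(unpack_header(raw))
```

To look at real headers rather than a model, the `pageinspect` extension exposes the same fields per tuple.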
How a snapshot decides what to read
Postgres builds a snapshot when a transaction runs its first statement (and, under READ COMMITTED, again for each later statement). A snapshot is a small structure with three fields:
- xmin: the oldest still-running transaction id
- xmax: one past the highest completed transaction id at snapshot time
- xip: the list of in-progress transactions between xmin and xmax
The visibility rule applied to every tuple is roughly: a tuple is visible if its t_xmin is committed, below the snapshot’s xmax, and not in the in-progress list (xip), and its t_xmax is either zero, rolled back, or not yet visible to the snapshot (in xip, or at or above the snapshot’s xmax). That is it. No locks involved.
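The rule above can be sketched in a few lines of Python. This is a simplified model, not the real Postgres code: `committed` is a hypothetical stand-in for the commit log, and the sketch ignores a transaction seeing its own changes (t_cid), hint bits, and transaction id wraparound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    xmin: int        # oldest still-running transaction id
    xmax: int        # one past the highest completed transaction id
    xip: frozenset   # ids that were in progress at snapshot time


def xid_visible(xid, snap, committed):
    """True if the effects of transaction `xid` are visible to `snap`."""
    if xid >= snap.xmax:     # had not completed when the snapshot was taken
        return False
    if xid in snap.xip:      # was still running at snapshot time
        return False
    return xid in committed  # rolled-back transactions never become visible


def tuple_visible(t_xmin, t_xmax, snap, committed):
    if not xid_visible(t_xmin, snap, committed):
        return False         # the inserting transaction is invisible
    if t_xmax == 0:
        return True          # never deleted or updated
    # The tuple stays visible unless its deletion is itself visible.
    return not xid_visible(t_xmax, snap, committed)


snap = Snapshot(xmin=102, xmax=105, xip=frozenset({102}))
committed = {90, 101, 103}   # 104 is assumed to have rolled back

print(tuple_visible(90, 0, snap, committed))     # True: live row
print(tuple_visible(90, 102, snap, committed))   # True: deleter in progress
print(tuple_visible(102, 0, snap, committed))    # False: creator in progress
print(tuple_visible(101, 103, snap, committed))  # False: deletion committed
```

Note that the check only compares integers and set membership, which is why reads never need to take a lock.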
- Under READ COMMITTED (the Postgres default), a fresh snapshot is taken at the start of each statement.
- Under REPEATABLE READ, the snapshot is taken once at the first statement of the transaction.
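The difference between the two isolation levels comes down to when a snapshot is refreshed. The following toy model illustrates that; `TxnManager` and `Session` are hypothetical names, and the snapshot is reduced to just the in-progress list.

```python
class TxnManager:
    """Toy stand-in for the server's list of running transactions."""

    def __init__(self):
        self.next_xid = 100
        self.running = set()

    def begin(self):
        xid = self.next_xid
        self.next_xid += 1
        self.running.add(xid)
        return xid

    def commit(self, xid):
        self.running.discard(xid)

    def take_snapshot(self):
        # Only the in-progress list matters for this demo.
        return frozenset(self.running)


class Session:
    def __init__(self, manager, isolation):
        self.manager = manager
        self.isolation = isolation
        self.snapshot = None

    def run_statement(self):
        """Return the snapshot this statement would read with."""
        if self.isolation == "READ COMMITTED" or self.snapshot is None:
            self.snapshot = self.manager.take_snapshot()
        return self.snapshot


manager = TxnManager()
writer = manager.begin()                 # xid 100, still running
rc = Session(manager, "READ COMMITTED")
rr = Session(manager, "REPEATABLE READ")

first_rc, first_rr = rc.run_statement(), rr.run_statement()
manager.commit(writer)                   # writer commits between statements
second_rc, second_rr = rc.run_statement(), rr.run_statement()

print(writer in first_rc, writer in second_rc)   # True False
print(writer in first_rr, writer in second_rr)   # True True
```

The READ COMMITTED session stops treating the writer as in progress on its second statement, so it would see the writer's changes; the REPEATABLE READ session keeps its original snapshot and would not.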
INSERT, UPDATE, DELETE — what actually happens on disk
INSERT places a fresh tuple with t_xmin = current_txid and t_xmax = 0.
UPDATE is two operations: it sets the old tuple’s t_xmax = current_txid and inserts a new tuple with the new values (t_xmin = current_txid, t_xmax = 0). The old tuple’s t_ctid is then pointed at the new tuple’s page and offset, chaining the versions.
DELETE just sets the existing tuple’s t_xmax = current_txid. Nothing is physically removed. Every change can still be rolled back until commit, and the old versions stay physically on the page until VACUUM reclaims them.
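A toy heap makes the three operations easy to follow. In this sketch a Python list stands in for a heap page, the list index stands in for the page-and-offset that t_ctid holds, and all function names and transaction ids are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeapTuple:
    data: dict
    t_xmin: int
    t_xmax: int = 0
    t_ctid: int = -1   # slot of the next version; own slot if this is the latest


heap = []              # stand-in for a heap page


def insert(xid, data):
    slot = len(heap)
    heap.append(HeapTuple(data, t_xmin=xid, t_ctid=slot))
    return slot


def delete(xid, slot):
    heap[slot].t_xmax = xid        # mark only; nothing is removed


def update(xid, slot, changes):
    old = heap[slot]
    new_slot = insert(xid, {**old.data, **changes})
    old.t_xmax = xid               # old version is now superseded
    old.t_ctid = new_slot          # chain old version to the new one
    return new_slot


s0 = insert(10, {"id": 42, "balance": 100})
s1 = update(11, s0, {"balance": 0})
delete(12, s1)

for slot, version in enumerate(heap):
    print(slot, version)
```

After one INSERT, one UPDATE, and one DELETE, the heap still holds two physical tuples, which is the state VACUUM later has to clean up.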
A visibility walkthrough
Connection A runs BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = 42; — does not commit.
Connection B runs BEGIN; SELECT balance FROM accounts WHERE id = 42; at the same time.
Row id=42 now has two heap versions:
- v1: balance=100, t_xmin = prior committed tx, t_xmax = A
- v2: balance=0, t_xmin = A, t_xmax = 0
B’s snapshot was taken when its SELECT started, while A was still running, so A is in B’s xip.
- v1: t_xmin is committed (passes lower bound), t_xmax = A (in xip, deletion not visible) → v1 is visible
- v2: t_xmin = A (in xip, fails lower bound) → v2 is invisible
B reads balance=100. No locks, no waiting.
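The walkthrough can be replayed with the same simplified visibility rule. The transaction ids here are hypothetical: 90 for the prior committed transaction, 100 for A, and 101 for an unrelated transaction assumed to have committed already, so that B's snapshot has xmax = 102 and A falls inside xip.

```python
PRIOR, A, LATER = 90, 100, 101   # hypothetical transaction ids


def sees(xid, snap, committed):
    """True if transaction `xid` counts as committed for this snapshot."""
    return xid < snap["xmax"] and xid not in snap["xip"] and xid in committed


def visible(version, snap, committed):
    if not sees(version["t_xmin"], snap, committed):
        return False
    return version["t_xmax"] == 0 or not sees(version["t_xmax"], snap, committed)


versions = [
    {"name": "v1", "balance": 100, "t_xmin": PRIOR, "t_xmax": A},
    {"name": "v2", "balance": 0, "t_xmin": A, "t_xmax": 0},
]


def read_balance(snap, committed):
    return [v["balance"] for v in versions if visible(v, snap, committed)]


# B's snapshot, taken while A is still running.
b_snapshot = {"xmin": A, "xmax": LATER + 1, "xip": {A}}
print(read_balance(b_snapshot, committed={PRIOR, LATER}))        # [100]

# A snapshot taken after A commits, e.g. B's next statement under READ COMMITTED.
later_snapshot = {"xmin": LATER + 1, "xmax": LATER + 1, "xip": set()}
print(read_balance(later_snapshot, committed={PRIOR, A, LATER}))  # [0]
```

Exactly one version is visible under each snapshot; which one depends only on where A sits relative to the snapshot.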
Trace one row across an UPDATE and a concurrent SELECT
Trace what happens to a row's MVCC state under SELECT then UPDATE then DELETE in three separate transactions, with a fourth long-running SELECT keeping an old snapshot alive.
An UPDATE sets the row's t_xmax to the current transaction id and inserts a new tuple. What happens to the old tuple immediately after the transaction commits?
Under READ COMMITTED, two SELECT statements in the same transaction query the same row. Transaction A updates and commits between the two SELECTs. What does the second SELECT return?
- What are t_xmin and t_xmax in a Postgres tuple header, and what do they each mean?
- What three numbers make up a Postgres snapshot, and what is each for?
- Why is READ COMMITTED surprising if you do two SELECTs on the same row in one transaction?
Every Postgres tuple carries a 23-byte system header with t_xmin (who inserted it), t_xmax (who deleted it; zero if live), t_ctid (pointer to the next version), and t_infomask hint bits. A snapshot captures three numbers — xmin, xmax, xip — and the visibility rule is applied tuple-by-tuple: is this tuple’s creation transaction committed and not in xip? Is its deletion transaction absent, rolled back, or in xip? No locks are taken. Under READ COMMITTED a fresh snapshot is taken per statement; under REPEATABLE READ once per transaction. Every UPDATE creates a new tuple and marks the old one dead; dead tuples persist until VACUUM confirms no snapshot still references them.