Distributed Systems
Quorums: multiple-choice review
Six questions that cut across the whole unit. Each one is a decision you make when you size a cluster or debug a stale-read ticket: not a definition to recite, but R, W, and N arithmetic under real failures.
Confirm you can derive the overlap guarantee from R + W > N, place a config on the consistency/availability/latency triangle, and predict exactly which guarantee sloppy quorum forfeits during a partition.
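The overlap guarantee is a pigeonhole argument. A write set of W replicas and a read set of R replicas drawn from the same N must share a member when R + W > N, because two disjoint sets would need R + W distinct replicas. A minimal brute-force check, using a hypothetical helper name, enumerates every possible read and write set for small N and confirms the inequality is exactly the condition:

```python
from itertools import combinations


def always_overlap(n: int, r: int, w: int) -> bool:
    """True if every size-r read set intersects every size-w write set of n replicas."""
    replicas = range(n)
    for write_set in combinations(replicas, w):
        for read_set in combinations(replicas, r):
            if not set(write_set) & set(read_set):
                return False  # found a read that can miss the write entirely
    return True


# Exhaustively compare against the closed-form rule for small clusters.
for n in range(1, 7):
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            assert always_overlap(n, r, w) == (r + w > n), (n, r, w)

print(always_overlap(3, 2, 2))  # True: QUORUM at N=3
print(always_overlap(3, 1, 2))  # False: R + W == N
```

Brute force only scales to toy sizes, but it makes the point that R + W > N is both sufficient and necessary for every read to intersect every write.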
A cluster runs RF=5. You need every read guaranteed to see the latest successful write while tolerating as many node failures as possible on both the read and write paths. What is the smallest (R, W) that still guarantees read/write overlap, and how many simultaneous node failures does it survive on each path?
A payments service at RF=3 runs writes at W=1 'for latency.' One night a node acks a payout-account update, then crashes during a deploy before replicating. The next read serves the old account number. What actually went wrong?
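The payout scenario can be replayed with a toy in-memory model. This is a minimal sketch, assuming a hypothetical `Replica` class and a coordinator that reports success as soon as W replicas have applied the write; asynchronous replication to the rest is deliberately not modeled, because the acking node crashes before it would run:

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)
    up: bool = True


def write(replicas, key, value, w):
    """Apply the write on the first w live replicas and acknowledge it.

    The remaining replicas would be updated later by asynchronous
    replication, which this toy model does not run.
    """
    acked = [rep for rep in replicas if rep.up][:w]
    for rep in acked:
        rep.data[key] = value
    return acked


def read(replicas, key, r):
    """Return the values held by the first r live replicas."""
    live = [rep for rep in replicas if rep.up][:r]
    return [rep.data.get(key) for rep in live]


nodes = [Replica(name, {"payout": "old"}) for name in ("A", "B", "C")]
acked = write(nodes, "payout", "new", w=1)  # only A stores "new"
acked[0].up = False                          # A crashes before replicating
print(read(nodes, "payout", r=2))            # ['old', 'old']
```

With W=1 the acknowledgment meant only that one node had the update; once that node was gone, no read of the surviving replicas could return it.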
Same RF=3 cluster, now configured with W=2, R=2, so R + W > N. During a network partition a write lands via sloppy quorum on hint-holding substitute nodes, and a strict read on the home replicas returns stale data. Why doesn't W=2, R=2 save you here?
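Sloppy quorum counts acknowledgments from any reachable nodes, not only the key's home replicas, so R + W > N no longer implies an intersection with the home set. A minimal sketch, assuming a hypothetical ring where A, B, C are the key's home replicas and D, E are fallback nodes that store hints, shows a write satisfying W=2 while sharing no node with a strict R=2 read:

```python
HOME = ["A", "B", "C"]   # the key's N=3 home replicas (hypothetical names)
FALLBACK = ["D", "E"]    # next nodes on the ring, used only when home nodes are unreachable


def sloppy_write_set(reachable, w):
    """Pick w ackers: reachable home replicas first, then fallback hint-holders."""
    chosen = [n for n in HOME if n in reachable]
    chosen += [n for n in FALLBACK if n in reachable]
    return chosen[:w]


def strict_read_set(reachable, r):
    """A strict read only ever consults the home replicas."""
    return [n for n in HOME if n in reachable][:r]


writer_side = {"C", "D"}  # partition: the writer reaches one home replica plus a fallback
reader_side = {"A", "B"}  # the reader reaches the other two home replicas

w_set = sloppy_write_set(writer_side, w=2)  # ['C', 'D']: W=2 met, D holds a hint for A or B
r_set = strict_read_set(reader_side, r=2)   # ['A', 'B']
print(set(w_set) & set(r_set))              # set(): no overlap, so the read can be stale
```

Only one of the two write acks came from a home replica, so the read set is free to miss it. Once the partition heals, hinted handoff delivers D's copy to the home replica it stood in for.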
A team argues 'R=1, W=2 is fine at N=3 because the write is already on a majority.' What is wrong with the reasoning?
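The "majority" argument ignores where the read goes. Assume, for illustration, that the latest write sits on exactly W of the N replicas (background replication has not caught up yet) and that the coordinator picks the R read replicas uniformly at random; real coordinators often route by latency instead. The chance that the read misses the write entirely is then C(N - W, R) / C(N, R). A minimal sketch with a hypothetical helper:

```python
from fractions import Fraction
from math import comb


def stale_read_probability(n: int, r: int, w: int) -> Fraction:
    """Probability that a uniformly random size-r read set avoids a fixed size-w write set."""
    # comb(n - w, r) is 0 when r > n - w, i.e. exactly when r + w > n.
    return Fraction(comb(n - w, r), comb(n, r))


print(stale_read_probability(3, 1, 2))  # 1/3: R=1, W=2 at N=3
print(stale_read_probability(3, 2, 2))  # 0: QUORUM
print(stale_read_probability(5, 1, 3))  # 2/5
```

Having the write on a majority does not help when the read consults a single replica: one read in three can land on the replica the write has not reached, and that is a legal outcome of the configuration.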
At RF=3 a QUORUM read (R=2) shows p99 latency far worse than its p50, while the cluster's per-node latencies look healthy. What is the mechanism, and what mitigation targets it directly?
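The mechanism is order statistics: a read that must hear from R replicas is only as fast as the slowest of the ones it waits for, so per-node tail events compound. A simplified model, assuming each replica independently has probability p of a slow response (for example a GC pause) and ignoring the delay before a hedge fires, computes how often a read is slow when it waits for the fastest `need` of `sent` replicas:

```python
from math import comb


def p_slow_read(p: float, need: int, sent: int) -> float:
    """P(read is slow) when it waits for the fastest `need` of `sent` replicas.

    The read is slow iff fewer than `need` replicas are fast, i.e. at least
    sent - need + 1 of them are slow (independently, each with probability p).
    """
    return sum(
        comb(sent, k) * p**k * (1 - p) ** (sent - k)
        for k in range(sent - need + 1, sent + 1)
    )


p = 0.01  # hypothetical per-node probability of a slow response
print(round(p_slow_read(p, need=1, sent=1), 6))  # 0.01: single-replica read
print(round(p_slow_read(p, need=2, sent=2), 6))  # 0.0199: QUORUM read, two requests sent
print(round(p_slow_read(p, need=2, sent=3), 6))  # 0.000298: same read with one extra request
```

Waiting on two replicas roughly doubles the slow-read rate even though each node looks healthy on its own. Sending one extra request and taking the fastest two is what a hedged read approximates; a real hedge waits for a latency threshold before sending, so it gains less than this bound and adds some load.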
Overlap (R + W > N) guarantees a quorum read sees the latest committed write. Which of these does it NOT guarantee?
The through-line is one inequality: R + W > N forces the read and write replica sets to intersect, so a quorum read sees the latest committed write. The moment you tune below it (R=1, W=2 at N=3 sums to exactly N; W=1 with any R < N falls short as well and rests every ack on a single node), stale reads and lost writes become legal outcomes, not bugs. QUORUM (W=2, R=2 at N=3; W=3, R=3 at N=5) is the standard production choice because it keeps overlap while tolerating one node down (two at N=5) on both paths, unlike ALL, which fails as soon as any replica is unreachable. R is also a p99 knob: a quorum read waits on the slowest replica it needs, which is why hedged reads exist. And sloppy quorum buys write availability during partitions by parking writes on hint-holders outside the read set, suspending overlap exactly when a node is down; hinted handoff, read repair, and Merkle-tree anti-entropy converge the replicas afterward, eventually but with no bound on when.
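The QUORUM figures above follow from using a majority, floor(N/2) + 1, on both paths: each path then tolerates N minus that many replicas being down, while ALL tolerates none. A short sketch with hypothetical helper names tabulates the arithmetic for the two cluster sizes discussed:

```python
def quorum(n: int) -> int:
    """Majority size: the smallest R = W with R + W > n."""
    return n // 2 + 1


def tolerated_failures(n: int, needed: int) -> int:
    """Replicas that can be down while `needed` acks are still reachable."""
    return n - needed


for n in (3, 5):
    q = quorum(n)
    print(f"N={n}: QUORUM R=W={q}, overlap={2 * q > n}, "
          f"tolerates {tolerated_failures(n, q)} down; ALL tolerates {tolerated_failures(n, n)}")
```

For N=3 this gives R=W=2 tolerating one failure, and for N=5 it gives R=W=3 tolerating two, with overlap holding in both cases.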