Distributed Systems
Quorums: build a tunable-consistency store
Reading that R + W > N forces overlap is not the same as watching a stale read appear the instant you tune below it. Build a small replicated store with R and W as dials, then make the guarantee — and each way it breaks — happen on demand, with evidence.
Turn the overlap invariant into something you can reproduce: implement quorum reads and writes over N=3 simulated replicas, prove R + W > N gives the latest write, then deliberately induce a W=1 lost write and a sloppy-quorum stale read and capture both.
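Before building the store, you can check the overlap claim by brute force. The sketch below enumerates every possible write set of size W and read set of size R over three replicas and compares the prediction R + W > N with what actually happens. The function names `always_overlaps` and `overlap_table` are illustrative, not part of the brief.

```python
from itertools import combinations

N = 3
REPLICAS = range(N)


def always_overlaps(r: int, w: int) -> bool:
    """True if every size-w write set shares a replica with every size-r read set."""
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(REPLICAS, w)
        for read_set in combinations(REPLICAS, r)
    )


def overlap_table():
    """One row per (R, W) pair: (R, W, R + W, predicted overlap, observed overlap)."""
    return [
        (r, w, r + w, r + w > N, always_overlaps(r, w))
        for r in range(1, N + 1)
        for w in range(1, N + 1)
    ]


if __name__ == "__main__":
    print("R  W  R+W  R+W>N  overlap observed")
    for r, w, total, predicted, observed in overlap_table():
        print(f"{r}  {w}  {total:>3}  {predicted!s:<5}  {observed}")
```

This only establishes that the read set and write set intersect. Returning the latest write additionally assumes the read picks the highest version among the replies, which the store itself has to implement.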
Build a minimal N=3 replicated key-value store with per-operation tunable R and W, and produce a test report that empirically demonstrates the R + W > N overlap guarantee plus two of its silent failure modes — a W=1 lost write and a sloppy-quorum stale read.
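A minimal sketch of such a store might look like the following. It rests on several simplifying assumptions: replicas are in-process objects, a single coordinator hands out increasing version numbers, and a write touches only the first W live replicas in a caller-chosen `order`, leaving the rest untouched to model replication lag. All names (`Store`, `Replica`, `QuorumError`, the scenario functions) are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import permutations


class QuorumError(Exception):
    """Raised when fewer live replicas respond than the operation requires."""


@dataclass
class Replica:
    up: bool = True
    data: dict = field(default_factory=dict)  # key -> (version, value)


class Store:
    def __init__(self, n=3):
        self.replicas = [Replica() for _ in range(n)]
        self.version = 0  # assumption: one coordinator assigns increasing versions

    def _live(self, order):
        indexes = range(len(self.replicas)) if order is None else order
        return [i for i in indexes if self.replicas[i].up]

    def write(self, key, value, w, order=None):
        """Apply the write to the first w live replicas in `order`, then ack."""
        targets = self._live(order)[:w]
        if len(targets) < w:
            raise QuorumError(f"write needs {w} acks, only {len(targets)} replicas live")
        self.version += 1
        for i in targets:
            self.replicas[i].data[key] = (self.version, value)
        return self.version

    def read(self, key, r, order=None):
        """Ask the first r live replicas in `order`; return the newest (version, value)."""
        targets = self._live(order)[:r]
        if len(targets) < r:
            raise QuorumError(f"read needs {r} replies, only {len(targets)} replicas live")
        replies = [self.replicas[i].data.get(key, (0, None)) for i in targets]
        return max(replies, key=lambda reply: reply[0])


def overlap_holds():
    """W=2, R=2: every choice of read set sees the latest version."""
    store = Store()
    store.write("k", "old", w=3)
    latest = store.write("k", "new", w=2, order=[0, 1])
    return all(
        store.read("k", r=2, order=list(p)) == (latest, "new")
        for p in permutations(range(3))
    )


def w1_lost_write():
    """W=1: the write is acknowledged, then its only holder dies."""
    store = Store()
    store.write("k", "old", w=3)
    acked = store.write("k", "new", w=1, order=[0])
    store.replicas[0].up = False          # dies before replicating to anyone
    return acked, store.read("k", r=2)    # survivors only know the old version


def sub_overlap_drift():
    """R=1, W=1: the read can land on a replica the write never reached."""
    store = Store()
    store.write("k", "old", w=3)
    store.write("k", "new", w=1, order=[0])
    return store.read("k", r=1, order=[2])
```

Passing `order` explicitly is the design choice that makes each failure repeatable: the test decides which replicas a write reaches and which a read consults, rather than relying on timing.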
- An automated test suite where each scenario (overlap-holds, W=1-lost-write, sub-overlap-drift, sloppy-stale-read, handoff-converges) is a named, repeatable test asserting the exact returned version — not a manual demo.
- A short report table: for each (R, W) pair tried, list R + W, whether R + W > N (with N = 3), and the observed result (latest / possibly-stale / lost), matching theory to observation.
- The W=1 lost-write test reliably reproduces the loss of an acknowledged write, showing that the failure follows from the configuration rather than from a bug in your store.
- A one-paragraph write-up explaining, for each failure scenario, which guarantee was forfeited and the exact arithmetic (R + W vs N) or membership change (hint-holder outside read set) that caused it.
- Add read repair: when a quorum read sees disagreeing versions, push the newest to the stale replicas inline, and add a test showing that once every stale replica has been repaired, the next read of the same key is consistent even at R=1.
- Add a quorum-read latency probe: make one replica slow and show that a QUORUM (R=2) read's latency tracks the second-fastest replica, then implement request hedging (fire a duplicate after a delay, take the first) and show the p99 drop.
- Add concurrent writers to the same key at W=2 and show overlap does NOT order them — both 'succeed' and you get sibling versions / last-write-wins — illustrating that overlap guarantees visibility, not serialisation.
- Map your dials to a real store: write the equivalent Cassandra consistency levels (ONE / QUORUM / ALL) or DynamoDB ConsistentRead flags for each scenario, and note where managed defaults would have hidden the bug.
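For the read-repair stretch goal, one possible shape is sketched below. It assumes replicas are plain dicts mapping a key to a `(version, value)` pair and that the caller chooses the read set; `quorum_read_with_repair` and `demo` are hypothetical names. Note that this version repairs only the replicas the read actually contacted, so a stale replica outside the read set stays stale until a later read reaches it.

```python
def quorum_read_with_repair(replicas, key, read_set):
    """Read `key` from the replicas in `read_set`, return the newest
    (version, value), and overwrite any contacted replica that was behind."""
    replies = {i: replicas[i].get(key, (0, None)) for i in read_set}
    newest = max(replies.values(), key=lambda reply: reply[0])
    repaired = []
    for i, seen in replies.items():
        if seen[0] < newest[0]:
            replicas[i][key] = newest   # inline repair of the lagging replica
            repaired.append(i)
    return newest, repaired


def demo():
    replicas = [{"k": (1, "old")} for _ in range(3)]
    for i in (0, 1):                    # a W=2 write reaches replicas 0 and 1 only
        replicas[i]["k"] = (2, "new")
    before = replicas[2]["k"]           # what an R=1 read of replica 2 returns now
    newest, repaired = quorum_read_with_repair(replicas, "k", read_set=[1, 2])
    after = [replicas[i]["k"] for i in range(3)]   # every possible R=1 read
    return before, newest, repaired, after
```

In `demo`, the quorum read over replicas 1 and 2 finds replica 2 behind and repairs it, after which an R=1 read of any replica returns the new version.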
This is the experiment that turns the overlap invariant from a formula into intuition. Implement N=3 with tunable R and W and watch R + W > N serve the latest acknowledged write through any single-node failure that still leaves a quorum. Then tune below it and watch the silent failures appear on cue: a W=1 ack lost when its lone replica dies, a sub-overlap read picking the lagging replica, and a sloppy-quorum write hiding on a hint-holder outside the read set until hinted handoff converges it. Building it once, with the arithmetic mapped to the observed result, makes you the engineer who picks R and W deliberately instead of discovering the gap through a 3 a.m. stale-data ticket.
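The sloppy-quorum case is the least intuitive of the three, because R + W > N still holds on paper. A rough sketch, assuming home replicas `r0`, `r1`, `r2`, a single fallback node that stores hints, and reads that consult only home replicas (all names here are hypothetical):

```python
class SloppyCluster:
    HOME = ("r0", "r1", "r2")   # preference list for the key; a fallback node holds hints

    def __init__(self):
        self.data = {name: {} for name in self.HOME}
        self.down = set()
        self.hints = []          # held on the fallback: (intended_home, key, record)
        self.version = 0

    def sloppy_write(self, key, value, w):
        """Collect w acks from the first w home replicas, letting the fallback
        stand in (by storing a hint) for any of them that is down."""
        self.version += 1
        record = (self.version, value)
        for name in self.HOME[:w]:
            if name in self.down:
                self.hints.append((name, key, record))
            else:
                self.data[name][key] = record
        return self.version

    def read(self, key, read_set):
        """Strict read: only home replicas are consulted, never the hint holder."""
        replies = [self.data[name].get(key, (0, None)) for name in read_set]
        return max(replies, key=lambda reply: reply[0])

    def handoff(self):
        """Deliver each hint whose intended home is back up; keep the rest."""
        pending = []
        for name, key, record in self.hints:
            if name in self.down:
                pending.append((name, key, record))
            elif record[0] > self.data[name].get(key, (0, None))[0]:
                self.data[name][key] = record
        self.hints = pending


def sloppy_scenario():
    cluster = SloppyCluster()
    cluster.sloppy_write("k", "old", w=3)
    cluster.down.add("r1")
    cluster.sloppy_write("k", "new", w=2)    # acked by r0 plus a hint meant for r1
    cluster.down.discard("r1")               # r1 is back, but the hint is undelivered
    stale = cluster.read("k", read_set=["r1", "r2"])
    cluster.handoff()
    converged = cluster.read("k", read_set=["r1", "r2"])
    return stale, converged
```

Here R = 2 and W = 2, so R + W = 4 > 3, yet the first read returns the old version: one of the two acks came from a node outside the read set, so the overlap argument no longer applies until `handoff` moves the write onto `r1`.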