Distributed Systems DIST · 03 · 10

Quorums: build a tunable-consistency store

Hands-on project: build a tiny N=3 replicated key-value store with tunable R/W, then empirically demonstrate the overlap guarantee, a W=1 lost write, and a sloppy-quorum stale read.

DIST Senior ◷ 240 min

Reading that R + W > N forces overlap is not the same as watching a stale read appear the instant you tune below it. Build a small replicated store with R and W as dials, then make the guarantee — and each way it breaks — happen on demand, with evidence.
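Before building anything, the overlap claim itself can be checked by brute force. The sketch below enumerates every possible write set and read set for a small cluster and confirms that they always share a replica exactly when R + W > N. The function name `always_overlap` is illustrative, not part of the lesson.

```python
from itertools import combinations


def always_overlap(n: int, r: int, w: int) -> bool:
    """True if every R-sized read set shares a replica with every W-sized write set."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)  # empty intersection is falsy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# For N=3, overlap is guaranteed exactly when R + W > 3.
for r in range(1, 4):
    for w in range(1, 4):
        assert always_overlap(3, r, w) == (r + w > 3)
```

At R=2, W=2 any two-replica read set must include one of the two replicas that took the write. At R=1, W=2 the single replica read may be the one that was skipped.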

Goal

Turn the overlap invariant into something you can reproduce: implement quorum reads and writes over N=3 simulated replicas, show that R + W > N returns the latest acknowledged write, then deliberately induce a W=1 lost write and a sloppy-quorum stale read and capture both as test evidence.
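One possible starting point is sketched below, under simplifying assumptions: replicas are in-memory objects, a single coordinator assigns versions from a counter, and a `targets` argument (an invention of this sketch) lets a test choose which replicas a request reaches so that lagging replicas can be simulated deterministically. The names `Replica` and `QuorumStore` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)  # key -> (version, value)


class QuorumStore:
    def __init__(self, n: int = 3):
        self.replicas = [Replica(f"r{i}") for i in range(n)]
        self.clock = 0  # one coordinator, so a counter is enough for versions

    def write(self, key, value, w, targets=None):
        """Send to `targets` (default: all replicas); succeed if at least w ack."""
        self.clock += 1
        version = self.clock
        acks = 0
        for rep in (targets or self.replicas):
            if rep.up:
                rep.data[key] = (version, value)
                acks += 1
        if acks < w:
            # Replicas that did ack keep the value: a failed write is not rolled back.
            raise RuntimeError(f"write needs {w} acks, got {acks}")
        return version

    def read(self, key, r, targets=None):
        """Ask `targets` (default: all); use the first r live replies, return the newest."""
        replies = [rep.data.get(key, (0, None))
                   for rep in (targets or self.replicas) if rep.up]
        if len(replies) < r:
            raise RuntimeError(f"read needs {r} replies, got {len(replies)}")
        return max(replies[:r], key=lambda reply: reply[0])


store = QuorumStore()
r0, r1, r2 = store.replicas
store.write("k", "v1", w=3)
store.write("k", "v2", w=2, targets=[r0, r1])                 # r2 lags at v1
assert store.read("k", r=2, targets=[r1, r2]) == (2, "v2")    # R + W = 4: overlap at r1
assert store.read("k", r=1, targets=[r2]) == (1, "v1")        # R + W = 3: stale read
```

Passing explicit targets keeps every scenario repeatable. A version with random replica selection or simulated network delay would be closer to a real store but harder to assert against.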

Project

Objective

Build a minimal N=3 replicated key-value store with per-operation tunable R and W, and produce a test report that empirically demonstrates the R + W > N overlap guarantee plus two of its silent failure modes — a W=1 lost write and a sloppy-quorum stale read.

Requirements

Acceptance criteria

An automated test suite where each scenario (overlap-holds, W=1-lost-write, sub-overlap-drift, sloppy-stale-read, handoff-converges) is a named, repeatable test asserting the exact returned version — not a manual demo.
A short report table: for each (R, W) pair tried, list R + W, whether R + W > 3, and the observed result (latest / possibly-stale / lost), matching theory to observation.
The W=1 lost-write test reliably reproduces data loss of an acknowledged write, proving the failure is a configuration consequence and not a bug in your store.
A one-paragraph write-up explaining, for each failure scenario, which guarantee was forfeited and the exact arithmetic (R + W vs N) or membership change (hint-holder outside read set) that caused it.
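Two of the named scenarios could look roughly like the following. This is a sketch, not a reference suite: it assumes replicas modelled as plain dictionaries, versions passed in by the test, and pytest-style `test_` function names. The helper names are hypothetical.

```python
N = 3


def fresh_replicas():
    # Each replica: {"up": bool, "data": {key: (version, value)}}
    return [{"up": True, "data": {}} for _ in range(N)]


def write(replicas, targets, key, version, value, w):
    """Apply the write on the live replicas in `targets`; True means 'acknowledged'."""
    acks = 0
    for i in targets:
        if replicas[i]["up"]:
            replicas[i]["data"][key] = (version, value)
            acks += 1
    return acks >= w


def read(replicas, targets, key, r):
    """Return the newest (version, value) among the first r live replies."""
    replies = [replicas[i]["data"].get(key, (0, None))
               for i in targets if replicas[i]["up"]]
    if len(replies) < r:
        raise RuntimeError("not enough live replicas")
    return max(replies[:r], key=lambda reply: reply[0])


def test_overlap_holds():
    reps = fresh_replicas()
    assert write(reps, [0, 1, 2], "k", 1, "old", w=3)
    assert write(reps, [0, 1], "k", 2, "new", w=2)   # replica 2 lags
    reps[0]["up"] = False                            # lose one replica that took the write
    # R=2, W=2: R + W = 4 > 3, so the survivors still expose version 2.
    assert read(reps, [0, 1, 2], "k", r=2) == (2, "new")


def test_w1_lost_write():
    reps = fresh_replicas()
    assert write(reps, [0, 1, 2], "k", 1, "old", w=3)
    assert write(reps, [0], "k", 2, "new", w=1)      # acknowledged by a single replica
    reps[0]["up"] = False                            # that lone replica dies
    # The acknowledged version 2 is gone; even an R=2 read returns version 1.
    assert read(reps, [0, 1, 2], "k", r=2) == (1, "old")


if __name__ == "__main__":
    test_overlap_holds()
    test_w1_lost_write()
```

Asserting the exact `(version, value)` pair, rather than only "some value came back", is what lets each test distinguish latest, stale, and lost.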

Senior stretch

Add read repair: when a quorum read sees disagreeing versions, push the newest to the stale replicas inline, and add a test showing the next read of the same key is consistent even at R=1.
Add a quorum-read latency probe: make one replica slow and show that a QUORUM (R=2) read's latency tracks the second-fastest replica, then implement request hedging (fire a duplicate after a delay, take the first) and show the p99 drop.
Add concurrent writers to the same key at W=2 and show overlap does NOT order them — both 'succeed' and you get sibling versions / last-write-wins — illustrating that overlap guarantees visibility, not serialisation.
Map your dials to a real store: write the equivalent Cassandra consistency levels (ONE / QUORUM / ALL) or DynamoDB ConsistentRead flags for each scenario, and note where managed defaults would have hidden the bug.
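For the read-repair stretch item, a minimal sketch might look like this. It assumes replicas are plain dictionaries mapping key to `(version, value)`, and that repair only touches replicas the read actually contacted. The function name `quorum_read` and its `repair` flag are illustrative.

```python
def quorum_read(replicas, targets, key, repair=False):
    """Read `key` from the replicas at indexes `targets` (R = len(targets)).

    With repair=True, contacted replicas that returned an older version are
    overwritten with the newest one before the read returns (inline read repair).
    """
    replies = {i: replicas[i].get(key, (0, None)) for i in targets}
    newest = max(replies.values(), key=lambda record: record[0])
    if repair:
        for i, record in replies.items():
            if record[0] < newest[0]:
                replicas[i][key] = newest
    return newest


# Replica 2 missed a W=2 write of version 2.
replicas = [{"k": (2, "new")}, {"k": (2, "new")}, {"k": (1, "old")}]

assert quorum_read(replicas, [2], "k") == (1, "old")                  # R=1 is stale before repair
assert quorum_read(replicas, [1, 2], "k", repair=True) == (2, "new")  # R=2 read repairs replica 2
assert quorum_read(replicas, [2], "k") == (2, "new")                  # R=1 is now consistent
```

Note that the repair only helps replicas the quorum read happened to contact. A stale replica outside the read set stays stale until some later read or a background process reaches it.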

Recap

This is the experiment that turns the overlap invariant from a formula into intuition. You implement N=3 with tunable R and W and watch R + W > N reliably serve the latest write across any single-node failure. Then you tune below it and watch the silent failures appear on cue: a W=1 ack lost when its lone replica dies, a sub-overlap read picking the lagging replica, and a sloppy-quorum write hiding on a hint-holder outside the read set until hinted handoff converges it. Building it once, with the arithmetic mapped to the observed result, makes you the engineer who picks R and W deliberately instead of discovering the gap through a 3 a.m. stale-data ticket.
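The sloppy-quorum case is the least intuitive of the three, so here is a hedged sketch of it. It assumes three home replicas A, B, C for every key and a single fallback node whose hints count toward W. The class `SloppyCluster` and its method names are hypothetical, and real stores differ in how they select fallbacks and replay hints.

```python
class SloppyCluster:
    """Three home replicas (A, B, C) plus a fallback node that only holds hints."""

    def __init__(self):
        self.home = {name: {} for name in ["A", "B", "C"]}  # name -> {key: (version, value)}
        self.hints = []  # (intended_home, key, record), stored outside the home set

    def write(self, key, record, w, reachable, hinted_for=()):
        """Sloppy quorum: hinted copies on the fallback node count toward W."""
        for name in reachable:
            self.home[name][key] = record
        for name in hinted_for:
            self.hints.append((name, key, record))
        return len(reachable) + len(hinted_for) >= w

    def read(self, key, targets):
        """Read from home replicas only; R is len(targets)."""
        return max((self.home[name].get(key, (0, None)) for name in targets),
                   key=lambda record: record[0])

    def hinted_handoff(self):
        """Deliver each hint to its intended home replica, keeping the newer version."""
        while self.hints:
            name, key, record = self.hints.pop(0)
            if record[0] > self.home[name].get(key, (0, None))[0]:
                self.home[name][key] = record


cluster = SloppyCluster()
cluster.write("k", (1, "old"), w=3, reachable=["A", "B", "C"])
# B and C are unreachable: A takes the write and the fallback holds a hint for B.
assert cluster.write("k", (2, "new"), w=2, reachable=["A"], hinted_for=["B"])
# R=2 read served by B and C: R + W = 4 > 3 on paper, yet the result is stale.
assert cluster.read("k", targets=["B", "C"]) == (1, "old")
cluster.hinted_handoff()
assert cluster.read("k", targets=["B", "C"]) == (2, "new")
```

The arithmetic did not change between the stale read and the correct one. What changed is membership: the second ack lived on a node outside the read set until handoff moved it home.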
