Distributed Systems
CAP in practice: build and partition a replicated store
Reading that ‘a partition forces a CP-or-AP choice’ is not the same as watching your own writes diverge and reconciling them by hand. Build a tiny replicated key-value store, sever the link between replicas, and make the same system behave as CP and then as AP — with the request log as your evidence.
Turn the unit’s mental model into something you can run: prove that P is not optional, that the CP and AP stances are observable behaviours, that strong consistency costs latency even when healthy (PACELC), and that AP’s conflict-resolution tax is real.
Build a 3-node replicated key-value store you can run locally, inject a controllable network partition between replicas, and demonstrate — with logged reads, writes, and latencies — both a CP configuration and an AP configuration on the same code, then reconcile the AP divergence correctly.
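As a starting point, the sketch below models the three replicas and the severed link inside one process, with no real networking. Every name in it (`Cluster`, `Node`, `Unavailable`, `partition`, `siblings`, `demo`) is a hypothetical choice for illustration, and the versioning is a bare counter rather than anything production-grade. It assumes a simple majority quorum: in CP mode a request fails unless the origin can reach a majority, while in AP mode it is served by whatever replicas are reachable.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised in CP mode when the origin cannot reach a majority."""


@dataclass
class Node:
    name: str
    data: dict = field(default_factory=dict)  # key -> (version, value)


class Cluster:
    def __init__(self, names, mode="CP"):
        self.nodes = {n: Node(n) for n in names}
        self.mode = mode   # "CP" or "AP" (hypothetical knob)
        self.cut = set()   # pairs of nodes that cannot talk to each other
        self.log = []      # request log: the evidence for the write-up

    def partition(self, side_a, side_b):
        """Sever every link between the two sides."""
        for a in side_a:
            for b in side_b:
                self.cut.add(frozenset((a, b)))

    def heal(self):
        self.cut.clear()

    def reachable(self, origin):
        return [n for n in self.nodes
                if n == origin or frozenset((origin, n)) not in self.cut]

    def _peers_or_fail(self, origin, op, key):
        peers = self.reachable(origin)
        majority = len(self.nodes) // 2 + 1
        if self.mode == "CP" and len(peers) < majority:
            self.log.append((origin, op, key, "UNAVAILABLE"))
            raise Unavailable(f"{origin} reaches only {len(peers)} node(s)")
        return peers

    def write(self, origin, key, value):
        peers = self._peers_or_fail(origin, "write", key)
        newest = max(self.nodes[p].data.get(key, (0, None))[0] for p in peers)
        for p in peers:
            self.nodes[p].data[key] = (newest + 1, value)
        self.log.append((origin, "write", key, "OK"))

    def read(self, origin, key):
        peers = self._peers_or_fail(origin, "read", key)
        found = [self.nodes[p].data.get(key, (0, None)) for p in peers]
        value = max(found, key=lambda vv: vv[0])[1]
        self.log.append((origin, "read", key, "OK"))
        return value

    def siblings(self, key):
        """Distinct values held at the highest version; more than one means divergence."""
        held = [n.data[key] for n in self.nodes.values() if key in n.data]
        top = max(version for version, _ in held)
        return {value for version, value in held if version == top}


def demo(mode):
    """Write, split {A, B} from {C}, write on both sides, then heal."""
    c = Cluster(["A", "B", "C"], mode)
    c.write("A", "x", "v0")
    c.partition({"A", "B"}, {"C"})
    c.write("A", "x", "majority")
    try:
        c.write("C", "x", "minority")
    except Unavailable:
        pass  # expected in CP mode: the minority side refuses the write
    c.heal()
    return c


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        c = demo(mode)
        print(mode, sorted(c.siblings("x")), c.log)
```

Running `demo` in both modes gives a first pair of request logs to compare. A real submission would replace the in-memory `cut` set with an actual network fault (dropped packets or a proxy), which this sketch does not attempt.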
- A request log (or transcript) showing the CP run: during the partition the minority side returns errors or times out, the majority side keeps serving, and no stale or divergent read is ever observed.
- A request log showing the AP run: both sides accept writes during the partition, producing two divergent values for the same key — captured, not described.
- A reconciliation transcript proving that concurrent writes are merged (vector-clock siblings or a CRDT merge), with an explicit demonstration that naive timestamp-based last-write-wins (LWW) would have dropped one of them.
- A latency table: healthy-state p50 for the strong setting (PACELC's EC: consistency chosen when there is no partition) vs the weak setting (EL: latency chosen), showing the consistency-for-latency trade PACELC predicts.
- The README correctly attributing each behaviour to the CAP/PACELC mechanism that caused it.
- Add an asymmetric partition (A reaches B, but B's replies to A are dropped) and show how it differs from a clean split — e.g. a node that keeps trying to act on a one-way view.
- Inject a logical partition: pause one replica's process (SIGSTOP) for longer than the heartbeat timeout and show peers treating a healthy-but-slow node as failed, triggering a re-election or eviction.
- Add a quorum-leader (Raft-style) mode with pre-vote and a tunable election timeout, and demonstrate how a too-low timeout under jittery latency manufactures false elections.
- Expose per-request consistency (like Cassandra's CONSISTENCY level) and show a single client sliding the same key between CP-ish and AP-ish behaviour query by query.
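For the reconciliation transcript, one possible shape is sketched here: vector clocks detect that two writes were concurrent, the application merges the resulting siblings, and a timestamp-based LWW picker is run on the same writes for contrast. The helper names (`descends`, `reconcile`, `merge_clocks`, `lww`), the shopping-cart values, and the timestamps are all made up for illustration. The set-union merge is an assumption about the data type, so substitute whatever merge is correct for your values.

```python
def descends(a, b):
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def reconcile(versions):
    """Drop versions strictly dominated by another; survivors are siblings.

    versions is a list of (vector_clock, value) pairs.
    """
    survivors = []
    for clock, value in versions:
        dominated = any(
            descends(other, clock) and not descends(clock, other)
            for other, _ in versions
        )
        if not dominated and (clock, value) not in survivors:
            survivors.append((clock, value))
    return survivors


def merge_clocks(a, b):
    """Pointwise maximum: the clock of a value that has seen both histories."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def lww(timestamped):
    """Naive last-write-wins: keep only the value with the largest timestamp."""
    return max(timestamped, key=lambda tv: tv[0])[1]


# One write before the partition, then one write on each side of it.
base = ({"A": 1}, {"bread"})
left = ({"A": 2}, {"bread", "milk"})           # accepted by A during the split
right = ({"A": 1, "C": 1}, {"bread", "eggs"})  # accepted by C during the split

siblings = reconcile([base, left, right])  # base is dominated; left and right survive

merged_clock, merged_value = siblings[0]
for clock, value in siblings[1:]:
    merged_clock = merge_clocks(merged_clock, clock)
    merged_value = merged_value | value  # application-level merge: set union

# The same two partition-era writes under wall-clock LWW (hypothetical timestamps).
lww_value = lww([(100.0, left[1]), (100.5, right[1])])

if __name__ == "__main__":
    print("siblings:", siblings)
    print("merged:  ", merged_clock, merged_value)
    print("LWW kept:", lww_value)
```

The merged value contains both `milk` and `eggs`, while the LWW result contains only the later write's items. Printing both side by side is the kind of evidence the reconciliation deliverable asks for.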
This is the loop behind every real distributed-systems design review: P is not a choice, so you decide CP-or-AP per partition by setting quorum and consistency knobs; you prove the stance with observed reads and writes, not adjectives; you pay PACELC’s latency tax whenever you demand consistency in the healthy state; and AP’s reconciliation is only safe with real merge logic, never wall-clock LWW. Building it once on a toy store turns the theorem into instinct.
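The latency half of that loop can be reasoned about before measuring anything. In the sketch below, a write that waits for `w` acknowledgements completes at the `w`-th fastest replica round trip, so a stricter consistency level can never be faster on the same sample. The round-trip times are synthetic, drawn from a seeded log-normal distribution purely as an assumption; the level names and the functions `ack_latency` and `p50_table` are hypothetical. Feed it the latencies from your own request log to produce the real table.

```python
import random
import statistics


def ack_latency(rtts_ms, acks_required):
    """Latency of one write: the time at which the w-th fastest replica has acked."""
    return sorted(rtts_ms)[acks_required - 1]


def p50_table(samples, levels):
    """Median write latency per consistency level.

    samples is a list of per-request replica round-trip times (ms);
    levels maps a label to the number of acks that level waits for.
    """
    return {
        label: statistics.median(ack_latency(s, w) for s in samples)
        for label, w in levels.items()
    }


# Synthetic healthy-state round trips for 3 replicas (assumed distribution).
rng = random.Random(7)
samples = [[rng.lognormvariate(1.0, 0.5) for _ in range(3)] for _ in range(1000)]

levels = {"ONE (EL)": 1, "QUORUM (EC)": 2, "ALL (EC)": 3}
table = p50_table(samples, levels)

if __name__ == "__main__":
    for label, p50 in table.items():
        print(f"{label:12s} p50 = {p50:.2f} ms")
```

Because each request's latency is an order statistic of the same three round trips, the p50 values are guaranteed to be non-decreasing from `ONE` to `ALL`. The size of the gap on real hardware is what the latency table is meant to show.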