Distributed Systems DIST · 01 · 10

CAP in practice: build and partition a replicated store

Hands-on project — build a small replicated KV store, inject a partition, and demonstrate CP and AP behaviour from the same code, proving each stance with observed reads, writes, and reconciliation.

DIST Senior ◷ 240 min

Reading that ‘a partition forces a CP-or-AP choice’ is not the same as watching your own writes diverge and reconciling them by hand. Build a tiny replicated key-value store, sever the link between replicas, and make the same system behave as CP and then as AP — with the request log as your evidence.

Goal

Turn the unit’s mental model into something you can run: prove that P is not optional, that the CP and AP stances are observable behaviours, that strong consistency costs latency even when healthy (PACELC), and that AP’s conflict-resolution tax is real.
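A rough sketch of the PACELC latency tax in the healthy state. It assumes a coordinator that sends every write to all three replicas and returns after W acknowledgements, so the operation's latency is the W-th fastest replica response. The lognormal latency model and its parameters are assumptions for illustration, not measurements. The EC labels also assume reads use a matching level so that R + W > N.

```python
import random
import statistics

N_REPLICAS = 3


def op_latency(replica_latencies_ms, acks_needed):
    """The coordinator finishes when the acks_needed-th fastest replica answers."""
    return sorted(replica_latencies_ms)[acks_needed - 1]


def sample_replica_latencies(rng):
    # Assumed latency model: independent lognormal per-replica round trips, in ms.
    return [rng.lognormvariate(1.0, 0.6) for _ in range(N_REPLICAS)]


def p50_by_acks(trials=10_000, seed=7):
    """Median operation latency for each possible ack count W = 1..N."""
    rng = random.Random(seed)
    samples = {w: [] for w in range(1, N_REPLICAS + 1)}
    for _ in range(trials):
        latencies = sample_replica_latencies(rng)
        for w, bucket in samples.items():
            bucket.append(op_latency(latencies, w))
    return {w: statistics.median(bucket) for w, bucket in samples.items()}


if __name__ == "__main__":
    labels = {1: "W=1 ONE (EL)", 2: "W=2 QUORUM (EC)", 3: "W=3 ALL (EC)"}
    for w, p50 in p50_by_acks().items():
        print(f"{labels[w]:<18} p50 = {p50:5.2f} ms")
```

The W-th fastest of the same three samples can never beat the fastest, so p50 can only rise with W. A measured table from the real store should show the same ordering, with real numbers in place of the model.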

Project

Objective

Build a 3-node replicated key-value store you can run locally, inject a controllable network partition between replicas, and demonstrate — with logged reads, writes, and latencies — both a CP configuration and an AP configuration on the same code, then reconcile the AP divergence correctly.
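Here is a minimal in-process sketch of the objective. It assumes a single Python process stands in for the three nodes and models a partition as a set of severed links. The names `Cluster`, `partition`, `run`, and `Unavailable` are hypothetical, not part of the project spec. The same `put`/`get` code runs in both modes. CP refuses when the coordinator cannot reach a majority, and AP accepts whatever replicas it can reach.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised in CP mode when the coordinator cannot reach a majority."""


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)  # key -> (version, value)


class Cluster:
    def __init__(self, names=("A", "B", "C"), mode="CP"):
        self.replicas = {n: Replica(n) for n in names}
        self.mode = mode                  # "CP" or "AP": the only difference between runs
        self.quorum = len(names) // 2 + 1
        self.cut = set()                  # severed links, stored as frozenset({x, y})
        self.log = []                     # request log: the evidence for the README

    def partition(self, side_a, side_b):
        for a in side_a:
            for b in side_b:
                self.cut.add(frozenset((a, b)))

    def heal(self):
        self.cut.clear()

    def reachable(self, coord):
        return [n for n in self.replicas
                if n == coord or frozenset((coord, n)) not in self.cut]

    def _check_quorum(self, op, coord, key, peers):
        if self.mode == "CP" and len(peers) < self.quorum:
            self.log.append((op, coord, key, "UNAVAILABLE"))
            raise Unavailable(f"{coord} reaches {len(peers)}/{len(self.replicas)} replicas")

    def put(self, coord, key, value):
        peers = self.reachable(coord)
        self._check_quorum("PUT", coord, key, peers)
        # Next version = 1 + highest version among the replicas this side can see.
        version = 1 + max(self.replicas[p].data.get(key, (0, None))[0] for p in peers)
        for p in peers:
            self.replicas[p].data[key] = (version, value)
        self.log.append(("PUT", coord, key, f"{value!r} v{version} on {peers}"))

    def get(self, coord, key):
        peers = self.reachable(coord)
        self._check_quorum("GET", coord, key, peers)
        version, value = max((self.replicas[p].data.get(key, (0, None)) for p in peers),
                             key=lambda vv: vv[0])
        self.log.append(("GET", coord, key, f"{value!r} v{version}"))
        return value


def run(mode):
    """Write, cut {A} off from {B, C}, then write and read on both sides."""
    c = Cluster(mode=mode)
    c.put("A", "x", "v0")
    c.partition({"A"}, {"B", "C"})
    for coord in ("A", "B"):
        try:
            c.put(coord, "x", f"from-{coord}")
        except Unavailable:
            pass
    results = {}
    for coord in ("A", "B"):
        try:
            results[coord] = c.get(coord, "x")
        except Unavailable:
            results[coord] = "UNAVAILABLE"
    return c, results


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        cluster, results = run(mode)
        print(mode, results)
        for entry in cluster.log:
            print("   ", entry)
```

In the CP run, A (the minority) refuses both its write and its read, while B and C keep serving. In the AP run, both sides stamp their write `v2`. A plain counter therefore cannot tell the two writes apart, and the reconciliation criterion asks you to close exactly that gap.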

Requirements

Acceptance criteria

A request log (or transcript) of the CP run: during the partition, requests to the minority side return errors or time out, the majority side keeps serving, and no stale or divergent read is ever observed.
A request log showing the AP run: both sides accept writes during the partition, producing two divergent values for the same key — captured, not described.
A reconciliation transcript proving that concurrent writes are merged (as vector-clock siblings or by a CRDT merge), with an explicit demonstration that naive timestamp-based last-writer-wins (LWW) would have dropped one of them.
A latency table: healthy-state p50 for strong (EC) vs weak (EL) settings, showing the consistency-for-latency trade PACELC predicts.
A README that correctly attributes each behaviour to the CAP or PACELC mechanism that caused it.
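The following sketches the reconciliation criterion. It assumes per-key vector clocks keyed by node name, and a value that is a set of items (a shopping cart, say) so that the application-level merge is a union. The helper names (`vc_increment`, `siblings`, `lww`, `merge_siblings`) are hypothetical. The sketch replays two partition-era writes that descend from the same base. Neither clock dominates the other, so both survive as siblings, while LWW on wall-clock timestamps silently keeps only one.

```python
from dataclasses import dataclass


def vc_increment(clock, node):
    """Return a copy of `clock` with `node`'s counter bumped (a local write at `node`)."""
    out = dict(clock)
    out[node] = out.get(node, 0) + 1
    return out


def vc_descends(a, b):
    """True if clock `a` has seen every event in clock `b` (a >= b pointwise)."""
    return all(a.get(n, 0) >= count for n, count in b.items())


def vc_merge(a, b):
    """Pointwise maximum: the smallest clock that descends from both."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


@dataclass
class Version:
    items: frozenset  # the value: a set of items, so a union is a sensible merge
    clock: dict       # vector clock, node name -> counter
    wall_ms: int      # wall-clock timestamp, used only by the LWW comparison


def siblings(versions):
    """Drop every version strictly dominated by another; the rest are concurrent siblings."""
    return [v for v in versions
            if not any(o.clock != v.clock and vc_descends(o.clock, v.clock) for o in versions)]


def lww(versions):
    """Naive last-writer-wins: keep the highest wall-clock timestamp, discard the rest."""
    return max(versions, key=lambda v: v.wall_ms)


def merge_siblings(sibs, coordinator, wall_ms):
    """Application merge (set union), written back with a clock that descends from every sibling."""
    items = frozenset().union(*(s.items for s in sibs))
    clock = {}
    for s in sibs:
        clock = vc_merge(clock, s.clock)
    return Version(items, vc_increment(clock, coordinator), wall_ms)


if __name__ == "__main__":
    base = Version(frozenset({"milk"}), {"A": 1}, wall_ms=900)
    # During the partition, A and B each update the same key independently.
    on_a = Version(base.items | {"apple"}, vc_increment(base.clock, "A"), wall_ms=1000)
    on_b = Version(base.items | {"pear"}, vc_increment(base.clock, "B"), wall_ms=1005)

    sibs = siblings([base, on_a, on_b])
    print("siblings:", [sorted(s.items) for s in sibs])
    print("LWW keeps:", sorted(lww(sibs).items))  # "apple" is silently lost
    merged = merge_siblings(sibs, "A", wall_ms=1010)
    print("merged:", sorted(merged.items), merged.clock)
    print("after merge:", [sorted(s.items) for s in siblings([on_a, on_b, merged])])
```

The merged version's clock descends from both siblings, so a later `siblings` call collapses the set back to one value. LWW reaches a single value too, but only by discarding a write the client was told had succeeded.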

Senior stretch

Add an asymmetric partition (A reaches B, but B's replies to A are dropped) and show how it differs from a clean split, for example a node that keeps acting on a one-way view of the cluster.
Inject a logical partition: pause one replica's process (SIGSTOP) for longer than the heartbeat timeout and show peers treating a healthy-but-slow node as failed, triggering a re-election or eviction.
Add a quorum-leader (Raft-style) mode with pre-vote and a tunable election timeout, and demonstrate how a too-low timeout under jittery latency manufactures false elections.
Expose per-request consistency (like Cassandra's CONSISTENCY level) and show a single client sliding the same key between CP-ish and AP-ish behaviour query by query.
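For the per-request consistency stretch goal, here is a sketch loosely modelled on Cassandra's ONE/QUORUM/ALL levels. The `CL` enum, the `Store` class, and the fixed `nearest_first` contact order are hypothetical and are not Cassandra's driver API. The sketch also assumes a single writing client numbers its own versions, and that no read repair or anti-entropy runs. The rule it exercises is quorum overlap: a read is guaranteed to see the latest acknowledged write only when R + W > N.

```python
from enum import Enum


class CL(Enum):
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def acks_needed(cl, n):
    return {CL.ONE: 1, CL.QUORUM: n // 2 + 1, CL.ALL: n}[cl]


def overlaps(read_cl, write_cl, n):
    """R + W > N guarantees every read set intersects the latest write set."""
    return acks_needed(read_cl, n) + acks_needed(write_cl, n) > n


class Store:
    def __init__(self, n=3):
        self.replicas = [{} for _ in range(n)]  # key -> (version, value)
        self.reachable = set(range(n))          # replicas this client can currently reach
        self.next_version = 1                   # assumption: one client orders its own writes

    def write(self, key, value, cl):
        n = len(self.replicas)
        if len(self.reachable) < acks_needed(cl, n):
            raise TimeoutError(f"write@{cl.name}: {len(self.reachable)}/{n} replicas reachable")
        version, self.next_version = self.next_version, self.next_version + 1
        for i in self.reachable:  # send to every reachable replica, as a coordinator would
            self.replicas[i][key] = (version, value)

    def read(self, key, cl, order):
        """Contact the first acks_needed(cl) reachable replicas in `order`; newest version wins."""
        n = len(self.replicas)
        contacted = [i for i in order if i in self.reachable][:acks_needed(cl, n)]
        if len(contacted) < acks_needed(cl, n):
            raise TimeoutError(f"read@{cl.name}: {len(contacted)}/{n} replicas reachable")
        return max((self.replicas[i].get(key, (0, None)) for i in contacted),
                   key=lambda vv: vv[0])[1]


def demo():
    s = Store()
    s.write("x", "old", CL.ALL)           # version 1 on all three replicas
    s.reachable = {0}                     # partition: client only reaches replica 0
    outcomes = {}
    try:
        s.write("x", "new", CL.QUORUM)
        outcomes["write QUORUM"] = "ok"
    except TimeoutError:
        outcomes["write QUORUM"] = "timeout"  # CP-ish: refuse rather than diverge
    s.write("x", "new", CL.ONE)           # AP-ish: accepted on replica 0 only
    outcomes["write ONE"] = "ok"
    s.reachable = {0, 1, 2}               # healed, but nothing has repaired replicas 1 and 2
    nearest_first = [2, 1, 0]             # hypothetical contact order
    outcomes["read ONE"] = s.read("x", CL.ONE, nearest_first)  # R + W = 2: may be stale
    outcomes["read ALL"] = s.read("x", CL.ALL, nearest_first)  # R + W = 4 > 3: sees the write
    return outcomes


if __name__ == "__main__":
    for step, outcome in demo().items():
        print(f"{step:<13} -> {outcome}")
```

The same key answers `old` at ONE and `new` at ALL because only the ALL read is forced to overlap the ONE write. Changing the level per query moves that single read between the AP-ish and CP-ish side of the trade.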

Recap

This is the loop behind every real distributed-systems design review: P is not a choice, so you decide CP-or-AP per partition by setting quorum and consistency knobs; you prove the stance with observed reads and writes, not adjectives; you pay PACELC’s latency tax whenever you demand consistency in the healthy state; and AP’s reconciliation is only safe with real merge logic, never wall-clock LWW. Building it once on a toy store turns the theorem into instinct. Now when you review a system’s consistency claims, you can ask for the request log — and you know exactly what to look for.
