Distributed Systems DIST · 01 · 01

CAP in practice: consistency vs availability during inevitable partitions

Network partitions are inevitable. The only choice you get is how to trade consistency (correctness) against availability (responsiveness) while a partition lasts. PACELC extends the analysis to the healthy state, where the trade-off is latency against consistency.


An undersea fiber cable between Virginia and Dublin is severed by a deep-sea trawler. Cross-Atlantic round-trip latency jumps from 67 ms to a black hole of 100% packet loss. In Virginia, database replicas keep accepting customer orders and debiting account balances. In Dublin, users who click “Buy Now” either stare at a spinning loader until the request times out with an HTTP 504, or successfully place orders against stale, pre-partition balances, which causes massive overdrafts. The on-call engineers did not choose to have a partition; physics forced it on them. Their only real choice was how their code behaved during the three hours of network silence that followed.

In the next fifteen minutes you will see why “pick two” is a dangerous myth, how the formal definitions of C and A differ from marketing copy, and what PACELC adds for the far more common case in which the network is healthy.

The Myth of “Pick Two”

For more than two decades, the CAP theorem has been taught as a simple menu: pick any two of Consistency (C), Availability (A), and Partition Tolerance (P). This is a dangerous simplification.

In a physical network, you do not get to choose “Partition Tolerance”. The network is a shared, imperfect medium: fibers are cut, switches suffer bufferbloat, BGP routes flap, and virtualized network interfaces stall during hypervisor live migrations. Because partitions will happen whether or not you plan for them, P is mandatory.

Therefore, the only real choice is binary and only arises when a partition occurs:

Choose Consistency (CP): Deny the request or return an error to ensure that no stale or conflicting data is ever read or written. Correctness is absolute.
Choose Availability (AP): Accept the request and return whatever local data is available, even if it is stale or will conflict with another partition. Responsiveness is absolute.

P is mandatory. During a partition the only real choice is per-side: CP returns an error to stay linearizable; AP serves stale local data to stay responsive.

The binary choice in full: CP buys correctness by returning an error; AP buys responsiveness by serving stale data and paying the conflict-resolution tax later.
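To make the per-side choice concrete, here is a minimal Python sketch of a single replica that enforces either policy while it is partitioned. The `Replica` class, its `mode` switch, and the simple majority check are hypothetical illustrations, not any real database's API. Real CP systems also replicate every write through a consensus protocol, and this sketch omits that step.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a CP replica refuses to answer (an API might map it to HTTP 503)."""


@dataclass
class Replica:
    mode: str              # "CP" or "AP": a hypothetical per-replica policy switch
    cluster_size: int      # total replicas in the cluster
    reachable_peers: int   # other replicas this one can currently talk to
    data: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # AP writes accepted while partitioned

    def has_quorum(self) -> bool:
        # This replica plus the peers it can reach must form a strict majority.
        return 1 + self.reachable_peers > self.cluster_size // 2

    def read(self, key):
        if self.mode == "CP" and not self.has_quorum():
            raise Unavailable("no quorum: refusing a possibly stale read")
        return self.data.get(key)  # AP without quorum: local value, possibly stale

    def write(self, key, value):
        if self.mode == "CP" and not self.has_quorum():
            raise Unavailable("no quorum: refusing a write that could conflict")
        self.data[key] = value  # replication to peers is omitted in this sketch
        if not self.has_quorum():
            # AP: remember the write so it can be reconciled when the partition heals.
            self.pending.append((key, value))


# Dublin's replica is cut off from both peers in a three-node cluster.
cp = Replica("CP", cluster_size=3, reachable_peers=0, data={"balance": 100})
ap = Replica("AP", cluster_size=3, reachable_peers=0, data={"balance": 100})

try:
    cp.read("balance")
except Unavailable as err:
    print("CP:", err)
print("AP:", ap.read("balance"))  # 100, even if Virginia has since debited it
```

The CP replica pays for correctness with an error. The AP replica stays responsive, but every write it accepts lands in `pending` and becomes a conflict to resolve later.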

Formal Calibration: What “C” and “A” Actually Mean

To make sound architectural decisions, a senior engineer must look past the acronyms and understand the exact mathematical definitions established by Seth Gilbert and Nancy Lynch in their formal proof of Brewer’s conjecture:

Consistency (C) is Linearizability: This is a very strict safety guarantee. It requires that all reads and writes appear to take effect in a single global order that respects real time: if one operation completes before another begins, the first must come first in that order. Concretely, the moment a write completes successfully, any subsequent read, no matter where in the world it is executed, must return that value or a newer one. No reader can observe stale state.
Availability (A) means every request gets a response: Every request received by a non-failing node must eventually result in a non-error response. The formal definition places no bound on how long that response may take, but the node cannot block forever, and it cannot answer with an error (such as a database timeout or an HTTP 503 Service Unavailable). Returning stale data counts as fully “available” in the CAP sense.
Partition Tolerance (P): The system must keep providing its stated guarantees even when the network drops arbitrarily many messages between nodes.
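The real-time condition in the linearizability definition can be checked mechanically for a single register. The Python sketch below uses a hypothetical helper, `stale_reads`, and it is not a full linearizability checker: tools such as Jepsen's Knossos search for a complete valid ordering. This helper only flags reads that returned a version older than a write that had already completed before the read began. It assumes every write installs a strictly higher version number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str      # "write" or "read"
    version: int   # version installed by a write, or observed by a read
    start: float   # real time the client invoked the operation
    end: float     # real time the client received the response


def stale_reads(history: list[Op]) -> list[Op]:
    """Flag reads that returned a version older than a write which had already
    completed before the read started. Assumes every write installs a strictly
    higher version than all earlier writes to the same register."""
    writes = [op for op in history if op.kind == "write"]
    violations = []
    for read in (op for op in history if op.kind == "read"):
        finished_before = [w.version for w in writes if w.end < read.start]
        if finished_before and read.version < max(finished_before):
            violations.append(read)
    return violations


history = [
    Op("write", 1, start=0, end=1),
    Op("write", 2, start=4, end=5),  # completes in Virginia at t=5
    Op("read", 1, start=6, end=7),   # a lagging Dublin replica returns version 1
    Op("read", 2, start=6, end=8),
]
print(stale_reads(history))  # [Op(kind='read', version=1, start=6, end=7)]
```

A read that overlaps a write in time may legitimately return either the old or the new value. For that reason, the helper only compares each read against writes that finished before the read started.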

Under this formal definition, many databases marketed as “highly available” are actually CP systems. Take a Raft consensus cluster such as etcd or Consul. Suppose a partition cuts the leader (and possibly a few followers) off from the majority. The majority side elects a new leader and keeps serving. Every node on the minority side, including the old leader, refuses writes and linearizable reads because it can no longer reach a quorum. Those minority nodes choose correctness over responsiveness; they are CP.
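The majority arithmetic behind that behavior fits in a few lines. This Python sketch assumes a fixed membership and plain majority quorums, as in Raft. It ignores membership changes and the time the majority side needs to notice the partition and elect a new leader. The function names are illustrative.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def sides_that_can_commit(group_sizes: list[int]) -> list[bool]:
    """For each side of a partition, can it elect a leader and commit writes?

    Under majority quorums only a side holding a strict majority of the whole
    cluster can make progress; every other side must refuse writes and
    linearizable reads to stay CP.
    """
    cluster_size = sum(group_sizes)
    return [size >= majority(cluster_size) for size in group_sizes]


print(sides_that_can_commit([2, 3]))  # [False, True]: the 2-node side stalls
print(sides_that_can_commit([2, 2]))  # [False, False]: an even split stalls everyone
```

The second call shows why consensus clusters are usually sized with an odd number of nodes. A four-node cluster tolerates no more failures than a three-node one, and it can split into two halves where neither side can make progress.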

The PACELC Theorem: The Healthy-State Trade-off

While CAP is useful, it has a glaring limitation: it only describes system behavior during a partition. In production, networks are healthy the overwhelming majority of the time, and CAP says nothing about that state.

To address this, Daniel Abadi formulated the PACELC Theorem in 2012. It extends CAP by stating:

Partition (P) → choose Availability (A) or Consistency (C); Else (E) → choose Latency (L) or Consistency (C).

PACELC forces us to evaluate the cost of our consistency models during normal, healthy operations. If you require strong consistency normally (EC), every write or read must undergo synchronous coordination (e.g., waiting for acknowledgments from multiple replicas across datacenters). This directly adds network round-trip times (RTT) to the request latency. If you choose low latency (EL), you allow replicas to be updated asynchronously, meaning you accept eventual consistency during normal operations.
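A back-of-the-envelope model makes the EC cost visible. In the Python sketch below, the 1 ms local write and the 2 ms same-region round trip are assumed numbers. Only the 67 ms Virginia to Dublin RTT comes from the opening story. The model ignores queuing, disk flushes, and retries.

```python
def write_latency_ms(local_ms: float, follower_rtts_ms: list[float], acks_needed: int) -> float:
    """Estimate the latency of one write that waits for `acks_needed` follower acks.

    acks_needed == 0 models an EL design: acknowledge after the local write and
    replicate asynchronously. Followers are contacted in parallel, so a
    synchronous write waits for the k-th fastest round trip.
    """
    if acks_needed == 0:
        return local_ms
    if acks_needed > len(follower_rtts_ms):
        raise ValueError("not enough followers to collect that many acks")
    return local_ms + sorted(follower_rtts_ms)[acks_needed - 1]


# Hypothetical placement: leader in Virginia, one follower in a nearby
# Virginia zone (2 ms RTT), one in Dublin (67 ms RTT, as in the opening story).
followers = [2.0, 67.0]
print(write_latency_ms(1.0, followers, acks_needed=0))  # 1.0  (EL: async replication)
print(write_latency_ms(1.0, followers, acks_needed=1))  # 3.0  (EC: majority of 3)
print(write_latency_ms(1.0, followers, acks_needed=2))  # 68.0 (EC: every replica)
```

A majority quorum of three is cheap here only because two of the replicas share a region. If that nearby follower becomes unreachable, every synchronous write pays the full transatlantic round trip.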

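Database	PACELC Type	During a Partition (PA vs PC)	During Normal Operation (EL vs EC)
Google Spanner	PC/EC	Chooses Consistency. A Paxos group that cannot reach a majority of its replicas stops committing writes.	Chooses Consistency. Writes commit synchronously through Paxos and wait out TrueTime uncertainty (commit wait), adding latency that grows with the distance between replicas.
Amazon DynamoDB	PA/EL	Chooses Availability. With global tables in their default eventually consistent configuration, each region keeps accepting writes, and concurrent conflicting writes are resolved with last-writer-wins.	Chooses Latency. Reads are eventually consistent by default; strongly consistent reads are opt-in (ConsistentRead) and consume more read capacity.
MongoDB	PC/EC (Default)	Chooses Consistency. A primary that loses contact with a majority steps down, so the minority side accepts no writes.	Chooses Consistency. Reads and writes go to the primary by default, but only readConcern "linearizable" guarantees linearizable reads; the default "local" read concern can return data that is later rolled back.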

Production Failure Modes & False Partitions

When you are on call at 3 am, the partition you get is rarely the clean textbook cut. In that textbook case, the nodes split into two groups that talk freely among themselves but not to each other. Ask yourself whether your system handles the following production failure modes, or silently misbehaves:

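Asymmetric Partitions: Traffic flows in only one direction. Node A’s packets reach Node B, but Node B’s packets to Node A are dropped by a faulty switch port. In plain Raft, a follower can stop receiving the leader’s heartbeats while its own messages still reach the other nodes. It then times out, increments its term, and forces the healthy leader to step down, potentially over and over. The Pre-Vote extension prevents this: a node first asks its peers whether they would vote for it, and only increments its term if a majority agrees. Peers that have heard from a leader recently refuse, so a half-disconnected node cannot disrupt a working cluster.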
Logical Partitions (High Latency): The JVM hosting a database node may hit a 10-second stop-the-world garbage collection pause, or a background cron job may saturate the CPU. Either way, the node stops answering heartbeats. To its peers, it is indistinguishable from a partitioned node. If it was the leader, the followers run a costly re-election even though the old leader was healthy and merely busy.
The AP Resolution Tax: If you choose AP (like Cassandra or DynamoDB), you must pay a write-conflict resolution tax. If a user modifies their shopping cart on both sides of a partition, your system must eventually reconcile the divergent states. Simple Last-Write-Wins (LWW) based on wall-clock timestamps is risky: even small clock skew between nodes can make you silently discard legitimate customer writes. The alternatives, conflict-free replicated data types (CRDTs) or vector clocks with application-level merges, add substantial architectural complexity.
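The Python sketch below, using hypothetical timestamps, shows how LWW drops a write. Because the cart is stored as one value, LWW keeps only one side’s version. The 80 ms clock skew also makes it keep the version written earlier in real time. A grow-only set (G-Set), one of the simplest CRDTs, merges by union instead. The type and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: frozenset     # the whole shopping cart
    wall_clock_ms: int   # timestamp taken from the writing node's own clock


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Last-Write-Wins: keep whichever version carries the larger timestamp."""
    return a if a.wall_clock_ms >= b.wall_clock_ms else b


def gset_merge(a: frozenset, b: frozenset) -> frozenset:
    """Grow-only set (G-Set) CRDT: merge is set union, so no addition is lost."""
    return a | b


base = frozenset({"ticket"})
# Virginia adds a book at true time 10_000 ms, but its clock runs 80 ms fast.
virginia = Versioned(base | {"book"}, wall_clock_ms=10_080)
# Dublin adds a lamp 30 ms later in real time, with an accurate clock.
dublin = Versioned(base | {"lamp"}, wall_clock_ms=10_030)

print(sorted(lww_merge(virginia, dublin).value))         # ['book', 'ticket']
print(sorted(gset_merge(virginia.value, dublin.value)))  # ['book', 'lamp', 'ticket']
```

A G-Set cannot express removals. A real cart would need a richer CRDT, such as an observed-remove set (OR-Set), and that is exactly the kind of complexity the AP resolution tax refers to.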

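Together, these modes show that the boundary between “partition” and “healthy slowness” is inherently blurred: a node cannot tell a dead peer from a slow one, only that a timeout expired. Suppose you tune your failure detectors only for clean cuts. Asymmetric and logical partitions will then cost a CP system availability through spurious elections, and cost an AP system extra divergent writes to reconcile, often without ever triggering a clean alert.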
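A fixed-timeout failure detector shows why a GC pause looks exactly like a partition. This Python sketch is hypothetical, and the heartbeat interval and timeouts are assumed values. Production systems often use adaptive detectors instead; Cassandra, for example, uses a phi-accrual failure detector.

```python
def suspected_windows(heartbeats_ms: list[int], timeout_ms: int) -> list[tuple[int, int]]:
    """Return (start, end) windows in which a fixed-timeout failure detector
    would declare the peer dead: the detector fires timeout_ms after the last
    heartbeat and clears when the next one arrives. Assumes sorted input."""
    windows = []
    for prev, nxt in zip(heartbeats_ms, heartbeats_ms[1:]):
        if nxt - prev > timeout_ms:
            windows.append((prev + timeout_ms, nxt))
    return windows


# Heartbeats every 100 ms, then a 10-second stop-the-world GC pause.
beats = [0, 100, 200, 300, 10_300, 10_400]
print(suspected_windows(beats, timeout_ms=1_000))   # [(1300, 10300)]: alive but "dead"
print(suspected_windows(beats, timeout_ms=15_000))  # []: no false alarm, but slow to detect real crashes
```

The two calls illustrate the sizing dilemma. A short timeout turns every long pause into a false partition and a spurious election. A long one leaves the cluster waiting on nodes that really have crashed.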

Why this works

Eric Brewer clarified in 2012 that the classic “choose two” framing is misleading. The goal is to maximize consistency and availability, but when a physical partition occurs, you must explicitly manage the trade-off. Modern systems are highly tunable. They let you set read and write consistency per operation (consistency levels in Cassandra, read and write concerns in MongoDB), moving the system along the CAP/PACELC spectrum one request at a time.
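Per-operation tuning usually comes down to quorum sizes. In the Python sketch below, N is the replication factor. R and W are how many replicas must answer a read or acknowledge a write; for example, Cassandra’s ONE and QUORUM levels correspond to 1 and a majority. The helper names are hypothetical. Overlapping quorums let a read see the latest acknowledged write, but overlap alone does not guarantee linearizability. Sloppy quorums, concurrent writers, and clock-based conflict resolution can still surface stale or lost values.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Dynamo-style rule: every read quorum intersects every write quorum iff R + W > N."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    return r + w > n


def still_serving(r: int, w: int, reachable: int) -> dict[str, bool]:
    """Which operations a coordinator can complete if it reaches `reachable` replicas."""
    return {"read": reachable >= r, "write": reachable >= w}


N = 3
for r, w in [(1, 1), (2, 2)]:
    print(f"R={r} W={w}: overlap={quorums_overlap(N, r, w)}, "
          f"partitioned to 1 replica -> {still_serving(r, w, reachable=1)}")
```

With R=W=1, every isolated replica keeps serving (AP, low latency) at the price of possibly stale reads. With R=W=2, reads and writes overlap, but a replica cut off on a minority side can no longer serve them.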

Pick the best fit

You are designing a globally distributed multi-region inventory system for a high-volume ticket booking platform. Overselling a seat is a catastrophic business failure. Which architectural choice aligns with PACELC?

Quiz

Under the formal CAP theorem proof by Gilbert & Lynch, what is the precise definition of 'Availability'?

Quiz

Why can a severe garbage collection pause or high CPU saturation trigger a 'logical' network partition in a CP consensus cluster?


Recall before you leave

01
Explain why 'Partition Tolerance' is not a choice you can make during the design phase of a distributed system.
02
What is the key difference between the CAP theorem and the PACELC theorem, and how does PACELC apply to healthy states?

Recap

Physical network partitions are inevitable, making Partition Tolerance (P) a non-optional requirement for any distributed system. The CAP theorem states that when a partition occurs, you must choose between Consistency (linearizable correctness) and Availability (every non-failing node responding successfully). Systems like etcd or Spanner choose CP, refusing operations to preserve linearizability. Systems like Cassandra or DynamoDB (in their default configurations) choose AP, remaining responsive at the cost of eventual consistency and conflict resolution. The PACELC theorem extends this model to healthy states: when no partition exists, you must still trade off Latency against Consistency. A senior engineer avoids marketing generalizations, calibrates systems using these strict definitions, and tunes quorums to match the exact durability and liveness requirements of the workload. Now when you evaluate a vendor’s “fully consistent and always available” claim, you have the formal tools to identify the partition scenario where that promise must break, and to ask which property the system gives up on each side of the split.


