
Distributed Systems

Raft roles, terms, and why majority quorums prevent split brain

Crux: Raft’s three node roles, the monotonic term counter, and the quorum rule that prevents two leaders from being elected in the same term.

Your Kubernetes cluster is backed by a 5-node etcd cluster. One node loses power mid-morning. You run kubectl get pods and it works fine — no error, no stall. How is there still agreement with a node down?

The job: one machine from many

A distributed system’s hardest problem is agreement. If five nodes each accept writes independently, you get five conflicting histories. Raft’s job is to make those five nodes behave like one: same order of changes, same state, no writes lost. It does this by electing exactly one leader at a time and routing all changes through that leader.

Three roles

Every Raft node is in exactly one of three states:

  • Follower — the default state. Receives and stores log entries from the leader. Does not accept client writes directly.
  • Candidate — a follower that has stopped hearing from a leader and is now running for leadership. Temporary state, lasts until the election resolves.
  • Leader — the node clients send writes to. Drives replication to all followers. At most one leader per term exists in a healthy cluster.

A node starts as a follower. It becomes a candidate when its election timeout fires. It becomes a leader if it wins a majority vote.
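The transitions above can be sketched as a small lookup table. This is an illustrative model only: the event names (`election_timeout`, `won_majority`, `heard_from_leader`, `saw_higher_term`) are invented for this example and are not identifiers from any Raft implementation.

```python
from enum import Enum


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


# Allowed transitions, keyed by (current role, event).
# Event names are hypothetical labels chosen for this sketch.
TRANSITIONS = {
    (Role.FOLLOWER, "election_timeout"): Role.CANDIDATE,
    (Role.CANDIDATE, "election_timeout"): Role.CANDIDATE,  # split vote: run again
    (Role.CANDIDATE, "won_majority"): Role.LEADER,
    (Role.CANDIDATE, "heard_from_leader"): Role.FOLLOWER,
    (Role.CANDIDATE, "saw_higher_term"): Role.FOLLOWER,
    (Role.LEADER, "saw_higher_term"): Role.FOLLOWER,
}


def next_role(role: Role, event: str) -> Role:
    """Return the role after `event`; events that do not apply leave the role unchanged."""
    return TRANSITIONS.get((role, event), role)
```

Note that there is no direct follower-to-leader entry: in this model a node can only reach leadership by passing through the candidate state and winning a vote.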

The term: a monotonic logical clock

Raft tracks time not with wall clocks but with terms: monotonically increasing integers. Each term begins with an election. If a candidate wins, it leads for the rest of the term. If no leader emerges (split vote), the term ends without a leader and a new one starts with the next election.

The term has two jobs:

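  1. Staleness detection. When a message arrives, the receiver compares the sender’s term to its own. A higher term always wins: the receiver adopts it and, if it was a candidate or leader, steps down to follower. A message carrying a lower term is rejected as stale. This resolves stale-leader confusion as soon as the stale leader exchanges a message with an up-to-date node.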
  2. Ordering. Every log entry is tagged with the term it was written in. This tag is used later to detect log divergence.

Term  What happened
1     Node A elected leader. Served 30 s.
2     A lost network briefly. B won election.
3     B crashed. C won election.
4     C still leader; no new election needed.

Terms are never reused. If you see term 7, every message from term 6 is stale.
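The term comparison rule is small enough to sketch directly. The `Node` class and `observe_term` function below are hypothetical names for this example, and the sketch covers only the term check, not the rest of message handling.

```python
from dataclasses import dataclass


@dataclass
class Node:
    current_term: int
    role: str = "follower"  # "follower", "candidate", or "leader"


def observe_term(node: Node, message_term: int) -> bool:
    """Apply the term rule to an incoming message.

    Returns True if the message should be processed, False if it is stale.
    """
    if message_term > node.current_term:
        # The sender has seen a newer term: adopt it and step down.
        node.current_term = message_term
        node.role = "follower"
        return True
    if message_term < node.current_term:
        # Stale sender: reject. A real node would reply with its own term
        # so the sender can catch up.
        return False
    return True  # same term: process normally
```

For example, a leader still on term 4 that receives any message tagged with term 9 ends up as a follower on term 9, while a node on term 9 simply rejects a message tagged with term 4.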

Majority quorum: the split-brain barrier

Raft requires a majority (more than half the cluster) for two operations: elections and commits. In a 5-node cluster the majority is 3.

Why majority specifically? The key property is overlap: any two majorities of the same set share at least one node. In a 5-node cluster, if one set of 3 commits an entry and a different set of 3 elects a new leader, those two sets cannot be disjoint, because 3 + 3 is more than 5. The shared node holds the committed entry and grants its vote only to a candidate whose log is at least as up to date as its own, so the new leader is guaranteed to have that entry.

If Raft accepted a quorum smaller than a majority (say, 2 of 5), two disjoint groups of 2 could each believe they are authoritative. That is split brain, and the majority rule prevents it.
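The overlap property can be checked by brute force for a small cluster. The sketch below enumerates every possible quorum of a given size and tests each pair for a shared node; the function name `all_quorums_overlap` is made up for this example.

```python
from itertools import combinations


def all_quorums_overlap(cluster_size: int, quorum_size: int) -> bool:
    """Return True if every pair of quorums of the given size shares a node."""
    nodes = range(cluster_size)
    quorums = [set(q) for q in combinations(nodes, quorum_size)]
    # An empty intersection is falsy, so one disjoint pair makes this False.
    return all(a & b for a, b in combinations(quorums, 2))


print(all_quorums_overlap(5, 3))  # True: any two groups of 3 out of 5 intersect
print(all_quorums_overlap(5, 2))  # False: for example {0, 1} and {2, 3} are disjoint
```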

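Failure tolerance: a cluster of N nodes tolerates floor((N-1)/2) simultaneous failures. 5 nodes → 2 failures. 3 nodes → 1 failure. This is why Raft clusters are usually sized at 3, 5, or 7 nodes: adding a node to reach an even count raises the majority threshold without raising the number of tolerated failures (4 nodes still tolerate only 1).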
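The arithmetic is easy to tabulate. In this sketch, `majority` and `tolerated_failures` are illustrative helper names that apply the formulas from the text.

```python
def majority(n: int) -> int:
    """Smallest node count that is more than half of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """floor((n - 1) / 2): failures that still leave a majority alive."""
    return (n - 1) // 2


for n in range(3, 8):
    print(f"nodes={n} majority={majority(n)} tolerates={tolerated_failures(n)}")
```

Running this shows that 3 and 4 nodes both tolerate 1 failure, and 5 and 6 nodes both tolerate 2, which is the reason even sizes are rarely chosen.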

Quiz

A 5-node Raft cluster is split: DC A has the leader and 2 followers (3 nodes), DC B has 2 followers. The link between DCs is cut. What happens?

Quiz

Why does Raft require a majority (3 of 5), not just any 2 of 5, for both elections and commits?

Order the steps

Put the Raft leader-election steps in order:

  1. Followers stop receiving heartbeats for longer than the election timeout
  2. A follower transitions to candidate, increments its term, and votes for itself
  3. The candidate sends RequestVote RPCs to all other nodes
  4. Each node grants its vote at most once per term, to the first eligible candidate
  5. The candidate collects a majority of votes and becomes leader for the new term
  6. The new leader starts sending heartbeats to assert its authority
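
Steps 2 through 5 above can be sketched in a few lines. This is a simplified model under stated assumptions: the `Voter` class and `run_election` function are hypothetical, direct method calls stand in for RequestVote RPCs, and the log comparison that decides whether a candidate is eligible is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Voter:
    node_id: str
    current_term: int = 0
    voted_for: Optional[str] = None

    def handle_request_vote(self, candidate_id: str, candidate_term: int) -> bool:
        """Grant at most one vote per term (log eligibility check omitted)."""
        if candidate_term < self.current_term:
            return False  # stale candidate
        if candidate_term > self.current_term:
            # New term: adopt it and clear the vote cast in the old term.
            self.current_term = candidate_term
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False  # already voted for someone else this term


def run_election(candidate: Voter, reachable_peers: List[Voter], cluster_size: int) -> bool:
    """Become candidate, request votes, and report whether a majority was won."""
    candidate.current_term += 1
    candidate.voted_for = candidate.node_id  # vote for self
    votes = 1
    for peer in reachable_peers:  # stand-in for RequestVote RPCs
        if peer.handle_request_vote(candidate.node_id, candidate.current_term):
            votes += 1
    return votes > cluster_size // 2
```

Passing `cluster_size` separately from `reachable_peers` lets the sketch model a partition: a candidate that can reach only one peer in a 5-node cluster collects 2 votes and does not win.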
Complete the analogy

Fill in the blank: Raft uses a council of N members where only one member at a time holds the _______ and proposes new laws.

Recall before you leave
  1. Why does a 5-node Raft cluster survive 2 simultaneous failures but not 3?
  2. What is a Raft term and why does it replace wall-clock time?
  3. A node in a Raft cluster has been offline for 10 minutes. It comes back with term 4, but the cluster is now on term 9. What happens when it sends a message?
Recap

Raft assigns each node one of three roles (follower, candidate, or leader), and at most one leader exists per term. The term is a monotonic logical clock that resolves stale-leader confusion: the higher term always wins. Both elections and commits require a majority quorum, which guarantees that any two quorums share at least one node, so two separate leaders cannot both commit conflicting entries. A 5-node cluster tolerates 2 simultaneous failures; 3 failures drop the surviving 2 below the majority threshold and halt progress until recovery. The next lesson covers how the leader actually replicates writes to followers.
