Raft roles, terms, and why majority quorums prevent split brain

Raft’s three node roles, the monotonic term counter, and the quorum rule that makes it impossible for two leaders to exist at once.

DIST Junior ◷ 10 min

Your Kubernetes cluster is backed by a 5-node etcd cluster. One node loses power mid-morning. You run kubectl get pods and it works fine — no error, no stall. How is there still agreement with a node down?

The job: one machine from many

A distributed system’s hardest problem is agreement. If five nodes each accept writes independently, you get five conflicting histories. Raft’s job is to make those five nodes behave like one: same order of changes, same state, no committed writes lost. It does this by electing at most one leader per term and routing all changes through that leader.

Three roles

Understanding the three roles is what lets you read a Raft status page — or a post-mortem — and immediately know which invariant held and which one broke.

Every Raft node is in exactly one of three states:

Follower — the default state. Receives and stores log entries from the leader. Does not accept client writes directly.
Candidate — a follower that has stopped hearing from a leader and is now running for leadership. Temporary state, lasts until the election resolves.
Leader — the node clients send writes to. Drives replication to all followers. At most one leader per term exists in a healthy cluster.

A node starts as a follower. It becomes a candidate when its election timeout fires. It becomes a leader if it wins a majority vote.
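
The role transitions described above can be sketched as a small lookup table. This is a minimal illustration, not code from any Raft implementation: the event names (`election_timeout`, `won_majority`, `saw_higher_term`) are hypothetical labels chosen for this sketch, and it covers only the transitions named in this lesson.

```python
from enum import Enum


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


# Hypothetical event names, chosen for this sketch only.
TRANSITIONS = {
    (Role.FOLLOWER, "election_timeout"): Role.CANDIDATE,
    (Role.CANDIDATE, "won_majority"): Role.LEADER,
    (Role.CANDIDATE, "election_timeout"): Role.CANDIDATE,  # split vote: retry in a new term
    (Role.CANDIDATE, "saw_higher_term"): Role.FOLLOWER,
    (Role.LEADER, "saw_higher_term"): Role.FOLLOWER,
}


def next_role(role: Role, event: str) -> Role:
    """Return the role after `event`; events that do not apply leave the role unchanged."""
    return TRANSITIONS.get((role, event), role)


role = Role.FOLLOWER
for event in ["election_timeout", "won_majority", "saw_higher_term"]:
    role = next_role(role, event)
    print(event, "->", role.value)
```

Note that there is no direct follower-to-leader entry in the table: the only path to leadership runs through the candidate state and a majority vote.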

The term: a monotonic logical clock

Raft tracks time not with wall clocks but with terms — monotonically increasing integers. Each term begins with an election. If a leader wins, it leads for the whole term. If no leader emerges (split vote), the term ends and a new one starts.

The term has two jobs:

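Staleness detection. When a message arrives, the receiver compares the sender’s term to its own. If the sender’s term is higher, the receiver adopts it and, if it was a leader or candidate, steps down to follower. If the sender’s term is lower, the receiver rejects the message as stale. A deposed leader therefore stops acting as leader as soon as it hears from any node in a newer term.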
Ordering. Every log entry is tagged with the term it was written in. This tag is used later to detect log divergence.

Together, these two jobs mean the term is the single shared clock that makes “who is in charge right now?” always answerable without any wall-clock agreement — drop either job and you get either a zombie leader or an undetectable log split.
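
The term comparison rule is small enough to sketch directly. The `Node` class and the `observe_term` function below are hypothetical names for illustration, and the sketch assumes every message carries the sender's term as a plain integer.

```python
from dataclasses import dataclass


@dataclass
class Node:
    current_term: int
    role: str = "follower"  # "follower", "candidate", or "leader"


def observe_term(node: Node, message_term: int) -> bool:
    """Apply the term rule to an incoming message.

    Returns True if the message should be processed, False if it is stale.
    """
    if message_term > node.current_term:
        # The sender is ahead: adopt its term and give up any leadership claim.
        node.current_term = message_term
        node.role = "follower"
        return True
    if message_term < node.current_term:
        # The sender is behind: its message is stale and is rejected.
        return False
    return True  # same term: nothing to update


old_leader = Node(current_term=6, role="leader")
print(observe_term(old_leader, 7), old_leader)  # accepted; the leader steps down
print(observe_term(old_leader, 6), old_leader)  # rejected; term 6 is now stale
```

The same check handles both directions of staleness: a leader that has been superseded learns about it from the first newer-term message it receives, and a node that has fallen behind has its old-term messages rejected by everyone else.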

Term	What happened
1	Node A elected leader. Served 30 s.
2	A lost network briefly. B won election.
3	B crashed. C won election and is still leader, so the cluster stays in term 3 until the next election.

Terms are never reused. If you see term 7, every message from term 6 is stale.

Majority quorum: the split-brain barrier

Raft requires a majority (more than half the cluster) for two operations: elections and commits. In a 5-node cluster the majority is 3.

Why majority specifically? The key property is overlap: any two majorities of the same set share at least one node. In a 5-node cluster, if one set of 3 commits an entry and a different set of 3 elects a new leader, those two sets cannot be disjoint: 3 + 3 = 6 is more than 5, so they share at least one node. That shared node holds the committed entry, and because a node only votes for a candidate whose log is at least as up to date as its own, the new leader is guaranteed to hold the committed entry too.

If Raft accepted a quorum smaller than a majority (say 2 of 5), two disjoint groups of 2 could each elect a leader and commit entries without ever hearing from each other. That is split brain, and the majority rule prevents it.
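
The overlap property can be checked by brute force for a small cluster. The sketch below is only an illustration of the counting argument: it enumerates every possible quorum of a given size and tests whether each pair shares a node. The function name `all_quorums_overlap` is made up for this example.

```python
from itertools import combinations


def all_quorums_overlap(cluster_size: int, quorum_size: int) -> bool:
    """True if every pair of quorums of this size shares at least one node."""
    nodes = range(cluster_size)
    quorums = [set(q) for q in combinations(nodes, quorum_size)]
    # An empty intersection is falsy, so all() fails on any disjoint pair.
    return all(a & b for a, b in combinations(quorums, 2))


print(all_quorums_overlap(5, 3))  # True: any two groups of 3 out of 5 intersect
print(all_quorums_overlap(5, 2))  # False: e.g. {0, 1} and {2, 3} are disjoint
```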

Failure tolerance: a cluster of N nodes tolerates floor((N-1)/2) simultaneous failures. 3 nodes → 1 failure. 5 nodes → 2 failures. 7 nodes → 3 failures. Adding one node to an odd-sized cluster raises the majority threshold without raising the number of tolerated failures (4 nodes still tolerate only 1), so each extra tolerated failure costs two more nodes. This is why production Raft clusters are usually 3, 5, or 7 nodes.
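
The arithmetic is easy to tabulate. The two helper functions below are hypothetical names for this sketch; they simply encode "more than half" and floor((N-1)/2) using integer division.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that is more than half of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Failures the cluster can absorb while a majority is still alive."""
    return (n - 1) // 2


for n in range(3, 8):
    print(f"{n} nodes: majority {majority(n)}, tolerates {tolerated_failures(n)}")
```

Comparing the rows for 3 and 4 nodes, or 5 and 6, shows the even-sized cluster needing a larger majority while tolerating the same number of failures as the odd-sized one below it.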

Quiz

A 5-node Raft cluster is split: DC A has the leader and 2 followers (3 nodes), DC B has 2 followers. The link between DCs is cut. What happens?

Quiz

Why does Raft require a majority (3 of 5), not just any 2 of 5, for both elections and commits?

Order the steps

Put the Raft leader-election steps in order:

1 Followers stop receiving heartbeats for longer than the election timeout
2 A follower transitions to candidate, increments its term, and votes for itself
3 The candidate sends RequestVote RPCs to all other nodes
4 Each node grants its vote at most once per term, to the first eligible candidate
5 The candidate collects a majority of votes and becomes leader for the new term
6 The new leader starts sending heartbeats to assert its authority
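
Steps 2 to 5 can be sketched with in-memory objects instead of real RPCs. This is a simplified illustration under stated assumptions: the `Voter` class, `become_candidate`, and `wins_election` are hypothetical names, all nodes are reachable, and the check that a candidate's log is sufficiently up to date is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Voter:
    name: str
    current_term: int = 0
    voted_for: Optional[str] = None

    def request_vote(self, candidate: str, term: int) -> bool:
        """Grant at most one vote per term (log comparison omitted in this sketch)."""
        if term < self.current_term:
            return False  # stale candidate
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None  # new term: no vote cast yet
        if self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False  # already voted for someone else this term


def become_candidate(node: Voter) -> None:
    """Step 2: increment the term and vote for yourself."""
    node.current_term += 1
    node.voted_for = node.name


def wins_election(candidate: Voter, peers: list[Voter]) -> bool:
    """Steps 3 to 5: request votes from every peer and count a majority."""
    votes = 1 + sum(
        peer.request_vote(candidate.name, candidate.current_term) for peer in peers
    )
    return votes > (len(peers) + 1) // 2


a, b, c, d, e = (Voter(name) for name in "ABCDE")
become_candidate(a)
become_candidate(b)  # A and B time out in the same term
print(wins_election(a, [b, c, d, e]))  # True: C, D, and E vote for A
print(wins_election(b, [a, c, d, e]))  # False: every vote in this term is taken
```

Because each voter grants only one vote per term, two candidates in the same term cannot both reach a majority, which is what limits each term to at most one leader.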

Complete the analogy

Fill in the blank: Raft uses a council of N members where only one member at a time holds the _______ and proposes new laws.

A node starts as Follower, becomes Candidate when its election timeout fires, and becomes Leader only after winning a majority vote. A higher term seen by a leader forces it back to Follower.

Recall before you leave

01 Why does a 5-node Raft cluster survive 2 simultaneous failures but not 3?
02 What is a Raft term and why does it replace wall-clock time?
03 A node in a Raft cluster has been offline for 10 minutes. It comes back with term 4, but the cluster is now on term 9. What happens when it sends a message?

Recap

Raft assigns each node one of three roles (follower, candidate, or leader), and at most one leader exists per term. The term is a monotonic logical clock that resolves stale-leader confusion: the higher term always wins. Both elections and commits require a majority quorum, which guarantees that any two quorums share at least one node, so two leaders can never both commit conflicting entries. A 5-node cluster tolerates 2 simultaneous failures; 3 failures leave only 2 survivors, below the majority threshold of 3, and progress halts until enough nodes recover. The next lesson covers how the leader actually replicates writes to followers. Now when you see etcd report “leader changed” or “no leader elected,” you know which three rules to reason from: at most one leader per term, majority quorum, and term monotonicity.
