Raft leader election: timeouts, voting rules, and the four safety properties

How randomised timeouts prevent repeated split votes, why the candidate log-completeness rule preserves committed entries across leader changes, and the four invariants that make Raft correct.

Five followers all notice the leader is silent at the same instant. All five start an election simultaneously. Each votes for itself. Nobody wins. They all start again. This is why Raft elections use randomised timeouts — and why the voting rule is more nuanced than “first come, first served.”

The election timeout and heartbeats

A leader asserts its authority by sending AppendEntries heartbeats to every follower at a fixed interval, typically around 50 ms. Each follower runs an election timeout that resets on every valid heartbeat. If the timeout fires because no heartbeat arrived in time, the follower assumes the leader is dead and starts an election: it increments its term, becomes a candidate, votes for itself, and sends RequestVote to every other node.
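The reset-on-heartbeat behaviour described above can be sketched with a logical clock in milliseconds. This is a minimal illustration, not a production timer: the class name `FollowerTimer` and its methods are hypothetical, and a real node would also check the heartbeat's term before treating it as valid.

```python
from dataclasses import dataclass


@dataclass
class FollowerTimer:
    """Tracks when a follower last heard from a valid leader (times in ms)."""
    timeout_ms: int              # this follower's election timeout
    last_heartbeat_ms: int = 0   # logical time of the last valid heartbeat

    def on_heartbeat(self, now_ms: int) -> None:
        # A valid AppendEntries heartbeat resets the election timer.
        self.last_heartbeat_ms = now_ms

    def timed_out(self, now_ms: int) -> bool:
        # True once a full timeout has elapsed with no heartbeat.
        return now_ms - self.last_heartbeat_ms >= self.timeout_ms


timer = FollowerTimer(timeout_ms=200)
for t in (50, 100, 150):         # leader heartbeats every 50 ms, then goes silent
    timer.on_heartbeat(t)

print(timer.timed_out(300))      # False: 150 ms of silence, below the 200 ms timeout
print(timer.timed_out(350))      # True: 200 ms of silence, so an election starts
```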

The timeout is randomised, typically drawn from a range like 150–300 ms. Without randomisation, all followers would time out simultaneously, each would vote for itself, and the vote would split. With a wide-enough range, one follower fires first with high probability, sends RequestVote to all others, and collects a majority before anyone else times out. Split votes are still possible but become rare; when one happens, the candidates time out again, each picks a new random timeout, and a fresh election starts in a higher term, which usually resolves the tie quickly.

Fixed timeouts cause every follower to fire at once, splitting votes. A random range (150–300 ms) makes one node fire first with high probability, collecting a majority before anyone else starts.
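To get a feel for why the width of the range matters, the following sketch estimates how often the earliest timer beats the second-earliest by a safety margin. The function names, the 10 ms margin (an assumed time to gather a majority), and the trial count are all illustrative choices, and "clear first mover" is only a rough proxy for "no split vote".

```python
import random


def first_two_gap(timeouts):
    """Gap in ms between the earliest and second-earliest timer."""
    first, second = sorted(timeouts)[:2]
    return second - first


def clear_first_mover_rate(low_ms, high_ms, nodes=5, margin_ms=10,
                           trials=2000, seed=1):
    """Fraction of trials where the first timer leads the second by margin_ms.

    margin_ms is an assumed time needed to collect a majority of votes.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        timeouts = [rng.uniform(low_ms, high_ms) for _ in range(nodes)]
        if first_two_gap(timeouts) > margin_ms:
            hits += 1
    return hits / trials


print(first_two_gap([200] * 5))            # 0: fixed timeouts all fire together
print(clear_first_mover_rate(150, 300))    # wide range: a clear first mover in most trials
print(clear_first_mover_rate(150, 165))    # narrow range: a clear first mover is rare
```

Under these assumptions the wide range gives a clear first mover in roughly seven trials out of ten, while the 15 ms range almost never does; the exact figures depend on the margin chosen.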

The RequestVote voting rule

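Provided the candidate's term is not stale (a request carrying a term older than the voter's own is rejected outright), a node grants a vote in RequestVote only if both of the following hold:

It has not already voted for a different candidate in this term (one vote per term per node).
The candidate’s log is at least as up-to-date as the voter’s own log.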

“At least as up-to-date” is compared by: higher lastLogTerm wins; if terms are equal, higher lastLogIndex wins. This rule is the key to safety.
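The two conditions and the comparison above can be sketched as a vote handler. The names `Voter`, `log_is_up_to_date`, and `handle_request_vote` are hypothetical. The stale-term rejection and the term bump on a newer term come from the standard Raft rules rather than from the list above, and persistence of `voted_for` to disk is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Voter:
    current_term: int
    last_log_term: int
    last_log_index: int
    voted_for: Optional[str] = None


def log_is_up_to_date(cand_term: int, cand_index: int, voter: Voter) -> bool:
    # Compare (lastLogTerm, lastLogIndex) pairs: term first, then index.
    return (cand_term, cand_index) >= (voter.last_log_term, voter.last_log_index)


def handle_request_vote(voter: Voter, term: int, candidate_id: str,
                        last_log_term: int, last_log_index: int) -> bool:
    if term < voter.current_term:
        return False                    # stale candidate
    if term > voter.current_term:
        voter.current_term = term       # newer term: any older vote no longer counts
        voter.voted_for = None
    if voter.voted_for not in (None, candidate_id):
        return False                    # already voted for someone else this term
    if not log_is_up_to_date(last_log_term, last_log_index, voter):
        return False                    # candidate's log is behind the voter's
    voter.voted_for = candidate_id
    return True


v = Voter(current_term=7, last_log_term=7, last_log_index=40)
print(handle_request_vote(v, 8, "B", 7, 40))   # True: equal log, first vote in term 8
print(handle_request_vote(v, 8, "C", 7, 45))   # False: already voted for B in term 8
```

Comparing the pair as a tuple keeps the "term first, then index" ordering in one expression: a log ending in a higher term wins even if it is shorter.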

Why the log-completeness rule matters

Without the up-to-date check, a candidate with a stale log could win an election and become leader despite missing committed entries. Those entries would then be overwritten, violating the guarantee that a committed entry is durable forever.

The quorum-overlap argument shows why the rule works. Every committed entry was acknowledged by a majority, and any election quorum overlaps that commit quorum in at least one node. That shared node holds the committed entry and grants its vote only to a candidate whose log is at least as up-to-date as its own, so a candidate that is missing the entry is refused. Combined, the quorum overlap and the log-completeness rule give Leader Completeness: every leader in term T+1 has all entries committed in terms 1–T.
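The overlap claim is small enough to check exhaustively for a 5-node cluster. The sketch below enumerates every 3-node majority and confirms each pair shares a node; the node names are illustrative.

```python
from itertools import combinations

nodes = ["A", "B", "C", "D", "E"]
majority = len(nodes) // 2 + 1           # 3 of 5

quorums = [set(q) for q in combinations(nodes, majority)]

# Every commit quorum intersects every election quorum in at least one node.
all_overlap = all(commit_q & vote_q for commit_q in quorums for vote_q in quorums)
print(len(quorums), all_overlap)         # 10 True

# Groups of 2 can be disjoint, so 2-of-5 would allow two independent "winners".
pairs = [set(p) for p in combinations(nodes, 2)]
has_disjoint_pairs = any(not (p & q) for p in pairs for q in pairs)
print(has_disjoint_pairs)                # True
```

The general reason is arithmetic: two groups of size greater than N/2 contain more than N members in total, so at least one node must belong to both.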

The four safety properties

Raft’s correctness proof reduces to four invariants:

Election Safety — at most one leader per term. Follows from “one vote per node per term + majority required.”
Leader Append-Only — a leader never overwrites or deletes its own log entries; it only appends.
Log Matching — if two logs share an entry at index i with term t, they are identical for all indices up to i. Follows from the AppendEntries consistency check.
Leader Completeness — any entry committed in some term is in the log of every leader of higher terms. Follows from quorum overlap + the voting rule.

These four invariants work as a chain: Safety prevents two leaders (1), Append-Only keeps a leader’s own log intact (2), Log Matching makes followers converge to it (3), and Completeness ensures no new leader starts with a gap (4). Without any single link, the chain breaks — which is why implementations like etcd treat each invariant as a non-negotiable protocol requirement, not a best-effort guideline.

Each invariant is not a standalone rule but the consequence of one concrete mechanism; remove the mechanism and the named failure follows. That is the difference between memorising four properties and seeing why the proof holds.

State Machine Safety (derived): no two nodes ever apply different commands at the same index.
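Two of these invariants are easy to turn into assertions for a test harness. The sketch below is one possible shape, with hypothetical function names; logs are modelled as lists of (term, command) pairs where list position i holds Raft index i + 1.

```python
def check_log_matching(log_a, log_b):
    """If both logs hold the same term at some index, all entries up to it must match."""
    for i in range(min(len(log_a), len(log_b)) - 1, -1, -1):
        if log_a[i][0] == log_b[i][0]:
            # Highest index with equal terms: the prefixes up to here must be identical.
            return log_a[: i + 1] == log_b[: i + 1]
    return True  # no index with equal terms, so there is nothing to check


def check_election_safety(leaders_seen):
    """leaders_seen is a list of (term, node_id) pairs; allow one leader per term."""
    by_term = {}
    for term, node in leaders_seen:
        if by_term.setdefault(term, node) != node:
            return False
    return True


leader = [(1, "x=1"), (1, "y=2"), (2, "z=3")]
follower = [(1, "x=1"), (1, "y=2")]
diverged = [(1, "x=1"), (1, "y=9"), (2, "z=3")]

print(check_log_matching(leader, follower))                   # True
print(check_log_matching(leader, diverged))                   # False
print(check_election_safety([(7, "A"), (8, "B"), (8, "B")]))  # True
print(check_election_safety([(8, "B"), (8, "C")]))            # False
```

Checking only the highest index with equal terms is enough, because an identical prefix up to that index covers every lower matching index as well.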

Quiz

Why does Raft randomise the election timeout (150–300 ms) rather than using a fixed value?

Why does the RequestVote voting rule require the candidate's log to be at least as up-to-date as the voter's?

Trace it

Trace a clean leader election after the leader crashes.

Step 1. Setup: 5-node cluster A, B, C, D, E. A is leader at term 7. A crashes.
Step 2. B's timer fires first at 187 ms. What does B do?
Step 3. C, D, E receive RequestVote. They check: did I already vote in term 8? Is B's log as up-to-date as mine?
Step 4. B collects 3 votes (itself + C + D). What happens next?
Step 5. A comes back online with stale term 7.

The candidate gets votes from C and D; with its own vote that is 3 of 5, a majority, so it becomes leader for term 8 and starts sending heartbeats. A voter grants only if it has not voted this term and the candidate's log is at least as up-to-date.
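The trace can be replayed as a small in-memory simulation. This is a sketch under stated assumptions: the `Node` class is hypothetical, the log position (7, 40) is made up, all four surviving nodes are assumed to have identical logs, and the final step (A adopting term 8 and becoming a follower) is standard Raft behaviour on seeing a higher term rather than something the trace spells out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    term: int = 7
    last_log: tuple = (7, 40)        # (lastLogTerm, lastLogIndex); values are made up
    voted_for: Optional[str] = None
    role: str = "follower"

    def grant_vote(self, cand: "Node") -> bool:
        if cand.term < self.term:
            return False             # stale candidate
        if cand.term > self.term:
            self.term, self.voted_for = cand.term, None
        if self.voted_for in (None, cand.name) and cand.last_log >= self.last_log:
            self.voted_for = cand.name
            return True
        return False


a, b, c, d, e = (Node(n) for n in "ABCDE")
a.role = "leader"                    # step 1: A leads term 7, then crashes

# Step 2: B times out first, bumps its term, and votes for itself.
b.term, b.role, b.voted_for = 8, "candidate", "B"

# Steps 3 and 4: here only C and D reply before B counts its votes.
votes = 1 + sum(peer.grant_vote(b) for peer in (c, d))
if votes > 5 // 2:
    b.role = "leader"

# Step 5: A returns at term 7, sees B's term-8 heartbeat, and steps down.
if b.term > a.term:
    a.term, a.role, a.voted_for = b.term, "follower", None

print(votes, b.role, a.role, a.term)   # 3 leader follower 8
```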

Recall before you leave

01 Why does Raft need a majority for elections, not just any 2-of-5?
02 A node just returned from 10 minutes offline. It has lastLogTerm=5, lastLogIndex=200. The cluster is at term 12, with the leader having lastLogIndex=9500. Can this returning node win an election?
03 What is the difference between Election Safety and Leader Completeness?

Recap

Raft prevents repeated split votes by randomising the election timeout: the first follower to fire is likely to collect a majority before the others even start. The RequestVote voting rule requires a candidate's log to be at least as up-to-date as the log of every voter that grants it a vote, which combined with quorum overlap ensures the winning leader has every previously committed entry (Leader Completeness). The four safety properties (Election Safety, Leader Append-Only, Log Matching, and Leader Completeness) collectively guarantee that no two nodes ever apply different commands at the same index. A typical leader election on a healthy cluster resolves in 100–500 ms. When you see a candidate lose an election despite having a higher node ID or longer uptime, you will know why: it is the log-completeness check, not any notion of seniority, that decides who wins.
