Distributed Systems

How Raft replicates a log entry and decides it is safe to commit

Crux: The AppendEntries RPC, the consistency check that keeps logs identical, the commit index, and the replicated state machine pattern built on top.

A client sends SET x = 42 to a Raft cluster. The leader appends it to its log. But the leader alone cannot commit that write — it needs confirmation from enough followers that they have it too. If the leader crashes right after appending but before any follower responds, what happens to the write?

The log and AppendEntries

The Raft leader maintains an ordered log of entries. Each entry is a tuple: (term, index, command). When a client sends a write:

  1. Leader appends the entry to its own log and fsyncs to disk.
  2. Leader sends AppendEntries RPC to every follower, carrying: the new entry, plus prevIndex and prevTerm — the index and term of the entry immediately before the new one.
  3. Each follower checks: does my log have an entry at prevIndex with prevTerm? If yes, append and fsync. If no, reject with a mismatch reply.
  4. Once the leader hears acknowledgement from a majority (including itself), it marks the entry committed.
  5. The leader piggybacks its commit index on the next heartbeat. Followers apply entries up to that index to their state machines.

Step | Who acts             | What happens
1    | Leader               | Appends (term=7, idx=101, cmd=“SET x=42”), fsyncs
2    | Leader               | Sends AppendEntries to B, C, D, E with prevIdx=100, prevTerm=7
3    | Followers B, C, D, E | Each checks log at idx=100 has term=7. Appends, fsyncs, replies success
4    | Leader               | Receives 2 successes (plus own = 3 of 5). Marks idx=101 committed
5    | Leader               | Replies success to client
6    | Next heartbeat       | Carries commitIndex=101; followers apply “SET x=42” to state machine
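
The follower's side of this exchange can be sketched in Python. This is a minimal in-memory sketch rather than a full Raft node: the Entry and Follower classes, the sentinel entry at index 0, and the leader_commit parameter name are assumptions made for illustration, and the fsync is only marked by a comment.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    term: int
    index: int
    command: str


@dataclass
class Follower:
    # log[0] is a sentinel so that list position == Raft index (Raft counts from 1).
    log: list = field(default_factory=lambda: [Entry(0, 0, "")])
    commit_index: int = 0

    def append_entries(self, prev_index, prev_term, entries, leader_commit):
        """Return True on success, False when the consistency check fails."""
        # Consistency check: we must hold an entry at prev_index with term prev_term.
        if prev_index >= len(self.log) or self.log[prev_index].term != prev_term:
            return False
        for e in entries:
            if e.index < len(self.log):
                if self.log[e.index].term == e.term:
                    continue  # already stored (duplicate or retried RPC)
                del self.log[e.index:]  # conflicting tail: discard it
            self.log.append(e)
        # A real node would fsync the log here, before replying.
        # Adopt the leader's commit index, capped at the last entry this RPC vouches for.
        last_new = prev_index + len(entries)
        self.commit_index = max(self.commit_index, min(leader_commit, last_new))
        return True


# Replay the worked example: the follower already holds indexes 1..100 from term 7.
follower = Follower()
follower.log += [Entry(7, i, "noop") for i in range(1, 101)]
ok = follower.append_entries(100, 7, [Entry(7, 101, "SET x=42")], leader_commit=100)
print(ok, follower.commit_index)  # True 100
follower.append_entries(101, 7, [], leader_commit=101)  # the next heartbeat
print(follower.commit_index)  # 101
```

The entry is stored by the first call but only becomes eligible to apply after the heartbeat raises commit_index to 101, which mirrors the gap between steps 3 and 6 in the trace.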

The consistency check: Log Matching

The prevIndex/prevTerm check is not bureaucracy — it is the mechanism that keeps every follower’s log identical to the leader’s. If a follower’s log diverges (because a previous leader wrote different entries before crashing), the mismatch reply tells the leader to back up and retry with an earlier prevIndex. The leader keeps decrementing until it finds the last point of agreement, then overwrites the follower’s divergent tail with its own.
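
The back-up-and-retry loop can be sketched as follows. The function names follower_accepts and repair are hypothetical, logs are plain lists of (term, command) pairs with a sentinel at position 0, and RPCs are modelled as direct function calls, so treat this as an illustration of the idea rather than a real implementation.

```python
SENTINEL = (0, None)  # occupies index 0 so that list position == Raft index


def follower_accepts(log, prev_index, prev_term, entries):
    """Follower side: run the consistency check, then adopt the leader's suffix."""
    if prev_index >= len(log) or log[prev_index][0] != prev_term:
        return False  # mismatch reply
    # Simplification: replace everything after prev_index wholesale. A production
    # follower truncates only at an actual conflict, to tolerate stale RPCs.
    log[prev_index + 1:] = entries
    return True


def repair(leader_log, follower_log):
    """Leader side: decrement next_index until the follower accepts.

    Returns the number of AppendEntries attempts that were needed.
    """
    next_index = len(leader_log)  # optimistic start: just past the leader's last entry
    attempts = 0
    while True:
        prev_index = next_index - 1
        prev_term = leader_log[prev_index][0]
        attempts += 1
        if follower_accepts(follower_log, prev_index, prev_term, leader_log[next_index:]):
            return attempts
        next_index -= 1  # back up one entry and retry


leader_log = [SENTINEL, (1, "a"), (1, "b"), (3, "c"), (3, "d")]
# The follower kept three entries from a term-2 leader that crashed before replicating them.
follower_log = [SENTINEL, (1, "a"), (1, "b"), (2, "x"), (2, "y"), (2, "z")]

attempts = repair(leader_log, follower_log)
print(attempts, follower_log == leader_log)  # 3 True
```

The loop always terminates because both logs share the sentinel at index 0, so the check must succeed there at the latest. Backing up one index per round trip is the simplest strategy; it costs one RPC per divergent entry.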

This converges to identical logs because any entry committed in a prior term was stored on a majority of nodes. A candidate needs votes from a majority, and a voter refuses any candidate whose log is less up-to-date than its own, so at least one voter holds every committed entry and the winner must hold them too. Uncommitted entries left on the old leader's disk may be overwritten. Because they were never acknowledged to any client, discarding them breaks no promise, although the client may need to retry the write.
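
"Stored on a majority" can be made concrete with a small sketch of how a leader might derive its commit index from per-node replication progress. The function name advance_commit_index and its arguments are assumptions for illustration. The current-term restriction in the code comes from the Raft paper rather than from the text above: a leader only counts replicas for entries of its own term, and older entries become committed indirectly once a later entry does.

```python
def advance_commit_index(match_index, log_terms, current_term, commit_index):
    """Return the new commit index for the leader.

    match_index: highest log index known to be stored on each node, leader included.
    log_terms:   log_terms[i] is the term of the leader's entry at index i
                 (position 0 is a sentinel).
    """
    # Sorted descending, the value at position (majority - 1) is an index that
    # at least a majority of nodes have reached.
    ranked = sorted(match_index.values(), reverse=True)
    majority = len(ranked) // 2 + 1
    candidate = ranked[majority - 1]
    # Only advance for an entry written in the leader's current term.
    if candidate > commit_index and log_terms[candidate] == current_term:
        return candidate
    return commit_index


log_terms = [0] + [7] * 101  # indexes 1..101, all from term 7

# Leader A plus followers B and C hold index 101: 3 of 5, so it commits.
progress = {"A": 101, "B": 101, "C": 101, "D": 100, "E": 100}
print(advance_commit_index(progress, log_terms, current_term=7, commit_index=100))  # 101

# Only A and B hold index 101: 2 of 5, so the commit index stays at 100.
progress = {"A": 101, "B": 101, "C": 100, "D": 100, "E": 100}
print(advance_commit_index(progress, log_terms, current_term=7, commit_index=100))  # 100
```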

The replicated state machine pattern

Raft does not care what commands mean. Your application defines the state machine — a key-value map, a config tree, a relation — and the commands that mutate it. Raft guarantees: every node applies every committed entry in the same order. Since the commands are deterministic, every state machine ends up in the same state. This is the replicated state machine pattern: ordered replicated log + deterministic state machine = consistent distributed service.

Important consequences:

  • Do not use NOW(), random(), or external API calls inside command handlers — these break determinism across replicas.
  • The state machine is application-specific; Raft is agnostic to it.
  • Linearizable reads must go through the leader (covered in a later lesson); follower reads may return stale data.
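
A toy key-value state machine shows why determinism is enough. The command format and the apply and replay helpers below are hypothetical, chosen only for illustration; the log is a plain list in which position 0 holds index 1.

```python
def apply(state, command):
    """Deterministic handler: the result depends only on (state, command)."""
    op, key, *args = command
    if op == "SET":
        state[key] = args[0]
    elif op == "INCR":
        state[key] = state.get(key, 0) + 1
    elif op == "DEL":
        state.pop(key, None)
    return state


def replay(log, commit_index):
    """Apply the first commit_index entries, in order, to a fresh state."""
    state = {}
    for command in log[:commit_index]:
        apply(state, command)
    return state


log = [
    ("SET", "x", 42),
    ("INCR", "n"),
    ("INCR", "n"),
    # The timestamp was chosen once by the proposer and stored in the entry.
    # Calling a clock inside apply() would give each replica a different value.
    ("SET", "created_at", 1700000000),
    ("DEL", "x"),
]

replica_a = replay(log, commit_index=5)
replica_b = replay(log, commit_index=5)
print(replica_a == replica_b)  # True
print(replica_a)  # {'n': 2, 'created_at': 1700000000}
```

When a command needs a timestamp or a random value, one common approach is the one shown: resolve it before the entry is appended, so that every replica replays the same literal value.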
Quiz

A Raft follower receives AppendEntries with prevIndex=50, prevTerm=4, but its own log at index 50 has term 3. What does the follower do?

Order the steps

Put the AppendEntries consistency-check steps in order:

  1. Leader picks the next log index to send to a follower
  2. Leader sends AppendEntries with prevIndex, prevTerm, and the new entry
  3. Follower checks: does my log have an entry at prevIndex with prevTerm?
  4. If yes, follower appends and replies success
  5. If no, follower replies mismatch with its conflicting index/term
  6. Leader decrements nextIndex for this follower and retries with an earlier prevIndex
  7. Eventually leader finds the last matching point; follower truncates its tail and accepts the leader's version
Recall before you leave
  1. Why is a Raft log entry not committed the instant the leader writes it to disk?
  2. A follower was offline for 2 minutes and missed 500 log entries. It comes back. How does it catch up?
  3. Why must command handlers in a Raft state machine be deterministic?
Recap

The leader replicates writes via AppendEntries, which carries a prevIndex/prevTerm consistency check that forces every follower’s log to converge to the leader’s. An entry is committed once a majority has persisted it; the leader then broadcasts the commit index so followers can apply the entry to their state machines. This is the replicated state machine pattern: identical ordered logs plus deterministic handlers give identical state on every replica. Uncommitted entries from a crashed leader are safe to discard because they were never acknowledged. The fsync on every append is often the dominant latency cost, which is why Raft deployments are usually given fast, low-latency disks such as dedicated NVMe rather than shared cloud volumes.

