Distributed Systems DIST · 02 · 09

Raft: code and log reading

Read Raft RPC handlers, a state-machine step, and a real AppendEntries log, then pick the behaviour, the bug, or the highest-leverage fix a senior engineer makes first.

DIST Senior ◷ 14 min

The Raft bugs in this lesson hide in four places: the AppendEntries consistency check, the commit-index advance, the vote-granting rule, and the nextIndex back-off that shows up in replication logs. Read each snippet the way you would in a code review or an incident, then pick what a senior engineer flags first.

Goal

Practise reading the protocol where it actually lives — in RPC handlers and log lines — and spot the off-by-one, the missing check, and the divergence signature that separate a correct Raft from a corrupting one.

Snippet 1 — the AppendEntries consistency check

// follower handling AppendEntries
func (r *Raft) handleAppend(a AppendEntries) AppendReply {
    if a.Term < r.currentTerm {
        return AppendReply{Term: r.currentTerm, Success: false}
    }
    // BUG IS HERE: appends without the prevLog check
    r.log = append(r.log[:a.PrevLogIndex+1], a.Entries...)
    r.fsync()
    return AppendReply{Term: r.currentTerm, Success: true}
}

Quiz

This follower handler is missing one check that Raft safety depends on. Which one, and what breaks without it?
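
For comparison, here is a minimal sketch of what a gated splice could look like. It assumes a simplified in-memory log with a sentinel entry at index 0; the names `Entry`, `prevMatches`, and `appendChecked` are hypothetical and not part of the handler above. It also truncates only at the first conflicting entry, so a delayed duplicate RPC does not drop entries the follower already holds.

```go
package main

import "fmt"

// Entry is a hypothetical log entry; index 0 of the log holds a sentinel with Term 0.
type Entry struct {
	Term int
	Cmd  string
}

// prevMatches reports whether the log holds an entry at prevIndex whose term
// equals prevTerm. This is the AppendEntries consistency check.
func prevMatches(log []Entry, prevIndex, prevTerm int) bool {
	if prevIndex < 0 || prevIndex >= len(log) {
		return false // log too short, or the index is invalid
	}
	return log[prevIndex].Term == prevTerm
}

// appendChecked splices entries after prevIndex only when the check passes.
// Entries the follower already holds (same index and term) are skipped; the
// suffix is dropped only from the first conflicting or missing index onward.
func appendChecked(log []Entry, prevIndex, prevTerm int, entries []Entry) ([]Entry, bool) {
	if !prevMatches(log, prevIndex, prevTerm) {
		return log, false
	}
	for i, e := range entries {
		idx := prevIndex + 1 + i
		if idx < len(log) && log[idx].Term == e.Term {
			continue // already present, nothing to do
		}
		// Cap the capacity so append copies instead of sharing the old array.
		log = append(log[:idx:idx], entries[i:]...)
		break
	}
	return log, true
}

func main() {
	log := []Entry{{0, ""}, {1, "a"}, {1, "b"}}
	_, ok := appendChecked(log, 2, 2, []Entry{{3, "c"}})
	fmt.Println(ok) // false: prevLogTerm does not match
	log, ok = appendChecked(log, 2, 1, []Entry{{3, "c"}})
	fmt.Println(ok, len(log)) // true 4
}
```

A real handler would also update the follower's term, persist before replying, and advance its commit index from the leader's value; those steps are left out of this sketch.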

Snippet 2 — advancing the commit index

// leader, after collecting matchIndex[] from followers
func (r *Raft) advanceCommit() {
    for n := r.commitIndex + 1; n <= r.lastIndex(); n++ {
        count := 1 // count self
        for _, m := range r.matchIndex {
            if m >= n {
                count++
            }
        }
        if count >= r.majority() {
            r.commitIndex = n   // committed by majority
        }
    }
}

Quiz

A majority has replicated entry n, so this code commits it. But the Raft paper adds one more condition before a leader may advance commitIndex to n. What is it, and why?
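
As a point of comparison, the sketch below adds a current-term condition to the majority count. It is a simplified model, not the lesson's `Raft` type: `logTerms[i]` is assumed to hold the term of the entry at index i (index 0 is a sentinel), and `matchIndex` is assumed to list followers only.

```go
package main

import "fmt"

// advanceCommit returns the new commit index for a leader in currentTerm.
// It scans from the newest entry down and stops at the first entry from an
// older term, because such entries are not committed by counting replicas.
func advanceCommit(logTerms []int, matchIndex []int, commitIndex, currentTerm int) int {
	majority := (len(matchIndex)+1)/2 + 1 // cluster size = followers + leader
	for n := len(logTerms) - 1; n > commitIndex; n-- {
		if logTerms[n] != currentTerm {
			break // terms never decrease along a log, so lower indexes are older too
		}
		count := 1 // the leader itself
		for _, m := range matchIndex {
			if m >= n {
				count++
			}
		}
		if count >= majority {
			return n // earlier entries become committed along with n
		}
	}
	return commitIndex
}

func main() {
	terms := []int{0, 1, 2, 4} // leader is in term 4; index 2 is from term 2
	// Index 2 sits on a majority, but it is a prior-term entry: no advance.
	fmt.Println(advanceCommit(terms, []int{2, 2, 0, 0}, 1, 4)) // 1
	// Once index 3 (term 4) reaches a majority, 2 and 3 commit together.
	fmt.Println(advanceCommit(terms, []int{3, 3, 0, 0}, 1, 4)) // 3
}
```

Scanning from the top is a design choice of this sketch; a loop that walks upward, as in the snippet, works equally well once it skips entries whose term differs from the current term.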

Snippet 3 — granting a vote

// handling RequestVote
func (r *Raft) handleVote(v RequestVote) VoteReply {
    if v.Term < r.currentTerm {
        return VoteReply{Term: r.currentTerm, Granted: false}
    }
    if r.votedFor == nil || r.votedFor == v.CandidateID {
        r.votedFor = v.CandidateID
        return VoteReply{Term: v.Term, Granted: true}
    }
    return VoteReply{Term: v.Term, Granted: false}
}

Quiz

This vote handler enforces one-vote-per-term but skips a check. What can a candidate now do, and which safety property fails?
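
The comparison the Raft paper uses for "at least as up-to-date" can be sketched as a small pure function. The function and parameter names below are hypothetical, and representing "no vote cast yet" as an empty string is an assumption of this sketch rather than something taken from the handler above.

```go
package main

import "fmt"

// candidateUpToDate reports whether the candidate's log is at least as
// up-to-date as the voter's: a later last term wins; with equal last terms,
// the log that is at least as long wins.
func candidateUpToDate(candLastTerm, candLastIndex, myLastTerm, myLastIndex int) bool {
	if candLastTerm != myLastTerm {
		return candLastTerm > myLastTerm
	}
	return candLastIndex >= myLastIndex
}

// shouldGrant combines the one-vote-per-term rule with the log comparison.
// votedFor == "" means no vote has been cast in this term.
func shouldGrant(votedFor, candidateID string, upToDate bool) bool {
	return (votedFor == "" || votedFor == candidateID) && upToDate
}

func main() {
	// A long log that ends in an older term loses to a short, newer one.
	fmt.Println(candidateUpToDate(4, 100, 5, 1)) // false
	fmt.Println(candidateUpToDate(5, 10, 5, 10)) // true
	fmt.Println(shouldGrant("", "B", candidateUpToDate(4, 100, 5, 1))) // false
}
```

Note that log length alone never decides the comparison: the last term is checked first, and the index only breaks ties.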

Snippet 4 — a real AppendEntries log

15:42:08 INFO  raft: AppendEntries -> D failure (mismatch at idx=400100 term=12 vs follower idx=400100 term=11)
15:42:09 INFO  raft: AppendEntries -> D retry prevIdx=400050 (decremented)
15:42:09 INFO  raft: AppendEntries -> D failure (mismatch at idx=400050 term=12 vs follower idx=400050 term=10)
15:42:10 INFO  raft: AppendEntries -> D retry prevIdx=399000 (decremented)
15:42:10 INFO  raft: AppendEntries -> D failure (mismatch at idx=399000 term=12 vs follower idx=399000 term=8)
15:42:11 WARN  raft: D has diverged extensively; consider InstallSnapshot

Quiz

The leader keeps backing off nextIndex for follower D, one failed probe at a time. Reading this log, what is the diagnosis and the right operational response?
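
Two mechanisms are commonly used to cut this kind of probing short, and both can be sketched in a few lines. The first is the conflict-term hint that the Raft paper describes as an optional optimization: on a failed check, the follower reports the term of its conflicting entry and the first index it holds for that term, so the leader can skip the whole term in one round trip. The second is the snapshot decision: once the index the leader needs to send falls at or below its snapshot boundary, it has to send InstallSnapshot instead. All names below are hypothetical, and the log is modelled as a slice of terms with a sentinel at index 0.

```go
package main

import "fmt"

// conflictHint is what a follower could return when the consistency check at
// prevIndex fails. It assumes prevIndex >= 1.
func conflictHint(terms []int, prevIndex int) (conflictTerm, firstIndex int) {
	if prevIndex >= len(terms) {
		// Log too short: no conflicting term; ask the leader to resume at our end.
		return 0, len(terms)
	}
	conflictTerm = terms[prevIndex]
	firstIndex = prevIndex
	// Walk back to the first entry the follower holds for the conflicting term.
	for firstIndex > 1 && terms[firstIndex-1] == conflictTerm {
		firstIndex--
	}
	return conflictTerm, firstIndex
}

// needsSnapshot reports whether the leader can no longer serve nextIndex from
// its log because those entries were compacted into a snapshot.
func needsSnapshot(nextIndex, snapshotLastIndex int) bool {
	return nextIndex <= snapshotLastIndex
}

func main() {
	followerTerms := []int{0, 1, 1, 2, 2, 2, 3}
	fmt.Println(conflictHint(followerTerms, 5))  // 2 3
	fmt.Println(conflictHint(followerTerms, 10)) // 0 7
	fmt.Println(needsSnapshot(3, 5))             // true
}
```

With the hint, the leader can set nextIndex to the returned first index and retry once per conflicting term rather than once per probe.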

Recap

Four checks carry Raft’s safety in this lesson. AppendEntries must gate every splice on a prevLogIndex/prevLogTerm match, or logs diverge silently. A leader may directly commit only entries from its own current term; counting replicas on a prior-term entry reopens the overwrite shown in Figure 8 of the Raft paper. RequestVote must compare lastLogTerm and then lastLogIndex, or a candidate with a stale log can win and break Leader Completeness. And a long run of nextIndex back-offs with falling follower terms is the divergence signature that means it is time to consider InstallSnapshot. Read the protocol where it lives, in the handlers and the logs; these are the lines a senior engineer checks first.
