Distributed Systems
Raft: multiple-choice review
Six questions that cut across the whole unit. Each one is a decision you make while operating a real Raft cluster — etcd, TiKV, Consul — not a definition to recite. Pick the answer a senior engineer defends on the incident call.
Confirm you can connect the pieces the lessons built separately: why majority quorum is a safety requirement, what “committed” really means, how the voting rule preserves history, why Raft halts under partition, and the production discipline that keeps the cluster alive.
A 5-node cluster commits entry 101 with a quorum of {A, B, C}. Leader A then crashes, and the new leader is elected by the quorum {C, D, E}. Why is entry 101 guaranteed to survive into the new term?
A leader appends entry 200 to its own disk and fsyncs it, then crashes before any AppendEntries leaves the box. What is the correct statement about entry 200?
A node returns from a 30-minute partition. During that time it kept timing out and incrementing its term, so it now has a much higher term than the live leader but a stale log. Without pre-vote, what happens — and what does pre-vote change?
A network partition in a 5-node cluster leaves the current leader on the minority side, able to reach only 1 of the 4 other nodes. Clients on the leader's side get write timeouts; the majority side elects a new leader. Is this a bug?
An operator needs to grow a 3-node cluster to 5. They push the new 5-node config to every node in a single step, skipping joint consensus. What is the actual risk?
A cluster shows leader_changes_total climbing to several per minute, and clients see periodic 200–400 ms write stalls. wal_fsync_duration_seconds p99 is 1.8 s. What is the diagnosis and first fix?
The unit’s spine is one safety argument and one ops discipline.
Safety: any two majority quorums overlap in at least one node, so a committed entry, which a majority has persisted, is always present on at least one voter in the next election. The log-completeness voting rule then blocks any candidate whose log is behind that voter's. Together these give Leader Completeness, and “committed” only ever means majority-persisted.
Discipline: Raft is CP, so the minority side halts under partition by design. Pre-vote stops rejoining nodes from triggering spurious elections, joint consensus keeps membership changes from splitting the cluster, and the WAL belongs on dedicated NVMe so fsync never starves the heartbeat. Almost every production incident traces back to one of three causes: a bypassed membership protocol, disabled pre-vote, or a slow disk.
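The safety argument in the summary can be checked by brute force on small clusters. The sketch below is illustrative only, not how etcd, TiKV, or Consul implement Raft, and every function name in it is hypothetical. It enumerates majority quorums to confirm that any two overlap, encodes the log-completeness voting rule as a comparison of (last term, last index), and searches for a pair of disjoint majorities when some nodes still use the old configuration while others already use the new one.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1


def all_quorums(nodes):
    """Every minimal majority subset of the given nodes (hypothetical helper)."""
    members = sorted(nodes)
    return [set(q) for q in combinations(members, majority(len(members)))]


def quorums_always_overlap(nodes) -> bool:
    """True if every pair of majority quorums shares at least one node."""
    quorums = all_quorums(nodes)
    return all(a & b for a in quorums for b in quorums)


def log_is_up_to_date(cand_last_term, cand_last_index,
                      my_last_term, my_last_index) -> bool:
    """Voting rule: grant a vote only if the candidate's log is at least
    as up to date as the voter's. Higher last term wins; on equal terms,
    the longer log wins."""
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term
    return cand_last_index >= my_last_index


def disjoint_majorities(old_nodes, new_nodes):
    """Return (old_quorum, new_quorum) with no node in common, or None.
    A hit means two leaders could be elected in the same term if nodes
    switch configurations at different moments."""
    for old_q in all_quorums(old_nodes):
        for new_q in all_quorums(new_nodes):
            if not (old_q & new_q):
                return old_q, new_q
    return None


if __name__ == "__main__":
    # Any two majorities of a 5-node cluster share a node.
    print(quorums_always_overlap("ABCDE"))
    # A voter holding entry 101 rejects a candidate whose log stops at 100.
    print(log_is_up_to_date(3, 100, 3, 101))
    # Direct 3 -> 5 switch: an old-config majority and a new-config
    # majority can be disjoint.
    print(disjoint_majorities("ABC", "ABCDE"))
```

In the last call, a majority of the old 3-node configuration and a majority of the new 5-node configuration can be chosen with no member in common, which is the split that joint consensus exists to prevent.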