Distributed Systems
Leader election: free-recall review
Retrieval beats re-reading. For each prompt, say or write a full answer from memory before you open the model answer — the effort of reconstructing the safety argument is what makes it stick when you are paged at 3 a.m.
Reconstruct the unit’s spine without looking back: why a single leader exists, how Raft elects one, why a lease cannot be trusted across a pause, what split-brain is, and exactly how a fencing token closes the window.
- 01 Why elect a single leader at all, and what is the cost you accept in exchange?
- 02 Walk through how Raft elects a leader, and why the election timeout is randomized.
- 03 Explain why a lease does not, by itself, prevent two nodes from both writing as leader.
- 04 Define split-brain, and name its two distinct causes from the unit.
- 05 Describe the fencing-token mechanism end to end, and state the one condition without which it is worthless.
- 06 Compare how ZooKeeper and etcd represent and detect lost leadership during failover.
If you can reconstruct each answer from memory, you hold the unit’s spine. A single leader serializes writes but becomes a single point of failure, so a replacement must be elected safely. Raft does this with terms and randomized 150–300 ms election timeouts. Leadership is a renewable lease or session (etcd lease TTL, ZooKeeper ephemeral znode) whose detection window bounds failover. A lease cannot stop a paused leader from waking and writing, which is one of the two faces of split-brain (the other is a partition, handled by quorum). Only a monotonic fencing token, enforced at the resource, makes the stale write impossible. Elect for liveness, fence for safety.
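To check your answer to the fencing prompt against something concrete, here is a minimal Python sketch of the paused-leader scenario. It is an illustration under stated assumptions, not any real system's API: `LockService`, `FencedStore`, `StaleTokenError`, and the key name `config` are all hypothetical, and the lease expiry and pause are simulated simply by granting a second token before the first holder writes.

```python
class StaleTokenError(Exception):
    """Raised when a write carries a token older than one already accepted."""


class LockService:
    """Hypothetical lock service: every leadership grant gets a larger token."""

    def __init__(self):
        self._last_token = 0

    def grant(self):
        # Tokens are strictly increasing, so a later leader always outranks
        # an earlier one.
        self._last_token += 1
        return self._last_token


class FencedStore:
    """Hypothetical resource that enforces fencing on every write."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token, key, value):
        # The check lives at the resource: a token lower than the highest
        # one seen so far must belong to a deposed leader.
        if token < self.highest_token:
            raise StaleTokenError(f"token {token} < {self.highest_token}")
        self.highest_token = token
        self.data[key] = value


lock = LockService()
store = FencedStore()

token_a = lock.grant()  # leader A is elected, then pauses (e.g. a long GC)
token_b = lock.grant()  # A's lease expires; leader B is elected

store.write(token_b, "config", "written by B")

try:
    # A wakes up still believing it is leader and tries to write.
    store.write(token_a, "config", "written by A (stale)")
    stale_write_rejected = False
except StaleTokenError:
    stale_write_rejected = True
```

The comparison inside `FencedStore.write` is the condition the prompt asks for: if the resource does not check the token, the lock service can issue perfectly monotonic tokens and the stale write still lands.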