In Search of an Understandable Consensus Algorithm (Extended Version), to end of Section 5
- Does Raft sacrifice anything for simplicity?
- Is Raft used in real-world software, or do companies generally roll their own flavor of Paxos (or use a different consensus protocol)?
- What is Paxos? In what sense is Raft simpler?
- How long had Paxos existed before the authors created Raft? How widespread is Raft's usage in production now?
- How does Raft's performance compare to Paxos in real-world applications?
- Why are we learning/implementing Raft instead of Paxos?
- Are there systems like Raft that can survive and continue to operate when only a minority of the cluster is active?
- In Raft, the replicated service is not available to clients during an election. In practice, how much of a problem does this cause?
- Are there other consensus systems that don't have leader-election pauses?
- How are Raft and VMware FT related?
- Why can't a malicious person take over a Raft server, or forge incorrect Raft messages?
- The paper mentions that Raft works under all non-Byzantine conditions. What are Byzantine conditions and why could they make Raft fail?
- In Figure 1, what does the interface between client and server look like?
- What if a client sends a request to a leader, the leader crashes before sending the client request to all followers, and the new leader doesn't have the request in its log? Won't that cause the client request to be lost?
- If there's a network partition, can Raft end up with two leaders and split brain?
- Suppose a new leader is elected while the network is partitioned, but the old leader is in a different partition. How will the old leader know to stop committing new entries?
- When some servers have failed, does "majority" refer to a majority of the live servers, or a majority of all servers (even the dead ones)?
- What if the election timeout is too short? Will that cause Raft to malfunction?
- Why randomize election timeouts?
- Can a candidate declare itself the leader as soon as it receives votes from a majority, and not bother waiting for further RequestVote replies?
- Can a leader in Raft ever stop being a leader except by crashing?
- When are followers' log entries sent to their state machines?
- Should the leader wait for replies to AppendEntries RPCs?
- What happens if more than half of the servers die?
- Why is the Raft log 1-indexed?
- When a network partition happens, wouldn't client requests in minority partitions be lost?
- Is the argument in 5.4.3 a complete proof?
Suppose we have the scenario shown in the Raft paper's Figure 7: a cluster of seven servers, with the log contents shown. The first server crashes (the one at the top of the figure), and cannot be contacted. A leader election ensues. For each of the servers marked (a), (d), and (f), could that server be elected? If yes, which servers would vote for it? If no, what specific Raft mechanism(s) would prevent it from being elected?
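
One way to reason about a question like this is to write down the voting rule from Section 5.4.1 of the paper: a server grants its vote only if the candidate's log is at least as up-to-date as its own, comparing the term of the last entry first and the log length second. The sketch below is a minimal illustration under stated assumptions, not the paper's code. The function names are hypothetical, each log is represented only by the list of its entries' terms, the logs are made up (they are not the Figure 7 logs), and the model ignores the current-term and votedFor checks that a real RequestVote handler also performs.

```python
from typing import List, Tuple


def last_index_and_term(log_terms: List[int]) -> Tuple[int, int]:
    """Return (last log index, last log term) for a 1-indexed log; (0, 0) if empty."""
    if not log_terms:
        return 0, 0
    return len(log_terms), log_terms[-1]


def candidate_is_up_to_date(candidate_log: List[int], voter_log: List[int]) -> bool:
    """Election restriction: is the candidate's log at least as up-to-date as the voter's?"""
    c_index, c_term = last_index_and_term(candidate_log)
    v_index, v_term = last_index_and_term(voter_log)
    if c_term != v_term:
        # Different last terms: the log whose last entry has the later term wins.
        return c_term > v_term
    # Same last term: the longer (or equally long) log wins.
    return c_index >= v_index


def would_win(candidate_log: List[int], live_voter_logs: List[List[int]], cluster_size: int) -> bool:
    """Count the candidate's own vote plus every live voter that would grant one.

    A majority is measured against the full cluster size, crashed servers included.
    """
    votes = 1 + sum(candidate_is_up_to_date(candidate_log, log) for log in live_voter_logs)
    return votes > cluster_size // 2


# Hypothetical 5-server cluster with one crashed server; four live logs remain.
live_logs = {
    "s2": [1, 1, 2],
    "s3": [1, 1, 2, 2],
    "s4": [1, 1],
    "s5": [1, 3],
}

for name, log in live_logs.items():
    others = [other for key, other in live_logs.items() if key != name]
    print(name, "could win:", would_win(log, others, cluster_size=5))
```

Applying the same comparison by hand to each pair of logs in Figure 7 shows which servers would grant a vote to a given candidate, and whether those votes add up to a majority of all seven servers.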