🤝 Key Takeaways
- Consensus means getting distributed nodes to agree on a single value despite failures
- FLP impossibility: no deterministic consensus algorithm can guarantee progress in a fully asynchronous system if even one node may crash
- Raft: leader-based and designed to be understandable; used by etcd, CockroachDB, Consul
- Paxos: mathematically rigorous but notoriously hard to implement; used by Google Chubby, Amazon DynamoDB
Agreement in the Face of Failure
Consensus is a fundamental problem in distributed systems: how do N nodes agree on a single value when some might crash and messages might be delayed or lost? Consensus algorithms (Paxos, Raft, Zab) solve this in practice by never letting nodes disagree and by making progress whenever a majority of nodes can communicate. That guarantee is what enables distributed locks, leader election, and consistent replication.
Leader Election
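In Raft, every node starts as a follower. If a follower hears no heartbeat from the leader before its election timeout expires, it becomes a candidate, increments its term, votes for itself, and requests votes from the other nodes. A candidate that collects votes from a majority of the cluster becomes leader for that term. The leader then sends periodic heartbeats to prevent new elections.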
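The election rule above can be sketched in a few lines of Python. This is a simplified, single-process illustration rather than Raft itself: it leaves out log comparison, network RPCs, and randomized timeouts, and the names `Node`, `request_vote`, and `run_election` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    node_id: int
    term: int = 0                      # latest term this node has seen
    voted_for: Optional[int] = None    # candidate this node voted for in `term`
    alive: bool = True

    def request_vote(self, candidate_id: int, candidate_term: int) -> bool:
        """Grant at most one vote per term; crashed nodes never answer."""
        if not self.alive or candidate_term < self.term:
            return False
        if candidate_term > self.term:
            # Newer term: forget any vote cast in an older term.
            self.term = candidate_term
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(candidate: Node, cluster: list[Node]) -> bool:
    """The candidate bumps its term, votes for itself, and asks every peer."""
    candidate.term += 1
    candidate.voted_for = candidate.node_id
    votes = 1  # the candidate's own vote
    for peer in cluster:
        if peer is not candidate and peer.request_vote(candidate.node_id, candidate.term):
            votes += 1
    # A strict majority of the whole cluster is required, not just of live nodes.
    return votes > len(cluster) // 2


cluster = [Node(i) for i in range(5)]
cluster[3].alive = False
cluster[4].alive = False
won = run_election(cluster[0], cluster)  # 3 of 5 votes: a majority
```

With two of five nodes down, the candidate still collects three votes and wins. If a third node crashed, no candidate could reach a majority and the cluster would stay leaderless until a node recovered.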
Consensus Algorithms Compared
| Algorithm | Model | Fault Tolerance | Real-World Use |
|---|---|---|---|
| Raft | Leader-based | ⌊(N−1)/2⌋ crash failures | etcd, CockroachDB, Consul, TiKV |
| Paxos | Proposer/Acceptor | ⌊(N−1)/2⌋ crash failures | Google Chubby, Amazon DynamoDB |
| Zab | Leader-based (Paxos-like atomic broadcast) | ⌊(N−1)/2⌋ crash failures | Apache ZooKeeper |
| PBFT | Byzantine fault tolerant | ⌊(N−1)/3⌋ Byzantine failures | Blockchain, financial systems |
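
The fault-tolerance column follows from quorum arithmetic. Crash-tolerant protocols need a strict majority of the N nodes to stay up, so they survive ⌊(N−1)/2⌋ crashes, while PBFT needs N ≥ 3f+1 nodes to survive f Byzantine ones. A minimal sketch of that arithmetic, using hypothetical helper names:

```python
def quorum_size(n: int) -> int:
    """Smallest strict majority of n nodes."""
    return n // 2 + 1


def max_crash_failures(n: int) -> int:
    """Crashes a majority-quorum protocol (Raft, Paxos, Zab) can survive."""
    return (n - 1) // 2


def max_byzantine_failures(n: int) -> int:
    """Byzantine nodes PBFT can survive, derived from n >= 3f + 1."""
    return (n - 1) // 3


# Columns: nodes, quorum, crash failures tolerated, Byzantine failures tolerated
for n in (3, 4, 5, 7):
    print(n, quorum_size(n), max_crash_failures(n), max_byzantine_failures(n))
# 3 2 1 0
# 4 3 1 1
# 5 3 2 1
# 7 4 3 2
```

A 4-node cluster tolerates no more crashes than a 3-node one, which is why crash-tolerant clusters are usually deployed with an odd number of nodes.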
Advantages
- Raft is designed to be understandable
- Enables strong consistency in distributed systems
- Well-tested implementations available
Disadvantages
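- FLP impossibility means progress cannot be guaranteed in a fully asynchronous network, so real systems rely on timeouts
- In leader-based protocols, all writes go through the leader, which can become a bottleneck
- Network partitions can cause temporary unavailability: nodes cut off from a majority cannot commit writes until the partition heals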
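To see why a partition costs availability rather than consistency, consider which side of a split still holds a majority. A small sketch, with `can_commit` and `partition_report` as hypothetical helpers:

```python
def can_commit(reachable: int, cluster_size: int) -> bool:
    """A group of nodes can elect a leader and commit writes only if it
    contains a strict majority of the whole cluster."""
    return reachable > cluster_size // 2


def partition_report(groups: list[int]) -> list[bool]:
    """For each partition group size, report whether it can make progress."""
    total = sum(groups)
    return [can_commit(size, total) for size in groups]


print(partition_report([3, 2]))     # [True, False]
print(partition_report([2, 2, 1]))  # [False, False, False]
```

At most one group can ever hold a majority, so two sides of a partition never both accept writes. The price is that a minority side, or every side in a three-way split like the second case, is unavailable until connectivity returns.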
🧪 Test Your Understanding
How many failures can Raft tolerate with 5 nodes?