Consensus algorithms

🤝Key Takeaways

1
Consensus = getting distributed nodes to agree on a single value despite failures
2
FLP impossibility: no deterministic consensus algorithm can guarantee progress in an async system with even one crash
3
Raft: leader-based, understandable design — used by etcd, CockroachDB, Consul
4
Paxos: mathematically rigorous but notoriously hard to implement — used by Google Chubby, Amazon DynamoDB

Agreement in the Face of Failure

Consensus is the fundamental problem in distributed systems: how do N nodes agree on a value when some might crash and messages might be delayed? Consensus algorithms (Paxos, Raft, Zab) solve this, enabling features like distributed locks, leader election, and consistent replication.

Leader Election

Nodes start as followers. If no heartbeat from the leader, a follower becomes a candidate, requests votes. Whoever gets majority becomes leader. Leaders send heartbeats to prevent new elections.

Consensus Algorithms Compared

Algorithm	Model	Fault Tolerance	Real-World Use
Raft	Leader-based	N/2 failures (N odd)	etcd, CockroachDB, Consul, TiKV
Paxos	Proposer/Acceptor	N/2 failures	Google Chubby, Amazon (DynamoDB Paxos)
Zab	Leader-based (Paxos variant)	N/2 failures	Apache ZooKeeper
PBFT	Byzantine fault tolerant	N/3 Byzantine failures	Blockchain, financial systems

Advantages

•Raft is designed to be understandable
•Enables strong consistency in distributed systems
•Well-tested implementations available

Disadvantages

•FLP impossibility limits theoretical guarantees
•All writes go through the leader (bottleneck)
•Network partitions can cause temporary unavailability

🧪 Test Your Understanding

Knowledge Check1/1

How many failures can Raft tolerate with 5 nodes?