Intermediate → Advanced30 min read· Topic 6.2

Consensus algorithms

FLP impossibility, Paxos, Raft, Zab, Viewstamped Replication

🤝Key Takeaways

  • 1
    Consensus = getting distributed nodes to agree on a single value despite failures
  • 2
    FLP impossibility: no deterministic consensus algorithm can guarantee progress in an async system with even one crash
  • 3
    Raft: leader-based, understandable design — used by etcd, CockroachDB, Consul
  • 4
    Paxos: mathematically rigorous but notoriously hard to implement — used by Google Chubby, Amazon DynamoDB

Agreement in the Face of Failure

Consensus is the fundamental problem in distributed systems: how do N nodes agree on a value when some might crash and messages might be delayed? Consensus algorithms (Paxos, Raft, Zab) solve this, enabling features like distributed locks, leader election, and consistent replication.

Leader Election

Nodes start as followers. If no heartbeat from the leader, a follower becomes a candidate, requests votes. Whoever gets majority becomes leader. Leaders send heartbeats to prevent new elections.

Consensus Algorithms Compared

AlgorithmModelFault ToleranceReal-World Use
RaftLeader-basedN/2 failures (N odd)etcd, CockroachDB, Consul, TiKV
PaxosProposer/AcceptorN/2 failuresGoogle Chubby, Amazon (DynamoDB Paxos)
ZabLeader-based (Paxos variant)N/2 failuresApache ZooKeeper
PBFTByzantine fault tolerantN/3 Byzantine failuresBlockchain, financial systems

Advantages

  • Raft is designed to be understandable
  • Enables strong consistency in distributed systems
  • Well-tested implementations available

Disadvantages

  • FLP impossibility limits theoretical guarantees
  • All writes go through the leader (bottleneck)
  • Network partitions can cause temporary unavailability

🧪 Test Your Understanding

Knowledge Check1/1

How many failures can Raft tolerate with 5 nodes?