Coordination and leader election

👑 Key Takeaways

1. Leader election: one node coordinates work and all others are followers. If the leader fails, a new one is elected.
2. ZooKeeper and etcd provide distributed coordination primitives: locks, leader election, and configuration storage.
3. Fencing tokens keep a split-brain situation from corrupting data: each token is a monotonically increasing number, so writes from a stale leader can be rejected.
4. Distributed locks are easy to get wrong. Prefer lease-based locks with a TTL, combined with fencing tokens.

Coordinating Distributed Nodes

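Many distributed systems need a single coordinator: one node that assigns work, holds a lock, or acts as the single writer. Leader election algorithms aim to keep at most one active leader at a time and to elect a replacement quickly if the current one fails. A partitioned or paused leader may still believe it is in charge for a while, which is why leadership is usually tied to a lease or a term number.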
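To make the idea concrete, here is a minimal in-memory sketch of lease-based leader election. The `ElectionRegistry` class, its `campaign` method, and the explicit `now` argument are hypothetical names invented for illustration. A real deployment would delegate this to a coordination service rather than a local object, and would have to deal with clock drift, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str
    term: int          # increases every time leadership changes hands
    expires_at: float  # logical time at which the lease lapses


class ElectionRegistry:
    """In-memory stand-in for a coordination service (hypothetical)."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self.lease: Optional[Lease] = None
        self.term = 0

    def campaign(self, node: str, now: float) -> bool:
        """Try to become (or stay) leader; returns True if `node` leads."""
        current = self.lease
        if current is not None and now < current.expires_at:
            if current.holder == node:
                # The live leader renews its own lease (heartbeat).
                current.expires_at = now + self.ttl
                return True
            return False  # someone else holds a live lease
        # No lease, or it expired: the first campaigner wins a new term.
        self.term += 1
        self.lease = Lease(node, self.term, now + self.ttl)
        return True

    def leader(self, now: float) -> Optional[str]:
        if self.lease is not None and now < self.lease.expires_at:
            return self.lease.holder
        return None


registry = ElectionRegistry(ttl=5.0)
print(registry.campaign("node-a", now=0.0))   # node-a wins term 1
print(registry.campaign("node-b", now=1.0))   # rejected: lease still live
print(registry.campaign("node-a", now=4.0))   # heartbeat extends the lease
print(registry.campaign("node-b", now=10.0))  # node-a went silent; node-b takes over
print(registry.leader(now=10.0), registry.term)
```

The term counter matters as much as the lease: it gives every change of leadership a number that downstream systems can compare, which is the same idea a fencing token relies on.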

Coordination Services

Service	Consensus	Use Case	API Style
ZooKeeper	Zab (atomic broadcast, similar to Paxos)	Leader election, config, distributed locks	Hierarchical znodes, watchers
etcd	Raft	K8s config store, leader election, service discovery	Key-value with watch, lease, transactions
Consul	Raft	Service discovery, health checks, KV, leader election	HTTP API, DNS interface
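As an illustration of how znodes and watchers combine into an election, the common ZooKeeper recipe has each candidate create an ephemeral sequential znode: the lowest sequence number is the leader, and every other candidate watches only its immediate predecessor. The sketch below imitates that recipe in memory. `ElectionPath` and its methods are hypothetical names and do not correspond to any ZooKeeper client API.

```python
class ElectionPath:
    """In-memory imitation of one election directory (not a ZooKeeper client)."""

    def __init__(self) -> None:
        self._next_seq = 0
        self.members = {}  # sequence number -> node name

    def join(self, node):
        # Like creating an ephemeral sequential entry: the service picks the number.
        seq = self._next_seq
        self._next_seq += 1
        self.members[seq] = node
        return seq

    def leave(self, seq):
        # Models session loss: an ephemeral entry vanishes with its owner.
        self.members.pop(seq, None)

    def leader(self):
        # The candidate holding the lowest sequence number leads.
        return self.members[min(self.members)] if self.members else None

    def watch_target(self, seq):
        # Each follower watches only its immediate predecessor, so one
        # departure wakes one node instead of the whole group.
        lower = [s for s in self.members if s < seq]
        return max(lower) if lower else None


path = ElectionPath()
a = path.join("node-a")
b = path.join("node-b")
c = path.join("node-c")
print(path.leader())         # node-a holds the lowest number
print(path.watch_target(c))  # node-c watches node-b's entry
path.leave(a)                # node-a's session ends
print(path.leader())         # node-b is next in line
```

Watching only the predecessor is a design choice that avoids a "herd effect", where every follower would otherwise react to each leader change at once.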

⚠️ The Distributed Lock Problem

A lock acquired from Redis or etcd can fail silently: a network partition or a long process pause (for example, garbage collection) can let the lock's lease expire while the holder still believes it holds the lock. Always use fencing tokens: a monotonically increasing number issued with each lock acquisition. The holder sends its token with every write, and the resource server rejects any request whose token is older than the latest one it has seen.
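The check on the resource server can be sketched in a few lines. `LockService`, `Storage`, and `StaleTokenError` are hypothetical names used only for this illustration. Lease expiry is left out so that the token comparison stays in focus.

```python
import itertools


class StaleTokenError(Exception):
    pass


class LockService:
    """Hypothetical lock service that issues a fencing token per acquisition."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def acquire(self, client: str) -> int:
        # Lease expiry is omitted here; only the token issuance matters.
        return next(self._counter)


class Storage:
    """Resource server that remembers the highest token it has accepted."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data = {}

    def write(self, key: str, value: str, token: int) -> None:
        if token < self.highest_token:
            raise StaleTokenError(f"token {token} < {self.highest_token}")
        self.highest_token = token
        self.data[key] = value


locks = LockService()
storage = Storage()

token_a = locks.acquire("client-a")
# client-a stalls; its lease expires and client-b acquires the lock.
token_b = locks.acquire("client-b")
storage.write("config", "from-b", token_b)

try:
    # client-a wakes up and still believes it holds the lock.
    storage.write("config", "from-a", token_a)
except StaleTokenError as err:
    print("rejected:", err)

print(storage.data["config"])
```

Note that the protection lives in the storage layer, not in the lock service: the lock alone cannot stop a paused client from writing, but the resource can refuse a token it has already moved past.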

Advantages

•Leader election provides a clear coordination point
•ZooKeeper/etcd are battle-tested for coordination
•Fencing tokens prevent split-brain writes

Disadvantages

•The leader is a bottleneck and a single point of failure (SPOF), so fast re-election is needed
•Distributed locks are complex to use correctly
•Coordination services add infrastructure overhead

🧪 Test Your Understanding

What is a fencing token used for?