Key Takeaways
- Partial failures are the defining characteristic: some nodes fail while others keep running.
- The 8 Fallacies of Distributed Computing: the network is NOT reliable, latency is NOT zero, bandwidth is NOT infinite, and so on.
- Two Generals problem: no protocol can guarantee that two parties agree on a coordinated action when any message between them may be lost.
- Byzantine faults: nodes may lie or behave arbitrarily; Byzantine fault-tolerant (BFT) protocols such as PBFT address them.
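The arithmetic behind the PBFT takeaway can be sketched in a few lines, assuming the standard sizing rule for PBFT-style protocols: tolerating f Byzantine replicas needs n = 3f + 1 replicas and quorums of 2f + 1. The helper names below are illustrative, not from any library. The check shows why the numbers work: any two quorums overlap in at least f + 1 replicas, so at least one honest replica is in both.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1 (PBFT-style sizing rule)."""
    if n < 1:
        raise ValueError("need at least one replica")
    return (n - 1) // 3


def quorum_size(f: int) -> int:
    """Quorum of 2f + 1 replicas, used when n = 3f + 1."""
    return 2 * f + 1


def min_quorum_overlap(n: int, q: int) -> int:
    """Two quorums of size q drawn from n replicas share at least 2q - n members."""
    return max(0, 2 * q - n)


for n in (4, 7, 10):
    f = max_byzantine_faults(n)
    q = quorum_size(f)
    overlap = min_quorum_overlap(n, q)
    # overlap >= f + 1 means even if f of the shared replicas lie, one honest one remains
    print(f"n={n}: tolerates f={f}, quorum={q}, min overlap={overlap}, "
          f"honest replica guaranteed={overlap >= f + 1}")
```

Note that n = 3 gives f = 0: under this rule, three replicas cannot tolerate even a single Byzantine node.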
What Makes Distributed Systems Hard
Distributed systems are hard because of three fundamental challenges: (1) partial failure, where any component can fail independently while the rest keep running; (2) unreliable networks, where messages can be lost, delayed, duplicated, or reordered; and (3) no global clock, so nodes cannot agree on what time it is right now.
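The network challenge can be made concrete with a toy simulation. The sketch below assumes a hypothetical `LossyChannel` that randomly drops, duplicates, and reorders messages (seeded so runs repeat), and a hypothetical `DedupReceiver` that tolerates duplicates by remembering the message IDs it has already processed. It illustrates the failure modes only; it is not a model of any real transport.

```python
import random
from dataclasses import dataclass, field


@dataclass
class LossyChannel:
    """Hypothetical channel that drops, duplicates, and reorders messages."""
    drop_rate: float = 0.2
    dup_rate: float = 0.2
    seed: int = 42
    _in_flight: list = field(default_factory=list)

    def __post_init__(self) -> None:
        self._rng = random.Random(self.seed)

    def send(self, msg_id: int, payload: str) -> None:
        if self._rng.random() < self.drop_rate:
            return  # message silently lost; the sender is not told
        self._in_flight.append((msg_id, payload))
        if self._rng.random() < self.dup_rate:
            self._in_flight.append((msg_id, payload))  # same message delivered twice

    def deliver_all(self) -> list:
        self._rng.shuffle(self._in_flight)  # arbitrary reordering in transit
        delivered, self._in_flight = self._in_flight, []
        return delivered


class DedupReceiver:
    """Processes each message ID at most once, in whatever order it arrives."""

    def __init__(self) -> None:
        self.seen: set = set()
        self.processed: list = []

    def receive(self, msg_id: int, payload: str) -> None:
        if msg_id in self.seen:
            return  # duplicate: ignore so processing stays idempotent
        self.seen.add(msg_id)
        self.processed.append((msg_id, payload))


channel = LossyChannel()
receiver = DedupReceiver()
for i in range(10):
    channel.send(i, f"update-{i}")
arrivals = channel.deliver_all()
for msg_id, payload in arrivals:
    receiver.receive(msg_id, payload)
print("arrival order:", [m for m, _ in arrivals])
print("processed once:", sorted(m for m, _ in receiver.processed))
```

Deduplication handles duplicates and reordering, but nothing on the receiving side can recover a dropped message. Detecting loss requires acknowledgments and retransmission, which in turn reintroduces duplicates.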
Every other topic in this module builds on understanding these challenges and the theoretical limits they impose.
The 8 Fallacies of Distributed Computing
Fallacy 1: The network is reliable. Networks lose packets, cables get cut, and switches fail, so design for network failures with retries, timeouts, and circuit breakers.
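A common way to apply the retry advice is exponential backoff with jitter. The sketch below is a hypothetical `retry_with_backoff` helper, not any particular library's API. It assumes the wrapped call enforces its own timeout and raises `TimeoutError` or `ConnectionError` on failure, and it takes the sleep function as a parameter so it can be tested without actually waiting. Only retry operations that are safe to repeat, since a request that timed out may still have succeeded on the server.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call op(), retrying listed transient errors with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return op()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng.uniform(0, cap))  # full jitter keeps clients from retrying in lockstep
    raise AssertionError("unreachable")


# Usage: a hypothetical call that times out twice, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


delays: list = []
print(retry_with_backoff(flaky, sleep=delays.append, rng=random.Random(0)))  # prints "ok"
```

Capping the delay bounds the worst-case wait, and randomizing within the cap spreads out retries from many clients that failed at the same moment. A circuit breaker complements this by stopping retries entirely while a dependency is known to be down.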
Fallacy 2: Latency is zero. Even calls within the same datacenter take around 0.5 ms, and cross-region calls take 50-150 ms. Design for latency: cache locally, batch requests, and minimize network hops.
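These figures make the cost of chatty designs easy to estimate. Assuming each call waits for the previous reply, N sequential calls cost about N × RTT, while grouping them into batches costs one RTT per batch. The sketch below uses the section's own numbers (0.5 ms in-datacenter, and 100 ms as a midpoint of the 50-150 ms cross-region range) and deliberately ignores server processing time and payload size.

```python
def sequential_ms(n_calls: int, rtt_ms: float) -> float:
    """Each call waits for the previous reply, so round trips add up."""
    return n_calls * rtt_ms


def batched_ms(n_calls: int, rtt_ms: float, batch_size: int) -> float:
    """Calls grouped into batches; each batch still costs one round trip."""
    batches = -(-n_calls // batch_size)  # ceiling division
    return batches * rtt_ms


for label, rtt in (("same datacenter", 0.5), ("cross-region", 100.0)):
    print(f"{label}: 50 sequential calls = {sequential_ms(50, rtt):.1f} ms, "
          f"batched 25 per request = {batched_ms(50, rtt, 25):.1f} ms")
```

The same 50 calls cost 25 ms inside one datacenter but 5 seconds across regions, which is why reducing the number of round trips matters more than shaving time off each one.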
Fallacy 3: Bandwidth is infinite. Large data transfers saturate links, so use compression, pagination, CDNs, and streaming for large payloads.
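Two of these techniques, pagination and compression, can be sketched with the standard library alone. The example assumes a hypothetical list of similar records standing in for a large result set: `paginate` yields bounded pages so no single response carries everything, and `zlib` shows that a repetitive JSON payload shrinks substantially. Real compression ratios depend heavily on the data.

```python
import json
import zlib
from typing import Iterator, List


def paginate(items: List[dict], page_size: int) -> Iterator[List[dict]]:
    """Yield the result set in bounded chunks instead of one huge payload."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    for start in range(0, len(items), page_size):
        yield items[start:start + page_size]


# Hypothetical result set: 1,000 similar records.
records = [{"id": i, "status": "active", "region": "eu-west"} for i in range(1000)]

pages = list(paginate(records, page_size=100))
raw = json.dumps(records).encode("utf-8")
compressed = zlib.compress(raw, level=6)

print(f"{len(pages)} pages of up to 100 records")
print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
```

In a real API the client would request one page at a time (for example with a cursor or offset), so memory and bandwidth per request stay bounded regardless of the total size.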
Fallacy 4: The network is secure. Any network call can potentially be intercepted, so use TLS everywhere, mutual TLS (mTLS) between services, and a zero-trust architecture.
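In Python's standard `ssl` module, the main difference between plain TLS and mTLS is whether the server demands and verifies a client certificate. A minimal sketch, assuming certificates issued by an internal CA; the file-path parameters are hypothetical placeholders and are optional here only so the example runs without real certificate files.

```python
import ssl
from typing import Optional


def server_context(cafile: Optional[str] = None,
                   certfile: Optional[str] = None,
                   keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Server-side context that requires and verifies client certificates (mTLS)."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject clients without a valid certificate
    if certfile:
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)  # server's own identity
    if cafile:
        ctx.load_verify_locations(cafile=cafile)  # CA that signs client certificates
    return ctx


def client_context(cafile: Optional[str] = None,
                   certfile: Optional[str] = None,
                   keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Client-side context that verifies the server and presents its own certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)  # client's identity for mTLS
    return ctx


srv = server_context()
cli = client_context()
print(srv.verify_mode == ssl.CERT_REQUIRED, cli.check_hostname)
```

Sockets would then be wrapped with `srv.wrap_socket(sock, server_side=True)` on the server and `cli.wrap_socket(sock, server_hostname=...)` on the client, with `server_hostname` set to the service's DNS name so hostname checking applies.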
Fallacy 5: Topology doesn't change. Services move, IP addresses change, and new nodes join, so use service discovery and avoid hardcoded addresses.
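Service discovery replaces hardcoded addresses with a lookup at call time. The sketch below is a hypothetical in-memory `ServiceRegistry`, meant only to show the shape of the idea: instances hold a lease they renew with heartbeats, expired instances drop out, and callers resolve a name to a live address round-robin. Time is passed in explicitly so the behavior is deterministic; the addresses are made up.

```python
import itertools
from collections import defaultdict
from typing import Dict, Iterator, List, Optional


class ServiceRegistry:
    """Hypothetical in-memory registry: instances hold a lease they must keep renewing."""

    def __init__(self, ttl_s: float = 10.0) -> None:
        self.ttl_s = ttl_s
        # service name -> {address: lease expiry time}
        self._leases: Dict[str, Dict[str, float]] = defaultdict(dict)
        # per-service round-robin counters
        self._counters: Dict[str, Iterator[int]] = defaultdict(itertools.count)

    def heartbeat(self, service: str, addr: str, now: float) -> None:
        """Register or renew an instance; it stays visible until now + ttl_s."""
        self._leases[service][addr] = now + self.ttl_s

    def deregister(self, service: str, addr: str) -> None:
        """Graceful removal, e.g. during a planned shutdown."""
        self._leases[service].pop(addr, None)

    def healthy(self, service: str, now: float) -> List[str]:
        """Addresses whose lease has not expired, in a stable order."""
        return sorted(a for a, expiry in self._leases[service].items() if expiry > now)

    def resolve(self, service: str, now: float) -> Optional[str]:
        """Pick a live instance round-robin, or None if none is available."""
        live = self.healthy(service, now)
        if not live:
            return None
        return live[next(self._counters[service]) % len(live)]


reg = ServiceRegistry(ttl_s=10.0)
reg.heartbeat("orders", "10.0.0.5:8080", now=0.0)
reg.heartbeat("orders", "10.0.0.6:8080", now=0.0)
print(reg.resolve("orders", now=1.0), reg.resolve("orders", now=1.0))
reg.heartbeat("orders", "10.0.0.6:8080", now=8.0)  # only .6 keeps renewing its lease
print(reg.healthy("orders", now=12.0))  # ['10.0.0.6:8080']
```

Because a crashed instance simply stops renewing rather than deregistering, the TTL bounds how long callers can keep being routed to it.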
Fallacy 6: There is one administrator. In practice there are multiple teams, multiple clouds, and multiple regions, so design for operational independence and clear ownership boundaries.
Fallacy 7: Transport cost is zero. Cloud providers charge for data transfer, especially cross-region traffic and internet egress, so design for data locality.
Fallacy 8: The network is homogeneous. Different services use different protocols, versions, and message formats, so design for interoperability.
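One common interoperability tactic is a tolerant reader: accept every message version you know about, normalize it into one internal shape, and ignore fields you don't recognize. The two wire formats below (a v1 with a flat `name` field and a v2 with `first_name`/`last_name`) are invented for illustration, as is the `Customer` type.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Internal representation, independent of any wire format."""
    customer_id: str
    full_name: str


def parse_customer(raw: str) -> Customer:
    """Accept hypothetical v1 and v2 payloads; extra unknown fields are ignored."""
    msg = json.loads(raw)
    version = msg.get("version", 1)  # treat a missing version as the oldest format
    if version == 1:
        return Customer(customer_id=str(msg["id"]), full_name=msg["name"])
    if version == 2:
        name = f'{msg["first_name"]} {msg["last_name"]}'.strip()
        return Customer(customer_id=str(msg["customer_id"]), full_name=name)
    raise ValueError(f"unsupported message version: {version}")


v1 = '{"id": 42, "name": "Ada Lovelace"}'
v2 = ('{"version": 2, "customer_id": "42", "first_name": "Ada", '
      '"last_name": "Lovelace", "loyalty_tier": "gold"}')
print(parse_customer(v1) == parse_customer(v2))  # True
```

Rejecting unknown versions loudly, rather than guessing at their meaning, keeps a newer producer from silently corrupting an older consumer's data.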
Advantages
- Understanding fundamentals prevents naive design decisions
- Prepares you for advanced distributed topics
- Applicable to every distributed system
Disadvantages
- Theoretical foundations can feel abstract
- Many problems have no perfect solution, only trade-offs
- Distributed debugging is inherently complex