Intermediate → Advanced · 18 min read · Topic 6.1

Distributed systems fundamentals

Partial failures, fallacies of distributed computing, Two Generals problem, Byzantine faults

🌐 Key Takeaways

  • Partial failures are the defining characteristic: some nodes fail while others continue
  • The 8 Fallacies of Distributed Computing: the network is NOT reliable, latency is NOT zero, etc.
  • Two Generals problem: no protocol can guarantee agreement over unreliable channels
  • Byzantine faults: nodes may lie or behave arbitrarily (addressed by BFT protocols like PBFT)

What Makes Distributed Systems Hard

Distributed systems are hard because of three fundamental challenges: (1) partial failure, where any component can fail independently of the others; (2) unreliable networks, where messages can be lost, delayed, duplicated, or reordered; and (3) no global clock, so nodes cannot agree on exactly what time it is right now.
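To make the second challenge concrete, here is a minimal sketch of a receiver that tolerates duplicated and reordered messages by tagging each one with a sequence number. The `InOrderReceiver` class and the message stream are hypothetical, invented for illustration. Lost messages are not handled here; they would additionally need retransmission by the sender.

```python
from dataclasses import dataclass, field


@dataclass
class InOrderReceiver:
    """Delivers each sequence number at most once, in increasing order."""
    next_seq: int = 0
    pending: dict = field(default_factory=dict)    # buffered out-of-order messages
    delivered: list = field(default_factory=list)  # payloads handed to the application

    def receive(self, seq: int, payload: str) -> None:
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate: already delivered or already buffered
        self.pending[seq] = payload
        # Deliver the contiguous run that starts at next_seq, if any.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


receiver = InOrderReceiver()
# Simulated network: message 2 arrives before 1, and 0 and 2 arrive twice.
for seq, payload in [(0, "a"), (2, "c"), (0, "a"), (1, "b"), (2, "c")]:
    receiver.receive(seq, payload)

print(receiver.delivered)  # ['a', 'b', 'c']
```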

Every other topic in this module builds on understanding these challenges and the theoretical limits they impose.

The 8 Fallacies of Distributed Computing

1. The network is reliable. In practice, networks lose packets, cables get cut, and switches fail. Design for network failures with retries, timeouts, and circuit breakers.
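A retry policy can be sketched as follows. This is a minimal illustration, not a production client: `call_with_retries` and `flaky_operation` are hypothetical names, the failure is simulated, and the delays are kept tiny so the example runs quickly. Retrying is only safe when the operation is idempotent.

```python
import random
import time


def call_with_retries(operation, attempts=4, base_delay=0.01, sleep=time.sleep):
    """Call operation(); on ConnectionError, retry with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            delay = base_delay * (2 ** attempt)
            # Jitter spreads retries out so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay))


calls = {"count": 0}


def flaky_operation():
    """Hypothetical remote call that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated packet loss")
    return "ok"


result = call_with_retries(flaky_operation)
print(result, calls["count"])  # ok 3
```

A circuit breaker goes one step further: after repeated failures it stops calling the dependency for a while instead of retrying each request.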

2. Latency is zero. Even a call within the same datacenter takes on the order of 0.5 ms, and a cross-region call typically takes 50-150 ms. Design for latency: cache locally, batch requests, and minimize network hops.
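A back-of-the-envelope calculation shows why batching matters. The sketch below assumes requests are issued one after another and that a batch costs a single round trip; the 100 ms figure is an assumed value inside the cross-region range quoted above, not a measurement.

```python
def total_latency_ms(num_requests, round_trip_ms, batch_size=1):
    """Network time when requests are sent sequentially in batches of batch_size."""
    round_trips = -(-num_requests // batch_size)  # ceiling division
    return round_trips * round_trip_ms


CROSS_REGION_RTT_MS = 100  # assumed round-trip time

print(total_latency_ms(50, CROSS_REGION_RTT_MS))                 # 5000
print(total_latency_ms(50, CROSS_REGION_RTT_MS, batch_size=25))  # 200
```

Fifty sequential cross-region calls spend about five seconds on the network alone, while two batched calls spend about 200 ms under the same assumptions.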

3. Bandwidth is infinite. Large data transfers saturate links. Use compression, pagination, CDNs, and streaming for large payloads.
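Pagination and compression can be combined in a few lines. This sketch uses only the standard library; the `records` list is made-up sample data and `paginate` is a hypothetical helper, not part of any framework.

```python
import json
import zlib


def paginate(items, page_size):
    """Yield successive pages so no single response carries the whole dataset."""
    for start in range(0, len(items), page_size):
        yield items[start:start + page_size]


records = [{"id": i, "status": "active"} for i in range(1000)]  # sample data
pages = list(paginate(records, page_size=100))

raw = json.dumps(pages[0]).encode("utf-8")
compressed = zlib.compress(raw)

# Sizes depend on the data; repetitive JSON like this usually compresses well.
print(len(pages), len(raw), len(compressed))
```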

4. The network is secure. Every network call can potentially be intercepted. Use TLS everywhere, mTLS between services, and a zero-trust architecture.
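On the client side, a verifying TLS configuration might look like the sketch below, which uses Python's standard `ssl` module. The function name and its parameters are hypothetical; the certificate paths are left to the caller, and passing a client certificate is what turns plain TLS into mTLS.

```python
import ssl


def make_client_context(ca_file=None, client_cert=None, client_key=None):
    """Build a TLS client context that verifies the server certificate and hostname."""
    # create_default_context enables certificate and hostname verification.
    context = ssl.create_default_context(cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if client_cert:
        # For mTLS: present our own certificate so the server can authenticate us.
        context.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return context


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)  # True True
```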

5. Topology doesn't change. Services move, IPs change, and new nodes join. Use service discovery and avoid hardcoded addresses.
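The core idea of service discovery can be sketched as an in-memory registry in which instances announce themselves with heartbeats and expire when they go quiet. `ServiceRegistry`, the service name, and the addresses are all hypothetical; real deployments typically rely on a dedicated discovery system rather than code like this.

```python
import time


class ServiceRegistry:
    """Tracks live instances per service; entries expire without a recent heartbeat."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._instances = {}  # service name -> {address: last heartbeat time}

    def heartbeat(self, service, address):
        self._instances.setdefault(service, {})[address] = self._clock()

    def lookup(self, service):
        now = self._clock()
        live = {
            address: seen
            for address, seen in self._instances.get(service, {}).items()
            if now - seen <= self._ttl
        }
        self._instances[service] = live  # forget expired instances
        return sorted(live)


# A fake clock keeps the example deterministic.
fake_now = [0.0]
registry = ServiceRegistry(ttl_seconds=30.0, clock=lambda: fake_now[0])
registry.heartbeat("orders", "10.0.0.5:8080")
fake_now[0] = 20.0
registry.heartbeat("orders", "10.0.0.9:8080")
fake_now[0] = 40.0
print(registry.lookup("orders"))  # ['10.0.0.9:8080']
```

Callers ask the registry for current addresses on each lookup, so an instance that moves or dies drops out without any configuration change.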

6. There is one administrator. Real systems span multiple teams, multiple clouds, and multiple regions. Design for operational independence and clear ownership boundaries.

7. Transport cost is zero. Cloud providers charge for data transfer, especially cross-region traffic and internet egress. Design for data locality.
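A rough estimate makes the cost visible. The price below is an assumed placeholder, not any provider's actual rate; substitute the figures from your own provider's pricing page.

```python
def monthly_transfer_cost(gb_per_day, price_per_gb, days=30):
    """Estimated monthly data transfer bill for a steady daily volume."""
    return gb_per_day * days * price_per_gb


ASSUMED_CROSS_REGION_PRICE_PER_GB = 0.02  # hypothetical price in dollars

cost = monthly_transfer_cost(500, ASSUMED_CROSS_REGION_PRICE_PER_GB)
print(round(cost, 2))  # 300.0
```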

8. The network is homogeneous. Different services use different protocols, versions, and message formats. Design for interoperability.
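One common way to design for interoperability is a tolerant reader that accepts several message versions and ignores fields it does not recognize. The event schema below (`version`, `order_id`, `amount`, `amount_cents`) is entirely hypothetical and only illustrates the pattern.

```python
import json


def parse_order_event(raw: bytes) -> dict:
    """Normalize version 1 and version 2 order events into one internal shape."""
    data = json.loads(raw)
    version = data.get("version", 1)  # assume old producers omit the field
    if version == 1:
        # Version 1 carries a decimal amount as a string, e.g. "12.50".
        amount_cents = int(round(float(data["amount"]) * 100))
    elif version == 2:
        amount_cents = int(data["amount_cents"])
    else:
        raise ValueError(f"unsupported version: {version}")
    # Unknown fields are ignored so newer producers do not break this consumer.
    return {"order_id": str(data["order_id"]), "amount_cents": amount_cents}


v1 = b'{"order_id": 7, "amount": "12.50"}'
v2 = b'{"version": 2, "order_id": "7", "amount_cents": 1250, "coupon": "X"}'
print(parse_order_event(v1) == parse_order_event(v2))  # True
```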

💡 Two Generals Problem
Two armies on opposite sides of a valley must attack simultaneously to win. They communicate by messengers who might be captured while crossing the valley. No finite number of acknowledgments can guarantee that both generals know the plan is confirmed, because the sender of the last message never learns whether it arrived. This shows that guaranteed agreement over an unreliable channel is impossible, which is the theoretical basis for many limitations of distributed systems.
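The argument can be illustrated with a small simulation. This is a sketch under one assumed decision rule, not a proof: messages alternate between general A and general B, each message is sent only if the previous one arrived, and a general attacks only if every message addressed to it arrived. The function name and the rule are invented for illustration.

```python
def decisions(total_messages, first_lost=None):
    """Return (a_attacks, b_attacks) for a chain of alternating messages.

    Odd-numbered messages go A -> B, even-numbered ones go B -> A.
    first_lost is the number of the first lost message, or None if all arrive.
    """
    delivered = total_messages if first_lost is None else first_lost - 1
    expected_by_b = (total_messages + 1) // 2  # odd-numbered messages
    expected_by_a = total_messages // 2        # even-numbered messages
    received_by_b = (delivered + 1) // 2
    received_by_a = delivered // 2
    return received_by_a == expected_by_a, received_by_b == expected_by_b


# However long the chain, losing only the final message splits the generals:
# its sender has received everything it expected, but its receiver has not.
for k in range(1, 6):
    a_attacks, b_attacks = decisions(k, first_lost=k)
    print(k, a_attacks, b_attacks)
# 1 True False
# 2 False True
# 3 True False
# 4 False True
# 5 True False
```

Adding one more acknowledgment only moves the uncertainty to the new last message, which is the intuition behind the impossibility result.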

Advantages

  • Understanding fundamentals prevents naive design decisions
  • Prepares you for advanced distributed topics
  • Applicable to every distributed system

Disadvantages

  • Theoretical foundations can feel abstract
  • Many problems have no perfect solution, only trade-offs
  • Distributed debugging is inherently complex

🧪 Test Your Understanding

What is a partial failure in a distributed system?