Key Takeaways
- WhatsApp handles 100B+ messages/day using Erlang (BEAM VM) for massive concurrency and fault tolerance
- WebSocket-based real-time chat requires a connection gateway, a message queue (Kafka), a presence service, and message storage
- Notification systems need multi-channel fanout (push via APNs/FCM, email, SMS, in-app) with user preference routing
- Video conferencing: SFU (Selective Forwarding Unit) architecture scales better than mesh or MCU
Designing Communication Systems
Communication systems (chat, notifications, video) are among the most commonly asked system design questions. They share core challenges: real-time delivery, presence (who's online), message ordering, offline delivery, and multi-device sync.
The key insight: real-time communication requires persistent connections (WebSocket), not HTTP request-response. This fundamentally changes the architecture.
System Breakdowns
Connection Layer: WebSocket gateway servers, each handling 100K+ persistent connections. The gateways are stateful, so routing needs sticky sessions or a connection registry.
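A connection registry can be sketched as a map from user to the gateway that currently holds that user's socket. The sketch below is a minimal in-memory stand-in, assuming one connection per user; the class and method names are hypothetical, and a real deployment would likely keep this map in a shared store such as Redis.

```python
from typing import Dict, Optional


class ConnectionRegistry:
    """Maps user_id -> gateway_id (in-memory stand-in for a shared store)."""

    def __init__(self) -> None:
        self._gateway_by_user: Dict[str, str] = {}

    def register(self, user_id: str, gateway_id: str) -> None:
        # Called when a gateway accepts a user's WebSocket connection.
        self._gateway_by_user[user_id] = gateway_id

    def unregister(self, user_id: str, gateway_id: str) -> None:
        # Remove the entry only if it still points at this gateway, so a late
        # disconnect event does not erase a newer connection made elsewhere.
        if self._gateway_by_user.get(user_id) == gateway_id:
            del self._gateway_by_user[user_id]

    def lookup(self, user_id: str) -> Optional[str]:
        # None means the user is offline and needs offline delivery instead.
        return self._gateway_by_user.get(user_id)
```

The message service would call `lookup` to decide which gateway to route a message to. Supporting multiple devices per user would mean mapping each user to a set of connections rather than a single gateway.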
Message Flow: Client → WebSocket Gateway → Message Service → Kafka → Recipient's Gateway → Recipient.
Storage: messages stored in Cassandra (write-heavy, time-ordered). Group messages fan out at the server side.
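Server-side fan-out for a group message can be illustrated with a short sketch. It assumes a hypothetical per-user inbox (a plain dictionary here, standing in for the real message store) and writes one copy per member other than the sender.

```python
from collections import defaultdict
from typing import Dict, List, Set


def fan_out(sender: str, members: Set[str], text: str,
            inboxes: Dict[str, List[str]]) -> int:
    """Append one copy of the message to every member's inbox except the sender's."""
    recipients = members - {sender}
    for user_id in sorted(recipients):
        inboxes[user_id].append(f"{sender}: {text}")
    return len(recipients)


# Hypothetical three-person group: alice sends, bob and carol receive.
inboxes: Dict[str, List[str]] = defaultdict(list)
delivered = fan_out("alice", {"alice", "bob", "carol"}, "hi", inboxes)
```

The write cost grows with group size, which is one reason very large groups are sometimes handled by fan-out on read instead.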
Presence: heartbeat-based (client pings every 30s). Status stored in Redis. Fan out changes to friends list.
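Heartbeat-based presence reduces to "online if a ping arrived recently". The sketch below is an in-memory illustration; the 65-second timeout is an assumed value chosen to tolerate one missed 30-second ping, and the class name is hypothetical. In Redis the same effect is commonly achieved by setting a key with an expiry on each heartbeat.

```python
from typing import Dict

HEARTBEAT_INTERVAL_S = 30                        # from the text: ping every 30s
PRESENCE_TTL_S = 2 * HEARTBEAT_INTERVAL_S + 5    # assumed: allow one missed ping


class PresenceStore:
    """Tracks the last heartbeat time per user (in-memory stand-in for Redis)."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, user_id: str, now: float) -> None:
        # Record the time of the most recent ping from this user.
        self._last_seen[user_id] = now

    def is_online(self, user_id: str, now: float) -> bool:
        # A user is online if a heartbeat arrived within the TTL window.
        last = self._last_seen.get(user_id)
        return last is not None and now - last <= PRESENCE_TTL_S
```

Passing `now` explicitly keeps the logic testable; a service would pass the current clock time.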
Key design decision: keep readable message history on the server, or end-to-end encrypt so that the server only forwards ciphertext (the Signal model, which WhatsApp has also adopted).
Notification channels: push (APNs, FCM), email (SES, SendGrid), SMS (Twilio), in-app (WebSocket).
Architecture: Event producers → Notification Service → Priority Queue → Channel-specific workers → Delivery APIs.
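The priority queue stage can be sketched with Python's `heapq`. The three priority levels and the class name are assumptions for illustration; a sequence counter keeps first-in, first-out order among notifications of equal priority.

```python
import heapq
import itertools
from typing import List, Tuple

CRITICAL, NORMAL, LOW = 0, 1, 2   # assumed levels; lower number is served first


class NotificationQueue:
    """Priority queue that channel workers pull from."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, str]] = []
        self._seq = itertools.count()  # tie-breaker: FIFO within one priority

    def push(self, priority: int, channel: str, payload: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), channel, payload))

    def pop(self) -> Tuple[str, str]:
        # Returns (channel, payload) of the most urgent pending notification.
        _, _, channel, payload = heapq.heappop(self._heap)
        return channel, payload

    def __len__(self) -> int:
        return len(self._heap)
```

A worker would pop an item and hand it to the delivery API for its channel, so a burst of low-priority items cannot delay a critical one that arrives later.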
User preferences: respect notification settings (opt-out, quiet hours, frequency capping). Rate limiting to prevent notification overload.
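A preference check might look like the sketch below. The default quiet-hours window, the hourly cap, and the policy that critical notifications bypass quiet hours and caps (but not an explicit opt-out) are all assumptions made for illustration, not rules stated in the text.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Preferences:
    opted_out_channels: Set[str] = field(default_factory=set)
    quiet_start_hour: int = 22   # assumed local-time window: 22:00 to 07:00
    quiet_end_hour: int = 7
    max_per_hour: int = 5        # assumed frequency cap


def in_quiet_hours(prefs: Preferences, hour: int) -> bool:
    start, end = prefs.quiet_start_hour, prefs.quiet_end_hour
    if start == end:
        return False                       # empty window
    if start < end:
        return start <= hour < end
    return hour >= start or hour < end     # window wraps past midnight


def should_send(prefs: Preferences, channel: str, hour: int,
                sent_this_hour: int, critical: bool = False) -> bool:
    if channel in prefs.opted_out_channels:
        return False                       # opt-out always wins
    if critical:
        return True                        # assumed: bypass quiet hours and cap
    if in_quiet_hours(prefs, hour):
        return False
    return sent_this_hour < prefs.max_per_hour
```

Notifications suppressed by quiet hours could be deferred rather than dropped; that choice is left out of the sketch.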
Delivery guarantees: at-least-once for critical notifications (order updates), best-effort for social (likes).
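At-least-once delivery means retrying until an acknowledgement arrives, which can produce duplicates when an acknowledgement is lost. The sketch below pairs a retry loop with a receiver that drops repeated notification IDs; all names are hypothetical, and a production sender would typically add backoff between attempts.

```python
from typing import Callable, List, Set


def deliver_at_least_once(send: Callable[[], bool], max_attempts: int = 5) -> int:
    """Call send() until it reports an acknowledgement; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        if send():
            return attempt
    raise RuntimeError("delivery failed after retries")


class DedupingReceiver:
    """Shows each notification once, even if it is delivered several times."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.shown: List[str] = []

    def receive(self, notification_id: str, body: str) -> bool:
        if notification_id in self.seen:
            return False          # duplicate caused by a retry; ignore it
        self.seen.add(notification_id)
        self.shown.append(body)
        return True
```

Best-effort delivery corresponds to a single `send()` call with no retry, which is acceptable for low-value events such as likes.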
SFU (Selective Forwarding Unit): each participant sends their stream to the SFU, and the SFU forwards it to all other participants. With N participants, each user has 1 upload and N-1 downloads.
Advantages over mesh (N² connections) or MCU (expensive transcoding): SFU balances quality and cost.
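Counting streams makes the comparison concrete. The sketch below assumes one video stream per participant and no simulcast. In a full mesh every pair of participants shares a connection, giving N(N-1)/2 connections in total, which grows on the order of N²; with an SFU each participant keeps a single connection to the server.

```python
from typing import Dict


def mesh_stats(n: int) -> Dict[str, int]:
    # Full mesh: every participant sends to, and receives from, every other one.
    return {
        "uploads_per_user": n - 1,
        "downloads_per_user": n - 1,
        "total_connections": n * (n - 1) // 2,
    }


def sfu_stats(n: int) -> Dict[str, int]:
    # SFU: one upload to the server, which forwards the other n - 1 streams.
    return {
        "uploads_per_user": 1,
        "downloads_per_user": n - 1,
        "total_connections": n,
    }
```

For a 10-person call, mesh needs 9 uploads per user and 45 connections overall, while the SFU needs 1 upload per user and 10 connections. Upload bandwidth is usually the scarcer resource on client networks, which is where the SFU helps most.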
Simulcast: each client sends multiple quality levels (720p, 360p, 180p). SFU forwards the best quality each receiver can handle.
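Layer selection in the SFU can be sketched as picking the highest simulcast layer that fits the receiver's estimated bandwidth. The bitrates below are illustrative assumptions, not measured values, and `pick_layer` is a hypothetical helper.

```python
from typing import List, Tuple

# (layer name, assumed bitrate in kbps), ordered from highest to lowest quality
LAYERS: List[Tuple[str, int]] = [("720p", 1500), ("360p", 500), ("180p", 150)]


def pick_layer(available_kbps: int) -> str:
    """Return the best layer whose bitrate fits the receiver's bandwidth."""
    for name, kbps in LAYERS:
        if kbps <= available_kbps:
            return name
    return LAYERS[-1][0]   # nothing fits: fall back to the lowest layer
```

Because the SFU only switches between streams the sender already encoded, it avoids the per-receiver transcoding cost that an MCU pays.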
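NAT traversal: STUN servers let clients discover their public address so they can attempt a direct connection; TURN servers relay the media when a direct connection fails.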
Advantages
- WebSocket enables true real-time communication
- SFU scales video to large meetings efficiently
- Kafka provides durable message delivery, ordered within each partition
Disadvantages
- WebSocket connections are stateful (harder to scale than stateless HTTP)
- Presence systems create high fan-out load
- End-to-end encryption complicates server-side features
Test Your Understanding
Why do chat systems use WebSockets instead of HTTP?