All Levels · 60 min read · Topic 12.1

Communication systems

WhatsApp, notification system, real-time chat, email service, video conferencing

💬 Key Takeaways

  1. WhatsApp handles 100B+ messages/day, built on Erlang (BEAM VM) for massive concurrency and fault tolerance
  2. WebSocket-based real-time chat requires a connection gateway, a message queue (Kafka), a presence service, and message storage
  3. Notification systems need multi-channel fan-out (push via APNs/FCM, email, SMS, in-app), routed according to user preferences
  4. Video conferencing: an SFU (Selective Forwarding Unit) architecture scales better than mesh and costs far less server compute than an MCU

Designing Communication Systems

Communication systems (chat, notifications, video) are among the most commonly asked system design questions. They share core challenges: real-time delivery, presence (who's online), message ordering, offline delivery, and multi-device sync.

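The key insight: real-time communication needs persistent connections (typically WebSocket) so the server can push a message the moment it arrives, instead of HTTP request-response, where the client must ask first. This fundamentally changes the architecture.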

System Breakdowns

Real-time Chat

Connection Layer: WebSocket gateway servers, each holding 100K+ persistent connections. Gateways are stateful, so the system needs sticky sessions or a connection registry that records which gateway holds each user's connection.

Message Flow: Client → WebSocket Gateway → Message Service → Kafka → Recipient's Gateway → Recipient.
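To make the connection registry and message flow concrete, here is a minimal in-memory sketch. `ConnectionRegistry` and `MessageService` are hypothetical names; a dict stands in for the shared registry, one deque per gateway stands in for the Kafka topic that gateway consumes, and messages for offline users are simply held in a list until they reconnect.

```python
from collections import defaultdict, deque


class ConnectionRegistry:
    """Maps user_id -> gateway_id for currently connected users.

    Hypothetical: a plain dict stands in for a shared store that every
    Message Service instance can read.
    """

    def __init__(self):
        self._gateway_of = {}

    def connect(self, user_id, gateway_id):
        self._gateway_of[user_id] = gateway_id

    def disconnect(self, user_id):
        self._gateway_of.pop(user_id, None)

    def lookup(self, user_id):
        return self._gateway_of.get(user_id)


class MessageService:
    """Routes each message toward the gateway that holds the recipient's socket."""

    def __init__(self, registry):
        self.registry = registry
        # One deque per gateway stands in for a Kafka topic that gateway consumes.
        self.gateway_queues = defaultdict(deque)
        # Messages for offline users wait here until they reconnect.
        self.offline = defaultdict(list)

    def send(self, sender, recipient, text):
        msg = {"from": sender, "to": recipient, "text": text}
        gateway = self.registry.lookup(recipient)
        if gateway is None:
            self.offline[recipient].append(msg)
        else:
            self.gateway_queues[gateway].append(msg)
        return gateway  # None means "stored for offline delivery"
```

The registry lookup is what lets stateless Message Service instances route to stateful gateways without sticky sessions on the sending side.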

Storage: messages stored in Cassandra (write-heavy, time-ordered). Group messages fan out at the server side.
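A sketch of the storage layout and server-side group fan-out, under stated assumptions: the in-memory `MessageStore` imitates a Cassandra table partitioned by conversation ID and clustered by (timestamp, message ID), so reads come back in time order. `fan_out_group_message` is a hypothetical helper that stores a group message once and then enqueues a delivery for every member except the sender; writing a full copy into each member's inbox is an alternative design.

```python
import bisect
from collections import defaultdict


class MessageStore:
    """In-memory stand-in for a Cassandra table:
    partition key = conversation_id, clustering key = (timestamp, message_id)."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def append(self, conversation_id, timestamp, message_id, body):
        # insort keeps each partition sorted, like a clustering key would.
        bisect.insort(self._partitions[conversation_id], (timestamp, message_id, body))

    def recent(self, conversation_id, limit=50):
        # Oldest-to-newest slice of the last `limit` messages.
        return self._partitions[conversation_id][-limit:]


def fan_out_group_message(store, inboxes, group_id, members, sender,
                          timestamp, message_id, body):
    """Store the message once, then enqueue a delivery per recipient."""
    store.append(group_id, timestamp, message_id, body)
    delivered = []
    for member in members:
        if member != sender:
            inboxes.setdefault(member, []).append((group_id, message_id))
            delivered.append(member)
    return delivered
```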

Presence: heartbeat-based (client pings every 30s). Status stored in Redis. Fan out changes to friends list.
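Heartbeat-based presence can be sketched as "online if the last heartbeat is recent enough". This assumes a user counts as offline after missing two 30 s heartbeats (a 60 s TTL, an illustrative choice); the dict of last-seen times stands in for Redis keys with an expiry, such as `SET presence:<user> 1 EX 60`. `PresenceService` and `contacts_to_notify` are hypothetical names, and the clock is injectable so the logic can be tested without waiting.

```python
import time

HEARTBEAT_INTERVAL_S = 30
# Assumption: offline after missing two consecutive heartbeats.
PRESENCE_TTL_S = 2 * HEARTBEAT_INTERVAL_S


class PresenceService:
    """Dict of last-seen times standing in for Redis keys with a TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_seen = {}

    def heartbeat(self, user_id):
        was_online = self.is_online(user_id)
        self._last_seen[user_id] = self._clock()
        return not was_online  # True when the status flipped offline -> online

    def is_online(self, user_id):
        seen = self._last_seen.get(user_id)
        return seen is not None and self._clock() - seen < PRESENCE_TTL_S

    def contacts_to_notify(self, user_id, contacts):
        """On a status change, fan out only to contacts who are online themselves."""
        return [c for c in contacts if self.is_online(c)]
```

Only status flips trigger fan-out, and only online contacts receive it, which limits the fan-out load listed under Disadvantages.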

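Key design decision: store readable messages on the server (enabling server-side search and history sync) vs. end-to-end encryption, where the server only relays ciphertext it cannot read (the Signal Protocol model, which WhatsApp adopted in 2016).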

Notification System

Multi-channel: push (APNs, FCM), email (SES, SendGrid), SMS (Twilio), in-app (WebSocket).

Architecture: Event producers → Notification Service → Priority Queue → Channel-specific workers → Delivery APIs.
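The priority queue and channel-specific workers can be sketched with Python's `heapq`. The priority tiers below are assumptions for illustration (critical, normal, social), and the workers are plain callables standing in for APNs/FCM, email, or SMS clients. A monotonically increasing sequence number breaks ties so items within one tier keep FIFO order.

```python
import heapq
import itertools

# Hypothetical tiers: lower number = dequeued first.
PRIORITY = {"critical": 0, "normal": 1, "social": 2}


class NotificationQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO within a tier

    def push(self, kind, channel, payload):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), channel, payload))

    def drain(self, workers):
        """Pop in priority order and hand each item to its channel's worker."""
        while self._heap:
            _, _, channel, payload = heapq.heappop(self._heap)
            workers[channel](payload)
```

In a real deployment each channel would more likely have its own queue and worker pool, so a slow SMS provider cannot block push delivery.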

User preferences: respect notification settings (opt-out, quiet hours, frequency capping). Rate limiting to prevent notification overload.
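A minimal sketch of the preference checks, assuming illustrative defaults (quiet hours 22:00 to 07:00, at most 5 notifications per hour) and a rule, also an assumption, that critical notifications bypass quiet hours and the cap but still respect a channel opt-out. `Preferences` and `should_send` are hypothetical names; note the quiet-hours window may wrap past midnight.

```python
from dataclasses import dataclass, field
from datetime import datetime, time, timedelta


@dataclass
class Preferences:
    opted_out_channels: set = field(default_factory=set)
    quiet_start: time = time(22, 0)  # hypothetical default quiet hours
    quiet_end: time = time(7, 0)
    max_per_hour: int = 5            # hypothetical frequency cap


def in_quiet_hours(now, start, end):
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end     # window wraps past midnight


def should_send(prefs, channel, now, recent_sends, critical=False):
    """Return (send?, reason). recent_sends: datetimes of this user's prior sends."""
    if channel in prefs.opted_out_channels:
        return False, "opted_out"
    if critical:
        # Assumption: critical notifications bypass quiet hours and the cap.
        return True, "critical_bypass"
    if in_quiet_hours(now, prefs.quiet_start, prefs.quiet_end):
        return False, "quiet_hours"
    window_start = now - timedelta(hours=1)
    if sum(1 for t in recent_sends if t > window_start) >= prefs.max_per_hour:
        return False, "rate_limited"
    return True, "ok"
```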

Delivery guarantees: at-least-once for critical notifications (order updates), best-effort for social (likes).
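The two delivery guarantees differ mainly in retry policy. This sketch assumes a hypothetical `TransientError` raised by the delivery client on timeouts or 5xx responses: critical notifications are retried with exponential backoff (at-least-once), while social ones get a single attempt (best-effort). Because a retry after a lost acknowledgement can deliver twice, the hypothetical `DedupingReceiver` drops repeated idempotency keys.

```python
import time


class TransientError(Exception):
    """Hypothetical error a delivery client raises on timeouts or 5xx responses."""


def deliver(send, notification, critical, max_attempts=5, base_delay_s=0.5, sleep=time.sleep):
    """At-least-once (critical): retry with exponential backoff.
    Best-effort (non-critical): one attempt; failures are dropped."""
    attempts = max_attempts if critical else 1
    for attempt in range(attempts):
        try:
            send(notification)
            return True
        except TransientError:
            if attempt + 1 < attempts:
                sleep(base_delay_s * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    return False


class DedupingReceiver:
    """Retries can duplicate a send, so repeated idempotency keys are ignored."""

    def __init__(self):
        self.seen = set()
        self.shown = []

    def receive(self, notification):
        key = notification["idempotency_key"]
        if key in self.seen:
            return
        self.seen.add(key)
        self.shown.append(notification["text"])
```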

Video Conferencing

SFU (Selective Forwarding Unit): each participant uploads one stream to the SFU, which forwards it to every other participant. In an N-person call, each user uploads 1 stream and downloads N-1.

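Compared with mesh, where every client uploads its stream N-1 times over N(N-1)/2 peer connections, and MCU, where the server decodes, mixes, and re-encodes every stream (expensive transcoding), an SFU balances client bandwidth against server cost.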
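The per-client arithmetic behind the mesh/SFU/MCU comparison, as a small sketch. It assumes every participant sends one video stream and, for the SFU case, that every stream is forwarded to everyone (real SFUs often forward only a subset, such as active speakers); simulcast layers are ignored.

```python
def streams_per_client(n, topology):
    """(uploads, downloads) per participant in an n-person call."""
    if topology == "mesh":
        return n - 1, n - 1   # send to and receive from every other peer
    if topology == "sfu":
        return 1, n - 1       # upload once; SFU forwards the others' streams
    if topology == "mcu":
        return 1, 1           # server mixes everything into one stream
    raise ValueError(f"unknown topology: {topology}")


def mesh_connections(n):
    return n * (n - 1) // 2   # peer connections grow quadratically


def sfu_forwarded_streams(n):
    # Total SFU egress is also quadratic, but it lands on server bandwidth,
    # not on each client's uplink, and needs no transcoding.
    return n * (n - 1)
```

For a 10-person call, mesh asks each client to upload 9 streams, while an SFU asks for 1 and forwards 90 streams in total.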

Simulcast: each client sends multiple quality levels (720p, 360p, 180p). SFU forwards the best quality each receiver can handle.
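Simulcast layer selection can be sketched as "forward the highest layer that fits the receiver's estimated bandwidth". The bitrates below are illustrative assumptions, not values from any particular codec or SFU, and a real SFU would also consider the receiver's rendered video size and adjust as bandwidth estimates change.

```python
# Hypothetical simulcast layers: (name, approximate bitrate in kbps).
LAYERS = [("720p", 1500), ("360p", 500), ("180p", 150)]


def pick_layer(available_kbps, layers=LAYERS):
    """Highest layer whose bitrate fits the receiver's bandwidth estimate;
    falls back to the lowest layer rather than sending nothing."""
    for name, kbps in sorted(layers, key=lambda layer: layer[1], reverse=True):
        if kbps <= available_kbps:
            return name
    return min(layers, key=lambda layer: layer[1])[0]
```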

NAT traversal: STUN servers let a client discover its public (server-reflexive) address, and TURN servers relay media when a direct connection cannot be established.
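The order in which ICE (the framework WebRTC uses to combine STUN and TURN) prefers candidate paths follows the candidate priority formula in RFC 8445. Host candidates come from local interfaces, server-reflexive (`srflx`) candidates are discovered via STUN, and `relay` candidates are allocated on a TURN server. This sketch assumes a single network interface (local preference 65535) and the RTP component (ID 1).

```python
# Type preferences recommended by RFC 8445 (ICE).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def ice_priority(candidate_type, local_preference=65535, component_id=1):
    """RFC 8445 candidate priority:
    2^24 * type_pref + 2^8 * local_pref + (256 - component_id)."""
    return ((2 ** 24) * TYPE_PREFERENCE[candidate_type]
            + (2 ** 8) * local_preference
            + (256 - component_id))


def rank_candidates(candidate_types):
    """Highest priority first: direct (host), then STUN-derived, then TURN relay."""
    return sorted(candidate_types, key=ice_priority, reverse=True)
```

Relay candidates get type preference 0, which is why relayed (TURN) paths generally end up being used only when direct and STUN-derived paths fail connectivity checks.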

Chat System Architecture
[Diagram: Clients connect via WebSocket to the WS Gateway (connection mgmt); other components: Message Svc (validate & route), Kafka (message queue), Cassandra (message store), Presence Svc (online status)]

Advantages

  • WebSocket enables true real-time communication
  • SFU scales video to large meetings efficiently
  • Kafka provides durable message delivery, ordered within a partition (e.g., when messages are keyed by conversation ID)

Disadvantages

  • WebSocket connections are stateful (harder to scale than stateless HTTP)
  • Presence systems create high fan-out load
  • End-to-end encryption complicates server-side features

🧪 Test Your Understanding


Why do chat systems use WebSockets instead of HTTP?