All Levels · 90 min read · Topic 12.2

Content and social systems

Twitter/X, Instagram, YouTube, news feed, Reddit, TikTok

Key Takeaways

  • 1
    News Feed: fan-out on write (pre-compute feeds) for normal users; fan-out on read for celebrities (avoid hot keys)
  • 2
    Twitter/X timeline: hybrid approach, pre-computing feeds for users with < 5K followers and merging on read for high-follower users
  • 3
    YouTube: video processing pipeline (transcode → multiple resolutions), adaptive bitrate streaming (HLS/DASH)
  • 4
    TikTok: recommendation-driven (not social graph); content-based + engagement signals drive the For You feed

Building Content and Social Platforms

Social and content systems share a common challenge: generating personalized feeds for hundreds of millions of users from billions of content items. The core design decisions revolve around fan-out strategy, content ranking, and storage for heterogeneous content types.

Fan-Out Strategies

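Fan-out on write (push): when a user posts, the post is immediately pushed to every follower's pre-computed feed cache (Redis/Memcached).

Pros: reading the feed is fast (a single cache read). Cons: a celebrity post triggers millions of cache writes (the hot-key problem).

Used for: most users with typical follower counts (< 5K). Accounts above that threshold fall back to fan-out on read, where their posts are merged into each follower's feed at request time.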
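The hybrid strategy can be sketched in a few lines. This is a minimal in-memory illustration, not how any real platform implements it: the `FeedService` class, its method names, and the dictionaries standing in for Redis/Memcached are all assumptions made for the example. Only the 5K threshold comes from the text.

```python
from collections import defaultdict

FANOUT_THRESHOLD = 5_000  # follower cutoff from the text; a real system would tune this


class FeedService:
    """Hypothetical hybrid fan-out: push for small accounts, pull for large ones."""

    def __init__(self, threshold=FANOUT_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)         # author -> follower ids
        self.following = defaultdict(set)         # user -> authors followed
        self.feed_cache = defaultdict(list)       # user -> posts (stand-in for Redis)
        self.posts_by_author = defaultdict(list)  # author -> posts

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def post(self, author, text, ts):
        item = (ts, author, text)
        self.posts_by_author[author].append(item)
        if len(self.followers[author]) < self.threshold:
            # Fan-out on write: one cache write per follower.
            for follower in self.followers[author]:
                self.feed_cache[follower].append(item)

    def read_feed(self, user, limit=10):
        # Start from the pre-computed cache, then pull posts from
        # high-follower accounts at read time. A set removes duplicates
        # for authors who crossed the threshold after some posts were pushed.
        items = set(self.feed_cache[user])
        for author in self.following[user]:
            if len(self.followers[author]) >= self.threshold:
                items.update(self.posts_by_author[author])
        return sorted(items, reverse=True)[:limit]
```

With a small threshold, a post from a high-follower account never touches the follower caches, yet it still shows up when a follower reads their feed. The read path pays a merge cost only for the handful of large accounts a user follows.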

Platform Deep Dives

YouTube: Upload → Transcode (multiple resolutions: 144p to 4K, multiple codecs: H.264, VP9, AV1) → Generate thumbnails → Run content moderation (ML-based) → Update search index → CDN distribution.

Adaptive bitrate streaming (HLS/DASH): the client requests each segment at the highest quality its measured bandwidth supports, so quality can switch from one segment to the next without interrupting playback.
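The client-side choice can be sketched as a lookup against a bitrate ladder. The ladder values, the `pick_rendition` name, and the 0.8 safety factor below are illustrative assumptions, not figures from YouTube or from the HLS/DASH specifications. Real players also account for buffer level and throughput history.

```python
# Hypothetical bitrate ladder: (rendition label, required bandwidth in kbps),
# sorted from lowest to highest quality.
LADDER = [
    ("144p", 200),
    ("360p", 800),
    ("720p", 2_500),
    ("1080p", 5_000),
    ("2160p", 16_000),
]


def pick_rendition(measured_kbps, ladder=LADDER, safety=0.8):
    """Return the highest rendition that fits within a safety margin of bandwidth."""
    budget = measured_kbps * safety
    best = ladder[0][0]  # always fall back to the lowest rendition
    for label, required_kbps in ladder:
        if required_kbps <= budget:
            best = label
    return best
```

Calling this before each segment request is what produces the quality switching: when measured bandwidth drops, the next segment is simply fetched from a lower rung of the ladder.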

Storage: raw videos in distributed blob storage (Google's GFS, later succeeded by Colossus). Metadata in Bigtable. Comments in Spanner.

Instagram: Upload → generate multiple sizes (thumbnail, feed-size, full-res) → CDN distribution → store metadata in PostgreSQL/Cassandra.

Feed generation: ranked by ML model (engagement signals, recency, relationship strength).
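To make the three signal families concrete, here is a hand-weighted scoring function standing in for the ML model. It is only a sketch: the field names (`p_engage`, `affinity`, `created_at`), the weights, and the six-hour half-life are assumptions chosen for illustration, whereas a production ranker would learn these from data.

```python
def score(post, now, weights=(0.5, 0.3, 0.2), half_life_s=6 * 3600):
    """Combine engagement, recency, and relationship strength into one score.

    post["p_engage"]: assumed predicted engagement probability in [0, 1]
    post["affinity"]: assumed viewer-author relationship strength in [0, 1]
    post["created_at"]: creation time in seconds
    """
    w_engage, w_recency, w_affinity = weights
    # Exponential decay: recency halves every half_life_s seconds.
    recency = 0.5 ** ((now - post["created_at"]) / half_life_s)
    return (w_engage * post["p_engage"]
            + w_recency * recency
            + w_affinity * post["affinity"])


def rank_feed(posts, now):
    """Return posts ordered from highest to lowest score."""
    return sorted(posts, key=lambda p: score(p, now), reverse=True)
```

Under these weights, a six-hour-old post from a close connection with high predicted engagement can outrank a brand-new post from a distant account, which is the behavior a ranked (rather than chronological) feed is meant to produce.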

Stories: 24-hour expiring content → separate storage with TTL. Viewed state tracked per user.
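A minimal sketch of TTL-based story storage with per-user viewed state follows. The `StoryStore` class and its methods are hypothetical, and the plain dictionaries stand in for whatever TTL-capable store a real system would use. The clock is injectable so expiry can be exercised without waiting 24 hours.

```python
import time

STORY_TTL_S = 24 * 3600  # 24-hour lifetime from the text


class StoryStore:
    """Hypothetical in-memory story store with expiry and viewed tracking."""

    def __init__(self, ttl_s=STORY_TTL_S, clock=time.time):
        self.ttl_s = ttl_s
        self.clock = clock
        self.stories = {}  # story_id -> (author, created_at)
        self.viewed = {}   # user -> set of story ids already seen

    def add(self, story_id, author):
        self.stories[story_id] = (author, self.clock())

    def active(self, author):
        # Expired stories are filtered at read time; a real store would
        # also delete them in the background.
        now = self.clock()
        return [sid for sid, (a, created) in self.stories.items()
                if a == author and now - created < self.ttl_s]

    def mark_viewed(self, user, story_id):
        self.viewed.setdefault(user, set()).add(story_id)

    def unseen(self, user, author):
        seen = self.viewed.get(user, set())
        return [sid for sid in self.active(author) if sid not in seen]
```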

Unlike Instagram/Twitter, TikTok's feed is NOT social-graph-based. It's recommendation-driven.

Signals: video watch time, replays, shares, likes, follows, search, device type, location.

Two-tower model: user embedding + video embedding → candidate retrieval → ML ranking → diversity injection → serve.
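The retrieval and diversity stages of that pipeline can be sketched as follows. This assumes the two towers have already produced the embeddings. The function names, the dot-product similarity, and the per-category cap are illustrative choices, and the ML ranking stage between them is skipped. Production systems typically use approximate nearest-neighbor search rather than the exhaustive scan shown here.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(user_vec, video_vecs, k):
    """Candidate retrieval: top-k video ids by user-video dot product."""
    ranked = sorted(video_vecs,
                    key=lambda vid: dot(user_vec, video_vecs[vid]),
                    reverse=True)
    return ranked[:k]


def diversify(candidates, category_of, max_per_category=1):
    """Diversity injection: keep ranked order but cap items per category."""
    picked, counts = [], {}
    for vid in candidates:
        category = category_of[vid]
        if counts.get(category, 0) < max_per_category:
            picked.append(vid)
            counts[category] = counts.get(category, 0) + 1
    return picked
```

For a user whose embedding sits close to several cooking videos, `retrieve` would return mostly cooking content. `diversify` then drops the near-duplicates so that a lower-scored video from another category still reaches the feed.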

New users get cold-start recommendations from trending content; the algorithm learns preferences within ~100 interactions.

Advantages

  • Hybrid fan-out balances read and write performance
  • CDN + adaptive bitrate enables global video delivery
  • ML-based feeds maximize engagement

Disadvantages

  • Feed ranking algorithms require constant tuning
  • Celebrity accounts create hot-key challenges
  • Video transcoding is computationally expensive

Test Your Understanding


Why does Twitter use a hybrid fan-out strategy?