Key Takeaways
- News Feed: fan-out on write (pre-computed feeds) for normal users; fan-out on read for celebrities (avoids hot keys)
- Twitter/X timeline: hybrid approach; posts from accounts with < 5K followers are pre-computed into followers' timelines, while posts from high-follower accounts are merged in at read time
- YouTube: video processing pipeline (transcode → multiple resolutions), adaptive bitrate streaming (HLS/DASH)
- TikTok: recommendation-driven rather than social-graph-driven; content-based and engagement signals drive the For You feed
Building Content and Social Platforms
Social and content systems share a common challenge: generating personalized feeds for hundreds of millions of users from billions of content items. The core design decisions revolve around fan-out strategy, content ranking, and storage for heterogeneous content types.
Fan-Out Strategies
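Fan-out on write (push model): when a user posts, the post is immediately pushed to all followers' pre-computed feed caches (Redis/Memcached).
Pros: reading the feed is fast (just read the cache). Cons: a single celebrity post triggers millions of cache writes (the hot-key problem).
Used for: most users with typical follower counts (< 5K).
Fan-out on read (pull model): nothing is pushed at post time; when a follower opens the feed, recent posts from the accounts they follow are fetched and merged.
Pros: a post costs one write regardless of follower count. Cons: every feed read does more work.
Used for: celebrities and other high-follower accounts.
Hybrid: push posts from normal accounts, pull posts from celebrity accounts, and merge both sources when the feed is read.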
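A minimal in-memory sketch of the hybrid strategy from the Key Takeaways, with Python dicts standing in for Redis/Memcached. The 5K threshold comes from the text; `FeedService`, `FEED_CACHE_SIZE`, and the deduplication step are hypothetical illustration choices, not a production design.

```python
from collections import defaultdict, deque

CELEBRITY_THRESHOLD = 5_000  # follower count from the text; at or above it, stop pushing
FEED_CACHE_SIZE = 800        # hypothetical cap on each cached feed


class FeedService:
    """Hypothetical in-memory hybrid fan-out service (dicts stand in for Redis)."""

    def __init__(self):
        self.followers = defaultdict(set)         # author -> followers
        self.following = defaultdict(set)         # user -> authors they follow
        self.posts_by_author = defaultdict(list)  # author -> [(ts, post_id)], oldest first
        # user -> precomputed feed of (ts, post_id), newest first, bounded length
        self.feed_cache = defaultdict(lambda: deque(maxlen=FEED_CACHE_SIZE))

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author, post_id, ts):
        self.posts_by_author[author].append((ts, post_id))  # assumes ts increases per author
        if not self.is_celebrity(author):
            # Fan-out on write: one cache write per follower.
            for follower in self.followers[author]:
                self.feed_cache[follower].appendleft((ts, post_id))
        # Celebrity posts are not pushed; readers pull them in read_feed.

    def read_feed(self, user, limit=20):
        # Start from the precomputed (pushed) part of the feed.
        items = set(self.feed_cache[user])
        # Fan-out on read: merge recent posts from followed celebrities.
        for author in self.following[user]:
            if self.is_celebrity(author):
                items.update(self.posts_by_author[author][-limit:])
        # The set drops duplicates (e.g. posts pushed before an author crossed
        # the threshold); sort newest first by timestamp.
        newest_first = sorted(items, reverse=True)
        return [post_id for _, post_id in newest_first[:limit]]
```

If `alice` follows one normal account and one celebrity, only the normal account's post is written into her cache; the celebrity's post is merged in when she reads her feed, so the celebrity's post costs one write instead of thousands.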
Platform Deep Dives
YouTube
Upload → Transcode (multiple resolutions: 144p to 4K; multiple codecs: H.264, VP9, AV1) → Generate thumbnails → Run content moderation (ML-based) → Update search index → CDN distribution.
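The transcode step turns one upload into many independent jobs that can run in parallel on a worker fleet. A minimal sketch of planning those jobs, assuming a ladder built from the 144p-to-4K range and the codecs named in the text; the specific rung heights, the `TranscodeJob` type, and the no-upscaling rule are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical rendition ladder spanning the 144p-to-4K range in the text.
LADDER_HEIGHTS = [144, 240, 360, 480, 720, 1080, 1440, 2160]
CODECS = ["h264", "vp9", "av1"]


@dataclass(frozen=True)
class TranscodeJob:
    video_id: str
    height: int  # output height in pixels
    codec: str


def plan_transcodes(video_id: str, source_height: int) -> list[TranscodeJob]:
    """Fan one upload out into independent (resolution, codec) transcode jobs."""
    # Assumption: never upscale past the source resolution.
    heights = [h for h in LADDER_HEIGHTS if h <= source_height]
    if not heights:
        heights = [source_height]  # tiny source: keep only its native height
    return [TranscodeJob(video_id, h, codec) for h in heights for codec in CODECS]


jobs = plan_transcodes("v123", source_height=720)
print(len(jobs))  # 5 heights x 3 codecs = 15
```

Multiplying resolutions by codecs is also why the Disadvantages list calls transcoding computationally expensive: a 4K upload under this ladder produces 24 separate encodes.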
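Adaptive bitrate streaming (HLS/DASH): each video is split into short segments, each encoded at several bitrates. For every segment, the client requests the highest quality its current bandwidth supports, so quality can switch up or down without interrupting playback.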
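A minimal sketch of the per-segment choice, using a simple throughput-based rule. The bitrate ladder values and the 0.8 safety factor are hypothetical; real HLS/DASH players typically combine throughput estimates with buffer levels, so treat this as an illustration of the idea rather than any player's actual algorithm.

```python
# Hypothetical bitrate ladder: (label, required bandwidth in kbit/s), lowest first.
RENDITIONS = [
    ("144p", 100),
    ("240p", 250),
    ("360p", 600),
    ("480p", 1_000),
    ("720p", 2_500),
    ("1080p", 5_000),
]
SAFETY_FACTOR = 0.8  # assumed headroom so small throughput dips don't cause stalls


def pick_rendition(measured_kbps: float) -> str:
    """Return the highest rendition whose bitrate fits in the usable bandwidth."""
    usable = measured_kbps * SAFETY_FACTOR
    chosen = RENDITIONS[0][0]  # fall back to the lowest quality if nothing fits
    for label, kbps in RENDITIONS:
        if kbps <= usable:
            chosen = label
    return chosen


# One choice per segment as measured throughput changes.
for kbps in [6000, 3000, 900, 4000]:
    print(kbps, pick_rendition(kbps))  # 720p, 480p, 360p, 720p
```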
Storage: raw videos in Google's distributed file system (GFS, now Colossus), metadata in Bigtable, comments in Spanner.
Instagram
Upload → Generate multiple sizes (thumbnail, feed-size, full-res) → CDN distribution → Store metadata in PostgreSQL/Cassandra.
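A minimal sketch of the "generate multiple sizes" step, computing output dimensions for each variant. The target widths (150 px thumbnail, 1080 px feed) and the aspect-ratio and no-upscaling rules are assumptions for illustration; the actual pixel resizing would be done by an image library and is not shown.

```python
# Hypothetical target widths in pixels; the text names the variants, not their sizes.
TARGET_WIDTHS = {"thumbnail": 150, "feed": 1080}


def plan_sizes(width: int, height: int) -> dict[str, tuple[int, int]]:
    """Output (width, height) per variant, preserving aspect ratio, never upscaling."""
    sizes = {}
    for name, target_w in TARGET_WIDTHS.items():
        w = min(target_w, width)
        h = max(1, round(height * w / width))
        sizes[name] = (w, h)
    sizes["full_res"] = (width, height)  # original kept as-is
    return sizes


print(plan_sizes(4000, 2000))
# {'thumbnail': (150, 75), 'feed': (1080, 540), 'full_res': (4000, 2000)}
```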
Feed generation: ranked by ML model (engagement signals, recency, relationship strength).
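The text describes ranking by an ML model; as a simpler stand-in, this sketch combines the same three signal families (engagement, relationship strength, recency) in a hand-weighted linear score with exponential time decay. The weights, the 6-hour half-life, and the `Candidate` fields are hypothetical; a learned model would replace `score`.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    post_id: str
    age_hours: float
    engagement: float    # e.g. predicted chance of a like/comment, 0..1
    relationship: float  # e.g. viewer-author affinity, 0..1


# Hypothetical hand-set weights standing in for a learned ranking model.
W_ENGAGEMENT = 0.6
W_RELATIONSHIP = 0.4
HALF_LIFE_HOURS = 6.0


def score(c: Candidate) -> float:
    recency = 0.5 ** (c.age_hours / HALF_LIFE_HOURS)  # halves every HALF_LIFE_HOURS
    return (W_ENGAGEMENT * c.engagement + W_RELATIONSHIP * c.relationship) * recency


def rank(cands: list[Candidate]) -> list[str]:
    return [c.post_id for c in sorted(cands, key=score, reverse=True)]


feed = rank([
    Candidate("a", age_hours=1, engagement=0.9, relationship=0.2),
    Candidate("b", age_hours=12, engagement=0.95, relationship=0.9),
    Candidate("c", age_hours=0, engagement=0.2, relationship=0.9),
])
print(feed)  # ['a', 'c', 'b']: the strongest post, b, loses to recency decay
```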
Stories: 24-hour expiring content, kept in separate storage with a TTL. Viewed state is tracked per user.
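A minimal in-memory sketch of TTL-based story storage with per-user viewed state. A real deployment would more likely rely on a store with native expiry (for example Redis keys with `EXPIRE`); here, expired stories are purged lazily on each access. `StoryStore` and its injectable clock are hypothetical names for illustration.

```python
import time

STORY_TTL_SECONDS = 24 * 60 * 60  # 24-hour lifetime from the text


class StoryStore:
    """Hypothetical in-memory story store with lazy TTL expiry."""

    def __init__(self, clock=time.time):
        self.clock = clock     # injectable so expiry can be tested
        self.stories = {}      # story_id -> (author, expires_at)
        self.viewed = {}       # viewer -> set of story_ids already seen

    def add(self, story_id, author):
        self.stories[story_id] = (author, self.clock() + STORY_TTL_SECONDS)

    def _purge_expired(self):
        now = self.clock()
        expired = [sid for sid, (_, exp) in self.stories.items() if exp <= now]
        for sid in expired:
            del self.stories[sid]
        for seen in self.viewed.values():
            seen.difference_update(expired)  # viewed state expires with the story

    def mark_viewed(self, viewer, story_id):
        self._purge_expired()
        if story_id in self.stories:
            self.viewed.setdefault(viewer, set()).add(story_id)

    def unseen(self, viewer, author):
        """Live stories by `author` that `viewer` has not opened yet."""
        self._purge_expired()
        seen = self.viewed.get(viewer, set())
        return sorted(sid for sid, (a, _) in self.stories.items()
                      if a == author and sid not in seen)
```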
TikTok
Unlike Instagram/Twitter, TikTok's feed is NOT social-graph-based; it is recommendation-driven.
Signals: video watch time, replays, shares, likes, follows, search, device type, location.
Two-tower model: user embedding + video embedding → candidate retrieval → ML ranking → diversity injection → serve.
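A minimal sketch of the serving stages after the two towers have produced embeddings. The towers themselves (neural networks mapping user and video features to vectors) are not shown; the embeddings are assumed to be precomputed. Retrieval here is brute-force top-k by dot product rather than an approximate-nearest-neighbor index, and the per-creator cap is an assumed diversity rule.

```python
import heapq


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def retrieve(user_emb, video_embs, k):
    """Candidate retrieval: top-k video IDs by dot product with the user embedding."""
    return heapq.nlargest(k, video_embs, key=lambda vid: dot(user_emb, video_embs[vid]))


def rank(candidates, score_fn):
    """Ranking: reorder the small candidate set with a heavier scoring model."""
    return sorted(candidates, key=score_fn, reverse=True)


def inject_diversity(ranked, creator_of, max_per_creator=1):
    """Diversity pass: cap how many videos from one creator reach the feed."""
    counts, out = {}, []
    for vid in ranked:
        creator = creator_of[vid]
        if counts.get(creator, 0) < max_per_creator:
            out.append(vid)
            counts[creator] = counts.get(creator, 0) + 1
    return out


user = [1.0, 0.0]
videos = {"v1": [0.9, 0.1], "v2": [0.8, 0.0], "v3": [0.1, 0.9], "v4": [0.7, 0.3]}
cands = retrieve(user, videos, k=3)                         # ['v1', 'v2', 'v4']
ranked = rank(cands, {"v1": 0.5, "v2": 0.9, "v4": 0.7}.get)  # hypothetical ranker scores
print(inject_diversity(ranked, {"v1": "a", "v2": "a", "v4": "b"}))  # ['v2', 'v4']
```

Splitting cheap retrieval from expensive ranking is the point of the design: the dot product can be evaluated against a huge corpus, while the heavier ranking model only scores the short candidate list.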
New users get cold-start recommendations from trending content; the algorithm learns preferences within ~100 interactions.
Advantages
- Hybrid fan-out balances read and write performance
- CDN + adaptive bitrate enables global video delivery
- ML-based feeds maximize engagement
Disadvantages
- Feed ranking algorithms require constant tuning
- Celebrity accounts create hot-key challenges
- Video transcoding is computationally expensive
Test Your Understanding
Why does Twitter use a hybrid fan-out strategy?