๐งKey Takeaways
- 1URL shortener: hash URL โ base62 encode โ store mapping. Key decision: hash collision handling, analytics, expiration
- 2Rate limiter: token bucket or sliding window algorithm. Distributed: Redis-based counters with TTL
- 3Web crawler: BFS with URL frontier queue, politeness (respect robots.txt), deduplication (URL + content hash)
- 4Key-value store: design Dynamo โ consistent hashing, replication, quorum reads/writes, gossip protocol
Classic Infrastructure Design Problems
These are the most-asked system design questions because they test fundamental concepts with manageable scope. Each one touches specific trade-offs: hashing, distributed state, consistency, throttling, and crawling.
System Designs
Write: hash long URL (MD5/SHA) โ take first 7 chars โ base62 encode โ store mapping in KV store. Handle collisions: if short URL exists, append counter and rehash.
Read: look up short URL in KV store โ redirect (HTTP 301/302). 301 = permanent (browser caches), 302 = temporary (server tracks clicks).
Analytics: log each redirect with timestamp, user agent, IP โ batch process for click analytics.
Scale: billions of URLs โ partition by short URL hash. Read-heavy โ CDN for popular links.
Advantages
- โขURL shortener teaches hashing and distributed KV stores
- โขRate limiter teaches throttling and distributed state
- โขWeb crawler teaches BFS, dedup, and politeness
Disadvantages
- โขThese systems seem simple but have subtle edge cases
- โขDistributed rate limiting has consistency challenges
- โขWeb crawlers face dynamic content (JS rendering) challenges
๐งช Test Your Understanding
In a URL shortener, why might you use HTTP 302 instead of 301?