All Levels120 min readยท Topic 12.4

Data and infrastructure systems

URL shortener, rate limiter, web crawler, KV store, search engine, Google Docs, Kafka, S3

๐Ÿ”งKey Takeaways

  • 1
    URL shortener: hash URL โ†’ base62 encode โ†’ store mapping. Key decision: hash collision handling, analytics, expiration
  • 2
    Rate limiter: token bucket or sliding window algorithm. Distributed: Redis-based counters with TTL
  • 3
    Web crawler: BFS with URL frontier queue, politeness (respect robots.txt), deduplication (URL + content hash)
  • 4
    Key-value store: design Dynamo โ€” consistent hashing, replication, quorum reads/writes, gossip protocol

Classic Infrastructure Design Problems

These are the most-asked system design questions because they test fundamental concepts with manageable scope. Each one touches specific trade-offs: hashing, distributed state, consistency, throttling, and crawling.

System Designs

Write: hash long URL (MD5/SHA) โ†’ take first 7 chars โ†’ base62 encode โ†’ store mapping in KV store. Handle collisions: if short URL exists, append counter and rehash.

Read: look up short URL in KV store โ†’ redirect (HTTP 301/302). 301 = permanent (browser caches), 302 = temporary (server tracks clicks).

Analytics: log each redirect with timestamp, user agent, IP โ†’ batch process for click analytics.

Scale: billions of URLs โ†’ partition by short URL hash. Read-heavy โ†’ CDN for popular links.

Advantages

  • โ€ขURL shortener teaches hashing and distributed KV stores
  • โ€ขRate limiter teaches throttling and distributed state
  • โ€ขWeb crawler teaches BFS, dedup, and politeness

Disadvantages

  • โ€ขThese systems seem simple but have subtle edge cases
  • โ€ขDistributed rate limiting has consistency challenges
  • โ€ขWeb crawlers face dynamic content (JS rendering) challenges

๐Ÿงช Test Your Understanding

Knowledge Check1/1

In a URL shortener, why might you use HTTP 302 instead of 301?