Beginner → Intermediate25 min read· Topic 2.5

Database sharding and partitioning

Horizontal vs vertical partitioning, consistent hashing, hot spots, cross-shard transactions

🧩Key Takeaways

  • 1
    Sharding = splitting data across multiple database nodes to distribute load
  • 2
    Partition key choice determines data distribution and query performance — most critical decision
  • 3
    Consistent hashing enables adding/removing nodes with minimal data movement
  • 4
    Cross-shard queries and transactions are expensive — design to minimize them

When You Need Sharding

A single database node has finite capacity: ~50K QPS for reads, ~10-20K QPS for writes, and terabytes of storage. When your data or load exceeds these limits, you must distribute data across multiple nodes — this is sharding (horizontal partitioning).

Sharding is one of the most impactful and complex decisions in system design. Getting the shard key wrong can cause hot spots, expensive cross-shard queries, and painful re-sharding.

Sharding Strategies

Data is partitioned based on value ranges. Example: users A-M on Shard 1, N-Z on Shard 2. Or orders from January on Shard 1, February on Shard 2.

Pros: Range queries (find all orders in date range) are efficient — only hit relevant shards.

Cons: Hot spots if data is unevenly distributed. Time-based sharding makes the 'current month' shard a bottleneck.

Consistent Hashing Ring
RingNode ANode BNode CNode DNEW
⚠️Shard Key Anti-Patterns
Don't shard on: auto-incrementing IDs (all new writes go to the last shard), timestamps (current time is always hot), country codes (USA shard will be 10x larger). Good shard keys: user_id (uniform distribution, most queries are per-user), tenant_id (multi-tenant SaaS), composite keys (user_id + date).

Advantages

  • Enables horizontal scaling beyond single-node limits
  • Distributes both storage and query load
  • Consistent hashing minimizes data movement

Disadvantages

  • Cross-shard joins and transactions are expensive
  • Wrong shard key creates hot spots
  • Re-sharding is operationally painful
  • Adds significant application complexity

🧪 Test Your Understanding

Knowledge Check1/1

Why is consistent hashing preferred over simple hash-mod sharding?