Beginner · 18 min read · Topic 1.1

Scalability fundamentals

Vertical vs horizontal scaling, diagonal scaling, measuring scalability with throughput and latency percentiles

📈 Key Takeaways

  1. Vertical scaling (scale up) = bigger machine; horizontal scaling (scale out) = more machines
  2. Horizontal scaling is almost always preferred for production systems
  3. Scalability is measured by throughput (QPS) and latency (p50, p95, p99)
  4. Diagonal scaling = scale up first, then scale out when you hit limits

What Is Scalability?

Scalability is a system's ability to handle increased load by adding resources. A scalable system can grow from serving 100 users to 100 million users without a fundamental redesign. It's the most discussed topic in system design because every architecture decision either helps or hinders scalability.

There are two fundamental approaches to scaling: vertical (scaling up) and horizontal (scaling out). Understanding when to use each, and their trade-offs, is foundational to every system design discussion.

Vertical vs Horizontal Scaling
[Diagram: vertical scaling (scale up) grows a single machine, e.g. to 64 CPUs / 512 GB; horizontal scaling (scale out) distributes load across more machines, e.g. four servers with 8 CPUs each.]

Scaling Strategy Comparison

| Factor | Vertical Scaling | Horizontal Scaling |
| --- | --- | --- |
| Complexity | Low: just upgrade hardware | High: distributed systems challenges |
| Cost | Expensive: high-end servers cost a lot | Cheaper: commodity hardware |
| Limit | Hardware ceiling (biggest machine available) | Virtually unlimited |
| Downtime | Often requires downtime to upgrade | Zero downtime with rolling updates |
| Data consistency | Simple: single node | Complex: distributed consensus needed |
| Failure impact | Single point of failure | Partial failure affects only some nodes |
| Use case | Database primary, specialized compute | Stateless web/app servers, CDNs |

Deep Dive: Measuring Scalability

Throughput measures how many requests your system handles per second. A well-scaled web tier might handle 100K QPS across 20 servers, while a database might handle 30K QPS on a single instance.
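The numbers above imply a per-server budget, which is the kind of back-of-envelope arithmetic worth doing explicitly. A minimal check, using the figures from this paragraph:

```python
# Back-of-envelope capacity check for the web-tier figures above:
# 100K QPS spread evenly across 20 servers.
total_qps = 100_000
servers = 20

per_server_qps = total_qps / servers
print(per_server_qps)  # 5000.0 QPS each server must sustain
```

If a single server can only sustain less than that under load testing, you need more servers (or a faster server) before you ever reach the target.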

Key insight: throughput should scale linearly (or near-linearly) with the number of nodes. If adding a second server only gives you 1.3x throughput instead of 2x, you have a scaling bottleneck.
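That insight can be turned into a simple metric: divide the observed speedup by the ideal (linear) speedup. The sketch below uses hypothetical benchmark numbers matching the 1.3x example above:

```python
# Scaling efficiency: observed speedup divided by ideal (linear) speedup.
# 1.0 means perfectly linear scaling; lower values indicate a bottleneck.
# The QPS figures below are hypothetical measurements, not real benchmarks.

def scaling_efficiency(base_qps: float, scaled_qps: float,
                       base_nodes: int, scaled_nodes: int) -> float:
    ideal_speedup = scaled_nodes / base_nodes
    observed_speedup = scaled_qps / base_qps
    return observed_speedup / ideal_speedup

# One server handles 5,000 QPS; two servers together handle only 6,500 QPS.
eff = scaling_efficiency(5000, 6500, 1, 2)
print(f"{eff:.2f}")  # 0.65 -> only 65% of ideal: a scaling bottleneck
```

Tracking this ratio as you add nodes makes bottlenecks visible long before they become outages.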

Average latency is misleading; use percentiles. p50 = 50% of requests are faster. p95 = 95% are faster. p99 = 99% are faster.

Example: p50 = 50ms, p95 = 200ms, p99 = 1.2s. The p99 is often 10-20x worse than p50 due to GC pauses, contention, or unlucky cache misses.

SLAs are typically defined at p99: 'p99 latency must be under 500ms.'
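To make the percentile idea concrete, here is a minimal sketch that computes p50/p95/p99 from raw samples using the nearest-rank method and checks them against the SLA above. The latency data is synthetic (a fast bulk of requests plus a slow tail standing in for GC pauses or cache misses):

```python
# Compute latency percentiles from raw samples (nearest-rank method).
# The sample data is synthetic, generated purely for illustration.
import random

random.seed(42)
# 980 mostly-fast requests plus a slow tail of 20 (simulated GC pauses).
latencies_ms = ([random.gauss(50, 10) for _ in range(980)]
                + [random.uniform(500, 1500) for _ in range(20)])

def percentile(samples, p):
    """Smallest value such that at least p% of samples are <= it."""
    ordered = sorted(samples)
    rank = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[rank]

p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
print(f"p50={p50:.0f}ms p95={p95:.0f}ms p99={p99:.0f}ms")

sla_ok = p99 <= 500  # SLA: p99 latency must be under 500ms
print("SLA met" if sla_ok else "SLA violated")
```

Note how the average would look healthy here while the p99, dominated by the slow tail, blows straight through the 500ms budget.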

In practice, the best approach is diagonal: start by scaling up (it's simpler and faster) until you hit the limits of a single machine or the cost becomes unreasonable. Then scale out.

Example: Start with a single PostgreSQL on a 64-core machine. When you hit 30K QPS, add read replicas (horizontal). When writes become the bottleneck, shard the database.
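The read-replica step above boils down to routing: writes go to the primary, reads fan out across replicas. A minimal sketch of that split (class and node names are hypothetical; a real router would use a driver or proxy such as PgBouncer):

```python
# Read/write splitting sketch: SELECTs round-robin across replicas,
# everything else goes to the single primary. Names are hypothetical;
# a real implementation would execute queries via a DB driver.
import itertools

class RoutedDatabase:
    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self._replica_cycle = itertools.cycle(replicas)  # round-robin reads

    def route(self, sql: str) -> str:
        """Return the node that should run this statement."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replica_cycle) if is_read else self.primary

db = RoutedDatabase("primary", ["replica-1", "replica-2"])
print(db.route("SELECT * FROM users WHERE id = 7"))  # a replica
print(db.route("UPDATE users SET name = 'x'"))       # the primary
```

This buys read scalability cheaply; only when write QPS saturates the primary do you pay the full complexity cost of sharding.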

✅ Interview Tip
When discussing scaling in an interview, always mention the specific bottleneck you're solving. Don't just say 'we scale horizontally.' Say 'Read QPS exceeds what a single DB can handle, so we add read replicas. Write QPS is manageable, so the primary remains a single node.'

Advantages

  • Horizontal scaling offers virtually unlimited capacity
  • Commodity hardware is cheaper than high-end machines
  • No single point of failure with proper horizontal design
  • On-demand scaling reduces wasted resources

Disadvantages

  • Horizontal scaling adds distributed systems complexity
  • Data consistency becomes much harder across nodes
  • Network partitions create new failure modes
  • State management requires careful design

🧪 Test Your Understanding

Knowledge Check 1/2

What is diagonal scaling?