Beginner · 15 min read · Topic 1.3

Latency and performance

Latency numbers every engineer should know, percentiles, Little's Law, performance vs scalability

Key Takeaways

  1. Latency = time to complete one operation; throughput = operations completed per unit time
  2. Use percentiles (p50, p95, p99), never averages, to measure latency
  3. Little's Law: concurrent requests = throughput × avg latency
  4. Memory is ~100x faster than SSD, SSD is ~100x faster than HDD, and the network adds ~0.5ms per same-DC hop

Understanding Latency

Latency is the time between sending a request and receiving a response. It's the single most user-visible performance metric. Google found that adding 0.5 seconds to search results page load time reduced traffic by 20%. Amazon found that every 100ms of latency cost them 1% in sales.

Performance is not just about being fast — it's about being consistently fast. A system with 50ms p50 but 5s p99 gives a terrible experience to 1 in 100 users.
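Tail latency also compounds with fan-out: a page that makes many parallel backend calls will frequently be slowed by at least one p99-slow response. A minimal sketch of that arithmetic (the 100-call fan-out is an illustrative assumption, not a figure from this article):

```python
def p_hits_tail(fanout: int, tail_fraction: float = 0.01) -> float:
    """Probability that at least one of `fanout` independent backend
    calls lands in the slowest `tail_fraction` of requests."""
    return 1 - (1 - tail_fraction) ** fanout

# With 100 parallel backend calls, most page loads include a p99-slow call:
print(f"{p_hits_tail(100):.1%}")  # -> 63.4%
```

This is why the 1-in-100 figure above understates the problem for fan-out architectures: the slow 1% of requests can dominate the experience of far more than 1% of users.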

Latency Numbers Reference

Operation           Latency     Relative to L1 cache
L1 cache            ~1 ns       1x
L2 cache            ~4 ns       4x
RAM access          ~100 ns     100x
SSD read            ~16 µs      16,000x
HDD seek            ~2 ms       2,000,000x
Same-DC network     ~0.5 ms     500,000x
CA → Netherlands    ~150 ms     150,000,000x

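The "~100x per storage tier" rule of thumb from the key takeaways can be sanity-checked against the table; a quick sketch in nanoseconds:

```python
# Approximate latencies from the reference table above, in nanoseconds
LATENCY_NS = {
    "L1 cache": 1,
    "L2 cache": 4,
    "RAM access": 100,
    "SSD read": 16_000,
    "HDD seek": 2_000_000,
    "Same-DC network": 500_000,
    "CA -> Netherlands": 150_000_000,
}

ram_vs_ssd = LATENCY_NS["SSD read"] / LATENCY_NS["RAM access"]  # 160x
ssd_vs_hdd = LATENCY_NS["HDD seek"] / LATENCY_NS["SSD read"]    # 125x
```

Both ratios land in the same order of magnitude as the ~100x rule, which is the level of precision these numbers are meant to be used at.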
Little's Law in Practice

Little's Law: L = λ × W

Where:
  L = number of concurrent requests in the system
  λ = throughput (requests/second)  
  W = average time per request (seconds)

Example:
  If your service handles 1,000 QPS with 200ms avg latency:
  L = 1,000 × 0.2 = 200 concurrent connections

  If each server handles 50 concurrent connections:
  You need 200 / 50 = 4 servers (minimum)
  With 2x headroom: 8 servers
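The worked example above translates directly into code; a minimal sketch (the function names are illustrative, not a standard API):

```python
import math

def concurrent_requests(qps: float, avg_latency_s: float) -> float:
    """Little's Law: L = lambda * W."""
    return qps * avg_latency_s

def servers_needed(qps: float, avg_latency_s: float,
                   conns_per_server: int, headroom: float = 1.0) -> int:
    """Minimum server count to hold the steady-state concurrency."""
    load = concurrent_requests(qps, avg_latency_s) * headroom
    return math.ceil(load / conns_per_server)

# 1,000 QPS at 200 ms average latency:
print(concurrent_requests(1_000, 0.2))      # -> 200.0 concurrent requests
print(servers_needed(1_000, 0.2, 50))       # -> 4 servers minimum
print(servers_needed(1_000, 0.2, 50, 2.0))  # -> 8 servers with 2x headroom
```

Note that W is the average latency here: Little's Law holds for long-run averages, which is one of the few places averages are the right tool.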

⚠️ Never Use Averages for Latency

A system with an average latency of 100ms might have p50 = 20ms and p99 = 2,000ms. The average hides the fact that 1% of users wait 100x longer. Always report p50, p95, and p99.
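The gap between the average and the tail is easy to demonstrate with the standard library; a sketch using a bimodal distribution like the one described above:

```python
import statistics

# 99% of requests take ~20 ms, 1% take ~2000 ms
samples_ms = [20.0] * 990 + [2000.0] * 10

avg = statistics.mean(samples_ms)            # 39.8 ms -- looks healthy
pct = statistics.quantiles(samples_ms, n=100)
p50, p95, p99 = pct[49], pct[94], pct[98]
# p50 and p95 are both 20 ms, but p99 is ~1980 ms:
# the average completely hid a 100x tail
```

Monitoring systems typically compute these percentiles over a sliding window; the point here is only that the mean of this distribution says nothing about its tail.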

Advantages

  • Directly impacts user experience and revenue
  • Measurable with standard tooling
  • Optimization techniques are well-documented

Disadvantages

  • Tail latency (p99) is extremely hard to reduce
  • Optimization often trades off against other concerns
  • Network latency has physical limits (speed of light)

🧪 Test Your Understanding

Knowledge Check (1/1)

Using Little's Law, if your system handles 2,000 QPS with 100ms average latency, how many concurrent connections do you have?