Beginner · 15 min read · Topic 1.3

Latency and performance

Latency numbers every engineer should know, percentiles, Little's Law, performance vs scalability

Key Takeaways

  1. Latency = time to complete one operation; throughput = operations completed per unit time
  2. Use percentiles (p50, p95, p99), never averages, to measure latency
  3. Little's Law: concurrent requests = throughput × avg latency
  4. Memory is ~100x faster than SSD, SSD is ~100x faster than HDD, and the network adds ~0.5ms per same-DC hop

Understanding Latency

Latency is the time between sending a request and receiving a response. It's the single most user-visible performance metric. Google found that adding 0.5 seconds to search results page load time reduced traffic by 20%. Amazon found that every 100ms of latency cost them 1% in sales.

Performance is not just about being fast — it's about being consistently fast. A system with 50ms p50 but 5s p99 gives a terrible experience to 1 in 100 users.
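Tail latency also compounds with fan-out: a page that makes many parallel backend calls will frequently be slowed by at least one p99-slow response. A minimal sketch of that arithmetic (the 100-call fan-out is an illustrative assumption, not a figure from this article):

```python
def p_hits_tail(fanout: int, tail_fraction: float = 0.01) -> float:
    """Probability that at least one of `fanout` independent backend
    calls lands in the slowest `tail_fraction` of requests."""
    return 1 - (1 - tail_fraction) ** fanout

# With 100 parallel backend calls, most page loads include a p99-slow call:
print(f"{p_hits_tail(100):.1%}")  # -> 63.4%
```

This is why the 1-in-100 figure above understates the problem for fan-out architectures: the slow 1% of requests can dominate the experience of far more than 1% of users.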

Latency Numbers Reference

Operation           Latency     Relative to L1 cache
L1 cache            ~1 ns       1x
L2 cache            ~4 ns       4x
RAM access          ~100 ns     100x
SSD read            ~16 µs      16,000x
HDD seek            ~2 ms       2,000,000x
Same-DC network     ~0.5 ms     500,000x
CA → Netherlands    ~150 ms     150,000,000x

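The "~100x per storage tier" rule of thumb from the key takeaways can be sanity-checked against the table; a quick sketch in nanoseconds:

```python
# Approximate latencies from the reference table above, in nanoseconds
LATENCY_NS = {
    "L1 cache": 1,
    "L2 cache": 4,
    "RAM access": 100,
    "SSD read": 16_000,
    "HDD seek": 2_000_000,
    "Same-DC network": 500_000,
    "CA -> Netherlands": 150_000_000,
}

ram_vs_ssd = LATENCY_NS["SSD read"] / LATENCY_NS["RAM access"]  # 160x
ssd_vs_hdd = LATENCY_NS["HDD seek"] / LATENCY_NS["SSD read"]    # 125x
```

Both ratios land in the same order of magnitude as the ~100x rule, which is the level of precision these numbers are meant to be used at.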
Little's Law in Practice

Little's Law: L = λ × W

Where:
  L = number of concurrent requests in the system
  λ = throughput (requests/second)  
  W = average time per request (seconds)

Example:
  If your service handles 1,000 QPS with 200ms avg latency:
  L = 1,000 × 0.2 = 200 concurrent connections

  If each server handles 50 concurrent connections:
  You need 200 / 50 = 4 servers (minimum)
  With 2x headroom: 8 servers
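The worked example above translates directly into code; a minimal sketch (the function names are illustrative, not a standard API):

```python
import math

def concurrent_requests(qps: float, avg_latency_s: float) -> float:
    """Little's Law: L = lambda * W."""
    return qps * avg_latency_s

def servers_needed(qps: float, avg_latency_s: float,
                   conns_per_server: int, headroom: float = 1.0) -> int:
    """Minimum server count to hold the steady-state concurrency."""
    load = concurrent_requests(qps, avg_latency_s) * headroom
    return math.ceil(load / conns_per_server)

# 1,000 QPS at 200 ms average latency:
print(concurrent_requests(1_000, 0.2))      # -> 200.0 concurrent requests
print(servers_needed(1_000, 0.2, 50))       # -> 4 servers minimum
print(servers_needed(1_000, 0.2, 50, 2.0))  # -> 8 servers with 2x headroom
```

Note that W is the average latency here: Little's Law holds for long-run averages, which is one of the few places averages are the right tool.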

⚠️ Never Use Averages for Latency

A system with an average latency of 100ms might have p50 = 20ms and p99 = 2,000ms. The average hides the fact that 1% of users wait 100x longer. Always report p50, p95, and p99.
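The gap between the average and the tail is easy to demonstrate with the standard library; a sketch using a bimodal distribution like the one described above:

```python
import statistics

# 99% of requests take ~20 ms, 1% take ~2000 ms
samples_ms = [20.0] * 990 + [2000.0] * 10

avg = statistics.mean(samples_ms)            # 39.8 ms -- looks healthy
pct = statistics.quantiles(samples_ms, n=100)
p50, p95, p99 = pct[49], pct[94], pct[98]
# p50 and p95 are both 20 ms, but p99 is ~1980 ms:
# the average completely hid a 100x tail
```

Monitoring systems typically compute these percentiles over a sliding window; the point here is only that the mean of this distribution says nothing about its tail.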

Advantages

  • Directly impacts user experience and revenue
  • Measurable with standard tooling
  • Optimization techniques are well-documented

Disadvantages

  • Tail latency (p99) is extremely hard to reduce
  • Optimization often trades off against other concerns
  • Network latency has physical limits (speed of light)

🧪 Test Your Understanding

Knowledge Check (1/1)

Using Little's Law, if your system handles 2,000 QPS with 100ms average latency, how many concurrent connections do you have?