⚡Key Takeaways
- Latency = time to complete one operation; throughput = operations completed per unit time
- Use percentiles (p50, p95, p99), never averages, to measure latency
- Little's Law: concurrent requests = throughput × avg latency
- Memory is ~100x faster than SSD, SSD is ~100x faster than HDD, network adds ~0.5 ms per DC hop
Understanding Latency
Latency is the time between sending a request and receiving a response. It's the single most user-visible performance metric. Google found that adding 0.5 seconds to search results page load time reduced traffic by 20%. Amazon found that every 100ms of latency cost them 1% in sales.
Performance is not just about being fast — it's about being consistently fast. A system with 50ms p50 but 5s p99 gives a terrible experience to 1 in 100 users.
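To make the averages-vs-percentiles point concrete, here is a minimal sketch using Python's standard `statistics` module. The latency sample is made up for illustration: 99 fast requests and one slow one, roughly the p50/p99 split described above.

```python
import statistics

# Hypothetical latency sample in milliseconds: 99 fast requests, 1 very slow one.
latencies_ms = [20] * 99 + [2000]

mean = statistics.mean(latencies_ms)
# statistics.quantiles(..., n=100) returns the 1st..99th percentile cut points.
pct = statistics.quantiles(latencies_ms, n=100)
p50, p95, p99 = pct[49], pct[94], pct[98]

print(f"mean={mean:.1f}ms  p50={p50:.0f}ms  p95={p95:.0f}ms  p99={p99:.0f}ms")
```

The mean (~40 ms) looks respectable, while p50 and p95 are both 20 ms and p99 is near 2000 ms: exactly the tail that an average hides.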
Latency Numbers Reference
| Operation | Latency | Relative |
|---|---|---|
| L1 cache | ~1 ns | 1x |
| L2 cache | ~4 ns | 4x |
| RAM access | ~100 ns | 100x |
| SSD read | ~16 µs | 16,000x |
| HDD seek | ~2 ms | 2,000,000x |
| Same-DC network | ~0.5 ms | 500,000x |
| CA → Netherlands | ~150 ms | 150,000,000x |
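One way to internalize the table is to rescale it so an L1 cache hit takes one human second; the numbers below are taken directly from the table, and the rescaling is just for intuition:

```python
# Latencies from the table above, in nanoseconds.
latency_ns = {
    "L1 cache": 1,
    "L2 cache": 4,
    "RAM access": 100,
    "SSD read": 16_000,
    "same-DC network": 500_000,
    "HDD seek": 2_000_000,
    "CA -> Netherlands": 150_000_000,
}

# If 1 ns were 1 second, how long would each operation feel?
for op, ns in latency_ns.items():
    days = ns / 86_400  # seconds per day
    print(f"{op:>18}: {ns:>12,} ns  (~{days:,.1f} days at human scale)")
```

At that scale a RAM access is under two "minutes", an HDD seek is over three "weeks", and a transatlantic round trip is years: a vivid reminder of why avoiding disk and network hops matters.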
Little's Law in Practice
Little's Law: L = λ × W
Where:
L = number of concurrent requests in the system
λ = throughput (requests/second)
W = average time per request (seconds)
Example:
If your service handles 1,000 QPS with 200ms avg latency:
L = 1,000 × 0.2 = 200 concurrent connections
If each server handles 50 concurrent connections:
You need 200 / 50 = 4 servers (minimum)
With 2x headroom: 8 servers
⚠️ Never Use Averages for Latency
A system with average latency of 100ms might have p50 = 20ms and p99 = 2000ms. The average hides the fact that 1% of users wait 100x longer than the median. Always report p50, p95, and p99.
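The capacity arithmetic in the Little's Law example above can be wrapped in a small helper. `servers_needed` and its parameter names are illustrative, not from the text:

```python
import math

def servers_needed(qps: float, avg_latency_s: float,
                   conns_per_server: int, headroom: float = 2.0) -> int:
    """Estimate server count from Little's Law: L = qps * avg_latency_s."""
    concurrent = qps * avg_latency_s              # L = λ × W
    return math.ceil(concurrent * headroom / conns_per_server)

# The worked example from the text: 1,000 QPS at 200 ms average latency,
# 50 connections per server, with 2x headroom.
print(servers_needed(1000, 0.2, 50))  # → 8
```

Note that `headroom=2.0` doubles the raw minimum (4 servers here), matching the "with 2x headroom: 8 servers" line above.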
Advantages
- •Directly impacts user experience and revenue
- •Measurable with standard tooling
- •Optimization techniques are well-documented
Disadvantages
- •Tail latency (p99) is extremely hard to reduce
- •Optimization often trades off against other concerns
- •Network latency has physical limits (speed of light)
🧪 Test Your Understanding
Knowledge Check
Using Little's Law, if your system handles 2,000 QPS with 100ms average latency, how many concurrent connections do you have?