๐Key Takeaways
- 1Auto-increment IDs fail at distributed scale โ you need globally unique, decentralized ID generators
- 2Snowflake IDs (64-bit: timestamp + datacenter + machine + sequence) generate 4096 IDs/ms/machine and are time-sortable
- 3UUID v4 is simple but not sortable and wastes storage (128 bits). ULID combines UUID's universality with sortability
- 4Choose your ID scheme based on: sortability, size, collision risk, information leakage, and database index friendliness
Why Is This Hard?
In a single-server system, auto-increment works perfectly. But in distributed systems with multiple database writers, coordinating sequential IDs requires synchronization โ which kills throughput. The challenge: generate IDs that are **globally unique**, **roughly sortable by time**, **compact**, and **require no coordination** between nodes.
This comes up in almost every system design interview โ "How would you generate unique IDs for URLs, messages, posts, or orders at scale?"
ID Generation Schemes Compared
| Scheme | Size | Sortable | Decentralized | Format | Best For |
|---|---|---|---|---|---|
| Auto-increment | 32-64 bits | Yes | No (single writer) | 1, 2, 3... | Small, single-DB systems |
| UUID v4 | 128 bits | No | Yes | 550e8400-e29b-41d4... | Simple distributed systems |
| UUID v7 (2024) | 128 bits | Yes | Yes | timestamp + random | Modern distributed IDs |
| Snowflake ID | 64 bits | Yes | Yes | timestamp+dc+machine+seq | High-throughput at scale |
| ULID | 128 bits | Yes | Yes | 01ARZ3NDEKTSV4RRFF... | DB-friendly sortable IDs |
| NanoID | Variable | No | Yes | V1StGXR8_Z5jdHi6B-myT | URL-safe short IDs |
| Sonyflake | 63 bits | Yes | Yes | 10ms precision variant | Lower throughput, longer lifespan |
Deep Dive Into Each Scheme
UUIDs are 128-bit identifiers typically represented as 32 hex digits: 550e8400-e29b-41d4-a716-446655440000. Version 4 uses random/pseudo-random generation. **Pros:** Zero coordination needed. Built into every language. Universally understood. **Cons:** 128 bits is large (wastes index space). Not sortable โ causes B-tree page splits in databases (terrible for write perf). Encodes no information. ~2^122 possible values means collision is theoretically possible but astronomically unlikely (p = 10^-37 for 1 billion IDs). **When to use:** Prototypes, low-scale systems, when sortability doesn't matter, or when you need guaranteed uniqueness across disconnected systems.
Created by Twitter in 2010. A 64-bit ID structured as: 1 sign bit + 41 timestamp bits + 5 DC bits + 5 machine bits + 12 sequence bits. **Key properties:** - Time-sortable: IDs generated later have higher values - Compact: fits in int64 (half the size of UUID) - High throughput: 4,096 IDs per millisecond per machine - Decentralized: no coordinator needed - 69 years of usable range from custom epoch **Clock skew problem:** If a machine's clock goes backwards, you'd generate duplicate IDs. Solution: reject requests until clock catches up, or use logical clock + sequence instead. **Adopted by:** Twitter, Discord (modified), Instagram (modified), Spotify.
ULID = 48-bit timestamp (ms) + 80-bit random. Encoded as Crockford Base32 (26 characters). Example: 01ARZ3NDEKTSV4RRFFQ69G5FAV. **Why ULID over UUID?** - Lexicographically sortable (string comparison = time order) - 1.21e+24 unique ULIDs per millisecond - Monotonically increasing within millisecond (random part incremented) - DB-friendly: sequential writes, no B-tree fragmentation - URL-safe: no special characters **When to use:** When you want UUID-level uniqueness with sortability and database friendliness.
**PostgreSQL:** Use UUID v7 (pg 17+) or bigserial + logical sharding. **MySQL:** AUTO_INCREMENT with different offsets per shard (shard1: 1,3,5... shard2: 2,4,6...). **MongoDB:** ObjectId is already a 12-byte, time-sortable ID (4 timestamp + 5 random + 3 counter). **DynamoDB:** Client-generated UUIDs or KSUIDs. **The Ticket Server approach (Flickr):** Dedicated MySQL instances that only generate IDs via AUTO_INCREMENT. Simple but creates a single point of failure (solved with 2 servers using odd/even).
import time, threading
class SnowflakeGenerator:
EPOCH = 1288834974657 # Twitter epoch (Nov 4, 2010)
def __init__(self, datacenter_id, machine_id):
self.datacenter_id = datacenter_id & 0x1F # 5 bits
self.machine_id = machine_id & 0x1F # 5 bits
self.sequence = 0
self.last_timestamp = -1
self.lock = threading.Lock()
def generate(self):
with self.lock:
timestamp = int(time.time() * 1000) - self.EPOCH
if timestamp == self.last_timestamp:
self.sequence = (self.sequence + 1) & 0xFFF # 12 bits
if self.sequence == 0: # Overflow โ wait for next ms
while timestamp <= self.last_timestamp:
timestamp = int(time.time() * 1000) - self.EPOCH
else:
self.sequence = 0
self.last_timestamp = timestamp
return (
(timestamp << 22) |
(self.datacenter_id << 17) |
(self.machine_id << 12) |
self.sequence
)
# Usage: gen = SnowflakeGenerator(datacenter_id=1, machine_id=1)
# unique_id = gen.generate() # e.g., 1234567890123456789Single Database?
Advantages
- โขSnowflake IDs are compact (64 bits), sortable, and support 4M IDs/sec/machine
- โขNo coordination needed โ each machine generates independently
- โขTime-sortable IDs enable efficient range queries and index-friendly inserts
- โขUUID v7 / ULID offer standardized sortable alternatives at 128 bits
Disadvantages
- โขSnowflake relies on wall clocks โ vulnerable to clock skew
- โขFixed bit allocation limits: max 32 DCs ร 32 machines in standard Snowflake
- โขInformation leakage: timestamps in IDs reveal creation time
- โขNo scheme is perfect โ every choice involves trade-offs on size, sortability, and throughput
๐งช Test Your Understanding
Why are auto-increment IDs problematic in distributed systems?