⚙️Key Takeaways
- 1B-trees: read-optimized, used by most SQL databases — O(log n) reads, in-place updates
- 2LSM trees: write-optimized, used by Cassandra/RocksDB — sequential writes + periodic compaction
- 3WAL (Write-Ahead Log): crash recovery mechanism — write to log first, then apply to data files
- 4Column-oriented storage is best for analytics (OLAP); row-oriented is best for transactions (OLTP)
How Databases Store Data on Disk
Understanding storage engine internals helps you make informed database choices and troubleshoot performance issues. The two dominant structures are B-trees (used by PostgreSQL, MySQL) and LSM trees (used by Cassandra, RocksDB, LevelDB).
B-Tree vs LSM Tree
| Feature | B-Tree | LSM Tree |
|---|---|---|
| Read perf | Excellent (O(log n) direct lookup) | Good (may check multiple levels) |
| Write perf | Good (random I/O for in-place update) | Excellent (sequential writes) |
| Space usage | Moderate (fragmentation) | Higher (duplicates before compaction) |
| Write amplification | Low-Medium | High (compaction rewrites) |
| Used by | PostgreSQL, MySQL (InnoDB), SQLite | Cassandra, RocksDB, LevelDB, HBase |
| Best for | Read-heavy, OLTP | Write-heavy, time-series, logs |
💡Write-Ahead Log (WAL)
Before modifying any data page, the database writes the change to a sequential append-only log (WAL). If the system crashes mid-write, it replays the WAL on startup to recover to a consistent state. This is how PostgreSQL, MySQL, and most databases achieve durability (the D in ACID).
Advantages
- •B-trees provide predictable read performance
- •LSM trees excel at write throughput
- •WAL ensures crash recovery
Disadvantages
- •LSM compaction causes periodic latency spikes
- •B-tree random I/O is slower for writes
- •Storage engine choice is usually made by the database, not the user
🧪 Test Your Understanding
Knowledge Check1/1
Which storage engine is better for write-heavy workloads?