🗃️Key Takeaways
- 1ACID guarantees (Atomicity, Consistency, Isolation, Durability) ensure reliable transactions
- 2Indexes dramatically speed up reads but slow down writes — choose carefully
- 3Normalization reduces redundancy; denormalization improves read performance
- 4SQL databases scale vertically well but horizontal scaling (sharding) is complex
Why Relational Databases Dominate
Relational databases (PostgreSQL, MySQL, Oracle) have been the backbone of software systems for 40+ years. They store data in tables with predefined schemas, enforce relationships through foreign keys, and guarantee ACID transactions. When in doubt about your database choice, SQL is almost always a safe default.
In system design interviews, understanding SQL deeply is essential — roughly 50% of all design questions involve relational data modeling decisions.
ACID Properties Explained
A transaction is all-or-nothing. If a bank transfer debits account A but the credit to account B fails, the entire transaction rolls back. No partial state changes.
Implementation: Write-ahead log (WAL) records all changes before applying them. On failure, replay or rollback from the WAL.
A transaction moves the database from one valid state to another. All constraints (foreign keys, unique indexes, check constraints) must be satisfied.
Example: You can't insert an order referencing a product_id that doesn't exist if there's a foreign key constraint.
Concurrent transactions don't interfere with each other. Isolation levels trade off between correctness and performance:
READ UNCOMMITTED → READ COMMITTED → REPEATABLE READ → SERIALIZABLE. Most production systems use READ COMMITTED or REPEATABLE READ.
Once a transaction commits, its changes survive power failures and crashes. Ensured by flushing the WAL to disk before returning success to the client.
fsync() is the key system call — it forces data from OS buffers to persistent storage.
-- B-Tree Index (default, most common)
-- Best for: equality lookups, range queries, sorting
CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE email = 'user@example.com'; -- O(log n)
-- Composite Index (multi-column)
-- Best for: queries filtering on multiple columns
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- Follows leftmost prefix rule: works for (user_id) and (user_id, created_at)
-- Does NOT work for queries only on created_at
-- Hash Index (equality only, no range)
CREATE INDEX idx_users_email_hash ON users USING hash(email);
-- GIN Index (full-text search, JSONB, arrays)
CREATE INDEX idx_posts_content ON posts USING gin(to_tsvector('english', content));
-- Partial Index (index only matching rows)
CREATE INDEX idx_active_users ON users(email) WHERE active = true;Popular SQL Databases
| Database | Best For | Max Scale | Key Feature |
|---|---|---|---|
| PostgreSQL | Complex queries, extensibility | ~50K QPS per node | JSONB, extensions ecosystem |
| MySQL | Web applications, reads | ~50K QPS per node | InnoDB, widely deployed |
| SQL Server | Enterprise, Windows | Very high with Enterprise | T-SQL, SSRS integration |
| CockroachDB | Distributed SQL | Horizontally scalable | Spanner-inspired, serializable |
| Aurora | AWS cloud-native | 5x MySQL throughput | Auto-scaling storage, 6-way replication |
Advantages
- •ACID guarantees prevent data corruption
- •Mature tooling and ecosystem
- •SQL is a universal query language
- •Complex joins and aggregations are built-in
Disadvantages
- •Horizontal scaling (sharding) is complex
- •Schema changes require migrations
- •Write-heavy workloads can be bottlenecked
- •Over-normalization leads to expensive joins
🧪 Test Your Understanding
What does the 'I' in ACID stand for?