Relational databases (SQL)

🗃️Key Takeaways

1
ACID guarantees (Atomicity, Consistency, Isolation, Durability) ensure reliable transactions
2
Indexes dramatically speed up reads but slow down writes — choose carefully
3
Normalization reduces redundancy; denormalization improves read performance
4
SQL databases scale vertically well but horizontal scaling (sharding) is complex

Why Relational Databases Dominate

Relational databases (PostgreSQL, MySQL, Oracle) have been the backbone of software systems for 40+ years. They store data in tables with predefined schemas, enforce relationships through foreign keys, and guarantee ACID transactions. When in doubt about your database choice, SQL is almost always a safe default.

In system design interviews, understanding SQL deeply is essential — roughly 50% of all design questions involve relational data modeling decisions.

ACID Properties Explained

A transaction is all-or-nothing. If a bank transfer debits account A but the credit to account B fails, the entire transaction rolls back. No partial state changes.

Implementation: Write-ahead log (WAL) records all changes before applying them. On failure, replay or rollback from the WAL.

A transaction moves the database from one valid state to another. All constraints (foreign keys, unique indexes, check constraints) must be satisfied.

Example: You can't insert an order referencing a product_id that doesn't exist if there's a foreign key constraint.

Concurrent transactions don't interfere with each other. Isolation levels trade off between correctness and performance:

READ UNCOMMITTED → READ COMMITTED → REPEATABLE READ → SERIALIZABLE. Most production systems use READ COMMITTED or REPEATABLE READ.

Once a transaction commits, its changes survive power failures and crashes. Ensured by flushing the WAL to disk before returning success to the client.

fsync() is the key system call — it forces data from OS buffers to persistent storage.

Index Types and When to Use Them

-- B-Tree Index (default, most common)
-- Best for: equality lookups, range queries, sorting
CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE email = 'user@example.com';  -- O(log n)

-- Composite Index (multi-column)
-- Best for: queries filtering on multiple columns
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- Follows leftmost prefix rule: works for (user_id) and (user_id, created_at)
-- Does NOT work for queries only on created_at

-- Hash Index (equality only, no range)
CREATE INDEX idx_users_email_hash ON users USING hash(email);

-- GIN Index (full-text search, JSONB, arrays)
CREATE INDEX idx_posts_content ON posts USING gin(to_tsvector('english', content));

-- Partial Index (index only matching rows)
CREATE INDEX idx_active_users ON users(email) WHERE active = true;

Popular SQL Databases

Database	Best For	Max Scale	Key Feature
PostgreSQL	Complex queries, extensibility	~50K QPS per node	JSONB, extensions ecosystem
MySQL	Web applications, reads	~50K QPS per node	InnoDB, widely deployed
SQL Server	Enterprise, Windows	Very high with Enterprise	T-SQL, SSRS integration
CockroachDB	Distributed SQL	Horizontally scalable	Spanner-inspired, serializable
Aurora	AWS cloud-native	5x MySQL throughput	Auto-scaling storage, 6-way replication

✅When SQL is the Right Choice

Use SQL when you need: ACID transactions (payments, inventory), complex joins, strict schema enforcement, or you're not sure (SQL is the safe default). Avoid SQL when: you need >100K write QPS, your data is mostly key-value lookups, or your schema changes constantly.

Advantages

•ACID guarantees prevent data corruption
•Mature tooling and ecosystem
•SQL is a universal query language
•Complex joins and aggregations are built-in

Disadvantages

•Horizontal scaling (sharding) is complex
•Schema changes require migrations
•Write-heavy workloads can be bottlenecked
•Over-normalization leads to expensive joins

🧪 Test Your Understanding

Knowledge Check1/2

What does the 'I' in ACID stand for?