Reference
System Design Glossary
64 essential terms and concepts — from ACID to ZooKeeper. Your go-to reference for system design interviews and everyday engineering.
A
ACID
Database · Atomicity, Consistency, Isolation, Durability — properties of reliable database transactions. Atomicity ensures all-or-nothing, Consistency maintains invariants, Isolation prevents interference between concurrent transactions, Durability persists committed data.
API Gateway
Architecture · A server that acts as the single entry point for API requests. Handles routing, authentication, rate limiting, request/response transformation, and load balancing. Examples: Kong, AWS API Gateway, Envoy.
Availability
Fundamentals · The proportion of time a system is operational and accessible. Measured in 'nines' — 99.9% (three nines) = 8.77 hours downtime per year. Availability = Uptime / (Uptime + Downtime).
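The 'nines' arithmetic is easy to sketch. A minimal calculation, assuming a 365.25-day year of 8,766 hours (which is where the 8.77 figure comes from):

```python
def downtime_per_year(availability: float, hours_per_year: float = 8766.0) -> float:
    """Allowed downtime (hours/year) for a given availability fraction.

    Assumes a 365.25-day year (8,766 hours).
    """
    return (1.0 - availability) * hours_per_year

print(round(downtime_per_year(0.999), 2))    # 8.77  — three nines
print(round(downtime_per_year(0.9999), 3))   # 0.877 — four nines (~53 minutes)
```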
Async Messaging
Architecture · Communication pattern where the sender doesn't wait for the receiver to process the message. Enables temporal decoupling, load leveling, and fault tolerance. Implemented via message queues (Kafka, RabbitMQ, SQS).
B
Back Pressure
Distributed Systems · A mechanism for a receiver to signal the sender to slow down when overwhelmed. Prevents cascading failures in stream processing and messaging systems. Essential in Kafka, Flink, and reactive systems.
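In-process, the simplest back-pressure mechanism is a bounded buffer; a sketch using Python's standard queue:

```python
import queue

buf = queue.Queue(maxsize=2)   # bounded buffer between producer and consumer
buf.put(1)
buf.put(2)                     # buffer is now full

try:
    buf.put_nowait(3)          # producer is pushed back instead of piling up work
except queue.Full:
    print("back pressure: producer must slow down or shed load")
```

A blocking put (`buf.put(3, timeout=...)`) would instead make the producer wait, which is how many stream processors propagate back pressure upstream.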
BASE
Database · Basically Available, Soft state, Eventually consistent — an alternative to ACID for distributed databases that prioritizes availability over immediate consistency.
Bloom Filter
Data Structures · A probabilistic data structure that tests set membership. May return false positives but never false negatives. Space-efficient — uses a bit array and multiple hash functions. Used in databases, caches, and network systems to avoid expensive lookups.
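A minimal sketch of the idea, using SHA-256 to derive k positions into an m-bit array; real implementations choose m and k from the target false-positive rate:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions over an m-bit array (stored as an int)."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item: str):
        # Derive k independent positions by salting the hash input with an index.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

bf = BloomFilter()
bf.add("user:42")
print(bf.might_contain("user:42"))  # True — added items are never missed
```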
Bulkhead
Architecture · A resilience pattern that isolates components so that a failure in one doesn't cascade to others, similar to watertight compartments in a ship. Implemented via thread pools, process isolation, or separate service instances.
C
CAP Theorem
Distributed Systems · Brewer's theorem: a distributed system can provide at most 2 of 3 guarantees — Consistency (every read gets the latest write), Availability (every request gets a response), Partition tolerance (system operates despite network splits). In practice, P is mandatory, so the choice is between CP and AP.
Cache
Caching · A high-speed data storage layer that stores a subset of data for faster retrieval. Can exist at every layer — browser, CDN, application, database. Cache hit ratio is the key metric.
Cache-Aside
Caching · Also called 'lazy loading'. Application checks cache first; on miss, reads from DB and populates cache. Most common caching pattern. Drawbacks: the first request always misses, and cached data can go stale unless invalidated on writes.
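The pattern in a few lines; `db_load` here is a hypothetical stand-in for a real database query:

```python
cache: dict[str, str] = {}

def db_load(key: str) -> str:
    # Hypothetical database read; stands in for a real query.
    return f"value-for-{key}"

def get(key: str) -> str:
    if key in cache:          # 1. check the cache first
        return cache[key]
    value = db_load(key)      # 2. on a miss, read from the database
    cache[key] = value        # 3. populate the cache for next time
    return value

get("user:1")                 # miss: loads from the DB and fills the cache
print(get("user:1"))          # hit: served from the cache
```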
Circuit Breaker
Architecture · A resilience pattern that prevents a service from calling a failing dependency repeatedly. States: Closed (normal) → Open (failing, fast-fail) → Half-Open (testing recovery). Prevents cascade failures.
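A simplified sketch of the state machine; thresholds and timing are illustrative, and production libraries (resilience4j, Polly) add sliding windows and metrics:

```python
import time

class CircuitBreaker:
    """Closed -> Open after `threshold` consecutive failures; Half-Open after `reset_after` s."""

    def __init__(self, threshold: int = 3, reset_after: float = 30.0):
        self.threshold, self.reset_after = threshold, reset_after
        self.failures, self.opened_at = 0, None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if time.monotonic() - self.opened_at >= self.reset_after:
            return "half-open"   # allow a trial call through
        return "open"

    def call(self, fn):
        if self.state == "open":
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()   # trip the breaker
            raise
        self.failures, self.opened_at = 0, None     # success closes the circuit
        return result

cb = CircuitBreaker(threshold=2, reset_after=60.0)

def flaky():
    raise ConnectionError("dependency down")

for _ in range(2):
    try:
        cb.call(flaky)
    except ConnectionError:
        pass

print(cb.state)  # open — further calls fail fast without touching the dependency
```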
Consistent Hashing
Distributed Systems · A hashing technique that minimizes key redistribution when nodes are added/removed. Keys and nodes are placed on a virtual ring. Used in DynamoDB, Cassandra, CDNs, and distributed caches.
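A sketch of a ring with virtual nodes (see vNode below); MD5 is used here only as a convenient stable hash, not a recommendation:

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes for smoother load distribution."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes, self._keys, self._ring = vnodes, [], {}
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self.vnodes):           # place vnodes on the ring
            h = self._hash(f"{node}#{i}")
            bisect.insort(self._keys, h)
            self._ring[h] = node

    def remove(self, node: str) -> None:
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            self._keys.remove(h)
            del self._ring[h]

    def get(self, key: str) -> str:
        # A key belongs to the first vnode clockwise from its hash (wrapping around).
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[self._keys[idx]]

ring = HashRing(["a", "b", "c"])
owner = ring.get("user:42")
ring.remove(owner)                    # only keys on the removed node move
print(ring.get("user:42") != owner)   # True
```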
Consensus
Distributed Systems · The process by which distributed nodes agree on a value. Essential for leader election and replication. Algorithms: Paxos, Raft, Zab (ZooKeeper). The FLP impossibility result shows deterministic consensus is impossible in an asynchronous network with even one faulty process.
CQRS
Architecture · Command Query Responsibility Segregation — separating read and write models. Commands modify state, queries read state. Enables independent scaling and optimization of reads vs writes. Often paired with event sourcing.
CRDT
Distributed Systems · Conflict-free Replicated Data Type — data structures that can be replicated across nodes and merged deterministically without coordination. Used for eventual consistency in collaborative editing, counters, and sets.
CDN
Infrastructure · Content Delivery Network — a geographically distributed network of proxy servers and data centers. Caches content at edge locations close to users, reducing latency and backend load. Examples: CloudFront, Cloudflare, Akamai.
D
Dead Letter Queue
Messaging · A queue that stores messages that couldn't be processed after multiple retries. Enables manual inspection and reprocessing of failed messages without blocking the main queue.
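The retry-then-park flow in miniature; `handler` stands in for real message processing:

```python
def process_with_dlq(messages, handler, max_retries: int = 3):
    """Process each message; after `max_retries` failures, park it in the DLQ."""
    dead_letters = []
    for msg in messages:
        for _attempt in range(max_retries):
            try:
                handler(msg)
                break                        # processed successfully
            except Exception:
                continue                     # retry
        else:
            dead_letters.append(msg)         # retries exhausted: park for inspection
    return dead_letters

def handler(msg):
    if msg == "poison":
        raise ValueError("cannot parse message")

dlq = process_with_dlq(["ok", "poison", "ok"], handler)
print(dlq)  # ['poison'] — the failed message is parked, the main queue keeps moving
```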
DNS
Networking · Domain Name System — translates domain names to IP addresses. Hierarchical: root → TLD → authoritative nameserver. Supports record types: A, AAAA, CNAME, MX, NS, TXT. TTL controls caching.
Database Sharding
Database · Horizontal partitioning of data across multiple database instances. Each shard holds a subset of data. Sharding keys determine data placement. Challenges: cross-shard queries, rebalancing, hot spots.
E
Event Sourcing
Architecture · Storing all changes to application state as a sequence of immutable events rather than current state. The current state is derived by replaying events. Enables audit trails, time travel, and event-driven architectures.
Eventual Consistency
Distributed Systems · A consistency model where replicas will converge to the same value given enough time without new updates. Provides higher availability than strong consistency. Used in DNS, CDNs, NoSQL databases.
F
Failover
Infrastructure · The process of switching to a redundant/standby system when the primary fails. Active-passive: the standby takes over. Active-active: both handle traffic; the surviving node absorbs the full load.
Fan-out
Architecture · Distributing a message/event to multiple recipients. Write fan-out (push): write to all followers' timelines at write time. Read fan-out (pull): assemble the feed at read time. Trade-off between write amplification and read latency.
G
Geohash
Data Structures · A hierarchical spatial data structure that encodes geographic coordinates into a short string of letters/digits. Adjacent locations share common prefixes. Used for proximity searches in location-based services.
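The encoding itself is short: alternate bits between longitude and latitude via binary search over the coordinate ranges, emitting one base32 character per 5 bits. A sketch:

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat: float, lon: float, precision: int = 9) -> str:
    """Encode (lat, lon) into a geohash by interleaving longitude/latitude bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    out, bits, bit_count, even = [], 0, 0, True
    while len(out) < precision:
        if even:  # even bit positions refine longitude
            mid = (lon_lo + lon_hi) / 2
            bits = (bits << 1) | int(lon >= mid)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:     # odd bit positions refine latitude
            mid = (lat_lo + lat_hi) / 2
            bits = (bits << 1) | int(lat >= mid)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even = not even
        bit_count += 1
        if bit_count == 5:   # every 5 bits becomes one base32 character
            out.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(out)

print(geohash(57.64911, 10.40744))  # 'u4pruydqq' — the classic example point
```

A shorter geohash is always a prefix of a longer one for the same point, which is what makes prefix matching work for proximity queries.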
gRPC
Networking · Google's Remote Procedure Call framework. Uses HTTP/2 for transport and Protocol Buffers for serialization. Supports streaming, multiplexing, and strong typing. Typically several times faster than REST/JSON for internal service-to-service calls.
H
Heartbeat
Distributed Systems · Periodic signal sent between nodes to detect failures. If heartbeats are missed beyond a threshold, the node is considered failed. Used in distributed systems, load balancers, and cluster management.
Horizontal Scaling
Fundamentals · Adding more machines to handle increased load (scale out). Preferred for stateless services. Requires load balancing and/or sharding. Contrast with vertical scaling (bigger machine).
Hot Spot
Distributed Systems · A node or partition receiving disproportionately more traffic than others. Caused by skewed data distribution or popular keys. Solutions: salting keys, consistent hashing with virtual nodes, splitting hot partitions.
I
Idempotency
Distributed Systems · Property where performing the same operation multiple times produces the same result. Critical for retry logic in distributed systems. Implemented via idempotency keys — unique request identifiers that prevent duplicate processing.
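A sketch of idempotency-key handling; `charge_card` and the in-memory store are hypothetical stand-ins (real systems persist keys durably, usually with a TTL):

```python
processed: dict[str, str] = {}   # idempotency key -> stored result

def charge_card(amount_cents: int) -> str:
    # Hypothetical side-effecting operation.
    return f"charged {amount_cents} cents"

def handle(idempotency_key: str, amount_cents: int) -> str:
    if idempotency_key in processed:          # duplicate retry: replay stored result
        return processed[idempotency_key]
    result = charge_card(amount_cents)        # side effect happens exactly once
    processed[idempotency_key] = result
    return result

first = handle("req-123", 500)
retry = handle("req-123", 500)   # network retry with the same key
print(first == retry)            # True — no double charge
```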
Inverted Index
Data Structures · A data structure mapping terms to the documents/records containing them. The foundation of full-text search engines like Elasticsearch and Lucene. Enables fast keyword lookup across millions of documents.
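Building one over a toy corpus takes a few lines; real engines add tokenization, stemming, and ranking on top:

```python
from collections import defaultdict

docs = {
    1: "consistent hashing minimizes key movement",
    2: "bloom filters avoid expensive key lookups",
    3: "consistent reads need quorum",
}

# term -> set of document ids containing it
index: defaultdict = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

print(sorted(index["consistent"]))                 # [1, 3]
print(sorted(index["key"] & index["consistent"]))  # AND query via set intersection: [1]
```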
K
Kafka
Messaging · A distributed event streaming platform. Append-only log with topics, partitions, and consumer groups. Provides at-least-once delivery, high throughput (millions of messages/second), and data retention.
L
Latency
Fundamentals · Time between a request and its response. Measured in percentiles: p50 (median), p95, p99, p99.9. Tail latency (p99+) matters most for user experience. Amdahl's law limits how much parallelism can reduce it.
Leader Election
Distributed SystemsThe process of choosing a single node to coordinate operations in a distributed system. Prevents split-brain. Implemented via Raft, ZooKeeper (ZAB), or etcd. Key concept: fencing tokens to prevent stale leaders.
Load Balancer
Networking · Distributes incoming traffic across multiple servers. Layer 4 (TCP/UDP) or Layer 7 (HTTP). Algorithms: round-robin, least connections, consistent hashing. Examples: HAProxy, NGINX, AWS ALB.
LSM Tree
Storage · Log-Structured Merge Tree — a write-optimized storage engine. Writes go to an in-memory table (memtable), then flushed to sorted files (SSTables) on disk. Background compaction merges files. Used in Cassandra, RocksDB, LevelDB.
M
Message Queue
Messaging · Middleware that enables async communication between services. Provides buffering, ordering, delivery guarantees. Point-to-point or pub/sub. Examples: RabbitMQ, Amazon SQS, Kafka, Redis Streams.
Microservices
Architecture · Architecture style where an application is composed of small, independently deployable services. Each owns its data and communicates via APIs or events. Benefits: team autonomy, independent scaling, tech diversity.
MVCC
Database · Multi-Version Concurrency Control — database technique allowing concurrent reads and writes by maintaining multiple versions of data. Readers see a consistent snapshot without locking. Used in PostgreSQL, MySQL InnoDB.
N
NoSQL
Database · Non-relational databases optimized for specific access patterns. Types: Document (MongoDB), Key-Value (Redis), Wide-Column (Cassandra), Graph (Neo4j), Time-Series (InfluxDB), Search (Elasticsearch).
O
Outbox Pattern
Architecture · Ensures reliable event publishing by writing events to a local 'outbox' table in the same DB transaction as the business operation. A separate process reads the outbox and publishes events. Solves the dual-write problem.
P
Partition Tolerance
Distributed Systems · The system continues to function despite network partitions (message loss between nodes). In the CAP theorem, this is effectively mandatory — networks are unreliable. So the real choice is CP vs AP.
Pub/Sub
Messaging · Publish-Subscribe messaging pattern. Publishers send messages to topics, subscribers receive messages from topics they've subscribed to. Enables loose coupling and fan-out. Examples: Kafka, Google Pub/Sub, Redis Pub/Sub.
Q
Quorum
Distributed Systems · The minimum number of nodes that must agree for an operation to succeed. For N replicas: write quorum W, read quorum R. If W + R > N, reads are guaranteed to see latest writes. DynamoDB and Cassandra use quorum-based consistency.
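The overlap condition is a one-liner worth internalizing: if W + R > N, every read set must intersect every write set in at least one replica.

```python
def quorum_consistent(n: int, w: int, r: int) -> bool:
    """W + R > N forces every read set to overlap every write set in >= 1 replica."""
    return w + r > n

print(quorum_consistent(3, 2, 2))  # True  — common N=3 configuration
print(quorum_consistent(3, 1, 1))  # False — reads may hit a stale replica
```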
R
Rate Limiting
Networking · Controlling the rate of requests a client can make. Protects against abuse and overload. Algorithms: Token Bucket, Leaky Bucket, Fixed Window, Sliding Window Log, Sliding Window Counter.
Read Replica
Database · A copy of the primary database that handles read queries. Reduces load on the primary. Introduces replication lag — the delay between a write on the primary and its availability on replicas.
Replication
Database · Copying data across multiple nodes for fault tolerance and read scalability. Modes: single-leader, multi-leader, leaderless. Trade-off between consistency and latency.
REST
Networking · Representational State Transfer — an architectural style for APIs. Uses HTTP methods (GET, POST, PUT, DELETE), stateless communication, resource-oriented URLs. Most common API style.
S
Saga Pattern
Architecture · Manages distributed transactions as a sequence of local transactions. Each step has a compensating action for rollback. Types: choreography (event-driven) and orchestration (central coordinator).
Service Discovery
Networking · Mechanism for services to find and communicate with each other. Client-side: the client queries a registry (Consul, etcd). Server-side: a load balancer queries the registry. DNS-based: use DNS for discovery.
Service Mesh
Infrastructure · Infrastructure layer handling service-to-service communication. Provides traffic management, security (mTLS), and observability without changing application code. Examples: Istio, Linkerd, Envoy.
Sharding
Database · See Database Sharding — horizontal partitioning of data across multiple database instances.
SLA / SLO / SLI
Infrastructure · SLA: Service Level Agreement — the contract. SLO: Service Level Objective — the target (e.g. 99.9% uptime). SLI: Service Level Indicator — the measured metric (e.g. actual uptime %).
Snowflake ID
Data Structures · A 64-bit unique ID generation scheme created by Twitter. Components: timestamp (41 bits) + datacenter ID (5 bits) + machine ID (5 bits) + sequence number (12 bits). Generates up to 4096 IDs per millisecond per machine. Time-sortable.
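The bit layout can be sketched directly with shifts; the epoch constant below is Twitter's published custom epoch, though any fixed epoch works:

```python
EPOCH_MS = 1288834974657  # Twitter's custom epoch (2010-11-04); any fixed epoch works

def snowflake(timestamp_ms: int, datacenter: int, machine: int, sequence: int) -> int:
    """Pack an ID as: 41-bit timestamp | 5-bit datacenter | 5-bit machine | 12-bit sequence."""
    assert 0 <= datacenter < 32 and 0 <= machine < 32 and 0 <= sequence < 4096
    return (
        ((timestamp_ms - EPOCH_MS) << 22)  # timestamp in the high bits...
        | (datacenter << 17)
        | (machine << 12)
        | sequence
    )

a = snowflake(EPOCH_MS + 1000, 1, 7, 0)
b = snowflake(EPOCH_MS + 2000, 1, 7, 0)
print(a < b)  # True — because the timestamp occupies the high bits, IDs sort by time
```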
T
Throughput
Fundamentals · The number of operations/requests a system can handle per unit time. Measured in QPS (queries/second), TPS (transactions/second), or RPS (requests/second). Throughput × Latency = Concurrency (Little's Law).
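Little's law in code form — handy for sizing thread pools and connection pools:

```python
def concurrency(throughput_rps: float, latency_s: float) -> float:
    """Little's law: average requests in flight = arrival rate x time in system."""
    return throughput_rps * latency_s

# 1,000 req/s at 50 ms average latency keeps ~50 requests in flight,
# so a worker pool much smaller than 50 will queue.
print(concurrency(1000, 0.050))  # 50.0
```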
Token Bucket
Networking · A rate limiting algorithm. Tokens are added at a fixed rate. Each request consumes a token. If the bucket is empty, the request is rejected or queued. Allows bursts up to the bucket capacity.
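A compact single-client sketch; real limiters track a bucket per client and update it atomically (often in Redis):

```python
import time

class TokenBucket:
    """Refill at `rate` tokens/second up to `capacity`; each request costs one token."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Lazily add tokens accrued since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=1.0, capacity=10.0)
burst = sum(bucket.allow() for _ in range(12))
print(burst)  # 10 — bursts are allowed up to the bucket capacity, then throttled
```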
Two-Phase Commit (2PC)
Distributed Systems · A distributed transaction protocol. Phase 1: the coordinator asks all participants to prepare (vote yes/no). Phase 2: if all voted yes, the coordinator sends commit; otherwise abort. Blocking protocol — a coordinator failure can stall all participants.
V
Vector Clock
Distributed Systems · A mechanism to track causal ordering of events in distributed systems. Each node maintains a vector of logical timestamps. Enables detecting concurrent events (neither happened before the other). Used in Amazon's Dynamo and Riak.
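Compare-and-merge is the whole trick; a sketch with clocks represented as dicts of per-node counters:

```python
def merge(a: dict, b: dict) -> dict:
    """Element-wise max of two vector clocks (used when replicas sync)."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def happened_before(a: dict, b: dict) -> bool:
    """a causally precedes b iff a <= b component-wise, with at least one strict <."""
    keys = a.keys() | b.keys()
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))

a = {"n1": 2, "n2": 0}
b = {"n1": 2, "n2": 1}
c = {"n1": 1, "n2": 3}

print(happened_before(a, b))                                    # True — b saw all of a, plus more
print(not happened_before(b, c) and not happened_before(c, b))  # True — b and c are concurrent
print(merge(b, c) == {"n1": 2, "n2": 3})                        # True — merged clock dominates both
```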
Vertical Scaling
Fundamentals · Increasing the resources (CPU, RAM, disk) of a single machine (scale up). Simpler but has hardware limits. Good for databases that are hard to shard. Eventually hits a ceiling.
Virtual Node (vNode)
Distributed Systems · In consistent hashing, a physical node is represented by multiple virtual nodes on the hash ring. Improves load distribution and enables heterogeneous node capacities.
W
WAL (Write-Ahead Log)
Storage · A technique where changes are written to a sequential log before being applied to the main data structure. Ensures durability — if the system crashes, changes can be recovered from the log. Used in databases and Kafka.
WebSocket
Networking · A protocol providing full-duplex communication over a single TCP connection. Enables real-time bidirectional communication (chat, gaming, live updates). The connection persists after a single HTTP upgrade handshake, avoiding per-message connection overhead.
Write-Ahead Log
Storage · See WAL (Write-Ahead Log).
Z
ZooKeeper
Infrastructure · A distributed coordination service for maintaining configuration, naming, synchronization, and group services. Provides leader election, distributed locks, and a service registry. Being replaced by etcd in many modern systems.