gitGood.dev

System Design Tradeoffs

Last updated: May 2026 · By gitGood Editorial

The recurring forks in system design interviews. CAP, PACELC, sync vs async, push vs pull, SQL vs NoSQL, sharding shapes, consistency models, cache strategies, idempotency, and rate limiting. For each, the options and when to choose each.

How to use this sheet

System design rounds reward picking a side and defending it. "It depends" is a non-answer. For each fork below, name the dimension, name both options, name the workload that should pick each, then pick one for the problem on the table. Interviewers grade depth on tradeoffs more than novelty of architecture.

Distributed-systems tradeoffs

CAP theorem

CP (consistency + partition tolerance)

Under partition, refuse writes (or reads) on one side rather than serve stale data. Examples: ZooKeeper, etcd, HBase, MongoDB with majority writes.

AP (availability + partition tolerance)

Under partition, both sides keep accepting writes; reconcile later. Examples: Cassandra, DynamoDB (default), Riak, CouchDB.

CA (consistency + availability)

Only achievable on a single node or a network that never partitions. Since real networks do partition, CA is not a genuine distributed-systems option - mention it only to show you know why it's off the table.

When to choose each

CP for systems where stale data is dangerous: leader election, distributed locks, billing balances, inventory counts, configuration. AP for systems where availability beats freshness: shopping cart, social feed, user-generated content, telemetry. The interviewer wants you to name the partition behavior - "we choose CP" without explaining what happens during a partition is the red flag.

PACELC

PA/EL

Partition: pick Availability. Else (no partition): pick Latency. Examples: Cassandra, DynamoDB.

PA/EC

Partition: pick Availability. Else: pick Consistency. Examples: MongoDB (default configuration).

PC/EL

Partition: pick Consistency. Else: pick Latency. Rare in practice; Yahoo's PNUTS is the usual cited example.

PC/EC

Partition: pick Consistency. Else: pick Consistency (always). Examples: VoltDB, Spanner (with TrueTime), HBase.

When to choose each

PACELC extends CAP by surfacing the always-on tradeoff between latency and consistency even when there is no partition. Use it when the interviewer asks "why not just always pick C?" - the answer is the latency cost of synchronous quorum.

Sync vs async communication

Synchronous (RPC, REST, gRPC, blocking)

Caller waits for callee response. Tight coupling on availability + latency.

Asynchronous (queue, event bus, pub/sub)

Caller fires and forgets (or polls for the result later). Caller and callee can scale and fail independently.

When to choose each

Sync when the user is waiting and the result is needed for the response (auth, payments authorization, search). Async when the work can be deferred (email send, image processing, analytics, fan-out to N consumers, anything that benefits from buffering / smoothing). Prefer async for cross-service writes you don't want to lose during a downstream outage.
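A minimal sketch of the decoupling, using a background thread and an in-process queue as a stand-in for a real broker; the names (`signup`, `worker`) are hypothetical:

```python
import queue
import threading

# Bounded queue: acts as a buffer that smooths bursts between caller and worker.
jobs: "queue.Queue[str]" = queue.Queue(maxsize=100)

def worker() -> None:
    while True:
        email = jobs.get()
        if email is None:   # sentinel to stop the worker
            break
        # ... send the email here; a failure or slowdown never blocks the caller
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def signup(email: str) -> str:
    jobs.put(email)          # async: enqueue and return immediately
    return "202 Accepted"    # the user is not kept waiting on the send
```

The caller's latency is now the enqueue cost, not the send cost, and a downstream outage drains into the buffer instead of failing the request.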

Push vs pull

Push (server -> client / consumer)

Server emits when data is ready. WebSocket, SSE, webhook, mobile push notification, fan-out-on-write feeds.

Pull (client / consumer -> server)

Consumer polls on its own cadence. HTTP long-poll, Kafka consumer, RSS, fan-out-on-read feeds.

When to choose each

Push when latency to the consumer matters more than the coordination cost (chat, live scores, trading). Pull when consumers are slow / unreliable, or when consumer count is large and uneven (Kafka's pull model lets each consumer set its own pace - critical for backpressure). Hybrid (push notification + pull payload) is the pragmatic mobile pattern.
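The backpressure point can be made concrete with a toy pull consumer; the `deque` stands in for a Kafka partition, and `poll`/`drain` are hypothetical names:

```python
from collections import deque

# Append-only log the consumer reads from (stand-in for a Kafka partition).
log = deque(f"event-{i}" for i in range(10))

def poll(batch: int = 3) -> list[str]:
    # The consumer - not the server - chooses how much to take per poll.
    out: list[str] = []
    while log and len(out) < batch:
        out.append(log.popleft())
    return out

def drain() -> int:
    processed = 0
    while (events := poll()):
        processed += len(events)  # process at our own cadence; commit offsets here
    return processed
```

Because the consumer sets both batch size and polling cadence, a slow consumer simply lags rather than being overwhelmed - the inverse of a push model, where the producer's rate is imposed on every subscriber.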

SQL vs NoSQL

SQL / relational (Postgres, MySQL, Aurora)

Schema-on-write, ACID, joins, mature query planners, transactional guarantees.

Document (MongoDB, DynamoDB, Cosmos)

Schema-flexible, horizontal scaling first-class, single-item ACID, weak/no joins.

Wide-column (Cassandra, ScyllaDB, HBase)

Tunable consistency, very high write throughput, query patterns are baked in at table-design time.

Key-value (Redis, DynamoDB-as-KV, Memcached)

Lowest-latency lookup, smallest API surface, primary key access only.

Graph (Neo4j, JanusGraph, Neptune)

First-class edges + traversal queries (Cypher / Gremlin). Pays off when queries are k-hop traversals, not lookups.

Time-series (Influx, Timescale, Prometheus)

Column-store + retention + downsampling for append-only metric streams.

When to choose each

SQL by default unless you have a specific reason. NoSQL when (a) you need horizontal scale beyond a single relational instance, (b) the access pattern is fixed and known up front (DynamoDB single-table), (c) you need flexible / nested documents (MongoDB / catalog), or (d) the workload is a specialty (graph, time-series, KV-only). The wrong move is choosing NoSQL because "it scales" without specifying which NoSQL family - they have very different characteristics.

Sharding strategy

Vertical partitioning

Split by columns / table - put hot tables on one DB, cold on another. Doesn't scale a single table; relieves contention between subsystems.

Horizontal sharding (range-based)

Rows distributed by key range (e.g. user_id 0-1M on shard A, 1M-2M on shard B). Range queries are local; hotspots possible (e.g. monotonic IDs).

Horizontal sharding (hash-based)

Rows distributed by hash(key) mod N. Good load balance; range queries become scatter-gather.
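The two horizontal strategies differ only in the shard-selection function; a sketch with hypothetical constants:

```python
import hashlib

N_SHARDS = 4

def range_shard(user_id: int, rows_per_shard: int = 1_000_000) -> int:
    # Range-based: contiguous IDs land on the same shard, so range scans
    # stay local - but monotonically increasing IDs hammer the newest shard.
    return user_id // rows_per_shard

def hash_shard(user_id: int) -> int:
    # Hash-based: uniform spread across shards, but a contiguous ID range
    # scatters, so range queries become scatter-gather.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % N_SHARDS
```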

Consistent hashing

Hash both keys and nodes onto a ring; each key goes to the next clockwise node. Adding/removing a node only re-shards O(1/N) of keys. Used by Cassandra, DynamoDB, memcached client-side.
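A minimal ring sketch (with virtual nodes for balance - not production code) showing the O(1/N) property: removing a node only moves the keys that were mapped to it.

```python
import bisect
import hashlib

class Ring:
    """Consistent-hash ring with virtual nodes; a teaching sketch."""

    def __init__(self, nodes: list[str], vnodes: int = 100):
        self._keys: list[int] = []      # sorted positions on the ring
        self._map: dict[int, str] = {}  # position -> physical node
        for node in nodes:
            for i in range(vnodes):
                h = self._hash(f"{node}#{i}")
                self._map[h] = node
                bisect.insort(self._keys, h)

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        h = self._hash(key)
        # Next position clockwise, wrapping around the end of the ring.
        i = bisect.bisect(self._keys, h) % len(self._keys)
        return self._map[self._keys[i]]
```

With plain `hash(key) mod N`, changing N remaps nearly every key; here, dropping one of three nodes relocates only roughly a third of them.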

Directory-based (lookup service)

A small lookup service maps key -> shard. Most flexible; the lookup service is itself a SPOF and capacity bottleneck.

When to choose each

Vertical first to relieve hot tables / hot read paths cheaply. Hash-based for write-heavy workloads where load balance dominates. Range-based when range queries are common (time-series, sorted feeds). Consistent hashing when nodes come and go often (caches, large clusters with frequent rebalancing). Directory-based when shard placement needs to be policy-driven (geo, tenant tier, regulatory).

Consistency model

Strong (linearizable)

Every read sees the latest acknowledged write. Implementation: Raft / Paxos majority quorum, single-leader with sync replication.
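The quorum condition behind this is worth stating: in a Dynamo-style system with N replicas, a write acked by W replicas and a read from R replicas are guaranteed to share a replica iff R + W > N. A one-liner to make the arithmetic explicit:

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    # Read and write sets must intersect in at least one replica,
    # which holds exactly when R + W > N.
    return r + w > n
```

With N=3, W=2/R=2 (majority quorums) overlap; W=1/R=1 is fast but can serve stale reads. Note quorum overlap alone is a rule of thumb, not full linearizability - that still needs ordering and read repair.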

Sequential / causal

Operations have a single global order (sequential) or respect happens-before (causal). Cheaper than linearizable; sufficient for many UIs.

Read-your-writes / monotonic reads

Session-scoped guarantees - this user always sees their own writes / never sees time go backwards. Often built on top of eventual consistency with sticky routing or version vectors.

Eventual

Replicas converge given no new writes. Cheapest, highest-availability. Stale reads are visible.

When to choose each

Strong for money, locks, leadership, inventory. Causal / sequential when users care about ordering of related events but not absolute freshness (chat, comment threads). Read-your-writes for any UI where the user just submitted something - this is table-stakes UX even on top of eventually-consistent storage. Eventual for telemetry, recommendations, follower counts, anything where stale-by-seconds is fine.

Cache write strategies

Write-through

Write goes to cache and database synchronously. Cache is always fresh; the write pays for both systems, so latency is bounded by the slower of the two.

Write-back / write-behind

Write goes to cache; cache flushes to DB asynchronously. Lowest write latency; risk of data loss on cache crash before flush.

Write-around

Write skips cache and goes directly to DB. Read populates cache on miss. Good for write-heavy workloads where written data is rarely read soon (logs, telemetry).

Cache-aside (lazy load)

Application checks cache first, falls back to DB on miss, then populates cache. Cache can drift from DB - need invalidation discipline. The most common production pattern.

When to choose each

Cache-aside for general-purpose read caches (Redis in front of Postgres). Write-through when you cannot tolerate stale reads and write latency is acceptable. Write-back for buffered writes where small data loss is OK (counters, analytics). Write-around for write-heavy + read-rare workloads (audit logs, time-series ingestion).

Cache eviction policies

LRU (Least Recently Used)

Evict the least-recently-touched key. Default for most caches. Good for temporal locality.

LFU (Least Frequently Used)

Evict the least-touched key over a window. Better for skewed access (top-N gets re-hit forever). Heavier to implement.

FIFO

Evict by insertion order. Simple; ignores access pattern. Rare in practice.

TTL-only

Evict on time expiration; let memory fill. Common for derived data with known freshness (price quotes).

Random / 2-random

Pick a random victim (or pick the oldest of two random). Surprisingly competitive with LRU at much lower bookkeeping cost. Used by Redis allkeys-random.

When to choose each

LRU as default. LFU when access is heavy-tailed and the head set should be sticky. TTL when you know the freshness window and don't want to evict early. Random when you cannot afford LRU bookkeeping at very high QPS (Redis allkeys-random; note that Redis's allkeys-lru is itself an approximation that samples random keys).

Idempotency

Natural idempotency

The operation is inherently idempotent - PUT with full state, DELETE, GET. Re-applying gives the same result.

Idempotency key (client-supplied)

Client sends a UUID; server records request_id + first response. Replays return the cached response. Stripe-style.
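The server side of the key scheme is small; here a dict stands in for a durable table, and `charge` / `charges_made` are hypothetical names:

```python
# request_id -> first response; in production this is a durable table.
seen: dict[str, dict] = {}
charges_made: list[int] = []   # tracks real side effects, for illustration

def charge(idempotency_key: str, amount_cents: int) -> dict:
    if idempotency_key in seen:
        return seen[idempotency_key]       # replay: return the cached response
    charges_made.append(amount_cents)      # the real side effect, exactly once
    response = {"status": "succeeded", "amount": amount_cents}
    seen[idempotency_key] = response       # record before returning to caller
    return response
```

A client that times out and retries with the same key gets the original response back, and the charge happens once.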

Conditional / optimistic concurrency

Compare-and-swap on a version / etag. Replays past the etag fail with 412 Precondition Failed.

Exactly-once via dedup table

Persist (request_id, status) before side effects; check the table first so replays are dropped. Used in payments and message dedup.

When to choose each

Always design for at-least-once delivery (networks retry, clients retry, queues redeliver). Use natural idempotency when the API shape allows. Use idempotency keys for any mutating endpoint exposed to external clients (payments, account creation, send-money). Use conditional updates inside a service for compare-and-swap. "Exactly once" is a guarantee you build with idempotency + dedup, not one you get from the transport.

Rate limiting strategies

Token bucket

Bucket fills at rate r up to capacity B; each request consumes a token. Allows short bursts up to B; long-run rate r. Default for most APIs.
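A token bucket fits in one small class; this sketch uses a monotonic clock and lazy refill (tokens are topped up on each call rather than by a timer):

```python
import time

class TokenBucket:
    """Refill at `rate` tokens/sec up to `capacity`; one token per request."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity           # start full: allows an initial burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Lazy refill: credit tokens for the time elapsed since the last call.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Capacity B bounds the burst; rate r bounds the long-run average - exactly the two knobs the interview answer should name.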

Leaky bucket

Requests enter a bounded queue; drain at fixed rate. Smooths bursts but does not allow them.

Fixed window counter

Count requests per fixed window (e.g. per minute). Cheap; allows 2x the limit at the boundary (last second of one window + first second of next).

Sliding window log

Store timestamps; on each request, drop entries older than window and count. Most accurate; high memory.

Sliding window counter

Approximation: weighted average of current + previous fixed-window counts. Memory-efficient; smooths the boundary problem.
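The weighting is a one-liner: the previous window's count is scaled by the fraction of it still inside the sliding window.

```python
def sliding_count(prev_count: int, curr_count: int,
                  elapsed_in_window: float, window: float) -> float:
    # Fraction of the previous fixed window still covered by the sliding window.
    weight = (window - elapsed_in_window) / window
    return prev_count * weight + curr_count
```

For a 60s window, 40 requests last window, 30 so far, and 15s into the current window: 40 × (45/60) + 30 = 60 - a smooth estimate where a fixed window would have reset to 30 and admitted a fresh burst.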

When to choose each

Token bucket as the default, especially when you want to allow short bursts. Leaky bucket when downstream cannot tolerate any burst. Fixed window when you need cheap counters and the boundary effect doesn't matter. Sliding window counter when you want close-to-accurate rates without storing per-request timestamps. Sliding window log only for low-QPS / high-precision use cases (per-user limits at small scale).


Practice the patterns

Reading is the floor. The signal in interviews comes from working problems out loud and defending your tradeoffs. Spin up an AI mock interview or run a coding challenge to put these to work.