System design interviews are where a lot of strong engineers fall apart. You can crush algorithms, nail the behavioral round, and then freeze when someone says "design Twitter."
It's not because the material is impossibly hard. It's because most people don't know how to structure their thinking under pressure. This guide fixes that.
Why System Design Interviews Exist
Companies aren't testing whether you've memorized CAP theorem. They want to know:
- Can you break down ambiguous problems?
- Do you understand tradeoffs?
- Can you communicate technical decisions clearly?
- Do you think about real-world constraints like cost, latency, and failure?
A senior engineer who gives a perfect textbook answer but can't explain why they chose a particular database is less impressive than a mid-level engineer who reasons clearly through the tradeoffs.
The Framework: 4 Steps in 45 Minutes
Every system design interview follows roughly the same structure. Here's how to spend your time:
Step 1: Clarify Requirements (5 minutes)
This is where most candidates go wrong. They jump straight into drawing boxes and arrows.
Stop. Ask questions first.
Functional requirements - What does the system actually do?
- Who are the users? How many?
- What are the core features we need to support?
- What can we explicitly leave out?
Non-functional requirements - How well does it need to work?
- What's the expected scale? (users, requests per second, data volume)
- What are the latency requirements?
- How important is availability vs consistency?
- Are there any compliance or regulatory constraints?
Example for "Design a URL shortener":
- "Are we supporting custom short URLs or just random ones?"
- "What's the expected volume? Millions of URLs per day or thousands?"
- "Do URLs expire? Can users delete them?"
- "What's the read-to-write ratio? Mostly reads, I'd assume?"
- "Do we need analytics - click tracking, geographic data?"
Write these requirements down. They're your guardrails for the rest of the interview.
Step 2: High-Level Design (10 minutes)
Start with the big picture. Draw the major components and how data flows between them.
For a URL shortener:
Client -> Load Balancer -> API Server -> Database
                                      -> Cache (Redis)
Analytics -> Message Queue -> Analytics DB
At this stage, name the components but don't get into implementation details. The interviewer wants to see that you can decompose a system into logical pieces.
Key questions to address:
- What does the API look like? (REST endpoints, request/response)
- What are the core entities and relationships?
- How does data flow from ingestion to retrieval?
Step 3: Deep Dive (20 minutes)
This is the meat. The interviewer will ask you to dig into specific components. Be prepared to go deep on:
Data model and storage:
- What's the schema?
- SQL vs NoSQL - and why?
- How will you handle growth? Sharding, partitioning?
Scaling:
- Where are the bottlenecks?
- How do you handle hot spots?
- Caching strategy?
Reliability:
- What happens when a server goes down?
- How do you handle data loss?
- What's the replication strategy?
The interviewer wants to hear you think through tradeoffs, not recite definitions. "I'd use Redis for caching because reads are 100:1 vs writes and we need sub-10ms latency" is better than "Redis is an in-memory key-value store."
Step 4: Discuss Tradeoffs and Improvements (10 minutes)
No design is perfect. Show maturity by discussing:
- What are the weak points in your design?
- What would you change at 10x or 100x scale?
- What would you build first vs defer?
- What monitoring/observability would you add?
Essential Building Blocks
You don't need to memorize every distributed system. But you need to understand these building blocks and when to use each one.
Load Balancing
Distributes traffic across multiple servers.
When to discuss it: Always. Every system with more than one server needs load balancing.
Approaches:
- Round robin - simple, works for stateless services
- Least connections - better for variable request processing times
- Consistent hashing - essential for caching layers and sharded databases
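To make the first two approaches concrete, here is a minimal in-memory sketch. The class names and the `app-1`, `app-2` server labels are made up for illustration; a real load balancer also tracks health checks and timeouts.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, ignoring their current load."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Picks the server with the fewest in-flight requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties go to the server listed first (dicts keep insertion order).
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


rr = RoundRobinBalancer(["app-1", "app-2", "app-3"])
rr_picks = [rr.pick() for _ in range(4)]

lc = LeastConnectionsBalancer(["app-1", "app-2"])
first = lc.acquire()    # both idle, so app-1 wins the tie
second = lc.acquire()   # app-1 is busy, so app-2 is chosen
lc.release(first)       # app-1 finishes its request
third = lc.acquire()    # app-1 is the least loaded again
```

Round robin never looks at load, which is why it suits cheap, uniform, stateless requests. Least connections needs the balancer to track request completion, which is the price of handling uneven request times.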
Caching
Store frequently accessed data closer to the consumer.
Patterns:
- Cache-aside (lazy loading): Application checks cache first, loads from DB on miss, populates cache. Most common pattern.
- Write-through: Application writes to cache and DB simultaneously. Consistent but slower writes.
- Write-behind: Application writes to cache, cache asynchronously writes to DB. Fast writes, risk of data loss.
When to discuss it: Any read-heavy system. Social media feeds, URL shorteners, product catalogs.
Cache invalidation is the hard part. Time-based expiration (TTL) is the simplest approach. Event-based invalidation is more accurate but more complex.
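Here is a rough sketch of cache-aside with TTL expiration. `TTLCache` and `get_long_url` are hypothetical names, a plain dict stands in for the database, and the clock is injectable only so the example runs without waiting.

```python
import time


class TTLCache:
    """Tiny in-memory cache where every entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]   # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl)


def get_long_url(short_code, cache, db):
    """Cache-aside read: check the cache, fall back to the DB, then populate."""
    url = cache.get(short_code)
    if url is not None:
        return url
    url = db.get(short_code)         # cache miss: go to the database
    if url is not None:
        cache.set(short_code, url)
    return url


now = [0.0]                                    # fake clock for the demo
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
db = {"abc123": "https://example.com/a"}       # stand-in for the real database

first = get_long_url("abc123", cache, db)      # miss, loaded from the DB
db["abc123"] = "https://example.com/b"         # DB changes behind the cache
stale = get_long_url("abc123", cache, db)      # hit, still the old value
now[0] = 61.0                                  # the TTL passes
fresh = get_long_url("abc123", cache, db)      # expired, reloaded from the DB
```

The `stale` read is the tradeoff in one line: with TTL-only invalidation, readers can see old data for up to the TTL. That is fine for a URL shortener and not fine for an account balance.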
Databases
Relational (PostgreSQL, MySQL):
- Strong consistency, ACID transactions
- Complex queries with JOINs
- Best for structured data with relationships
- Vertical scaling first, sharding when necessary
NoSQL Document (MongoDB, DynamoDB):
- Flexible schema, horizontal scaling
- Best for data accessed as complete documents
- Avoid when you need complex joins
Key-Value (Redis, Memcached):
- Extremely fast reads/writes
- Best for caching, sessions, counters
- Limited query capability
Wide-column (Cassandra, ScyllaDB):
- Massive write throughput
- Best for time-series, IoT, logging
- Tunable consistency, eventually consistent by default
Don't just pick a database. Explain why it fits your requirements.
Message Queues
Decouple producers from consumers. Smooth out traffic spikes.
When to use:
- Async processing (sending emails, generating thumbnails)
- Buffering between services with different throughput
- Event-driven architectures
Options: Kafka (high throughput, event streaming), SQS (simple queuing), RabbitMQ (flexible routing).
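The decoupling idea fits in a few lines. This sketch uses Python's in-process `queue.Queue` as a stand-in for a real broker; it shows the shape of the pattern, not how Kafka, SQS, or RabbitMQ clients look. The "email" job is purely illustrative.

```python
import queue
import threading

jobs = queue.Queue(maxsize=100)   # bounded: a full queue makes producers wait
sent = []                         # records what the consumer processed


def worker():
    """Consumer: pulls jobs at its own pace, independent of the producer."""
    while True:
        user = jobs.get()
        if user is None:          # sentinel value: shut down cleanly
            jobs.task_done()
            break
        sent.append(f"email to {user}")
        jobs.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

# Producer: enqueueing returns immediately, so the request path stays fast.
for user in ["ana", "bo", "cy"]:
    jobs.put(user)

jobs.put(None)    # tell the worker to stop
jobs.join()       # wait until every queued job has been processed
thread.join()
```

With a real broker the queue also survives process restarts, which this in-memory version does not.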
Content Delivery Networks (CDNs)
Cache static content at edge locations worldwide.
When to discuss: Any system serving static assets (images, videos, JavaScript bundles) to a global audience.
Rate Limiting
Protect your system from abuse and cascading failures.
Algorithms:
- Token bucket - allows bursts, simple to implement
- Sliding window - more precise, slightly more complex
- Fixed window - simplest, but allows burst at window boundaries
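A token bucket is small enough to sketch on the spot. This is a single-process version with hypothetical names; a distributed limiter would keep the same state in a shared store. The clock is injectable so the demo does not need to sleep.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, then a steady `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)     # start full
        self.last_refill = clock()

    def allow(self, cost=1):
        # Refill lazily based on elapsed time, capped at capacity.
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


now = [0.0]                               # fake clock for the demo
bucket = TokenBucket(capacity=3, refill_per_second=1, clock=lambda: now[0])

burst = [bucket.allow() for _ in range(4)]   # 3 tokens available, 4th denied
now[0] = 2.0                                 # two seconds later: 2 new tokens
later = [bucket.allow() for _ in range(3)]
```

Note there is no background timer: tokens are computed from elapsed time on each call, which is what makes the algorithm cheap.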
The 8 Most Common Questions (And How to Approach Them)
1. Design a URL Shortener
Key decisions:
- Generate short codes with base62 encoding (a-z, A-Z, 0-9)
- Use a counter or hash-based approach for uniqueness
- Heavy caching since reads >> writes (often 100:1 or more; confirm the ratio with your interviewer)
- Relational DB is fine at moderate scale; consider NoSQL if you need billions of URLs
Tradeoffs to discuss: Hash collisions vs counter-based approach, cache eviction policy, analytics pipeline design.
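The counter-based approach can be sketched like this: take a unique integer ID (from a database sequence, say) and encode it in base62. The alphabet order below is an arbitrary choice, and the function names are made up for the example.

```python
import string

# 0-9, a-z, A-Z: 62 characters. The ordering is arbitrary but must stay fixed.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_base62(n):
    """Turn a non-negative integer ID into a short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, remainder = divmod(n, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_base62(code):
    """Inverse of encode_base62."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


code = encode_base62(125)        # 125 = 2 * 62 + 1
capacity_7_chars = BASE ** 7     # number of distinct codes of length <= 7
```

Seven characters cover about 3.5 trillion IDs, which is a handy number to have ready. The downside of a plain counter is that codes are guessable and sequential, which is worth raising as a tradeoff against hashing.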
2. Design Twitter/X
Key decisions:
- Fan-out on write (precompute timelines) vs fan-out on read (compute on request)
- Fan-out on write works for most users, but celebrities with millions of followers need fan-out on read (hybrid approach)
- Tweet storage is simple; the timeline assembly is the hard part
- Use a message queue for async fan-out
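Here is a toy version of the hybrid approach, with in-memory dicts standing in for the follower graph and timeline store. The names and the threshold are assumptions for the example; the threshold is tiny so the demo triggers it, and the fan-out loop would run asynchronously behind a queue in practice.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 2            # hypothetical cutoff, tiny for the demo

followers = defaultdict(set)       # author -> users who follow them
following = defaultdict(set)       # user -> authors they follow
timelines = defaultdict(list)      # user -> precomputed (ts, author, text)
celebrity_tweets = defaultdict(list)   # author -> tweets stored once


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def post_tweet(author, ts, text):
    tweet = (ts, author, text)
    if len(followers[author]) >= CELEBRITY_THRESHOLD:
        celebrity_tweets[author].append(tweet)    # fan-out on read: store once
    else:
        for follower in followers[author]:        # fan-out on write
            timelines[follower].append(tweet)


def read_timeline(user, limit=20):
    merged = list(timelines[user])
    for author in following[user]:
        # Pull celebrity tweets at read time and merge them in.
        merged.extend(celebrity_tweets.get(author, []))
    return sorted(merged, reverse=True)[:limit]   # newest first


follow("reader", "star")
follow("fan2", "star")             # "star" now meets the threshold
follow("reader", "friend")
post_tweet("friend", 1, "hi")      # pushed to reader's timeline
post_tweet("star", 2, "hello")     # stored once, merged on read
feed = read_timeline("reader")
```

The write for "star" touches one list no matter how many followers there are. The cost moves to `read_timeline`, which now does an extra lookup per followed celebrity.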
3. Design a Chat Application
Key decisions:
- WebSockets for real-time messaging
- Message ordering: use timestamps + sequence numbers
- Group chats vs 1:1 have very different scaling characteristics
- Message storage: write-optimized store (Cassandra) + read cache
- Presence system (online/offline) is a separate service
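One way to read "timestamps + sequence numbers": the server assigns a per-conversation sequence number that defines order, and the timestamp is kept only for display. A minimal sketch, with hypothetical names throughout:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    conversation_id: str
    seq: int          # assigned by the server, defines the order
    sent_at: float    # client clock, for display only (clocks can disagree)
    sender: str
    text: str


class Sequencer:
    """Server side: hands out increasing sequence numbers per conversation."""

    def __init__(self):
        self._last = defaultdict(int)

    def stamp(self, conversation_id, sender, text, sent_at):
        self._last[conversation_id] += 1
        return Message(conversation_id, self._last[conversation_id],
                       sent_at, sender, text)


def in_order(messages):
    """Client side: sort by sequence number, not by timestamp."""
    return sorted(messages, key=lambda m: m.seq)


def missing_seqs(messages):
    """Gaps tell the client which messages to re-fetch."""
    seen = {m.seq for m in messages}
    return [s for s in range(1, max(seen, default=0) + 1) if s not in seen]


sequencer = Sequencer()
m1 = sequencer.stamp("room-1", "ana", "a", sent_at=100.0)
m2 = sequencer.stamp("room-1", "bo", "b", sent_at=99.5)   # skewed client clock
m3 = sequencer.stamp("room-1", "ana", "c", sent_at=101.0)

received = [m3, m1]                        # arrived out of order, m2 dropped
ordered_texts = [m.text for m in in_order(received)]
gaps = missing_seqs(received)
```

Sorting by `sent_at` would put `m2` before `m1` here, which is exactly the clock-skew problem sequence numbers avoid.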
4. Design a Rate Limiter
Key decisions:
- Where to enforce: API gateway vs application middleware
- Token bucket for most use cases
- Redis for distributed rate limiting across multiple servers
- Different limits per user tier, endpoint, and time window
5. Design YouTube/Netflix
Key decisions:
- Video upload pipeline: transcode to multiple resolutions asynchronously
- CDN for video delivery (this is non-negotiable at scale)
- Recommendation engine is its own system
- Separate hot storage (popular videos) from cold storage (rarely watched)
6. Design an E-Commerce System
Key decisions:
- Inventory management with optimistic locking to prevent overselling
- Shopping cart: session-based vs user-based vs hybrid
- Payment processing is usually handled asynchronously
- Search and catalog are separate services from checkout
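Optimistic locking for inventory can be sketched with a version number per row. The in-memory `InventoryStore` below is a made-up stand-in; in a relational database the same check is typically an `UPDATE ... WHERE version = ?` that affects zero rows when someone else got there first.

```python
class InventoryStore:
    """Stand-in for a table with (sku, quantity, version) rows."""

    def __init__(self):
        self._rows = {}  # sku -> (quantity, version)

    def put(self, sku, quantity):
        self._rows[sku] = (quantity, 0)

    def read(self, sku):
        return self._rows[sku]

    def compare_and_set(self, sku, new_quantity, expected_version):
        """Write only if nobody changed the row since we read it."""
        _, version = self._rows[sku]
        if version != expected_version:
            return False
        self._rows[sku] = (new_quantity, version + 1)
        return True


def reserve(store, sku, amount, max_attempts=3):
    """Read, check stock, write conditionally; retry on a version conflict."""
    for _ in range(max_attempts):
        quantity, version = store.read(sku)
        if quantity < amount:
            return False                       # not enough stock
        if store.compare_and_set(sku, quantity - amount, version):
            return True
    raise RuntimeError("too much contention, giving up")


store = InventoryStore()
store.put("sku-1", 1)                          # one item left

# Two buyers read the same row before either writes.
qty_a, ver_a = store.read("sku-1")
qty_b, ver_b = store.read("sku-1")
a_won = store.compare_and_set("sku-1", qty_a - 1, ver_a)   # first write wins
b_won = store.compare_and_set("sku-1", qty_b - 1, ver_b)   # stale version
retry_result = reserve(store, "sku-1", 1)      # re-read shows nothing left
```

No locks are held between read and write, so throughput stays high when conflicts are rare. Under heavy contention (a flash sale on one SKU) the retries pile up, and that is the tradeoff to mention.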
7. Design Google Docs (Collaborative Editing)
Key decisions:
- Conflict resolution: OT (Operational Transform) or CRDTs
- WebSockets for real-time sync
- Version history with snapshots at intervals
- This is genuinely hard - focus on the conflict resolution approach
8. Design a Notification System
Key decisions:
- Multiple channels: push, email, SMS, in-app
- User preferences and throttling
- Message queue for async delivery
- Template system for content management
- Delivery tracking and retry logic
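Retry logic usually means exponential backoff. Below is a rough sketch with hypothetical function names; the `send` callable stands in for whatever push, email, or SMS provider client you would actually use, and jitter is left out to keep the example deterministic.

```python
def backoff_delays(base=1.0, factor=2.0, max_delay=60.0, attempts=5):
    """Delays between retries: base, base*factor, ... capped at max_delay."""
    return [min(max_delay, base * factor ** i) for i in range(attempts)]


def deliver_with_retry(send, message, delays, sleep):
    """Try once immediately, then once after each delay.

    Returns the attempt number that succeeded, or None if all failed.
    """
    for attempt, delay in enumerate([0.0] + list(delays), start=1):
        if delay:
            sleep(delay)
        try:
            send(message)
            return attempt
        except ConnectionError:
            continue          # transient failure: move on to the next attempt
    return None


calls = {"count": 0}


def flaky_send(message):
    """Fake provider that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("provider unavailable")


waits = []                                    # record sleeps instead of waiting
delays = backoff_delays(attempts=3)
attempts_used = deliver_with_retry(flaky_send, "hello", delays, waits.append)
```

In a real system you would add random jitter so retries from many workers do not line up, and send messages that exhaust their retries to a dead-letter queue.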
Scaling Patterns You Need to Know
Horizontal vs Vertical Scaling
- Vertical: Bigger machine. Simple but has limits.
- Horizontal: More machines. Complex but effectively unlimited.
Start vertical, go horizontal when you hit limits. Most systems don't need horizontal scaling as early as people think.
Database Sharding
Split data across multiple database instances.
Strategies:
- Hash-based: Distribute by hash of key. Even distribution, hard to range query.
- Range-based: Distribute by key range. Easy range queries, risk of hot spots.
- Geographic: Distribute by region. Great for latency, complex for global queries.
Warning: Sharding adds enormous complexity. Only shard when you've exhausted indexing, read replicas, and caching. In an interview, mention this - it shows practical experience.
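The difference between hash-based and range-based routing is easy to show in code. This sketch assumes four shards and string keys; the range boundaries are made up for the example.

```python
import bisect
import hashlib

NUM_SHARDS = 4


def hash_shard(key, num_shards=NUM_SHARDS):
    """Hash-based: spreads keys evenly, but neighbors land on different shards.

    Uses a stable hash; Python's built-in hash() changes between processes.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Range-based: exclusive upper bounds on the first letter (illustrative).
# shard 0: a-f, shard 1: g-m, shard 2: n-s, shard 3: t-z
RANGE_UPPER_BOUNDS = ["g", "n", "t"]


def range_shard(key):
    """Range-based: adjacent keys share a shard, so range scans stay local."""
    return bisect.bisect_right(RANGE_UPPER_BOUNDS, key[0].lower())


names = ["alice", "adam", "bella", "greg", "nina", "zoe"]
by_hash = {name: hash_shard(name) for name in names}
by_range = {name: range_shard(name) for name in names}
```

A query for "all names starting with a" hits one shard under `range_shard` and potentially every shard under `hash_shard`. The flip side is that a burst of signups starting with the same letter hammers a single range shard.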
Replication
Leader-follower: One write node, multiple read nodes. Simple, good for read-heavy workloads.
Multi-leader: Multiple write nodes. Handles geographic distribution but introduces conflict resolution.
Leaderless: Any node can accept writes. Used by Dynamo-style systems such as Cassandra and Riak. Eventually consistent.
Consistent Hashing
Maps data to nodes in a way that minimizes redistribution when nodes are added or removed. Essential for distributed caches and some database sharding strategies.
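A minimal hash ring shows why redistribution stays small. This is a sketch, not production code: the class name, the 100 virtual nodes per server, and the choice of SHA-256 are all assumptions for the example.

```python
import bisect
import hashlib


def ring_position(value):
    """Stable position on the ring for a node replica or a key."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes     # virtual nodes per server smooth the spread
        self._ring = []          # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get(self, key):
        """Walk clockwise from the key's position to the next node."""
        if not self._ring:
            raise LookupError("ring is empty")
        index = bisect.bisect_right(self._ring, (ring_position(key), ""))
        return self._ring[index % len(self._ring)][1]   # wrap around


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["cache-a", "cache-b", "cache-c"])
before = {key: ring.get(key) for key in keys}

ring.add("cache-d")
after = {key: ring.get(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

Every key in `moved` now lives on `cache-d`; nothing shuffles between the existing nodes. With plain `hash(key) % N`, going from 3 to 4 nodes would remap most keys instead.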
Numbers You Should Know
These help you make quick back-of-envelope calculations:
| Operation | Approximate latency |
|---|---|
| L1 cache reference | 1 ns |
| L2 cache reference | 4 ns |
| RAM reference | 100 ns |
| SSD random read | 16 us |
| HDD random read | 4 ms |
| Network round trip (same datacenter) | 0.5 ms |
| Network round trip (cross-continent) | 150 ms |
| Scale | Number |
|---|---|
| Seconds in a day | 86,400 |
| Requests per day at 1 QPS | ~100K |
| Requests per day at 1000 QPS | ~100M |
| 1 GB of text | ~1 billion characters |
| 1 million users, 1 KB each | 1 GB |
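
As a worked example, here is the arithmetic for a URL shortener written out in code. Every input is an assumption you would state out loud in the interview: 10 million new URLs per day, a 100:1 read-to-write ratio, 500 bytes per record, and 5 years of retention.

```python
SECONDS_PER_DAY = 86_400


def average_qps(requests_per_day):
    """Average requests per second; peak traffic is usually a multiple of this."""
    return requests_per_day / SECONDS_PER_DAY


# Assumed inputs (not from any real system).
new_urls_per_day = 10_000_000
read_write_ratio = 100
bytes_per_record = 500
retention_years = 5

write_qps = average_qps(new_urls_per_day)            # about 116 writes/s
read_qps = write_qps * read_write_ratio              # about 11,600 reads/s

total_records = new_urls_per_day * 365 * retention_years
storage_tb = total_records * bytes_per_record / 1e12  # about 9.1 TB

print(f"writes/s: {write_qps:.0f}, reads/s: {read_qps:.0f}, "
      f"storage: {storage_tb:.1f} TB")
```

The exact figures matter less than the conclusions: roughly a hundred writes per second fits on one database, while eleven thousand reads per second is the argument for the cache.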
Common Mistakes
1. Not clarifying requirements. You'll design the wrong system. Take 5 minutes upfront.
2. Going too deep too early. Don't start with "I'd use a B+ tree index on the..." before you've drawn the high-level architecture.
3. Ignoring tradeoffs. Every decision has downsides. If you present a perfect design with no compromises, the interviewer knows you're not thinking deeply enough.
4. Not considering failure modes. "What happens when this component goes down?" If you can't answer, your design is incomplete.
5. Overengineering. Don't add Kafka, Kubernetes, and a service mesh to a system that handles 100 requests per minute. Design for the scale you were given.
6. Not communicating. The interviewer can't read your mind. Talk through your reasoning. "I'm choosing PostgreSQL here because we need ACID transactions for the payment flow" is far more valuable than silently drawing a box labeled "DB."
How to Prepare
Weeks 1-2: Learn the building blocks. Understand databases, caching, load balancing, and message queues at a conceptual level.
Weeks 3-4: Practice the common questions. Design each system on a whiteboard or paper. Time yourself to 45 minutes.
Weeks 5-6: Do mock interviews. Practice explaining your reasoning out loud. Get feedback from peers or mentors.
Throughout: Read engineering blogs from companies that operate at scale. They explain real decisions with real tradeoffs. Stripe, Uber, Netflix, and Discord all publish excellent technical blogs.
The goal isn't to memorize solutions. It's to build a mental toolkit of patterns you can apply to any problem.