We use cookies for site analytics. Accept to help us understand how the site is used. See our Privacy Policy for details.
The canonical bounded system design problem. Read-heavy, hot-key prone, and a great vehicle for hashing, caching, and capacity estimation.
Design a service that takes a long URL and returns a shorter alias. When a user visits the short alias, the service redirects them to the original URL.
This is the most-asked bounded system design question because it surfaces the full skill set in 45 minutes: requirements gathering, capacity estimation, ID generation, read-write asymmetry, caching, and database selection. Don't underestimate it - interviewers grade depth, not topic novelty.
Asking these before diving into a solution is the difference between a "hire" and a "no signal" rating. Pick the questions whose answers would change your design.
Storage
Throughput
Bandwidth
Cache
The system splits cleanly along the read-write asymmetry. The shorten path goes API gateway → write service → ID generator → primary key-value store, then warms the cache. The redirect path goes API gateway → read service → cache (hits ~95%) → key-value store on miss, with an async event published to an analytics pipeline.
Critical observation: 99% of engineering effort should be on the read path. Reads outnumber writes 100:1 and live on the user-facing latency budget. Writes can be slower, can fail and retry, can sit behind a queue.
Terminate TLS, route shorten vs redirect, apply rate limits, attach caching headers. CloudFront or equivalent absorbs the easy redirects at the edge.
Stateless. Validates input, calls ID generator, writes to KV store, warms cache, publishes shorten event.
Returns a unique base62 code. Strategies covered in deep-dives below: counter, hash + collision retry, or Snowflake-style.
Stateless. Lookup short code → URL. Cache-aside with Redis. Falls through to KV on miss, populates cache. Returns 301/302 redirect.
LRU + TTL. Sharded by short code. Holds the top 1% of links by access frequency.
Primary persistence. DynamoDB, Cassandra, or sharded Postgres. Partition key = short code. No secondary indexes needed for the redirect path.
Async. Redirect service publishes click events to Kafka; downstream Flink/Spark jobs aggregate counts, geo, referrer.
The subsystems where the interview is actually decided. Skim if you're running short; own these if you want a strong signal.
Three viable approaches, each with sharp trade-offs.
Option 1: Hash + truncate
Hash the long URL (MD5/SHA-1), take the first 6-8 bytes, base62-encode. Same URL always produces same code (free dedup), but collisions exist - a different long URL can hash to the same prefix. Detect on insert and re-hash with a salt. Collision rate at 8-character base62 codes (62^8 ≈ 218T keyspace) over 180B records is ~0.08% - manageable with retry.
Option 2: Counter-based base62
Maintain a global counter. Each new URL gets counter++. Encode the counter in base62 → short code. Pro: no collisions, perfectly dense, easy to reason about. Con: the counter is a global bottleneck. Solutions: (a) ZooKeeper coordination (slow), (b) DB sequences (single point of failure), (c) ranges - each write node grabs a 1M-block from a coordinator and serves locally (much better; coordinator hit only every 1M writes).
Option 3: Snowflake-style distributed IDs
Each write node has a unique node ID. ID = timestamp_ms (41 bits) || node_id (10 bits) || sequence (12 bits). 64 bits → base62 → ~11 characters. No coordination, no collisions, ordered by time. Slightly longer codes than counter.
Recommendation: Range-based counter for L4/L5 interviews. Hash + retry is acceptable but the collision discussion eats time. Snowflake is overkill at this scale and produces longer codes than necessary.
Common follow-up: "What if your counter range coordinator goes down?" → Answer: write nodes hold a buffer of pre-fetched ranges (10M IDs). Counter outage of minutes is invisible. Pre-warm on deploy.
The Pareto distribution is brutal here. A viral tweet's link can get 100K rps for 10 minutes. A naive cache shard takes the heat on one node and melts.
Detection
Maintain a per-shard sketch (Count-Min Sketch or HyperLogLog++) of access frequency. When any single key exceeds N rps (e.g., 10K), mark it as hot.
Mitigation
Failure mode to discuss
If you cache the redirect at the CDN, you also cache misses (404). If a malicious user requests random short codes, you fill the CDN with negative cache entries and increase origin load on legit traffic. Mitigation: don't cache 404s, or cache them with a short TTL (30s) and rate-limit unknown-code requests at the gateway.
The access pattern is dead simple: given a short code (6-11 chars), return a URL (up to 2KB) and a few metadata fields. No joins, no range queries, no analytics on the hot path. This screams key-value store.
DynamoDB
Cassandra
Sharded Postgres
Avoid: A single Postgres instance. At 36TB raw data and 500K rps peak reads, no single instance survives. Sharding solves it but you're now operating a custom KV store in Postgres clothing.
Primary key choice matters: Using short_code as the partition key gives perfect even distribution (codes are random). Using user_id would create hot partitions on power users. Using created_date would create write hotspots.
The redirect path must not be slowed down by analytics. Decouple aggressively.
Synchronous path (latency-critical)
Read service does the lookup, returns the redirect, and emits a single fire-and-forget event to a local in-process buffer. No I/O on the response path.
Async pipeline
Local buffers flush every N ms or M events to Kafka. Topic is partitioned by short_code so per-link aggregations land on the same consumer.
A streaming job (Flink, Spark Streaming, or Kinesis Analytics) consumes the topic and maintains:
Aggregates are written to a serving store (DynamoDB or Postgres) keyed by (short_code, time_bucket). The dashboard queries this serving store, never the raw event log.
Cost discussion: At 10B events/day, raw event volume is ~2 TB/day. Stored hot for 30 days = 60 TB. Use a tiered approach - hot in S3 for 30 days, then Glacier. Aggregates are tiny by comparison (a few MB per link per year).
Failure mode: If the Kafka cluster degrades, event lag grows. The redirect path is unaffected (events are buffered locally; on overflow they're dropped). Build alerting on lag, not on completeness.
Two extra requirements that interviewers add as follow-ups.
Custom aliases
Users want vanity codes ("my-bday-party"). Reservation needs to be atomic - no two users get the same alias. Solutions:
Validate aliases against a reserved-words list (admin, login, etc. - paths that conflict with the app). Length limits prevent griefing.
Malicious link defense
Short URLs are weaponized for phishing. At interview level, mention the surface area:
The interview signal: don't rabbit hole. Two minutes on this section is enough; show you know the surface exists, then move on if the interviewer doesn't push.
Read latency vs storage cost
The CDN edge cache is the cheapest and fastest layer. Push as many redirects there as possible by setting cache-control headers. The cost is staleness: when a user deletes their link, it lingers at the edge for the TTL. This is the right trade-off for a public link service.
ID length vs collision rate
6-character base62 = 56B keyspace. At 180B URLs, the probability of any collision is high (birthday paradox kicks in around √56B ≈ 237K records). 7-character = 3.5T keyspace - safe for 180B records. 8-character = 218T keyspace - safe for the foreseeable future. The cost is 1-2 extra characters per link. Almost always worth it.
Strong vs eventual consistency
A user creates a short URL and immediately tweets it. If the read replica hasn't caught up, the first clicks 404. Solutions: (a) read-your-writes via session cookie sticky to primary, (b) write to all replicas synchronously (slow), (c) accept ~100ms staleness as a known limitation.
Build vs buy
At the interview, mention: "If we're starting from scratch, I'd build on DynamoDB + CloudFront + Redis ElastiCache - 80% of the heavy lifting is already operationalized." Showing awareness of managed services scores higher than designing every component from raw VMs.
Be ready for at least three of these. The first one is almost always asked.
Reading is the floor. The interview signal is in walking through this live with someone probing follow-ups. Use the AI mock interview to practice talking through requirements, architecture, and trade-offs out loud.
Start an AI mock interview →