
Design a URL Shortener (bit.ly)


The canonical bounded system design problem. Read-heavy, hot-key prone, and a great vehicle for hashing, caching, and capacity estimation.

The problem

Design a service that takes a long URL and returns a shorter alias. When a user visits the short alias, the service redirects them to the original URL.

This is the most-asked bounded system design question because it surfaces the full skill set in 45 minutes: requirements gathering, capacity estimation, ID generation, read-write asymmetry, caching, and database selection. Don't underestimate it - interviewers grade depth, not topic novelty.

Clarifying questions

Asking these before diving into a solution is the difference between a "hire" and a "no signal" rating. Pick the questions whose answers would change your design.

  • Are users authenticated, or can anyone shorten a URL anonymously?
  • Do users get to choose custom aliases, or are short codes always system-generated?
  • Should links expire, and if so, is the TTL configurable per link?
  • Do we need analytics (click counts, geo, referrer)? Real-time or batch?
  • Are there scale targets: daily creates, daily reads, retention horizon?
  • Can the same long URL be shortened multiple times to different codes, or do we deduplicate?
  • Do we need to support deletion or rotation of links?
  • Any compliance requirements - link safety scanning, malware detection, content moderation?

Requirements

Functional requirements

  • POST /shorten - accepts a long URL, returns a short code (or full short URL)
  • GET /<short-code> - 301/302 redirects to the original URL
  • Optional: custom aliases reserved by authenticated users
  • Optional: per-link expiration (TTL)
  • Optional: click analytics with at least daily granularity

Non-functional requirements

Scale
100M new short URLs per day, 10B redirects per day (100:1 read/write ratio), 5-year retention horizon → ~180B total stored URLs.
Latency
Redirect p99 < 100ms (this is the user-facing path - users see latency directly). Shorten can tolerate 200-300ms.
Availability
99.99% for redirects (cached, mostly). 99.9% for shortens is acceptable - a brief shorten outage is recoverable; a redirect outage breaks every link.
Consistency
Eventually consistent is fine for analytics. Reads of a freshly created link should see it within a second. The same long URL shortened twice can return different codes (unless dedup is required).

Capacity estimation

Storage

  • 180B records × 200 bytes/record = ~36 TB raw, before indexes and replication. With 3x replication and indexes, plan for ~150 TB across the fleet.

Throughput

  • Writes: 100M / day = ~1,200 writes/sec average, peak ~5K writes/sec.
  • Reads: 10B / day = ~115K reads/sec average, peak ~500K reads/sec.

Bandwidth

  • Redirect responses are tiny (HTTP 301 with Location header). At 500B/response × 500K rps = 250 MB/s egress. Trivial.

Cache

  • 80/20 rule: 20% of links serve 80% of traffic. Caching the top 1% of links in memory gets you a >90% hit rate. At 200 bytes/record, 1% of 180B records = 360 GB hot working set. Distribute across a dozen Redis nodes.
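
These numbers are easy to fumble live. A throwaway script like this - plain arithmetic, using only the inputs stated above - reproduces them:

```python
# Back-of-envelope check of the capacity numbers above.
SECONDS_PER_DAY = 86_400
BYTES_PER_RECORD = 200

writes_per_sec = 100e6 / SECONDS_PER_DAY          # ~1,157   -> "~1,200 writes/sec"
reads_per_sec = 10e9 / SECONDS_PER_DAY            # ~115,741 -> "~115K reads/sec"
total_records = 100e6 * 365 * 5                   # ~182.5B  -> "~180B records"
raw_storage_tb = total_records * BYTES_PER_RECORD / 1e12    # ~36.5 TB raw
hot_set_gb = total_records * 0.01 * BYTES_PER_RECORD / 1e9  # ~365 GB hot set

print(f"writes/sec: {writes_per_sec:,.0f}, reads/sec: {reads_per_sec:,.0f}")
print(f"records: {total_records:.3e}, raw: {raw_storage_tb:.1f} TB, hot: {hot_set_gb:.0f} GB")
```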

High-level architecture

The system splits cleanly along the read-write asymmetry. The shorten path goes API gateway → write service → ID generator → primary key-value store, then warms the cache. The redirect path goes API gateway → read service → cache (hits ~95%) → key-value store on miss, with an async event published to an analytics pipeline.

Critical observation: 99% of engineering effort should be on the read path. Reads outnumber writes 100:1 and live on the user-facing latency budget. Writes can be slower, can fail and retry, can sit behind a queue.

API Gateway / CDN edge

Terminate TLS, route shorten vs redirect, apply rate limits, attach caching headers. CloudFront or equivalent absorbs the easy redirects at the edge.

Write service

Stateless. Validates input, calls ID generator, writes to KV store, warms cache, publishes shorten event.

ID generator

Returns a unique base62 code. Strategies covered in deep-dives below: counter, hash + collision retry, or Snowflake-style.

Read service

Stateless. Lookup short code → URL. Cache-aside with Redis. Falls through to KV on miss, populates cache. Returns 301/302 redirect.
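
A minimal cache-aside sketch of that flow, assuming redis-py for the hot cache; `kv_lookup` is a hypothetical stand-in for the primary-store client:

```python
# Cache-aside lookup for the read service.
import redis

cache = redis.Redis(host="localhost", port=6379)
CACHE_TTL_SECONDS = 3600  # LRU eviction plus a TTL backstop

def kv_lookup(short_code: str) -> str | None:
    """Hypothetical primary-store lookup (DynamoDB/Cassandra in practice)."""
    raise NotImplementedError

def resolve(short_code: str) -> str | None:
    # 1. Try the hot cache first -- this serves ~95% of redirects.
    url = cache.get(short_code)
    if url is not None:
        return url.decode()
    # 2. Miss: fall through to the primary key-value store.
    url = kv_lookup(short_code)
    if url is not None:
        # 3. Populate the cache so the next reader hits.
        cache.set(short_code, url, ex=CACHE_TTL_SECONDS)
    return url  # caller returns 301/302 with Location, or 404 on None
```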

Hot cache (Redis)

LRU + TTL. Sharded by short code. Holds the top 1% of links by access frequency.

Key-value store

Primary persistence. DynamoDB, Cassandra, or sharded Postgres. Partition key = short code. No secondary indexes needed for the redirect path.

Analytics pipeline

Async. Redirect service publishes click events to Kafka; downstream Flink/Spark jobs aggregate counts, geo, referrer.

Deep dives

The subsystems where the interview is actually decided. Skim if you're running short; own these if you want a strong signal.

1. ID generation: counter vs hash vs Snowflake

Three viable approaches, each with sharp trade-offs.

Option 1: Hash + truncate
Hash the long URL (MD5/SHA-1), base62-encode the digest, and keep the first 6-8 characters. The same URL always produces the same code (free dedup), but collisions exist - a different long URL can truncate to the same prefix. Detect on insert and re-hash with a salt. The collision rate at 8-character base62 codes (62^8 ≈ 218T keyspace) over 180B records is ~0.08% - manageable with retry.
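
A sketch of this strategy, where `exists` is a caller-supplied uniqueness check against the KV store:

```python
# Hash-and-truncate code generation with salted collision retry.
import hashlib
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # base62

def base62(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

def shorten(long_url: str, exists, code_len: int = 8, max_retries: int = 5) -> str:
    salt = ""
    for _ in range(max_retries):
        digest = hashlib.md5((long_url + salt).encode()).digest()
        # Encode the first 8 bytes of the digest, keep code_len characters.
        code = base62(int.from_bytes(digest[:8], "big"))[:code_len]
        if not exists(code):
            return code
        salt += "x"  # collision: re-hash with a salt and try again
    raise RuntimeError("too many collisions")
```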

Option 2: Counter-based base62
Maintain a global counter. Each new URL gets counter++. Encode the counter in base62 → short code. Pro: no collisions, perfectly dense, easy to reason about. Con: the counter is a global bottleneck. Solutions: (a) ZooKeeper coordination (slow), (b) DB sequences (single point of failure), (c) ranges - each write node grabs a 1M-block from a coordinator and serves locally (much better; coordinator hit only every 1M writes).
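
The range approach is simple to sketch; `fetch_range_from_coordinator` below is a hypothetical stand-in for the coordinator RPC (e.g., an atomic counter bump in a durable store):

```python
# Range-based allocation: each write node grabs a 1M-ID block from the
# coordinator and hands out IDs locally, lock-free across nodes.
import threading

BLOCK_SIZE = 1_000_000

def fetch_range_from_coordinator() -> int:
    """Hypothetical: durably advance the global counter, return block start."""
    raise NotImplementedError

class RangeAllocator:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._next = 0
        self._end = 0  # empty range: forces a fetch on first use

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._end:  # block exhausted: coordinator hit
                start = fetch_range_from_coordinator()
                self._next, self._end = start, start + BLOCK_SIZE
            self._next += 1
            return self._next - 1

# Encode next_id() with the base62() helper from the Option 1 sketch
# to get the short code.
```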

Option 3: Snowflake-style distributed IDs
Each write node has a unique node ID. ID = timestamp_ms (41 bits) || node_id (10 bits) || sequence (12 bits). 64 bits → base62 → ~11 characters. No coordination, no collisions, ordered by time. Slightly longer codes than counter.
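
A sketch of that bit layout (the epoch value is illustrative, and clock-skew handling is omitted):

```python
# Snowflake-style 64-bit ID: 41-bit ms timestamp | 10-bit node | 12-bit seq.
import threading
import time

EPOCH_MS = 1_600_000_000_000  # custom epoch stretches the 41-bit timestamp

class Snowflake:
    def __init__(self, node_id: int) -> None:
        assert 0 <= node_id < 1024  # must fit in 10 bits
        self.node_id = node_id
        self.seq = 0
        self.last_ms = -1
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = int(time.time() * 1000) - EPOCH_MS
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF  # 12-bit sequence
                if self.seq == 0:
                    # 4096 IDs issued this millisecond: spin to the next one.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000) - EPOCH_MS
            else:
                self.seq = 0
            self.last_ms = now
            # timestamp | node | sequence, packed high to low.
            return (now << 22) | (self.node_id << 12) | self.seq
```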

Recommendation: Range-based counter for L4/L5 interviews. Hash + retry is acceptable but the collision discussion eats time. Snowflake is overkill at this scale and produces longer codes than necessary.

Common follow-up: "What if your counter range coordinator goes down?" → Answer: write nodes hold a buffer of pre-fetched ranges (10M IDs). Counter outage of minutes is invisible. Pre-warm on deploy.

2. Hot key handling: when 1% of links are 99% of traffic

The Pareto distribution is brutal here. A viral tweet's link can get 100K rps for 10 minutes. A naive cache shard takes the heat on one node and melts.

Detection
Maintain a per-shard frequency sketch of access counts - a Count-Min Sketch, or a heavy-hitters structure like SpaceSaving. When any single key exceeds N rps (e.g., 10K), mark it as hot.
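
A toy Count-Min Sketch detector to make the idea concrete (width, depth, and threshold are illustrative; a real one resets per time window):

```python
# Count-Min Sketch for per-key frequency: fixed memory, only ever
# overestimates, so a key it flags as hot really is at least that hot.
import random

class CountMinSketch:
    def __init__(self, width: int = 2048, depth: int = 4, seed: int = 1) -> None:
        rng = random.Random(seed)
        self.width = width
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.rows = [[0] * width for _ in range(depth)]

    def add(self, key: str) -> int:
        """Record one access and return the updated frequency estimate."""
        estimate = None
        for salt, row in zip(self.salts, self.rows):
            i = hash((salt, key)) % self.width  # salted hash per row
            row[i] += 1
            estimate = row[i] if estimate is None else min(estimate, row[i])
        return estimate

HOT_THRESHOLD = 10_000  # accesses per window; reset the sketch each window
sketch = CountMinSketch()

def record_access(short_code: str) -> bool:
    return sketch.add(short_code) >= HOT_THRESHOLD  # True -> mark as hot
```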

Mitigation

  1. Edge caching: Set Cache-Control: public, s-maxage=300 on redirect responses. CloudFront absorbs the 301 entirely - origin never sees it. This is the single biggest lever.
  2. Replicate hot keys: When a key is flagged hot, write copies to all cache shards (or all replicas of the key's shard). Reads spread out.
  3. Request coalescing: At each cache node, in-flight requests for the same hot key share a single backend lookup. 1000 concurrent misses become 1 lookup + 999 followers (see the sketch after this list).
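
A request-coalescing ("singleflight") sketch with asyncio; `backend_lookup` is a hypothetical stand-in for the KV-store fallback:

```python
# Concurrent misses on one key share a single backend lookup.
import asyncio

_inflight: dict[str, asyncio.Future] = {}

async def backend_lookup(short_code: str) -> str | None:
    """Hypothetical slow path to the key-value store."""
    raise NotImplementedError

async def coalesced_lookup(short_code: str) -> str | None:
    existing = _inflight.get(short_code)
    if existing is not None:
        return await existing  # follower: ride the in-flight lookup
    fut = asyncio.get_running_loop().create_future()
    _inflight[short_code] = fut
    try:
        result = await backend_lookup(short_code)
        fut.set_result(result)
        return result
    except Exception as exc:
        fut.set_exception(exc)  # followers see the same failure
        raise
    finally:
        del _inflight[short_code]
```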

Failure mode to discuss
If you cache the redirect at the CDN, you also cache misses (404). If a malicious user requests random short codes, you fill the CDN with negative cache entries and increase origin load on legit traffic. Mitigation: don't cache 404s, or cache them with a short TTL (30s) and rate-limit unknown-code requests at the gateway.

3. Database choice: KV vs SQL, and why partition key matters

The access pattern is dead simple: given a short code (6-11 chars), return a URL (up to 2KB) and a few metadata fields. No joins, no range queries, no analytics on the hot path. This screams key-value store.

DynamoDB

  • Partition key: short_code. Single-key reads at single-digit ms latency at any scale. Auto-shards by partition key hash.
  • Cost: roughly $1.25 per million writes, $0.25 per million reads (eventually consistent). At 100M writes/day = $125/day = ~$45K/year writes; 10B reads/day = $2.5K/day = ~$900K/year reads. The reads dwarf everything - cache aggressively to push them off DynamoDB.
  • TTL attribute handles expiration for free (see the sketch below).
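
A sketch of the table those bullets describe, via boto3; table and attribute names are illustrative:

```python
# Table keyed by short_code, on-demand billing, TTL-based expiration.
import boto3

client = boto3.client("dynamodb")

client.create_table(
    TableName="short_urls",
    # Partition key = short_code: random codes hash evenly across partitions.
    KeySchema=[{"AttributeName": "short_code", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "short_code", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",  # on-demand capacity
)

# Free expiration: DynamoDB deletes items once the epoch-seconds value in
# `expires_at` passes, with no read/write cost. (In practice, wait for the
# table to become ACTIVE before calling this.)
client.update_time_to_live(
    TableName="short_urls",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)
```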

Cassandra

  • Cheaper at this scale if self-hosted. Partition key = short_code. Tunable consistency. More operational burden.

Sharded Postgres

  • Works but you're working against the grain. The relational features (joins, transactions) buy you nothing here. Use only if the org standard mandates it.

Avoid: A single Postgres instance. At 36TB raw data and 500K rps peak reads, no single instance survives. Sharding solves it but you're now operating a custom KV store in Postgres clothing.

Primary key choice matters: Using short_code as the partition key gives perfect even distribution (codes are random). Using user_id would create hot partitions on power users. Using created_date would create write hotspots.

4. Analytics pipeline (the part interviewers love to skip)

The redirect path must not be slowed down by analytics. Decouple aggressively.

Synchronous path (latency-critical)
Read service does the lookup, returns the redirect, and emits a single fire-and-forget event to a local in-process buffer. No I/O on the response path.

Async pipeline
Local buffers flush every N ms or M events to Kafka. Topic is partitioned by short_code so per-link aggregations land on the same consumer.
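
A sketch of that local buffer: emit() only appends (no I/O on the response path), a daemon timer flushes, and overflow drops events rather than blocking. `publish_batch` is a stand-in for the Kafka producer call.

```python
# Local in-process click buffer with periodic background flush.
import threading

class ClickBuffer:
    def __init__(self, publish_batch, max_events: int = 10_000,
                 flush_ms: int = 50) -> None:
        self._publish = publish_batch
        self._max = max_events
        self._interval = flush_ms / 1000
        self._events: list[dict] = []
        self._lock = threading.Lock()
        self._tick()  # start the flush loop

    def emit(self, event: dict) -> None:
        # Redirect path: append only; on overflow, drop (never block).
        with self._lock:
            if len(self._events) < self._max:
                self._events.append(event)

    def _tick(self) -> None:
        with self._lock:
            batch, self._events = self._events, []
        if batch:
            self._publish(batch)  # e.g., keyed by short_code into Kafka
        timer = threading.Timer(self._interval, self._tick)
        timer.daemon = True
        timer.start()
```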

A streaming job (Flink, Spark Streaming, or Kinesis Analytics) consumes the topic and maintains:

  • Per-link counters (1-min, 1-hour, 1-day rollups)
  • Geo aggregations from IP → country
  • Referrer breakdown
  • Top-N hottest links (for the hot-key replication system)

Aggregates are written to a serving store (DynamoDB or Postgres) keyed by (short_code, time_bucket). The dashboard queries this serving store, never the raw event log.

Cost discussion: At 10B events/day, raw event volume is ~2 TB/day. Stored hot for 30 days = 60 TB. Use a tiered approach - hot in S3 for 30 days, then Glacier. Aggregates are tiny by comparison (a few MB per link per year).

Failure mode: If the Kafka cluster degrades, event lag grows. The redirect path is unaffected (events are buffered locally; on overflow they're dropped). Build alerting on lag, not on completeness.

5. Custom aliases and abuse

Two extra requirements that interviewers add as follow-ups.

Custom aliases
Users want vanity codes ("my-bday-party"). Reservation needs to be atomic - no two users get the same alias. Solutions:

  • Conditional write to KV store: PUT short_code with condition "key does not exist". Clean, atomic, vendor-supported (see the sketch after this list).
  • Pre-claim short codes from a reserved set (more complex; rarely worth it).
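
A sketch of the conditional write against DynamoDB with boto3; table and attribute names are illustrative:

```python
# Atomic alias claim: the PUT fails if someone else already owns the code.
import boto3
from botocore.exceptions import ClientError

client = boto3.client("dynamodb")

def claim_alias(alias: str, long_url: str, owner_id: str) -> bool:
    try:
        client.put_item(
            TableName="short_urls",
            Item={
                "short_code": {"S": alias},
                "long_url": {"S": long_url},
                "owner_id": {"S": owner_id},
            },
            # Atomic: succeeds only if no item with this key exists yet.
            ConditionExpression="attribute_not_exists(short_code)",
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # alias already taken
        raise
```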

Validate aliases against a reserved-words list (admin, login, etc. - paths that conflict with the app). Length limits prevent griefing.

Malicious link defense
Short URLs are weaponized for phishing. At interview level, mention the surface area:

  • Domain blacklisting (Google Safe Browsing API on shorten)
  • Rate limit creates per IP / per user
  • Async re-scan after creation (if a domain becomes flagged, soft-delete affected links)
  • Soft-delete (HTTP 410 Gone) rather than hard delete - preserves analytics, can be reversed

The interview signal: don't rabbit hole. Two minutes on this section is enough; show you know the surface exists, then move on if the interviewer doesn't push.

Trade-offs

Read latency vs storage cost
The CDN edge cache is the cheapest and fastest layer. Push as many redirects there as possible by setting cache-control headers. The cost is staleness: when a user deletes their link, it lingers at the edge for the TTL. This is the right trade-off for a public link service.

ID length vs collision rate
6-character base62 = 56B keyspace - smaller than our 180B-record corpus, so it can't even hold the data, and with randomly assigned codes the birthday paradox makes the first collision likely around √56B ≈ 237K records. 7-character = 3.5T keyspace - fits 180B records. 8-character = 218T keyspace - safe for the foreseeable future, with a negligible per-insert collision rate. The cost is 1-2 extra characters per link. Almost always worth it.
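
Quick arithmetic backing those claims:

```python
# Fill ratio of each keyspace at 180B stored records.
RECORDS = 180e9

for length in (6, 7, 8):
    keyspace = 62 ** length
    fill = RECORDS / keyspace  # >100% means the codes literally run out
    print(f"{length} chars: keyspace {keyspace:.2e}, fill {fill:.2%}")

# 6 chars: keyspace 5.68e+10, fill ~317%  -> cannot even hold the corpus
# 7 chars: keyspace 3.52e+12, fill ~5.1%  -> fits; hash collisions retried often
# 8 chars: keyspace 2.18e+14, fill ~0.08% -> fits; retries negligible
```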

Strong vs eventual consistency
A user creates a short URL and immediately tweets it. If the read replica hasn't caught up, the first clicks return a 404. Solutions: (a) read-your-writes via a session cookie pinned to the primary, (b) write to all replicas synchronously (slow), (c) accept ~100ms staleness as a known limitation.

Build vs buy
At the interview, mention: "If we're starting from scratch, I'd build on DynamoDB + CloudFront + Redis ElastiCache - 80% of the heavy lifting is already operationalized." Showing awareness of managed services scores higher than designing every component from raw VMs.

Common follow-up questions

Be ready for at least three of these. The first one is almost always asked.

  • What changes if we need 100% uptime on redirects?
  • How would you handle a viral link that gets 1M rps for 30 seconds?
  • How would you support custom domains (acme.co/abc123)?
  • What if we need real-time analytics (sub-second click counts)?
  • How does the system change if we move to multi-region active-active?
  • What's your rollback plan if a deploy breaks the redirect service?
  • How do you prevent abuse (link-creation spam, malicious URLs)?
  • How would you migrate from the existing schema if we wanted to add link previews?

Practice in interview format

Reading is the floor. The interview signal is in walking through this live with someone probing follow-ups. Use the AI mock interview to practice talking through requirements, architecture, and trade-offs out loud.
