Feeds

Design a News Feed (Twitter / Facebook)

HardFREE

The classic write-vs-read amplification trade-off. Push, pull, or hybrid fanout - and how to handle the celebrity user with 100M followers.

The problem

Design the backend that produces a personalized, ranked feed of posts for each user. When a user opens the app, they see a list of posts from accounts they follow, sorted by recency or relevance, with infinite scroll.

This is the highest-bandwidth question in the bounded set. It tests fanout reasoning (push/pull/hybrid), the celebrity-user problem, and the ranking layer. Strong candidates make the fanout decision explicitly and defend it with numbers.

Clarifying questions

Asking these before diving into a solution is the difference between a "hire" and a "no signal" rating. Pick the questions whose answers would change your design.

→Is the feed reverse-chronological, or is it ranked by relevance (engagement, recency, social signals)?
→How many users (DAU)? Average follow count? Maximum follow count?
→How fresh does the feed need to be - sub-second (push notifications) or minute-level?
→Do we need to support media (images, videos), and is that storage in scope?
→Are posts public-only, or do we need privacy/access controls (followers-only, friends, lists)?
→Must we support replies/threads, or are posts standalone for v1?
→Do we need read receipts or 'seen this' state per user?
→Is the feed deterministic (same query → same answer) or is it allowed to drift?

Requirements

Functional requirements

·POST a status update (text + optional media)
·FOLLOW / UNFOLLOW another user
·GET feed - returns ranked, paginated list of posts from followed accounts
·Infinite scroll with cursor-based pagination
·Support a post-engagement signal (likes / replies) feeding back into ranking

Non-functional requirements

Scale: 500M DAU, average 200 follows per user (=100B follow edges), 1B posts created/day, 100B feed requests/day. 5-year retention.
Latency: Feed load p99 < 200ms. Post submission p99 < 500ms (the user can wait while we fanout in the background).
Availability: 99.95% for feed reads. A brief feed outage degrades the product but doesn't lose data. Posts must never be lost - 99.99% durability on writes.
Consistency: Eventually consistent. Users tolerate minutes of staleness for a follow change; seconds of staleness for a new post from a friend. A user's own posts must appear in their feed within 2 seconds.

Capacity estimation

Storage

Posts: 1B/day × 365 × 5y = 1.8 trillion posts. ~1KB/post (text + metadata; media stored separately) = 1.8 PB raw text. With replication and indexes: ~7 PB.
Follow edges: 100B edges × 50B/edge = 5 TB. Trivial.
Feed cache: 500M users × 1KB top-100 cache = 500 GB. Distributable across a few hundred Redis nodes.

Throughput

Writes: 1B posts/day = 12K posts/sec. Plus follow changes: ~100K/sec at peak.
Reads: 100B feed loads/day = 1.2M qps average, peak ~5M qps.
Fanout writes: depends on strategy - pull (0 amplification) vs push (avg 200x = 200K writes/sec, peaks at 100M writes for a celebrity post).

Bandwidth

5M qps × 50KB feed payload = 250 GB/s egress. This is huge - aggressive compression (gzip/brotli) and CDN caching of static feed components is essential.

Compute

Ranking pipeline runs ML inference per feed load. At 5M qps × 100 candidate posts × 10ms inference = ~5M cores. In practice, batched inference and aggressive feature caching pull this down by 50-100x.

High-level architecture

The defining decision is fanout-on-write vs fanout-on-read.

Fanout-on-write (push): When you post, the system writes a copy of your post ID into every follower's feed cache. Read path is trivial - look up your feed cache, fetch posts. Read latency: excellent. Write amplification: brutal for users with millions of followers.

Fanout-on-read (pull): When you load your feed, the system queries every account you follow and merges their recent posts. Write is trivial. Read is expensive - especially for users following 5,000 accounts.

Hybrid (the right answer): Push for normal users, pull for celebrities. A user's feed is the union of (a) the pre-fanned-out cache from normal-user posts, and (b) on-the-fly fetched posts from followed celebrities. The threshold (e.g., 1M followers) is tunable.

Post service

Accepts new posts, stores in post DB, publishes a 'new_post' event to Kafka.

Fanout service

Consumes new_post events. For each follower, writes a feed entry to their feed cache. Skips users above the celebrity threshold (handled at read time).

Follow service / graph DB

Stores follower edges. For a given user, returns followee list. For a given post author, returns follower list (for fanout). Optimized for both directions.

Feed cache (Redis)

Per-user list of post IDs (top 1000 most recent). Sharded by user_id. Treated as truth on the hot path - falls through to recovery only on miss.

Post DB (Cassandra / DynamoDB)

Source of truth for post content. Partition key = post_id. Read by ID after the feed cache returns post IDs.

Ranking service

Takes candidate posts (from cache + celebrity pull), scores each (engagement, recency, affinity), returns sorted list. ML model + feature store.

Media storage / CDN

S3 + CloudFront. Posts reference media URLs. Decoupled entirely from the feed pipeline.

Deep dives

The subsystems where the interview is actually decided. Skim if you're running short; own these if you want a strong signal.

1. Push vs pull vs hybrid: doing the math

The right answer depends on the follower distribution, which is power-law in every social network.

Pure push amplification
Average user has 200 followers → 1 post = 200 fanout writes. At 1B posts/day = 200B fanout writes/day = 2.3M writes/sec. Manageable.

But: a celebrity with 100M followers posting once = 100M fanout writes in a burst. If they tweet 5x/day = 500M writes. Push to everyone falls apart on the long tail.

Pure pull amplification
A normal user has 200 follows. Loading their feed: query 200 user timelines, merge top-K. At 5M feed-loads/sec × 200 lookups = 1B database queries/sec. Doesn't fit.

Hybrid (recommended)
Threshold T = 1M followers (tune from data).

Below T: push. ~99% of users. Fanout is bounded.
Above T: pull at read time. ~1000 users globally. Each load adds ~5-50 celebrity-timeline lookups - manageable.

A user's feed = merge(cached_feed, celebrity_pulls). Re-rank at read time.

Edge case: a user follows ONLY celebrities. Their cached feed is empty; their feed = pure pull. That's fine - it's the same cost as the per-user "pull from a few celebrities" path.

Edge case: a user just gained 100M followers (became a celebrity). Future posts now go through the pull path. Past posts: leave them in followers' caches; they'll age out naturally.

Edge case: someone unfollows. The cached feed has stale entries. Filter at read time using the live follow graph - cheaper than rewriting caches.

2. Ranking pipeline: from chronological to ML

Reverse-chronological is the v1. Real systems rank.

Candidate generation
Pull top 1000 posts from the user's feed cache (last 24 hours, deduped, filtered against blocks/mutes).

Feature extraction
Per (user, post) pair, compute features:

Recency (decay function of post age)
Author affinity (how often the user interacts with this author)
Content type preference (the user clicks videos more than images)
Predicted engagement (will they like, reply, share?)
Network signals (how many of the user's friends engaged)

Features come from a feature store (Feast, Tecton, or in-house) that pre-computes user-level and author-level features and serves them at sub-ms latency.

Scoring
A multi-task model predicts probabilities (P(like), P(reply), P(share), P(skip)) and combines them into a single score: score = w1·P(like) + w2·P(reply) + ... - w_skip·P(skip). Train offline on historical engagement.

Re-ranking and diversity
Score top-300 candidates, then apply diversity rules (no more than 3 posts from the same author in a row, mix content types). Final list = top 50, returned to the client.

Latency budget
Candidate generation: 20ms. Feature lookup: 30ms. Scoring (batched): 50ms. Re-ranking: 10ms. Total: ~110ms inside the 200ms feed budget. The remaining 90ms is network and serialization.

Cold-start users
New users have no engagement history. Fall back to a popularity-based ranker (most-engaged posts in the last hour from accounts they follow). Switch to personalized after ~50 interactions.

3. Storage: post data and feed entries

Post DB (Cassandra)

Partition key: post_id (Snowflake-generated, time-ordered).
No joins; clients fetch by ID.
Replication factor 3 across data centers.
Compaction strategy: time-window for TTL-based expiry of old posts (if you ever expire).

Why Cassandra over DynamoDB? At this scale, self-hosted Cassandra is cheaper for the steady write load. DynamoDB's per-write cost balloons. Many large feeds run on Manhattan (Twitter), Cassandra (FB pre-TAO), or proprietary systems.

Feed cache (Redis Cluster)

Per-user list of (post_id, author_id, timestamp) tuples. Capped at 1000 entries.
TTL on the list itself - if a user is inactive for 30 days, drop their cache and rebuild on next login.
Sharded by user_id. Each shard handles ~10M users.
Replication: master + 1 replica per shard for fast failover.

Why not store full post content in the feed cache? Memory pressure. Post text is up to 1KB; multiplied by 1000 entries × 500M users = 500 TB of cached strings. Caching just the IDs (50 GB total) is dramatically cheaper, and the post content lookup happens once per feed load against the post DB which is itself cached.

Cold cache rebuild
On miss, the read service queries the follow service for followees, fetches their recent posts, scores, returns. This is slower (300-500ms) but acceptable as a rare path.

4. The celebrity write storm

When a celebrity posts and the system has to fanout to 100M followers, the naive implementation takes minutes and saturates the network.

Sharded fanout
Split the follower list into chunks of 10K. Process chunks in parallel across worker pool. 100M / 10K = 10K parallel jobs. Each job: read chunk → for each follower, write feed entry. Total: ~30 seconds across the fleet, no single node overwhelmed.

Hybrid threshold (already mentioned)
For users above the celebrity threshold, don't fanout at all. Skip the storm entirely. The cost is a slightly slower feed read for followers (a single extra timeline lookup).

Backpressure on Kafka
new_post events for celebrities can saturate the topic. Use a dedicated 'celebrity_post' topic with higher partition count and dedicated consumers. Normal posts go to the main topic.

Ordering during a storm
Followers should see posts in roughly the right order. If celebrity post A arrives at fanout worker 1 and B arrives at worker 2, two followers might see them in different orders. This is acceptable - small ordering glitches are invisible.

What if the storm fails partway?
Fanout writes are idempotent (set with post_id key; second write is a no-op). Retry the failed chunks. At-least-once delivery via Kafka ensures no follower is missed.

5. Pagination, freshness, and infinite scroll

Naive offset pagination ("page 2 of feed") breaks when new posts arrive between requests. Cursor-based pagination is the only correct answer.

Cursor design
Cursor = (post_id, score, timestamp). Server returns top-K posts with a cursor pointing to the next batch (post_id below the lowest in the current batch, or score below). Client sends cursor on next request; server resumes from there.

Stable ranking under new posts
If new posts arrive while the user is scrolling, they should not pop into the middle of the visible list. Two strategies:

Snapshot at request time: cache the candidate set for 5 minutes; pagination works against the snapshot.
New-posts banner: separate API for "X new posts since you opened" - user taps to refresh top.

Pull-to-refresh
Triggers a full feed regeneration with the current top-K. The new top-K is returned; the user's old position is preserved client-side until they scroll up.

Offline support
Mobile clients cache the top 100 posts. Reopening the app shows the cached feed instantly while a background fetch refreshes. The "fresh" feed replaces the cached one when ready.

Trade-offs

Push vs pull is the central decision
The hybrid model isn't a compromise; it's the architecturally correct answer for power-law social graphs. Pure push collapses on celebrities. Pure pull collapses on normal-user feeds. Always present the hybrid as the recommendation, with the threshold as a tunable parameter.

Freshness vs cost
Eventually consistent feeds save you 10x on infrastructure vs strict consistency. Users don't notice 30-second staleness in posts. They do notice an outage. Choose eventual consistency.

Storage for user data vs feed cache
Post DB is durable, multi-region, write-once. Feed cache is rebuildable, single-region per user, eviction-tolerant. Conflating them (e.g., storing all feeds in DynamoDB with full content) is a common interview anti-pattern - candidates over-design for durability what doesn't need it.

Ranking complexity vs latency
Each ML signal added to ranking costs 5-20ms. Budget tightly. Many high-leverage signals can be precomputed offline (user-author affinity matrix) so online lookup is constant-time. Real-time signals (last-minute engagement) are worth their cost only if they materially improve outcomes - measure.

The "newsfeed" is actually three different products
A reverse-chronological friend feed (close ties), a ranked engagement feed (broader network), and an editorial feed (trending content). Modern apps blend all three. At interview level, design the engagement feed and acknowledge the others exist.

Common follow-up questions

Be ready for at least three of these. The first one is almost always asked.

?How would you handle a network partition between data centers during a fanout?
?What's the user experience if the feed cache is completely cold?
?How do you A/B test ranking model changes safely?
?What changes if the product wants real-time push notifications for top posts?
?How would you support muted accounts and content filters efficiently?
?What's your strategy for detecting and demoting low-quality / spam posts?
?How do you scale fanout if 1% of users are now celebrities (vs 0.01%)?
?What does the migration look like if we want to switch from chronological to ML-ranked?

Companies that test this topic

Practice in interview format

Reading is the floor. The interview signal is in walking through this live with someone probing follow-ups. Use the AI mock interview to practice talking through requirements, architecture, and trade-offs out loud.

Start an AI mock interview →

The problem

Clarifying questions

Requirements

Functional requirements

Non-functional requirements

Capacity estimation

High-level architecture

Post service

Fanout service

Follow service / graph DB

Feed cache (Redis)

Post DB (Cassandra / DynamoDB)

Ranking service

Media storage / CDN

Deep dives

1. Push vs pull vs hybrid: doing the math

2. Ranking pipeline: from chronological to ML

3. Storage: post data and feed entries

4. The celebrity write storm

5. Pagination, freshness, and infinite scroll

Trade-offs

Common follow-up questions

Related system design topics

Chat

Distributed Cache

Companies that test this topic

Practice in interview format

The problem

Clarifying questions

Requirements

Functional requirements

Non-functional requirements

Capacity estimation

High-level architecture

Post service

Fanout service

Follow service / graph DB

Feed cache (Redis)

Post DB (Cassandra / DynamoDB)

Ranking service

Media storage / CDN

Deep dives

1. Push vs pull vs hybrid: doing the math

2. Ranking pipeline: from chronological to ML

3. Storage: post data and feed entries

4. The celebrity write storm

5. Pagination, freshness, and infinite scroll

Trade-offs

Common follow-up questions

Related system design topics

Chat

Distributed Cache

Companies that test this topic

Practice in interview format