We use cookies for site analytics. Accept to help us understand how the site is used. See our Privacy Policy for details.
The classic write-vs-read amplification trade-off. Push, pull, or hybrid fanout - and how to handle the celebrity user with 100M followers.
Design the backend that produces a personalized, ranked feed of posts for each user. When a user opens the app, they see a list of posts from accounts they follow, sorted by recency or relevance, with infinite scroll.
This is the highest-bandwidth question in the bounded set. It tests fanout reasoning (push/pull/hybrid), the celebrity-user problem, and the ranking layer. Strong candidates make the fanout decision explicitly and defend it with numbers.
Asking these before diving into a solution is the difference between a "hire" and a "no signal" rating. Pick the questions whose answers would change your design.
Storage
Throughput
Bandwidth
Compute
The defining decision is fanout-on-write vs fanout-on-read.
Fanout-on-write (push): When you post, the system writes a copy of your post ID into every follower's feed cache. Read path is trivial - look up your feed cache, fetch posts. Read latency: excellent. Write amplification: brutal for users with millions of followers.
Fanout-on-read (pull): When you load your feed, the system queries every account you follow and merges their recent posts. Write is trivial. Read is expensive - especially for users following 5,000 accounts.
Hybrid (the right answer): Push for normal users, pull for celebrities. A user's feed is the union of (a) the pre-fanned-out cache from normal-user posts, and (b) on-the-fly fetched posts from followed celebrities. The threshold (e.g., 1M followers) is tunable.
Accepts new posts, stores in post DB, publishes a 'new_post' event to Kafka.
Consumes new_post events. For each follower, writes a feed entry to their feed cache. Skips users above the celebrity threshold (handled at read time).
Stores follower edges. For a given user, returns followee list. For a given post author, returns follower list (for fanout). Optimized for both directions.
Per-user list of post IDs (top 1000 most recent). Sharded by user_id. Treated as truth on the hot path - falls through to recovery only on miss.
Source of truth for post content. Partition key = post_id. Read by ID after the feed cache returns post IDs.
Takes candidate posts (from cache + celebrity pull), scores each (engagement, recency, affinity), returns sorted list. ML model + feature store.
S3 + CloudFront. Posts reference media URLs. Decoupled entirely from the feed pipeline.
The subsystems where the interview is actually decided. Skim if you're running short; own these if you want a strong signal.
The right answer depends on the follower distribution, which is power-law in every social network.
Pure push amplification
Average user has 200 followers → 1 post = 200 fanout writes. At 1B posts/day = 200B fanout writes/day = 2.3M writes/sec. Manageable.
But: a celebrity with 100M followers posting once = 100M fanout writes in a burst. If they tweet 5x/day = 500M writes. Push to everyone falls apart on the long tail.
Pure pull amplification
A normal user has 200 follows. Loading their feed: query 200 user timelines, merge top-K. At 5M feed-loads/sec × 200 lookups = 1B database queries/sec. Doesn't fit.
Hybrid (recommended)
Threshold T = 1M followers (tune from data).
A user's feed = merge(cached_feed, celebrity_pulls). Re-rank at read time.
Edge case: a user follows ONLY celebrities. Their cached feed is empty; their feed = pure pull. That's fine - it's the same cost as the per-user "pull from a few celebrities" path.
Edge case: a user just gained 100M followers (became a celebrity). Future posts now go through the pull path. Past posts: leave them in followers' caches; they'll age out naturally.
Edge case: someone unfollows. The cached feed has stale entries. Filter at read time using the live follow graph - cheaper than rewriting caches.
Reverse-chronological is the v1. Real systems rank.
Candidate generation
Pull top 1000 posts from the user's feed cache (last 24 hours, deduped, filtered against blocks/mutes).
Feature extraction
Per (user, post) pair, compute features:
Features come from a feature store (Feast, Tecton, or in-house) that pre-computes user-level and author-level features and serves them at sub-ms latency.
Scoring
A multi-task model predicts probabilities (P(like), P(reply), P(share), P(skip)) and combines them into a single score: score = w1·P(like) + w2·P(reply) + ... - w_skip·P(skip). Train offline on historical engagement.
Re-ranking and diversity
Score top-300 candidates, then apply diversity rules (no more than 3 posts from the same author in a row, mix content types). Final list = top 50, returned to the client.
Latency budget
Candidate generation: 20ms. Feature lookup: 30ms. Scoring (batched): 50ms. Re-ranking: 10ms. Total: ~110ms inside the 200ms feed budget. The remaining 90ms is network and serialization.
Cold-start users
New users have no engagement history. Fall back to a popularity-based ranker (most-engaged posts in the last hour from accounts they follow). Switch to personalized after ~50 interactions.
Post DB (Cassandra)
Why Cassandra over DynamoDB? At this scale, self-hosted Cassandra is cheaper for the steady write load. DynamoDB's per-write cost balloons. Many large feeds run on Manhattan (Twitter), Cassandra (FB pre-TAO), or proprietary systems.
Feed cache (Redis Cluster)
Why not store full post content in the feed cache? Memory pressure. Post text is up to 1KB; multiplied by 1000 entries × 500M users = 500 TB of cached strings. Caching just the IDs (50 GB total) is dramatically cheaper, and the post content lookup happens once per feed load against the post DB which is itself cached.
Cold cache rebuild
On miss, the read service queries the follow service for followees, fetches their recent posts, scores, returns. This is slower (300-500ms) but acceptable as a rare path.
When a celebrity posts and the system has to fanout to 100M followers, the naive implementation takes minutes and saturates the network.
Sharded fanout
Split the follower list into chunks of 10K. Process chunks in parallel across worker pool. 100M / 10K = 10K parallel jobs. Each job: read chunk → for each follower, write feed entry. Total: ~30 seconds across the fleet, no single node overwhelmed.
Hybrid threshold (already mentioned)
For users above the celebrity threshold, don't fanout at all. Skip the storm entirely. The cost is a slightly slower feed read for followers (a single extra timeline lookup).
Backpressure on Kafka
new_post events for celebrities can saturate the topic. Use a dedicated 'celebrity_post' topic with higher partition count and dedicated consumers. Normal posts go to the main topic.
Ordering during a storm
Followers should see posts in roughly the right order. If celebrity post A arrives at fanout worker 1 and B arrives at worker 2, two followers might see them in different orders. This is acceptable - small ordering glitches are invisible.
What if the storm fails partway?
Fanout writes are idempotent (set with post_id key; second write is a no-op). Retry the failed chunks. At-least-once delivery via Kafka ensures no follower is missed.
Naive offset pagination ("page 2 of feed") breaks when new posts arrive between requests. Cursor-based pagination is the only correct answer.
Cursor design
Cursor = (post_id, score, timestamp). Server returns top-K posts with a cursor pointing to the next batch (post_id below the lowest in the current batch, or score below). Client sends cursor on next request; server resumes from there.
Stable ranking under new posts
If new posts arrive while the user is scrolling, they should not pop into the middle of the visible list. Two strategies:
Pull-to-refresh
Triggers a full feed regeneration with the current top-K. The new top-K is returned; the user's old position is preserved client-side until they scroll up.
Offline support
Mobile clients cache the top 100 posts. Reopening the app shows the cached feed instantly while a background fetch refreshes. The "fresh" feed replaces the cached one when ready.
Push vs pull is the central decision
The hybrid model isn't a compromise; it's the architecturally correct answer for power-law social graphs. Pure push collapses on celebrities. Pure pull collapses on normal-user feeds. Always present the hybrid as the recommendation, with the threshold as a tunable parameter.
Freshness vs cost
Eventually consistent feeds save you 10x on infrastructure vs strict consistency. Users don't notice 30-second staleness in posts. They do notice an outage. Choose eventual consistency.
Storage for user data vs feed cache
Post DB is durable, multi-region, write-once. Feed cache is rebuildable, single-region per user, eviction-tolerant. Conflating them (e.g., storing all feeds in DynamoDB with full content) is a common interview anti-pattern - candidates over-design for durability what doesn't need it.
Ranking complexity vs latency
Each ML signal added to ranking costs 5-20ms. Budget tightly. Many high-leverage signals can be precomputed offline (user-author affinity matrix) so online lookup is constant-time. Real-time signals (last-minute engagement) are worth their cost only if they materially improve outcomes - measure.
The "newsfeed" is actually three different products
A reverse-chronological friend feed (close ties), a ranked engagement feed (broader network), and an editorial feed (trending content). Modern apps blend all three. At interview level, design the engagement feed and acknowledge the others exist.
Be ready for at least three of these. The first one is almost always asked.
Reading is the floor. The interview signal is in walking through this live with someone probing follow-ups. Use the AI mock interview to practice talking through requirements, architecture, and trade-offs out loud.
Start an AI mock interview →