
Design a CDN with Edge Compute (CloudFront / Cloudflare)


Edge cache hierarchies, cache key design, invalidation, origin shield, and edge compute - the system every other system relies on without thinking about it.

The problem

Design a content delivery network with hundreds of edge points-of-presence, multi-tiered caching, and edge compute. Customers route traffic through us; we serve cached responses fast from nearby PoPs and forward misses to the origin. We support edge compute - small JavaScript / WASM functions that run at the edge with sub-50ms cold start.

This is the canonical "fast and global" system. Strong candidates anchor on the latency budget (sub-100ms p99 from anywhere on Earth), explain the multi-tier cache hierarchy, and walk through cache invalidation - the second-hardest problem in computer science.

Clarifying questions

Asking these before diving into a solution is the difference between a "hire" and a "no signal" rating. Pick the questions whose answers would change your design.

  • What workload mix - static (images, JS bundles), dynamic (HTML), API responses, video?
  • Customer count - 100K websites or 10M? Affects multi-tenancy design.
  • Edge compute requirements - simple header rewrites or full request handling?
  • Origin types we support - HTTP, HTTPS, S3, custom backends?
  • Cache TTL flexibility - per-customer, per-route, dynamic via headers?
  • Geographic coverage - US/EU only, global with China?
  • Pricing model - per request, per GB, tiered?
  • DDoS protection scope - protect customers from DDoS, or also prevent us from being DDoS'd?

Requirements

Functional requirements

  • Customers configure origin + cache rules per route/path
  • Edge PoPs serve cache hits; forward misses to origin (via shield/regional tier)
  • Customer-controlled TTLs, cache keys, headers
  • Cache invalidation API (purge by URL, prefix, tag)
  • Edge compute: customer-deployed functions run at the edge
  • TLS termination at edge with customer-provided or managed certificates
  • DDoS mitigation and rate limiting at edge

Non-functional requirements

Scale
200 PoPs across 100 countries. 10M concurrent connections per large PoP. 50 Tbps aggregate bandwidth. 100M requests/sec at peak. 99% cache hit rate target for static workloads.
Latency
Cache hit p99 < 30ms (TLS handshake + cache lookup + response). Cache miss p99 < origin's latency + 20ms overhead. Edge compute p99 < 50ms cold start.
Availability
99.99% - PoP failures must not affect customer availability (anycast routing fails over). Origin failures degrade gracefully (stale-while-revalidate, custom error pages).
Consistency
Eventually consistent on cache invalidation - global purge p95 < 30 seconds. Customer config changes propagate p99 < 60 seconds. Origin reads are not transactionally consistent with cache writes.

Capacity estimation

Edge nodes

  • 200 PoPs × 50-200 servers/PoP avg = ~20K edge servers globally.
  • Each server: 1.5TB NVMe cache, 256GB RAM, 100Gbps NIC.
  • Aggregate edge cache: ~30 PB.

Cache hit rate

  • Static (CSS/JS/images): 99%+. Most requests served from edge.
  • Dynamic HTML with short TTL: 85-95%.
  • API responses (cacheable): 70-90%.
  • Personalized: 0% (passthrough).

Bandwidth

  • 50 Tbps aggregate egress at peak.
  • Per PoP: 100 Gbps - 1 Tbps depending on tier.
  • Origin egress (misses + shield refreshes): ~1-5% of edge bandwidth = 500 Gbps - 2.5 Tbps total.

Origin requests

  • Net of cache: 100M req/sec × 1% miss = 1M req/sec to origins. With shield consolidation, ~100K req/sec actually hits origins.

Storage tiers

  • L1 (in-memory hot): 256 GB / server, sub-ms lookup. Top 0.1% of objects.
  • L2 (NVMe SSD): 1.5 TB / server, ~1ms lookup. Bulk cache.
  • L3 (regional shield): per-region, larger pool. Reduces origin load when L1/L2 misses.

Edge compute

  • 100K-1M concurrent V8 isolates per server. ~1-5ms warm invocation, 5-50ms cold.
  • WASM workers: similar profile, more language flexibility.

Cache key entropy

  • Static: URL only. Low entropy, high hit rate.
  • Personalized: URL + cookie / header. High entropy, low hit rate.
  • The middle ground: cache key normalization (strip irrelevant query params, vary on language only).

High-level architecture

The system has three concentric layers: edge PoPs (close to users), regional shield tiers (consolidate origin load), and the customer's origin (last resort).

A request from a user hits the nearest edge PoP via anycast. The edge tries L1 cache (RAM), then L2 (SSD). On miss, it forwards to the regional shield, which checks its cache. If the shield misses, it goes to origin.

The shield exists to avoid the thundering herd: 50 PoPs all missing on the same hot URL would create 50 origin requests. Routing them through 1 shield consolidates to 1.

Edge compute (Workers, Lambda@Edge) runs inline with the request: rewrite headers, A/B test, call backend APIs, render HTML. The function runs in a sandboxed isolate; cold start is sub-50ms because the runtime is preloaded.

The defining engineering challenge: every component must be coordinated across 200 globally distributed PoPs, with no central authoritative state on the request path. Configuration propagates eventually; cache invalidation propagates eventually; only DNS / anycast routing is "live".

Anycast / DNS routing

DNS resolves the CDN hostname to an anycast IP; BGP routes packets for that IP to the nearest PoP. Failover on PoP outage is automatic (BGP withdraws the route).

Edge PoP (point of presence)

Set of servers in a metro. Terminates TLS, runs request pipeline (cache lookup, edge compute, origin fetch). Scales horizontally; load balanced via L4 hash.

Edge cache (L1 + L2)

L1: hot objects in RAM. L2: bulk on local NVMe. Hash-partitioned within the PoP across servers. Cache key derived from URL + selected vary headers.

Regional shield tier

Per-region cache layer between edges and origin. All origin requests from N edge PoPs in a region funnel through the shield. Reduces origin load 50-100x.

Origin connector

Persistent connection pool to origin. HTTP/2 + keep-alive. Adds origin auth (signed URLs, headers). Handles origin failures (retries, fallbacks).

Edge compute runtime

V8 isolates (Cloudflare Workers) or microVMs (Lambda@Edge). Sandboxed JavaScript / WASM execution. Cold start optimized via preloaded runtime.

Configuration distribution

Customer config (cache rules, edge functions) pushed from central control plane to every PoP within seconds. Eventually consistent; PoPs serve last-known-good during config service outages.

Invalidation propagation

Customer-triggered purge: invalidation event broadcast to all PoPs. Each PoP marks affected entries stale. Global propagation < 30s p95.

TLS termination + cert mgmt

Per-customer certs (uploaded or managed via ACME). Servers hold per-domain cert in memory; SNI selects on each connection.

DDoS / WAF layer

L3/4 DDoS scrubbing at PoP entry. L7 WAF rules (SQL injection, XSS, custom rules) before origin forward. Rate limits per customer, per IP.

Telemetry / observability

Per-request metrics (cache status, latency, errors) sampled and aggregated. Customer-facing analytics + internal ops monitoring.

Deep dives

The subsystems where the interview is actually decided. Skim if you're running short; own these if you want a strong signal.

1. Cache key design and the explosion of variants

Cache key design determines hit rate. Bad keys = low hits = high origin load = expensive.

The default key
URL (host + path + query string). Simple. Works for static content.

Query string handling:

  • Include all: utm_source=facebook variants of the same page each cache separately. Hit rate destroyed.
  • Strip all: risks serving the wrong content - if ?lang=fr and ?lang=es should return different pages, we're broken.
  • Include selected: explicitly list query params that affect content. The right answer.

Vary by header
HTTP Vary header tells the cache "this response depends on header X". Typical Vary values:

  • Accept-Encoding (gzip vs br vs none): yes, common, low cardinality.
  • Accept-Language: yes if you serve different languages.
  • User-Agent: very high cardinality - effectively defeats caching. Almost never the right answer.
  • Cookie: extreme cardinality - every authenticated user gets their own cache entry. Avoid; use cache bypass for personalized.

Cache key normalization
The CDN can normalize the key before lookup:

  • Strip tracking params (utm_*, fbclid).
  • Group user agents into buckets (mobile / desktop / bot).
  • Normalize accept-language to top-level (en-US → en).

This recovers hit rate without losing functionality. Configurable per customer.
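
A minimal sketch of that normalization pass, assuming hypothetical config fields (allowParams, varyOnLanguage, varyOnDeviceClass - not any real CDN's schema):

```typescript
// Sketch of cache-key normalization. Config fields are illustrative,
// not any specific CDN's schema.
interface KeyConfig {
  allowParams: Set<string>;    // query params that actually affect content
  varyOnLanguage: boolean;     // normalize Accept-Language to its top level
  varyOnDeviceClass: boolean;  // bucket User-Agent into mobile/desktop/bot
}

function deviceClass(userAgent: string): string {
  if (/bot|crawler|spider/i.test(userAgent)) return "bot";
  if (/mobile|android|iphone/i.test(userAgent)) return "mobile";
  return "desktop";
}

function cacheKey(url: URL, headers: Map<string, string>, cfg: KeyConfig): string {
  // Keep only allow-listed params, sorted so param order never splits the key.
  const kept = [...url.searchParams.entries()]
    .filter(([name]) => cfg.allowParams.has(name))
    .sort(([a], [b]) => a.localeCompare(b));

  const parts = [url.hostname, url.pathname, new URLSearchParams(kept).toString()];
  if (cfg.varyOnLanguage) {
    // "en-US,en;q=0.9" -> "en": one variant per top-level language.
    const accept = headers.get("accept-language") ?? "";
    parts.push(accept.split(",")[0].split(/[-;]/)[0].trim());
  }
  if (cfg.varyOnDeviceClass) {
    parts.push(deviceClass(headers.get("user-agent") ?? ""));
  }
  return parts.join("|");
}
```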

Cookie handling
Cookies are user-specific. Three patterns:

  1. Bypass cache when cookies present: simple, correct, low hit rate for logged-in users.
  2. Strip irrelevant cookies (analytics) before cache lookup: keeps hits for read-only flows.
  3. Vary on auth cookie only: caches per-session content; storage cost rises sharply.

Cache key per device class
Mobile vs desktop responses differ. Vary on a device-class header (computed from User-Agent at edge, not the raw UA). 2-3 device classes vs millions of UA strings.

The cardinality budget
A URL with N variants stores N cache entries, each independently warm or cold. Split one moderately popular URL into 10 variants and each entry sees a tenth of the traffic - entries stay cold longer and hit rate craters.

Calculate the variant count before adding a vary axis. "Vary on user_id" sounds reasonable until you realize it's 10M variants.

2. Cache invalidation: purge, tag, and stale-while-revalidate

"There are only two hard things in computer science: cache invalidation and naming things." Phil Karlton

Time-based expiration (TTL)
Each entry has a TTL. After expiry, the entry is stale; next request triggers a revalidation (conditional GET to origin).

Pros: simple, predictable, no coordination.
Cons: stale content for up to TTL after content actually changes.

Explicit purge
Customer calls API: PURGE /path. CDN broadcasts to all PoPs. Each PoP marks the entry as immediately invalid.

Mechanics:

  • API → control plane → message bus → PoP receives purge → invalidate entry.
  • Propagation: our NFR targets p95 < 30s globally; best-in-class CDNs advertise single-digit seconds.
  • Race: a request in-flight at the moment of purge may still serve the old content. Acceptable.

Surrogate keys / tags
Each cached entry tagged with one or more keys (e.g., "product-1234", "category-electronics"). Customer purges by tag, not URL. The CDN finds all entries with that tag and invalidates them.

Use case: "I just updated product 1234; invalidate every page that displays it" without knowing every URL.

Implementation: per-PoP inverted index from tag → list of cache entries. Purge looks up the index, marks entries.
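
A sketch of that index, with in-memory maps standing in for the real cache store:

```typescript
// Per-PoP surrogate-key purging: an inverted index from tag to cache keys.
// In-memory Maps stand in for the real cache store.
class TaggedCache {
  private entries = new Map<string, { body: string; stale: boolean }>();
  private tagIndex = new Map<string, Set<string>>(); // tag -> cache keys

  put(key: string, body: string, tags: string[]): void {
    this.entries.set(key, { body, stale: false });
    for (const tag of tags) {
      if (!this.tagIndex.has(tag)) this.tagIndex.set(tag, new Set());
      this.tagIndex.get(tag)!.add(key);
    }
  }

  // Purge cost is O(entries carrying the tag), not O(all entries).
  purgeTag(tag: string): number {
    const keys = this.tagIndex.get(tag) ?? new Set<string>();
    for (const key of keys) {
      const entry = this.entries.get(key);
      if (entry) entry.stale = true; // marked stale; refetched on next request
    }
    this.tagIndex.delete(tag);
    return keys.size;
  }
}

// "Product 1234 changed" invalidates every page tagged with it:
const cache = new TaggedCache();
cache.put("/products/1234", "...", ["product-1234", "category-electronics"]);
cache.put("/category/electronics", "...", ["category-electronics"]);
cache.purgeTag("product-1234");
```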

Stale-while-revalidate (SWR)
After TTL expires, the entry is "stale". Instead of blocking on revalidation, the CDN serves the stale entry to the user immediately AND fetches fresh from origin in the background. The next request gets the fresh copy.

Standard HTTP cache directive (Cache-Control: max-age=60, stale-while-revalidate=300). Hit rate stays high; users see fresh content with one-request delay.

Stale-if-error
If origin is down, serve stale entries beyond TTL. Customer can configure an outer staleness window (e.g., max-stale=86400). Their site stays up even if their backend dies.
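
Both directives reduce to a small per-request freshness decision. A sketch, assuming an entry struct that carries the parsed Cache-Control windows:

```typescript
// Freshness decision for one cache entry. The windows come from
// Cache-Control: max-age, stale-while-revalidate, stale-if-error.
interface Entry {
  fetchedAt: number;      // ms since epoch
  maxAgeMs: number;       // max-age
  swrMs: number;          // stale-while-revalidate window
  staleIfErrorMs: number; // stale-if-error window
}

type Decision =
  | "serve-fresh"                // within TTL, no origin contact
  | "serve-stale-and-revalidate" // SWR: serve now, refresh in background
  | "serve-stale-origin-down"    // stale-if-error: origin is failing
  | "revalidate-blocking";       // too stale: block on a conditional GET

function decide(e: Entry, now: number, originHealthy: boolean): Decision {
  const age = now - e.fetchedAt;
  if (age <= e.maxAgeMs) return "serve-fresh";
  if (!originHealthy && age <= e.maxAgeMs + e.staleIfErrorMs) {
    return "serve-stale-origin-down";
  }
  if (age <= e.maxAgeMs + e.swrMs) return "serve-stale-and-revalidate";
  return "revalidate-blocking";
}
```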

Versioned URLs (the cleanest invalidation)
URLs include a content hash: app.a3f8c.js. New content = new URL = new cache entry. Old URL stays cached forever (or until evicted) and is correct. No purge needed.

Used everywhere webpack-style asset pipelines exist. Eliminates invalidation as a problem entirely for static assets.

Cache lifecycle

  • Fresh: served from cache, no origin contact.
  • Stale: TTL expired. Behavior depends on directives:
    • SWR: serve and revalidate in background.
    • Default: revalidate before serving (block).
    • Stale-if-error: serve when origin is down.
  • Purged: explicit purge. Refetch on next request.
  • Evicted: LRU/LFU pushed it out of the cache. Refetch on next request.

The thundering herd on purge
A purge of a hot URL causes every subsequent request to miss simultaneously. 100K req/sec × instant purge = 100K origin requests at once.

Mitigation:

  • Origin shield consolidates them to 1 origin request.
  • Single-flight at the PoP: first miss fetches; subsequent misses for the same URL piggyback on the in-flight fetch.

Both standard in production CDNs.
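
Single-flight is little more than a map of in-flight fetches keyed by URL; later misses await the same promise. A minimal sketch (the origin URL is hypothetical):

```typescript
// Single-flight request coalescing: concurrent misses for the same key
// share one origin fetch instead of stampeding.
class SingleFlight<T> {
  private inflight = new Map<string, Promise<T>>();

  do(key: string, fetcher: () => Promise<T>): Promise<T> {
    const existing = this.inflight.get(key);
    if (existing) return existing; // piggyback on the in-flight fetch

    const p = fetcher().finally(() => this.inflight.delete(key));
    this.inflight.set(key, p);
    return p;
  }
}

// 100K simultaneous misses on one URL become one origin fetch:
const flights = new SingleFlight<string>();
const fetchHot = () => fetch("https://origin.example.com/hot").then(r => r.text());
void Promise.all([flights.do("/hot", fetchHot), flights.do("/hot", fetchHot)]);
```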

3. Origin shield and request consolidation

Without a shield, every PoP handles its own misses independently. With a shield, regional consolidation cuts origin load 50-100x.

The problem
200 PoPs each cache the same URL. A new release invalidates the URL. The next request at each PoP misses. 200 origin fetches for the same URL.

The shield layer
Each PoP, on miss, doesn't fetch origin directly. It fetches from a shield - typically a single PoP per region designated as the consolidation point. The shield has its own cache.

If shield has a cache hit, it serves the PoP. Origin sees no traffic.
If shield has a miss, it fetches from origin once. Other PoPs requesting the same URL hit the shield's cache (or wait for the in-flight fetch).

Regional fanout: 50 PoPs → 1 shield → 1 origin request. 50x reduction.

Shield placement
Customer-configurable: pick the shield region closest to your origin. Adds one hop for non-shield PoPs but consolidates origin load dramatically. Most customers enable this for any origin under load.

Single-flight at shield
Multiple in-flight requests for the same URL at the shield share one origin fetch. Prevents post-purge thundering.

Cache poisoning at shield
If origin returns a bad response, the shield caches it; every PoP gets the bad response. Mitigation:

  • Validate responses (status code, content type) before caching.
  • Customer-controlled "do not cache" responses for specific status codes.
  • Soft TTLs - stale entries revalidated more aggressively.
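
A sketch of the first mitigation - a cacheability gate that runs before the shield stores anything. The status-code rules here are illustrative:

```typescript
// Cacheability gate at the shield: validate before caching so one bad
// origin response doesn't fan out to every PoP. Rules are illustrative.
function shouldCache(
  status: number,
  headers: Map<string, string>,
  customerNoCacheStatuses: Set<number>,
): boolean {
  const cacheControl = headers.get("cache-control") ?? "";
  if (/no-store|private/.test(cacheControl)) return false; // origin opted out
  if (customerNoCacheStatuses.has(status)) return false;   // per-customer rule
  if (status >= 500) return false;                         // never cache server errors
  if (![200, 301, 404].includes(status)) return false;     // conservative allowlist
  if (!headers.get("content-type")) return false;          // missing type: suspicious
  return true;
}
```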

Two-tier vs three-tier
Two-tier: edge PoPs + origin.
Three-tier: edge PoPs + regional shield + origin.

Three-tier is standard for high-traffic customers. Adds ~5-15ms latency on misses for the shield hop; origin offload more than compensates.

4. Edge compute: V8 isolates vs containers vs WASM

Edge compute lets customer code run inline with the request. Cold start latency is the make-or-break number.

The cold-start problem
Lambda's classic cold start: 200-500ms (container provision + runtime init + customer code load). Acceptable for backend invokes; brutal at edge where you may add 200ms to every cold path.

V8 isolates (Cloudflare Workers)
Each customer's worker runs in a V8 isolate - a JavaScript heap inside a shared V8 process. Isolates are lightweight (~5MB each). One process can hold thousands.

Cold start: ~5ms. Effectively no cold start because the V8 process is always running; spinning up a new isolate is cheap.

Trade-off: limited to JavaScript / WASM. No filesystem access, no native code. Can't import arbitrary Node packages.
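
The programming model is a request-scoped fetch handler. A minimal Workers-style sketch - the module-syntax handler shape follows Cloudflare's documented pattern, while the A/B logic and cookie name are made up for illustration:

```typescript
// Minimal Workers-style edge function: header rewrite + A/B bucketing.
// The module-syntax fetch handler follows Cloudflare's documented shape;
// the bucketing logic and cookie name are illustrative.
export default {
  async fetch(request: Request): Promise<Response> {
    // Sticky A/B bucket from a cookie, assigned on first visit.
    const cookie = request.headers.get("Cookie") ?? "";
    const match = cookie.match(/ab=([ab])/);
    const bucket = match ? match[1] : Math.random() < 0.5 ? "a" : "b";

    // Route bucket B to an experiment path before fetching upstream.
    const url = new URL(request.url);
    if (bucket === "b") url.pathname = `/experiments/b${url.pathname}`;
    const upstream = await fetch(new Request(url.toString(), request));

    // Clone headers without copying the body stream.
    const response = new Response(upstream.body, upstream);
    response.headers.set("x-ab-bucket", bucket);
    if (!match) {
      response.headers.append("Set-Cookie", `ab=${bucket}; Path=/; Max-Age=86400`);
    }
    return response;
  },
};
```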

MicroVMs (Lambda@Edge, Firecracker)
Each invocation runs in a microVM (Firecracker, ~125ms boot). More isolation than V8 isolates - kernel-level, not language-level.

Cold start: 50-200ms. Better than container Lambda; worse than isolates.

Trade-off: full Linux container support (any language). Higher resource cost per invocation.

WASM at edge (Fastly Compute@Edge, Cloudflare Workers)
Customer compiles to WASM; CDN runs the module in a sandboxed runtime. Multi-language (Rust, AssemblyScript, Go, etc.).

Cold start: ~5-20ms. Comparable to V8 isolates.

Trade-off: still maturing. Some libraries don't compile cleanly. Async I/O models differ across runtimes.

Use cases by latency tolerance

  • Header rewrite, A/B routing, simple auth: V8 isolates (sub-10ms).
  • HTML rendering, image transformation: any of the three.
  • Long-running compute, batch: edge is the wrong place; use central regions.

Pricing models

  • Cloudflare Workers: per-request + per-CPU-time. ~$0.50/M requests.
  • Lambda@Edge: per-request + per-100ms. ~$0.60/M requests + compute.

Cheap at scale; can be the largest line item for high-traffic edge-compute-heavy customers.

The KV story
Edge compute often needs state - read a session, cache a value, store a counter. CDNs offer eventually-consistent KV stores at edge (Workers KV, DynamoDB Global Tables). Reads <10ms; writes propagate globally in seconds.

Strong consistency at edge is hard - it requires consensus per write, which costs the latency advantage. Most edge KVs are eventually consistent and tell customers explicitly.
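
A sketch of how a function might use such a store, assuming a Workers-KV-like binding (the SESSIONS namespace and header name are hypothetical):

```typescript
// Edge function reading an eventually-consistent KV store, assuming a
// Workers-KV-like binding. The SESSIONS namespace and header are hypothetical.
interface EdgeKV {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, opts?: { expirationTtl?: number }): Promise<void>;
}
interface Env {
  SESSIONS: EdgeKV;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const sessionId = request.headers.get("x-session-id");
    if (!sessionId) return new Response("missing session", { status: 401 });

    // Local read at the PoP: <10ms. A put() made at another PoP may take
    // seconds to show up here - fine for sessions, wrong for inventory.
    const session = await env.SESSIONS.get(`session:${sessionId}`);
    if (!session) return new Response("unknown session", { status: 401 });
    return new Response(session);
  },
};
```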

The "edge databases" hype
Durable Objects (Cloudflare), D1, etc. promise transactional state at edge. Reality: they pin a key range to one location and route requests there - effectively "state in a single region, fronted by edge code". Useful but not magic.

5. TLS, certificates, and SNI at scale

Every request hits the edge over HTTPS. Cert management at 10M domains is its own engineering problem.

Per-customer certs
Customer brings their own cert and key. Stored encrypted at rest. Distributed to every PoP. Loaded into memory at startup.

Storage: 10M customers × 5KB cert + 2KB key = 70 GB. Fits in RAM per server (just barely); typically held in a per-server hot index keyed by SNI hostname.

SNI (Server Name Indication)
TLS extension where the client sends the hostname before cert exchange. Server uses the hostname to pick the right cert.

Without SNI: one cert per IP. Doesn't scale.
With SNI (universal modern client support): one IP serves millions of domains.
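
A sketch of the selection hook using Node's tls API, with an in-memory map standing in for the per-server cert index:

```typescript
// Per-domain cert selection via SNI, using Node's tls API. The in-memory
// map stands in for the per-server hot cert index.
import tls from "node:tls";

const certIndex = new Map<string, { key: string; cert: string }>(); // hostname -> PEM pair

const server = tls.createServer({
  SNICallback: (servername, cb) => {
    // Client sent the hostname in ClientHello; pick its cert before key exchange.
    const pem = certIndex.get(servername);
    if (!pem) return cb(new Error(`no cert for ${servername}`));
    cb(null, tls.createSecureContext({ key: pem.key, cert: pem.cert }));
  },
});
server.listen(443);
```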

Managed certs (ACME / Let's Encrypt)
Customer points DNS at the CDN; the CDN provisions certs automatically via Let's Encrypt or its own ACME flow. Typical: cert valid for 90 days, renewed at 60 days.

Scale challenge: 10M domains on a 60-90 day renewal cycle ≈ 100K+ renewals/day. Each renewal involves:

  • DNS / HTTP challenge to prove ownership.
  • Submission to Let's Encrypt.
  • Cert distribution to all PoPs.

Standard pipeline: cron-driven, idempotent, retry-on-failure. Slow renewals don't break customers (cert valid for another 30 days).

Cert storage and rotation
Distributing a cert update to 200 PoPs is like distributing config. Eventually consistent. Within 60s p99 acceptable.

Stale cert risk: if the customer's old cert is still served briefly during rotation, no harm (cert is still valid). If the new cert hasn't propagated and customer revokes the old cert externally, we serve a revoked cert briefly. Mitigation: customer can't revoke through us without rotation completing.

TLS 1.3 and 0-RTT
Modern TLS. 1-RTT handshake (down from 2-RTT in TLS 1.2). 0-RTT for resumption (data sent in the first packet).

0-RTT is unsafe for non-idempotent requests (replay attacks). CDNs typically allow 0-RTT for GET only, not POST.

OCSP / CRL
Cert revocation. OCSP stapling: server attaches a fresh OCSP response to the TLS handshake. Avoids client-side OCSP fetches.

CDN handles OCSP for customer certs. Periodic refresh from cert issuer; embedded in handshakes.

Termination cost
TLS handshake is CPU-heavy. Modern hardware offloads (AES-NI, AVX). A 10K-conn/sec server uses ~30% CPU on TLS alone. Scale by adding servers; rarely a bottleneck on tier-2 PoPs.

TLS termination then re-encrypt to origin
By default, the CDN terminates TLS at edge and connects to origin over TLS again (with the customer's origin cert). Two TLS sessions per request - the first user-visible, the second internal.

Cost: 2x handshakes. Mitigation: persistent HTTPS connections to origin, reused across requests.

6. DDoS, WAF, and bot management at edge

Every CDN doubles as the customer's first line of defense. Edge filtering blocks attacks before they reach the origin.

L3/L4 DDoS
Volumetric attacks: SYN floods, UDP floods, amplification (DNS, NTP, memcached reflection).

Mitigations:

  • BGP-level: anycast spreads attack traffic across many PoPs, so no single PoP absorbs the full volume; if one PoP is overwhelmed, its routes can be withdrawn and traffic shifts to neighbors.
  • Per-IP rate limiting.
  • SYN cookies (no state for half-open connections).
  • Drop traffic that fails basic protocol checks (malformed packets, weird flags).

CDNs absorb terabits/sec. The 2.5 Tbps Cloudflare attack in 2022 was mitigated globally without customer impact.

L7 (application-layer) DDoS
Slowloris (slow HTTP requests holding connections), HTTP flood (many GETs).

Mitigations:

  • Connection limits per IP (5-100 concurrent).
  • Request rate limits per IP, per customer, per route.
  • Challenge pages (CAPTCHA, JavaScript challenge) for suspicious traffic.
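
The rate limits above are typically token buckets keyed by IP. A minimal sketch with illustrative capacity numbers:

```typescript
// Per-IP token bucket for L7 rate limiting. Capacity and refill rate are
// illustrative; production limiters also shard and expire idle buckets.
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private capacity: number, private refillPerSec: number) {
    this.tokens = capacity;
  }

  allow(): boolean {
    const now = Date.now();
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.lastRefill) / 1000) * this.refillPerSec,
    );
    this.lastRefill = now;
    if (this.tokens < 1) return false; // over limit: reject or challenge
    this.tokens -= 1;
    return true;
  }
}

const buckets = new Map<string, TokenBucket>();
function allowRequest(clientIp: string): boolean {
  let bucket = buckets.get(clientIp);
  if (!bucket) {
    bucket = new TokenBucket(100, 50); // burst of 100, 50 req/s sustained
    buckets.set(clientIp, bucket);
  }
  return bucket.allow();
}
```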

WAF rules
Customer-defined rules: block requests matching SQL-injection patterns, XSS payloads, known attack signatures. CDN ships managed rule sets (OWASP Core Rule Set, customer-specific).

Performance: regex matching at line rate. Hyperscan or similar for high-throughput regex.

Bot management
Distinguish humans from bots. Signals:

  • TLS fingerprint (JA3): legit clients have characteristic handshakes; many bots don't match.
  • HTTP/2 fingerprint: similar.
  • JS challenge: send a small JS that solves a puzzle; humans pass via browser, bots without a real engine fail.
  • Behavioral: mouse movement, time between requests.

Tiers:

  • Allow good bots (Googlebot, Bingbot - verified by reverse DNS).
  • Challenge unknown.
  • Block known-bad (lists of known scraper IPs).

Per-customer policies
Some customers want strict bot blocking (e-commerce checkout); others want all bots allowed (public APIs). Configurable per-route.

The false-positive problem
A WAF rule that's too aggressive blocks legitimate users. Customer support tickets spike. Production discipline:

  • Rules launched in monitor-only first; alarm on hits, no blocking.
  • Promote to block once false-positive rate is acceptably low.
  • Customer-visible audit log of blocks.
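
That discipline reduces to a mode flag on every rule: monitor-mode rules log but never block. A sketch with an illustrative rule shape:

```typescript
// Monitor-then-block promotion: every rule carries a mode, and only
// "block" mode affects traffic. The rule shape is illustrative.
interface WafRule {
  id: string;
  pattern: RegExp;
  mode: "monitor" | "block";
}

function evaluate(
  rules: WafRule[],
  requestBody: string,
  audit: (ruleId: string, action: string) => void, // customer-visible log
): "allow" | "block" {
  for (const rule of rules) {
    if (!rule.pattern.test(requestBody)) continue;
    audit(rule.id, rule.mode); // logged either way
    if (rule.mode === "block") return "block";
    // monitor mode: record the hit, let the request through, and watch
    // the false-positive rate before promoting the rule to block.
  }
  return "allow";
}
```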

Origin protection
Even with edge filtering, the CDN must protect the origin's IP. Customers should not expose origin IPs publicly. CDN-only access via:

  • Origin firewall: only accept traffic from CDN IP ranges.
  • Authenticated origin pulls: CDN signs requests; origin verifies.
  • Private connectivity: AWS PrivateLink, etc.

If an attacker discovers the origin IP, they can bypass the CDN entirely. Treat origin IP as a secret.
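
A sketch of authenticated origin pulls via an HMAC header, using Node's crypto; the string-to-sign and 5-minute replay window are illustrative:

```typescript
// Authenticated origin pulls via an HMAC signature, using Node's crypto.
// The string-to-sign and replay window are illustrative.
import { createHmac, timingSafeEqual } from "node:crypto";

// CDN side: sign each origin-bound request.
function signOriginRequest(method: string, path: string, ts: number, secret: string): string {
  return createHmac("sha256", secret).update(`${method}\n${path}\n${ts}`).digest("hex");
}

// Origin side: reject anything not signed by the CDN, even if the
// origin IP leaks and attackers hit it directly.
function verifyAtOrigin(method: string, path: string, ts: number, sig: string, secret: string): boolean {
  if (Math.abs(Date.now() / 1000 - ts) > 300) return false; // stale timestamp: replay guard
  const expected = signOriginRequest(method, path, ts, secret);
  return expected.length === sig.length &&
    timingSafeEqual(Buffer.from(expected), Buffer.from(sig));
}
```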

Trade-offs

TTL high vs low
High TTL: better hit rate, more stale content. Low TTL: fresher content, more origin load. Pick by content volatility; use SWR to bridge.

Cache key entropy vs hit rate
Every vary axis multiplies cache entries. Vary only on what actually affects the response. Strip query params and headers that don't matter.

Two-tier vs three-tier (with shield)
Three-tier (with shield) cuts origin load 50-100x at cost of one extra hop on misses. Worth it for any nontrivial origin.

V8 isolates vs microVMs vs WASM
Isolates: fastest cold start, JS-only. MicroVMs: slower cold start, full Linux. WASM: fast and multi-language, ecosystem still maturing. Pick by language requirements and cold-start sensitivity.

Anycast vs DNS-based routing
Anycast: instant failover, shared IP across PoPs, complex to operate. DNS: simpler, slower failover (TTL-bound). Modern CDNs use anycast; smaller operators sometimes start with DNS.

Strong vs eventual consistency for edge KV
Strong: consensus per write, slow. Eventual: fast, requires app to handle staleness. Most edge KVs are eventual; advertise it explicitly.

Per-customer certs vs SAN certs
Per-customer: max flexibility, storage cost. SAN (one cert with N domains): cheaper storage, harder to manage rotations and per-customer deletes. Per-customer is the standard at scale.

Block vs challenge in WAF
Block: simple, hard on false positives. Challenge (CAPTCHA, JS): users can pass, bots usually can't, friendlier on false positives. Best-of-breed CDNs use both.

Verify origin via header vs mTLS vs IP allowlist
Header (signed): simple, key rotation needed. mTLS: strong, harder ops. IP allowlist: simple but exposes origin if list leaks. mTLS is the gold standard.

Customer-configurable everything vs sane defaults
Sane defaults serve 90% of customers; configurable knobs serve the remaining 10%. Surface the simple defaults prominently; advanced settings deeper.

Common follow-up questions

Be ready for at least three of these. The first one is almost always asked.

  • How would you implement a global cache purge that completes in < 5 seconds?
  • What changes if 50% of your traffic is suddenly authenticated/personalized?
  • How do you design for a customer whose origin is behind a corporate firewall?
  • What's your strategy when one PoP is being targeted by a 1 Tbps attack?
  • How would you offer strongly-consistent KV at edge without losing the latency advantage?
  • How do you migrate a customer from your competitor's CDN with zero downtime?
  • What's your story for serving traffic from China?
  • How do you charge customers fairly when one cache hit serves 100 users?

Practice in interview format

Reading is the floor. The interview signal is in walking through this live with someone probing follow-ups. Use the AI mock interview to practice talking through requirements, architecture, and trade-offs out loud.

Start an AI mock interview →