
Stale-while-revalidate and CDN request coalescing

RFC 5861's stale-while-revalidate directive returns the stale cached value immediately while refreshing in the background, eliminating the wait entirely. CDNs extend this with request coalescing, which forwards exactly one origin fetch per cache miss.

A lock-based cache gives 10,000 waiting requests one of two things: the rebuilt value (after waiting 400 ms) or a fallback. Stale-while-revalidate gives all 10,000 requests the old value immediately and queues exactly one background refresh. Zero waiting, zero DB spike.

RFC 5861: the standard

Before you reach for stale-while-revalidate (SWR), ask yourself: does my data tolerate being up to N seconds stale? If yes, SWR gives you zero-wait TTL boundaries for free: no lock infrastructure, no probabilistic code, just one header directive.

RFC 5861 (2010) defines two Cache-Control extensions:

stale-while-revalidate=N — serve the stale (expired) cached response for up to N seconds while initiating a background revalidation. The user sees a response without waiting.
stale-if-error=N — serve the stale cached response for up to N seconds if revalidation fails (5xx, timeout). Keeps the site available during origin failures.

Example header:

Cache-Control: max-age=60, stale-while-revalidate=30, stale-if-error=3600

This means: fresh for 60 s, serve stale for an additional 30 s while refreshing in the background, serve stale for up to 1 hour if the origin errors.
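A small sketch of how a cache might interpret this header. CachePolicy, parse_cache_control and classify are hypothetical names, not a library API; the parser understands only the three directives discussed here and ignores everything else, so it is not a full HTTP caching implementation. The origin_ok flag is an assumption standing in for "did the background revalidation succeed".

```python
from dataclasses import dataclass


@dataclass
class CachePolicy:
    max_age: int              # seconds the response counts as fresh
    swr: int = 0              # stale-while-revalidate window, seconds after max-age
    sie: int = 0              # stale-if-error window, seconds after max-age


def parse_cache_control(header: str) -> CachePolicy:
    """Read only the three directives discussed here; every other one is ignored."""
    values = {}
    for part in header.split(","):
        name, _, arg = part.strip().partition("=")
        values[name.lower()] = arg.strip().strip('"')

    def seconds(name: str) -> int:
        arg = values.get(name, "")
        return int(arg) if arg.isdigit() else 0

    return CachePolicy(seconds("max-age"),
                       seconds("stale-while-revalidate"),
                       seconds("stale-if-error"))


def classify(policy: CachePolicy, age: float, origin_ok: bool = True) -> str:
    """What a cache may do with a stored response that is `age` seconds old.

    origin_ok is a hypothetical flag: False means revalidation just failed.
    """
    if age < policy.max_age:
        return "fresh: serve from cache"
    staleness = age - policy.max_age
    if staleness <= policy.swr:
        return "stale: serve now, revalidate in background"
    if not origin_ok and staleness <= policy.sie:
        return "stale: serve because revalidation failed"
    return "expired: revalidate before serving"
```

With this header, the oldest response a reader can receive while the origin is healthy is 60 + 30 = 90 seconds old; during an origin outage it can be up to 60 + 3600 = 3,660 seconds old.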

What happens at TTL expiry

With SWR enabled on a cache (CDN or application-level):

T=60.0 s: max-age elapses. The cache entry is now stale, but still inside the stale-while-revalidate window.
Requests 1–N all arrive at T=60.001 s.
All N requests immediately receive the stale value. No waiting.
Exactly one background refresh is queued (triggered by the first stale request, or handled by a separate background goroutine/task).
T=60.4 s: the background refresh completes and the new value is stored.
Future requests get the fresh value.

DB load at the boundary: 1 query, not N.
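The timeline above can be sketched as an in-process application cache. This is a minimal sketch under stated assumptions: a single process, one background thread per refresh, and an injectable clock so the TTL boundary can be simulated; SWRCache, loader and wait_for_refreshes are hypothetical names, not a library API. Cold misses are loaded synchronously and are not coalesced.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict, Hashable


@dataclass
class _Entry:
    value: object
    stored_at: float          # clock reading when the value was loaded


class SWRCache:
    """In-process stale-while-revalidate cache (illustrative sketch only)."""

    def __init__(self, max_age: float, swr: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_age = max_age    # seconds an entry counts as fresh
        self.swr = swr            # extra seconds a stale entry may still be served
        self.clock = clock
        self._entries: Dict[Hashable, _Entry] = {}
        self._refreshing: Dict[Hashable, threading.Thread] = {}
        self._lock = threading.Lock()

    def get(self, key: Hashable, loader: Callable[[], object]) -> object:
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                age = self.clock() - entry.stored_at
                if age < self.max_age:
                    return entry.value                  # fresh hit
                if age <= self.max_age + self.swr:
                    # Stale but inside the SWR window: start at most one
                    # background refresh per key, then answer immediately.
                    if key not in self._refreshing:
                        worker = threading.Thread(
                            target=self._refresh, args=(key, loader), daemon=True)
                        self._refreshing[key] = worker
                        worker.start()
                    return entry.value
        # Cold miss, or stale beyond the SWR window: load in the caller.
        # (Concurrent cold misses are NOT coalesced here.)
        value = loader()
        with self._lock:
            self._entries[key] = _Entry(value, self.clock())
        return value

    def _refresh(self, key: Hashable, loader: Callable[[], object]) -> None:
        try:
            value = loader()                            # the single DB query
            with self._lock:
                self._entries[key] = _Entry(value, self.clock())
        finally:
            # On failure the stale entry is kept; the next stale read retries.
            with self._lock:
                self._refreshing.pop(key, None)

    def wait_for_refreshes(self) -> None:
        """Block until in-flight background refreshes finish (handy in tests)."""
        with self._lock:
            workers = list(self._refreshing.values())
        for worker in workers:
            worker.join()
```

Simulating 2,000 reads at T=60.001 s against max_age=60 and swr=30 returns the old value to every reader while the loader runs only once in the background. The lock is held only for dictionary bookkeeping, never while the loader runs, so readers are not serialised behind the rebuild.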

Mitigation	User wait at TTL boundary	DB queries at boundary
None (naive TTL)	Full rebuild latency, worsening as the DB saturates	N concurrent
Lock only	Up to rebuild p99 (waiters queue)	1 (serialised)
Single-flight	Up to rebuild p99 (subscribers wait)	1 per node
XFetch	None (cache never expires under traffic)	~1 (early rebuild)
SWR	None (stale served immediately)	1 (background)

SWR is the only row that avoids both costs deterministically: every reader gets an instant response and the DB sees exactly one rebuild. XFetch approaches the same result, but only probabilistically. Locks and single-flight trade reader wait for DB safety; naive TTL pays both costs at once.

The tradeoff: bounded staleness

SWR explicitly accepts that readers will see stale data for up to the stale-while-revalidate duration after the max-age expires. This is fine for:

Content pages, news feeds, product listings
Homepage hero banners, navigation menus
Any data where a 30–300 s lag is invisible to users

It is wrong for:

Account balance, vote counts, anything that affects business decisions in real time
Anything where two users must see consistent state simultaneously

CDN-level: request coalescing

CDNs extend SWR with request coalescing (Cloudflare) or request collapsing (Fastly). When a cache miss arrives at the edge:

The edge enters a “stitching” state — it has issued one upstream fetch and is waiting for the response.
Any additional requests for the same path while in “stitching” state do not generate additional upstream fetches.
All waiting requests receive the response simultaneously when the single upstream fetch completes.

A viral content event with 10 million viewers hitting one URL produces one origin fetch per edge cache (fewer still with tiered caching or origin shielding), not 10 million. Both Cloudflare and Fastly document request coalescing (collapsing) as a way to shield the origin from sudden traffic spikes.
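A minimal in-process sketch of the collapsing behaviour, assuming a single edge cache node; real CDNs implement this inside their proxies, and RequestCoalescer, origin and demo are hypothetical names. Unlike SWR, followers here do wait: they block until the leader's single upstream fetch returns, then all receive the same response.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Hashable


class RequestCoalescer:
    """Collapse concurrent misses for one key into a single upstream fetch."""

    def __init__(self, fetch: Callable[[Hashable], object]):
        self._fetch = fetch
        self._inflight: Dict[Hashable, Future] = {}
        self._lock = threading.Lock()

    def get(self, key: Hashable) -> object:
        with self._lock:
            fut = self._inflight.get(key)
            is_leader = fut is None
            if is_leader:
                fut = Future()                    # used as a one-shot result slot
                self._inflight[key] = fut
        if is_leader:
            try:
                fut.set_result(self._fetch(key))  # the only origin fetch
            except Exception as exc:
                fut.set_exception(exc)            # followers see the same error
            finally:
                with self._lock:
                    del self._inflight[key]
        return fut.result()                       # followers block until the leader finishes


def demo(n_requests: int = 20) -> tuple:
    """Fire n concurrent requests at one cold key; count origin fetches."""
    calls = []
    release = threading.Event()

    def origin(key):
        calls.append(key)
        release.wait(timeout=5)                   # hold the fetch open so requests pile up
        return f"body of {key}"

    edge = RequestCoalescer(origin)
    with ThreadPoolExecutor(max_workers=n_requests) as pool:
        futures = [pool.submit(edge.get, "/viral") for _ in range(n_requests)]
        time.sleep(0.5)                           # let every request join the in-flight fetch
        release.set()
        results = [f.result() for f in futures]
    return results, len(calls)


if __name__ == "__main__":
    results, fetches = demo()
    print(len(results), "responses,", fetches, "origin fetch(es)")
```

Combining the two ideas gives the CDN behaviour described above: coalescing protects the origin on a cold miss, while SWR removes the wait once a stale copy exists.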

Framework-level: Next.js ISR

Next.js Incremental Static Regeneration (ISR) is SWR at the framework level. A page configured with revalidate: 60 is served from cache for 60 seconds; the first request after that window still receives the stale page and triggers a background regeneration, and later requests get the regenerated page once it is ready. The shape mirrors RFC 5861; the difference is that Next.js runs the regeneration in its own server-side cache, so the behaviour does not depend on a downstream HTTP cache honouring Cache-Control headers.

Why this works

Apollo Client's cache-and-network fetch policy applies SWR semantics to its normalised cache: a query result is returned from the cache immediately while a network refetch runs and reconciles any out-of-date fields. The same “serve stale, refresh in background” pattern fits other TTL-based caches too, such as gRPC response caches or DNS resolvers (RFC 8767 standardises serving stale DNS records).

Quiz

At T=60.001 s, 2,000 requests arrive for a key with TTL=60 s and stale-while-revalidate=30 s. How many of them wait for the rebuild to complete?

Quiz

Which use case is UNSUITABLE for stale-while-revalidate?

Quiz

Cloudflare request coalescing fires at a CDN edge during a viral traffic event. 50,000 simultaneous requests arrive for the same URL that just expired. How many upstream origin requests are made?

Every reader gets the stale value with zero wait; the cache fires exactly one background refresh to the origin. DB load at the boundary is 1, not N.

Recall before you leave

01
What does the HTTP header Cache-Control: max-age=60, stale-while-revalidate=30, stale-if-error=3600 mean in practice?
02
How does Next.js ISR implement the same guarantee as RFC 5861 stale-while-revalidate?

Recap

Stale-while-revalidate (RFC 5861) eliminates waiter queues at TTL boundaries by returning the stale cached value immediately to every request and triggering exactly one background refresh. Readers see no added latency at the boundary, and DB load drops to one rebuild query. CDN-level request coalescing applies the same one-fetch principle at the edge: each edge cache issues one origin fetch per cache miss, however many requests are waiting on it. The tradeoff is explicit bounded staleness: acceptable for content, unsuitable for strongly consistent business data. Compose SWR at the CDN edge with XFetch or a distributed lock at the application cache layer for defence-in-depth at every tier. Now when you review a caching design, check first whether SWR's staleness is acceptable; if it is, you get zero-wait boundaries for the cost of one header directive.
