Stale-while-revalidate: serve stale now, refresh in the background

SWR decouples freshness from latency at two layers: the HTTP Cache-Control directive and the client data-fetching model. You trade bounded staleness for an erased p99 revalidation spike, because a request that finds a stale entry inside the SWR window never blocks on the origin.

A pricing API has max-age=60. p50 latency is 8ms — gorgeous on the dashboard. But every 60 seconds the entry expires, and the next request blocks on a 400ms origin round-trip to refresh it. With thousands of req/s, that “next request” is hundreds of users, every minute, sampled straight into your p99. The graph is a sawtooth: flat at 8ms, then a 400ms spike on the dot of every TTL boundary. The team spent a sprint chasing a “slow origin” that was never slow — it was just synchronously revalidating in the hot path of a real user.

The sawtooth is the cost of synchronous revalidation

Plain TTL caching has one cruel property: the moment an entry expires, someone pays the full origin cost to refill it. With max-age alone, that someone is a live user — the unlucky request that arrives first after expiry blocks until the origin answers. Your steady-state latency is the cache-hit time; your tail latency is the cache-miss time; and a TTL guarantees you periodically sample the miss into real traffic. The higher your request rate, the more users hit each boundary, the fatter your p99.
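A back-of-the-envelope sketch of that claim, using the 60-second TTL and 400ms origin from the pricing API above. The 2,000 req/s rate is an assumed figure, and the model assumes uniformly spread traffic in which every request arriving during the refresh has to wait; it is illustrative arithmetic, not a measurement.

```python
# Illustrative arithmetic only; the request rate is an assumed figure.


def blocked_requests(req_per_s: int, origin_ms: int) -> int:
    """Requests that arrive while one synchronous refresh is in flight."""
    return req_per_s * origin_ms // 1000


def blocked_fraction(origin_ms: int, window_ms: int) -> float:
    """Share of a measurement window (containing one TTL boundary) spent blocked."""
    return min(origin_ms, window_ms) / window_ms


print(blocked_requests(2000, 400))              # 800
print(blocked_fraction(400, 1_000))             # 0.4
print(round(blocked_fraction(400, 60_000), 4))  # 0.0067
```

Under these assumptions roughly 800 requests block at each boundary. In the one-second bucket that contains the boundary, 40% of requests are slow, so a per-second p99 spikes hard; aggregated over a full minute the same requests are under 1% of traffic and surface further out in the tail. How visible the sawtooth is therefore depends on the aggregation window of the dashboard.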

Stale-while-revalidate breaks the link. When an entry crosses its freshness window but is still inside the SWR window, the cache does two things at once: it returns the stale value immediately (the user gets the cache-hit latency, ~8ms) and it kicks off an asynchronous refresh against the origin. The user who triggered the refresh never waits for it. The fresh value lands a few hundred milliseconds later and serves the next requests. Revalidation still happens — it just stopped happening in the foreground.
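The behaviour described above can be sketched as a small in-process cache. This is a minimal illustration, not how any particular CDN or browser implements it: `SWRCache`, `loader`, and the injectable `clock` are hypothetical names, and error handling for a failing background refresh is left out.

```python
import threading
import time
from typing import Any, Callable, Dict, Tuple


class SWRCache:
    """Serve fresh or stale entries immediately; refresh stale ones off-thread."""

    def __init__(self, loader: Callable[[str], Any], max_age: float, swr: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader          # stand-in for the origin call
        self._max_age = max_age        # freshness window, in seconds
        self._swr = swr                # extra window in which stale is served
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, stored_at)
        self._refreshing: Dict[str, threading.Thread] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> Any:
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                value, stored_at = entry
                age = self._clock() - stored_at
                if age < self._max_age:
                    return value              # fresh hit
                if age < self._max_age + self._swr:
                    self._start_refresh(key)  # at most one refresh per key
                    return value              # stale hit, caller does not wait
        # Miss, or older than max_age + swr: the caller pays the origin cost.
        return self._load_and_store(key)

    def _start_refresh(self, key: str) -> None:
        # Must be called with self._lock held.
        if key in self._refreshing:
            return
        thread = threading.Thread(target=self._refresh, args=(key,), daemon=True)
        self._refreshing[key] = thread
        thread.start()

    def _refresh(self, key: str) -> None:
        try:
            self._load_and_store(key)
        finally:
            with self._lock:
                self._refreshing.pop(key, None)

    def _load_and_store(self, key: str) -> Any:
        value = self._loader(key)  # origin call happens outside the lock
        with self._lock:
            self._entries[key] = (value, self._clock())
        return value
```

With `SWRCache(fetch_price, max_age=60, swr=300)`, where `fetch_price` is whatever function calls your origin, a `get` at age 90s returns the cached value at once while a daemon thread reloads it. Only a cold miss or an entry older than 360s makes the caller wait.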

The trade you are making is explicit and bounded: you accept that some responses are slightly stale in exchange for deleting the p99 spike. Staleness is capped by your SWR window; latency is capped by your cache-hit time. A senior reads stale-while-revalidate=N as “I am fine with data up to N seconds past its freshness deadline, as long as no human waits on the refresh.”

RFC 5861: two windows, then stale-if-error as the floor

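This is a real HTTP directive, defined in RFC 5861 (an Informational RFC), and many CDNs and modern browsers honour it, though support varies by vendor and by directive, so check the documentation for your stack. Two Cache-Control extensions matter: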

stale-while-revalidate=N — for N seconds after the entry goes stale, serve it immediately and revalidate in the background.
stale-if-error=N — for N seconds after it goes stale, if the origin returns an error (500/502/503/504) or is unreachable, keep serving the stale copy instead of propagating the error.

A production header reads like a layered defence:

Cache-Control: max-age=60, stale-while-revalidate=300, stale-if-error=86400

Read it as three nested windows. 0–60s: fresh, served directly. 60–360s: stale, served instantly while a background refresh runs (the SWR window). On origin failure, for up to 24h after the entry goes stale: still served from the last good copy rather than throwing a 503 at the user (the SIE floor). The first window optimises freshness, the second kills tail latency, the third turns an outage into degraded-but-up.
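The three windows can be expressed as a small decision function. This is a sketch under simplifying assumptions: `parse_cache_control` and `decide` are hypothetical helpers, only numeric directives are parsed, and the returned strings are labels for illustration rather than anything a real cache emits.

```python
from typing import Dict


def parse_cache_control(header: str) -> Dict[str, int]:
    """Collect the numeric directives of a Cache-Control value, e.g. max-age=60."""
    directives: Dict[str, int] = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if value.isdigit():
            directives[name.lower()] = int(value)
    return directives


def decide(age_s: int, directives: Dict[str, int], origin_ok: bool = True) -> str:
    """Say what a cache may do with an entry that is age_s seconds old."""
    max_age = directives.get("max-age", 0)
    # Both stale windows are measured from the moment the entry goes stale.
    swr_end = max_age + directives.get("stale-while-revalidate", 0)
    sie_end = max_age + directives.get("stale-if-error", 0)
    if age_s < max_age:
        return "fresh"
    if age_s < swr_end:
        return "serve stale, revalidate in background"
    if origin_ok:
        return "revalidate synchronously"
    if age_s < sie_end:
        return "serve stale (origin error)"
    return "propagate error"


policy = parse_cache_control(
    "max-age=60, stale-while-revalidate=300, stale-if-error=86400"
)
print(decide(30, policy))                     # fresh
print(decide(90, policy))                     # serve stale, revalidate in background
print(decide(400, policy, origin_ok=False))   # serve stale (origin error)
```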

Each directive opens one nested window: max-age guards freshness, the SWR window kills tail latency, stale-if-error survives an outage — past all three, the cache must fail loudly.

Strategy	Who pays the origin round-trip	Max staleness past the freshness deadline	p99 at TTL boundary
`max-age=60` only	A live user, every 60s	0s (entry is at most 60s old)	Spikes to origin latency
`max-age=60, stale-while-revalidate=300`	A background fetch, no user	Up to 300s (entry is at most 360s old)	Flat inside the SWR window (no foreground miss)
SWR + `stale-if-error`	Background; none during outage	Up to the SIE window, only while the origin errors	Flat, survives origin down

▸Why this works

The names are mirror images. stale-while-revalidate is the happy-path tool — the origin is fine, you just don’t want users to wait for the refresh. stale-if-error is the failure-path tool — the origin is sick, and stale is better than a 503. They are independent: you can ship one without the other, but pairing them is the senior default, because SWR alone still hands the user an error once the origin breaks and the SWR window lapses.

The same idea, one layer up: SWR/React Query on the client

The HTTP directive lives in the network. The exact same pattern lives in client data-fetching libraries — the library SWR is literally named after it, and React Query (TanStack Query) implements the same model. When a component asks for a key, the library returns the cached value from the last fetch immediately (no spinner if it has seen the key before), then fires a background request and swaps in the fresh data when it arrives. The user sees instant content that quietly self-corrects.

The client model adds two production-grade behaviours the raw header doesn’t:

Request coalescing / single-flight. If two components call useSWR("/api/user") in the same tick, SWR makes one request and both get the same result. The default dedupingInterval is 2000ms: repeat calls for the same key inside 2s are deduplicated rather than sent again. React Query likewise shares one in-flight request among everything observing the same query key. Its staleTime, which defaults to 0, is a different knob: it sets how long cached data counts as fresh, so by default cached data is treated as stale and refetched on the next trigger.
Revalidation triggers. Both libraries revalidate on window focus and on network reconnect by default, so a tab left open for an hour shows fresh data the instant you click back into it — again, served stale-first, refreshed in the background.
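The coalescing idea is independent of any framework. The sketch below shows only the in-flight sharing part (not SWR's time-based dedupingInterval), and it is not the implementation of either library: `Coalescer`, `fetch_user`, and `demo` are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class Coalescer:
    """Share one in-flight fetch among all concurrent callers of the same key."""

    def __init__(self, fetch: Callable[[str], Awaitable[Any]]) -> None:
        self._fetch = fetch
        self._in_flight: Dict[str, "asyncio.Task[Any]"] = {}

    async def get(self, key: str) -> Any:
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key starts the fetch; later callers reuse it.
            task = asyncio.create_task(self._fetch(key))
            self._in_flight[key] = task
            task.add_done_callback(lambda _: self._in_flight.pop(key, None))
        return await task


async def demo() -> int:
    calls = 0

    async def fetch_user(key: str) -> str:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # stand-in for the network round-trip
        return f"payload for {key}"

    coalescer = Coalescer(fetch_user)
    results = await asyncio.gather(*(coalescer.get("/api/user") for _ in range(5)))
    assert len(set(results)) == 1  # every caller got the same payload
    return calls


print(asyncio.run(demo()))  # 1
```

Five concurrent callers produce one fetch. Once the task completes it is removed from the map, so the next call after that starts a new request.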

Whether it is a CDN edge node or a React hook, the contract is identical: return what you have now, fetch what’s correct next, never block the human on the refresh.

Pick the best fit

A product-listing API behind a CDN sees the classic per-minute p99 sawtooth from synchronous revalidation. Catalogue data can be a few minutes stale without harm. Pick the header.

Where it bites: stampedes, unbounded staleness, and auth

SWR is not free of failure modes; it relocates them. When you ship a stale-while-revalidate header, these are the three places that will eventually find you.

The background-refresh stampede. A naive edge fires one background revalidation per request that finds the entry stale. If a popular key expires while it’s taking 10k req/s, you can launch thousands of simultaneous refreshes at an origin that expected one. That is a self-inflicted thundering herd, except now it’s background fetches melting the origin instead of users seeing latency. The fix is single-flight: the first stale request triggers exactly one refresh, and everyone else is served stale until it lands (Cloudflare reports stale responses served during the refresh with a cache status of UPDATING). Add jitter to refresh timing so many keys don’t expire and revalidate in lockstep.
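One possible way to add that jitter is to randomise each key's freshness lifetime around the base TTL. This is a sketch: `jittered_ttl` and its `spread` parameter are hypothetical, and a 10% spread is an arbitrary starting point rather than a recommendation.

```python
import random
from typing import Optional


def jittered_ttl(base_s: float, spread: float = 0.1,
                 rng: Optional[random.Random] = None) -> float:
    """Return base_s scaled by a random factor in [1 - spread, 1 + spread]."""
    rng = rng if rng is not None else random.Random()
    return base_s * (1.0 + rng.uniform(-spread, spread))


# Keys cached at the same instant now expire across a 54s to 66s band
# instead of all at exactly 60s.
rng = random.Random(42)  # seeded only to make the example repeatable
ttls = [jittered_ttl(60.0, rng=rng) for _ in range(5)]
print(all(54.0 <= t <= 66.0 for t in ttls))  # True
```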

Unbounded staleness when the origin is down. This is the classic stale-if-error mistake in reverse. If the origin is unreachable, a background revalidation fails — and a poorly built cache may then keep extending the stale copy forever, “cache until heat death.” You need a hard ceiling: a finite SWR window, a finite stale-if-error window. Past those, the cache must fail loudly rather than serve content of unknown age. Staleness must always be bounded.
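A hard ceiling can be sketched as follows. The names (`revalidate_or_fallback`, `StaleLimitExceeded`) are hypothetical, and catching every exception is a simplification: a real cache would treat only specific failures, such as 5xx responses or an unreachable origin, as eligible for fallback.

```python
from typing import Any, Callable, Optional, Tuple


class StaleLimitExceeded(Exception):
    """Raised when the origin fails and the cached copy is past its ceiling."""


def revalidate_or_fallback(
    cached: Optional[Tuple[Any, float]],  # (value, age in seconds), or None
    load: Callable[[], Any],              # stand-in for the origin call
    max_age_s: float,
    stale_if_error_s: float,
) -> Any:
    """Try the origin; on failure serve stale only inside the finite window."""
    try:
        return load()
    except Exception as exc:  # simplification, see the note above
        if cached is not None:
            value, age_s = cached
            if age_s < max_age_s + stale_if_error_s:
                return value  # degraded but up
        # Past the ceiling (or nothing cached): fail loudly.
        raise StaleLimitExceeded("origin failed and no usable stale copy") from exc
```

The key property is that the fallback branch never extends the entry's lifetime: the age keeps growing across failed refreshes, so the copy eventually crosses the ceiling and the error surfaces.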

Serving stale auth and permissions. This is the dangerous one. SWR is correct for content that is fine a few minutes old — a product list, a blog post, an analytics tile. It is wrong for authorization decisions, feature flags that gate access, or anything where stale = a security regression. Serve a revoked user a 5-minute-stale “you still have access” and you have shipped a vulnerability. The senior rule: never SWR a permission check. Per-user, security-sensitive responses get short, synchronous, validated freshness — or they aren’t cached at all.
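One way to make that rule hard to break is an allow-list: only routes known to tolerate staleness get the SWR header, and everything else defaults to not being stored. The route prefixes below are hypothetical, and the sketch takes the "not cached at all" option for sensitive responses.

```python
# Hypothetical route prefixes; adapt to your own API.
CACHEABLE_PREFIXES = ("/catalogue/", "/blog/")

SWR_POLICY = "max-age=60, stale-while-revalidate=300, stale-if-error=86400"
NO_CACHE_POLICY = "private, no-store"


def cache_control_for(path: str) -> str:
    """Allow-list SWR; everything else, including auth routes, is never stored."""
    if path.startswith(CACHEABLE_PREFIXES):
        return SWR_POLICY
    return NO_CACHE_POLICY


print(cache_control_for("/catalogue/items"))  # max-age=60, stale-while-revalidate=300, stale-if-error=86400
print(cache_control_for("/permissions/me"))   # private, no-store
```

Defaulting to `no-store` means a newly added permission endpoint is safe unless someone deliberately opts it into caching.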

Quiz

With `Cache-Control: max-age=60, stale-while-revalidate=300`, a request arrives 90 seconds after the entry was cached. What does the user get?

Quiz

A popular key expires under heavy traffic and your edge fires one background revalidation per stale-hit. What's the production risk, and the fix?

Order the steps

Order what happens to a request that arrives during the stale-while-revalidate window:

1 The cache finds the entry is past max-age but still inside the SWR window
2 It returns the stale cached value to the user immediately — no origin wait
3 It triggers exactly one background revalidation against the origin (single-flight)
4 The fresh response arrives and replaces the cached entry out of band
5 Subsequent requests are served the fresh value (or stale again, if it lapsed)

The user who triggers the refresh never blocks on the origin; the background fetch lands out of band and serves the next request.

Recall before you leave

01
Explain why max-age alone produces a p99 sawtooth, and how adding stale-while-revalidate flattens it.
02
When is stale-while-revalidate the wrong tool, and what failure modes must a senior guard against?

Recap

Stale-while-revalidate decouples freshness from latency: instead of letting a live user pay the origin round-trip every time a TTL expires, the cache returns the stale value instantly and revalidates in the background, so the p99 sawtooth of synchronous revalidation flattens. It lives at two layers: the HTTP Cache-Control: stale-while-revalidate=N directive (RFC 5861, honoured by many CDNs and browsers), and client libraries like SWR and React Query that return last-known data immediately, coalesce duplicate requests (dedupingInterval in SWR), and revalidate on focus and reconnect. The trade is explicit: bounded staleness for erased tail latency. Pair it with stale-if-error so an origin outage degrades gracefully instead of throwing 503s. Then mind the three failure modes: single-flight the background refresh or a hot key triggers a thundering herd, cap staleness with finite windows or a down origin serves content of unknown age forever, and never apply SWR to auth, permissions, or anything where stale data is a security bug. When you next see a sawtooth in your p99 graph, first check whether the cache is forcing a live user to pay for each revalidation, and whether a single directive can move that cost to the background instead.
