Caching
Stale-while-revalidate: serve stale now, refresh in the background
A pricing API has max-age=60. p50 latency is 8ms — gorgeous on the dashboard. But every 60 seconds the entry expires, and the next request blocks on a 400ms origin round-trip to refresh it. With thousands of req/s, that “next request” is hundreds of users, every minute, sampled straight into your p99. The graph is a sawtooth: flat at 8ms, then a 400ms spike on the dot of every TTL boundary. The team spent a sprint chasing a “slow origin” that was never slow — it was just synchronously revalidating in the hot path of a real user.
The sawtooth is the cost of synchronous revalidation
Plain TTL caching has one cruel property: the moment an entry expires, someone pays the full origin cost to refill it. With max-age alone, that someone is a live user — the unlucky request that arrives first after expiry blocks until the origin answers. Your steady-state latency is the cache-hit time; your tail latency is the cache-miss time; and a TTL guarantees you periodically sample the miss into real traffic. The higher your request rate, the more users hit each boundary, the fatter your p99.
Stale-while-revalidate breaks the link. When an entry crosses its freshness window but is still inside the SWR window, the cache does two things at once: it returns the stale value immediately (the user gets the cache-hit latency, ~8ms) and it kicks off an asynchronous refresh against the origin. The user who triggered the refresh never waits for it. The fresh value lands a few hundred milliseconds later and serves the next requests. Revalidation still happens — it just stopped happening in the foreground.
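The behaviour described above can be sketched as a small in-process cache. This is a minimal illustration, not how any particular CDN implements it: `SWRCache`, `Entry`, the injected `clock`, and the `fetch` callback are all hypothetical names. It deliberately stays naive and starts one background refresh per stale hit.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Entry:
    value: str
    stored_at: float


class SWRCache:
    """Toy stale-while-revalidate cache; all names are hypothetical."""

    def __init__(self, fetch: Callable[[str], Awaitable[str]],
                 max_age: float, swr: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch
        self.max_age = max_age
        self.swr = swr
        self.clock = clock
        self.entries: dict = {}
        self.tasks: set = set()  # keeps background refreshes alive

    async def _refresh(self, key: str) -> str:
        value = await self.fetch(key)
        self.entries[key] = Entry(value, self.clock())
        return value

    async def get(self, key: str) -> str:
        entry = self.entries.get(key)
        if entry is None:
            # Cold miss: nothing to serve, so the caller has to wait.
            return await self._refresh(key)
        age = self.clock() - entry.stored_at
        if age < self.max_age:
            return entry.value  # fresh
        if age < self.max_age + self.swr:
            # Stale but inside the SWR window: refresh in the background
            # and answer immediately with the stale value.
            task = asyncio.create_task(self._refresh(key))
            self.tasks.add(task)
            task.add_done_callback(self.tasks.discard)
            return entry.value
        # Past both windows: back to a synchronous refresh.
        return await self._refresh(key)
```

With `max_age=60` and `swr=300`, a `get` at age 90 returns the old value without awaiting the origin, and a later `get` sees the refreshed value once the background task has finished.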
The trade you are making is explicit and bounded: you accept that some responses are slightly stale in exchange for deleting the p99 spike. Staleness is capped by your SWR window; latency is capped by your cache-hit time. A senior reads stale-while-revalidate=N as “I am fine with data up to N seconds past its freshness deadline, as long as no human waits on the refresh.”
RFC 5861: two windows, then stale-if-error as the floor
This is a real HTTP directive, specified in RFC 5861 (an Informational RFC), and it is supported by many CDNs and modern browsers, though support varies by implementation. Two Cache-Control extensions matter:
- stale-while-revalidate=N: for N seconds after the entry goes stale, serve it immediately and revalidate in the background.
- stale-if-error=N: for N seconds after it goes stale, if the origin returns an error (500/502/503/504) or is unreachable, keep serving the stale copy instead of propagating the error.
A production header reads like a layered defence:
`Cache-Control: max-age=60, stale-while-revalidate=300, stale-if-error=86400`
Read it as three nested windows:
- 0–60s: fresh, served directly.
- 60–360s: stale, served instantly while a background refresh runs (the SWR window).
- On origin failure, for up to 24h after the entry goes stale: still served from the last good copy rather than throwing a 503 at the user (the SIE floor).
The first window optimises freshness, the second kills tail latency, the third turns an outage into degraded-but-up.
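To make the three windows concrete, here is a small sketch that parses the header and classifies a cached entry by age. The function names and return labels are illustrative assumptions, and the parser only handles the simple `name=seconds` directives used in this lesson.

```python
def parse_cache_control(header: str) -> dict:
    """Extract integer-valued directives such as max-age=60."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if value.isdigit():
            directives[name.lower()] = int(value)
    return directives


def classify(age: float, header: str, origin_ok: bool = True) -> str:
    """Say what a cache may do with an entry that is `age` seconds old."""
    d = parse_cache_control(header)
    max_age = d.get("max-age", 0)
    swr = d.get("stale-while-revalidate", 0)
    sie = d.get("stale-if-error", 0)

    if age < max_age:
        return "fresh"
    stale_for = age - max_age  # both windows start when freshness ends
    if stale_for <= swr:
        return "stale-while-revalidate"
    if not origin_ok and stale_for <= sie:
        return "stale-if-error"
    return "revalidate" if origin_ok else "error"


HEADER = "max-age=60, stale-while-revalidate=300, stale-if-error=86400"
print(classify(30, HEADER))                     # fresh
print(classify(90, HEADER))                     # stale-while-revalidate
print(classify(400, HEADER))                    # revalidate
print(classify(400, HEADER, origin_ok=False))   # stale-if-error
```

Note that both stale windows are measured from the moment the entry stops being fresh, not from the moment it was stored.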
| Strategy | Who pays the origin round-trip | Max staleness | p99 at TTL boundary |
|---|---|---|---|
| max-age=60 only | A live user, every 60s | 0 (always fresh) | Spikes to origin latency |
| max-age=60, swr=300 | A background fetch, not a user | Up to 300s past freshness (age up to 360s) | Flat (no foreground miss) |
| SWR + stale-if-error | Background; none during outage | Up to the SIE window | Flat, survives origin down |
Why this works
The names are mirror images. stale-while-revalidate is the happy-path tool — the origin is fine, you just don’t want users to wait for the refresh. stale-if-error is the failure-path tool — the origin is sick, and stale is better than a 503. They are independent: you can ship one without the other, but pairing them is the senior default, because SWR alone still hands the user an error once the origin breaks and the SWR window lapses.
The same idea, one layer up: SWR/React Query on the client
The HTTP directive lives in the network. The exact same pattern lives in client data-fetching libraries — the library SWR is literally named after it, and React Query (TanStack Query) implements the same model. When a component asks for a key, the library returns the cached value from the last fetch immediately (no spinner if it has seen the key before), then fires a background request and swaps in the fresh data when it arrives. The user sees instant content that quietly self-corrects.
The client model adds two production-grade behaviours the raw header doesn’t:
- Request coalescing / single-flight. If two components call useSWR("/api/user") in the same tick, SWR makes one request and both get the same result. The default dedupingInterval is 2000ms: repeat calls for the same key inside 2s are collapsed onto the in-flight promise. React Query does the same; its analogous staleTime defaults to 0 (refetch eagerly) versus SWR’s 2s.
- Revalidation triggers. Both libraries revalidate on window focus and on network reconnect by default, so a tab left open for an hour shows fresh data the instant you click back into it, again served stale-first and refreshed in the background.
Whether it is a CDN edge node or a React hook, the contract is identical: return what you have now, fetch what’s correct next, never block the human on the refresh.
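The deduping idea is language-agnostic, so here is a rough Python sketch of it rather than the libraries' actual code. `Deduper`, its `interval` parameter, and the injected `clock` are hypothetical; the 2-second default simply mirrors the dedupingInterval figure quoted above.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable


class Deduper:
    """Collapse calls for the same key made within `interval` seconds."""

    def __init__(self, fetch: Callable[[str], Awaitable[Any]],
                 interval: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch
        self.interval = interval
        self.clock = clock
        self._recent: dict = {}  # key -> (start time, shared task)

    async def get(self, key: str) -> Any:
        now = self.clock()
        hit = self._recent.get(key)
        if hit is not None and now - hit[0] < self.interval:
            # Reuse the request that is in flight or just completed.
            return await hit[1]
        task = asyncio.ensure_future(self.fetch(key))
        self._recent[key] = (now, task)
        return await task
```

Two callers that ask for the same key in the same tick await the same task, so the fetch function runs once and both receive the identical result object. A call made after the interval has elapsed starts a new request.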
A product-listing API behind a CDN sees the classic per-minute p99 sawtooth from synchronous revalidation. Catalogue data can be a few minutes stale without harm. Pick the header.
Where it bites: stampedes, unbounded staleness, and auth
SWR is not free of failure modes — it relocates them. Three bite in production.
The background-refresh stampede. A naive edge fires one background revalidation per request that finds the entry stale. If a popular key expires while it’s taking 10k req/s, you can launch thousands of simultaneous refreshes at an origin that expected one — a self-inflicted thundering herd, except now it’s background fetches melting the origin instead of users seeing latency. The fix is single-flight: the first stale request triggers exactly one refresh (Cloudflare marks it UPDATING), and everyone else is served stale until it lands. Add jitter to refresh timing so many keys don’t expire and revalidate in lockstep.
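Both fixes are small in code. The sketch below assumes a single-threaded event loop (a threaded cache would need a lock around the set), and `RefreshGate` and `jittered_ttl` are hypothetical helper names, not part of any CDN's API.

```python
import random
from typing import Optional


class RefreshGate:
    """Allow at most one background refresh per key at a time."""

    def __init__(self) -> None:
        self._updating: set = set()

    def try_begin(self, key: str) -> bool:
        """Return True only for the caller that should start the refresh."""
        if key in self._updating:
            return False  # someone else is already refreshing: serve stale
        self._updating.add(key)
        return True

    def finish(self, key: str) -> None:
        """Call when the refresh completes or fails."""
        self._updating.discard(key)


def jittered_ttl(base_ttl: float, spread: float = 0.1,
                 rng: Optional[random.Random] = None) -> float:
    """Scale base_ttl by a random factor in [1 - spread, 1 + spread]."""
    source = rng if rng is not None else random
    return base_ttl * (1 + source.uniform(-spread, spread))
```

With the gate in place, ten thousand stale hits on one key start a single refresh. The jitter spreads a nominal 60s TTL across roughly 54s to 66s, so keys cached together do not all expire together.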
Unbounded staleness when the origin is down. This is the classic stale-if-error mistake in reverse. If the origin is unreachable, a background revalidation fails — and a poorly built cache may then keep extending the stale copy forever, “cache until heat death.” You need a hard ceiling: a finite SWR window, a finite stale-if-error window. Past those, the cache must fail loudly rather than serve content of unknown age. Staleness must always be bounded.
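A hard ceiling can be enforced in a few lines. This is a sketch under stated assumptions: `OriginError` and `TooStale` are hypothetical exception types, and `stale_for` is the number of seconds since the cached copy stopped being fresh.

```python
from typing import Callable, Optional


class OriginError(Exception):
    """The origin returned 500/502/503/504 or was unreachable."""


class TooStale(Exception):
    """The cached copy is older than the stale-if-error ceiling."""


def get_with_stale_if_error(fetch: Callable[[], str],
                            cached_value: Optional[str],
                            stale_for: float,
                            sie_window: float) -> str:
    """Try the origin; fall back to stale only inside a finite window."""
    try:
        return fetch()
    except OriginError as exc:
        if cached_value is not None and stale_for <= sie_window:
            return cached_value  # degraded but up
        # Past the ceiling: fail loudly instead of serving unknown-age data.
        raise TooStale(f"stale for {stale_for}s, limit {sie_window}s") from exc
```

The important property is that the fallback branch has an explicit upper bound, so a long outage eventually surfaces as an error rather than as silently ancient content.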
Serving stale auth and permissions. This is the dangerous one. SWR is correct for content that is fine a few minutes old — a product list, a blog post, an analytics tile. It is wrong for authorization decisions, feature flags that gate access, or anything where stale = a security regression. Serve a revoked user a 5-minute-stale “you still have access” and you have shipped a vulnerability. The senior rule: never SWR a permission check. Per-user, security-sensitive responses get short, synchronous, validated freshness — or they aren’t cached at all.
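One way to keep this rule from being broken by accident is to choose the header per response class and fail closed. The response classes and the `cache_control_for` helper below are hypothetical; `no-store` is the standard Cache-Control directive that tells caches not to store the response at all.

```python
# Hypothetical mapping from response class to Cache-Control policy.
POLICIES = {
    "catalogue": "max-age=60, stale-while-revalidate=300, stale-if-error=86400",
    "permission": "no-store",
}


def cache_control_for(kind: str) -> str:
    """Return the header for a response class; unknown classes are not cached."""
    return POLICIES.get(kind, "no-store")
```

Defaulting unknown response classes to `no-store` means a newly added endpoint has to opt in to stale serving explicitly.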
With `Cache-Control: max-age=60, stale-while-revalidate=300`, a request arrives 90 seconds after the entry was cached. What does the user get?
A popular key expires under heavy traffic and your edge fires one background revalidation per stale-hit. What's the production risk, and the fix?
Order what happens to a request that arrives during the stale-while-revalidate window:
1. The cache finds the entry is past max-age but still inside the SWR window
2. It returns the stale cached value to the user immediately, with no origin wait
3. It triggers exactly one background revalidation against the origin (single-flight)
4. The fresh response arrives and replaces the cached entry out of band
5. Subsequent requests are served the fresh value (or stale again, if it lapsed)
1. Explain why max-age alone produces a p99 sawtooth, and how adding stale-while-revalidate flattens it.
2. When is stale-while-revalidate the wrong tool, and what failure modes must a senior guard against?
Stale-while-revalidate decouples freshness from latency: instead of letting a live user pay the origin round-trip every time a TTL expires, the cache returns the stale value instantly and revalidates in the background, so the p99 sawtooth of synchronous revalidation flattens. It lives at two layers — the HTTP Cache-Control: stale-while-revalidate=N directive (RFC 5861, honoured by CDNs and browsers), and client libraries like SWR and React Query that return last-known data immediately, coalesce duplicate requests via dedupingInterval, and revalidate on focus and reconnect. The trade is explicit: bounded staleness for erased tail latency. Pair it with stale-if-error so an origin outage degrades gracefully instead of throwing 503s. Then mind the three failure modes: single-flight the background refresh or a hot key triggers a thundering herd, cap staleness with finite windows or a down origin serves content of unknown age forever, and never apply SWR to auth, permissions, or anything where stale data is a security bug.