Crux Read real response headers, a Cache-Control line, an edge-worker handler, and a CDN trace, then pick the behaviour or the highest-leverage fix.
Your altitude — climbing toward senior
ZeroJuniorMiddleSenior
You are at senior altitude — in orbit
◷ 14 min
CDN problems are diagnosed in response headers, Cache-Control lines, and edge logs — not in prose. Read each artefact, predict what the cache and the worker actually do, then choose the fix a senior engineer reaches for first.
Goal
Practise the loop you run in every CDN incident: read the headers and the trace, predict cacheability and hit rate, and pick the highest-leverage change before touching routing or origin.
Hit rate on this cacheable HTML sits near 5%. Reading only these headers, what is the dominant problem and the fix?
Heads-up A year-long TTL on mutable HTML is wrong, and TTL is not why hit rate collapsed. The cache key is fragmented by Vary: User-Agent + Cookie — that is what destroys reuse.
Heads-up A single MISS is just this one cold request; the response IS cacheable (public, max-age). The chronic 5% hit rate comes from per-request cache-key fragmentation via Vary, not from caching being off.
Heads-up age: 0 simply means this is a fresh MISS being populated. Shortening TTL would lower hit rate further. The real fault is the high-cardinality Vary header.
A product-listing endpoint sends this. Describe what the shared CDN cache does at second 0, second 120, and during an 8-minute origin outage at second 300.
Heads-up 60 s is a perfectly valid shared-cache freshness window; public makes it CDN-cacheable. SWR and SIE then extend useful serving well past it.
Heads-up That is exactly what stale-while-revalidate=600 prevents: past max-age it serves stale to all and sends one background revalidation, so the herd never hits origin.
Heads-up stale-if-error only serves stale while origin is failing, for up to its window; on the next successful revalidation the CDN caches the fresh 200 and resumes normal behaviour.
This edge-side composition worker is correct in shape but has a latency bug on the critical path. What is it, and what is the fix?
Heads-up HTMLRewriter is a streaming parser; that is its whole point. The latency comes from awaiting the page fetch and the price fetch one after the other instead of in parallel.
Heads-up A single innerContent rewrite is sub-millisecond of CPU, and I/O wait does not count against the wall-time budget. The defect is serialised I/O, fixed by parallelising the fetches.
Heads-up Edge workers routinely fetch external and regional services; that is a core use case. The real issue is doing the two fetches sequentially on the critical path.
You deployed a corrected version of /article/42 ten minutes ago, but users still see the old text. Reading this trace, why, and what is the fastest correct fix?
Heads-up age 7200 is well under max-age 86400, so the entry is still fresh and will serve for ~22 more hours. Only a purge (or TTL expiry) clears it now.
Heads-up HIT is the CDN edge cache, not the browser, and one user's refresh fixes nobody else. A CDN purge is what invalidates the shared edge copy for all users.
Heads-up Changing max-age only affects responses cached after the change; the already-cached entry keeps its original 86400 s window. You must purge to evict it now.
Recap
Every CDN incident is read in headers and traces: a Vary header listing User-Agent or Cookie fragments the cache key per request and collapses hit rate; max-age plus stale-while-revalidate and stale-if-error define a freshness window, a stampede-safe stale window, and an outage fallback; an edge composition worker must parallelise its fetches and bound the volatile one with a timeout and fallback; and a HIT with age below max-age means a stale deploy that only a purge will clear. Diagnose from the artefact, apply the highest-leverage fix, then re-check the same header to confirm.