
Caching

Caching layers: from L1 to CDN, and where the wrong layer bites

Crux: Every layer from CPU cache to CDN trades latency for distance and cost. Caching at the wrong layer, double-caching, or fronting a fast origin with a slow cache all make systems worse, not faster.

A team adds Redis in front of a Postgres query to “speed it up.” Latency gets worse. The query was a primary-key lookup hitting a warm buffer pool, so Postgres answered in ~0.3 ms from RAM. The new path serialized a key, made a network round trip to Redis (~1 ms in-region), deserialized the reply, and even then was not guaranteed a hit. They had wrapped a RAM-speed origin in a network-speed cache. The cache had become the bottleneck it was supposed to remove.

The hierarchy is a latency ladder

Caching is not one thing — it is a stack of layers, each one bigger, cheaper, and slower than the one above it. The whole reason the stack exists is that fast storage is small and expensive, and large storage is slow and cheap. Each layer holds the hot subset of the layer below it so most accesses never reach the slow tier. The numbers are the entire point: a senior carries them in their head because they decide whether a cache helps at all.

The orders of magnitude matter more than the exact figures. L1 is roughly 100x faster than RAM; RAM is roughly 1000x faster than an SSD; an SSD is roughly 1000x faster than a cross-continent network hop. When you add a cache layer, you are betting that the layer you front is slower than the cache by a wide margin. Front something already faster, and you have added latency, not removed it.

| Layer | Typical access | Size / scope | Owned by |
|---|---|---|---|
| L1 CPU cache | ~1 ns | ~32–64 KB / core | Hardware |
| L2 CPU cache | ~4 ns | ~256 KB–1 MB / core | Hardware |
| L3 CPU cache | ~10–40 ns | ~8–64 MB / socket | Hardware |
| RAM | ~100 ns | GBs / machine | OS + your process |
| OS page cache | ~100 ns (RAM hit) | free RAM | Kernel |
| App cache (Redis/memcached) | ~0.2–1 ms (in-region) | GBs–TBs, shared | You |
| SSD | ~100 µs | TBs / machine | OS |
| Reverse proxy / CDN edge | ~ms (edge), 100+ ms (far) | global, per-POP | You / provider |
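
The gaps between rungs are easier to trust once you compute them. The sketch below takes the approximate figures from the table (rounded to one value per layer, which is an assumption of this example) and derives the ratio between any two layers. The names `LATENCY_NS`, `speedup`, and `cache_can_help` are illustrative only.

```python
# Approximate access latencies in nanoseconds, rounded from the table above.
# These are order-of-magnitude figures, not measurements.
LATENCY_NS = {
    "L1": 1,
    "RAM": 100,
    "SSD": 100_000,               # ~100 µs
    "Redis in-region": 1_000_000,  # ~1 ms round trip
    "far network hop": 100_000_000,  # ~100 ms
}


def speedup(faster: str, slower: str) -> float:
    """How many times faster the first layer is than the second."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


def cache_can_help(cache: str, origin: str) -> bool:
    """A cache can only reduce latency if the origin is slower than the cache."""
    return LATENCY_NS[origin] > LATENCY_NS[cache]


if __name__ == "__main__":
    print(speedup("L1", "RAM"))
    print(speedup("RAM", "SSD"))
    print(speedup("SSD", "far network hop"))
    # A RAM-speed origin behind a network cache: the bet is lost up front.
    print(cache_can_help("Redis in-region", "RAM"))
```

The last call is the opening story in one line: a network-attached cache in front of a RAM-speed origin cannot win, whatever the hit ratio.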

What each layer is actually for

The hardware caches (L1/L2/L3) and RAM are not yours to manage — the CPU and OS run them automatically. They still shape your code: a cache-friendly access pattern (sequential, locality-rich) can run an order of magnitude faster than a pointer-chasing one, because the cache line you need is already loaded. That is why a flat array beats a linked list for iteration even with identical big-O.

The OS page cache is the most underrated layer. When you read a file, the kernel keeps its pages in free RAM; the second read is a memory hit, not a disk hit. This is why your database is fast on hot data without you doing anything — Postgres’s buffer pool plus the page cache mean a “disk” read is usually a RAM read. A senior knows this before reaching for Redis: if the working set fits in RAM, the database is already a RAM cache.

Application caches (Redis, memcached) are the first layer you operate. They are a shared, network-attached RAM store: any app instance can hit the same cache, which is what local in-process memory cannot do. The cost is the network round trip — typically sub-millisecond in-region, but 100+ ms if your client and cache are in different continents. Reverse proxies and CDNs push cached responses out toward the user, so a hit never reaches your origin at all; the CDN’s whole value is geographic — serving a French user from a Paris POP in milliseconds instead of crossing the Atlantic.

Why this works

“Just add Redis” is the reflex that skips the question that matters: what is the origin’s actual latency? If the data already lives in your process memory, or in the OS page cache, or in a warm DB buffer pool, the origin is RAM-speed and a network cache can only slow it down. Redis earns its place when the origin is genuinely expensive — a multi-join aggregate, an external API, a cold-disk scan — not when it fronts a hot key lookup.

Hit ratio is the only number that justifies a cache

A cache that misses pays both costs: the cache lookup and the origin fetch. So the average latency of a cached path is hit_rate × cache_latency + miss_rate × (cache_latency + origin_latency), which simplifies to cache_latency + miss_rate × origin_latency. Run the math and a low enough hit ratio is worse than no cache: once hit_rate × origin_latency falls below cache_latency, you pay the cache tax on every request for nothing. This is also why caching more data is not always the goal: caching cold, rarely reused data can raise your overall latency by evicting the genuinely hot keys to make room for one-hit wonders.
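
The formula is short enough to keep as a helper and run before adding any cache. This is a minimal sketch: the function names are made up for illustration, and the model assumes a single fixed latency for the cache and for the origin, which real systems only approximate.

```python
def cached_latency_ms(hit_rate: float, cache_ms: float, origin_ms: float) -> float:
    """Average latency of a cached path.

    A hit pays only the cache lookup; a miss pays the lookup plus the origin.
    """
    miss_rate = 1.0 - hit_rate
    return hit_rate * cache_ms + miss_rate * (cache_ms + origin_ms)


def break_even_hit_rate(cache_ms: float, origin_ms: float) -> float:
    """Hit rate at which the cached path equals the uncached one.

    Derived from cache_ms + (1 - h) * origin_ms == origin_ms.
    A result above 1.0 means no hit rate can make the cache pay off.
    """
    return cache_ms / origin_ms


if __name__ == "__main__":
    # Expensive origin (80 ms) behind a 1 ms cache with a 90% hit rate.
    print(cached_latency_ms(0.90, 1.0, 80.0))
    # Warm primary-key lookup (0.3 ms) behind the same cache, 99% hit rate.
    print(cached_latency_ms(0.99, 1.0, 0.3))
    print(break_even_hit_rate(1.0, 0.3))
```

With an 80 ms origin the cached path averages roughly 9 ms at a 90% hit rate. With the 0.3 ms warm lookup, the break-even hit rate comes out above 1.0, so the cache is slower at every achievable hit rate.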

The senior framing: a cache is worth it only when origin_latency is large, the hit ratio is high, and the data tolerates being slightly stale. Miss any of the three and you are adding moving parts, a new failure domain, and an invalidation problem in exchange for nothing. The most expensive caches in production are the ones with a 40% hit ratio that everyone is afraid to remove.
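
The eviction effect mentioned in this section, where one-hit wonders push out genuinely hot keys, can be reproduced with a toy LRU cache. This is a sketch under stated assumptions: a 100-entry LRU, 50 hot keys read round-robin, and three never-repeated cold keys after each hot read. The class and function names are hypothetical, and real caches use other eviction and admission policies.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache that only tracks keys and counts hits."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: OrderedDict = OrderedDict()
        self.hits = 0
        self.lookups = 0

    def access(self, key) -> None:
        self.lookups += 1
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return
        self.items[key] = True           # miss: admit the key
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used


def hit_ratio(keys, capacity: int = 100) -> float:
    cache = LRUCache(capacity)
    for key in keys:
        cache.access(key)
    return cache.hits / cache.lookups


def hot_only_workload(n: int = 10_000):
    """50 hot keys, read round-robin."""
    return [("hot", i % 50) for i in range(n)]


def polluted_workload(n: int = 10_000):
    """Same hot reads, each followed by three cold keys that never repeat."""
    keys = []
    for i in range(n):
        keys.append(("hot", i % 50))
        keys.extend(("cold", i, j) for j in range(3))
    return keys


if __name__ == "__main__":
    print(hit_ratio(hot_only_workload()))
    print(hit_ratio(polluted_workload()))
```

In this setup the hot-only workload hits almost every time, while the polluted one never hits: by the time a hot key comes around again, the cold keys have already evicted it. Caching more things made the cache useless for the keys that mattered.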

Pick the best fit

A read endpoint does a 3-table join that takes 80 ms, called 50x/sec, and the data changes a few times a day. Where should the cache go?

The senior failure mode: wrong layer and double-caching

The dangerous bugs are not cache misses — they are caches that lie. Double-caching is the classic: a value is cached in Redis and the response is cached at the reverse proxy/CDN, each with its own TTL and its own invalidation. A write clears Redis but the CDN still serves the old HTML for its 5-minute TTL — or the reverse, the CDN is purged but the proxy in front of it keeps serving its stale copy. Now you have stale-on-stale: two independent layers out of sync, and cf-cache-status in DevTools shows you only one of them. Debugging means walking the request layer by layer (X-Cache, Cache-Status, proxy logs, Redis hits) to find which copy is lying.
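
A small simulation makes the stale-on-stale sequence concrete. It is a sketch, not a model of any real CDN or Redis client: `TTLCache`, `read`, and `demo` are hypothetical names, time is a plain integer passed in by the caller, and the TTLs (300 s at the edge, 60 s in the app cache) are illustrative.

```python
class TTLCache:
    """In-memory cache whose entries expire after a fixed TTL (in seconds)."""

    def __init__(self, ttl_s: int) -> None:
        self.ttl_s = ttl_s
        self.entries = {}  # key -> (value, expires_at)

    def get(self, key, now: int):
        entry = self.entries.get(key)
        if entry is None or now >= entry[1]:
            return None
        return entry[0]

    def put(self, key, value, now: int) -> None:
        self.entries[key] = (value, now + self.ttl_s)

    def invalidate(self, key) -> None:
        self.entries.pop(key, None)


def read(key, now, edge, app_cache, db):
    """Walk the layers in request order; report which one answered."""
    value = edge.get(key, now)
    if value is not None:
        return value, "edge"
    value = app_cache.get(key, now)
    source = "app-cache"
    if value is None:
        value = db[key]
        app_cache.put(key, value, now)
        source = "origin"
    edge.put(key, value, now)  # the edge caches whatever it was given
    return value, source


def demo():
    db = {"price": "10"}
    edge = TTLCache(ttl_s=300)
    app_cache = TTLCache(ttl_s=60)

    first = read("price", 0, edge, app_cache, db)    # fills both layers

    db["price"] = "12"                # the write...
    app_cache.invalidate("price")     # ...clears only the app cache

    stale = read("price", 10, edge, app_cache, db)   # edge still answers
    fresh = read("price", 301, edge, app_cache, db)  # edge TTL has expired
    return first, stale, fresh


if __name__ == "__main__":
    for result in demo():
        print(result)
```

The write path did everything it knew about, yet the second read still returns the old value from the edge until that layer's own TTL runs out. Returning the answering layer alongside the value is the same idea as reading `X-Cache` or `Cache-Status` on a real response.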

The wrong-layer failure is the one from the opening story: fronting a fast origin with a slow cache. It also shows up as caching at a layer that cannot see the invalidation event or cannot tell consumers apart: caching personalized data at the CDN (which is keyed by URL and can’t tell users apart, so it leaks one user’s data to another), or caching at the edge data that changes per request. The rule a senior internalizes: cache as close to the consumer as the data’s volatility allows, and never let two layers cache the same thing with independent lifetimes. One owner per cached fact, one invalidation path.
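
The personalized-data leak comes down to the cache key. The sketch below uses a hypothetical `EdgeCache` with a toggle for whether the user is part of the key; `render_account_page` stands in for the origin. It illustrates the failure mode only and does not reflect how any particular CDN is configured.

```python
def render_account_page(user: str) -> str:
    """Stand-in for the origin: the response depends on who is asking."""
    return f"Account page for {user}"


class EdgeCache:
    """Toy shared cache; optionally includes the user in the cache key."""

    def __init__(self, key_includes_user: bool) -> None:
        self.key_includes_user = key_includes_user
        self.store = {}

    def cache_key(self, url: str, user: str):
        return (url, user) if self.key_includes_user else (url,)

    def fetch(self, url: str, user: str) -> str:
        key = self.cache_key(url, user)
        if key not in self.store:
            self.store[key] = render_account_page(user)  # miss: go to origin
        return self.store[key]


if __name__ == "__main__":
    url_only = EdgeCache(key_includes_user=False)
    print(url_only.fetch("/account", "alice"))
    print(url_only.fetch("/account", "bob"))    # served alice's cached page

    per_user = EdgeCache(key_includes_user=True)
    print(per_user.fetch("/account", "alice"))
    print(per_user.fetch("/account", "bob"))
```

With a URL-only key, the second request is answered from the first user's cached response. Adding the user to the key fixes the leak but also cuts the hit ratio to one entry per user, which is often a sign that the response should not be stored in a shared cache at all (in HTTP terms, marked `Cache-Control: private`).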

Quiz

You wrap a Postgres primary-key lookup (warm, ~0.3 ms from the buffer pool) in Redis. What happens to latency?

Quiz

A cache layer has a 35% hit ratio fronting an 80 ms origin, with a 1 ms cache lookup. Versus no cache, the average latency is:

Order the steps

Order the caching layers from fastest access to slowest:

  1. L1 CPU cache (~1 ns)
  2. L3 CPU cache (~10–40 ns)
  3. RAM / OS page cache (~100 ns)
  4. SSD (~100 µs)
  5. Application cache over network: Redis in-region (~1 ms)
  6. CDN edge for a far user (100+ ms)

Recall before you leave
  1. Explain why adding a cache can make latency worse, using the access-latency numbers.
  2. What is double-caching and why does it produce stale-on-stale bugs that are hard to debug?

Recap

Caching is a ladder of layers (L1/L2/L3, RAM, OS page cache, application caches like Redis, reverse proxies and CDNs), each one larger, cheaper, and slower than the one above. The latency numbers are the whole point: roughly 1 ns at L1, 100 ns at RAM, 100 µs at SSD, ~1 ms for an in-region Redis round trip, 100+ ms for a far CDN hop, with orders of magnitude between rungs. A cache pays off only when it fronts a genuinely slower origin, the hit ratio is high, and the data tolerates staleness, because a miss pays both the cache lookup and the origin fetch, so a low enough hit ratio is worse than no cache. The senior failure modes are caching at the wrong layer (fronting a RAM-speed origin with a network-speed cache, or caching personalized data at a URL-keyed CDN) and double-caching, where two layers hold the same fact with independent TTLs and drift into stale-on-stale bugs that surface only when you walk the request layer by layer. Cache as close to the consumer as the data’s volatility allows, give every cached fact a single owner and a single invalidation path, and check the origin’s real latency before you reach for Redis.


Trademarks belong to their respective owners. Editorial reference only.