
Caching layers: from L1 to CDN, and where the wrong layer bites

Every layer from CPU cache to CDN trades latency for distance and cost. Caching at the wrong layer, double-caching, or fronting a fast origin with a slow cache all make systems worse, not faster.


A team adds Redis in front of a Postgres query to “speed it up.” Latency gets worse. The query was a primary-key lookup hitting a warm buffer pool: Postgres answered in ~0.3 ms from RAM. The new path serialized a key, made a network round trip to Redis (~1 ms in-region), deserialized the reply, and only then knew whether it had a hit; on a miss it still had to query Postgres. They had wrapped a RAM-speed origin in a network-speed cache. The cache was the bottleneck it was supposed to remove.

In ten minutes you’ll know exactly how to read the latency numbers, spot the layer mismatch before it ships, and walk away with the three questions that gate every caching decision.

The hierarchy is a latency ladder

Caching is not one thing — it is a stack of layers, each one bigger, cheaper, and slower than the one above it. The whole reason the stack exists is that fast storage is small and expensive, and large storage is slow and cheap. Each layer holds the hot subset of the layer below it so most accesses never reach the slow tier. The numbers are the entire point: a senior carries them in their head because they decide whether a cache helps at all.

The orders of magnitude matter more than the exact figures. L1 is roughly 100x faster than RAM; RAM is roughly 1000x faster than an SSD; an SSD is roughly 1000x faster than a cross-continent network hop. When you add a cache layer, you are betting that the layer you front is slower than the cache by a wide margin. Front something already faster, and you have added latency, not removed it.

Layer	Typical access	Size / scope	Owned by
L1 CPU cache	~1 ns	~32–64 KB / core	Hardware
L2 CPU cache	~4 ns	~256 KB–1 MB / core	Hardware
L3 CPU cache	~10–40 ns	~8–64 MB / socket	Hardware
RAM	~100 ns	GBs / machine	OS + your process
OS page cache	~100 ns (RAM hit)	free RAM	Kernel
SSD	~100 µs	TBs / machine	OS
App cache (Redis/memcached)	~0.2–1 ms (in-region)	GBs–TBs, shared	You
Reverse proxy / CDN edge	~ms (edge), 100+ ms (far)	global, per-POP	You / provider
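To make the ladder concrete, here is a small Python sketch that computes the slowdown between adjacent rungs. The single values are illustrative picks from the table's rows (an assumption, since real hardware and networks vary), and `LADDER_NS`, `step_ratios`, and `cache_can_help` are hypothetical names used only for this example.

```python
from typing import Dict, List, Tuple

# Representative single values in nanoseconds, picked from the table's rows.
# Illustrative only: real latencies vary by hardware, load, and geography.
LADDER_NS: Dict[str, int] = {
    "L1 CPU cache": 1,
    "RAM": 100,
    "SSD": 100_000,                     # ~100 µs
    "Redis (in-region)": 1_000_000,     # ~1 ms network round trip
    "CDN edge, far user": 100_000_000,  # ~100 ms
}


def step_ratios(ladder: Dict[str, int]) -> List[Tuple[str, str, float]]:
    """Return (faster, slower, slowdown factor) for each adjacent pair of rungs."""
    names = list(ladder)  # dicts preserve insertion order
    return [(a, b, ladder[b] / ladder[a]) for a, b in zip(names, names[1:])]


def cache_can_help(cache_ns: int, origin_ns: int) -> bool:
    """A hit only saves time if it is faster than going straight to the origin."""
    return cache_ns < origin_ns


for faster, slower, factor in step_ratios(LADDER_NS):
    print(f"{slower} is ~{factor:,.0f}x slower than {faster}")

# The opening story: ~1 ms Redis in front of a ~0.3 ms warm Postgres lookup.
print(cache_can_help(LADDER_NS["Redis (in-region)"], 300_000))  # False
```

Note that in-region Redis sits only about 10x above an SSD in this picture, which is why a network cache in front of local storage or RAM is a bet worth checking before you make it.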

What each layer is actually for

The hardware caches (L1/L2/L3) and RAM are not yours to manage — the CPU and OS run them automatically. They still shape your code: a cache-friendly access pattern (sequential, locality-rich) can run an order of magnitude faster than a pointer-chasing one, because the cache line you need is already loaded. That is why a flat array beats a linked list for iteration even with identical big-O.

The OS page cache is the most underrated layer. When you read a file, the kernel keeps its pages in free RAM; the second read is a memory hit, not a disk hit. This is why your database is fast on hot data without you doing anything — Postgres’s buffer pool plus the page cache mean a “disk” read is usually a RAM read. A senior knows this before reaching for Redis: if the working set fits in RAM, the database is already a RAM cache.

Application caches (Redis, memcached) are the first layer you operate. They are a shared, network-attached RAM store: any app instance can hit the same cache, which local in-process memory cannot do. The cost is the network round trip, typically sub-millisecond in-region but 100+ ms if your client and cache are on different continents. Reverse proxies and CDNs push cached responses out toward the user, so a hit never reaches your origin at all. The CDN’s whole value is geographic: serving a French user from a Paris POP in milliseconds instead of crossing the Atlantic.

Why this works

“Just add Redis” is the reflex that skips the question that matters: what is the origin’s actual latency? If the data already lives in your process memory, or in the OS page cache, or in a warm DB buffer pool, the origin is RAM-speed and a network cache can only slow it down. Redis earns its place when the origin is genuinely expensive — a multi-join aggregate, an external API, a cold-disk scan — not when it fronts a hot key lookup.

Hit ratio is the only number that justifies a cache

Before you accept a “just add a cache” proposal, ask: what is the actual hit ratio, and what does the math say? The answer often kills the idea.

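A cache that misses pays both costs: the cache lookup and the origin fetch. So the average latency of a cached path is hit_rate × cache_latency + miss_rate × (cache_latency + origin_latency), which simplifies to cache_latency + miss_rate × origin_latency. Run the math and there is a break-even hit ratio, cache_latency / origin_latency, below which the cache is strictly worse than no cache, because you pay the cache tax on every miss for nothing. The closer the origin’s latency is to the cache’s, the higher that break-even point; when the origin is faster than the cache, no hit ratio can save it. This is also why caching more data is not the same as raising the hit ratio: admitting cold, rarely reused data evicts the genuinely hot keys to make room for one-hit wonders, which lowers the hit ratio where it matters and raises overall latency.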

With a 1 ms cache lookup in front of an 80 ms origin, a miss pays both costs, so a 0% hit ratio (81 ms) is worse than no cache (80 ms). The cache only wins once the hit ratio clears the break-even point, here 1 ms / 80 ms, or about 1.25%.
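The formula is easy to check by hand, but a few lines of Python make it reusable. This is a minimal sketch: `avg_latency_ms` and `break_even_hit_rate` are hypothetical helper names, and the inputs are the illustrative numbers from this lesson, not measurements.

```python
def avg_latency_ms(hit_rate: float, cache_ms: float, origin_ms: float) -> float:
    """Average latency of a cache-aside read path.

    A hit pays only the cache lookup; a miss pays the lookup AND the origin fetch.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    return hit_rate * cache_ms + miss_rate * (cache_ms + origin_ms)


def break_even_hit_rate(cache_ms: float, origin_ms: float) -> float:
    """Hit rate at which the cached path ties the uncached path.

    Solving cache_ms + (1 - h) * origin_ms == origin_ms gives h = cache_ms / origin_ms.
    A value above 1.0 means no hit rate can make the cache faster than the origin.
    """
    return cache_ms / origin_ms


# An 80 ms origin behind a 1 ms cache lookup.
for h in (0.0, 0.5, 0.9):
    print(f"hit rate {h:.0%}: {avg_latency_ms(h, 1.0, 80.0):.1f} ms vs 80.0 ms uncached")
print(f"break-even hit rate: {break_even_hit_rate(1.0, 80.0):.2%}")

# The opening story: a 1 ms Redis hop in front of a ~0.3 ms warm Postgres lookup.
print(f"even at 100% hits: {avg_latency_ms(1.0, 1.0, 0.3):.1f} ms vs 0.3 ms uncached")
print(f"break-even hit rate: {break_even_hit_rate(1.0, 0.3):.0%}")  # over 100%: never wins
```

A break-even above 100% is the numeric signature of the wrong-layer mistake: the cache loses even if it never misses.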

The senior framing: a cache is worth it only when origin_latency is large, the hit ratio is high, and the data tolerates being slightly stale. Miss any of the three and you are adding moving parts, a new failure domain, and an invalidation problem in exchange for nothing. When you audit a cache and find a 40% hit ratio that everyone is afraid to remove, you have found the most expensive kind — one that costs complexity but earns almost nothing.
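The three gates can be folded into one check. Here is a minimal Python sketch, assuming you have measured origin and cache latencies and an honest hit-rate estimate. `CacheProposal`, `worth_caching`, and the 2x `min_speedup` bar are hypothetical choices for illustration, not an established rule; a different bar would change borderline verdicts.

```python
from dataclasses import dataclass


@dataclass
class CacheProposal:
    origin_ms: float           # measured origin latency, not a guess
    cache_ms: float            # round trip to the cache on a hit
    hit_rate: float            # measured or honestly estimated, 0..1
    tolerates_staleness: bool  # can readers accept a slightly old value?


def worth_caching(p: CacheProposal, min_speedup: float = 2.0) -> bool:
    """Apply the three gates. min_speedup is a hypothetical bar, not a standard."""
    # Staleness gate: if readers cannot accept an old value, the cache becomes an
    # invalidation problem you must solve perfectly. Treat that as a "no" here.
    if not p.tolerates_staleness:
        return False
    # Origin-latency and hit-ratio gates together: hits cost cache_ms, misses
    # cost cache_ms + origin_ms, so the expected latency is
    # cache_ms + (1 - hit_rate) * origin_ms.
    expected_ms = p.cache_ms + (1.0 - p.hit_rate) * p.origin_ms
    # Demand a real margin over going straight to the origin.
    return p.origin_ms / expected_ms >= min_speedup


hook = CacheProposal(origin_ms=0.3, cache_ms=1.0, hit_rate=0.99, tolerates_staleness=True)
lukewarm = CacheProposal(origin_ms=80.0, cache_ms=1.0, hit_rate=0.40, tolerates_staleness=True)
slow_api = CacheProposal(origin_ms=200.0, cache_ms=1.0, hit_rate=0.90, tolerates_staleness=True)
must_be_fresh = CacheProposal(origin_ms=200.0, cache_ms=1.0, hit_rate=0.90, tolerates_staleness=False)

for name, p in [("hook", hook), ("lukewarm", lukewarm),
                ("slow_api", slow_api), ("must_be_fresh", must_be_fresh)]:
    print(name, worth_caching(p))
```

The 40% case does cut average latency (about 1.6x here), but under this illustrative bar it does not earn its extra failure domain and invalidation path.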

Pick the best fit

A read endpoint does a 3-table join that takes 80 ms, called 50x/sec, and the data changes a few times a day. Where should the cache go?

The senior failure mode: wrong layer and double-caching

The dangerous bugs are not cache misses; they are caches that lie. Double-caching is the classic: a value is cached in Redis and the response is cached at the reverse proxy or CDN, each with its own TTL and its own invalidation. A write clears Redis, but the CDN keeps serving the old HTML for the rest of its 5-minute TTL. Or the reverse: the CDN is purged, but the reverse proxy behind it still holds a stale copy, so the CDN refetches and re-caches the old response. Now you have stale-on-stale: two independent layers out of sync, and a single header such as Cloudflare’s cf-cache-status in DevTools tells you about only one of them. Debugging means walking the request layer by layer (X-Cache, Cache-Status, proxy logs, Redis hits) to find which copy is lying.
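The first double-caching failure can be reproduced deterministically in a few lines of Python. This is a simulation, not real client code: `TTLCache`, `app_cache` (standing in for Redis), `edge_cache` (standing in for the proxy or CDN), and the `price:42` key are hypothetical, and time is a plain number of seconds passed in explicitly.

```python
from typing import Dict, Optional, Tuple


class TTLCache:
    """Minimal TTL cache; time is passed in explicitly so the demo is deterministic."""

    def __init__(self, ttl_s: float) -> None:
        self.ttl_s = ttl_s
        self._data: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str, now: float) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:  # expired: drop it and report a miss
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: str, now: float) -> None:
        self._data[key] = (value, now + self.ttl_s)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


db = {"price:42": "$10"}
app_cache = TTLCache(ttl_s=3600)  # stands in for Redis; purged on every write
edge_cache = TTLCache(ttl_s=300)  # stands in for the proxy/CDN; 5-minute TTL only


def read(key: str, now: float) -> str:
    cached = edge_cache.get(key, now)  # outer layer answers first; the app never sees it
    if cached is not None:
        return cached
    value = app_cache.get(key, now)
    if value is None:
        value = db[key]
        app_cache.set(key, value, now)
    edge_cache.set(key, value, now)  # a second, independent copy with its own TTL
    return value


def write(key: str, value: str) -> None:
    db[key] = value
    app_cache.delete(key)  # the write path knows about Redis...
    # ...but nothing purges the edge copy, so it lives out its TTL.


before = read("price:42", now=0)   # fills both layers with "$10"
write("price:42", "$12")
during = read("price:42", now=60)  # edge still serves the old "$10"
after = read("price:42", now=301)  # edge entry expired, fresh "$12"
print(before, during, after)       # $10 $10 $12
```

Shortening the edge TTL only narrows the stale window. The structural fix is the one this section argues for: one layer owns the cached value, with one invalidation path.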

The wrong-layer failure is the one from the opening story: fronting a fast origin with a slow cache. It also shows up as caching at a layer that cannot see what the data depends on or when it changes: caching personalized data at the CDN (which is keyed by URL and can’t tell users apart, so it leaks one user’s data to another), or caching data at the edge that changes on every request. The rule a senior internalizes: cache as close to the consumer as the data’s volatility allows, and never let two layers cache the same thing with independent lifetimes. One owner per cached fact, one invalidation path.
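Here is a Python sketch of the personalized-data leak, assuming a shared cache whose key is the URL alone. `origin`, `UrlKeyedEdge`, and the `/account` and `/pricing` pages are hypothetical. The one real piece is the HTTP `Cache-Control: private` directive, which tells shared caches such as CDNs and reverse proxies not to store a response meant for a single user.

```python
from typing import Dict, Tuple


def origin(url: str, user: str) -> Tuple[str, str]:
    """Hypothetical origin returning (body, Cache-Control header value)."""
    if url == "/account":
        # Personalized page: "private" forbids shared caches from storing it.
        return f"<h1>Account of {user}</h1>", "private, max-age=0"
    return "<h1>Pricing</h1>", "public, max-age=300"


class UrlKeyedEdge:
    """A shared cache keyed on the URL alone, so it cannot tell users apart."""

    def __init__(self, honor_private: bool) -> None:
        self.honor_private = honor_private
        self.store: Dict[str, str] = {}

    def handle(self, url: str, user: str) -> str:
        if url in self.store:  # the user is not part of the key
            return self.store[url]
        body, cache_control = origin(url, user)
        directives = {d.strip() for d in cache_control.split(",")}
        if not (self.honor_private and "private" in directives):
            self.store[url] = body
        return body


naive = UrlKeyedEdge(honor_private=False)
naive.handle("/account", "alice")
leaked = naive.handle("/account", "bob")  # Bob is served Alice's page

safe = UrlKeyedEdge(honor_private=True)
safe.handle("/account", "alice")
correct = safe.handle("/account", "bob")  # rendered fresh for Bob
print(leaked)
print(correct)
```

Marking the response `private` keeps it out of the shared layer entirely, while public pages such as `/pricing` can still be cached at the edge.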

Quiz

You wrap a Postgres primary-key lookup (warm, ~0.3 ms from the buffer pool) in Redis. What happens to latency?

Quiz

A cache layer has a 35% hit ratio fronting an 80 ms origin, with a 1 ms cache lookup. Versus no cache, the average latency is:

Order the steps

Order the caching layers from fastest access to slowest:

1 L1 CPU cache (~1 ns)
2 L3 CPU cache (~10–40 ns)
3 RAM / OS page cache (~100 ns)
4 SSD (~100 µs)
5 Application cache over network — Redis in-region (~1 ms)
6 CDN edge for a far user (100+ ms)


Fastest at the top. Each rung is slower than the one above, often by one or more orders of magnitude, and a cache only removes latency when it fronts a rung slower than itself.

Recall before you leave

01
Explain why adding a cache can make latency worse, using the access-latency numbers.
02
What is double-caching and why does it produce stale-on-stale bugs that are hard to debug?

Recap

Caching is a ladder of layers (L1/L2/L3, RAM, OS page cache, application caches like Redis, reverse proxies and CDNs), each one larger, cheaper, and slower than the one above. The latency numbers are the whole point: roughly 1 ns at L1, 100 ns at RAM, 100 µs at SSD, ~1 ms for an in-region Redis round trip, 100+ ms for a far CDN hop, with orders of magnitude between rungs. A cache pays off only when it fronts a genuinely slower origin, the hit ratio is high, and the data tolerates staleness. A miss pays both the cache lookup and the origin fetch, so below the break-even hit ratio (cache latency divided by origin latency) the cache is worse than no cache at all. The senior failure modes are caching at the wrong layer (fronting a RAM-speed origin with a network-speed cache, or caching personalized data at a URL-keyed CDN) and double-caching, where two layers hold the same fact with independent TTLs and drift into stale-on-stale bugs that surface only when you walk the request layer by layer. Now when you hear “just add a cache,” you know to measure the origin’s real latency first, check the three gates, and give every cached fact a single owner and a single invalidation path.
