Caching
Detecting stampedes and designing TTL for production
A team deploys single-flight and locks. Three weeks later an on-call alert fires: DB CPU spiking every 5 minutes. The protection is in place — but it is protecting the wrong keys. The spike is coming from a different key with TTL=300 s that nobody instrumented.
The observability fingerprint
Cache stampedes leave a distinctive signature in metrics:
- DB query rate: a sawtooth pattern — near zero between TTL boundaries, a sharp spike at each boundary. The spike width equals the rebuild duration.
- Periodicity: spikes recur at intervals matching the TTL. TTL=60 s → spikes every 60 s. TTL=300 s → spikes every 5 minutes.
- p99 latency: spikes at the same intervals as the DB query rate.
- cache_miss_total rate: sharp periodic increases at boundaries instead of a smooth low baseline.
Without these instrumented, a stampede looks like a general “DB slowness” incident with no obvious cause.
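The periodicity check described above can be sketched in a few lines. This is a minimal illustration, not a production detector: `spike_period`, the 5× spike threshold, and the 10% gap tolerance are all assumptions made for the example, and the input is assumed to be a per-second DB query count.

```python
from statistics import median

def spike_period(qps, factor=5.0):
    """Return the typical gap (in samples) between spike onsets, or None.

    Hypothetical helper: `factor` and the 10% tolerance are illustrative choices.
    """
    baseline = max(median(qps), 1.0)
    threshold = factor * baseline
    # A spike onset is a sample above the threshold whose predecessor was not.
    onsets = [
        i for i, v in enumerate(qps)
        if v > threshold and (i == 0 or qps[i - 1] <= threshold)
    ]
    if len(onsets) < 3:
        return None
    gaps = [b - a for a, b in zip(onsets, onsets[1:])]
    period = median(gaps)
    # Report a period only if every gap is within 10% of the median gap.
    if all(abs(g - period) <= 0.1 * period for g in gaps):
        return period
    return None

# Synthetic series: 5 QPS baseline with a 3-second burst of 400 QPS every 60 s.
series = [400 if t % 60 < 3 else 5 for t in range(600)]
print(spike_period(series))  # 60
```

If the returned period matches a TTL configured somewhere on the hot path, that key is the first place to look.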
Minimum-viable dashboard
Six metrics cover all stampede scenarios:
| Metric | Alert condition | What it signals |
|---|---|---|
| cache_miss_total rate | Periodic spikes > 5× steady-state | Stampede in progress |
| db_query_rate p99 | Sawtooth pattern | Downstream stampede from cache boundaries |
| cache_rebuild_duration_seconds p99 | Long tail at boundaries | Rebuild contention |
| cache_lock_wait_seconds p99 | Above rebuild p99 | Lock queue building: waiter starvation |
| singleflight_subscriber_count p99 | > 1 (coalescing active) | Single-flight firing, normal under load |
| cache_swr_stale_serve_total | Non-zero during boundaries | SWR is absorbing expiry, expected |
Alert 1: cache_miss_total rate above 5× steady-state → stampede forming.
Alert 2: db_query_rate p99 above 10× p50 → sawtooth DB load → boundary spikes.
Alert 3: cache_lock_wait_seconds p99 above rebuild duration → lock queue depth growing.
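As a rough sketch, the three alert conditions can be written as plain comparisons over a metrics snapshot. The `Snapshot` type, its field names, and the alert labels are hypothetical, introduced only for this example; a real setup would express the same thresholds in the alerting system's own rule language.

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    """Hypothetical point-in-time view of the dashboard metrics."""
    miss_rate: float           # current cache_miss_total rate (misses/s)
    miss_rate_baseline: float  # steady-state miss rate (misses/s)
    db_qps_p99: float          # db_query_rate p99
    db_qps_p50: float          # db_query_rate p50
    lock_wait_p99: float       # cache_lock_wait_seconds p99
    rebuild_p99: float         # cache_rebuild_duration_seconds p99

def firing_alerts(s: Snapshot) -> list[str]:
    """Return the labels of the alerts whose condition currently holds."""
    alerts = []
    if s.miss_rate > 5 * s.miss_rate_baseline:   # Alert 1
        alerts.append("stampede_forming")
    if s.db_qps_p99 > 10 * s.db_qps_p50:         # Alert 2
        alerts.append("sawtooth_db_load")
    if s.lock_wait_p99 > s.rebuild_p99:          # Alert 3
        alerts.append("lock_queue_growing")
    return alerts

boundary = Snapshot(600, 100, 4000, 50, 0.8, 0.3)
print(firing_alerts(boundary))
```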
TTL jitter
Single-flight and locks handle per-key stampedes. But what if 1,000 keys all have TTL=300 s and they were all cached at the same time? They all expire together, producing 1,000 simultaneous per-key stampedes — single-flight correctly handles each one, but the sum of 1,000 concurrent rebuilds is a DB spike.
TTL jitter: instead of a fixed TTL, use a random value in a range:
# rand() is uniform in [0, 1), so (2 * rand() - 1) is uniform in [-1, 1)
ttl = base_ttl * (1 + jitter_fraction * (2 * rand() - 1))
# Example: base=300, jitter=0.25 → TTL in range [225, 375]
A fleet of 1,000 keys with ±25% jitter spreads expiries over 150 s instead of all firing at once. DB load becomes a smooth low curve instead of a spike.
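Jitter is usually applied in application code. Redis accepts whatever TTL the client passes, so the client computes the jittered value before writing. Caffeine (Java) allows a per-entry expiry duration through its Expiry interface. A spread of ±15–25% is sufficient for most workloads.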
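A minimal sketch of jittered TTL calculation, written so that `jitter_fraction=0.25` yields the ±25% range from the example (225 s to 375 s for a 300 s base). The function name and the `rng` parameter are assumptions made for illustration.

```python
import random

def jittered_ttl(base_ttl: float, jitter_fraction: float = 0.25, rng=random) -> float:
    """Draw a TTL uniformly from base_ttl * (1 ± jitter_fraction).

    Hypothetical helper; `rng` is anything with a random() method returning [0, 1).
    """
    if not 0 <= jitter_fraction < 1:
        raise ValueError("jitter_fraction must be in [0, 1)")
    # 2 * random() - 1 maps [0, 1) onto [-1, 1), giving a symmetric spread.
    return base_ttl * (1 + jitter_fraction * (2 * rng.random() - 1))

# 1,000 keys cached at the same moment now expire across a ~150 s window.
ttls = [jittered_ttl(300) for _ in range(1000)]
print(round(min(ttls)), round(max(ttls)))
```

The result would be passed as the expiry when the entry is written, for example as the `EX` value of a Redis `SET`, rounded to whole seconds.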
Negative caching
The same stampede shape applies when the database answers “no such row.” If the application does not cache null results, every request for a non-existent key hits the DB — and under high concurrency this is a miss-storm that overloads the DB as fully as a positive-key stampede.
Fix: cache the “missing” sentinel with a short TTL.
# On DB miss: store an empty-string sentinel under the same key that reads use
SET key "" EX 10 # 10 s negative TTL
# On read:
val = GET key
if val == "":
    return NOT_FOUND # from cache, no DB hit
A short negative TTL (5–30 s) bounds memory churn. The positive TTL can be much longer (60–300 s). Write-through invalidation must delete the negative entry when a real row is inserted.
Security note: without negative caching, an attacker can mint random non-existent keys (random UUIDs in a URL path) to amplify DB load by orders of magnitude — a documented pattern that affected CDN-backed sites in 2024.
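The negative-caching flow can be sketched with an in-memory dict standing in for the cache server. Everything here is an illustrative assumption: the `NegativeCache` class, the `load` callback (returning `None` for "no such row"), and the injected `clock`. The TTL defaults follow the ranges given above.

```python
import time

_MISSING = object()  # sentinel: "the DB said this row does not exist"

class NegativeCache:
    """Hypothetical read-through cache that also remembers misses."""

    def __init__(self, load, positive_ttl=300.0, negative_ttl=10.0, clock=time.monotonic):
        self._load = load                  # DB lookup; returns None if the row is absent
        self._positive_ttl = positive_ttl
        self._negative_ttl = negative_ttl
        self._clock = clock
        self._entries = {}                 # key -> (value or _MISSING, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            value = entry[0]
            return None if value is _MISSING else value  # served from cache
        value = self._load(key)
        ttl = self._negative_ttl if value is None else self._positive_ttl
        stored = _MISSING if value is None else value
        self._entries[key] = (stored, self._clock() + ttl)
        return value

    def on_insert(self, key):
        """Write path: drop any cached entry so a newly inserted row is visible."""
        self._entries.pop(key, None)

db, db_calls = {}, []

def load_user(key):
    db_calls.append(key)
    return db.get(key)

cache = NegativeCache(load_user)
cache.get("user:404")
cache.get("user:404")
print(len(db_calls))  # 1: the second lookup was answered by the negative entry
```

A dedicated sentinel object avoids confusing "row is absent" with a legitimately empty value, which an empty-string marker cannot distinguish.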
Pre-warming after restarts
A cache that restarts cold (deploy, eviction, machine failure) starts empty. Every incoming request misses and hits the DB — a full-traffic origin spike. If restarted at peak traffic, this spike is equal in magnitude to a full stampede.
Pre-warming procedure:
- Before accepting public traffic, replay the top-N most-accessed keys from an audit log or access log.
- Warm the cache first, then cut traffic over to it.
- For blue-green deploys: warm the green cache instance to steady-state before switching the load balancer.
Cloudflare edges pre-warm from neighbouring POPs. Redis-backed services use a startup script reading “top 1,000 keys” from an audit table. The rule: never restart a cache under live traffic without pre-warming.
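The procedure above can be sketched as a warm-up function that doubles as a readiness gate. The names `prewarm`, `load`, and `min_fill`, and the 95% fill threshold, are assumptions for illustration; the list of top keys is assumed to come from an access log or audit table as described.

```python
def prewarm(cache: dict, top_keys: list, load, min_fill: float = 0.95) -> bool:
    """Load top_keys into cache; return True if it is warm enough to take traffic.

    Hypothetical helper: `load` fetches one key from the origin and may raise.
    """
    warmed = 0
    for key in top_keys:
        try:
            value = load(key)
        except Exception:
            continue  # skip failed keys; the fill ratio below decides readiness
        if value is not None:
            cache[key] = value
            warmed += 1
    return not top_keys or warmed / len(top_keys) >= min_fill

# Example: warm the new (green) cache before switching the load balancer.
green_cache = {}
top_keys = [f"product:{i}" for i in range(1000)]
ready = prewarm(green_cache, top_keys, load=lambda k: {"id": k})
print(ready, len(green_cache))
```

A deploy script would route traffic only when the function returns `True`, which turns "warm the cache first" into an enforced gate rather than a checklist item.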
Why this works
Pre-warming is the most common missed step in cache tier upgrade runbooks. Teams test the lock and SWR logic, but neglect the cold-start window. The cold-start stampede is usually 3–10× worse than a normal TTL-boundary stampede because 100% of keys are cold simultaneously. Runbooks must include a “warm the new cache before routing traffic” step as a hard gate.
A service's DB query rate shows sharp spikes every 60 seconds with near-zero load between spikes. What is the most likely cause?
A cache stores 5,000 product pages, all cached at the same moment with TTL=300 s. What happens at second 300, even with single-flight protection?
Order the steps to diagnose and fix a 60-second-periodic DB spike:
- 1 Check DB query rate over time — confirm sawtooth pattern with 60-second periodicity
- 2 Identify the cache key(s) with TTL=60 s on the hot path
- 3 Deploy in-process single-flight as the first mitigation
- 4 Verify spikes drop from 4,000 QPS to 50 QPS (one rebuild per node × 50 nodes)
- 5 Add TTL jitter ±20% to desynchronise future expiries
- 6 Add dashboard alert: cache_miss_total rate above 5× steady-state
- 7 Run a synthetic stampede test in CI: inject 5,000 misses, assert DB query count stays under threshold
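The synthetic stampede test in the last step might look like the sketch below. It is an in-process illustration under stated assumptions: `SingleFlight` is a hand-written stand-in rather than any particular library, the slow query is simulated with `time.sleep`, and the threshold (exactly one DB call for one hot key) is chosen for the example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class SingleFlight:
    """Coalesce concurrent loads of the same key into one call (illustrative)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> shared call record

    def do(self, key, fn):
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = {"done": threading.Event(), "value": None, "error": None}
                self._inflight[key] = call
        if leader:
            try:
                call["value"] = fn()
            except Exception as exc:
                call["error"] = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                call["done"].set()  # wake every waiter
        else:
            call["done"].wait()
        if call["error"] is not None:
            raise call["error"]
        return call["value"]

def run_stampede(requests=5000, workers=64, rebuild_seconds=0.05):
    """Fire `requests` concurrent misses at one key; return (db_calls, results)."""
    cache, flights, db_calls = {}, SingleFlight(), []

    def rebuild():
        if "hot" in cache:           # re-check: an earlier flight may have filled it
            return cache["hot"]
        db_calls.append(1)           # count one simulated DB query
        time.sleep(rebuild_seconds)  # simulated slow rebuild
        cache["hot"] = "value"
        return "value"

    def request(_):
        hit = cache.get("hot")
        return hit if hit is not None else flights.do("hot", rebuild)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(request, range(requests)))
    return len(db_calls), results

calls, results = run_stampede()
assert calls == 1, f"stampede protection regressed: {calls} DB queries"
```

The re-check inside `rebuild` matters: without it, a request that missed just before the leader finished would start a second flight and issue a second query.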
A service caches user profile lookups for 5 minutes. An attacker requests 100,000 random non-existent user IDs per second. What happens without negative caching?
- 01 A team has single-flight and Redis locks deployed. What does cache_lock_wait_seconds p99 above the rebuild p99 indicate, and what is the correct fix?
- 02 Explain why pre-warming is the most important step before traffic cutover in a blue-green cache deployment, and what happens if it is skipped.
Detecting a cache stampede in production requires instrumented metrics: a sawtooth db_query_rate pattern with periodicity matching the TTL is the canonical fingerprint. Minimum viable observability includes six metrics: miss rate, DB query rate, rebuild duration, lock wait, single-flight subscriber count, and SWR stale serve count. TTL jitter (±15–25%) prevents synchronized multi-key expiry by spreading boundaries across time. Negative caching (short-TTL sentinel for missing rows) prevents miss-storm amplification attacks. Pre-warming the cache before accepting live traffic after a restart prevents cold-start stampede. Together these operational practices close the gap between “mitigations deployed” and “stampedes actually prevented in production.”