
Backend Architecture

Timeouts and tail latency: budgets, deadlines, and the fan-out trap

Crux: Every hop needs a timeout, and timeouts must compose into a request-wide budget. At scale the tail dominates: fan out to enough services and the slowest one decides almost every user’s latency.

Every service behind the page is healthy: each dependency typically answers in 10 ms, and only 1 call in 100 takes a second or longer. The product page fans out to 100 of them in parallel and waits for all. Yet 63% of page loads take over a second. No single service is slow. The math of the tail is slow, and most engineers’ intuition about averages hides it completely.

A timeout on every hop, or a hang on every incident

A network call without a timeout is a bug waiting for an incident. When a dependency stops responding (it does not refuse the connection; it hangs), a call with no timeout waits forever and holds a thread or connection the whole time. Enough such calls exhaust the pool, and a single slow dependency takes down a healthy service. Every outbound call (DB query, cache, HTTP, RPC) needs an explicit timeout. Many common clients default to no timeout at all. Go’s zero-value http.Client and Python’s requests, for example, both wait indefinitely unless told otherwise. That is the worst default in backend engineering.
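Here is a minimal sketch of the failure and the fix. It uses Python’s asyncio as a stand-in for any client library. `hung_dependency`, `call`, and the two-slot pool are hypothetical. The semaphore plays the role of a connection pool, and the dependency accepts the call but never answers. With `asyncio.wait_for` bounding each call, every caller gets an error after 50 ms and releases its pool slot, so callers queued behind them still make progress.

```python
import asyncio

POOL_SIZE = 2          # hypothetical connection-pool size
CALL_TIMEOUT_S = 0.05  # explicit per-call timeout


async def hung_dependency():
    # Hypothetical dependency that accepts the call but never responds:
    # the event is never set, so this await would block forever.
    await asyncio.Event().wait()
    return "unreachable"


async def call(pool):
    # Each call holds one pool slot for as long as it runs.
    async with pool:
        try:
            return await asyncio.wait_for(hung_dependency(), timeout=CALL_TIMEOUT_S)
        except asyncio.TimeoutError:
            # Fail fast; leaving the `async with` block frees the slot.
            return "timeout"


async def main():
    pool = asyncio.Semaphore(POOL_SIZE)
    # Five calls share two slots. Without wait_for, the first two would hold
    # their slots forever and the other three would queue behind them forever.
    return await asyncio.gather(*(call(pool) for _ in range(5)))


results = asyncio.run(main())
print(results)  # ['timeout', 'timeout', 'timeout', 'timeout', 'timeout']
```

If you remove the `wait_for` wrapper, `asyncio.run(main())` never returns. That is the pool-exhaustion failure in miniature.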

Timeouts must compose into a budget

Per-hop timeouts set in isolation lie. Suppose a request has a 1 s SLA and calls service A (timeout 1 s), which calls service B (timeout 1 s). When B is slow, A waits the full second for it. Part of the client’s budget was already spent reaching A, so the client gives up before A does, and A is now doing work nobody is waiting for. The fix is a timeout budget (a deadline). The entry point sets the total once, and each hop passes the remaining time downstream. gRPC formalizes this: the deadline travels with the call (on the wire, as the grpc-timeout header carrying the remaining time), and each service sets its local timeout to min(local default, remaining budget).

| Approach | What each hop uses | Failure mode |
|---|---|---|
| No timeouts | None (waits indefinitely) | One hung dependency exhausts pools and cascades |
| Independent per-hop timeouts | A fixed local value | Inner work outlives the caller’s patience |
| Propagated deadline (budget) | min(local, remaining) | Bounded; inner hops stop when the budget is spent |
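Below is an in-process sketch of the min(local, remaining) rule. The `Deadline` and `FakeClock` names are hypothetical and only illustrate the idea. gRPC does the equivalent across processes by carrying the remaining time with each call. The deadline is fixed once, as an absolute time, at the entry point. Time spent in any hop therefore automatically shrinks what the next hop may use.

```python
import time
from typing import Callable


class Deadline:
    """Absolute deadline fixed once at the entry point and passed down every hop."""

    def __init__(self, budget_s: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._expires_at = clock() + budget_s

    def remaining(self) -> float:
        # Never negative: an exhausted budget is simply 0.
        return max(0.0, self._expires_at - self._clock())

    def timeout_for_hop(self, local_default_s: float) -> float:
        # The rule each hop applies: min(local default, remaining budget).
        return min(local_default_s, self.remaining())


class FakeClock:
    """Hypothetical manual clock so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
deadline = Deadline(1.0, clock)             # entry point: 1 s request-wide budget
a_timeout = deadline.timeout_for_hop(1.0)   # A's default is 1 s; full budget remains
clock.now += 0.7                            # A spends 700 ms before calling B
b_timeout = deadline.timeout_for_hop(1.0)   # B's default is 1 s, but only 0.3 s remain
clock.now += 0.5                            # the budget is now overspent
c_timeout = deadline.timeout_for_hop(0.2)   # 0.0: skip the call and fail fast
print(a_timeout, round(b_timeout, 3), c_timeout)  # 1.0 0.3 0.0
```

Every hop reads from the same absolute deadline. A hop that finds the budget at zero can skip its downstream call entirely instead of starting work the client has already abandoned.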

Why the tail, not the average, is the SLA

Users do not experience your average. They experience their own request, and the slow ones are what they remember and what trips alerts. So latency is reported as percentiles: p50 (the median), p99 (1 request in 100 is slower), and p99.9 (1 in 1,000 is slower). The gap between p50 and p99 is “tail latency,” caused by GC pauses, queueing, cache misses, lock contention, and retries.
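To see how an average hides the tail, the sketch below computes a nearest-rank percentile over a made-up sample of 1,000 latencies: 980 at 10 ms and 20 at 1,000 ms. The numbers are illustrative, not measurements. Nearest-rank is only one of several common percentile definitions.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: smallest value with at least q% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Illustrative sample in milliseconds: 98% fast calls, 2% slow ones.
latencies_ms = [10] * 980 + [1000] * 20

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
p999 = percentile(latencies_ms, 99.9)
print(mean_ms, p50, p99, p999)  # 29.8 10 1000 1000
```

The mean of 29.8 ms looks healthy and the median is 10 ms. Yet 1 request in 50 takes a full second, which is exactly what p99 surfaces.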

The danger is tail amplification under fan-out. Suppose one call to a service is slow with probability p. A request that fans out to N services in parallel and waits for all of them is slow if any one call is slow. Assuming calls are slow independently, that happens with probability 1 − (1 − p)^N. With p = 1% (the per-service p99) and N = 100, that is 1 − 0.99^100 ≈ 63%. This is the Hook’s number, straight from Dean and Barroso’s The Tail at Scale. The paper pushes further: with 2,000 backends, even if only 1 call in 10,000 to each takes over a second, almost 1 in 5 user requests does (1 − 0.9999^2000 ≈ 18%).
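The formula is easy to check numerically. Like the formula, this sketch assumes calls are slow independently of one another. Correlated slowness, such as a shared overloaded host, changes the numbers. `per_call_budget` is a hypothetical helper. It inverts the formula to show how good each service’s tail must be to keep the page’s tail at 1%.

```python
def fanout_slow_fraction(p, n):
    """P(at least one of n independent parallel calls is slow), each slow with probability p."""
    return 1 - (1 - p) ** n


def per_call_budget(target, n):
    """Largest per-call slow probability p that keeps the fan-out slow fraction at or below target."""
    # Solve 1 - (1 - p)**n = target for p.
    return 1 - (1 - target) ** (1 / n)


hook = fanout_slow_fraction(0.01, 100)      # per-service p99, 100-way fan-out
paper = fanout_slow_fraction(0.0001, 2000)  # 1 in 10,000 slow, 2,000-way fan-out
needed = per_call_budget(0.01, 100)         # keep only 1% of page loads slow
print(f"{hook:.3f} {paper:.3f} {needed:.1e}")  # 0.634 0.181 1.0e-04
```

The last number is the sobering one. To keep only 1% of 100-way fan-outs slow, each service must stay under the threshold about 99.99% of the time. Its p99.99, not its p99, is what matters.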

Defending the tail: hedging, not just timeouts

A timeout caps the worst case but does not improve the typical tail. The technique from The Tail at Scale is hedged requests: send the request, and if no answer arrives within the p95 latency, send a second copy to another replica and use whichever response returns first. Only the slowest ~5% of requests get hedged, so extra load stays small (at most about 5% more requests) while the tail collapses. Google benchmarked this on reads of 1,000 keys spread across 100 BigTable servers. Sending a hedge after a 10 ms delay cut p99.9 from 1,800 ms to 74 ms while sending just 2% more requests. Tied requests go further. The copies are sent to two servers, each tagged with its twin, and the server that starts executing first tells the other to cancel, trimming wasted work.
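Here is a minimal asyncio sketch of a hedged request, assuming two interchangeable replicas. `fake_replica`, `hedged_call`, and the latencies are hypothetical. In practice `hedge_after_s` would track a measured p95 rather than a constant. Cancelling the losing task only stops the client from waiting. The server may still finish that work, and that is the waste tied requests are designed to trim.

```python
import asyncio


async def fake_replica(name, latency_s):
    # Hypothetical replica that answers after a fixed latency.
    await asyncio.sleep(latency_s)
    return name


async def hedged_call(send_primary, send_hedge, hedge_after_s):
    """Send to one replica; if no answer within hedge_after_s, send one
    duplicate to another replica and return whichever answers first."""
    primary = asyncio.create_task(send_primary())
    done, _ = await asyncio.wait({primary}, timeout=hedge_after_s)
    if done:
        return primary.result()  # fast path: no hedge sent, no extra load
    hedge = asyncio.create_task(send_hedge())
    done, pending = await asyncio.wait(
        {primary, hedge}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # stop waiting on the slower copy
    # Sketch only: if the first copy to finish raised, this re-raises
    # instead of falling back to the other copy.
    return done.pop().result()


async def main():
    loop = asyncio.get_running_loop()
    start = loop.time()
    winner = await hedged_call(
        lambda: fake_replica("replica-a", 0.5),   # primary lands in the slow tail
        lambda: fake_replica("replica-b", 0.02),  # hedge target answers quickly
        hedge_after_s=0.05,
    )
    return winner, loop.time() - start


winner, elapsed = asyncio.run(main())
print(winner)  # replica-b
```

Here the primary’s 500 ms shrinks to roughly 70 ms: the 50 ms hedge delay plus the second replica’s 20 ms. Calls that answer before `hedge_after_s` never send a second copy.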

Why this works

Why not just retry on timeout instead of hedging? A retry fires only after you have already paid the full timeout — so it improves availability but not latency, and naive retries amplify load exactly when a service is already struggling (the retry storm). Hedging fires speculatively at p95, before the timeout, so it attacks latency directly; and because it is capped to the slow tail it adds bounded load. The two are complementary: hedge to cut the tail, retry with backoff and jitter to survive failures, and a circuit breaker (next unit) to stop both when the dependency is truly down.
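For the retry side of that pairing, a common scheme is capped exponential backoff with “full jitter”: each wait is drawn uniformly between zero and the exponential cap. This spreads out clients that failed at the same moment. The sketch uses hypothetical names (`call_with_retries`, `flaky`). The injectable `sleep` and the seeded `rng` exist only to make it testable.

```python
import random
import time


def backoff_delays(retries, base_s=0.1, cap_s=2.0, rng=None):
    """Full-jitter schedule: wait i is uniform in [0, min(cap_s, base_s * 2**i)]."""
    rng = rng or random.Random()
    return [rng.uniform(0.0, min(cap_s, base_s * 2 ** i)) for i in range(retries)]


def call_with_retries(op, max_attempts=4, base_s=0.1, cap_s=2.0, sleep=time.sleep, rng=None):
    """Call op, retrying on TimeoutError with capped exponential backoff and jitter."""
    delays = backoff_delays(max_attempts - 1, base_s, cap_s, rng)
    for attempt in range(max_attempts):
        try:
            return op()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(delays[attempt])


# Usage: a hypothetical dependency that times out twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


slept = []
result = call_with_retries(flaky, sleep=slept.append, rng=random.Random(7))
print(result, calls["n"], len(slept))  # ok 3 2
```

Each retry still pays the failed attempt’s full timeout first. That is why retries help availability rather than latency.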

Quiz

A page fans out to 100 independent services in parallel and waits for all. Each has a p99 of 10 ms (1% chance a call exceeds it). Roughly what fraction of page loads exceed 10 ms on at least one call?

Quiz

Why do independent per-hop timeouts fail to protect a request with an overall SLA?

Quiz

How does a hedged request reduce tail latency without large extra load?

Recall before you leave
  1. Why does every outbound call need an explicit timeout, and why is a propagated deadline better than independent per-hop timeouts?
  2. Explain tail amplification under fan-out with the math and the canonical numbers.
  3. What are hedged requests and tied requests, and why are they preferred over retries for cutting the tail?
Recap

The last stop turns a request from “it returns” into “it returns in time.” Every outbound call needs an explicit timeout, because the default of waiting forever lets one hung dependency exhaust pools and cascade. But isolated timeouts do not compose. They must roll up into a propagated deadline where each hop uses min(local, remaining), the model gRPC builds in. At scale the average is a lie: users feel their own request, so you track p99 and p99.9. Fan-out amplifies the tail brutally, since 1 − (1 − p)^N reaches 63% slow at p = 1%, N = 100. Timeouts cap the worst case but do not fix the typical tail. Hedged requests (fired at p95) and tied requests collapse it for a few percent of extra load. This is the bridge to resilience. When a dependency is not just slow but failing, timeouts and hedging are not enough, and the next unit’s circuit breakers and bulkheads take over.

Trademarks belong to their respective owners. Editorial reference only.