Backend Architecture

Retry strategies: backoff, jitter, and thundering herd

Crux: Exponential backoff without jitter creates synchronized retry storms. Full jitter spreads retries across time. Retry-After and retry budgets prevent recovering services from being re-killed.

A service returns 503. One thousand clients all schedule a retry for exactly 2 seconds later. At T+2s, 1000 requests hit the recovering server simultaneously — the same spike that broke it the first time. The retry mechanism just made the outage worse.

Exponential backoff: the base formula

The minimum viable retry loop waits longer before each successive attempt:

delay = min(cap, base × 2^attempt)

base is typically 100–500 ms. cap is typically 30–60 s. Attempts cap at 3–6 depending on operation criticality.

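With base = 200 ms and cap = 30 s, counting attempt from 0 for the first retry:

  • Retry 1 (attempt = 0) → wait 200 ms
  • Retry 2 (attempt = 1) → wait 400 ms
  • Retry 3 (attempt = 2) → wait 800 ms
  • Retry 4 (attempt = 3) → wait 1 600 ms
  • Retry 8 (attempt = 7) → wait 25.6 s
  • Retry 9 (attempt = 8) → wait 30 s (capped; uncapped it would be 51.2 s)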

Problem: the delay is deterministic. If 1,000 clients fail at the same instant, they all compute the same delay and retry at the same moment. That synchronized spike is the thundering herd.
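The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the name `backoff_delay` is hypothetical, delays are in seconds, and `attempt` is assumed to start at 0 for the first retry. It also makes the problem visible: two independent clients end up with identical schedules.

```python
def backoff_delay(attempt: int, base: float = 0.2, cap: float = 30.0) -> float:
    """Deterministic exponential backoff: min(cap, base * 2^attempt), in seconds."""
    return min(cap, base * (2 ** attempt))


# Delay schedule for the first nine retries (attempt = 0..8).
schedule = [backoff_delay(n) for n in range(9)]

# Two independent clients compute exactly the same schedule,
# so after a shared failure they retry at the same instants.
client_a = [backoff_delay(n) for n in range(9)]
client_b = [backoff_delay(n) for n in range(9)]
```

Because nothing in the function depends on the client, any number of clients that fail together stay in lockstep on every subsequent retry.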

Jitter: spreading retries across time

Three jitter variants (named by the AWS Architecture Blog):

Full jitter (recommended default):

delay = random(0, min(cap, base × 2^attempt))

Picks a uniformly random value between 0 and the capped exponential delay. Smoothest distribution. Simplest to implement.
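A minimal Python sketch of full jitter, assuming delays in seconds and `attempt` starting at 0. The function name `full_jitter_delay` and the default values are illustrative, not taken from any SDK.

```python
import random


def full_jitter_delay(attempt: int, base: float = 0.2, cap: float = 30.0) -> float:
    """Full jitter: uniform random delay in [0, min(cap, base * 2^attempt)]."""
    window = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, window)


# 1000 clients on the same attempt now land at different points in the window.
delays = [full_jitter_delay(3) for _ in range(1000)]
```

The window still grows exponentially; only the position inside it is random, which is what breaks the synchronization between clients.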

Equal jitter:

delay = window/2 + random(0, window/2), where window = min(cap, base × 2^attempt)

Half deterministic, half random: the client always waits at least half the window, so retries spread over only the upper half of it.
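Equal jitter can be sketched the same way. This version applies the cap before halving the window, an assumption made here so the delay stays bounded; the name `equal_jitter_delay` is hypothetical.

```python
import random


def equal_jitter_delay(attempt: int, base: float = 0.2, cap: float = 30.0) -> float:
    """Equal jitter: half of the capped window is fixed, the other half is random."""
    window = min(cap, base * (2 ** attempt))
    half = window / 2
    return half + random.uniform(0.0, half)
```

Compared with full jitter, the result never drops below half the window, so a client cannot retry almost immediately, but the retries are packed into a narrower time range.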

Decorrelated jitter (used by AWS SDK v3):

delay = min(cap, random(base, prev_delay × 3))

Each retry’s window depends on the previous delay. Grows organically. Empirically similar load profile to full jitter.
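Decorrelated jitter needs state, because each delay depends on the previous one. The sketch below models that as a Python generator; seeding the first window with `base` is an assumption of this example, and `decorrelated_jitter_delays` is a hypothetical name.

```python
import random
from typing import Iterator


def decorrelated_jitter_delays(base: float = 0.2, cap: float = 30.0) -> Iterator[float]:
    """Yield successive delays: min(cap, random(base, prev_delay * 3))."""
    prev = base  # assumption: the first window is seeded with base
    while True:
        prev = min(cap, random.uniform(base, prev * 3))
        yield prev


# One generator per request: each call to next() gives the wait before the next retry.
delays = decorrelated_jitter_delays()
first_five = [next(delays) for _ in range(5)]
```

Unlike the other variants there is no attempt counter: the growth comes from multiplying the previous delay, and a small random draw can shrink the next window again.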

Avoid plain exponential backoff without jitter — it is a textbook thundering-herd trap.

Jitter type  | Formula                             | Used by
Full         | random(0, min(cap, base × 2^n))     | Most SDKs
Equal        | half_fixed + random(0, half_fixed)  | Some Java libs
Decorrelated | random(base, prev × 3) capped       | AWS SDK v3
None         | min(cap, base × 2^n)                | Thundering herd

What to retry — and what not to

Retry conditions matter as much as retry timing:

Retry: 5xx errors, network timeouts, connection refused, 429 Too Many Requests (with Retry-After).

Do not retry: 4xx errors other than 408 and 429. The request itself is wrong, so retrying produces the same 4xx. That includes 410 Gone, where the resource is permanently absent.

Many libraries naively retry every non-2xx, generating retry storms for permanent client bugs. Explicitly enumerate the retry condition in your configuration.
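One way to make the retry condition explicit is an allowlist. The sketch below is illustrative: the name `should_retry` is hypothetical, and the specific 5xx codes in the set are an assumption of this example that you would adjust to your downstream.

```python
from typing import Optional

# Status codes treated as transient in this sketch (an illustrative choice).
RETRYABLE_STATUS = frozenset({408, 429, 500, 502, 503, 504})


def should_retry(status: Optional[int], network_error: bool = False) -> bool:
    """Return True only for explicitly listed transient failures."""
    if network_error:  # timeout or connection refused: no status code was received
        return True
    return status in RETRYABLE_STATUS
```

An allowlist fails closed: a status code nobody thought about is not retried, which is the opposite of the "retry every non-2xx" default described above.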

Retry-After: the server’s backoff hint

When a server returns 429 or 503, it can include a Retry-After header, either as a number of seconds (Retry-After: 30) or as an absolute HTTP date. A well-behaved client waits at least that long, even if its own backoff would have fired sooner. gRPC uses the grpc-retry-pushback-ms trailer for the same purpose.

A client that ignores Retry-After and applies its own short backoff is the thundering-herd pattern in action — the server asked for breathing room and the client refused.
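A sketch of honoring the hint in Python, using only the standard library. The helper names `parse_retry_after` and `next_delay` are hypothetical. Adding jitter on top of the server's value is a design choice of this example, made so that clients receiving the same hint do not all return at the same instant.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: str, now: Optional[datetime] = None) -> Optional[float]:
    """Return the wait in seconds from a Retry-After value, or None if unparseable."""
    value = value.strip()
    if value.isdigit():  # delta-seconds form, e.g. "30"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:  # treat a zone-less date as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def next_delay(attempt: int, retry_after: Optional[str],
               base: float = 0.2, cap: float = 30.0) -> float:
    """Wait at least as long as the server asked, plus full jitter."""
    jittered = random.uniform(0.0, min(cap, base * (2 ** attempt)))
    hinted = parse_retry_after(retry_after) if retry_after else None
    return jittered if hinted is None else hinted + jittered
```

With `Retry-After: 30` on the first retry, `next_delay(0, "30")` returns a value between 30 and 30.2 seconds; without the header it falls back to plain full jitter.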

Retry budget: capping total retries across the call site

Individual retry caps (3–6 attempts) are not enough at scale. A retry budget is a token-bucket of allowed retries per second across the entire call site. When the bucket empties, the client opens a circuit and returns a deterministic error immediately — it stops hammering the downstream.

Without a retry budget, a service outage causes every client to burn through its 6 attempts at the same time, amplifying traffic by up to 6× and turning a 30-second blip into a minutes-long outage.
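A retry budget can be sketched as a small token bucket shared by every call at one call site. The class name `RetryBudget`, its parameters, and the injectable `clock` are assumptions of this example, not an existing library API.

```python
import threading
import time
from typing import Callable


class RetryBudget:
    """Token bucket shared by all calls at one call site: each retry spends one token."""

    def __init__(self, retries_per_second: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = retries_per_second
        self.capacity = burst
        self.tokens = burst
        self.clock = clock
        self.last = clock()
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Return True if a retry is allowed now; False means fail fast."""
        with self._lock:
            now = self.clock()
            # Refill in proportion to elapsed time, never above capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False
```

A caller checks `budget.try_acquire()` before each retry (not before the first attempt) and returns an error immediately when it gets `False`. The per-request attempt cap still applies; the budget bounds the total retry rate no matter how many requests are failing at once.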

Why this works

Netflix and Google published tail-latency papers showing uncapped retries cascade: a slow downstream causes retries, retries increase load on the downstream, increased load creates more slow responses, which create more retries. A retry budget breaks this loop by making the retry rate a hard limit, not a per-request policy.

Quiz

Why is jitter added on top of exponential backoff?

Quiz

A server returns 503 with Retry-After: 30. Three clients: A honors Retry-After, B ignores it and uses its own 1s backoff, C does not retry at all. Which client behaves correctly?

Order the steps

Put the thundering herd scenario steps in order (no jitter, 1000 clients):

  1. All 1000 clients POST to the server; the downstream is overloaded and returns 503
  2. All 1000 clients schedule a retry at exactly T + 1s (no jitter)
  3. At T + 1s, all 1000 retry simultaneously: the same spike that broke the server
  4. Server returns 503 again; all 1000 schedule for T + 2s
  5. The pattern repeats indefinitely, preventing recovery
Recall before you leave
  1. Explain why a retry loop without jitter is more dangerous than the original failure it is trying to recover from.
  2. What is the full jitter formula and what does each variable mean?
  3. What is a retry budget and why is an individual attempt cap of 6 not enough?
Recap

Exponential backoff, min(cap, base × 2^n), grows the wait between retries, but without jitter it synchronizes all clients to the same moment and causes thundering herds. Full jitter replaces the fixed delay with random(0, min(cap, base × 2^n)) to spread retries uniformly across the window. The AWS SDK v3 uses decorrelated jitter, which behaves similarly in practice. Retry only 5xx, timeouts, and connection failures, plus 408 and 429; do not retry other 4xx. Honor the Retry-After header from 429/503 responses. Cap attempts at 3–6 per request, and enforce a retry budget (token bucket) across the call site to prevent a recovering downstream from being re-killed on every retry cycle.

