Retry strategies: backoff, jitter, and thundering herd

Exponential backoff without jitter creates synchronized retry storms. Full jitter spreads retries across time. Retry-After and retry budgets prevent recovering services from being re-killed.

A service returns 503. One thousand clients all schedule a retry for exactly 2 seconds later. At T+2s, 1000 requests hit the recovering server simultaneously: the same spike that broke it the first time. The retry mechanism just made the outage worse.

A poorly configured retry loop does not just fail to help; it can turn a 30-second outage into a 30-minute one. Here is why, and what to do instead.

Exponential backoff: the base formula

The minimum viable retry loop waits longer before each successive attempt:

delay = min(cap, base × 2^attempt)

Here attempt counts retries from 0, base is typically 100–500 ms, and cap is typically 30–60 s. The total number of attempts is usually limited to 3–6, depending on how critical the operation is.

With base = 200 ms and cap = 30 s:

attempt 0 → wait 200 ms
attempt 1 → wait 400 ms
attempt 2 → wait 800 ms
attempt 3 → wait 1 600 ms
…
attempt 7 → wait 25.6 s
attempt 8 → wait 30 s (capped; uncapped it would be 51.2 s)
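A minimal Python sketch of the formula, assuming delays in seconds and attempt counted from 0 for the first retry. The function name `backoff_delay` is illustrative, not from any library.

```python
def backoff_delay(attempt: int, base: float = 0.2, cap: float = 30.0) -> float:
    """Capped exponential backoff without jitter: min(cap, base * 2**attempt).

    attempt is 0 for the first retry; base and cap are in seconds.
    """
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    return min(cap, base * (2 ** attempt))


if __name__ == "__main__":
    # Print the schedule for base = 200 ms, cap = 30 s.
    for attempt in range(9):
        print(f"attempt={attempt}: wait {backoff_delay(attempt):.1f} s")
```

Every client running this code computes exactly the same schedule, which is the root of the problem described next.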

Problem: if 1,000 clients fail at the same instant, they all pick the same delay and retry at the exact same time — the synchronized spike is the thundering herd.
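To make the spike concrete, here is a small simulation sketch (all names hypothetical): 1,000 clients fail at the same instant and schedule the retry for attempt = 3, whose window is 1.6 s with base = 200 ms. It counts how many retries land in the busiest 100 ms bucket with no jitter, and with the full jitter described in the next section.

```python
import random
from collections import Counter
from typing import Iterable, Tuple


def peak_per_bucket(delays: Iterable[float], bucket_s: float = 0.1) -> int:
    """Largest number of retries landing in any single time bucket."""
    counts = Counter(int(d // bucket_s) for d in delays)
    return max(counts.values())


def simulate(clients: int = 1000, attempt: int = 3, base: float = 0.2,
             cap: float = 30.0, seed: int = 42) -> Tuple[int, int]:
    rng = random.Random(seed)
    window = min(cap, base * (2 ** attempt))
    # No jitter: every client computes exactly the same delay.
    no_jitter = [window] * clients
    # Full jitter: each client draws its own delay uniformly from [0, window].
    full_jitter = [rng.uniform(0.0, window) for _ in range(clients)]
    return peak_per_bucket(no_jitter), peak_per_bucket(full_jitter)


if __name__ == "__main__":
    plain, jittered = simulate()
    print(f"busiest 100 ms bucket: no jitter={plain}, full jitter={jittered}")
```

Without jitter all 1,000 retries fall into a single bucket; with full jitter the same requests are spread over roughly 16 buckets, so the busiest one holds only a small fraction of them.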

Jitter: spreading retries across time

Three jitter variants (named by the AWS Architecture Blog):

Full jitter (recommended default):

delay = random(0, min(cap, base × 2^attempt))

Picks a uniformly random delay between 0 and the capped exponential value. Retries spread evenly across the whole window, and it is the simplest variant to implement.
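A sketch of full jitter, assuming the same base and cap as the earlier example. `full_jitter_delay` is a hypothetical helper; it accepts an optional `random.Random` so tests can seed it.

```python
import random
from typing import Optional

_DEFAULT_RNG = random.Random()


def full_jitter_delay(attempt: int, base: float = 0.2, cap: float = 30.0,
                      rng: Optional[random.Random] = None) -> float:
    """Full jitter: a uniform random delay in [0, min(cap, base * 2**attempt)]."""
    rng = rng if rng is not None else _DEFAULT_RNG
    window = min(cap, base * (2 ** attempt))  # the capped exponential ceiling
    return rng.uniform(0.0, window)
```

Passing a seeded generator such as `random.Random(1)` makes the delays reproducible in tests; production code can rely on the default.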

Equal jitter:

temp = min(cap, base × 2^attempt)
delay = temp/2 + random(0, temp/2)

Half of the delay is fixed and half is random, so every client waits at least half the backoff, but retries are spread over only half the window.
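The same idea as a sketch for equal jitter (hypothetical helper name, same assumed base and cap). The result always lies between half the capped backoff and the full capped backoff.

```python
import random
from typing import Optional

_DEFAULT_RNG = random.Random()


def equal_jitter_delay(attempt: int, base: float = 0.2, cap: float = 30.0,
                       rng: Optional[random.Random] = None) -> float:
    """Equal jitter: half of the capped backoff is fixed, the other half is random."""
    rng = rng if rng is not None else _DEFAULT_RNG
    temp = min(cap, base * (2 ** attempt))
    half = temp / 2
    return half + rng.uniform(0.0, half)  # always in [temp/2, temp]
```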

Decorrelated jitter:

delay = min(cap, random(base, prev_delay × 3))

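Each retry’s window depends on the previous delay (prev_delay starts at base) rather than on the attempt number, so delays tend to grow but can also shrink between attempts. Empirically, its load profile is similar to full jitter’s.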
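Because decorrelated jitter carries state from one retry to the next, a generator is a natural fit. This is a sketch under the same assumed base and cap; `decorrelated_jitter` is a hypothetical name, and each client would hold its own generator.

```python
import random
from itertools import islice
from typing import Iterator, Optional


def decorrelated_jitter(base: float = 0.2, cap: float = 30.0,
                        rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield successive delays: delay = min(cap, uniform(base, prev_delay * 3)).

    prev_delay starts at base, so the first delay falls in [base, 3 * base].
    """
    rng = rng if rng is not None else random.Random()
    prev_delay = base
    while True:
        prev_delay = min(cap, rng.uniform(base, prev_delay * 3))
        yield prev_delay


if __name__ == "__main__":
    # First five delays for one client, reproducible thanks to the fixed seed.
    print([round(d, 3) for d in islice(decorrelated_jitter(rng=random.Random(7)), 5)])
```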

Avoid plain exponential backoff without jitter — it is a textbook thundering-herd trap.

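Jitter type	Formula	Used by
Full	`random(0, min(cap, base × 2^n))`	Most SDKs, including the AWS SDKs' standard retry mode
Equal	`t/2 + random(0, t/2)`, where `t = min(cap, base × 2^n)`	Some Java libs
Decorrelated	`min(cap, random(base, prev × 3))`	Some resilience libraries
None	`min(cap, base × 2^n)`	Nobody, ideally: this is the thundering-herd pattern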
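Putting backoff, jitter, and an attempt cap together, a retry loop might look like the sketch below. `TransientError` and `call_with_retries` are hypothetical names: a real client would raise `TransientError` only for the retryable conditions listed in the next section. `sleep` is injectable so tests do not actually wait.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. 503, timeouts)."""


def call_with_retries(call: Callable[[], T], max_attempts: int = 4,
                      base: float = 0.2, cap: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> T:
    """Run call(), retrying TransientError with full-jitter backoff.

    max_attempts counts the first try, so max_attempts=4 means up to 3 retries.
    Any other exception propagates immediately without a retry.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    rng = rng if rng is not None else random.Random()
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # attempts exhausted: surface the last error
            window = min(cap, base * (2 ** attempt))
            sleep(rng.uniform(0.0, window))  # full jitter
    raise RuntimeError("unreachable")
```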

What to retry — and what not to

Before you add a retry to any client call, ask: if this request fails with a 400, will retrying it ever succeed? The answer determines whether you need a retry at all.

Retry conditions matter as much as retry timing:

Retry: 5xx errors, network timeouts, refused connections, 408 Request Timeout, and 429 Too Many Requests (honoring Retry-After).

Do not retry: other 4xx errors. The client's request is wrong, so retrying produces the same 4xx. This includes 410 Gone: the resource is permanently absent.

A retry wrapper that naively retries every non-2xx response generates retry storms for permanent client bugs. Enumerate the retryable conditions explicitly in your configuration.

Retry only transient, server-side or network failures; never retry a permanent client error — a wrong request just produces the same 4xx again.
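The retry conditions above can be written down as an explicit allow-list. This sketch assumes a hypothetical `is_retryable` helper that receives the HTTP status code, or `None` when no response arrived at all (timeout or refused connection). It follows the lesson's rule of retrying every 5xx.

```python
from typing import Optional

# Policy from the rules above: 408 and 429 are the only retryable 4xx codes.
RETRYABLE_4XX = frozenset({408, 429})


def is_retryable(status: Optional[int]) -> bool:
    """Decide whether a failed call is worth retrying.

    status is the HTTP status code, or None when no response arrived at all
    (network timeout, connection refused), which is treated as transient.
    """
    if status is None:
        return True
    if status in RETRYABLE_4XX:
        return True
    if 400 <= status < 500:
        return False  # permanent client error (400, 404, 410 Gone, ...)
    return 500 <= status < 600  # server-side failure
```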

Retry-After: the server’s backoff hint

When a server returns 429 or 503, it can include a Retry-After header, either as a number of seconds (Retry-After: 30) or as an absolute HTTP date. A well-behaved client waits at least that long, letting the header take precedence over its own computed backoff. gRPC uses the grpc-retry-pushback-ms trailer for the same purpose.

A client that ignores Retry-After and applies its own short backoff is the thundering-herd pattern in action — the server asked for breathing room and the client refused.
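A sketch of honoring Retry-After in Python, using the standard library's `email.utils.parsedate_to_datetime` to parse the HTTP-date form. `parse_retry_after` and `next_delay` are hypothetical helpers; the fallback is full jitter with the same assumed base and cap as earlier, and a malformed header is ignored rather than trusted.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str],
                      now: Optional[datetime] = None) -> Optional[float]:
    """Return seconds to wait from a Retry-After header, or None if absent/invalid.

    Handles both forms: delta-seconds ("30") and an HTTP date
    ("Wed, 21 Oct 2015 07:28:00 GMT").
    """
    if not value:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # malformed header: fall back to our own backoff
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    now = now if now is not None else datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def next_delay(attempt: int, retry_after: Optional[str], base: float = 0.2,
               cap: float = 30.0, rng: Optional[random.Random] = None) -> float:
    """Prefer the server's Retry-After hint; otherwise fall back to full jitter."""
    hinted = parse_retry_after(retry_after)
    if hinted is not None:
        return hinted
    rng = rng if rng is not None else random.Random()
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))
```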

Retry budget: capping total retries across the call site

Individual retry caps (3–6 attempts) are not enough at scale. A retry budget is a token bucket of allowed retries per second, shared across the entire call site. When the bucket is empty, the client stops retrying and fails fast with an immediate error, much like an open circuit breaker, so it stops hammering the downstream.

Without a retry budget, a downstream outage makes every client exhaust its 6-attempt limit at the same time, multiplying traffic by up to 6× and turning a 30-second blip into a minutes-long outage.
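A minimal sketch of the token-bucket retry budget described above, assuming one shared `RetryBudget` object (a hypothetical class, not a library API) per call site. Each retry spends one token, and tokens refill at `rate` per second up to `burst`. The clock is injectable so the behavior can be tested without sleeping.

```python
import threading
import time
from typing import Callable


class RetryBudget:
    """Token bucket of retries shared by every request at one call site."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate      # tokens (retries) added per second
        self.burst = burst    # maximum tokens the bucket can hold
        self._clock = clock
        self._tokens = burst
        self._last = clock()
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Spend one token if available; False means 'do not retry, fail fast'."""
        with self._lock:
            now = self._clock()
            # Refill in proportion to elapsed time, never above burst.
            self._tokens = min(self.burst, self._tokens + (now - self._last) * self.rate)
            self._last = now
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return True
            return False
```

A caller would check `try_acquire()` before each retry; when it returns False, the caller returns the error immediately instead of sleeping and retrying. First attempts never consume tokens in this design, so normal traffic is unaffected and only the retry rate is capped.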

Why this works

Google (including the cascading-failures chapter of its SRE book) and Netflix have described how uncapped retries cascade: a slow downstream causes retries, retries increase load on the downstream, increased load creates more slow responses, and those create more retries. A retry budget breaks this loop by making the retry rate a hard limit rather than a per-request policy.

Quiz

Why is jitter added on top of exponential backoff?

Quiz

A server returns 503 with Retry-After: 30. Three clients: A honors Retry-After, B ignores it and uses its own 1s backoff, C does not retry at all. Which client behaves correctly?

Order the steps

Put the thundering herd scenario steps in order (no jitter, 1000 clients):

1 All 1000 clients POST to the server; the downstream is overloaded and returns 503
2 All 1000 clients schedule a retry at exactly T + 1s (no jitter)
3 At T + 1s, all 1000 retry simultaneously — the same spike that broke the server
4 Server returns 503 again; all 1000 schedule their next retry for exactly 2 s later (T + 3s)
5 The pattern repeats indefinitely, preventing recovery

Without jitter all N clients wake at the same instant — thundering herd. Full jitter spreads retries uniformly across the window.

Recall before you leave

01 Explain why a retry loop without jitter is more dangerous than the original failure it is trying to recover from.
02 What is the full jitter formula and what does each variable mean?
03 What is a retry budget and why is an individual attempt cap of 6 not enough?

Recap

Exponential backoff, min(cap, base × 2^n), grows the wait between retries, but without jitter it synchronizes all clients to the same moment and causes thundering herds. Full jitter replaces the fixed delay with random(0, min(cap, base × 2^n)), spreading retries uniformly across the window; decorrelated jitter behaves similarly in practice. Retry 5xx errors, timeouts, 408, and 429; do not retry other 4xx errors. Honor the Retry-After header on 429/503 responses. Cap attempts at 3–6 per request, and enforce a retry budget (token bucket) across the call site so a failing downstream is not re-killed on every retry cycle. Now, when you see a service take minutes to recover from what should have been a 30-second blip, look at the clients: synchronized retries without jitter are a prime suspect for the amplifier.
