awesome-everything

Backend Architecture

Why a circuit breaker: a slow dependency takes down the caller

Crux: A dependency rarely fails cleanly. It gets slow, and a slow dependency is more dangerous than a dead one because every caller waits, occupying a thread until the whole service starves. A circuit breaker fast-fails calls to a sick dependency so the caller stops waiting.

The payment provider slows from 50 ms to 5 s during an incident — it does not go down, just slow. Your checkout handler calls it on every order and waits. Within seconds every worker thread is parked on a 5-second payment call, the thread pool is full, and the service stops accepting any request — including the home page and the product listing, which never touch payments. Nothing crashed. One downstream got slow, your code kept politely waiting, and the wait spread until the whole service was effectively down. A circuit breaker is the piece that would have noticed payments were failing and started rejecting those calls instantly — freeing the threads and keeping the rest of the app alive.

A slow dependency is more dangerous than a dead one

A dependency that refuses connections fails instantly — your call returns an error in a millisecond and the thread moves on. A dependency that accepts the connection and then takes 5 seconds to answer is far worse, because every caller now blocks for 5 seconds holding scarce resources: a worker thread, a socket, a pooled DB connection it grabbed before the call, and often an upstream caller waiting on it.

This is the same occupancy problem from the pooling unit, now one layer out. Under load the math is brutal: if a handler holds a thread for 5 s instead of 50 ms, the same traffic needs 100× more threads to keep up. It never gets them, so requests queue, the thread pool fills, and the service can no longer serve work that has nothing to do with the slow dependency. One sick downstream becomes a total outage — a cascading failure.
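The 100× figure falls out of Little's law: the average number of requests in flight equals the arrival rate times the time each request spends in the system. A minimal sketch, counting only the time a handler spends on the payment call and assuming a hypothetical steady 200 requests per second of checkout traffic (any rate gives the same ratio):

```python
# Little's law: average concurrency L = arrival rate (lambda) x time in system (W).
# Here "time in system" is how long each handler holds a worker thread.

def threads_needed(requests_per_second: float, latency_seconds: float) -> float:
    """Average number of requests in flight, i.e. threads held at the same time."""
    return requests_per_second * latency_seconds

rate = 200.0                            # hypothetical checkout traffic, requests/second
healthy = threads_needed(rate, 0.050)   # payment call at 50 ms
slow = threads_needed(rate, 5.0)        # payment call at 5 s during the incident

print(f"healthy: {healthy:.0f} threads, slow: {slow:.0f} threads, ratio: {slow / healthy:.0f}x")
# healthy: 10 threads, slow: 1000 threads, ratio: 100x
```

This is the steady-state average. During the incident the pool caps out long before 1,000 threads, so the excess requests queue instead, which is exactly the starvation described above.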

Fast-fail beats hanging

The fix is counter-intuitive: when a dependency is failing, the safest thing you can do is stop calling it and return an error immediately. Returning a failure in 1 ms is strictly better than timing out in 5 s, because the fast failure frees the caller's resources (the thread, the connection, the upstream slot) to do something useful: serve other routes, return a degraded response, shed load. A call waiting on a success that never arrives still ties up everything a real success would; a fast failure costs almost nothing.

A circuit breaker automates exactly this. It sits in front of a dependency, watches the calls, and when failures cross a threshold it trips — for a cooldown period it rejects calls instantly without even attempting them, then cautiously tests whether the dependency has recovered. The name is literal: like an electrical breaker, it opens to stop current flowing into a fault, protecting everything wired behind it.
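A minimal sketch of the behavior just described. Everything here is hypothetical rather than any particular library's API: `CircuitBreaker`, `CircuitOpenError`, the consecutive-failure threshold, and the 30-second default cooldown are illustrative choices. It is single-threaded; a production breaker would also need a lock and a cap on concurrent trial calls.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the breaker is open."""

class CircuitBreaker:
    """Hypothetical minimal breaker: trips after `failure_threshold` consecutive
    failures, rejects instantly for `cooldown_seconds`, then lets a trial call through."""

    def __init__(self, failure_threshold: int = 5, cooldown_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock                      # injectable so tests can fake time
        self.consecutive_failures = 0
        self.opened_at: Optional[float] = None  # None means the breaker is closed

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_seconds:
                # Fast-fail: do not even attempt the call during the cooldown.
                raise CircuitOpenError("dependency marked unhealthy")
            # Cooldown elapsed: fall through and let this call act as a trial.
        try:
            result = fn()
        except Exception:
            self.consecutive_failures += 1
            if self.opened_at is not None or self.consecutive_failures >= self.failure_threshold:
                # Trip, or re-trip after a failed trial, and restart the cooldown.
                self.opened_at = self.clock()
            raise
        # Any success closes the breaker and resets the failure count.
        self.consecutive_failures = 0
        self.opened_at = None
        return result
```

Injecting `clock` lets a test step through the cooldown without sleeping. Note that the breaker does not shorten a call it lets through, so `fn` should still carry its own timeout; the breaker only stops the calls after it has seen enough of them fail.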

Why this works

Why reject calls you might be able to make, instead of letting each one try and time out? Because every attempt against a sick dependency is not free — it spends a thread, a connection, and a timeout’s worth of wall-clock waiting, all on a call that will almost certainly fail anyway. When the dependency is genuinely down, those attempts do no good and real harm: they keep your resources pinned, they pile retries onto a service that needs less load to recover, and they make your own latency track the broken dependency’s. The breaker’s bet is statistical — once enough recent calls have failed, the next one is overwhelmingly likely to fail too, so the expected value of trying is negative. Failing fast converts a slow, resource-eating, harm-amplifying failure into a cheap, instant one, and hands the freed capacity back to the parts of the system that still work. It is the same discipline as a bounded wait queue: you cap the damage a broken thing can do rather than letting it consume the whole system on the way down.
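"Enough recent calls have failed" is commonly made concrete as a failure rate over a sliding window of the last N calls, with a minimum sample size so a couple of unlucky errors do not trip the breaker. A minimal sketch; `FailureRateWindow` and its defaults (20 calls, at least 10 samples, 50% failures) are hypothetical values chosen for illustration, not recommendations:

```python
from collections import deque

class FailureRateWindow:
    """Hypothetical helper: remembers the outcome of the last `size` calls and
    reports whether the recent failure rate makes the next call a bad bet."""

    def __init__(self, size: int = 20, min_calls: int = 10, max_failure_rate: float = 0.5) -> None:
        self.outcomes = deque(maxlen=size)  # True means the call failed; oldest drop off
        self.min_calls = min_calls
        self.max_failure_rate = max_failure_rate

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)

    def failure_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)  # True counts as 1

    def should_trip(self) -> bool:
        # Too few samples: a handful of failures is not yet evidence.
        if len(self.outcomes) < self.min_calls:
            return False
        return self.failure_rate() >= self.max_failure_rate
```

A breaker would call `record()` after every attempt and trip when `should_trip()` returns true. Counting a rate rather than consecutive failures means a dependency that fails every other call still trips the breaker instead of resetting a streak counter on each success.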

What the breaker buys you

The breaker turns an unbounded, system-wide failure into a bounded, local one. Instead of every caller discovering the dependency is broken the slow, expensive way — by waiting for a timeout — the breaker discovers it once, then short-circuits everyone else cheaply until the dependency proves it has recovered. That single change is what stops one slow service from taking down the rest.

Aspect | No breaker | With breaker
Slow dependency | Every caller waits the full timeout | First few fail, rest rejected instantly
Thread/connection cost | Pinned for the whole wait | Freed immediately on fast-fail
Blast radius | Whole service starves | Contained to the one dependency
Recovery load | Full traffic keeps hammering it | Trickle of trial calls only
Caller latency | Tracks the broken downstream | Stays fast (instant error)

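The thread-cost row can be put into numbers with a back-of-the-envelope model. A sketch under simplifying assumptions: the dependency never answers, every attempted call waits out a 5 s timeout, a rejected call costs about 1 ms, and the breaker trips after 5 failures and stays open for the whole burst (no trial calls). `thread_seconds` is a hypothetical helper:

```python
def thread_seconds(calls: int, timeout_s: float, fast_fail_s: float,
                   trip_after: int, with_breaker: bool) -> float:
    """Total worker-thread time spent on `calls` requests to a dependency
    that never answers within `timeout_s`."""
    total = 0.0
    for i in range(calls):
        if with_breaker and i >= trip_after:
            total += fast_fail_s  # breaker open: rejected without calling
        else:
            total += timeout_s    # call attempted and waited out the full timeout
    return total

no_breaker = thread_seconds(1000, timeout_s=5.0, fast_fail_s=0.001, trip_after=5, with_breaker=False)
breaker = thread_seconds(1000, timeout_s=5.0, fast_fail_s=0.001, trip_after=5, with_breaker=True)
print(f"no breaker: {no_breaker:.1f} thread-seconds, with breaker: {breaker:.1f} thread-seconds")
# no breaker: 5000.0 thread-seconds, with breaker: 26.0 thread-seconds
```

Under these assumptions, 1,000 calls pin about 5,000 thread-seconds without a breaker and about 26 with one: the five calls that discovered the failure, plus 995 near-free rejections.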
Quiz

A payment provider slows from 50 ms to 5 s but never goes fully down. Within seconds the whole service stops serving even unrelated routes. Why is the slow case worse than an outright outage?

Quiz

Why is returning an error in 1 ms better than timing out in 5 s against a failing dependency?

Order the steps

Order how a slow dependency cascades into a full outage without a breaker:

  1. A downstream dependency slows from milliseconds to seconds
  2. Each caller blocks on it, holding a worker thread for the whole wait
  3. The shared thread pool fills with parked callers
  4. The service can no longer accept any request, even unrelated routes
Recall before you leave
  1. Why is a slow dependency more dangerous than a dependency that is fully down?
  2. Why is fast-failing better than letting each call time out, and what does a circuit breaker actually do?
Recap

A dependency seldom dies cleanly; it gets slow, and slow is the dangerous case because every caller waits the full time holding a thread, a socket, and a pooled connection — the pooling unit’s occupancy problem one layer out. Under load a call that grows from 50 ms to 5 s demands roughly a hundred times the threads, so the pool fills and the service starves even on routes that never touch the slow dependency: a cascading failure where one sick downstream takes down everything. The fix is counter-intuitive — when a dependency is failing, stop calling it and return an error immediately, because a 1 ms failure frees the caller’s resources for useful work while a 5 s timeout pins them on a doomed call and keeps hammering a service that needs less load to recover. A circuit breaker automates this: it watches the calls, trips when failures cross a threshold, rejects calls instantly during a cooldown, then tests for recovery, converting an unbounded system-wide failure into a bounded local one. The next lesson opens the breaker up — the three-state machine of closed, open, and half-open, and the cooldown timer that decides how fast it tests recovery.

Trademarks belong to their respective owners. Editorial reference only.