
Backend Architecture

The service under overload: load shedding and graceful degradation

Crux: Every mechanism is also a load-control knob, and overload is where they combine. Pool and concurrency caps bound work in flight, timeouts and breakers cut losses, and load shedding with backpressure rejects excess early, so the service degrades gracefully instead of collapsing.

There is a load level above which your service cannot serve every request, no matter how it is written, and the only choice you have is how it fails when it gets there. The naive instinct is to try to serve everyone: accept every connection, queue every request, retry every failure. That instinct is exactly what produces the collapse from two lessons ago, because accepting work you cannot finish does not help the user whose request you accepted and steals the resources of the users whose requests you could have finished. The counterintuitive truth of overload is that rejecting some requests is how you serve the rest. A service that sheds 20% of load at the door and serves the other 80% fast is vastly healthier than one that accepts 100%, serves all of it too slowly to be useful, then falls over and serves 0%. This is the shift from thinking about throughput (requests handled) to goodput (requests handled usefully, within their deadline).

And here is the synthesis: every mechanism you learned is secretly a load-control knob. The pool bounds concurrent work. Timeouts free resources from doomed requests. The breaker sheds load from a failing dependency. Shutdown sheds load from a departing instance. Overload is the condition that makes all of them act at once, and operating a backend means tuning them so the service bends instead of breaking.

Throughput is the wrong target; goodput is the right one

The metric that matters under overload is not how many requests you accept but how many you complete usefully — fast enough that the answer still matters to the caller. Call that goodput. A request that you accept, hold for 8 seconds, and finally answer after the client has already timed out and retried is pure waste: it consumed a connection, a loop slot, and CPU, and produced negative value, because it also delayed real work. Beyond the saturation point, raising accepted load lowers goodput — the classic overload curve where throughput climbs, peaks, then falls off a cliff as the system spends itself on work that no longer matters. The entire discipline of overload handling is keeping the system at the top of that curve instead of sliding down the far side, and that means refusing work you cannot complete in time.
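The difference between the two metrics is easy to state in code. A minimal Python sketch, assuming you record each completed request's latency and know the caller's deadline; the helper name `throughput_and_goodput` and the 2-second deadline are hypothetical.

```python
def throughput_and_goodput(latencies_s: list[float], deadline_s: float) -> tuple[int, int]:
    """Throughput counts every completion; goodput counts only completions
    that came back within the caller's deadline (hypothetical helper)."""
    throughput = len(latencies_s)
    goodput = sum(1 for latency in latencies_s if latency <= deadline_s)
    return throughput, goodput


# Two fast answers and two that arrived long after an assumed 2 s client
# timeout (like the 8-second request above): the late ones still count
# toward throughput, but they are pure waste.
print(throughput_and_goodput([0.2, 0.4, 8.0, 9.5], deadline_s=2.0))  # (4, 2)
```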

Load shedding: reject early, reject cheaply

Load shedding is the deliberate, early rejection of excess requests so the accepted ones succeed. Three properties make it work:

  • Early. Shed at the edge, before the request consumes a pooled connection or a downstream call. A request rejected at admission costs almost nothing; a request rejected after it has acquired resources has already done the damage. This is why an admission check or concurrency limiter sits in front of the expensive layers.
  • Cheap and honest. The rejection is a fast 503 Service Unavailable with a Retry-After header, not a slow error. It tells the client “don’t hammer me”, which is the opposite of the silent slowness that triggers retry storms.
  • Prioritized. Not all load is equal. Shed the least valuable first — background refreshes before user-facing reads, anonymous before paid, retries before first attempts. A good shedder drops the right 20%, preserving goodput where it counts.
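The three properties can be combined in one small admission check. The sketch below is a minimal Python illustration, not a production limiter: `AdmissionController`, the three `Priority` classes, and the per-class capacity shares (50%, 80%, 100%) are hypothetical choices. Low-value classes may only fill part of the in-flight budget, so they are shed first as the service saturates, and every rejection is an immediate 503 with a `Retry-After` header, returned before any pooled resource is touched.

```python
import threading
from enum import IntEnum
from typing import Callable


class Priority(IntEnum):
    # Hypothetical classes; lower values are shed first.
    BACKGROUND = 0   # e.g. background refreshes
    BEST_EFFORT = 1  # e.g. anonymous traffic, retries
    CRITICAL = 2     # e.g. paid, user-facing first attempts


# Fraction of the in-flight budget each class may fill (assumed values).
SHARE = {Priority.BACKGROUND: 0.5, Priority.BEST_EFFORT: 0.8, Priority.CRITICAL: 1.0}


class AdmissionController:
    def __init__(self, max_in_flight: int, retry_after_s: int = 1) -> None:
        self.max_in_flight = max_in_flight
        self.retry_after_s = retry_after_s
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_admit(self, priority: Priority) -> bool:
        # Early and cheap: a counter check, no pool or downstream call yet.
        limit = int(self.max_in_flight * SHARE[priority])
        with self._lock:
            if self._in_flight >= limit:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        with self._lock:
            self._in_flight -= 1

    def handle(self, priority: Priority, work: Callable[[], bytes]) -> tuple[int, dict[str, str], bytes]:
        """Return (status, headers, body)."""
        if not self.try_admit(priority):
            # Honest rejection: a fast 503 telling the client when to come back.
            return 503, {"Retry-After": str(self.retry_after_s)}, b""
        try:
            return 200, {}, work()
        finally:
            self.release()


ac = AdmissionController(max_in_flight=10)
for _ in range(6):                          # six critical requests now in flight
    ac.try_admit(Priority.CRITICAL)
print(ac.try_admit(Priority.BACKGROUND))    # False: background may fill only 5 slots
print(ac.try_admit(Priority.BEST_EFFORT))   # True: best-effort may fill up to 8
print(ac.handle(Priority.BACKGROUND, lambda: b"ok"))  # (503, {'Retry-After': '1'}, b'')
```

Reserving headroom per class, rather than one global limit, is what lets the shedder drop “the right 20%”: as load rises, background work hits its ceiling long before user-facing work does.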

Backpressure: pushing the limit upstream

The dual of shedding is backpressure: instead of accepting work and dropping it, you signal upstream to slow down. A bounded queue that refuses new entries when full, a pool that makes acquisition fail fast instead of queueing unboundedly, an HTTP layer that stops reading the socket — each propagates “I am full” back toward the source, so the pressure is felt where it can actually be reduced (the client backs off, the upstream service routes elsewhere) rather than absorbed silently until something breaks. Shedding and backpressure are two halves of the same idea: make the limit explicit and enforce it at the boundary, rather than letting an implicit limit (memory, file descriptors, the pool) enforce itself by collapsing.
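A bounded queue is the simplest form of this signal. A minimal Python sketch using the standard library's `queue.Queue(maxsize=...)`: `put_nowait` raises `queue.Full` instead of blocking, so the producer learns immediately that the consumer is saturated and can back off, route elsewhere, or return a 503. The `BoundedWorkQueue` wrapper and its `offer`/`take` method names are hypothetical.

```python
import queue


class BoundedWorkQueue:
    """Hypothetical wrapper: refuses new work when full instead of growing."""

    def __init__(self, maxsize: int) -> None:
        self._q: queue.Queue = queue.Queue(maxsize=maxsize)

    def offer(self, item) -> bool:
        """Try to enqueue without blocking; False is the backpressure signal."""
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            return False

    def take(self, timeout_s: float):
        """Consumer side: blocks up to timeout_s, raises queue.Empty if idle."""
        return self._q.get(timeout=timeout_s)


q = BoundedWorkQueue(maxsize=2)
print([q.offer(n) for n in range(3)])  # [True, True, False]: third job refused
print(q.take(timeout_s=0.1))           # 0: consumer drains one item
print(q.offer(3))                      # True: room again, pressure released
```

The limit is explicit: the queue can never hold more than `maxsize` items, so memory is bounded by construction, and the caller, not the queue, decides what happens to refused work.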

Every mechanism is a load-control knob

The synthesis the whole unit has been building toward: the seven mechanisms are not just correctness tools, they are the actuators of load control, and overload is when they combine:

  • Pool / concurrency limit — the hard cap on work in flight. The single most important overload defense: it converts “unbounded slowdown” into “bounded work plus explicit rejection.”
  • Timeouts — free resources from requests that will not finish in time, so doomed work stops stealing capacity from viable work.
  • Circuit breaker — sheds load from a failing dependency, which is overload localized to one downstream.
  • Retries (bounded, with backoff and jitter) — the load amplifier you must cap, because uncapped retries turn overload into collapse.
  • Graceful shutdown — sheds load from a departing instance without dropping its in-flight work onto the floor.
  • Observability — the feedback that tells you where you are on the goodput curve, so shedding can be triggered by real saturation, not guesses.
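The retry knob is the one that most needs a hard cap. A minimal Python sketch of bounded retries with exponential backoff and full jitter (each delay drawn uniformly between 0 and an exponentially growing, capped ceiling). `TransientError`, `call_with_retries`, and the default values are hypothetical; `sleep` and `rng` are injectable only so the behavior can be tested.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a 503)."""


def call_with_retries(
    op: Callable[[], T],
    max_attempts: int = 3,
    base_s: float = 0.1,
    cap_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Run op at most max_attempts times, so one request adds at most
    max_attempts - 1 extra calls to an already overloaded dependency."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # budget spent: surface the failure instead of piling on
            ceiling = min(cap_s, base_s * 2 ** attempt)
            sleep(rng() * ceiling)  # full jitter spreads retries out in time
    raise AssertionError("unreachable")


calls = []


def flaky() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("overloaded")
    return "ok"


delays = []
# rng pinned to 1.0 so every delay equals its ceiling, for a deterministic demo.
print(call_with_retries(flaky, sleep=delays.append, rng=lambda: 1.0))  # ok
print(delays)  # [0.1, 0.2]
```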

Tuning these together — pool sized to the slowest dependency, timeouts nested in a budget, retries capped, a shedder at the edge keyed off saturation metrics — is what makes a service degrade gracefully: serve less, but serve it well, and never fall to zero.
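“Timeouts nested in a budget” can be made concrete with a request-wide deadline from which every downstream timeout is carved. A minimal Python sketch, assuming a single overall budget per request; `Deadline`, `child_timeout`, and the `reserve_s` margin (time kept back so the handler can still reply) are hypothetical names. When the budget is spent it refuses to start the downstream call at all, which sheds doomed work instead of waiting on it.

```python
import time
from typing import Callable


class Deadline:
    """Request-wide time budget; downstream timeouts are carved out of it."""

    def __init__(self, budget_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at = clock() + budget_s

    def remaining(self) -> float:
        return max(0.0, self._expires_at - self._clock())

    def child_timeout(self, max_s: float, reserve_s: float = 0.0) -> float:
        """Timeout for one downstream call: never longer than that call's own
        cap, nor than what is left of the budget minus a reserve for replying."""
        timeout = min(max_s, self.remaining() - reserve_s)
        if timeout <= 0:
            raise TimeoutError("request budget exhausted; do not start the call")
        return timeout


now = [100.0]                                     # fake clock for the demo
d = Deadline(2.0, clock=lambda: now[0])
print(d.child_timeout(max_s=1.5, reserve_s=0.2))  # 1.5: the call's own cap wins
now[0] += 1.0                                     # first call took 1 s
print(d.child_timeout(max_s=1.5, reserve_s=0.2))  # 0.8: budget left minus reserve
now[0] += 1.0                                     # budget now fully spent
try:
    d.child_timeout(max_s=1.5)
except TimeoutError as exc:
    print(exc)
```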

Why this works

Why is rejecting requests (deliberately failing some users) the correct behavior under overload, when it feels like the one thing a service exists to avoid? Because the alternative is not “serve everyone”, it is “serve no one”, and the arithmetic is unforgiving. A server has a finite capacity C of useful work per second. When demand D exceeds C, you cannot complete D; you can only choose what happens to the D − C requests you have no capacity for. If you accept them anyway, they do not vanish: they sit in queues and hold resources while they wait, so even the C requests you could have served now wait behind them, and their latency rises past the point of usefulness too. Accepting work beyond capacity does not add served requests; it subtracts them, because the excess degrades the requests that were within capacity.

This is the cruel inversion at the heart of overload: past the saturation point, trying harder to serve everyone serves fewer people, because the bottleneck resource gets spent on coordination, queueing, and timed-out work instead of completion. Shedding the D − C excess at the door, cheaply and before it acquires anything, is what protects the C. It keeps the queue short, so accepted requests stay fast, finish within their deadline, and become goodput instead of waste.

The reason it feels wrong is that a single rejected request looks like a failure, while the cost it would have imposed on others is diffuse and invisible: you see the 503 you returned but not the fifty timeouts you prevented. Senior judgment is precisely the ability to value the invisible many over the visible one, shedding the least-valuable load early and on purpose so that the system stays on the productive side of the overload curve.

And it connects every prior lesson. The pool is the mechanism that makes C explicit, the timeout enforces the deadline that defines goodput, the breaker is shedding applied to a dependency, and observability is what tells you D has crossed C in time to act. Overload handling is not a separate feature bolted on; it is what all seven mechanisms are, seen from the angle of a system being asked for more than it can give.
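The C versus D arithmetic can be checked with a toy model. The sketch below is a deterministic, tick-based Python simulation under stated assumptions, not a queueing-theory result: each tick brings `demand` requests, the server completes `capacity` of them in FIFO order, a completion counts as goodput only if it finishes within `deadline` ticks of arrival, and the accept-everything server (assumed to have no timeouts) keeps working on requests whose callers have already given up. The function `simulate` and its parameters are hypothetical.

```python
from collections import deque
from typing import Optional


def simulate(demand: int, capacity: int, deadline: int, ticks: int,
             queue_limit: Optional[int] = None) -> tuple[int, int, int]:
    """Return (throughput, goodput, shed).

    queue_limit=None accepts everything into an unbounded FIFO; otherwise
    arrivals that would exceed the limit are shed at admission (a fast 503).
    """
    q: deque[int] = deque()  # arrival tick of each accepted, unserved request
    throughput = goodput = shed = 0
    for t in range(ticks):
        for _ in range(demand):
            if queue_limit is not None and len(q) >= queue_limit:
                shed += 1
            else:
                q.append(t)
        for _ in range(min(capacity, len(q))):
            arrived = q.popleft()
            throughput += 1
            if t - arrived <= deadline:
                goodput += 1
    return throughput, goodput, shed


# D = 15 per tick against C = 10 per tick, deadline of 3 ticks, 30 ticks.
print(simulate(15, 10, 3, 30))                  # (300, 105, 0)
print(simulate(15, 10, 3, 30, queue_limit=30))  # (300, 300, 130)
```

Both runs complete the same 300 requests, so throughput cannot tell them apart. Accepting everything turns 195 of those completions into waste, while shedding 130 requests at the door (here, anything beyond three ticks of queued work) keeps every accepted request inside its deadline.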

Mechanism | Its load-control role | What it prevents
Pool / concurrency cap | Hard bound on work in flight | Unbounded slowdown from accepting everything
Timeout | Free resources from doomed work | Doomed requests stealing viable capacity
Circuit breaker | Shed load from a failing dependency | Hammering a downstream that’s already down
Bounded retries + backoff | Cap the load amplifier | Retry storm turning overload into collapse
Load shedding (edge) | Reject excess cheaply, early | Queue buildup that erases goodput
Backpressure | Signal upstream to slow down | Silent absorption until something breaks
Graceful shutdown | Shed load from a departing node | Dropped in-flight work on deploy

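The circuit-breaker row is the same shedding idea applied to one dependency. A minimal, single-threaded Python sketch of the usual closed, open, and half-open cycle; `CircuitBreaker`, `BreakerOpen`, the consecutive-failure threshold, and the cooldown are hypothetical choices, and a real breaker would also limit how many trial calls pass while half-open.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class BreakerOpen(Exception):
    """Raised instead of calling a dependency that is marked down."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means closed

    def call(self, op: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.cooldown_s:
                # Open: shed immediately, the dependency gets no traffic.
                raise BreakerOpen("dependency marked down; shedding")
            # Cooldown over: half-open, let this trial call through.
        try:
            result = op()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        self._failures = 0
        self._opened_at = None  # success closes the breaker
        return result


now = [0.0]
cb = CircuitBreaker(failure_threshold=2, cooldown_s=5.0, clock=lambda: now[0])


def down() -> str:
    raise ConnectionError("dependency down")


for _ in range(2):
    try:
        cb.call(down)
    except ConnectionError:
        pass                  # two real failures: the breaker opens
try:
    cb.call(down)
except BreakerOpen as exc:
    print(exc)                # shed without touching the dependency
now[0] += 5.0                 # cooldown elapsed
print(cb.call(lambda: "ok"))  # ok: trial call succeeds, breaker closes
```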
Quiz

A service is past its saturation point: demand exceeds capacity. It currently accepts and queues every request. Why does adding deliberate load shedding (fast 503s at the edge) increase the number of users served well?

Quiz

Why is 'goodput' a better target than 'throughput' when handling overload?

Order the steps

Order the layers of overload defense from the request edge inward:

  1. Shed excess at admission: fast 503 + Retry-After, dropping the least-valuable load first
  2. Bound work in flight with a pool / concurrency cap so accepted load is finite
  3. Enforce nested timeouts so doomed requests free their resources quickly
  4. Apply the breaker to failing dependencies and cap retries so they can't amplify
Recall before you leave
  1. Why is rejecting requests the correct behavior under overload, and what is goodput?
  2. What makes load shedding effective, how does backpressure relate, and how is each mechanism a load-control knob?
Recap

Above a certain load no service can serve everyone, and the only real choice is how it fails — so the discipline of overload is to reject some requests on purpose so the rest succeed. The target shifts from throughput (accepted) to goodput (completed usefully within deadline), because past saturation accepted-but-unfinishable work clogs queues and drags the in-capacity requests past their deadlines too: serving everyone serves fewer. Load shedding rejects excess early, cheaply, and by priority; backpressure pushes the limit upstream so it is felt where it can be reduced — two halves of making the limit explicit at the boundary. And the synthesis lands: every mechanism in the track is a load-control knob — the pool bounds work, timeouts free doomed requests, the breaker sheds a failing dependency, capped retries tame the amplifier, shutdown sheds a departing node, observability tells you where you are on the curve — tuned together so the service degrades gracefully instead of collapsing. We have now seen the seven mechanisms cooperate, fail, be observed, and be tuned under pressure. The final lesson collects all of it into a production-readiness review: the checklist that turns the whole track into a launch gate.


Trademarks belong to their respective owners. Editorial reference only.