
Backend Architecture

Seeing the system: RED metrics, the p99 tail, and breaker state

Crux: A backend you can’t observe is one you can’t operate. This lesson ties the mechanisms to telemetry: rate, errors, and duration (RED), the p99 tail behind the mean, pool saturation and queue depth, breaker transitions, and the error budget that gates shipping against reliabi

The last lesson described a cascade you could narrate after the fact — but the engineer living through it at 3am sees none of that story. They see a dashboard. And if the dashboard shows only “average latency: 180ms, error rate: 0.2%,” they are blind, because the cascade hides in exactly the places an average erases. The mean latency stays calm while the p99 quietly triples, because the slow tail is a small fraction of requests drowned in the average. The pool is one acquisition away from empty, but “connections: 47” without the limit of 50 next to it tells you nothing. The breaker flapped open and closed four times in the last minute — the single most important signal that a downstream is failing — and it appears nowhere, because nobody emitted it as a metric. Every mechanism you built in this track has an internal state that is the early warning, and a backend that does not surface that state is a backend you are operating with your eyes closed. The previous lessons made the system; this one makes it visible — because you cannot operate, debug, or escape a failure you cannot see, and the composed failures from the last lesson are invisible precisely until you instrument the seams.

RED: the three numbers every service owes you

Start with the request-level view. The RED method says every service should emit three things, and together they answer “is this service healthy?” before you know anything about its internals:

  • Rate — requests per second. The load the service is carrying.
  • Errors — the rate (or fraction) of requests that fail. Not just 5xx; anything the caller experiences as failure, including breaker rejections.
  • Duration — the distribution of response times, not the average. This is where the tail lives.

RED is deliberately request-centric: it describes what a caller feels, which is the right outermost view. Underneath it, you add resource-level signals — saturation of the pool, the loop, the queue — that explain why the RED numbers move.
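
As a minimal sketch of the request-level view, the three RED signals can be collected by a wrapper around each handler. `RedRecorder` and `handle` are hypothetical names invented for this illustration, and the in-memory list stands in for whatever metrics backend the service actually uses:

```python
import time
from dataclasses import dataclass, field


@dataclass
class RedRecorder:
    """Hypothetical in-memory RED recorder for one service (illustration only)."""
    requests: int = 0
    errors: int = 0
    durations_ms: list = field(default_factory=list)

    def observe(self, duration_ms: float, failed: bool) -> None:
        # Rate: count every request.
        self.requests += 1
        # Errors: anything the caller experiences as failure,
        # including breaker rejections, not only 5xx responses.
        if failed:
            self.errors += 1
        # Duration: keep the raw sample so the distribution survives.
        self.durations_ms.append(duration_ms)

    def snapshot(self, window_s: float) -> dict:
        return {
            "rate_rps": self.requests / window_s,
            "error_fraction": self.errors / self.requests if self.requests else 0.0,
            "durations_ms": sorted(self.durations_ms),
        }


def handle(recorder: RedRecorder, work):
    """Run one request handler and feed its outcome to the recorder."""
    start = time.perf_counter()
    failed = False
    try:
        return work()
    except Exception:
        failed = True
        raise
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        recorder.observe(elapsed_ms, failed)
```

The recorder deliberately stores individual durations rather than a running average, so percentiles can be computed later.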

The mean is a liar; watch percentiles

The single most important habit this lesson teaches: never trust the average latency. A mean blends fast and slow requests into one number that describes none of them. If 99 requests take 20ms and one takes 4 seconds, the mean is ~60ms — a number that looks fine and is experienced by nobody. The metric that matters is the percentile: p50 (median, the typical request), p99 (the slow tail, one request in a hundred), p999 (the rare disaster). The tail is not noise — it is the signal, because the cascade from the last lesson begins as a rising p99 long before the mean moves at all. A pool starting to saturate, a downstream starting to slow, a GC pause — all show up in the tail first. Watch p99 and you see the cascade forming; watch the mean and you see it only after collapse.
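
The gap between the mean and the tail can be checked with a few lines. This sketch uses a nearest-rank percentile and an assumed sample of 1,000 requests in which 2% are slow (a slightly larger slow fraction than the example above, so that p99 lands inside the tail); the numbers are illustrative, not measurements:

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Assumed workload: 980 fast requests at 20 ms, 20 slow requests at 4 s.
latencies_ms = [20] * 980 + [4000] * 20

mean_ms = statistics.mean(latencies_ms)  # 99.6
p50_ms = percentile(latencies_ms, 50)    # 20
p99_ms = percentile(latencies_ms, 99)    # 4000
```

The mean of about 100 ms matches no request that was actually served: the typical request took 20 ms and the tail took 4 seconds.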

Each mechanism has a state worth emitting

The track’s mechanisms are not just code — each has an internal state that is a leading indicator, and the job is to surface it:

  • Pool — emit in-use vs. limit and acquisition wait time / queue depth. Saturation (in-use / limit approaching 1.0) is the earliest sign of the cascade. “47 connections” is meaningless; “47 of 50, 200ms acquire wait” is an alarm.
  • Event loop — emit loop lag. Rising lag means something is blocking the loop, starving every concurrent request.
  • Breaker — emit state and transitions (closed → open → half-open). A flapping breaker is the clearest possible statement that a dependency is unhealthy; it must be a first-class metric, not a log line.
  • Retries — emit retry rate separately from request rate. A retry rate climbing toward the request rate is a storm forming.
  • Shutdown — emit drain duration and forced-kill count. Drains creeping toward the grace period predict dropped work on the next deploy.
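
A rough sketch of what surfacing that state might look like for three of the mechanisms in the list above. `PoolGauge`, `BreakerMetrics`, and `retry_ratio` are hypothetical names for this illustration; a real service would export these values through its metrics library rather than hold them in plain objects:

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PoolGauge:
    """Pool state reported together with its limit, never as a bare count."""
    in_use: int
    limit: int
    acquire_wait_ms: float

    @property
    def saturation(self) -> float:
        # Approaches 1.0 as the pool nears exhaustion.
        return self.in_use / self.limit


@dataclass
class BreakerMetrics:
    """Breaker state plus a count of every transition, so flapping is visible."""
    state: str = "closed"
    transitions: Counter = field(default_factory=Counter)

    def transition(self, new_state: str) -> None:
        if new_state != self.state:
            self.transitions[(self.state, new_state)] += 1
            self.state = new_state


def retry_ratio(retries: int, requests: int) -> float:
    """Retries per original request; a value climbing toward 1.0 suggests a storm."""
    return retries / requests if requests else 0.0


pool = PoolGauge(in_use=47, limit=50, acquire_wait_ms=200.0)

breaker = BreakerMetrics()
for state in ["open", "half-open", "open", "half-open", "closed"]:
    breaker.transition(state)
```

Here `pool.saturation` is 0.94, and the breaker has recorded two `closed`/`half-open` to `open` trips in one sequence, which is the flapping pattern a log line would bury.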

The error budget: turning telemetry into a decision

Observability is not just for debugging; it drives a shipping decision. An SLO (service level objective) sets the target — say, 99.9% of requests succeed under 300ms. The gap between that target and 100% is the error budget: the failures you are allowed to spend. When the budget is healthy, you ship features fast and take risk. When telemetry shows the budget is nearly spent, you stop shipping features and spend the engineering on reliability instead. This converts the whole RED/percentile/saturation picture from passive dashboards into an active governor on the team’s behavior — the bridge from “we can see the system” to “the system’s health controls what we do next.”
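
The budget arithmetic can be sketched as follows. The function names, the request counts, and the 90% freeze threshold are assumptions made for illustration; the lesson does not prescribe a specific policy:

```python
def error_budget(slo_target: float, total_requests: int, bad_requests: int) -> dict:
    """Compare observed bad requests against what the SLO allows.

    A "bad" request is one that failed or missed the latency target
    (for example, slower than 300 ms under a 99.9% SLO).
    """
    allowed_bad = (1.0 - slo_target) * total_requests
    consumed = bad_requests / allowed_bad if allowed_bad else float("inf")
    return {
        "allowed_bad": allowed_bad,
        "remaining": allowed_bad - bad_requests,
        "consumed_fraction": consumed,
    }


def can_ship_features(budget: dict, freeze_at: float = 0.9) -> bool:
    """Hypothetical policy: stop feature work once 90% of the budget is spent."""
    return budget["consumed_fraction"] < freeze_at


# Assumed window: 1,000,000 requests against a 99.9% SLO allows about 1,000 bad ones.
healthy = error_budget(0.999, 1_000_000, 400)
nearly_spent = error_budget(0.999, 1_000_000, 950)
```

With 400 bad requests about 40% of the budget is consumed and `can_ship_features(healthy)` is true; with 950 it is about 95% consumed and the same check returns false.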

Why this works

Why insist on percentiles and reject the average so absolutely? Surely the mean latency is some useful summary of how the service is doing? It is not, because the average answers a question nobody is asking and hides the one that matters. No user experiences “the mean”; each user experiences their own request, and the distribution of those individual experiences is the entire point. The mean collapses that distribution into a single number that is structurally blind to the shape of the tail, and the tail is where every interesting failure lives.

Consider the arithmetic: at scale, p99 is not a rare curiosity. If loading one page takes 100 requests and their latencies are roughly independent, the chance that at least one of them lands in the slowest 1% is 1 − 0.99^100 ≈ 63%. The “one in a hundred” slow request therefore shows up on most page loads and is a near-certainty across a session, so p99 is closer to “the experience of an active user” than the median is.

Worse, the mean actively conceals the cascade. When a pool starts saturating, a handful of requests get slow while the rest stay fast, so the tail lifts while the mean barely twitches. By the time a rising mean forces your attention, the tail has already been catastrophic for minutes.

There is a deeper structural reason too. Latency distributions in real systems are not Gaussian; they are heavy-tailed and often multimodal (a fast path and a slow path, e.g. cache hit vs. miss, pool-available vs. pool-wait). For such distributions the mean is not even a meaningful central tendency: it is an artifact sitting between two humps, describing neither. This is why every mechanism in the track must emit its state as a distribution or a discrete event, never as an average. An averaged breaker state is meaningless, an averaged queue depth hides the spikes that cause drops, and an averaged drain time hides the deploy that nearly missed the deadline.

The principle generalizes into a rule of operational maturity: monitor the experience at the tail, because the tail is both where users feel pain and where the system tells you, earliest and most clearly, that it is about to compose a failure.
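
The fan-out arithmetic behind the tail argument is short enough to check directly. This sketch assumes request latencies are independent, which real systems only approximate, and `p_at_least_one_slow` is a name invented for the illustration:

```python
def p_at_least_one_slow(requests_per_page: int, quantile: float = 0.99) -> float:
    """Probability that at least one of n independent requests exceeds the given quantile."""
    return 1.0 - quantile ** requests_per_page


single = p_at_least_one_slow(1)      # about 0.01
page = p_at_least_one_slow(100)      # about 0.634
session = p_at_least_one_slow(500)   # about 0.993
```

A single request hits the p99 tail 1% of the time, a 100-request page load hits it roughly 63% of the time, and a 500-request session hits it more than 99% of the time.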

Signal | What it measures | Why it’s the early warning
Rate | Requests / second | The load the rest of the picture explains
Errors | Failed-request fraction | Includes breaker rejections, not just 5xx
Duration (p99) | Slow-tail latency | Cascade shows here before the mean moves
Pool saturation | in-use / limit, acquire wait | Earliest sign of pool-driven cascade
Loop lag | Event-loop delay | Something is blocking the loop
Breaker state | closed / open / half-open | Clearest signal a dependency is failing
Error budget | SLO target vs. actual | Turns telemetry into a ship/stop decision

Quiz

A dashboard shows mean latency steady at 180ms and error rate at 0.2%, yet users are complaining the app is slow. What is the most likely blind spot?

Quiz

Why is an error budget more than just a dashboard — what decision does it drive?

Order the steps

Order how you'd instrument a service from outermost signal to shipping decision:

  1. Emit RED at the request edge: rate, errors, and the duration distribution
  2. Watch percentile latency (p99/p999), never the mean, to catch the slow tail
  3. Add resource signals (pool saturation, loop lag, breaker state) to explain why RED moves
  4. Define an SLO and track the error budget to gate feature shipping against reliability

Recall before you leave
  1. What is the RED method, and why must you watch percentiles instead of the mean latency?
  2. Which internal state should each mechanism emit, and how does the error budget turn telemetry into a decision?

Recap

The cascade of the last lesson is invisible at 3am if your dashboard shows only averages, so this lesson makes the system visible. RED — rate, errors, duration — is the outermost, caller-centric view, and the iron rule is to watch the duration distribution, never the mean: the cascade surfaces as a rising p99 long before the average twitches, because latency is heavy-tailed and the mean describes a request nobody makes. Every mechanism in the track owns an internal state that is a leading indicator and must be emitted as a metric: pool saturation and acquire wait, event-loop lag, breaker transitions, retry rate, drain duration. And observability is not passive — an SLO and its error budget turn the whole picture into a governor: ship features while the budget is healthy, switch to reliability work when it is nearly spent. Now you can both reason about composed failures and see them forming. The next lesson uses that visibility under the hardest condition — deliberate overload — where every mechanism doubles as a load-control knob and the question is whether the service degrades gracefully or collapses.

