Instrumenting RED in Prometheus: counters, histograms, and cardinality discipline

The three canonical Prometheus metrics for RED, why Duration must be a histogram (never an average), how histogram_quantile works, and the iron label discipline that keeps cardinality under control.

A team alerts on average request latency. A bug fix pushes p99 from 200 ms to 800 ms but barely moves the mean. The on-call engineer misses the incident for 40 minutes. The SLO review finds that the average-latency alert has never fired during a real user-facing incident. A histogram-based p99 alert would have fired within 2 minutes.

The three canonical RED metrics

When you instrument a service for RED, the goal is one unambiguous data source per dimension — not an ad-hoc collection of differently-named counters that each team interprets differently. Here is the canonical shape.

Every HTTP service should emit exactly three metric groups, named consistently:

http_requests_total        # counter — Rate
http_request_errors_total  # counter — Errors (5xx only, or a status label)
http_request_duration_seconds  # histogram — Duration

Prometheus PromQL then gives you all three RED dimensions:

Rate: sum(rate(http_requests_total[5m]))
Error rate: sum(rate(http_request_errors_total[5m])) / sum(rate(http_requests_total[5m]))
Duration p99: histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

Why Duration must be a histogram

The average hides everything users notice. A service with 99% of requests at 50 ms and 1% at 5000 ms has the same mean latency (~100 ms) as one with all requests at 100 ms. The first kills users on retries; the second does not.
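The arithmetic behind that comparison can be checked with a small sketch. It works on raw samples for illustration only (Prometheus estimates percentiles from buckets, not raw samples), and the helper names `mean` and `percentile` are made up for this example. It prints p50 and p99.9 rather than p99 because a tail of exactly 1% sits right on the p99 boundary.

```javascript
// Nearest-rank percentile over raw samples (illustration only).
function percentile(samples, q) {
  const sorted = [...samples].sort((a, b) => a - b);
  // 1-based rank; the tiny epsilon guards against floating-point overshoot.
  const rank = Math.max(1, Math.ceil(q * sorted.length - 1e-9));
  return sorted[rank - 1];
}

function mean(samples) {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Service A: 99% of requests at 50 ms, 1% at 5000 ms (1000 requests).
const serviceA = [...Array(990).fill(50), ...Array(10).fill(5000)];
// Service B: every request at 100 ms.
const serviceB = Array(1000).fill(100);

for (const [name, samples] of [['A', serviceA], ['B', serviceB]]) {
  console.log(
    name,
    'mean:', mean(samples),
    'p50:', percentile(samples, 0.5),
    'p99.9:', percentile(samples, 0.999),
  );
}
```

The two means differ by half a millisecond, while the tail percentile differs by a factor of 50.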

Prometheus’s histogram_quantile(q, buckets) reads per-bucket counts accumulated over a time window and estimates the q-th percentile by linear interpolation between adjacent buckets. Accuracy depends entirely on bucket density near the percentile you care about.
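To make the interpolation concrete, here is a simplified sketch of the idea in plain JavaScript. It is not the Prometheus implementation: the function name `estimateQuantile` and the `{ le, count }` input shape are assumptions for this example, counts are cumulative as in the `_bucket` series, and edge cases such as q = 0 are ignored.

```javascript
// Estimate the q-th quantile (0 < q < 1) from cumulative bucket counts.
// `buckets` is an array of { le, count }, with le = Infinity for the last one.
function estimateQuantile(q, buckets) {
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  const total = sorted[sorted.length - 1].count;
  if (total === 0) return NaN; // no observations in the window

  const rank = q * total;
  const i = sorted.findIndex((b) => b.count >= rank);

  // Quantile falls in the +Inf bucket: the best available answer is the
  // highest finite boundary.
  if (sorted[i].le === Infinity) return i > 0 ? sorted[i - 1].le : NaN;

  const lower = i === 0 ? 0 : sorted[i - 1].le;    // bucket start
  const below = i === 0 ? 0 : sorted[i - 1].count; // observations before it
  const inBucket = sorted[i].count - below;
  // Linear interpolation: assume observations are spread evenly in the bucket.
  return lower + (sorted[i].le - lower) * ((rank - below) / inBucket);
}

// 1000 requests: 900 finished within 100 ms, the rest within 250 ms.
const coarse = [
  { le: 0.1, count: 900 },
  { le: 0.25, count: 1000 },
  { le: Infinity, count: 1000 },
];
console.log(estimateQuantile(0.99, coarse)); // about 0.235
```

The estimate comes out near 235 ms whether the slow 10% actually took 110 ms or 249 ms, because the only information available is that they landed somewhere between the 100 ms and 250 ms boundaries.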

The by (le) requirement. The correct form is always:

histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

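Dropping by (le) makes sum() collapse every label, including le (the bucket boundary label). histogram_quantile is then handed a single number instead of a per-bucket distribution and cannot compute a percentile: the query typically returns an empty result or NaN rather than a latency value. The mistake is common and easy to miss, because the query is still syntactically valid and the dashboard panel simply shows no data.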

Latency signal	Limitation	Use it for
Average (sum/count)	Hides the slow-tail behavior that users notice	Never for SLO alerts
Prometheus summary	Quantiles cannot be aggregated across replicas	Only when a single replica owns the data
Prometheus histogram	Accuracy depends on bucket density	Fleet-wide p99 alerts

Bucket strategy

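Default Prometheus client buckets, [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10] seconds, are too coarse for most services. For a checkout API with a 200 ms SLO, most traffic falls between 50 ms and 250 ms, and the default layout puts only one boundary (100 ms) inside that range. The SLO threshold itself sits inside a single bucket spanning 100 ms to 250 ms, so an interpolated p99 could correspond to any real latency in that range, and the histogram cannot say what fraction of requests met the 200 ms target.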

Production rule: 10–15 buckets, densest around the SLO target. For a 200 ms SLO:

[0.01, 0.025, 0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 10]

Four boundaries sit below the SLO (10, 25, 50, 100 ms), one sits exactly on it (200 ms), four sit above it (400, 800, 1600, 3200 ms), and the last is a hard cap at the service timeout (10 s). Adjacent buckets differ by at most 2× near the SLO.
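One way to compare two bucket layouts is to look up which bucket a given latency falls into. The sketch below does that for the default layout and the SLO-centred one. The helper name `bucketAround` is an assumption for this example, and it treats upper boundaries as inclusive, matching the "less than or equal" meaning of le.

```javascript
// Return the bucket (lower, upper] that contains `value`, in seconds.
function bucketAround(bounds, value) {
  const sorted = [...bounds].sort((a, b) => a - b);
  // First boundary that is >= value; anything larger lands in +Inf.
  const upper = sorted.find((b) => value <= b) ?? Infinity;
  // Largest boundary strictly below value; 0 if there is none.
  const smaller = sorted.filter((b) => b < value);
  const lower = smaller.length ? smaller[smaller.length - 1] : 0;
  return { lower, upper };
}

const defaultBuckets = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10];
const sloBuckets = [0.01, 0.025, 0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 10];

console.log(bucketAround(defaultBuckets, 0.2)); // { lower: 0.1, upper: 0.25 }
console.log(bucketAround(sloBuckets, 0.2));     // { lower: 0.1, upper: 0.2 }
```

With the SLO-centred layout, 200 ms is itself a boundary, so the count in the le="0.2" bucket is the exact number of requests that met the target and needs no interpolation.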

Label discipline — the iron rule

Every unique combination of label values on a Prometheus metric creates a separate time series. Before you add a label, ask yourself: is this dimension bounded — will it ever stop growing? A naive RED instrumentation labelled by user_id in a service with 100k active users grows from a few hundred series to hundreds of thousands within hours.

What belongs in labels:

route — the URL template (/cart, not /cart?u=12345)
method — HTTP verb (GET / POST / …)
status_class — 2xx / 3xx / 4xx / 5xx (not the exact code)
service — injected by the deployment as a meta-label

Forbidden in labels: user IDs, request IDs, customer emails, session tokens, and query strings, all of which have unbounded cardinality. A country code is acceptable only when the set of values is small and bounded.

The cost math: collapsing exact codes such as 200/201/204 into 2xx cuts roughly 60 possible status codes down to 4 classes (2xx, 3xx, 4xx, 5xx). For 20 routes × 4 methods, the worst case drops from 60 × 20 × 4 = 4,800 series to 4 × 20 × 4 = 320 series, a 15× reduction with no loss of useful alerting power.
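The same multiplication can be written as a small helper for sanity-checking a label set before it ships. This is a rough upper-bound sketch: the function name `seriesCount` is made up for this example, the 100,000-user figure is borrowed from the user_id scenario earlier in the lesson, and real series counts are lower when some label combinations never occur.

```javascript
// Worst-case series count: the product of distinct values per label.
function seriesCount(labelValueCounts) {
  return Object.values(labelValueCounts).reduce((product, n) => product * n, 1);
}

const exactStatus = seriesCount({ status: 60, route: 20, method: 4 });
const statusClass = seriesCount({ status_class: 4, route: 20, method: 4 });
const withUserId = seriesCount({ status_class: 4, route: 20, method: 4, user_id: 100000 });

console.log(exactStatus);               // 4800
console.log(statusClass);               // 320
console.log(exactStatus / statusClass); // 15
console.log(withUserId);                // 32000000

// A classic histogram multiplies this again: one series per finite bucket,
// plus the +Inf bucket, _sum, and _count. With 10 finite buckets that is 13.
const histogramSeries = statusClass * (10 + 3);
console.log(histogramSeries);           // 4160
```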

Why this works

If you genuinely need to alert on a specific status code on a specific route, build that alert from logs — not from a metric with a high-cardinality label. Logs are the natural home of high-cardinality data (each event is one record). Metrics are the home of aggregated, time-series counts (each series is a separate in-memory counter). The split is architectural, not preference.

A Node.js RED middleware

const express = require('express');
const client = require('prom-client');

const app = express();

// Rate: every finished request, labelled only with bounded dimensions.
const reqs = new client.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status_class'],
});
// Errors: server-side failures (5xx) only.
const errs = new client.Counter({
  name: 'http_request_errors_total',
  help: 'Failed HTTP requests (5xx)',
  labelNames: ['method', 'route'],
});
// Duration: histogram with buckets densest around the 200 ms SLO.
const dur = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Request duration',
  labelNames: ['method', 'route', 'status_class'],
  buckets: [0.01, 0.025, 0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 10],
});

// Register this middleware before the routes so it wraps every handler.
app.use((req, res, next) => {
  const start = process.hrtime.bigint();
  // 'finish' fires once the response has been handed off to the OS.
  res.on('finish', () => {
    const seconds = Number(process.hrtime.bigint() - start) / 1e9; // ns -> s
    const sclass = `${Math.floor(res.statusCode / 100)}xx`; // 503 -> "5xx"
    // Matched route template; requests that match no route share one label.
    const route = req.route?.path || 'unknown';
    const labels = { method: req.method, route, status_class: sclass };
    reqs.inc(labels);
    dur.observe(labels, seconds);
    if (res.statusCode >= 500) errs.inc({ method: req.method, route });
  });
  next();
});

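req.route.path gives the matched template (/cart), whereas req.url contains the concrete path and the query string. Using the template, with 'unknown' as the fallback for requests that match no route, is the one line that prevents a cardinality explosion.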
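The route lookup can also be pulled into a small pure function, which makes the fallback behaviour easy to unit-test. This is a sketch under assumptions: the helper name `routeLabel` is hypothetical, and it assumes an Express-style request where `baseUrl` holds the mount path of the enclosing router and `route.path` holds the template matched inside it.

```javascript
// Build a bounded `route` label from an Express-style request object.
function routeLabel(req) {
  // No matched route (404s, scanners), or a non-string path such as a
  // RegExp route: collapse everything into one shared label value.
  if (!req.route || typeof req.route.path !== 'string') return 'unknown';
  // Prefix the router mount path so '/api' + '/items/:id' stays distinct
  // from a top-level '/items/:id'.
  return `${req.baseUrl || ''}${req.route.path}`;
}

// Plain objects stand in for real requests here.
console.log(routeLabel({ url: '/cart?u=12345', baseUrl: '', route: { path: '/cart' } }));
console.log(routeLabel({ url: '/api/items/42', baseUrl: '/api', route: { path: '/items/:id' } }));
console.log(routeLabel({ url: '/wp-login.php' }));
```

The three calls print /cart, /api/items/:id, and unknown: the query string and the concrete item ID never reach the label.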

A single middleware wraps the handler: on finish it increments the Rate counter, conditionally increments the Errors counter on 5xx, and records the latency into the Duration histogram — all three RED signals from one instrumentation point per request.

Quiz

A team alerts on the AVERAGE request latency across all replicas. Why is this dangerous?

Quiz

A service emits an Errors counter labelled by exact error_message string. After a buggy release that throws unique stack traces, the metrics backend bill triples overnight. Why?

Quiz

A senior engineer claims histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m]))) (no 'by (le)') gives the fleet-wide p99. Why is this wrong?

Recall before you leave

01 Why must RED Duration be a histogram rather than sum/count (average)?
02 What does the 'by (le)' clause do in a histogram_quantile query, and what happens without it?
03 Name three labels that are forbidden on RED metrics and one label that is always allowed.

Recap

RED in Prometheus is three metric groups: http_requests_total (counter for Rate), http_request_errors_total (counter for Errors), and http_request_duration_seconds (histogram for Duration). Duration must be a histogram because the average masks tail behavior that users feel — histogram_quantile reads per-bucket counts and interpolates the percentile, but only when sum by (le) preserves the bucket-boundary label. Bucket selection decides p99 accuracy: choose 10–15 buckets densest around the SLO target with adjacent buckets differing by ≤2× near the SLO. Label discipline is the other half: use route templates, HTTP method, and status class — never user IDs, request IDs, or exact error messages. Each unique label combination is a separate time series, billed separately, and stored in RAM on the Prometheus server. Now when you wire up a new service, you will reach for the histogram first, pick your buckets around the SLO target, and ask of every proposed label: “Is this bounded?” before it ships.
