Observability capstone: reading signals and queries

Read a PromQL burn-rate expression, a traceparent header, a flame graph, and a correlated log line — predict the behaviour and pick the senior-level read.

The track’s artefacts are queries and wire formats: a burn-rate PromQL line, a traceparent string, a flame graph, a log entry carrying a trace-id. Read each one the way you would at 2 am, then pick what a senior engineer concludes.

Goal

Practise reading the four concrete artefacts the chapter produces — an SLO alert query, a propagation header, a profile, and a correlated log — and converting each into the next funnel step.

Snippet 1 — the multi-window burn-rate alert

# SLO: 99.9% availability over 30 days. Error budget = 0.1%.
- alert: CheckoutFastBurn
  expr: |
    (
      sum(rate(http_requests_total{job="checkout",code=~"5.."}[5m]))
      / sum(rate(http_requests_total{job="checkout"}[5m]))
    ) > (14.4 * 0.001)
    and
    (
      sum(rate(http_requests_total{job="checkout",code=~"5.."}[1h]))
      / sum(rate(http_requests_total{job="checkout"}[1h]))
    ) > (14.4 * 0.001)
  for: 2m

Quiz

What do the 14.4 factor and the two-window AND clause achieve in this alert?
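The arithmetic behind the threshold can be checked with a short Python sketch. The function names are hypothetical, and the error ratios passed to `fast_burn_fires` in the usage note are made-up inputs, not measurements from any real service.

```python
# Illustrative arithmetic for the alert above; all names here are hypothetical.
SLO_TARGET = 0.999        # 99.9% availability, from the snippet's comment
WINDOW_HOURS = 30 * 24    # 30-day SLO window = 720 hours
BURN_RATE = 14.4          # factor used in the alert expression


def error_budget(slo_target: float) -> float:
    """Fraction of requests that may fail without breaking the SLO."""
    return 1.0 - slo_target


def alert_threshold(burn_rate: float, slo_target: float) -> float:
    """Error ratio at which the budget burns `burn_rate` times too fast."""
    return burn_rate * error_budget(slo_target)


def budget_fraction_consumed(burn_rate: float, hours: float) -> float:
    """Share of the whole 30-day budget spent after `hours` at this burn rate."""
    return burn_rate * hours / WINDOW_HOURS


def hours_to_exhaustion(burn_rate: float) -> float:
    """Time until the whole budget is gone if the burn rate is sustained."""
    return WINDOW_HOURS / burn_rate


def fast_burn_fires(short_ratio: float, long_ratio: float) -> bool:
    """Mirror of the PromQL `and`: both windows must exceed the threshold."""
    threshold = alert_threshold(BURN_RATE, SLO_TARGET)
    return short_ratio > threshold and long_ratio > threshold
```

With these definitions, `alert_threshold(14.4, 0.999)` is about 0.0144 (a 1.44% error ratio), one hour at that rate spends about 2% of the 30-day budget, and a sustained burn empties the budget in about 50 hours. A 5% spike in the 5m window alone, as in `fast_burn_fires(0.05, 0.002)`, does not fire because the 1h window has not confirmed it.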

Snippet 2 — the traceparent header

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01

Quiz

A downstream service receives this header. Reading the four hyphen-separated fields, what must it do to continue the trace correctly?
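A minimal Python sketch of the field-by-field read follows. It is a simplification that accepts only version `00` headers and is not a replacement for a tracing SDK; the names `TraceContext`, `parse_traceparent`, and `child_traceparent` are hypothetical, and the explicit span id in the usage note is an arbitrary example value.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - trace-flags (2 hex)
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceContext(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: int

    @property
    def sampled(self) -> bool:
        # Lowest bit of trace-flags is the "sampled" flag.
        return bool(self.flags & 0x01)


def parse_traceparent(value: str) -> TraceContext:
    match = TRACEPARENT_RE.match(value.strip())
    if match is None:
        raise ValueError("malformed traceparent")
    version, trace_id, parent_id, flags = match.groups()
    if version != "00":
        # Simplification: this sketch handles version 00 only.
        raise ValueError("unsupported traceparent version")
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError("all-zero trace-id or parent-id is invalid")
    return TraceContext(version, trace_id, parent_id, int(flags, 16))


def child_traceparent(ctx: TraceContext, new_span_id: Optional[str] = None) -> str:
    """Header for the next hop: same trace-id and flags, fresh span id."""
    span_id = new_span_id or secrets.token_hex(8)  # 8 random bytes = 16 hex chars
    return f"00-{ctx.trace_id}-{span_id}-{ctx.flags:02x}"
```

Parsing the header from the snippet yields the trace-id `4bf92f3577b34da6a3ce929d0e0e4736`, the caller's span id `00f067aa0ba902b7`, and the sampled flag set. Calling `child_traceparent(ctx, "b7ad6b7169203331")` produces `00-4bf92f3577b34da6a3ce929d0e0e4736-b7ad6b7169203331-01`: only the third field changes between hops.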

Snippet 3 — the flame graph (text form)

checkout.HandleOrder            100%  (root, 1.50s wall)
  inventory.Lookup               87%  (1.30s)
    json.Marshal                 73%  (1.10s)   <-- widest leaf
    grpc.Invoke                   9%  (0.13s)
  payment.Charge                  8%  (0.12s)

Quiz

This profile was filtered to the slow trace-id from the trace view. What is the correct read, and what is the next funnel step?
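The difference between total time and self time in the profile above can be worked out with a small Python sketch. The `Frame` class and helper functions are hypothetical; the durations are copied from the text-form flame graph, and self time is taken as a frame's total minus the totals of its direct children.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Frame:
    name: str
    total_s: float                      # wall time including children
    children: List["Frame"] = field(default_factory=list)

    def self_s(self) -> float:
        """Time spent in this frame itself, excluding its children."""
        return self.total_s - sum(child.total_s for child in self.children)


def walk(frame: Frame) -> Iterator[Frame]:
    """Yield the frame and every descendant, depth first."""
    yield frame
    for child in frame.children:
        yield from walk(child)


def hottest_self_time(root: Frame) -> Frame:
    """Frame with the largest self time anywhere in the tree."""
    return max(walk(root), key=lambda f: f.self_s())


# Durations taken from the flame graph in the snippet.
ROOT = Frame("checkout.HandleOrder", 1.50, [
    Frame("inventory.Lookup", 1.30, [
        Frame("json.Marshal", 1.10),
        Frame("grpc.Invoke", 0.13),
    ]),
    Frame("payment.Charge", 0.12),
])
```

Here `inventory.Lookup` has a large total (1.30s) but only about 0.07s of self time, while `json.Marshal` accounts for 1.10s on its own, so `hottest_self_time(ROOT)` returns the `json.Marshal` frame.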

Snippet 4 — the correlated log line

{"level":"error","ts":"2026-05-29T02:14:07Z","service":"inventory",
 "msg":"marshal failed: schema v3 field overflow","trace_id":"4bf92f3577b34da6a3ce929d0e0e4736",
 "span_id":"00f067aa0ba902b7","http.route":"/inventory/lookup"}

Quiz

How does this single log line tie back to the trace and profile from the previous snippets, and why does that matter?
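The join between the log line and the trace can be sketched in Python as a filter on `trace_id`. The helper names are hypothetical, the second and third entries in `LOG_LINES` are invented purely to show non-matching input, and a real log backend would run this as an indexed query instead of a scan.

```python
import json
from typing import Iterable, List


def trace_id_from_traceparent(header_value: str) -> str:
    """Second hyphen-separated field of a version 00 traceparent value."""
    _version, trace_id, _parent_id, _flags = header_value.strip().split("-")
    return trace_id


def logs_for_trace(lines: Iterable[str], trace_id: str) -> List[dict]:
    """Return parsed JSON log entries whose trace_id matches exactly."""
    matched = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip unstructured lines; they cannot be joined
        if isinstance(entry, dict) and entry.get("trace_id") == trace_id:
            matched.append(entry)
    return matched


TRACEPARENT = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"

LOG_LINES = [
    # The log line from the snippet, joined onto one line.
    '{"level":"error","ts":"2026-05-29T02:14:07Z","service":"inventory",'
    '"msg":"marshal failed: schema v3 field overflow",'
    '"trace_id":"4bf92f3577b34da6a3ce929d0e0e4736",'
    '"span_id":"00f067aa0ba902b7","http.route":"/inventory/lookup"}',
    # Hypothetical entry from a different request.
    '{"level":"info","service":"inventory","msg":"lookup ok",'
    '"trace_id":"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"}',
    # Hypothetical unstructured line with no trace_id to join on.
    "plain text line without any correlation fields",
]
```

Filtering with `logs_for_trace(LOG_LINES, trace_id_from_traceparent(TRACEPARENT))` keeps only the `marshal failed` entry, which is the point of carrying `trace_id` in every structured log: the error is tied to this one request and not inferred from timestamps.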

Recap

Four artefacts, one chain. The multi-window burn-rate query fires the alert: the short window gives speed, the long window gives confirmation, and a 14.4x burn rate spends 2% of the 30-day error budget per hour, which would exhaust the whole budget in about two days (720 h / 14.4 = 50 h). The traceparent header keeps the trace-id constant while minting a new span-id at each hop. The flame graph names the widest leaf as the self-time hotspot, so optimise the leaf, not the ancestor. And the structured log carries the same trace_id, so it joins precisely to that request. Read in sequence, these are the funnel made concrete.
