Observability capstone: reading signals and queries
Crux Read a PromQL burn-rate expression, a traceparent header, a flame graph, and a correlated log line — predict the behaviour and pick the senior-level read.
The track’s artefacts are concrete: a burn-rate PromQL line, a traceparent string, a flame graph, and a log entry carrying a trace-id. Read each one the way you would at 2 am, then pick what a senior engineer concludes.
Goal
Practise reading the four concrete artefacts the chapter produces — an SLO alert query, a propagation header, a profile, and a correlated log — and converting each into the next funnel step.
What do the 14.4 factor and the two-window AND clause achieve in this alert?
Heads-up 14.4 is the burn-rate multiplier (how many times faster than budget-neutral), not the SLO. The windows are ANDed for confirmation, not averaged.
Heads-up The 5m window alone is twitchy and fires on momentary blips. The 1h window is the confirmation gate; removing it buys faster MTTD at the price of a flood of false pages.
Heads-up Burn-rate alerting fires while the budget is being consumed too fast — a 14.4x burn would exhaust 30 days of budget in ~2 days. It is predictive, not post-mortem.
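The arithmetic behind the 14.4 factor and the AND gate can be sketched in Python. This is a minimal model, not the alert itself: it assumes a 99.9% SLO over a 30-day window (the target is not stated above), one-minute buckets of request and error counts, and hypothetical helper names (`burn_rate`, `window_ratio`, `should_page`). Each window's error ratio is divided by the error budget (1 − SLO) to get a burn rate, and the page fires only when both the 5m and the 1h burn rates exceed 14.4.

```python
"""Multi-window burn-rate paging sketch (assumed 99.9% SLO, 30-day window)."""

SLO = 0.999               # assumption: the quiz does not state the target
PERIOD_HOURS = 30 * 24    # assumption: 30-day SLO window
PAGE_FACTOR = 14.4        # burn-rate multiplier from the alert


def burn_rate(error_ratio: float, slo: float = SLO) -> float:
    """How many times faster than budget-neutral the error budget is being spent."""
    return error_ratio / (1.0 - slo)


def hours_to_exhaust(rate: float, period_hours: float = PERIOD_HOURS) -> float:
    """At burn rate 1.0 the budget lasts the whole period; at `rate` it lasts period / rate."""
    return period_hours / rate


def window_ratio(errors: list[int], totals: list[int], minutes: int) -> float:
    """Error ratio over the last `minutes` one-minute buckets (sum of errors / sum of requests)."""
    err, tot = sum(errors[-minutes:]), sum(totals[-minutes:])
    return err / tot if tot else 0.0


def should_page(errors: list[int], totals: list[int],
                slo: float = SLO, factor: float = PAGE_FACTOR) -> bool:
    """Page only if the 5m AND the 1h burn rates both exceed `factor`."""
    short_burn = burn_rate(window_ratio(errors, totals, 5), slo)
    long_burn = burn_rate(window_ratio(errors, totals, 60), slo)
    return short_burn > factor and long_burn > factor


if __name__ == "__main__":
    requests = [1000] * 60                # 60 one-minute buckets, 1000 requests each
    blip = [0] * 58 + [200, 200]          # two bad minutes at 20% errors
    steady = [20] * 60                    # a sustained 2% error rate
    print(should_page(blip, requests))    # False: the 1h window vetoes the blip
    print(should_page(steady, requests))  # True: both windows burn at about 20x
    print(round(hours_to_exhaust(PAGE_FACTOR), 1))  # 50.0 hours, about 2 days
```

The two-minute blip pushes the 5m burn rate to 80x, but the 1h window sits near 6.7x, so nothing pages; the steady 2% error rate burns at about 20x in both windows and pages. At exactly 14.4x a 30-day budget lasts 720 / 14.4 = 50 hours, the "~2 days" above.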
A downstream service receives this header. Reading the four hyphen-separated fields, what must it do to continue the trace correctly?
Heads-up A new trace-id breaks the trace into two disconnected trees; the whole point of propagation is that the trace-id stays constant across every hop. Only the span-id changes, and it travels in the header's parent-id field.
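Heads-up Reusing the incoming parent-id as your own span-id merges your span with the caller's and destroys the parent/child causal graph. Keep the trace-id, record the incoming parent-id as your span's parent, and mint a new span-id; that new span-id becomes the parent-id in any header you send further downstream.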
Heads-up 01 in trace-flags is the sampled bit (keep this trace), not a completion flag. Dropping the header severs propagation: every downstream service starts a new, disconnected trace.
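The continuation rule can be sketched as a small parser plus a function that builds the outgoing header. Since the header itself is not reproduced here, the sketch assumes the example value from the W3C Trace Context specification, `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01`. It handles version `00` only, and `parse_traceparent`, `new_span_id`, and `continue_trace` are hypothetical names, not a tracing library's API.

```python
"""Continue a W3C traceparent at one hop (sketch, version 00 only)."""
import re
import secrets
from dataclasses import dataclass

# version-traceid-parentid-flags, lowercase hex as the W3C Trace Context spec requires
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class TraceParent:
    version: str
    trace_id: str   # constant for the whole request
    parent_id: str  # the caller's span-id
    flags: str      # bit 0x01 = sampled

    @property
    def sampled(self) -> bool:
        return (int(self.flags, 16) & 0x01) == 0x01


def parse_traceparent(header: str) -> TraceParent:
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    tp = TraceParent(*match.groups())
    # version ff and all-zero ids are invalid per the spec
    if tp.version == "ff" or set(tp.trace_id) == {"0"} or set(tp.parent_id) == {"0"}:
        raise ValueError("invalid version or all-zero id")
    return tp


def new_span_id() -> str:
    """Random 8-byte span-id; all zeros is reserved as invalid, so retry on it."""
    span_id = secrets.token_hex(8)
    while span_id == "0" * 16:
        span_id = secrets.token_hex(8)
    return span_id


def continue_trace(incoming: TraceParent) -> tuple[str, str]:
    """Return (my_span_id, header for my downstream calls).

    The trace-id is inherited, the incoming parent-id is recorded as this span's
    parent, and the freshly minted span-id goes into the outgoing parent-id slot.
    Flags are passed through unchanged, i.e. the caller's sampling decision is kept.
    """
    my_span_id = new_span_id()
    outgoing = f"00-{incoming.trace_id}-{my_span_id}-{incoming.flags}"
    return my_span_id, outgoing


incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
my_span, outgoing = continue_trace(incoming)
print(incoming.parent_id)  # 00f067aa0ba902b7, recorded as this span's parent
print(outgoing)            # same trace-id, new span-id in the parent-id slot, flags 01
```

Passing the flags through unchanged is a design choice of this sketch; a service is allowed to make its own sampling decision, but honouring the caller's keeps the whole trace either fully recorded or fully dropped.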
This profile was filtered to the slow trace-id from the trace view. What is the correct read, and what is the next funnel step?
Heads-up inventory.Lookup is an ancestor frame: its 87% is mostly inherited from its child json.Marshal. Optimise the widest leaf (the frame with the most self-time), not the parent that merely calls it.
Heads-up 9% is small; the 73% in json.Marshal dwarfs it. This is a CPU-serialisation problem, not a network one.
Heads-up A single profile already names the widest leaf and its self-time share. Differential profiling helps for regressions, but here one filtered profile points straight at json.Marshal.
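Flame-graph widths are inclusive sample counts, while the hotspot is the frame with the most self samples, i.e. the samples where it sat at the top of the stack as the leaf. The sketch below computes both from folded stacks, the `frame;frame;frame count` text format that Brendan Gregg's FlameGraph tooling renders. The stacks and counts are hypothetical, chosen only to show an ancestor that is wide purely because of its child; they are not the profile from the quiz, and `self_and_total` is an illustrative helper.

```python
"""Self vs inclusive time from folded stacks (hypothetical profile)."""
from collections import Counter
from typing import Iterable


def self_and_total(folded: Iterable[str]) -> tuple[Counter, Counter, int]:
    """Return (self samples, inclusive samples, total samples) per frame name."""
    self_samples: Counter = Counter()
    total_samples: Counter = Counter()
    grand_total = 0
    for line in folded:
        line = line.strip()
        if not line:
            continue
        stack, _, count = line.rpartition(" ")
        frames, n = stack.split(";"), int(count)
        grand_total += n
        self_samples[frames[-1]] += n   # leaf frame: where the CPU actually was
        for frame in set(frames):       # every frame on the stack is widened by n;
            total_samples[frame] += n   # set() avoids double-counting recursion
    return self_samples, total_samples, grand_total


PROFILE = [  # hypothetical sample counts, not the profile from the quiz
    "main;handler;inventory.Lookup;json.Marshal 70",
    "main;handler;inventory.Lookup 10",
    "main;handler;net.Read 10",
    "main;handler 10",
]

self_s, total_s, n = self_and_total(PROFILE)
hot, hot_samples = self_s.most_common(1)[0]
print(f"{hot}: {hot_samples / n:.0%} self")  # json.Marshal: 70% self
print(f"inventory.Lookup: {total_s['inventory.Lookup'] / n:.0%} total, "
      f"{self_s['inventory.Lookup'] / n:.0%} self")  # 80% total, 10% self
```

Here inventory.Lookup spans 80% of the samples but owns only 10% itself, while json.Marshal owns 70% as a leaf, so ranking by self samples points at the leaf for the same reason the Heads-up does.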
Snippet 4 — the correlated log line
{"level":"error","ts":"2026-05-29T02:14:07Z","service":"inventory", "msg":"marshal failed: schema v3 field overflow","trace_id":"4bf92f3577b34da6a3ce929d0e0e4736", "span_id":"00f067aa0ba902b7","http.route":"/inventory/lookup"}
How does this single log line tie back to the trace and profile from the previous snippets, and why does that matter?
Heads-up Timestamp search returns every log in that second across all requests. The trace_id is what pins it precisely to the one failing request; that is the whole reason structured logs carry it.
Heads-up http.route matches every request to that endpoint, not the one slow request. Only the trace_id isolates this specific failing request from thousands of healthy ones.
Heads-up With trace_id propagated into structured logs, logs and traces are explicitly joined. That is the four-signal model: the same trace-id on logs, metric exemplars, traces, and profiles.
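A minimal in-memory stand-in for the log query: parse each structured line as JSON and filter on one field. The first line is Snippet 4 verbatim; the other two are hypothetical healthy requests in the same second on the same route, with made-up trace ids. A real log backend runs the same predicate as an indexed query; `find_logs` is a hypothetical helper, not a backend API.

```python
"""Filter structured JSON logs by a single field (sketch)."""
import json

LOGS = [
    # Snippet 4, verbatim
    '{"level":"error","ts":"2026-05-29T02:14:07Z","service":"inventory", "msg":"marshal failed: schema v3 field overflow","trace_id":"4bf92f3577b34da6a3ce929d0e0e4736", "span_id":"00f067aa0ba902b7","http.route":"/inventory/lookup"}',
    # hypothetical healthy requests: same second, same route, different trace ids
    '{"level":"info","ts":"2026-05-29T02:14:07Z","service":"inventory","msg":"ok","trace_id":"11111111111111111111111111111111","span_id":"1111111111111111","http.route":"/inventory/lookup"}',
    '{"level":"info","ts":"2026-05-29T02:14:07Z","service":"inventory","msg":"ok","trace_id":"22222222222222222222222222222222","span_id":"2222222222222222","http.route":"/inventory/lookup"}',
]


def find_logs(lines: list[str], field: str, value: str) -> list[dict]:
    """Return parsed records whose `field` equals `value`; non-JSON lines are skipped."""
    hits = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if record.get(field) == value:
            hits.append(record)
    return hits


by_second = find_logs(LOGS, "ts", "2026-05-29T02:14:07Z")
by_route = find_logs(LOGS, "http.route", "/inventory/lookup")
by_trace = find_logs(LOGS, "trace_id", "4bf92f3577b34da6a3ce929d0e0e4736")
print(len(by_second), len(by_route), len(by_trace))  # 3 3 1
print(by_trace[0]["msg"])  # marshal failed: schema v3 field overflow
```

Timestamp and route both return every request in the bucket; only the trace_id narrows the result to the one failing request, which is the join the trace view and the filtered profile rely on.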
Recap
Four artefacts, one chain. The multi-window burn-rate query fires the alert (short window for speed, long window for confirmation; 14.4x exhausts a 30-day budget in about 2 days). The traceparent header keeps the trace-id constant while minting a new span-id at each hop. The flame graph names the widest leaf as the self-time hotspot: optimise the leaf, not the ancestor. And the structured log carries the same trace_id, so it joins precisely to that request. Read in sequence, these are the funnel made concrete.