Observability
Trace propagation: free-recall review
Retrieval beats re-reading. For each prompt, say or write a full answer from memory before you open the model answer — the effort of recall is what makes propagation reasoning stick.
Reconstruct the unit’s spine from memory — the W3C header, the missing-header fallback, consistent sampling, the tail-sampler RAM model, async-boundary fixes, and the propagation-health metrics — without looking back at the lessons.
- 01. Why is broken trace propagation worse than no tracing at all, and what single metric exposes it?
- 02. Walk through the four traceparent fields and what a service must do with an invalid header.
- 03. How does consistent sampling let 12 services agree on keep/drop with no coordination, and why are partial traces unacceptable?
- 04. Give the tail-sampler RAM model and the three independent OOM levers.
- 05. Name the async boundaries that drop context and the correct fix for each, including the in-process vs cross-process distinction.
- 06. When does the parent-child tree break down (the 30-minute problem), and how do span links resolve it?
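Once you have attempted prompts 02 and 03, you can check your answers against this minimal Python sketch. It parses a version-00 traceparent (version, trace-id, parent-id, trace-flags) and treats a missing or malformed header as a signal to start a fresh trace. It also makes the keep/drop decision a pure function of the trace-id. The names `SpanContext`, `parse_traceparent`, `should_sample` and `extract_or_start` are hypothetical, not an SDK API. The sketch rejects every version other than `00`, and its ratio check on the trace-id's low 64 bits is a simplified stand-in for a real SDK's ratio-based sampler.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# Version-00 layout: version-traceid-parentid-flags = 2+1+32+1+16+1+2 = 55 chars.
# Lowercase hex only, as the W3C Trace Context spec requires.
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # 32 lowercase hex chars, shared by every span in the trace
    span_id: str   # caller's span id (header parent-id) or our new root span id
    sampled: bool  # bit 0x01 of trace-flags


def parse_traceparent(header: Optional[str]) -> Optional[SpanContext]:
    """Return the caller's context, or None if the header is missing or invalid."""
    if header is None:
        return None
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    if version != "00":  # this sketch only understands version 00
        return None
    if trace_id == "0" * 32 or parent_id == "0" * 16:  # all-zero ids are invalid
        return None
    return SpanContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def should_sample(trace_id: str, ratio: float) -> bool:
    """Deterministic keep/drop: same trace-id and ratio give the same answer everywhere."""
    low64 = int(trace_id, 16) & ((1 << 64) - 1)
    return low64 < int(ratio * (1 << 64))


def extract_or_start(header: Optional[str], ratio: float) -> SpanContext:
    ctx = parse_traceparent(header)
    if ctx is not None:
        return ctx  # continue the caller's trace and keep its sampled flag
    # Missing or invalid header: start a fresh trace instead of repairing it.
    trace_id = secrets.token_hex(16)
    return SpanContext(trace_id, secrets.token_hex(8), should_sample(trace_id, ratio))
```

`should_sample` reads only the trace-id. Any service that sees the same trace-id at the same ratio therefore reaches the same decision with no coordination, so a trace is kept or dropped whole instead of leaving partial traces behind.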
If you could reconstruct each answer from memory, you hold the unit’s spine: the 55-byte W3C traceparent stitches services together, an invalid header means starting a fresh trace, and a uniformly random trace-id lets consistent sampling work without coordination. The tail-sampler trades RAM for outcome-aware selection; bound it with num_traces, replicas, and span-linked sub-traces. Async boundaries drop context (fix with context.bind in-process, inject/extract cross-process), and because propagation fails silently, the orphan-span rate is the metric that proves the dashboard is telling the truth.
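The orphan-span rate can be sketched as a batch computation. This sketch assumes one definition of "orphan": a span whose parent-id names a span that never arrived in the same trace. It also assumes the batch is complete. A streaming pipeline would need to wait for late-arriving parents before counting. `Span` and `orphan_span_rate` are hypothetical names, not part of any tracing SDK.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks a root span


def orphan_span_rate(spans: Iterable[Span]) -> float:
    """Share of spans whose parent-id names a span absent from the same trace."""
    batch = list(spans)
    if not batch:
        return 0.0
    # Index every span we actually received, keyed by (trace, span) id.
    received = {(s.trace_id, s.span_id) for s in batch}
    orphans = sum(
        1
        for s in batch
        if s.parent_id is not None and (s.trace_id, s.parent_id) not in received
    )
    return orphans / len(batch)
```

A rise in this rate suggests that context is being dropped or that parent spans are not reaching the backend, even while the dashboards keep rendering.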