awesome-everything

Observability

What is trace propagation and why broken propagation is worse than none

Crux: Trace propagation passes one shared identifier across every service a request touches. Miss a single hop and the trace silently splits into orphan fragments that look fine on dashboards but hide the real bottleneck.

A customer opens a support ticket: “checkout took 30 seconds.” Your tracing tool shows traces for every service — but each one is a single span, unconnected to anything else. You have all the data and none of the answers.

What trace propagation is

Trace propagation is the practice of passing a small identifier from one service to the next on every request, so that all the work done across many services for one user action gets stitched into a single picture.

Without propagation, a slow checkout looks like 50 separate stories. With it, you get one trace, top to bottom, that you can navigate in 30 seconds.

The identifier is carried in an HTTP header called traceparent, defined by the W3C Trace Context specification. Every service that receives a request reads the traceparent, uses it as the parent for its own work, generates a new span-id for itself, and writes a new traceparent before making any outbound call of its own. The trace-id stays constant across every hop; span-ids form a parent-child tree.
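To make the header concrete, here is a minimal sketch that parses and builds a version-00 traceparent value: version, trace-id, parent-id and trace-flags as lowercase hex, separated by hyphens. It uses only the Python standard library; `TraceParent`, `parse_traceparent`, `format_traceparent`, `new_trace_id` and `new_span_id` are hypothetical helpers for illustration, not an OpenTelemetry API, and the parser accepts version 00 only.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# Version-00 layout: version(2 hex) - trace-id(32 hex) - parent-id(16 hex) - flags(2 hex)
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


@dataclass(frozen=True)
class TraceParent:
    trace_id: str   # 128-bit trace-id, constant across every hop
    parent_id: str  # 64-bit span-id of the caller's span
    flags: str      # trace-flags; "01" sets the sampled bit


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Parse a header value; return None if unusable (the service then starts a new trace)."""
    match = _TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # This sketch accepts version 00 only; all-zero ids are invalid per the spec.
    if version != "00" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(trace_id, parent_id, flags)


def format_traceparent(trace_id: str, span_id: str, flags: str = "01") -> str:
    """Build the outbound header: same trace-id, the sender's own span-id as parent-id."""
    return f"00-{trace_id}-{span_id}-{flags}"


def new_trace_id() -> str:
    return secrets.token_hex(16)  # 16 random bytes -> 32 lowercase hex chars


def new_span_id() -> str:
    return secrets.token_hex(8)  # 8 random bytes -> 16 lowercase hex chars


# Example value from the W3C Trace Context specification.
incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
outbound = format_traceparent(incoming.trace_id, new_span_id(), incoming.flags)
print(outbound)  # same trace-id, freshly generated parent-id
```

The outbound value keeps the incoming trace-id and swaps in a new span-id, which is exactly the "trace-id constant, span-ids form a tree" rule described above.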

The package-tracking metaphor

Think of an Amazon delivery with one tracking number. The package leaves a warehouse, hops between sorting facilities, rides on three different trucks, and finally arrives at your door. Each hop scans the same tracking number, recording where it was, when, and what the next hop is.

If any one stop forgets to scan, the tracking page goes silent and you have no idea where the package is — even if it eventually arrives.

Trace propagation is the scanning. Every service must:

  1. Read the incoming traceparent (extract the trace-id and parent span-id).
  2. Create its own span (new span-id, parent = the incoming span-id).
  3. Write a new traceparent before any outbound call (same trace-id, its own span-id as the new parent-id).

Miss any one of these steps and the chain breaks.
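The three steps map onto a single per-service hook. The sketch below is a hypothetical, standard-library-only illustration: `handle_request` and the dict-shaped span are invented for the example, and the regex accepts only well-formed version-00 headers while skipping the spec's other validity checks. In a real service, OpenTelemetry's propagators and instrumentation libraries do this work for you.

```python
import re
import secrets
from typing import Dict, Tuple

_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def handle_request(incoming: Dict[str, str], service: str) -> Tuple[dict, Dict[str, str]]:
    """Hypothetical hook run by `service` on each request; returns (span, outbound headers)."""
    # Step 1: read the incoming traceparent (HTTP header names are case-insensitive).
    raw = next((v for k, v in incoming.items() if k.lower() == "traceparent"), "")
    match = _TRACEPARENT_RE.match(raw.strip())
    if match:
        trace_id, parent_span_id, flags = match.groups()
    else:
        # No usable header: this service becomes the root of a brand-new trace.
        trace_id, parent_span_id, flags = secrets.token_hex(16), None, "01"

    # Step 2: create this service's own span with a fresh span-id.
    span = {
        "service": service,
        "trace_id": trace_id,
        "span_id": secrets.token_hex(8),
        "parent_id": parent_span_id,
    }

    # Step 3: write a new traceparent for any outbound call:
    # same trace-id, this span's id as the new parent-id.
    outbound = {"traceparent": f"00-{trace_id}-{span['span_id']}-{flags}"}
    return span, outbound


span_b, headers_to_c = handle_request(
    {"Traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}, "service-b"
)
span_c, _ = handle_request(headers_to_c, "service-c")
```

Service C's span ends up with the same trace-id and B's span-id as its parent. Skip step 3 and C falls into the `else` branch, starting an unrelated trace.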

A concrete scenario

An on-call engineer gets a customer support ticket: “checkout took 30 seconds.” She opens her tracing tool, types the request-id from the support ticket, and pulls up one trace. She sees: 50 ms in the API gateway, 80 ms in the auth service, 28 seconds in the inventory service waiting on a database query, 200 ms in payment, 100 ms back to the user. The 28-second bottleneck is named precisely.

Without propagation she would have had to manually correlate 50 log entries across 7 services and guess which ones came from this user. With one trace she knows in 30 seconds.
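Once the spans share one trace-id, finding the bottleneck is simple arithmetic. This sketch uses the scenario's figures and assumes, for illustration, that each figure is that hop's own, non-overlapping time; the hop labels are invented for the example.

```python
# Per-hop times from the scenario, in milliseconds, all under one trace-id.
# Assumption: each figure is that hop's own time and the hops do not overlap.
hop_ms = {
    "api-gateway": 50,
    "auth": 80,
    "inventory (db query)": 28_000,
    "payment": 200,
    "response to user": 100,
}

total_ms = sum(hop_ms.values())
bottleneck = max(hop_ms, key=hop_ms.get)
share = hop_ms[bottleneck] / total_ms

print(f"total: {total_ms} ms")
print(f"bottleneck: {bottleneck} ({share:.1%} of the request)")
```

The hops add up to 28,430 ms, close to the customer's "30 seconds", and the inventory query accounts for roughly 98.5% of it.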

Why broken propagation is worse than no tracing at all

Without any tracing, you know you have no traces and you fall back to logs. With broken propagation, every service emits spans but none link to each other — the dashboard claims you are observing the system, but each trace covers only one service.

You think you are debugging end to end, but you are actually debugging in fragments. The missing link hides the causal chain: a request that is fast in service A and slow in service B shows up as a fast trace in A and a separate slow trace in B, with nothing tying the two together. Operators waste hours suspecting the wrong service.

The common failure pattern: a team adds tracing to its services but forgets to enable the OpenTelemetry (OTel) HTTP-client auto-instrumentation, so outbound calls leave without a traceparent header. Every receiving service then starts a fresh trace; the dashboard shows traces, but each is one span deep. Customers report slowness and the team cannot find where the time went, because the trace they need is silently split into 50 pieces.
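A small simulation shows why a missing injection step produces one-span-deep traces. This is a hypothetical, standard-library-only model: `run_chain` and its `inject` flag are invented for illustration, with `inject=False` standing in for an HTTP client that has no propagation instrumentation.

```python
import secrets
from typing import List, Optional, Tuple


def run_chain(services: List[str], inject: bool) -> List[dict]:
    """Simulate one user request calling `services` in order; return the emitted spans."""
    spans: List[dict] = []
    on_the_wire: Optional[Tuple[str, str]] = None  # (trace_id, parent span_id) sent downstream
    for name in services:
        if on_the_wire is None:
            trace_id, parent_id = secrets.token_hex(16), None  # no header: start a new trace
        else:
            trace_id, parent_id = on_the_wire
        span_id = secrets.token_hex(8)
        spans.append({"service": name, "trace_id": trace_id,
                      "span_id": span_id, "parent_id": parent_id})
        # inject=False models an uninstrumented HTTP client: nothing is sent downstream.
        on_the_wire = (trace_id, span_id) if inject else None
    return spans


chain = ["gateway", "auth", "inventory", "payment"]
for inject in (True, False):
    emitted = run_chain(chain, inject)
    trace_count = len({s["trace_id"] for s in emitted})
    print(f"inject={inject}: {len(emitted)} spans across {trace_count} trace(s)")
```

With injection, the four spans share one trace. Without it, every service still emits a perfectly valid span, but each span is the root of its own trace: the one-span-deep picture the dashboard shows.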

Propagation state | What you see in the dashboard | What you can actually debug
No tracing at all | Nothing | Logs only; you know you're guessing
Broken propagation | Traces everywhere, each 1 span deep | Nothing end to end, but the dashboard claims you can
Correct propagation | Full tree: API → auth → inventory → payment | Exact bottleneck in 30 seconds

Quiz

A trace is propagated across services by which HTTP header (in the W3C standard)?

Quiz

What is the most common production failure of trace propagation?

Order the steps

Order what happens when a request travels through three services with correct propagation:

  1. Client A generates a new trace-id and a span-id, builds the traceparent header
  2. Client A makes an HTTP request to Service B with the traceparent header
  3. Service B extracts the trace-id, creates its own span (new span-id, parent = client's span-id)
  4. Service B calls Service C: builds a fresh traceparent with the same trace-id but its own span-id as new parent-id
  5. Service C extracts the trace-id, creates its own span (parent = B's span-id), does its work
  6. Each service emits its span to the tracing backend; backend stitches by trace-id
  7. Dashboard shows the full tree: A → B → C, each span sharing one trace-id
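The last two steps, stitching by trace-id and drawing the tree, can be sketched in a few lines. This is a hypothetical model of what a tracing backend does, not any particular backend's implementation; `stitch`, the span dicts and the shortened ids are invented for illustration. Spans typically arrive out of order (C finishes, and is exported, before A), so the tree is rebuilt from parent-ids rather than arrival order.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def stitch(spans: List[dict]) -> Dict[str, List[str]]:
    """Group spans by trace-id and render each trace as an indented parent/child tree."""
    by_trace: Dict[str, List[dict]] = defaultdict(list)
    for span in spans:
        by_trace[span["trace_id"]].append(span)

    trees: Dict[str, List[str]] = {}
    for trace_id, group in by_trace.items():
        children: Dict[Optional[str], List[dict]] = defaultdict(list)
        for span in group:
            children[span["parent_id"]].append(span)

        lines: List[str] = []

        def walk(parent_id: Optional[str], depth: int) -> None:
            for child in children.get(parent_id, []):
                lines.append("  " * depth + child["service"])
                walk(child["span_id"], depth + 1)

        walk(None, 0)  # roots are spans with no parent
        trees[trace_id] = lines
    return trees


# Spans as the backend might receive them: innermost span first.
received = [
    {"service": "C", "trace_id": "t1", "span_id": "c1", "parent_id": "b1"},
    {"service": "B", "trace_id": "t1", "span_id": "b1", "parent_id": "a1"},
    {"service": "A", "trace_id": "t1", "span_id": "a1", "parent_id": None},
]
for trace_id, lines in stitch(received).items():
    print(trace_id)
    print("\n".join(lines))
```

This sketch only renders spans reachable from a root; a span whose parent never reached the backend would be dropped, so a fuller version would list such spans separately.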

Fill in the blank

The standard HTTP header carrying the trace identifier across services is named _______.

Recall before you leave
  1. In one paragraph: why is broken trace propagation worse than no tracing at all?
  2. What three things must every service do when it receives a request with a traceparent header?
  3. Name the three states of tracing and what each means for debuggability.
Recap

Trace propagation stitches all the work done for one user request across every service into a single navigable trace. The W3C Trace Context standard does this with a 55-byte traceparent HTTP header carrying a 128-bit trace-id that stays constant across every hop. Every service reads the incoming header, creates a child span, and writes a new header before its own outbound calls. Miss any one hop and the trace splits into disconnected single-span orphans — a state that is actively worse than no tracing because dashboards report normal visibility while hiding the real bottleneck from the engineer who needs it most.
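The 55-byte and 128-bit figures follow from the version-00 layout: four lowercase hex fields joined by three hyphens, with each hex digit encoding 4 bits and each character taking one ASCII byte.

```latex
\underbrace{2}_{\text{version}} + 1 + \underbrace{32}_{\text{trace-id}} + 1
  + \underbrace{16}_{\text{parent-id}} + 1 + \underbrace{2}_{\text{flags}} = 55 \text{ bytes},
\qquad 32 \times 4 \text{ bits} = 128 \text{ bits}
```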


Trademarks belong to their respective owners. Editorial reference only.