Baggage and async boundaries: carrying context across queues and callbacks

W3C baggage propagates key-value pairs alongside the trace, but its unbounded size and semi-public visibility make it a PII trap. Async boundaries — setTimeout, Kafka, workers — silently drop context unless bound with context.with() or message-level inject/extract.

A developer writes setTimeout(() => doWork(), 100) inside a request handler. The deferred work runs fine, emits spans, and shows up in the tracing dashboard — as a completely separate trace with no connection to the request that triggered it.

Baggage: arbitrary key-values across the journey

Baggage is a second header (baggage) defined by W3C, propagated identically to traceparent. It carries free-form key-value pairs: tenant-id, region, feature-flag selection, A/B test cohort. Baggage is read by downstream services and can be attached as span attributes or used by code (for example, to enforce per-tenant quotas).
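On the wire, the header is a comma-separated list of key=value members with percent-encoded values. The following is a minimal sketch of that format, not a full implementation of the spec: the function names are hypothetical, optional member properties (the part after `;`) are simply discarded, and keys are assumed to be valid tokens already.

```javascript
// Minimal sketch of the baggage header format: comma-separated key=value
// members, values percent-encoded. Properties after ';' are ignored here.
function serializeBaggage(entries) {
  return Object.entries(entries)
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join(',');
}

function parseBaggage(header) {
  const entries = {};
  for (const member of header.split(',')) {
    const [pair] = member.split(';'); // drop optional properties
    const eq = pair.indexOf('=');
    if (eq <= 0) continue; // skip malformed or empty members
    const key = pair.slice(0, eq).trim();
    const value = pair.slice(eq + 1).trim();
    entries[key] = decodeURIComponent(value);
  }
  return entries;
}

const header = serializeBaggage({
  'tenant-id': 't_8f3a',
  region: 'eu-west-1',
  'checkout-v2': 'true',
});
// header === 'tenant-id=t_8f3a,region=eu-west-1,checkout-v2=true'
const received = parseBaggage(header);
```

A downstream service would read `received['tenant-id']` to tag its spans or pick a quota bucket. In a real service the OTel propagation API does this parsing for you.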

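The catch: the spec sets no hard ceiling on baggage. It only requires implementations to propagate at least 64 entries and 8,192 bytes in total; anything larger may be dropped along the way or carried as far as the carrier protocol allows. HTTP header limits (typically 8–64 KB per header field, server-configurable) are the practical cap.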

Two real risks:

Latency bloat. Baggage is propagated on every request hop, every queue message, every async boundary. Each hop pays the cost twice: serialise on outbound, parse on inbound. A 16-KB baggage payload on 10k req/s of internal traffic means roughly 160 MB/s of pure baggage transport plus parsing time at every hop.
PII leakage. Baggage flows to every downstream service including third-party integrations and SaaS APIs that might log or persist headers — often in less-audited contexts than the source database. Anything in baggage is effectively semi-public within and beyond your engineering org.
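The bandwidth figure under "Latency bloat" is plain multiplication, and it grows with the number of hops a request touches. A minimal sketch, assuming decimal units (1 KB = 1,000 bytes) and a hypothetical helper name:

```javascript
// Hypothetical helper: bytes of baggage moved per second across a call chain.
function baggageBytesPerSecond(payloadBytes, requestsPerSecond, hops = 1) {
  return payloadBytes * requestsPerSecond * hops;
}

const perHop = baggageBytesPerSecond(16_000, 10_000);      // 160,000,000 B/s = 160 MB/s
const fiveHops = baggageBytesPerSecond(16_000, 10_000, 5); // 800,000,000 B/s = 800 MB/s
```

The estimate covers transport only; serialise and parse time at each hop comes on top.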

Discipline: baggage holds operational tags, never identity data.

What to put in baggage: tenant-id (opaque token), region, feature-flag state, A/B cohort name, request-source label. What never to put in baggage: user emails, credit-card or payment tokens, API keys, auth tokens, customer PII, internal secrets.

Safe in baggage	Never in baggage
tenant-id (opaque token)	user email or full user-id
region / availability-zone	credit-card or payment token
feature-flag state (e.g. `checkout-v2=true`)	API keys or auth tokens
A/B cohort name	any customer PII
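One way to enforce this discipline is a guard that runs before baggage is attached to outbound calls. The sketch below is an assumption about how such a guard could look, not an OTel feature: the allowlist contents, the 512-byte budget, and the function name are all illustrative policy choices.

```javascript
// Hypothetical policy values: the allowlist and the 512-byte budget are
// illustrative choices, not limits taken from the spec or from OTel.
const ALLOWED_KEYS = new Set([
  'tenant-id', 'region', 'checkout-v2', 'ab-cohort', 'request-source',
]);
const MAX_BAGGAGE_BYTES = 512;

function sanitizeBaggage(entries) {
  const kept = {};
  const dropped = [];
  let size = 0;
  for (const [key, value] of Object.entries(entries)) {
    const member = `${key}=${encodeURIComponent(value)}`;
    // +1 accounts for the comma separator between members
    const cost = new TextEncoder().encode(member).length + (size > 0 ? 1 : 0);
    if (!ALLOWED_KEYS.has(key) || size + cost > MAX_BAGGAGE_BYTES) {
      dropped.push(key); // record the key only, never the value
      continue;
    }
    kept[key] = value;
    size += cost;
  }
  return { kept, dropped, size };
}

const result = sanitizeBaggage({
  'tenant-id': 't_8f3a',
  region: 'eu-west-1',
  'user-email': 'ada@example.com', // identity data: not on the allowlist
});
// result.kept    -> { 'tenant-id': 't_8f3a', region: 'eu-west-1' }
// result.dropped -> ['user-email']
```

An allowlist fails closed: a key nobody reviewed is dropped by default, which is the safer direction for a header that leaves your trust boundary.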

Async boundaries: the killer

Why does this matter so much? Because a service can look perfectly instrumented, with spans on every HTTP handler and traces on the dashboard, while a sizeable share of those traces (say, 30%) are orphans from deferred work that nobody audited. HTTP propagation is the easy case: OTel auto-instrumentation handles it for supported HTTP servers and clients. The hard cases are queues, timers, and callbacks.

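The Node.js setTimeout trap. The in-process “current context” follows the synchronous call stack, but setTimeout, setImmediate, and queueMicrotask schedule work to run later on a different call stack. If nothing carries the context across that gap, the callback fires with the original request’s context gone, and OTel creates a new root span (a fresh trace-id) for the deferred work. The result: a trace that ends abruptly at the setTimeout call site, and a separate orphan trace for the deferred work. Whether a given timer is covered automatically depends on the context manager the SDK registered: an AsyncLocalStorage-based manager follows Node’s built-in timers, but callbacks drained by user-land queues, pools, or batching libraries still run in whatever context is active at drain time, so explicit binding remains the safe default.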

The fix: wrap the callback with context.bind(ctx, fn) or run it inside context.with(ctx, fn):

// Assumes @opentelemetry/api is installed and `app` is an Express app.
const { context } = require('@opentelemetry/api');

// BROKEN: the active context is lost across the setTimeout
app.post('/checkout', async (req, res) => {
  setTimeout(() => {
    doDeferredWork(); // new trace; not linked to request
  }, 100);
  res.status(202).end();
});

// FIXED: bind context across the boundary
app.post('/checkout', async (req, res) => {
  const ctx = context.active(); // capture while the request span is active
  setTimeout(context.bind(ctx, () => {
    doDeferredWork(); // now sees the request's trace-id
  }), 100);
  res.status(202).end();
});

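context.bind(ctx, fn) returns a wrapped function that restores the captured context when invoked. The same pattern applies to setImmediate, queueMicrotask, callbacks passed to third-party libraries, and in-process task queues. Work handed to a worker thread or child process is different: it crosses a memory boundary, so a bound function cannot follow it and the context has to travel inside the message, as with a queue.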
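To see why binding works, it helps to model the mechanism. The sketch below is a toy stack-based context holder, not the OTel implementation: `withContext`, `bind`, and `defer` are hypothetical names, and a plain array stands in for a timer or callback API so the example runs deterministically.

```javascript
// Toy model of a stack-based "active context" (NOT the OTel implementation):
// the active value is only set while withContext() is on the call stack.
let active = undefined;

function withContext(ctx, fn) {
  const previous = active;
  active = ctx;
  try {
    return fn();
  } finally {
    active = previous; // restore on the way out, even if fn throws
  }
}

// Same idea as context.bind(ctx, fn): capture now, restore when invoked.
function bind(ctx, fn) {
  return (...args) => withContext(ctx, () => fn(...args));
}

// Hypothetical user-land queue standing in for a timer or callback API.
const pending = [];
const defer = (fn) => pending.push(fn);
const seen = [];

withContext({ traceId: 'abc123' }, () => {
  defer(() => seen.push(active));               // unbound: context is lost
  defer(bind(active, () => seen.push(active))); // bound: context restored
});

// The request scope has exited; drain the queue from a different call stack.
while (pending.length > 0) pending.shift()();
// seen[0] is undefined, seen[1] is { traceId: 'abc123' }
```

The unbound callback observes whatever is active when the queue is drained, which here is nothing. The bound one re-enters the captured context for the duration of the call and then restores the previous value.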

Kafka and message-queue propagation

When work crosses a process boundary via a message queue (Kafka, RabbitMQ, SQS, Cloud Pub/Sub), the in-process context cannot carry across. The trace-id must be written into the message’s wire format and read back on the other side:

Producer side: call propagator.inject(context, message.headers) before publishing (in OTel JS, propagation.inject(context.active(), message.headers)). This writes traceparent (and baggage) into the message’s header map.
Consumer side: call context = propagator.extract(message.headers) after polling (in OTel JS, propagation.extract(ROOT_CONTEXT, message.headers)). This restores the trace-id, allowing the consumer to create a child span linked to the producer’s span.

OTel’s Kafka instrumentation handles both sides automatically — but only if the instrumentation library is loaded and configured.
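Where auto-instrumentation is not available, the two steps can be done by hand. The sketch below uses hand-rolled stand-ins for inject and extract so that it runs without a broker or the OTel SDK; the context shape (`traceId`, `spanId`, `baggage`), the message shape, and the JSON round trip that simulates the queue are all assumptions made for illustration.

```javascript
// Hand-rolled stand-ins for propagator inject/extract (illustrative only;
// in a real service the OTel propagation API does this work).
function inject(ctx, headers) {
  headers.traceparent = `00-${ctx.traceId}-${ctx.spanId}-01`;
  if (ctx.baggage) headers.baggage = ctx.baggage;
}

function extract(headers) {
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(
    String(headers.traceparent ?? ''),
  );
  if (!match) return null; // no usable parent: the consumer starts a new trace
  return {
    traceId: match[1],
    parentSpanId: match[2],
    baggage: headers.baggage ? String(headers.baggage) : undefined,
  };
}

// Producer: write the context into the message before publishing.
const producerCtx = {
  traceId: '4bf92f3577b34da6a3ce929d0e0e4736',
  spanId: '00f067aa0ba902b7',
  baggage: 'tenant-id=t_8f3a',
};
const message = { value: '{"orderId":42}', headers: {} };
inject(producerCtx, message.headers);

// The queue only moves bytes; simulate the hop with a serialise/parse cycle.
const delivered = JSON.parse(JSON.stringify(message));

// Consumer: read the context back and parent the processing span on it.
const consumerCtx = extract(delivered.headers);
const orphanCtx = extract({}); // a message published without inject
```

`consumerCtx` carries the producer's trace-id, so a span started from it joins the original trace; `orphanCtx` is null, which is the orphan-trace case. Some queue clients deliver header values as byte buffers rather than strings, which is why the sketch coerces with `String()`.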

The critical distinction: context.bind keeps in-process context alive across an async boundary inside the producer, but it does not write traceparent into the message. The consumer runs in a separate process with no shared memory, so the only way to carry the trace-id is to inject it into the message headers before publishing.

Same goal, different boundary: context.bind keeps context across in-process callbacks, while inject/extract is the only thing that carries the trace-id across a process boundary into a queue.

Baggage rides with the context through sync calls automatically. At an async hop (queue, timer, worker) the context is lost unless explicitly captured (bind/inject) and restored (with/extract) — only then does the deferred work keep the baggage.

Quiz

A service places a user's email address in the baggage header to pass it to downstream services. What is the risk?

Quiz

A Node.js service uses OTel auto-instrumentation and Kafka. Traces arriving from Kafka consumers are disconnected orphans. What is the most likely cause?

Recall before you leave

01 Explain why baggage size discipline matters in production, and what should never go into baggage.
02 What is the difference between context.bind (in-process) and inject/extract (cross-process) for trace propagation?
03 Name four async boundaries in Node.js that require explicit context propagation and explain the fix for each.

Recap

The W3C baggage header propagates operational key-values (tenant-id, feature flags, A/B cohort) alongside the trace, but every hop pays serialise and parse costs, and the header flows to every downstream including third-party integrations. PII or secrets in baggage are semi-public within and beyond the engineering org. Async boundaries are a common cause of split traces in production: callbacks scheduled with setTimeout, setImmediate, and queueMicrotask fire on a different call stack and can lose the OTel context; fix them with context.bind or context.with. Queue consumers (Kafka, RabbitMQ, SQS) sit across a process boundary: the fix is propagator.inject() into message headers at the producer and propagator.extract() at the consumer; context.bind alone is not enough. When you review a PR that adds a setTimeout, a worker, or a Kafka publish, ask: where is the context.bind or the inject? If neither is there, the deferred work risks becoming an orphan trace the moment it ships.
