Baggage and async boundaries: carrying context across queues and callbacks

Crux: W3C baggage propagates key-value pairs alongside the trace, but it has no hard size cap and is visible to every downstream service, which makes it a PII trap. Async boundaries (setTimeout, Kafka, workers) can silently drop context unless it is bound with context.bind() or context.with(), or carried by message-level inject/extract.

A developer writes setTimeout(() => doWork(), 100) inside a request handler. The deferred work runs fine, emits spans, and shows up in the tracing dashboard — as a completely separate trace with no connection to the request that triggered it.

Baggage: arbitrary key-values across the journey

Baggage is a second header (baggage) defined by W3C, propagated identically to traceparent. It carries free-form key-value pairs: tenant-id, region, feature-flag selection, A/B test cohort. Baggage is read by downstream services and can be attached as span attributes or used by code (for example, to enforce per-tenant quotas).
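To make the header format concrete, here is a minimal sketch of serialising and parsing a baggage header. The helper names formatBaggage and parseBaggage are hypothetical, and the parser is deliberately simplified: it percent-decodes values and ignores the optional ;property suffix on each entry. In a real service the OTel propagator does this for you.

```javascript
// Simplified baggage header helpers (illustrative only, not the OTel implementation).

// Turn { key: value } into "key=value,key2=value2", percent-encoding the values.
function formatBaggage(entries) {
  return Object.entries(entries)
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join(',');
}

// Parse "key=value,key2=value2;prop" back into { key: value }.
// Assumption: optional ;properties are dropped rather than preserved.
function parseBaggage(header) {
  const entries = {};
  for (const member of header.split(',')) {
    const [pair] = member.split(';');
    const eq = pair.indexOf('=');
    if (eq === -1) continue; // skip empty or malformed members
    const key = pair.slice(0, eq).trim();
    const value = pair.slice(eq + 1).trim();
    if (key) entries[key] = decodeURIComponent(value);
  }
  return entries;
}

const outbound = formatBaggage({
  'tenant-id': 't_8f3a',
  region: 'eu-west-1',
  cohort: 'checkout v2',
});
// outbound → "tenant-id=t_8f3a,region=eu-west-1,cohort=checkout%20v2"

const inbound = parseBaggage(outbound);
// inbound.cohort → "checkout v2"
```

A downstream service would read the header, parse it, and then decide which entries to copy onto span attributes or use in code such as a per-tenant quota check.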

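The catch: the spec sets no hard upper bound on baggage. W3C Baggage only guarantees that platforms propagate a header of up to 8192 bytes and lets them drop entries beyond that, so anything larger is limited only by the carrier protocol. HTTP header limits (typically 8–64 KB per header field, server-configurable) are the practical cap.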

Two real risks:

  1. Latency bloat. Baggage is propagated on every request hop, every queue message, every async boundary. Each hop pays the cost twice: serialise on outbound, parse on inbound. A 16-KB baggage payload on 10k req/s of internal traffic means roughly 160 MB/s of pure baggage transport plus parsing time at every hop.

  2. PII leakage. Baggage flows to every downstream service including third-party integrations and SaaS APIs that might log or persist headers — often in less-audited contexts than the source database. Anything in baggage is effectively semi-public within and beyond your engineering org.
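The bandwidth figure in the first risk is simple arithmetic, sketched below. The function name baggageBytesPerSecond and the hops parameter are illustrative assumptions, and the calculation uses decimal units (1 KB = 1000 bytes) to match the round number in the text.

```javascript
// Back-of-envelope cost of carrying baggage on every request.
// Assumption: decimal units, 1 KB = 1000 bytes, 1 MB = 1,000,000 bytes.
function baggageBytesPerSecond(baggageKB, requestsPerSecond, hops = 1) {
  return baggageKB * 1000 * requestsPerSecond * hops;
}

// 16 KB of baggage at 10k req/s, counted for a single hop.
const perHop = baggageBytesPerSecond(16, 10_000);
console.log(`${perHop / 1e6} MB/s per hop`); // prints "160 MB/s per hop"

// The same payload crossing three hops triples the transport cost.
const threeHops = baggageBytesPerSecond(16, 10_000, 3);
console.log(`${threeHops / 1e6} MB/s across three hops`); // prints "480 MB/s across three hops"
```

The parsing cost at each hop comes on top of this and depends on the implementation, so it is not modelled here.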

Discipline: baggage holds operational tags, never identity data.

What to put in baggage: tenant-id (opaque token), region, feature-flag state, A/B cohort name, request-source label. What never to put in baggage: user emails, credit-card or payment tokens, API keys, auth tokens, customer PII, internal secrets.

Safe in baggage                             | Never in baggage
--------------------------------------------+------------------------------
tenant-id (opaque token)                    | user email or full user-id
region / availability-zone                  | credit-card or payment token
feature-flag state (e.g. checkout-v2=true)  | API keys or auth tokens
A/B cohort name                             | any customer PII
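One way to enforce this discipline in code is an allowlist applied before anything is written to baggage. The sketch below is an assumption about how a team might do it, not an OTel feature: the key names, the 1024-byte budget, and the sanitizeBaggage helper are all hypothetical.

```javascript
// Hypothetical allowlist of operational keys a team has agreed to propagate.
const ALLOWED_BAGGAGE_KEYS = new Set([
  'tenant-id',
  'region',
  'feature-flags',
  'ab-cohort',
  'request-source',
]);

// Hypothetical per-request budget; pick a number that fits your traffic.
const MAX_BAGGAGE_BYTES = 1024;

// Keep only allowlisted keys and stop once the byte budget is spent.
// The size estimate counts "key=value" bytes and ignores separators.
function sanitizeBaggage(entries) {
  const safe = {};
  let bytes = 0;
  for (const [key, value] of Object.entries(entries)) {
    if (!ALLOWED_BAGGAGE_KEYS.has(key)) continue; // drops email, tokens, etc.
    const size = Buffer.byteLength(`${key}=${value}`, 'utf8');
    if (bytes + size > MAX_BAGGAGE_BYTES) break;
    safe[key] = String(value);
    bytes += size;
  }
  return safe;
}

const cleaned = sanitizeBaggage({
  'tenant-id': 't_8f3a',
  email: 'user@example.com', // not on the allowlist, so it is dropped
  region: 'eu-west-1',
});
// cleaned → { 'tenant-id': 't_8f3a', region: 'eu-west-1' }
```

An allowlist fails closed: a new key stays out of baggage until someone deliberately adds it, which is the safer default for data that leaves your services.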

Async boundaries: the killer

HTTP propagation is the easy case — OTel auto-instrumentation handles it for supported HTTP servers and clients. The hard cases are queues, timers, callbacks.

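The Node.js setTimeout trap. The in-process “current context” in Node.js flows through the synchronous call stack automatically, but setTimeout, setImmediate, and queueMicrotask schedule work to run later on a different call stack. Whether the context survives that jump depends on the context manager: the OTel Node SDK normally registers an AsyncLocalStorage-based manager that follows timers and promises, so the trap bites when that manager is missing or disabled, or when the callback is queued by code that breaks async tracking (a hand-rolled job queue, a connection pool, some event emitters). In those cases the original request’s context is gone when the callback fires, and OTel starts a new root span (a fresh trace-id) for the deferred work. The result: a trace that ends abruptly at the setTimeout call site, and a separate orphan trace for the deferred work.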

The fix: wrap the callback with context.bind(ctx, fn) or run it inside context.with(ctx, fn):

// Assumes an Express `app` and a doDeferredWork() defined elsewhere.
const { context } = require('@opentelemetry/api');

// BROKEN: the active context is lost across the setTimeout
app.post('/checkout', async (req, res) => {
  setTimeout(() => {
    doDeferredWork(); // new trace; not linked to request
  }, 100);
  res.status(202).end();
});

// FIXED: bind context across the boundary
app.post('/checkout', async (req, res) => {
  const ctx = context.active(); // capture while the request context is still active
  setTimeout(context.bind(ctx, () => {
    doDeferredWork(); // now sees the request's trace-id
  }), 100);
  res.status(202).end();
});

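context.bind(ctx, fn) returns a wrapped function that restores the captured context when invoked. The same pattern applies to setImmediate, queueMicrotask, callbacks passed to third-party libraries, and any custom in-process job queue. Worker threads are different: a worker runs in its own isolate with no shared context, so the context has to travel inside the message, as it does for a queue.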
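The mechanics of binding can be shown with a toy model that has no async tracking at all, which is the situation the trap describes. This is not OTel's implementation: activeContext, withContext, bind, and defer are hypothetical names, and an array stands in for the timer queue so the example runs synchronously.

```javascript
// Toy model: the "current context" is a single module-level slot.
let activeContext = null;

// Run fn with ctx as the current context, then restore the previous one.
function withContext(ctx, fn) {
  const previous = activeContext;
  activeContext = ctx;
  try {
    return fn();
  } finally {
    activeContext = previous;
  }
}

// Return a wrapper that re-enters ctx whenever it is eventually called.
function bind(ctx, fn) {
  return (...args) => withContext(ctx, () => fn(...args));
}

// Stand-in for the timer queue: callbacks run later, outside the handler.
const deferred = [];
const defer = (fn) => deferred.push(fn);
const seen = [];

// "Request handler": schedules one unbound and one bound callback.
withContext({ traceId: 'abc123' }, () => {
  defer(() => seen.push(['unbound', activeContext]));
  defer(bind(activeContext, () => seen.push(['bound', activeContext])));
});

// "Later": the handler has returned and its context is no longer active.
for (const fn of deferred) fn();

// seen[0] → ['unbound', null]
// seen[1] → ['bound', { traceId: 'abc123' }]
```

The unbound callback reads whatever happens to be current when it fires, while the bound one carries its own captured context, which is why binding at scheduling time is the fix.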

Kafka and message-queue propagation

When work crosses a process boundary via a message queue (Kafka, RabbitMQ, SQS, Cloud Pub/Sub), the in-process context cannot carry across. The trace-id must be written into the message’s wire format and read back on the other side:

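  • Producer side: call propagator.inject(context, message.headers) before publishing. This writes traceparent (and baggage) into the message’s header map. In the OTel JS API the call is propagation.inject(context.active(), headers).
  • Consumer side: call context = propagator.extract(message.headers) after polling. This restores the trace-id, allowing the consumer to create a child span linked to the producer’s span. In the OTel JS API, extract takes a base context as its first argument, for example propagation.extract(ROOT_CONTEXT, headers), and returns a new context rather than modifying the active one.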
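The two sides can be sketched without any library, to show what actually crosses the wire. This is a toy model under stated assumptions: the inject and extract functions below are hand-written stand-ins for the OTel propagator, the context is a plain object, and a JSON round trip stands in for the broker. Only the traceparent layout (version, 32-hex trace-id, 16-hex parent span-id, flags) follows the W3C format.

```javascript
// Producer side: write the trace identifiers into the message headers.
function inject(ctx, headers) {
  headers.traceparent = `00-${ctx.traceId}-${ctx.spanId}-01`; // 01 = sampled
  return headers;
}

// Consumer side: read them back, or return null if the header is missing or malformed.
function extract(headers) {
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(
    headers.traceparent ?? ''
  );
  if (!match) return null;
  return {
    traceId: match[1],
    spanId: match[2],
    sampled: (parseInt(match[3], 16) & 1) === 1,
  };
}

// Producer process: context exists only in memory until it is injected.
const producerCtx = {
  traceId: '4bf92f3577b34da6a3ce929d0e0e4736',
  spanId: '00f067aa0ba902b7',
};
const message = {
  value: JSON.stringify({ orderId: 42 }),
  headers: inject(producerCtx, {}),
};

// The broker: nothing but bytes survives the trip.
const received = JSON.parse(JSON.stringify(message));

// Consumer process: rebuild the parent context from the headers.
const parentCtx = extract(received.headers);
// parentCtx.traceId → '4bf92f3577b34da6a3ce929d0e0e4736'
```

If the producer skips inject, extract returns null on the consumer side and the consumer has no parent to attach to, which is how orphan traces appear. With some clients, header values arrive as byte buffers rather than strings, so a real consumer may need to decode them before extracting.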

OTel’s Kafka instrumentation handles both sides automatically — but only if the instrumentation library is loaded and configured.

The critical distinction: context.bind keeps in-process context alive across an async boundary in the producer, but it does not write traceparent into the message. Kafka lives in a separate process. The consumer’s process has no shared memory with the producer; the only way to carry the trace-id is to inject it into the message headers before publishing.

Quiz

A service places a user's email address in the baggage header to pass it to downstream services. What is the risk?

Quiz

A Node.js service uses OTel auto-instrumentation and Kafka. Traces arriving from Kafka consumers are disconnected orphans. What is the most likely cause?

Recall before you leave

  1. Explain why baggage size discipline matters in production, and what should never go into baggage.
  2. What is the difference between context.bind (in-process) and inject/extract (cross-process) for trace propagation?
  3. Name four async boundaries in Node.js that require explicit context propagation and explain the fix for each.

Recap

The W3C baggage header propagates operational key-values (tenant-id, feature flags, A/B cohort) alongside the trace, but every hop pays serialise and parse costs, and the header flows to every downstream service, including third-party integrations. PII or secrets in baggage are semi-public within and beyond the engineering org. Async boundaries are a common cause of split traces in production: setTimeout, setImmediate, and queueMicrotask callbacks fire on a different call stack and, when no async-aware context manager is tracking them, run with no OTel context; fix them with context.bind or context.with. Queue consumers (Kafka, RabbitMQ, SQS) cross a process boundary: the fix is propagator.inject() into message headers at the producer and propagator.extract() at the consumer; context.bind alone is not enough.
