
Observability

Async context per language, service mesh, B3 migration, and security

Crux: Every runtime has its own context substrate (Node AsyncLocalStorage, Python contextvars, Java ThreadLocal, Go context.Context), each with distinct failure modes. A service mesh carries trace headers across HTTP and gRPC hops, not across queues. In browsers, traceparent is a privacy risk unless its propagation is scope-limited.

A Java service migrates from thread-pool-based request handling to Project Loom virtual threads. After the migration, 30% of traces are orphans. The OTel SDK version didn’t change. What broke?

Async context propagation: language by language

Each runtime has its own mechanism for “what is the current execution context?” — and OTel hooks into that mechanism. When you cross a primitive the runtime doesn’t carry context through automatically, context is silently lost.

Node.js: AsyncLocalStorage (available since Node 12.17 and 13.10) is the substrate, and OTel hooks into it for in-process propagation. It follows native promises and timers (setTimeout, setImmediate, queueMicrotask) on its own. Pitfalls: callback queues, connection pools, event emitters, and third-party promise wrappers or custom thenables that run your callback from a different async context than the one that scheduled it can lose trace context. OTel auto-instrumentation patches the common libraries, but custom ones still break it. Fix: wrap the callback with context.bind(ctx, fn) before handing it to any such boundary.

Python: contextvars from PEP 567 is the substrate. It works automatically for asyncio tasks, but a thread you start yourself begins with an empty context unless you copy one in with contextvars.copy_context(). If you spawn threads manually, the OTel context from the parent thread is not inherited. Fix: capture the current context in the parent, pass it to the child thread, and set it via context.attach(ctx) on entry (detaching the returned token on exit).
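A minimal sketch of the thread boundary described above, using only the standard library. The `current_trace_id` variable is a hypothetical stand-in for the slot where a tracing SDK keeps its current context; it is not an OTel API.

```python
import contextvars
import threading

# Hypothetical stand-in for the SDK's "current trace context" slot.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)

TRACE_ID = "4bf92f3577b34da6a3ce929d0e0e4736"


def read_trace_id(results, key):
    # Record whichever trace id is visible on the thread running this function.
    results[key] = current_trace_id.get()


def run_demo():
    results = {}
    current_trace_id.set(TRACE_ID)

    # Plain thread: on default CPython builds it starts with an empty context,
    # so the parent's value is not visible inside it.
    bare = threading.Thread(target=read_trace_id, args=(results, "bare"))

    # Snapshot the parent's context and run the target inside that copy.
    snapshot = contextvars.copy_context()
    carried = threading.Thread(
        target=snapshot.run, args=(read_trace_id, results, "carried")
    )

    for thread in (bare, carried):
        thread.start()
    for thread in (bare, carried):
        thread.join()
    return results
```

Calling `run_demo()` returns a dict in which the `"carried"` entry holds the parent's trace id. With an SDK in the picture the same shape applies: capture in the parent, re-establish on entry in the child.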

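Java: a ThreadLocal-backed Context plus a Span.makeCurrent() try-with-resources block that yields a Scope. Project Loom virtual threads are mostly transparent for simple cases, but they require careful Scope handling: a newly started thread, virtual or platform, does not inherit the parent's current context, and a Scope must be closed on the thread that opened it. Work handed to a virtual thread without the context carried over, or after the parent's Scope has already closed, starts a new root span and shows up as an orphan. Fix: create the virtual thread while the Scope is still open and carry the context explicitly, for example by wrapping the task with Context.current().wrap(task) or the executor with Context.taskWrapping(executor).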

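Go: explicit context.Context plumbing. Every function takes ctx, and every span lives in ctx. Go made the right architectural choice early; context never flows implicitly. The failure mode in Go is not losing context accidentally but forgetting to thread ctx through a function call chain, or starting over from context.Background() partway down. Fix: pass ctx everywhere. go vet and staticcheck catch some related mistakes (a discarded cancel function, a nil Context), but neither flags every dropped ctx, so pair them with a context-focused linter or code review.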

Browsers: zone.js (Angular’s solution for patching async primitives) is the usual substrate today, and OTel-JS ships a zone.js-based context manager for it; the TC39 AsyncContext proposal aims to provide a native equivalent. Service workers and Web Workers require explicit context pass-through.

| Runtime | Context substrate | Common failure mode | Fix |
| --- | --- | --- | --- |
| Node.js | AsyncLocalStorage | Custom async wrappers, callback queues | context.bind(ctx, fn) |
| Python | contextvars (PEP 567) | Manually spawned threads start without the parent's context | contextvars.copy_context() on thread spawn |
| Java | ThreadLocal + Scope | Loom virtual threads outlive Scope | Create virtual thread inside Scope; use context-aware executor |
| Go | Explicit context.Context | ctx not threaded through a function | Pass ctx everywhere; vet/staticcheck |

B3 vs W3C: the migration story and safe sequence

Before W3C Trace Context, Twitter’s Zipkin/Brave used B3, in two variants: B3 multi-header (X-B3-TraceId, X-B3-SpanId, X-B3-Sampled as separate headers) and B3 single-header (the same fields joined with hyphens in one b3 header). B3’s original trace-id was 64-bit; 128-bit ids were added later, which matches the width W3C Trace Context uses.
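To make the two B3 variants concrete, here is a rough sketch of parsing each into one normalized shape, including left-padding a 64-bit trace-id to 128 bits. The function names and the returned dict layout are illustrative assumptions, not any library's API, and the optional parent-span-id field is ignored.

```python
import string

HEX = set(string.hexdigits)


def _is_hex(value, lengths):
    return len(value) in lengths and all(ch in HEX for ch in value)


def normalize(trace_id, span_id, sampled):
    """Validate the ids and return a W3C-width context dict, or None."""
    if not _is_hex(trace_id, (16, 32)) or not _is_hex(span_id, (16,)):
        return None
    return {
        # Left-pad a 64-bit id with zeros to the 128-bit width W3C uses.
        "trace_id": trace_id.lower().rjust(32, "0"),
        "span_id": span_id.lower(),
        # "1" and "d" (debug) mean sampled; a missing field defers the decision.
        "sampled": None if sampled is None else sampled in ("1", "d", "true"),
    }


def parse_b3_single(value):
    """Parse `b3: {trace_id}-{span_id}[-{sampled}[-{parent_span_id}]]`."""
    parts = value.strip().split("-")
    if len(parts) < 2:
        # A lone "0", "1" or "d" carries only a sampling decision, no ids.
        return None
    sampled = parts[2] if len(parts) > 2 else None
    return normalize(parts[0], parts[1], sampled)


def parse_b3_multi(headers):
    """Parse X-B3-* headers from a dict keyed by lower-cased header name."""
    trace_id = headers.get("x-b3-traceid")
    span_id = headers.get("x-b3-spanid")
    if not trace_id or not span_id:
        return None
    return normalize(trace_id, span_id, headers.get("x-b3-sampled"))
```

Both parsers feed the same `normalize` step, which is what lets a service treat either B3 variant and W3C as interchangeable inputs during a migration.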

Safe migration sequence:

  1. Audit: identify every service still emitting B3-only headers.
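  2. Deploy composite propagator everywhere: register W3C TraceContext + B3 multi + B3 single at every service. Extract inbound from all of them. A standard composite propagator also injects every registered format, so B3 stays on the wire next to W3C until step 4 and services that still read only B3 keep working. This is the “read-both” phase that makes writing W3C safe.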
  3. Verify: confirm orphan-span rate doesn’t regress.
  4. Phase out B3 outbound: after downstream services are confirmed to read W3C, disable B3 outbound at each upstream.
  5. Remove B3 extractor: after a quarter at zero B3 inbound, replace the composite propagator with W3C-only.
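The sequence above can be sketched with a small propagator that reads every registered format and writes only the formats chosen for the current phase. `MigrationPropagator` and the two simplified format classes are hypothetical illustrations, not the OTel SDK's composite propagator; validation is deliberately minimal.

```python
class W3CFormat:
    def extract(self, headers):
        parts = headers.get("traceparent", "").split("-")
        if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
            return None
        try:
            sampled = bool(int(parts[3], 16) & 0x01)
        except ValueError:
            return None
        return {"trace_id": parts[1], "span_id": parts[2], "sampled": sampled}

    def inject(self, ctx, headers):
        flags = "01" if ctx["sampled"] else "00"
        headers["traceparent"] = f"00-{ctx['trace_id']}-{ctx['span_id']}-{flags}"


class B3SingleFormat:
    def extract(self, headers):
        parts = headers.get("b3", "").split("-")
        if len(parts) < 2 or len(parts[0]) not in (16, 32) or len(parts[1]) != 16:
            return None
        sampled = len(parts) > 2 and parts[2] in ("1", "d")
        # Pad 64-bit B3 ids so the rest of the pipeline sees one width.
        return {"trace_id": parts[0].rjust(32, "0"), "span_id": parts[1], "sampled": sampled}

    def inject(self, ctx, headers):
        flag = "1" if ctx["sampled"] else "0"
        headers["b3"] = f"{ctx['trace_id']}-{ctx['span_id']}-{flag}"


class MigrationPropagator:
    """Extract with every reader, in order; inject only with the listed writers."""

    def __init__(self, readers, writers):
        self.readers = readers
        self.writers = writers

    def extract(self, headers):
        for reader in self.readers:
            ctx = reader.extract(headers)
            if ctx is not None:
                return ctx
        return None

    def inject(self, ctx, headers):
        for writer in self.writers:
            writer.inject(ctx, headers)
```

Under these assumptions, the final state of step 5 is `MigrationPropagator([W3CFormat()], [W3CFormat()])`, while the earlier phases keep `B3SingleFormat()` in the reader list (and in the writer list for as long as any downstream still reads only B3).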

What goes wrong if steps are skipped:

  • Skipping step 2: upstream sends W3C while downstream only reads B3 → traces split.
  • Skipping step 3: a propagation regression goes silent for weeks.
  • Skipping step 5: double header bytes per request indefinitely.

Trace context across service mesh

Envoy, Linkerd, Istio, and Cilium data planes participate in tracing in two ways:

  1. Pass-through (always): the sidecar forwards traceparent/tracestate/baggage headers on every HTTP and gRPC request transparently.
  2. Emit their own spans (optional but recommended): when enabled, the sidecar creates a span for the network hop, showing sidecar latency, connection pooling, and TLS handshake timing distinct from application latency.

Configuration: the mesh proxy needs the tracing-collector address and the sampling decision (usually inherit the incoming flag). The mesh’s sampling decision must agree with the application’s; mismatched rates produce inconsistent traces.
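"Inherit the incoming flag" amounts to reading the sampled bit from the trace-flags field of traceparent and falling back to a local decision only when no usable parent exists. A small sketch, with hypothetical function names:

```python
def sampled_flag(traceparent):
    """Return the sampled bit of a traceparent header, or None if unusable."""
    parts = traceparent.strip().split("-")
    if len(parts) < 4 or len(parts[3]) != 2:
        return None
    try:
        # trace-flags is a hex-encoded bit field; bit 0 is "sampled".
        return bool(int(parts[3], 16) & 0x01)
    except ValueError:
        return None


def should_sample(traceparent, root_decision):
    """Parent-based rule: inherit the caller's decision, else use our own."""
    inherited = sampled_flag(traceparent) if traceparent else None
    return root_decision if inherited is None else inherited
```

If the proxy and the application both apply this rule, they only make independent decisions for root requests, which is where mismatched rates would otherwise show up as partial traces.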

The limit: service mesh only handles HTTP and gRPC. Queue consumers, timers, and fire-and-forget callbacks still require explicit application-level propagation. The mesh is not a substitute for OTel SDK instrumentation; it adds a network-hop span, it does not replace application spans.
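For the queue case, application-level propagation means writing traceparent into the message headers on publish and reading it back on consume. The sketch below uses a plain Python list as a stand-in for a broker; `publish`, `consume`, and the message layout are assumptions for illustration, not a Kafka client API.

```python
import secrets


def inject(trace_id, span_id, sampled, carrier):
    flags = "01" if sampled else "00"
    carrier["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(carrier):
    parts = carrier.get("traceparent", "").split("-")
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        return None
    return {"trace_id": parts[1], "parent_span_id": parts[2]}


def publish(queue, payload, trace_id, span_id, sampled=True):
    # Producer side: the context travels inside the message, not the transport.
    message = {"headers": {}, "payload": payload}
    inject(trace_id, span_id, sampled, message["headers"])
    queue.append(message)


def consume(queue):
    # Consumer side: continue the producer's trace if a context was carried.
    message = queue.pop(0)
    parent = extract(message["headers"])
    if parent is None:
        # No context in the message: this span becomes a new root (an orphan).
        trace_id, parent_span_id = secrets.token_hex(16), None
    else:
        trace_id, parent_span_id = parent["trace_id"], parent["parent_span_id"]
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "span_id": secrets.token_hex(8),  # 8 random bytes, 16 hex characters
    }
```

A message published without the `inject` step takes the orphan branch in `consume`, which is the symptom a mesh cannot fix because it never sees inside the message.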

Why this works

When the mesh emits its own span, you gain a three-span view of one HTTP call: client app, mesh sidecar, server app. This lets you distinguish “the application was slow” from “the mesh was slow” — a critical distinction during incidents involving sidecar upgrades, connection pool exhaustion, or mTLS certificate renewal storms.

Security: the trace-id as a tracking identifier

Trace-ids are unique per request, 128 bits of entropy, propagated in HTTP headers and visible to anyone who can inspect traffic between the client and the origin. This makes them powerful debugging tools and equally powerful potential trackers.

The risk: if a third party (a CDN, a marketing pixel, a CSP-allowed analytics service) can read the traceparent header from the user’s outbound requests, it can correlate user activity across sites that share the same tracing infrastructure.

Mitigations:

  • The W3C spec recommends that user-facing services do not propagate traceparent in responses (responses are to the user, not part of an upstream call).
  • Browser-side OTel SDKs should limit propagation to same-origin and explicitly allowed CORS origins (in OTel-JS, the propagateTraceHeaderCorsUrls option of the fetch and XMLHttpRequest instrumentations).
  • Production teams maintain an allowlist of downstream hostnames that receive traceparent and audit it quarterly.
  • Baggage applies identically — anything in baggage is observable by every downstream including third parties.
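
The allowlist idea in the mitigations above reduces to one check before any trace header is attached to an outbound request. A browser SDK would do this in JavaScript; the sketch below shows the logic only, with hypothetical function names and example hostnames, and it compares origins as plain strings without normalizing default ports.

```python
from urllib.parse import urlsplit


def origin_of(url):
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}".lower()


def should_propagate(request_url, page_origin, allowed_origins):
    """Allow same-origin requests and explicitly allowlisted origins only."""
    origin = origin_of(request_url)
    return origin == page_origin.lower() or origin in allowed_origins


def outbound_headers(request_url, page_origin, allowed_origins, trace_headers):
    # trace_headers would hold traceparent, tracestate and baggage together,
    # since baggage leaks to third parties the same way traceparent does.
    if should_propagate(request_url, page_origin, allowed_origins):
        return dict(trace_headers)
    return {}
```

Treating traceparent and baggage as one bundle keeps the allowlist as the single place to audit.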

Senior-tier trace propagation numbers

  • Default OTel propagator: TraceContext + Baggage composite
  • B3 single-header trace-id width (original): 64 bits (later extended to 128)
  • Service-mesh sidecar tracing overhead: ~1–2% extra CPU
  • Per-request header bytes (traceparent + small tracestate): ~80–200 bytes
  • W3C Trace Context Level 1: Recommendation, 2020-02
  • W3C Trace Context Level 2: Candidate Recommendation Draft, 2024

Quiz

A team migrates from B3 to W3C propagation. They deploy W3C-write on upstream services before deploying W3C-read on downstream services. What happens?

Quiz

A service mesh (Envoy) is configured to propagate traceparent and emit its own mesh-hop spans. After enabling it, the team still sees orphan traces for some Kafka consumers. Why?

Recall before you leave

  1. A Java service migrates to Project Loom virtual threads and orphan-span rate spikes to 30%. Diagnose and fix.
  2. Describe the safe 5-step sequence for migrating from B3 to W3C TraceContext and what breaks if you skip step 2.
  3. What is the traceparent privacy risk in browser applications and what are the three mitigations?

Recap

Context propagation in each runtime hooks into a different substrate: Node’s AsyncLocalStorage, Python’s contextvars, Java’s ThreadLocal plus Scope, Go’s explicit context.Context. Each has its own failure mode when you cross a primitive the runtime doesn’t auto-carry — the fix is always explicit context binding at that boundary. B3 migration to W3C requires read-both before write-W3C, verified by orphan-rate monitoring. Service mesh passes traceparent for HTTP/gRPC transparently and can emit mesh-hop spans, but does not instrument queue consumers — those still require application-level inject/extract. The traceparent header in browser requests is a tracking vector if propagated to cross-origin third parties; restrict it to same-origin and explicitly-allowed CORS origins.


Trademarks belong to their respective owners. Editorial reference only.