traceparent and tracestate: the W3C header format in full
An on-call engineer sees a traceparent value of 99-aaaaaaaa-bbbbbbbb-01 in an HTTP header. Is it valid? Should the service use it or generate a fresh trace? The answer is in the spec, and getting it wrong silently corrupts the backend’s trace store.
The traceparent header bit-by-bit
The W3C Trace Context spec (Recommendation since 2020, Level 2 since 2024) defines a single fixed-width HTTP header. The value is exactly 55 bytes:
`00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01`
Four dash-separated fields:
| Field | Width | Current value | Rules |
|---|---|---|---|
| Version | 2 hex (1 byte) | 00 | ff is reserved and forbidden |
| Trace-id | 32 lowercase hex (16 bytes) | 128-bit random | All-zeros is invalid |
| Parent-id | 16 lowercase hex (8 bytes) | 64-bit span-id of the caller | All-zeros is invalid |
| Trace-flags | 2 hex (1 byte) | 01 = sampled | Only bit 0 (sampled) is currently defined |
The width is deliberately fixed so a router or sidecar can parse the header without a regex engine: split on dashes and check the field lengths. The compact form means trace context costs 55 bytes for the value, roughly 0.05 KB per HTTP request (slightly more once the header name is counted), which is measurable but negligible relative to typical request payloads.
| Position in string | Field | Example value | Meaning |
|---|---|---|---|
| 0–1 | version | 00 | Spec version; ff is forbidden |
| 3–34 | trace-id | 4bf92f3577b34da6a3ce929d0e0e4736 | 128-bit random; must not be all-zeros |
| 36–51 | parent-id | 00f067aa0ba902b7 | Caller’s span-id; all-zeros invalid |
| 53–54 | trace-flags | 01 | Bit 0 = sampled; rest reserved |
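The position table above maps directly onto fixed-offset slicing. The following is a minimal sketch, not a reference implementation: the names `TraceParent`, `parse_traceparent`, and `is_sampled` are hypothetical, and the sketch assumes the receiver only accepts version `00` (it does not attempt best-effort parsing of future versions).

```python
from typing import NamedTuple, Optional

_LOWER_HEX = set("0123456789abcdef")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    trace_flags: int


def _is_lower_hex(text: str) -> bool:
    """True if every character is a lowercase hex digit."""
    return all(ch in _LOWER_HEX for ch in text)


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Parse a version-00 traceparent by position; return None if invalid."""
    # Fixed width: 2 + 1 + 32 + 1 + 16 + 1 + 2 = 55 characters.
    if len(value) != 55:
        return None
    # Dashes sit at string positions 2, 35, and 52.
    if value[2] != "-" or value[35] != "-" or value[52] != "-":
        return None
    version = value[0:2]
    trace_id = value[3:35]
    parent_id = value[36:52]
    flags = value[53:55]
    if not all(_is_lower_hex(f) for f in (version, trace_id, parent_id, flags)):
        return None
    # Assumption of this sketch: only version 00 is accepted.
    if version != "00":
        return None
    # All-zero trace-id or parent-id is invalid.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(version, trace_id, parent_id, int(flags, 16))


def is_sampled(tp: TraceParent) -> bool:
    """Bit 0 of trace-flags is the sampled flag."""
    return bool(tp.trace_flags & 0x01)
```

No regex engine is involved: a length check, three character comparisons, and four slices are enough, which is the property the fixed width is meant to provide.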
Trace-id must be uniformly random
The spec is firm: trace-id must be generated as a cryptographically-strong random 128-bit value. It must not encode timestamps, IPs, or user-ids because:
- Downstream tools assume uniform distribution for sampling decisions (consistent hashing relies on this).
- Embedded info leaks operational data.
OpenTelemetry SDKs use the platform’s secure random source: crypto.getRandomValues in browsers, /dev/urandom on Linux, java.security.SecureRandom on JVM. Generating one 128-bit id is sub-microsecond — never a measurable cost. At 128-bit randomness, the collision probability across one trillion traces is on the order of 10⁻¹⁵, effectively zero.
Trace-id uniqueness is the foundation: the tracing backend keys all spans by trace-id, joins them at query time, and if two unrelated requests share an id the backend will silently merge them into one nonsense trace.
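To make the generation cost and the collision figure concrete, here is a small sketch using Python's `secrets` module as the secure random source. The function names are hypothetical, and the collision estimate uses the standard birthday approximation p ≈ n(n − 1) / (2 · 2^bits), which is an assumption about how the figure above was derived.

```python
import secrets
from fractions import Fraction


def new_trace_id() -> str:
    """Return 16 random bytes as 32 lowercase hex characters."""
    while True:
        trace_id = secrets.token_hex(16)
        # All-zeros is invalid; retry in the (astronomically unlikely) case.
        if trace_id != "0" * 32:
            return trace_id


def new_span_id() -> str:
    """Return 8 random bytes as 16 lowercase hex characters."""
    while True:
        span_id = secrets.token_hex(8)
        if span_id != "0" * 16:
            return span_id


def collision_probability(n_ids: int, bits: int = 128) -> float:
    """Birthday approximation: chance that any two of n_ids random ids match."""
    return float(Fraction(n_ids * (n_ids - 1), 2 * 2**bits))


if __name__ == "__main__":
    print(new_trace_id())
    print(collision_probability(10**12))
```

For one trillion ids the approximation gives roughly 1.5 × 10⁻¹⁵, in line with the order of magnitude quoted above.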
Validation and the missing-header fallback
When a service receives a request with no traceparent, or with one that fails validation (wrong version, all-zero trace-id, malformed hex), the receiver must generate a brand-new trace-id and start a fresh trace. The spec explicitly defines this fallback so a single misbehaving caller does not cascade into broken traces.
Validation failures should be logged as a span attribute (invalid_traceparent_received=true) so the trace appears in the backend with a marker, letting operators find broken upstream callers. Production teams treat a non-zero rate of invalid_traceparent spans as a propagation regression.
The 99-aaaaaaaa-bbbbbbbb-01 example: version 99 is unknown (the only version defined so far is 00), and regardless of version the header is far shorter than 55 characters, because the trace-id and parent-id are 8 hex chars each instead of 32 and 16. This header is invalid, so the receiving service starts a fresh trace.
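The fallback and the marker attribute can be sketched as a single decision at the start of request handling. This is an illustration only: `SpanStart` and `start_server_span` are hypothetical names, the carrier is assumed to be a plain dict with lowercase header names, and a regex is used here purely for brevity.

```python
import re
import secrets
from typing import Dict, Mapping, NamedTuple, Optional

# Version 00 only: 32 hex trace-id, 16 hex parent-id, 2 hex flags.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class SpanStart(NamedTuple):
    trace_id: str
    parent_id: Optional[str]      # None when a fresh trace is started
    attributes: Dict[str, bool]   # attributes to attach to the new span


def start_server_span(headers: Mapping[str, str]) -> SpanStart:
    """Continue the caller's trace if the header is valid, else start fresh."""
    attributes: Dict[str, bool] = {}
    raw = headers.get("traceparent")
    if raw is not None:
        match = _TRACEPARENT.fullmatch(raw.strip())
        if (match
                and match.group(1) != "0" * 32
                and match.group(2) != "0" * 16):
            return SpanStart(match.group(1), match.group(2), attributes)
        # Header was present but failed validation: mark the span so
        # operators can find the broken upstream caller.
        attributes["invalid_traceparent_received"] = True
    # Missing or invalid header: brand-new trace-id, no parent.
    return SpanStart(secrets.token_hex(16), None, attributes)
```

Calling `start_server_span({"traceparent": "99-aaaaaaaa-bbbbbbbb-01"})` returns a fresh 32-character trace-id, no parent, and the `invalid_traceparent_received` marker, while a missing header starts a fresh trace without the marker.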
The tracestate companion
Where traceparent is the universal identifier, tracestate carries vendor-specific extra context: which sampler decided this trace, which canary version emitted it, what extra ids the backend wants.
Format: a comma-separated list of key=value members, at most 32 of them, with each key and each value up to 256 characters. The spec asks vendors to propagate at least 512 characters of the combined tracestate value even if they do not understand the keys, which preserves end-to-end tracing across heterogeneous vendor tooling.
Mutation rules:
- A vendor may add a new key (prepend to the list).
- A vendor may update its own key (move to front).
- A vendor must not delete keys belonging to other vendors.
The header is the escape hatch for things that cannot be captured in the rigid traceparent.
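The mutation rules can be sketched as one "upsert" operation on the header string. The function names and the example keys (`acme`, `congo`, `rojo`) are illustrative, key and value syntax is not validated, and dropping the right-most member when the list would exceed 32 entries is an assumption of this sketch rather than something stated above.

```python
from typing import List, Tuple

MAX_MEMBERS = 32


def parse_tracestate(header: str) -> List[Tuple[str, str]]:
    """Split a tracestate value into (key, value) pairs, preserving order."""
    members: List[Tuple[str, str]] = []
    for item in header.split(","):
        item = item.strip()
        if not item or "=" not in item:
            continue  # skip empty or malformed members
        key, value = item.split("=", 1)
        members.append((key, value))
    return members


def upsert_tracestate(header: str, key: str, value: str) -> str:
    """Add or update our own key and move it to the front of the list.

    Members belonging to other vendors are kept, in their original order.
    """
    others = [(k, v) for k, v in parse_tracestate(header) if k != key]
    members = [(key, value)] + others
    # Assumption: if the list is too long, trim from the right-most end.
    members = members[:MAX_MEMBERS]
    return ",".join(f"{k}={v}" for k, v in members)
```

For example, `upsert_tracestate("congo=t61rcWkgMzE,rojo=00f067aa0ba902b7", "rojo", "new")` yields `rojo=new,congo=t61rcWkgMzE`: the vendor's own key moves to the front and the other vendor's entry is left untouched.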
The propagator interface
OpenTelemetry generalises the header logic with two operations on a Context object:
- inject(context, carrier): writes the relevant headers into an outgoing request.
- extract(carrier) -> context: reads them from an incoming request.
Carriers are pluggable (HTTP headers, gRPC metadata, AMQP properties, plain dict) and propagators are pluggable (W3C TraceContext, W3C Baggage, B3 single, B3 multi, Jaeger). The default OTel composite propagator is TraceContext + Baggage. B3 is supported for interop with Zipkin-era systems.
The library handles bit-encoding, validation, and “if no incoming header then generate” logic so application code never touches raw hex.
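To show the shape of the two operations, here is a hand-rolled sketch over a plain dict carrier. It is not the OpenTelemetry API (the real library's classes and signatures differ); `Context`, `inject`, and `extract` here are hypothetical stand-ins, and the choice to mark a freshly generated trace as sampled is an assumption, since in practice a sampler makes that decision.

```python
import secrets
from dataclasses import dataclass
from typing import Mapping, MutableMapping

_LOWER_HEX = set("0123456789abcdef")


@dataclass(frozen=True)
class Context:
    trace_id: str          # 32 lowercase hex chars
    span_id: str           # 16 lowercase hex chars
    sampled: bool = True
    tracestate: str = ""


def inject(context: Context, carrier: MutableMapping[str, str]) -> None:
    """Write traceparent (and tracestate, if any) into an outgoing carrier."""
    flags = "01" if context.sampled else "00"
    carrier["traceparent"] = f"00-{context.trace_id}-{context.span_id}-{flags}"
    if context.tracestate:
        carrier["tracestate"] = context.tracestate


def _valid(parts: list) -> bool:
    """Check version-00 field lengths, hex alphabet, and non-zero ids."""
    if len(parts) != 4 or [len(p) for p in parts] != [2, 32, 16, 2]:
        return False
    if not all(set(p) <= _LOWER_HEX for p in parts):
        return False
    return (parts[0] == "00"
            and parts[1] != "0" * 32
            and parts[2] != "0" * 16)


def extract(carrier: Mapping[str, str]) -> Context:
    """Read the incoming headers; generate a fresh context if absent or invalid."""
    parts = carrier.get("traceparent", "").split("-")
    if _valid(parts):
        sampled = bool(int(parts[3], 16) & 0x01)
        return Context(parts[1], parts[2], sampled, carrier.get("tracestate", ""))
    # Fallback: brand-new ids; any inherited tracestate is discarded.
    return Context(secrets.token_hex(16), secrets.token_hex(8))
```

A round trip illustrates the contract: after `inject(ctx, headers)` on the client side, `extract(headers)` on the server side returns an equal `Context`, and application code on either side never formats or parses hex itself.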
- traceparent header size: 55 bytes (fixed)
- trace-id width: 128 bits / 32 hex chars
- span-id (parent-id) width: 64 bits / 16 hex chars
- trace-flags currently defined bits: 1 (sampled flag, bit 0)
- tracestate max list members: 32
- tracestate minimum characters propagated (combined header): 512
- trace-id collision probability at 1 trillion traces: ~10⁻¹⁵
- Cost per request: ~0.05 KB header overhead (55-byte value)
The traceparent header value is `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01`. What does the final `01` mean?
An HTTP request arrives at your service with `traceparent: 99-aaaaaaaa-bbbbbbbb-01`. What does your service do per the W3C spec?
A vendor reads a tracestate header containing keys from two other vendors. Which mutation is forbidden by the spec?
- 01 List the four fields of traceparent in order and their purpose.
- 02 Why must trace-id be uniformly random and why can it never encode timestamps or IPs?
- 03 What is the propagator inject/extract interface and what problem does it solve?
The W3C traceparent header carries four fixed-width fields — version, 128-bit trace-id, 64-bit parent-id, and trace-flags — in exactly 55 bytes. The fixed width is a deliberate design choice: any router or sidecar can parse it without a regex engine. The trace-id must be uniformly random, never encoding timestamps or IPs, because consistent samplers and privacy both depend on it. An invalid header (unknown version, wrong field length, all-zero trace-id or parent-id) must be discarded and a fresh trace started, logged as invalid_traceparent_received for regression tracking. The tracestate companion carries vendor-specific extras up to 32 key=value pairs; vendors must propagate others’ keys even without understanding them. The OTel propagator interface (inject/extract) wraps all of this so application code never touches raw hex.