Observability
Head sampling and tail sampling: deciding which traces survive
Your service handles 10,000 requests per second. Storing every trace is prohibitively expensive. But if you sample randomly at 1%, the one slow request that triggered the customer complaint has a 99% chance of being silently discarded.
Head-based sampling
The keep/drop decision is made at trace start and encoded in the sampled bit of the traceparent trace-flags field (01 = sampled). The decision is propagated to all downstream services, which honour it by default.
Common strategies:
- Probabilistic: keep N% of traces (1–5% typical). Simple, predictable, scales linearly with traffic.
- Rate-limiting: keep at most K traces per second regardless of traffic volume.
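The two strategies above can be sketched as follows. This is a minimal illustration, not the implementation of any particular SDK: the names `keep_probabilistic` and `RateLimitingSampler` are hypothetical, the probabilistic sampler assumes the low 64 bits of the trace-id are uniformly random, and the rate limiter is a plain token bucket.

```python
import time


def keep_probabilistic(trace_id_hex: str, ratio: float) -> bool:
    """Keep roughly `ratio` of traces, deterministically per trace-id.

    Assumes the low 64 bits of the trace-id are uniformly random.
    """
    low_64_bits = int(trace_id_hex[-16:], 16)
    return low_64_bits < int(ratio * (1 << 64))


class RateLimitingSampler:
    """Token bucket: keep at most `max_per_second` traces per second."""

    def __init__(self, max_per_second: float, clock=time.monotonic):
        self.max_per_second = max_per_second
        self.clock = clock
        self.tokens = max_per_second  # start with one second's budget
        self.last = clock()

    def keep(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at one second's budget.
        refill = (now - self.last) * self.max_per_second
        self.tokens = min(self.max_per_second, self.tokens + refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Deriving the probabilistic decision from the trace-id rather than from a fresh random number means any service that sees the same trace-id reaches the same answer, which fits the propagated, all-or-nothing nature of head sampling.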
Cost: cheap. The decision is made once at the root span, before any work is done. Unsampled requests record and export no span data, saving CPU, network, and storage.
Drawback: the decision is blind to the outcome. A slow or error-prone request that happens to be in the unsampled 99% is invisible in the tracing backend. If an incident hits 0.5% of traffic and you sample at 1%, you keep only about 1% of the incident traces. At 10,000 requests per second that is roughly one incident trace every two seconds, and you might keep none if the incident is brief.
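The risk of keeping nothing from a short incident can be estimated with a small calculation. This is a sketch that assumes each request is sampled independently with the same probability; the function names are illustrative.

```python
def expected_kept(incident_requests: int, sample_rate: float) -> float:
    """Expected number of incident traces that survive head sampling."""
    return incident_requests * sample_rate


def probability_none_kept(incident_requests: int, sample_rate: float) -> float:
    """Chance that every incident trace is dropped (independent sampling)."""
    return (1.0 - sample_rate) ** incident_requests


# 0.5% of 10,000 req/s is 50 affected requests per second.
# An incident lasting 2 seconds therefore touches 100 requests.
affected = 50 * 2
print(expected_kept(affected, 0.01))                    # 1.0
print(round(probability_none_kept(affected, 0.01), 3))  # 0.366
```

Under these assumptions, a two-second incident has better than a one-in-three chance of leaving no trace at all in the backend.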
The sampled flag: when the flag is 01, downstream services record and export their spans. When it is 00, OTel SDKs create spans internally but do not export them by default. This is consistent sampling: either the whole trace is kept or none of it, never a fragment. A downstream service may override the incoming flag (for example, always sample its own errors), but partial overrides produce fragmentary traces that are nearly useless for debugging.
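Reading the flag from an incoming header can be sketched as below. The sketch assumes the version-00 traceparent layout (version, 32-hex trace-id, 16-hex parent-id, 2-hex trace-flags, separated by hyphens); the helper name `is_sampled` is hypothetical, and real services would normally rely on their SDK's propagator instead.

```python
import re

# version-00 layout: version - trace-id - parent-id - trace-flags
TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)
SAMPLED_BIT = 0x01  # lowest bit of trace-flags


def is_sampled(traceparent: str) -> bool:
    """Return True when the sampled bit is set in a traceparent header."""
    match = TRACEPARENT_RE.match(traceparent.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {traceparent!r}")
    flags = int(match.group("flags"), 16)
    return bool(flags & SAMPLED_BIT)


print(is_sampled("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))  # True
print(is_sampled("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-00"))  # False
```

Testing the bit with a mask, rather than comparing the field to the string `01`, keeps the check correct if other flag bits are set.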
Tail-based sampling
The OTel Collector buffers all spans for a trace-id for a configurable decision window (30s–5min), then decides whether to keep the trace based on policies:
- Error present → keep 100%.
- Duration > threshold → keep 100%.
- Specific attribute (e.g. user.tier=premium) → keep 100%.
- Probabilistic top-up → keep 1% of the rest for baseline visibility.
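The policy list above can be sketched as a single decision function applied once a trace's buffered spans are ready. This is an illustration only, not the Collector's implementation: the `Span` type, the `keep_trace` name, and the 2,000 ms threshold are assumptions, and trace duration is approximated by the longest span (usually the root).

```python
import random
from dataclasses import dataclass, field


@dataclass
class Span:
    duration_ms: float
    is_error: bool = False
    attributes: dict = field(default_factory=dict)


def keep_trace(spans, latency_threshold_ms=2000.0, baseline_ratio=0.01,
               rng=random.random) -> bool:
    """Decide whether to keep a fully buffered trace."""
    # Policy 1: any error span keeps the whole trace.
    if any(span.is_error for span in spans):
        return True
    # Policy 2: slow traces (longest span above the threshold) are kept.
    longest = max((span.duration_ms for span in spans), default=0.0)
    if longest > latency_threshold_ms:
        return True
    # Policy 3: a specific attribute value keeps the trace.
    if any(span.attributes.get("user.tier") == "premium" for span in spans):
        return True
    # Policy 4: probabilistic top-up for baseline visibility.
    return rng() < baseline_ratio
```

The policies are evaluated in order and the first match wins, so the probabilistic top-up only ever applies to traces that no targeted policy claimed.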
Advantages:
- Catches every error trace, even when errors make up only 0.1% of traffic.
- Catches every slow trace above the latency threshold.
- Spends storage only on the traces worth reading, which is why tail sampling is the dominant pattern at high-traffic services.
Cost: the collector must hold every trace’s spans in memory until decision time.
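Memory model: active_traces × avg_spans_per_trace × bytes_per_span. For example, 50,000 in-flight traces × 100 spans × 1 KB per span ≈ 5 GB of RAM. Because the number of in-flight traces is roughly the trace arrival rate multiplied by the decision window, the window directly controls the RAM footprint.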
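The memory model translates directly into a small estimator. This is a back-of-the-envelope sketch with a hypothetical function name; it counts span payload only, so a real collector will need headroom beyond this figure.

```python
def tail_buffer_bytes(active_traces: int, avg_spans_per_trace: int,
                      bytes_per_span: int) -> int:
    """Payload held in memory while traces wait for a decision."""
    return active_traces * avg_spans_per_trace * bytes_per_span


estimate = tail_buffer_bytes(50_000, 100, 1_000)  # 1 KB taken as 1,000 bytes
print(f"{estimate / 1e9:.1f} GB")  # 5.0 GB

# With trace count steady, doubling spans per trace doubles the footprint.
print(f"{tail_buffer_bytes(50_000, 200, 1_000) / 1e9:.1f} GB")  # 10.0 GB
```

Every factor scales the result linearly, so growth in spans per trace is as dangerous as growth in trace count.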
Load-balancing exporter requirement: with multiple collector replicas, random span distribution scatters a trace’s spans across different instances. Each instance only sees fragments and cannot make a correct keep/drop decision. The solution is a load-balancing exporter that hashes by trace-id and routes all spans for one trace to the same collector instance. This is mandatory for tail sampling to work correctly.
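The routing idea can be sketched in a few lines. This is an illustration of hashing by trace-id, not the exporter's actual algorithm: the function name and collector hostnames are hypothetical, and simple modulo hashing is used for brevity even though it remaps most traces whenever the replica count changes (consistent hashing would reduce that churn).

```python
import hashlib
from typing import Sequence


def pick_collector(trace_id_hex: str, collectors: Sequence[str]) -> str:
    """Map a trace-id to one collector so all its spans land together."""
    digest = hashlib.sha256(bytes.fromhex(trace_id_hex)).digest()
    index = int.from_bytes(digest[:8], "big") % len(collectors)
    return collectors[index]


# Hypothetical replica names, for illustration only.
replicas = ["collector-0:4317", "collector-1:4317", "collector-2:4317"]
trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"

# Every span of the trace carries the same trace-id, so every span
# is routed to the same replica regardless of which service sent it.
first = pick_collector(trace_id, replicas)
second = pick_collector(trace_id, replicas)
print(first == second)  # True
```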
| Dimension | Head sampling | Tail sampling |
|---|---|---|
| Decision time | At trace start (head) | After the decision window closes (tail) |
| Sees outcome? | No | Yes (error, latency, attributes) |
| Collector RAM | Minimal | Proportional to active traces × spans × span size |
| Routing requirement | None (stateless) | Load-balancing exporter (trace-id hash) |
| Misses error traces? | Yes (at rate = 1 − sample%) | No (if error policy = 100%) |
The hybrid pattern (dominant in production)
Head-sample at 100% (every request enters the pipeline), then tail-sample by policy:
- Error traces → 100% keep.
- Latency > 99th percentile threshold → 100% keep.
- Everything else → 1% probabilistic.
This gives storage volume control comparable to head sampling together with the selectivity of tail sampling, at the cost of one piece of additional infrastructure: the tail-sampling Collector tier with a load-balancing exporter and sufficient RAM.
At 10k req/s with a 30s decision window, 10 spans/trace, and 1 KB/span:
10,000 × 30 × 10 × 1,024 B ≈ 3 GB of collector RAM. Doable with 4–8 collector replicas.
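The same arithmetic, split across replicas, can be sketched as follows. It assumes one trace per request, an even spread of traces across replicas (which trace-id hashing approximates), and payload bytes only; the function name is hypothetical.

```python
def hybrid_tail_ram_bytes(requests_per_second: int, window_seconds: int,
                          spans_per_trace: int, bytes_per_span: int) -> int:
    """Span payload buffered when every request is head-sampled at 100%."""
    in_flight_traces = requests_per_second * window_seconds
    return in_flight_traces * spans_per_trace * bytes_per_span


total = hybrid_tail_ram_bytes(10_000, 30, 10, 1_024)
print(f"total: {total / 1e9:.2f} GB")  # total: 3.07 GB

for replica_count in (4, 8):
    per_replica_mb = total / replica_count / 1e6
    print(f"{replica_count} replicas: {per_replica_mb:.0f} MB each")
# 4 replicas: 768 MB each
# 8 replicas: 384 MB each
```

Under these assumptions each replica holds well under 1 GB of span payload, which leaves room for processing overhead and traffic spikes.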
Why this works
The hybrid pattern is why “we need to keep all error traces” and “we can’t afford to store everything” are not mutually exclusive. Head sampling at 100% admits every request into the pipeline without committing storage; the tail-sampling tier then applies the 100%-for-errors policy. Engineers who try to solve this with head sampling alone either store everything (expensive) or miss errors (unreliable). The two-tier design resolves both constraints.
- Typical head sample rate: 0.5–5% of traces
- Typical tail decision window: 30s–5 min
- Tail-sampler RAM at 50k traces × 100 spans × 1 KB: ~5 GB
- Tail-sampler RAM at 10k req/s, 30s window, 10 spans, 1 KB: ~3 GB
- Load-balancing exporter routing key: trace-id hash
- Consistent sampling: a trace is kept at 100% or 0%, never as a fragment
A team chooses tail-based sampling so they can keep all error traces. What is the operational catch they must plan for?
When traceparent arrives with the sampled flag set to `00`, what should the receiving service do by default?
A tail-sampling Collector OOMs every few hours. The metrics show trace count is steady but spans-per-trace is growing. What is the likely cause?
- 01 Why does head sampling miss error traces, and at what rate does it miss them?
- 02 Explain the load-balancing exporter and why tail sampling breaks without it.
- 03 Describe the hybrid head-100% + tail-policy pattern and when each tier acts.
Head sampling makes the keep/drop decision at trace start using the traceparent sampled flag, propagating the decision to all downstream services. It is cheap, because unsampled requests record and export no span data, but blind to outcomes: a 1% head rate drops 99% of error traces alongside 99% of normal ones. Tail sampling buffers all spans in the OTel Collector until the decision window closes (30s–5min), then applies policies: keep all errors, keep all slow traces, keep 1% baseline. The cost is collector RAM (active-traces × spans/trace × bytes/span) and a mandatory load-balancing exporter that routes all spans for one trace to the same collector instance. The hybrid head-100% + tail-policy pattern is the production standard: head at 100% feeds everything into the pipeline; the tail tier decides what to persist.