Head sampling and tail sampling: deciding which traces survive

Head sampling is cheap but blind — it makes the keep/drop decision at trace start without seeing the outcome. Tail sampling sees the whole trace before deciding, catching errors and slow requests that head sampling drops, at the cost of collector RAM and routing discipline.

OBS Middle ◷ 13 min

Your service handles 10,000 requests per second. Storing every trace is prohibitively expensive. But if you sample randomly at 1%, the one slow request that triggered the customer complaint has a 99% chance of being silently discarded.

Head-based sampling

The keep/drop decision is made at trace start and encoded in the sampled bit of the traceparent trace-flags field (01 = sampled, 00 = not sampled). The decision is propagated to all downstream services, which honour it by default.

Common strategies:

Probabilistic: keep N% of traces (1–5% typical). Simple, predictable, scales linearly with traffic.
Rate-limiting: keep at most K traces per second regardless of traffic volume.
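
The two strategies above can be sketched in a few lines. This is a minimal illustration, not the OTel SDK's own sampler: the function and class names are hypothetical, and using the low 64 bits of the trace-id as the random value is an assumption made so that the same trace-id always produces the same decision.

```python
import time


def keep_probabilistic(trace_id_hex: str, ratio: float) -> bool:
    """Keep roughly `ratio` of traces, deterministically per trace-id."""
    # Assumption: the low 64 bits of the 128-bit trace-id are uniformly random.
    low64 = int(trace_id_hex[-16:], 16)
    return low64 < int(ratio * (1 << 64))


class RateLimiter:
    """Keep at most `per_second` traces in each one-second window."""

    def __init__(self, per_second: int, clock=time.monotonic):
        self.per_second = per_second
        self.clock = clock
        self.window = None  # the whole second currently being counted
        self.count = 0      # traces kept so far in that second

    def keep(self) -> bool:
        now = int(self.clock())
        if now != self.window:
            # A new second has started: reset the budget.
            self.window, self.count = now, 0
        if self.count < self.per_second:
            self.count += 1
            return True
        return False
```

The probabilistic variant keeps a share of traffic that grows with load; the rate limiter caps the absolute number of kept traces per second regardless of load.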

Cost: cheap. The decision is made once at the root span, before any work is done. Unsampled requests record and export no spans, which saves CPU, network, and storage.

Drawback: the decision is blind to the outcome. A slow or failing request that lands in the unsampled 99% is invisible in the tracing backend. Because sampling is independent of the outcome, if an incident hits 0.5% of traffic and you sample at 1%, you keep only about 1% of the incident traces, and you may keep none at all if the incident is brief.
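
The arithmetic behind this drawback can be worked through directly. The sketch below assumes each trace is sampled independently; the 10,000 requests per second figure comes from the opening scenario, while the two-second incident duration is a hypothetical chosen for illustration.

```python
def expected_kept(incident_traces: int, sample_rate: float) -> float:
    """Average number of incident traces that survive head sampling."""
    return incident_traces * sample_rate


def prob_keep_none(incident_traces: int, sample_rate: float) -> float:
    """Chance that every incident trace is dropped (independent sampling)."""
    return (1.0 - sample_rate) ** incident_traces


# 0.5% of 10,000 req/s is 50 incident traces per second.
# Hypothetical incident lasting 2 seconds: 100 incident traces in total.
incident_traces = 100
kept = expected_kept(incident_traces, 0.01)         # about 1 trace on average
missed_all = prob_keep_none(incident_traces, 0.01)  # about 0.37
```

Under these assumptions the backend holds about one incident trace on average, and there is roughly a one-in-three chance it holds none.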

The sampled flag: when the flag is 01, downstream services record and export their spans. When it is 00, OTel SDKs create spans internally but do not export them by default. This is consistent sampling: either the whole trace is kept or none of it, never a fragment. A downstream service may override the incoming flag (for example, always sample its own errors), but partial overrides produce fragmentary traces that are nearly useless for debugging.
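
Reading the sampled bit from an incoming header can be sketched as follows. The header layout (version, trace-id, parent-id, trace-flags, separated by hyphens) follows the W3C Trace Context format; the function name and the header value in the usage lines are illustrative only.

```python
def is_sampled(traceparent: str) -> bool:
    """Return True when the sampled bit (lowest bit of trace-flags) is set."""
    version, trace_id, parent_id, flags = traceparent.strip().split("-")[:4]
    if len(trace_id) != 32 or len(parent_id) != 16 or len(flags) != 2:
        raise ValueError("malformed traceparent header")
    # trace-flags is one byte in hex; bit 0 is the sampled flag.
    return bool(int(flags, 16) & 0x01)


# Illustrative header values:
is_sampled("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")  # True
is_sampled("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-00")  # False
```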

The head sampling decision is encoded once in the traceparent sampled bit and carried with the context downstream, so every service makes the same keep/drop choice — the whole trace is kept (01) or dropped (00) together, never a fragment. Tail sampling instead defers this decision to the collector, after all spans arrive.

Tail-based sampling

The OTel Collector buffers all spans for a trace-id for a configurable decision window (30s–5min), then decides whether to keep the trace based on policies:

Error present → keep 100%.
Duration > threshold → keep 100%.
Specific attribute (e.g. user.tier=premium) → keep 100%.
Probabilistic top-up → keep 1% of the rest for baseline visibility.
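
The policy list above amounts to a first-match decision over a fully buffered trace. The sketch below is a simplified model, not the Collector's implementation: the `Span` type, the 2,000 ms threshold, and the use of the longest span as a stand-in for trace duration are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    duration_ms: float
    is_error: bool = False
    attributes: Dict[str, str] = field(default_factory=dict)


def tail_decision(trace_id_hex: str, spans: List[Span],
                  latency_threshold_ms: float = 2000.0,
                  baseline_ratio: float = 0.01) -> Optional[str]:
    """Return the name of the first policy that keeps the trace, or None to drop it."""
    # Policy 1: any error span keeps the whole trace.
    if any(s.is_error for s in spans):
        return "error"
    # Policy 2: slow trace. Assumption: the longest span approximates trace duration.
    if max((s.duration_ms for s in spans), default=0.0) > latency_threshold_ms:
        return "latency"
    # Policy 3: a specific attribute value keeps the trace.
    if any(s.attributes.get("user.tier") == "premium" for s in spans):
        return "attribute"
    # Policy 4: probabilistic top-up, deterministic per trace-id.
    if int(trace_id_hex[-16:], 16) < int(baseline_ratio * (1 << 64)):
        return "baseline"
    return None
```

Returning the policy name rather than a bare boolean makes it easy to count how many traces each policy keeps, which helps when tuning thresholds.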

Advantages:

Catches every error trace, even at 0.1% traffic rate.
Catches every slow trace above the latency threshold.
Spends storage on the traces most likely to matter, which is why tail sampling is the dominant pattern for high-traffic services.

Cost: the collector must hold every trace’s spans in memory until decision time.

Memory model: active_traces × avg_spans_per_trace × bytes_per_span. At 50,000 in-flight traces × 100 spans × 1 KB per span = 5 GB RAM. The decision window directly controls the RAM footprint.
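
The memory model can be written as a small calculation. The sketch below reproduces the 5 GB figure (taking 1 KB as 1,000 bytes) and shows why the window matters: in-flight traces grow with arrival rate times window length. The 1,000 traces per second rate is a hypothetical chosen so that a 50-second window yields the 50,000 in-flight traces used above.

```python
def tail_buffer_bytes(active_traces: int, avg_spans_per_trace: int,
                      bytes_per_span: int) -> int:
    """Span payload held in memory: active_traces x spans x bytes."""
    return active_traces * avg_spans_per_trace * bytes_per_span


def in_flight_traces(traces_per_second: int, decision_window_s: int) -> int:
    """Traces buffered at steady state for a given decision window."""
    return traces_per_second * decision_window_s


# The figure from the text: 50,000 traces x 100 spans x 1 KB.
baseline = tail_buffer_bytes(50_000, 100, 1_000)  # 5_000_000_000 bytes

# Hypothetical 1,000 traces/s: doubling the window doubles the footprint.
at_50s = tail_buffer_bytes(in_flight_traces(1_000, 50), 100, 1_000)
at_100s = tail_buffer_bytes(in_flight_traces(1_000, 100), 100, 1_000)
```

This counts span payload only; real collectors carry additional per-trace overhead, so treat the result as a lower bound.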

Load-balancing exporter requirement: with multiple collector replicas, random span distribution scatters a trace’s spans across different instances. Each instance only sees fragments and cannot make a correct keep/drop decision. The solution is a load-balancing exporter that hashes by trace-id and routes all spans for one trace to the same collector instance. This is mandatory for tail sampling to work correctly.
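
The routing rule can be illustrated with a plain hash-and-modulo sketch. This is a simplification under stated assumptions: the function name and collector hostnames are hypothetical, and production load balancers typically use consistent hashing so that adding or removing a replica moves only a small share of trace-ids.

```python
import hashlib
from typing import List


def pick_collector(trace_id_hex: str, collectors: List[str]) -> str:
    """Map a trace-id to one collector so all of its spans land together."""
    digest = hashlib.sha256(bytes.fromhex(trace_id_hex)).digest()
    index = int.from_bytes(digest[:8], "big") % len(collectors)
    return collectors[index]


# Hypothetical replica names.
replicas = ["collector-0:4317", "collector-1:4317", "collector-2:4317"]
trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"

# Every span of this trace, from any service, routes to the same replica.
first = pick_collector(trace_id, replicas)
second = pick_collector(trace_id, replicas)
```

Because the choice depends only on the trace-id and the replica list, any sender holding the same list reaches the same answer without coordination.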

Dimension	Head sampling	Tail sampling
Decision time	At trace start (head)	After all spans complete (tail)
Sees outcome?	No	Yes (error, latency, attributes)
Collector RAM	Minimal	Proportional to active traces × spans × span size
Routing requirement	None (stateless)	Load-balancing exporter (trace-id hash)
Misses error traces?	Yes (at rate = 1 − sample%)	No (if error policy = 100%)

The hybrid pattern (dominant in production)

Head-sample at 100% (every request enters the pipeline), then tail-sample by policy:

Error traces → 100% keep.
Latency > 99th percentile threshold → 100% keep.
Everything else → 1% probabilistic.

Together these three policies guarantee that every error trace and every outlier-latency trace survives, while normal, fast, successful traffic is thinned to 1% for cost. Without the hybrid design you are forced to choose between “store everything” (expensive) and “miss errors” (unreliable).

The hybrid policy keeps every error and every slow outlier (100%) while thinning normal traffic to 1% — selectivity is what buys back tail sampling's RAM cost.

This gives the volume control of head sampling and the selectivity of tail sampling, at the cost of one piece of additional infrastructure: the tail-sampling Collector tier with load-balancing exporter and sufficient RAM.

At 10k req/s with a 30 s decision window, 10 spans/trace, and 1 KB/span: 10,000 × 30 × 10 × 1,024 B ≈ 3 GB of collector RAM in total. That is manageable when spread across 4–8 collector replicas.
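
The sizing above can be split per replica with a short calculation. The sketch assumes trace-ids hash evenly across replicas and counts span payload only; the function names are hypothetical.

```python
def buffered_bytes(req_per_s: int, window_s: int,
                   spans_per_trace: int, bytes_per_span: int) -> int:
    """Total span payload buffered across the tail-sampling tier."""
    return req_per_s * window_s * spans_per_trace * bytes_per_span


def per_replica_bytes(total_bytes: int, replicas: int) -> int:
    """Even split across replicas, rounded up. Assumes uniform trace-id hashing."""
    return -(-total_bytes // replicas)


total = buffered_bytes(10_000, 30, 10, 1_024)  # 3_072_000_000 bytes
with_4 = per_replica_bytes(total, 4)           # 768_000_000 bytes each
with_8 = per_replica_bytes(total, 8)           # 384_000_000 bytes each
```

In practice each replica needs headroom above this figure, since incident traffic and uneven hashing both push individual replicas past the average.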

Why this works

The hybrid pattern is why “we need to keep all error traces” and “we can’t afford to store everything” are not mutually exclusive. Head sampling at 100% admits every request into the pipeline without committing storage; the tail-sampling tier then applies the 100%-for-errors policy. Engineers who try to solve this with head sampling alone either store everything (expensive) or miss errors (unreliable). The two-tier design satisfies both constraints.

Sampling cost reference numbers

Typical head sample rate: 1–5% of traces
Typical tail decision window: 30s–5 min
Tail-sampler RAM at 50k traces × 100 spans × 1 KB: ~5 GB
Tail-sampler RAM at 10k req/s, 30s window, 10 spans, 1 KB: ~3 GB
Load-balancing exporter routing key: trace-id hash
Consistent sampling: a trace is kept or dropped whole (100% or 0%), never as a fragment

Quiz

A team chooses tail-based sampling so they can keep all error traces. What is the operational catch they must plan for?

Quiz

When traceparent arrives with the sampled flag set to `00`, what should the receiving service do by default?

Quiz

A tail-sampling Collector OOMs every few hours. The metrics show trace count is steady but spans-per-trace is growing. What is the likely cause?

Recall before you leave

01 Why does head sampling miss error traces, and at what rate does it miss them?
02 Explain the load-balancing exporter and why tail sampling breaks without it.
03 Describe the hybrid head-100% + tail-policy pattern and when each tier acts.

Recap

Head sampling makes the keep/drop decision at trace start using the traceparent sampled flag, propagating the decision to all downstream services. It is cheap, because unsampled requests record and export no spans, but blind to outcomes: a 1% head rate drops 99% of error traces alongside 99% of normal ones. Tail sampling buffers all spans in the OTel Collector until the decision window closes (30s–5min), then applies policies: keep all errors, keep all slow traces, keep 1% baseline. The cost is collector RAM (active-traces × spans/trace × bytes/span) and a mandatory load-balancing exporter that routes all spans for one trace to the same collector instance. The hybrid head-100% + tail-policy pattern is the production standard: head at 100% feeds everything into the pipeline; the tail tier decides what to persist. When your team debates whether to use head or tail sampling, the answer is usually both: head at 100% for volume control, tail policy for selectivity, and a collector RAM budget so the second tier doesn’t OOM under incident traffic.
