Sampling strategies: head, tail, and parent-based

Head sampling is cheap and predictable; tail sampling is expensive but curated. ParentBased(TraceIdRatioBased(0.2)) at the SDK plus tail sampling on the gateway (keep 100% of the errors and slow traces that reach it, plus a 1% baseline) is the textbook production combination.

At 10k requests per second, recording 100% of traces fills a backend in hours and generates a five-figure monthly bill. But if you drop 99% of traces at random, you also drop 99% of the errors and slow requests you need for debugging. Sampling is the tradeoff between cost and fidelity.

By the end of this lesson you will know why a naive “drop 99% of traces” strategy throws away almost every error, and how to configure sampling that keeps the errors and slow traces you care about at a fraction of the storage cost.

Head sampling: decide early, cheaply

The SDK’s Sampler is consulted whenever a span is created. For the very first span of a trace (the root span) there is no parent, so the decision made there determines whether the trace is recorded. That decision is propagated to downstream services via the sampled flag in the W3C traceparent header.
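As a rough illustration of where the sampled flag lives, the sketch below parses a version-00 traceparent header using only the Python standard library. The class and function names are made up for this example, and the header value is the sample from the W3C Trace Context specification. A real service would rely on its OpenTelemetry SDK's propagator instead of hand-parsing.

```python
from typing import NamedTuple


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> TraceParent:
    """Split a version-00 traceparent header into its four dash-separated fields."""
    version, trace_id, parent_id, flags = header.strip().split("-")
    if len(trace_id) != 32 or len(parent_id) != 16 or len(flags) != 2:
        raise ValueError(f"malformed traceparent: {header!r}")
    # Bit 0 of the trace-flags byte is the sampled flag.
    sampled = bool(int(flags, 16) & 0x01)
    return TraceParent(version, trace_id, parent_id, sampled)


tp = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(tp.trace_id, tp.sampled)
```

A header ending in -01 tells the downstream service that the upstream span was sampled; -00 tells it the opposite.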

AlwaysOn: records every trace. Full fidelity, maximum cost.

AlwaysOff: records nothing. Useful for a canary or a service under load test.

TraceIdRatioBased(p): records a p-fraction of traces, chosen by a deterministic function of the trace_id rather than a fresh random draw per service. If two services both use TraceIdRatioBased(0.1) with the same algorithm, they independently make the same decision for the same trace, because the trace_id is shared across the whole request. At p=0.1, 10% of traces are recorded.

ParentBased: defers to the parent span’s sampling decision (via the sampled flag in traceparent). When an upstream service decides to sample, all downstream services follow. When upstream does not sample, downstream typically does not either, unless the local sampler is configured to override. This prevents inconsistent partial traces.
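The deterministic behaviour of TraceIdRatioBased described above can be sketched as a threshold test on the trace_id. This is a minimal illustration, assuming the low 64 bits of the id are compared against ratio × 2^64; the function name is hypothetical, and the exact algorithm inside any given SDK may differ.

```python
import random


def ratio_sampled(trace_id_hex: str, ratio: float) -> bool:
    """Keep the trace if the low 64 bits of its id fall below ratio * 2**64."""
    low_64_bits = int(trace_id_hex, 16) & ((1 << 64) - 1)
    return low_64_bits < int(ratio * (1 << 64))


# Simulated trace ids (seeded so the run is repeatable).
rng = random.Random(7)
trace_ids = [f"{rng.getrandbits(128):032x}" for _ in range(20_000)]

kept = sum(ratio_sampled(t, 0.1) for t in trace_ids)
print(f"kept {kept / len(trace_ids):.1%} of traces at ratio 0.1")

# Two services evaluating the same id reach the same verdict.
some_id = trace_ids[0]
print(ratio_sampled(some_id, 0.1) == ratio_sampled(some_id, 0.1))
```

Because the verdict depends only on the id and the ratio, no coordination between services is needed, and any trace kept at a lower ratio is also kept at a higher one.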

Production combination: ParentBased(root=TraceIdRatioBased(0.2)). The root span samples 20% of traces. All downstream spans inherit the decision. The result: either the full trace is recorded or nothing is — no partial traces with orphaned child spans.
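The all-or-nothing behaviour of this combination can be sketched in a few lines. This is a simplified model, not the SDK's implementation: parent_based and ratio_root are hypothetical helper names, and the ratio check assumes a threshold on the low 64 bits of the trace_id.

```python
from typing import Callable, Optional

MASK_64 = (1 << 64) - 1


def ratio_root(ratio: float) -> Callable[[str], bool]:
    """Root sampler: threshold test on the low 64 bits of the trace id."""
    bound = int(ratio * (1 << 64))
    return lambda trace_id_hex: (int(trace_id_hex, 16) & MASK_64) < bound


def parent_based(root: Callable[[str], bool]) -> Callable[[str, Optional[bool]], bool]:
    """Wrap a root sampler so child spans inherit the parent's sampled flag."""
    def should_sample(trace_id_hex: str, parent_sampled: Optional[bool]) -> bool:
        if parent_sampled is None:   # no parent: this is the root span
            return root(trace_id_hex)
        return parent_sampled        # child span: follow the upstream decision
    return should_sample


sampler = parent_based(ratio_root(0.2))
trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"

root_decision = sampler(trace_id, None)            # service A starts the trace
child_decision = sampler(trace_id, root_decision)  # service B reads traceparent
print(root_decision, child_decision)
```

Whatever the root decides, every downstream call to the sampler returns the same answer, which is what rules out orphaned child spans.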

Tail sampling: decide late, expensively

Tail sampling runs in the Collector after all spans of a trace have arrived and the trace is complete (or close enough). It can examine the full trace context: final status code, total latency, presence of error spans anywhere in the tree, business attributes on any span.

Typical tail sampling policies:

status_code=ERROR — keep 100% of traces with any ERROR span
latency > 1000ms — keep 100% of slow traces
probabilistic 1% — keep 1% of everything else as a baseline

The combination covers all interesting traffic (errors, slowness) and provides a cost-bounded baseline for normal traffic.
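The three policies above amount to a short decision function. The sketch below is a simplified model rather than the Collector's implementation: Span and keep_trace are hypothetical names, and trace latency is approximated by the longest span, whereas a real processor would measure from the earliest start to the latest end.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Span:
    duration_ms: float
    status: str = "OK"  # "OK" or "ERROR"


def keep_trace(
    spans: Sequence[Span],
    latency_threshold_ms: float = 1000.0,
    baseline_ratio: float = 0.01,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Return True if any policy votes to keep the completed trace."""
    # Policy 1: keep every trace that contains an ERROR span.
    if any(span.status == "ERROR" for span in spans):
        return True
    # Policy 2: keep every slow trace (longest span used as a stand-in).
    if max((span.duration_ms for span in spans), default=0.0) > latency_threshold_ms:
        return True
    # Policy 3: keep a small random baseline of everything else.
    return rng() < baseline_ratio


traces = {
    "failed checkout": [Span(40), Span(12, "ERROR")],
    "slow search": [Span(1800), Span(300)],
    "ordinary request": [Span(25), Span(5)],
}
for name, spans in traces.items():
    print(name, keep_trace(spans))
```

The first two traces are always kept; the ordinary one survives only when the baseline draw falls under 1%.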

Approach	Decision time	Can see errors?	Cost	Stateful?
Head (TraceIdRatioBased)	At root span start	No — decides before error happens	Low (SDK, <1 μs)	No
Tail (Collector processor)	After trace complete	Yes — sees full trace context	High (buffers spans in RAM)	Yes — must see all spans
ParentBased head	At each span start	No — same limitation as head	Low	No

The sticky-routing constraint

Tail sampling requires that every span of a trace arrive at the same Collector gateway instance. If spans of one trace land on two different instances, each sees only a fragment: neither can evaluate the full trace context, so an instance that never saw the ERROR span may drop its half while the other keeps its half, leaving a broken trace.

The OTel Collector’s loadbalancing exporter on the agent tier solves this: it hashes by trace_id and routes deterministically to the same gateway pod for all spans of a trace.
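The routing idea can be sketched as a pure function from trace_id to gateway. This is only an illustration: pick_gateway and the gateway names are hypothetical, and plain hash-modulo is used here as a stand-in for the hash ring the text mentions.

```python
import hashlib
from typing import List


def pick_gateway(trace_id_hex: str, gateways: List[str]) -> str:
    """Map a trace id to one gateway so all of its spans land together."""
    digest = hashlib.sha256(bytes.fromhex(trace_id_hex)).digest()
    index = int.from_bytes(digest[:8], "big") % len(gateways)
    return gateways[index]


gateways = ["gateway-0", "gateway-1", "gateway-2"]  # hypothetical pod names
trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"

# Spans of the same trace, sent by different agents, pick the same gateway.
from_agent_a = pick_gateway(trace_id, gateways)
from_agent_b = pick_gateway(trace_id, gateways)
print(from_agent_a, from_agent_a == from_agent_b)
```

The mapping depends on the gateway list as well as the trace_id, which is why changing the set of gateways moves some traces to a different instance.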

Tradeoff: the gateway becomes stateful. A pod restart or scale-up event reshuffles the hash ring and loses in-flight traces. Production mitigations:

Pre-warm new pods before they enter the load-balancing ring
Use conservative scaling policies (avoid frequent scale-up/down during traffic peaks)
Size the gateway’s num_traces buffer for peak rate × decision_wait × safety factor (~2x)

Numbers

Tail sampling buffer: ~1-2 GB per 50k active traces (about 20-40 spans each at ~1 KB/span; at 100 spans per trace the same 50k traces need ~5 GB)
decision_wait: 30-60 seconds (must exceed p99 trace duration)
num_traces: size for peak_rate × decision_wait × 2 — at 2,000 traces/sec and 30s window: ~120k
Head sampling baseline: 10-20% in production
Tail sampling keeps: 100% errors + 100% slow (threshold 1s) + 1-5% baseline
Net retained: ~3-5% of total traffic, including every error and slow trace that passed head sampling

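Volume collapses from 100% to ~3-5%: head sampling trims the buffer by 80%, and tail sampling keeps every error and slow trace among the 20% that reach the gateway. Errors in the other 80% are never exported, so raise the head ratio for services where every error trace matters.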

Together these numbers define the sizing formula for any production tail-sampling setup: num_traces is the one parameter you must right-size first, because underestimating it causes the buffer to overflow during spikes and silently drop the traces you most need to keep.
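The sizing rule can be written down as a one-line helper. The function name is hypothetical; the inputs are the figures from the Numbers list (2,000 traces/sec after head sampling, a 30-second decision_wait, and a 2x safety factor).

```python
import math


def size_num_traces(
    peak_traces_per_sec: float,
    decision_wait_s: float,
    safety_factor: float = 2.0,
) -> int:
    """num_traces = peak rate x decision_wait x safety factor, rounded up."""
    return math.ceil(peak_traces_per_sec * decision_wait_s * safety_factor)


print(size_num_traces(2_000, 30))  # 120000
print(size_num_traces(2_000, 60))  # 240000
```

Doubling decision_wait doubles the buffer, so a longer window should be budgeted in memory before it is configured.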

Why this works

Why not tail sample 100% without head sampling? Because tail sampling buffers every span in RAM until the decision_wait window closes. At 10k RPS × 50 spans per request × 30s window, that is 15 million spans in memory, roughly 15 GB at ~1 KB/span. Head sampling reduces the in-flight buffer proportionally: at 20% head sampling the buffer holds 3 million spans, about 3 GB. The tail sampler then keeps every error and slow trace it sees, but it cannot recover traces the SDK never recorded, so about 80% of errors are dropped at the head along with 80% of everything else. The combination trades that loss for cost control (head) and curated retention (tail) with a bounded and predictable memory footprint. If you need every error trace, head sample at 100% and size the gateway for the full volume.
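The buffer arithmetic above is easy to reproduce. In this sketch the function name is hypothetical and the ~1 KB/span figure is taken from the Numbers list as a rough assumption (rounded to 1,000 bytes); real span sizes vary with attribute count.

```python
def in_flight_spans(
    requests_per_sec: int,
    spans_per_request: int,
    decision_wait_s: int,
    head_ratio: float = 1.0,
) -> int:
    """Spans the tail sampler holds in RAM while waiting to decide."""
    return round(requests_per_sec * spans_per_request * decision_wait_s * head_ratio)


BYTES_PER_SPAN = 1_000  # assumption: ~1 KB per span

full = in_flight_spans(10_000, 50, 30)             # no head sampling
with_head = in_flight_spans(10_000, 50, 30, 0.2)   # 20% head sampling

for label, spans in (("100% head", full), ("20% head", with_head)):
    gigabytes = spans * BYTES_PER_SPAN / 1e9
    print(f"{label}: {spans:,} spans, about {gigabytes:.1f} GB")
```

The 20% head ratio cuts the buffer to one fifth, and it cuts the share of error traces that ever reach the tail sampler by the same factor.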

Quiz

Why is ParentBased(TraceIdRatioBased(0.2)) the standard head-sampler combination for distributed systems?

Quiz

Tail sampling on the gateway requires all spans of a trace to land on the same Collector instance. What mechanism ensures this in the agent-to-gateway pattern?

Order the steps

Order the events in tail sampling from span creation to keep/drop decision:

1 Application emits spans to the agent Collector
2 Agent's loadbalancing exporter hashes trace_id, routes to deterministic gateway replica
3 Gateway buffers spans in the tail_sampling processor (num_traces buffer, in RAM)
4 After decision_wait (30-60s), the full trace context is evaluated
5 Policy check: ERROR status? Latency over 1s? Probabilistic baseline?
6 Keep or drop — kept traces are forwarded to the exporter; dropped are discarded

Head sampling decides at trace start — cheap, but blind to the outcome. Tail sampling buffers the whole trace and decides only after completion, when errors and latency are visible — costly buffering, curated keep.

Recall before you leave

01 What is the key difference between head sampling and tail sampling, and when does each fail?
02 Why must tail sampling use sticky routing by trace_id, and what breaks if spans of one trace land on two gateway replicas?
03 How do you size the tail_sampling buffer (num_traces), and what happens if you under-size it?

Recap

Sampling is the tradeoff between cost and fidelity. Head samplers (AlwaysOn, AlwaysOff, TraceIdRatioBased, ParentBased) decide when a span starts: cheap (<1 μs), stateless, but blind to errors and latency. ParentBased makes every span of a trace follow the root’s decision, preventing partial traces. Tail sampling in the Collector decides after the full trace arrives: it can keep 100% of the ERROR traces and traces over 1s latency that reach it, plus a 1% baseline. It requires a stateful in-memory buffer (num_traces sized for peak_rate × decision_wait × 2 safety factor, on the order of 1-2 GB per 50k active traces depending on span count and size) and sticky trace_id routing via the loadbalancing exporter. Production pattern: ParentBased(TraceIdRatioBased(0.2)) at the SDK reduces the buffer volume by 80%, while the tail sampler curates what remains. Net result: ~3-5% of total volume retained, with every error and slow trace from the head-sampled 20% captured. Now when someone proposes “just drop 99% of traces to cut costs,” you know the right response: combine 20% head sampling with tail-curated keeps, and raise the head ratio toward 100% wherever losing an error trace is unacceptable.
