
Observability

Sampling strategies: head, tail, and parent-based

Crux: Head sampling is cheap and predictable; tail sampling is expensive but curated. ParentBased(TraceIdRatioBased(0.2)) at the SDK plus tail sampling on the gateway (keep 100% of the error and slow traces that reach it, plus a 1% baseline) is the textbook production combination.

At 10k requests per second, recording 100% of traces fills a backend in hours and generates a five-figure monthly bill. But if you drop 99% of traces at random, you also drop 99% of the errors and slow requests you need for debugging. Sampling is the tradeoff between cost and fidelity.

Head sampling: decide early, cheaply

The SDK’s Sampler is invoked whenever a span is created. For the first span of a trace (the root span) there is no parent, so the sampler makes the trace-level decision to record or drop. That decision travels to downstream services in the sampled bit of the W3C traceparent header’s trace-flags field.
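To make the propagation concrete, here is a minimal sketch of reading the sampled bit from a traceparent header. The function name is_sampled is hypothetical; the header layout (version, trace ID, parent ID, flags, separated by hyphens) follows the W3C Trace Context format, and the example header value is illustrative.

```python
def is_sampled(traceparent: str) -> bool:
    """Return True if the sampled bit (0x01) is set in the trace-flags field."""
    parts = traceparent.strip().split("-")
    if len(parts) < 4:
        raise ValueError("traceparent needs version, trace-id, parent-id, flags")
    _version, trace_id, parent_id, flags = parts[:4]
    if len(trace_id) != 32 or len(parent_id) != 16 or len(flags) != 2:
        raise ValueError("unexpected field length in traceparent")
    # trace-flags is one byte in hex; the lowest bit carries the sampling decision.
    return (int(flags, 16) & 0x01) == 0x01


header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(is_sampled(header))
```

A downstream service that honors this bit never has to re-run the ratio check itself; it only has to follow what the header says.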

AlwaysOn — records every trace. Full fidelity, maximum cost.

AlwaysOff — records nothing. Useful for a canary or a service under load test.

TraceIdRatioBased(p) keeps a fraction p of traces, chosen by a deterministic function of the trace_id rather than by a fresh random draw. Because the trace_id is shared across the whole request, two services that both run the same TraceIdRatioBased(0.1) implementation independently reach the same decision for the same trace. At p=0.1, 10% of traces are recorded.
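The determinism can be sketched in a few lines. This is an illustration under assumptions, not the SDK's code: it treats the low 64 bits of the trace ID as a uniform number and compares them against a threshold derived from the ratio. The names ratio_sampled and TRACE_ID_LOW_64 are hypothetical.

```python
import random

TRACE_ID_LOW_64 = (1 << 64) - 1  # mask for the low 64 bits of a 128-bit trace ID


def ratio_sampled(trace_id: int, ratio: float) -> bool:
    """Keep the trace if the low 64 bits of its ID fall below ratio * 2**64."""
    threshold = int(ratio * (1 << 64))
    return (trace_id & TRACE_ID_LOW_64) < threshold


# Simulated trace IDs; a fixed seed keeps the run reproducible.
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(100_000)]
kept = sum(ratio_sampled(t, 0.1) for t in trace_ids)
print(f"kept {kept / len(trace_ids):.3f} of traces at ratio 0.1")
```

Because the decision depends only on the trace ID and the ratio, any service that repeats the same comparison gets the same answer, and every trace kept at ratio 0.1 is also kept at ratio 0.2.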

ParentBased defers to the parent span’s sampling decision (the sampled flag in traceparent). When an upstream service samples a trace, all downstream services follow. When upstream does not sample, downstream does not either, unless ParentBased is configured with a different delegate sampler for unsampled parents. With no parent at all, it falls back to its root sampler. This prevents inconsistent partial traces.

Production combination: ParentBased(root=TraceIdRatioBased(0.2)). The root span samples 20% of traces. All downstream spans inherit the decision. The result: either the full trace is recorded or nothing is — no partial traces with orphaned child spans.
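The composition can be modelled as a small wrapper. This is a simplified sketch, not the SDK implementation: parent_based and ratio_20 are hypothetical names, and the model ignores the separate delegates a real ParentBased sampler can hold for remote and local parents.

```python
from typing import Callable, Optional


def ratio_20(trace_id: int) -> bool:
    """Root sampler: keep roughly 20% of traces, decided from the trace ID."""
    return (trace_id & ((1 << 64) - 1)) < int(0.2 * (1 << 64))


def parent_based(root: Callable[[int], bool]) -> Callable[[int, Optional[bool]], bool]:
    """Wrap a root sampler so that child spans follow the parent's decision."""
    def should_sample(trace_id: int, parent_sampled: Optional[bool]) -> bool:
        if parent_sampled is None:
            # No parent: this is the root span, so the root sampler decides.
            return root(trace_id)
        # A parent exists: inherit its decision unchanged.
        return parent_sampled
    return should_sample


sampler = parent_based(ratio_20)
root_decision = sampler(0x1234, None)            # root span decides
child_decision = sampler(0x1234, root_decision)  # child inherits
print(root_decision, child_decision)
```

In the OpenTelemetry Python SDK the corresponding classes are ParentBased and TraceIdRatioBased in opentelemetry.sdk.trace.sampling, passed to the TracerProvider as its sampler; the sketch above only mirrors the decision logic.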

Tail sampling: decide late, expensively

Tail sampling runs in the Collector after the spans of a trace have arrived: the processor buffers them for a fixed decision_wait window, then evaluates the trace whether or not it is complete. It can examine the full trace context: final status code, total latency, presence of error spans anywhere in the tree, business attributes on any span.
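The buffering behaviour can be sketched as a small in-memory structure. This is a toy model, not the Collector's processor: the class TraceBuffer and its methods are hypothetical, time is passed in explicitly, and evicting the oldest trace when the buffer is full is an assumption made for illustration.

```python
from collections import OrderedDict


class TraceBuffer:
    """Hold spans per trace until decision_wait seconds after the first span."""

    def __init__(self, decision_wait: float, num_traces: int):
        self.decision_wait = decision_wait
        self.num_traces = num_traces
        self._traces = OrderedDict()  # trace_id -> (first_seen, [spans])
        self.evicted = []             # traces dropped early because the buffer was full

    def add(self, trace_id, span, now: float) -> None:
        if trace_id not in self._traces:
            if len(self._traces) >= self.num_traces:
                # Buffer full: the oldest trace is lost before it can be evaluated.
                old_id, _ = self._traces.popitem(last=False)
                self.evicted.append(old_id)
            self._traces[trace_id] = (now, [])
        self._traces[trace_id][1].append(span)

    def pop_ready(self, now: float) -> dict:
        """Return and remove every trace whose decision window has elapsed."""
        ready = {}
        while self._traces:
            trace_id, (first_seen, spans) = next(iter(self._traces.items()))
            if now - first_seen < self.decision_wait:
                break  # entries are ordered by first_seen, so the rest are newer
            ready[trace_id] = spans
            del self._traces[trace_id]
        return ready


buf = TraceBuffer(decision_wait=30, num_traces=2)
buf.add("a", "span-1", now=0)
buf.add("a", "span-2", now=20)
print(buf.pop_ready(now=30))
```

The evicted list shows the cost of an undersized buffer: traces pushed out early never reach the policy check at all.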

Typical tail sampling policies:

  • status_code=ERROR — keep 100% of traces with any ERROR span
  • latency > 1000ms — keep 100% of slow traces
  • probabilistic 1% — keep 1% of everything else as a baseline

The combination covers all interesting traffic (errors, slowness) and provides a cost-bounded baseline for normal traffic.
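The three policies above can be sketched as a single decision function evaluated once per buffered trace. The Span type, the tail_decision name, and the trace-ID-based baseline check are all assumptions for illustration; the real Collector processor is configured declaratively rather than coded like this.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: int
    status: str       # "OK" or "ERROR"
    start_ms: float
    end_ms: float


def tail_decision(spans: List[Span], slow_ms: float = 1000.0,
                  baseline: float = 0.01) -> Optional[str]:
    """Return the name of the policy that keeps the trace, or None to drop it."""
    # Policy 1: any ERROR span anywhere in the trace keeps the whole trace.
    if any(s.status == "ERROR" for s in spans):
        return "error"
    # Policy 2: end-to-end duration above the latency threshold.
    duration = max(s.end_ms for s in spans) - min(s.start_ms for s in spans)
    if duration > slow_ms:
        return "latency"
    # Policy 3: deterministic baseline derived from the trace ID.
    low_bits = spans[0].trace_id & ((1 << 64) - 1)
    if low_bits < int(baseline * (1 << 64)):
        return "baseline"
    return None


trace = [Span(7, "OK", 0.0, 40.0), Span(7, "ERROR", 5.0, 30.0)]
print(tail_decision(trace))
```

Checking the error and latency policies before the baseline means the 1% sample only applies to traffic that is otherwise uninteresting.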

Approach | Decision time | Can see errors? | Cost | Stateful?
Head (TraceIdRatioBased) | At root span start | No (decides before the error happens) | Low (SDK, <1 μs) | No
Tail (Collector processor) | After trace complete | Yes (sees full trace context) | High (buffers spans in RAM) | Yes (must see all spans)
ParentBased head | At each span start | No (same limitation as head) | Low | No

The sticky-routing constraint

Tail sampling requires that every span of a trace arrive at the same Collector gateway instance. If spans of one trace land on two different instances, each instance evaluates only a fragment: the one holding the error span may keep its part while the other drops the rest, so the stored trace is incomplete or the decision is wrong.

The OTel Collector’s loadbalancing exporter on the agent tier solves this: it hashes by trace_id and routes deterministically to the same gateway pod for all spans of a trace.
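As a rough illustration of trace-ID-keyed routing, the sketch below hashes the trace ID and picks a gateway by modulo. This is a simplification chosen for brevity: pick_gateway and the gateway names are hypothetical, and a plain modulo reassigns far more traces on a membership change than a consistent-hash ring would.

```python
import hashlib
from typing import Sequence


def pick_gateway(trace_id: str, gateways: Sequence[str]) -> str:
    """Map a trace ID to one gateway so all spans of the trace go to the same place."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(gateways)
    return gateways[index]


three = ["gateway-0", "gateway-1", "gateway-2"]
four = three + ["gateway-3"]

trace_ids = [f"{i:032x}" for i in range(1000)]
moved = sum(pick_gateway(t, three) != pick_gateway(t, four) for t in trace_ids)
print(f"{moved} of {len(trace_ids)} traces change gateway after scale-up")
```

The count printed at the end shows why membership changes hurt: any trace that moves mid-flight has its spans split across two gateways.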

Tradeoff: the gateway becomes stateful. A pod restart or scale-up event reshuffles the hash ring and loses in-flight traces. Production mitigations:

  • Pre-warm new pods before they enter the load-balancing ring
  • Use conservative scaling policies (avoid frequent scale-up/down during traffic peaks)
  • Size the gateway’s num_traces buffer for peak rate × decision_wait × safety factor (~2x)

Numbers

  • Tail sampling buffer: ~1-2 GB per 50k active traces at roughly 20-40 spans each and ~1 KB/span (at 100 spans each, the same arithmetic gives ~5 GB)
  • decision_wait: 30-60 seconds (must exceed p99 trace duration)
  • num_traces: size for peak_rate × decision_wait × 2 — at 2,000 traces/sec and 30s window: ~120k
  • Head sampling baseline: 10-20% in production
  • Tail sampling keeps: 100% errors + 100% slow (threshold 1s) + 1-5% baseline
  • Net retained: ~3-5% of total traffic (depends on error and slow-request rates), and 100% of the interesting traces that survive head sampling
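The sizing rule can be checked with a few lines of arithmetic. The function names are hypothetical, and the spans-per-trace and bytes-per-span inputs in the second call are illustrative assumptions rather than measurements.

```python
import math


def size_num_traces(peak_traces_per_sec: float, decision_wait_s: float,
                    safety_factor: float = 2.0) -> int:
    """num_traces = peak rate x decision_wait x safety factor, rounded up."""
    return math.ceil(peak_traces_per_sec * decision_wait_s * safety_factor)


def buffer_bytes(active_traces: int, spans_per_trace: int, bytes_per_span: int) -> int:
    """Rough memory estimate: every buffered span is held in RAM."""
    return active_traces * spans_per_trace * bytes_per_span


num_traces = size_num_traces(2_000, 30)                     # 2,000 traces/sec, 30 s window
estimate = buffer_bytes(num_traces, spans_per_trace=20, bytes_per_span=1_000)
print(num_traces, f"{estimate / 1e9:.1f} GB")
```

Spans per trace dominates the memory estimate, so it is worth measuring on real traffic before fixing the gateway's memory limit.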

Why this works

Why not tail sample 100% without head sampling? Because tail sampling buffers every span in RAM until the decision_wait window closes. At 10k RPS × 50 spans per request × 30s window, that is 15 million spans in memory, roughly 15 GB at ~1 KB per span. Head sampling reduces the in-flight buffer proportionally: at 20% head sampling the buffer holds about 3 million spans. The price is coverage. The tail sampler can only keep traces that reach it, so it preserves every error and slow trace among the 20% that survive head sampling, not among all traffic; a trace dropped at the head cannot be recovered later. The combination gives you cost control (head) and curated retention (tail) with a bounded and predictable memory footprint.
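The buffer arithmetic in this paragraph can be reproduced directly. The function name in_flight_spans is hypothetical; the inputs are the figures quoted above.

```python
def in_flight_spans(rps: float, spans_per_request: int, decision_wait_s: float,
                    head_ratio: float = 1.0) -> int:
    """Spans held in the tail sampler's buffer during one decision window."""
    return int(round(rps * spans_per_request * decision_wait_s * head_ratio))


full = in_flight_spans(10_000, 50, 30)                       # no head sampling
reduced = in_flight_spans(10_000, 50, 30, head_ratio=0.2)    # 20% head sampling
print(full, reduced)
```

The buffer shrinks linearly with the head ratio, which is what makes the gateway's memory footprint predictable.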

Quiz

Why is ParentBased(TraceIdRatioBased(0.2)) the standard head-sampler combination for distributed systems?

Quiz

Tail sampling on the gateway requires all spans of a trace to land on the same Collector instance. What mechanism ensures this in the agent-to-gateway pattern?

Order the steps

Order the events in tail sampling from span creation to keep/drop decision:

  1. Application emits spans to the agent Collector
  2. Agent's loadbalancing exporter hashes trace_id, routes to deterministic gateway replica
  3. Gateway buffers spans in the tail_sampling processor (num_traces buffer, in RAM)
  4. After decision_wait (30-60s), the full trace context is evaluated
  5. Policy check: ERROR status? Latency over 1s? Probabilistic baseline?
  6. Keep or drop: kept traces are forwarded to the exporter; dropped are discarded

Recall before you leave

  1. What is the key difference between head sampling and tail sampling, and when does each fail?
  2. Why must tail sampling use sticky routing by trace_id, and what breaks if spans of one trace land on two gateway replicas?
  3. How do you size the tail_sampling buffer (num_traces), and what happens if you under-size it?

Recap

Sampling is the tradeoff between cost and fidelity. Head sampling (TraceIdRatioBased, AlwaysOn, ParentBased) decides when a span starts: cheap (<1 μs), stateless, but blind to errors and latency. ParentBased ensures all spans of a trace follow the root’s decision, preventing partial traces. Tail sampling in the Collector decides after the decision_wait window: it can keep 100% of the ERROR traces and traces over 1s latency that reach it, plus a 1% baseline. It requires a stateful in-memory buffer (size num_traces for peak_rate × decision_wait × 2 safety factor; roughly 1-2 GB per 50k active traces at 20-40 spans of ~1 KB each) and sticky trace_id routing via the loadbalancing exporter. Production pattern: ParentBased(TraceIdRatioBased(0.2)) at the SDK reduces the buffer volume by 80%, while the tail sampler preserves every interesting trace among the 20% that survive head sampling. Net result: roughly 3-5% of total volume retained.

