Observability

OTel architecture: one SDK, four signals, one wire format

Crux: How OpenTelemetry unifies logs, metrics, traces, and profiles through one collector pipeline, and how the trace-id becomes the join key that lets all four signals answer one query.

Before OpenTelemetry, wiring a new service into your observability stack meant four separate SDK installs, four collector configs, and four query languages. Switching vendors meant re-instrumenting everything. OTel collapsed all four signal types into one SDK and one wire protocol. Switching backends is now a config change.

The full stack architecture

In 2026 an OpenTelemetry-based production stack has the same three layers regardless of vendor: application, collector, and backend.

Application layer. Every service runs one OTel SDK (or OTel auto-instrumentation) that emits four signals:

  • Logs — high-cardinality event records with a trace-id field attached.
  • Metrics — low-cardinality counters and histograms; histogram exemplars carry the trace-id of a representative request.
  • Traces — per-request causal graphs as OTLP span batches.
  • Profiles — function-level CPU and allocation samples; each sample carries the trace-id of the active request.

Every signal carries the same trace-id when applicable, so cross-signal joins are possible at query time.
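To make the metric case concrete, here is a minimal Python sketch of a latency histogram whose buckets keep an exemplar. The class name `ExemplarHistogram` and the "most recent value wins" rule are illustrative assumptions; real OTel SDKs choose exemplars through the exemplar filter and reservoir defined in the metrics specification.

```python
import bisect
from typing import Optional


class ExemplarHistogram:
    """Explicit-bucket latency histogram where each bucket remembers the most
    recent (value, trace_id) recorded into it: a simplified exemplar model."""

    def __init__(self, bounds_s: list[float]):
        self.bounds = sorted(bounds_s)          # bucket upper bounds, in seconds
        n = len(self.bounds) + 1                # last bucket is (max bound, +Inf)
        self.counts = [0] * n
        self.exemplars: list[Optional[tuple[float, str]]] = [None] * n

    def record(self, value_s: float, trace_id: Optional[str] = None) -> None:
        # bisect_left puts a value equal to a bound into that bound's bucket,
        # matching OTel's (lower, upper] explicit-bucket boundaries.
        i = bisect.bisect_left(self.bounds, value_s)
        self.counts[i] += 1
        if trace_id is not None:                # only requests with an active span
            self.exemplars[i] = (value_s, trace_id)
```

Recording a 2.7 s request into a histogram with bounds `[0.1, 0.5, 2.0]` lands in the overflow bucket and leaves that request's trace-id as the bucket's exemplar, which is what lets a dashboard jump from a latency spike to one concrete trace.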

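Collector layer. A local OTel Collector agent (a DaemonSet on every node, or a sidecar per pod) receives all four signal types via OTLP gRPC (default port 4317). It batches and forwards to a gateway tier. Tail sampling lives on the gateway because the decision needs every span of a trace in one place, and a single node agent only sees the spans produced on its own node. The gateway tier performs: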

  • Tail sampling on traces — buffer spans until the trace closes, then keep 100% of errors and slow traces plus a small random baseline.
  • Metric aggregation.
  • Optional PII scrubbing on log fields and profile symbol filtering.
  • Routing signals to one or more backends.

Backend layer. Any combination of: Tempo (traces), Prometheus / Mimir (metrics), Loki (logs), Pyroscope (profiles) for the self-hosted Grafana stack; or Datadog, Honeycomb, New Relic for vendor-managed. OTLP is accepted by every major backend as of 2024–2026. Switching the backend is a collector exporter config change; the application code is unchanged.

| Signal | Cardinality | Query pattern | Typical backend |
|---|---|---|---|
| Logs | High | "error contains X" | Loki, OpenSearch |
| Metrics | Low | Dashboards + alerts | Prometheus, Mimir |
| Traces | Medium | Per-request drill-down | Tempo, Jaeger, Honeycomb |
| Profiles | Medium | Flame-graph drill-down | Pyroscope, Parca |

A concrete OTel collector config

The Node checkout service ships all four signals via a single Collector:

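```yaml
# otel-collector.yaml (gateway tier: node agents forward OTLP here, so every
# span of a trace reaches the instance that makes the tail-sampling decision)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch:
    send_batch_size: 512
    timeout: 10s
  tail_sampling:
    decision_wait: 10s          # buffer spans this long after a trace's first span
    policies:                   # a trace is kept if ANY policy matches
      - { name: errors, type: status_code,   status_code:   { status_codes: [ERROR] } }
      - { name: slow,   type: latency,       latency:       { threshold_ms: 2000 } }
      - { name: rest,   type: probabilistic, probabilistic: { sampling_percentage: 5 } }
exporters:
  otlphttp/tempo:        { endpoint: http://tempo:4318 }
  # Prometheus must run with --web.enable-remote-write-receiver
  prometheusremotewrite: { endpoint: http://prometheus:9090/api/v1/write }
  # Loki 3.x ingests OTLP natively; the older loki exporter is deprecated
  otlphttp/loki:         { endpoint: http://loki:3100/otlp }
  # Assumption: a Pyroscope build with OTLP profiles ingestion enabled; verify the endpoint
  otlphttp/pyroscope:    { endpoint: http://pyroscope:4040 }
service:
  pipelines:
    traces:   { receivers: [otlp], processors: [tail_sampling, batch], exporters: [otlphttp/tempo] }
    metrics:  { receivers: [otlp], processors: [batch], exporters: [prometheusremotewrite] }
    logs:     { receivers: [otlp], processors: [batch], exporters: [otlphttp/loki] }
    # Profiles pipelines are experimental and feature-gated; check which
    # processors your Collector build supports for this signal
    profiles: { receivers: [otlp], processors: [batch], exporters: [otlphttp/pyroscope] }
```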

Switching from this self-hosted Grafana stack to Datadog means replacing these exporter entries (endpoints, and typically an API key) with Datadog's. The service code is unchanged.

Why tail sampling requires a separate processor

Head sampling decides at trace start whether to keep the trace. If it keeps only 5% randomly, then 95% of errors and slow traces are discarded — lost exactly when you need them.

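Tail sampling buffers spans in memory for a decision window (about 10 seconds after a trace's first span arrives, set by the Collector's decision_wait) and then decides. It uses a window because a collector cannot know for certain when a trace has closed. The canonical production policy: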

  • Keep 100% of traces with any ERROR span.
  • Keep 100% of traces with total duration above the slow threshold (e.g. 2 s).
  • Keep a random 5% baseline of everything else.

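Provided errors and slow requests are a small share of traffic, this drops trace volume by 90%+ while preserving every error and slow trace that completes within the decision window.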
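The policy above can be sketched in Python as a buffer keyed by trace-id plus a decision function. Everything here (`Span`, `TailSampler`, the hash-based baseline) is a hypothetical simplification of what the Collector's tail_sampling processor does, not its implementation; as in that processor, a trace is kept if any policy matches.

```python
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str          # 32 lowercase hex chars (128-bit trace-id)
    start_ms: int
    end_ms: int
    is_error: bool = False


def baseline_keep(trace_id: str, percentage: float) -> bool:
    # Deterministic per trace: treat the low 64 bits of the trace-id as a
    # uniform number in [0, 2**64) and keep it if it falls below the ratio.
    return int(trace_id[-16:], 16) < (percentage / 100.0) * 2**64


def decide(spans: list[Span], slow_ms: int = 2000, baseline_pct: float = 5.0) -> bool:
    if any(s.is_error for s in spans):                     # policy "errors"
        return True
    duration = max(s.end_ms for s in spans) - min(s.start_ms for s in spans)
    if duration > slow_ms:                                 # policy "slow"
        return True
    return baseline_keep(spans[0].trace_id, baseline_pct)  # policy "rest"


class TailSampler:
    """Buffers spans per trace-id and decides once decision_wait_ms has
    passed since the first span of that trace arrived."""

    def __init__(self, decision_wait_ms: int = 10_000):
        self.decision_wait_ms = decision_wait_ms
        self.buffer: dict[str, list[Span]] = {}
        self.first_seen: dict[str, int] = {}

    def on_span(self, span: Span, now_ms: int) -> None:
        self.buffer.setdefault(span.trace_id, []).append(span)
        self.first_seen.setdefault(span.trace_id, now_ms)

    def flush(self, now_ms: int) -> list[Span]:
        """Decide every trace whose window has closed; return the kept spans."""
        kept: list[Span] = []
        expired = [t for t, t0 in self.first_seen.items()
                   if now_ms - t0 >= self.decision_wait_ms]
        for trace_id in expired:
            spans = self.buffer.pop(trace_id)
            del self.first_seen[trace_id]
            if decide(spans):
                kept.extend(spans)
        return kept
```

The sketch also shows why tail sampling belongs on a gateway tier: `decide` only sees the spans in its own buffer, so every span of a trace has to reach the same instance.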

Why this works

Head + tail together: the SDK samples at 100% (every span is created and exported; `parentbased_always_on`, the SDK's default sampler, does this), and the Collector's tail-sampling processor is the only component that discards. This avoids the case where a trace dropped by head sampling later turns out to contain an error: spans that were never emitted cannot be retrieved after the fact.
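A small simulation makes the difference measurable. It assumes 10,000 requests with a fixed 1% error rate and uses a trace-id ratio check for the 5% baseline; the numbers are synthetic, chosen only to contrast the two decision points.

```python
import random


def ratio_keep(trace_id: int, percentage: float) -> bool:
    # Keep if the low 64 bits of the 128-bit trace-id fall below the ratio bound.
    return (trace_id & (2**64 - 1)) < (percentage / 100.0) * 2**64


def simulate(n_traces: int = 10_000, error_every: int = 100,
             pct: float = 5.0, seed: int = 7) -> tuple[int, int, int]:
    rng = random.Random(seed)
    total_errors = head_kept_errors = tail_kept_errors = 0
    for i in range(n_traces):
        trace_id = rng.getrandbits(128)
        is_error = i % error_every == 0          # 1% of requests fail (assumption)
        total_errors += is_error
        # Head sampling: the decision is made at the root span, before the
        # outcome is known, so an error trace survives only by luck.
        if ratio_keep(trace_id, pct):
            head_kept_errors += is_error
        # SDK at 100% + tail sampling: every span reaches the Collector and
        # the decision is made after the outcome is known.
        if is_error or ratio_keep(trace_id, pct):
            tail_kept_errors += is_error
    return total_errors, head_kept_errors, tail_kept_errors


total, head, tail = simulate()
print(f"errors: {total}, kept by head sampling: {head}, kept by tail sampling: {tail}")
```

Head sampling keeps roughly 5% of the error traces, the same share it keeps of everything else; tail sampling keeps all of them.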

The trace-id as load-bearing join key

Trace-id is the identifier that makes the four-signal stack feel like one tool rather than four silos.

  • Logs emit it as a structured field (trace_id: "4bf92f3577b34da6a3ce929d0e0e4736").
  • Metrics emit it as histogram exemplars (a concrete trace-id attached to a bucket sample).
  • Traces are keyed by it natively.
  • Profiles attach it to each stack sample.

With consistent trace-id propagation (W3C traceparent through every service call), a query for “everything related to this specific request” returns log entries, metric exemplars, the full trace tree, and profile samples — all joined by the same hex string. Without it the four signals are four silos and the engineer manually correlates by timestamp + service name + best guess.
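The join itself is conceptually simple. This Python sketch uses hypothetical in-memory records (field names, sample values, and the `everything_for` helper are illustrative); real backends implement the same lookup with their own query languages and cross-signal links.

```python
from collections import defaultdict

TRACE = "4bf92f3577b34da6a3ce929d0e0e4736"
OTHER = "0af7651916cd43dd8448eb211c80319c"

logs = [
    {"trace_id": TRACE, "body": "payment declined", "severity": "ERROR"},
    {"trace_id": OTHER, "body": "cart loaded", "severity": "INFO"},
]
exemplars = [  # exemplar attached to a latency histogram bucket
    {"trace_id": TRACE, "metric": "http.server.request.duration", "value_s": 2.7},
]
spans = [
    {"trace_id": TRACE, "name": "POST /checkout", "status": "ERROR"},
    {"trace_id": TRACE, "name": "charge_card", "status": "ERROR"},
]
profile_samples = [
    {"trace_id": TRACE, "stack": ["main", "checkout", "charge_card", "tls_write"]},
]


def everything_for(trace_id: str, **signals: list[dict]) -> dict[str, list[dict]]:
    """Group the records from every signal that carry the given trace-id."""
    result: dict[str, list[dict]] = defaultdict(list)
    for signal_name, records in signals.items():
        for record in records:
            if record.get("trace_id") == trace_id:
                result[signal_name].append(record)
    return dict(result)


related = everything_for(TRACE, logs=logs, exemplars=exemplars,
                         spans=spans, profiles=profile_samples)
```

Without the shared trace-id field, the only way to build `related` would be the timestamp-and-service-name guesswork described above.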

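The trace-id's 128 random bits make collisions between independent requests practically impossible. W3C standardisation means the header survives vendor and language transitions. OTel SDKs and instrumentation libraries propagate it automatically for common HTTP and gRPC clients and servers and for many messaging libraries, and carry it across async boundaries through the SDK's context API.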
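A minimal Python sketch of the W3C `traceparent` header (`version-traceid-parentid-flags`) shows what propagation actually carries: the trace-id stays fixed across hops while each hop gets a new span-id. The helper names are hypothetical and the parser accepts only version `00`; in a real service the OTel SDK's propagator does this work.

```python
import re
import secrets
from typing import Optional

# version 00: 32-hex trace-id, 16-hex parent span-id, 2-hex flags
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_traceparent(sampled: bool = True) -> str:
    trace_id = secrets.token_hex(16)   # 128 random bits
    span_id = secrets.token_hex(8)     # 64 random bits
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(header: str) -> Optional[tuple[str, str, bool]]:
    """Return (trace_id, parent_span_id, sampled), or None if invalid."""
    m = _TRACEPARENT.fullmatch(header.strip())
    if not m:
        return None
    trace_id, parent_id, flags = m.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None                    # all-zero ids are invalid per the spec
    return trace_id, parent_id, bool(int(flags, 16) & 0x01)


def child_traceparent(incoming: str) -> str:
    """Header for a downstream call: same trace-id, new span-id."""
    parsed = parse_traceparent(incoming)
    if parsed is None:
        return new_traceparent()       # invalid or missing: start a new trace
    trace_id, _parent, sampled = parsed
    return f"00-{trace_id}-{secrets.token_hex(8)}-{'01' if sampled else '00'}"
```

For example, parsing `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01` yields the trace-id `4bf92f3577b34da6a3ce929d0e0e4736`, the parent span-id `00f067aa0ba902b7`, and sampled = True.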

Order the steps

Order the OTel data flow for one HTTP request through one service:

  1. Browser fetch (instrumented with the OTel web SDK) generates a traceparent and sends it with the request
  2. Service receives the request; the OTel SDK extracts traceparent and creates a span
  3. Span attributes are populated (http.route, http.response.status_code, custom attrs)
  4. SDK emits the span to the local Collector via OTLP gRPC
  5. Collector batches spans, applies tail sampling, and ships to the backend
  6. Profile sample fires; the SDK attaches the current trace-id to the stack sample
  7. Metric counters increment; the histogram records duration; the exemplar attaches the trace-id
Quiz

Which signal carries the trace-id that lets logs, metrics, traces, and profiles join at query time?

Quiz

Why is the tail_sampling processor needed in addition to head sampling at the SDK?

OTel stack: typical production parameters

| Parameter | Typical value |
|---|---|
| Trace sampling (SDK head + Collector tail) | 5% baseline + 100% of errors and slow traces |
| Tail sampling decision window | 10–30 seconds |
| Trace volume reduction from tail sampling | 90%+ |
| OTel profile signal status (2026) | Release candidate (Q1 2026) |
| Log volume reduction (mature orgs, over 5 years) | 40–60% |
| Collector DaemonSet model | 1 agent per node, 4 pipelines |

Recall before you leave
Recall before you leave
  1. Explain why the trace-id is the load-bearing identifier across all four observability signals.
  2. What is the role of the OTel Collector's tail_sampling processor, and what is the canonical production policy?
  3. How does switching observability vendors work in an OTel-first setup compared to a pre-OTel setup?
Recap

An OpenTelemetry production stack is three layers: application code emitting four signals via one OTel SDK, a Collector tier (node-local agents forwarding to a gateway that applies tail sampling and routing), and interchangeable backends receiving OTLP. The trace-id propagated by the W3C traceparent header is the join key that lets all four signals answer one query; without it, the stack is four disconnected silos. Tail sampling buffers spans for a decision window so it can keep 100% of errors and slow traces while discarding 90%+ of baseline volume; head sampling alone discards interesting traces before knowing they are interesting. By 2026 the OTel profile signal reached release candidate, completing the four-signal story and making profiling as portable as metrics or traces. Switching backends in an OTel-first setup is a collector config change, not a re-instrumentation project.


Trademarks belong to their respective owners. Editorial reference only.