Observability OBS · 08 · 02

OTel architecture: one SDK, four signals, one wire format

How OpenTelemetry unifies logs, metrics, traces, and profiles through one collector pipeline — and how the trace-id becomes the join key that lets all four signals answer one query.

OBS Middle ◷ 14 min


Before OpenTelemetry, wiring a new service into your observability stack meant four separate SDK installs, four collector configs, and four query languages. Switching vendors meant re-instrumenting everything. OTel collapsed all four signal types into one SDK and one wire protocol. Switching backends is now a config change. By the end of this lesson you will understand exactly how that pipeline is wired and why the trace-id is the pin that holds all four signals together.

The full stack architecture

An OpenTelemetry-based production stack looks the same in 2026 regardless of vendor.

Application layer. Every service runs one OTel SDK (or OTel auto-instrumentation) that emits four signals:

Logs — high-cardinality event records with a trace-id field attached.
Metrics — low-cardinality counters and histograms; histogram exemplars carry the trace-id of a representative request.
Traces — per-request causal graphs as OTLP span batches.
Profiles — function-level CPU and allocation samples; each sample carries the trace-id of the active request.

Every signal carries the same trace-id when applicable, so cross-signal joins are possible at query time.
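To make the exemplar idea concrete, here is a minimal Python sketch of a histogram that keeps low-cardinality bucket counts and remembers one representative trace-id per bucket. It is an illustration only, not how any OTel SDK stores exemplars; the bucket bounds, the class name `ExemplarHistogram`, and the "keep the most recent request" rule are assumptions made for the example.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

# Hypothetical bucket upper bounds in milliseconds; one extra bucket catches everything above.
BOUNDS_MS = [50, 100, 250, 500, 1000, 2000]


@dataclass
class ExemplarHistogram:
    # One counter per bucket: this is the low-cardinality part that dashboards aggregate.
    counts: list = field(default_factory=lambda: [0] * (len(BOUNDS_MS) + 1))
    # bucket index -> (duration_ms, trace_id): the pointer back to one concrete request.
    exemplars: dict = field(default_factory=dict)

    def record(self, duration_ms: float, trace_id: str) -> None:
        idx = bisect_left(BOUNDS_MS, duration_ms)  # first bound >= value
        self.counts[idx] += 1
        # Assumption for this sketch: the most recent request represents the bucket.
        self.exemplars[idx] = (duration_ms, trace_id)


hist = ExemplarHistogram()
hist.record(42, "a" * 32)
hist.record(2500, "b" * 32)   # lands in the overflow bucket
hist.record(45, "c" * 32)
```

The counts stay cheap to store no matter how many requests arrive, while the exemplar on the slow bucket gives a dashboard a trace-id to jump to.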

Collector layer. A local OTel Collector agent (a DaemonSet pod on every node, or a sidecar in every pod) receives all four signal types over OTLP gRPC on port 4317 (localhost:4317 for a sidecar; the node's IP for a DaemonSet agent). It batches and ships to a gateway tier. The gateway tier performs:

Tail sampling on traces — buffer spans until the trace closes, then keep 100% of errors and slow traces plus a small random baseline.
Metric aggregation.
Optional PII scrubbing on log fields and profile symbol filtering.
Routing signals to one or more backends.
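The agent's batching step described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the Collector's batch processor: the class name `Batcher`, the `tick` method, and the explicit `now` argument (used instead of a real timer so the behaviour is easy to follow) are all hypothetical. Only the two knobs, a size limit and a timeout, mirror the idea in the text.

```python
from typing import Callable, List, Optional


class Batcher:
    """Flush when the batch is full or when the oldest item has waited too long."""

    def __init__(self, export: Callable[[List[dict]], None],
                 send_batch_size: int = 512, timeout_s: float = 10.0):
        self.export = export
        self.send_batch_size = send_batch_size
        self.timeout_s = timeout_s
        self.items: List[dict] = []
        self.first_at: Optional[float] = None  # arrival time of the oldest buffered item

    def add(self, item: dict, now: float) -> None:
        if not self.items:
            self.first_at = now
        self.items.append(item)
        if len(self.items) >= self.send_batch_size:
            self.flush()

    def tick(self, now: float) -> None:
        # Called periodically; sends a partial batch once the timeout has passed.
        if self.items and now - self.first_at >= self.timeout_s:
            self.flush()

    def flush(self) -> None:
        batch, self.items = self.items, []
        self.first_at = None
        self.export(batch)


sent: List[List[dict]] = []
batcher = Batcher(sent.append, send_batch_size=3, timeout_s=10.0)
for i in range(3):
    batcher.add({"span": i}, now=0.0)   # third add fills the batch and flushes
batcher.add({"span": 3}, now=1.0)
batcher.tick(now=5.0)                   # too early, nothing sent
batcher.tick(now=11.0)                  # timeout reached, partial batch sent
```

The size limit bounds memory and request size; the timeout bounds how stale a quiet service's telemetry can get.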

Backend layer. Any combination of: Tempo (traces), Prometheus / Mimir (metrics), Loki (logs), Pyroscope (profiles) for the self-hosted Grafana stack; or Datadog, Honeycomb, New Relic for vendor-managed. OTLP is accepted by every major backend as of 2024–2026. Switching the backend is a collector exporter config change; the application code is unchanged.

Signal	Cardinality	Query pattern	Typical backend
Logs	High	“error contains X”	Loki, OpenSearch
Metrics	Low	Dashboards + alerts	Prometheus, Mimir
Traces	Medium	Per-request drill	Tempo, Jaeger, Honeycomb
Profiles	Medium	Flame graph drill	Pyroscope, Parca

The app emits all four signals — each tagged with the same trace-id — through one SDK into one Collector, which fans out to specialized backends. The four signals stay correlated by trace_id and resource; swapping a backend is a Collector exporter change.

A concrete OTel collector config

The Node checkout service ships all four signals via a single Collector:

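# otel-collector.yaml (DaemonSet on every node)
# Shown as a single tier to keep the example short. In the agent + gateway
# layout described above, tail_sampling belongs on the gateway so that all
# spans of one trace reach the same Collector instance.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch:
    send_batch_size: 512
    timeout: 10s
  tail_sampling:
    decision_wait: 10s   # how long a trace is buffered before the policies run
    policies:
      - { name: errors, type: status_code, status_code: { status_codes: [ERROR] } }
      - { name: slow,   type: latency,     latency: { threshold_ms: 2000 } }
      - { name: rest,   type: probabilistic, probabilistic: { sampling_percentage: 5 } }
exporters:
  otlphttp/tempo:         { endpoint: http://tempo:4318 }
  prometheusremotewrite:  { endpoint: http://prometheus:9090/api/v1/write }
  # Loki ingests OTLP natively under /otlp; the dedicated loki exporter is deprecated.
  otlphttp/loki:          { endpoint: http://loki:3100/otlp }
  # otlphttp appends the per-signal path itself, so the base URL carries no /ingest suffix.
  # Confirm that your Pyroscope version accepts OTLP profiles on this port.
  otlphttp/pyroscope:     { endpoint: http://pyroscope:4040 }
service:
  pipelines:
    traces:   { receivers: [otlp], processors: [tail_sampling, batch], exporters: [otlphttp/tempo] }
    metrics:  { receivers: [otlp], processors: [batch], exporters: [prometheusremotewrite] }
    logs:     { receivers: [otlp], processors: [batch], exporters: [otlphttp/loki] }
    # The profiles pipeline may require the service.profilesSupport feature gate.
    # batch is omitted here: processor support for profiles varies by Collector version.
    profiles: { receivers: [otlp], processors: [], exporters: [otlphttp/pyroscope] }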

Switching from this self-hosted Grafana stack to Datadog means replacing the four exporters in this file (their types, endpoints, and credentials). The service code is unchanged.

Why tail sampling requires a separate processor

Head sampling decides at trace start whether to keep the trace. If it keeps only 5% randomly, then 95% of errors and slow traces are discarded — lost exactly when you need them.
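The arithmetic can be checked with a small simulation. The sketch below is a toy model, loosely inspired by trace-id ratio sampling rather than copied from any SDK: `head_keep` is a hypothetical helper, and the 1% error rate and 100,000 traces are made-up inputs. The point it illustrates is that the keep/drop decision is taken from the trace-id alone, before anyone knows whether the request will fail.

```python
import random


def head_keep(trace_id: int, ratio: float) -> bool:
    """Decide at trace start, from the trace-id alone, whether to record the trace."""
    # Compare the low 64 bits of the id against a threshold proportional to ratio.
    return (trace_id & (2**64 - 1)) < int(ratio * 2**64)


rng = random.Random(7)  # fixed seed so repeated runs behave the same
traces = [
    {"trace_id": rng.getrandbits(128), "error": rng.random() < 0.01}
    for _ in range(100_000)
]

errors = [t for t in traces if t["error"]]
kept_errors = [t for t in errors if head_keep(t["trace_id"], 0.05)]

# Fraction of failing requests whose trace was never recorded.
lost_fraction = 1 - len(kept_errors) / len(errors)
print(f"errors: {len(errors)}, kept: {len(kept_errors)}, lost: {lost_fraction:.1%}")
```

Because the sampling decision is independent of the outcome, the share of lost error traces lands close to the share of dropped traces overall, that is, near 95%.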

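Tail sampling buffers all spans of a trace in memory and decides only after a decision window (~10 seconds here) has passed since the trace's first span arrived. OTLP carries no explicit end-of-trace signal, so the window stands in for "the trace has closed". The canonical production policy: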

Keep 100% of traces with any ERROR span.
Keep 100% of traces with total duration above the slow threshold (e.g. 2 s).
Keep a random 5% baseline of everything else.

When errors and slow requests are a small share of traffic, this cuts trace volume by 90% or more while keeping every trace that matches the first two rules.
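The three rules above can be written as one decision function that runs once the buffered spans of a trace are available. This is a minimal sketch under assumptions, not the Collector's tail_sampling code: the `Span` dataclass, the `keep_trace` name, and the returned policy labels are hypothetical, and trace duration is approximated as latest end minus earliest start.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str
    start_ms: int
    end_ms: int
    status: str = "OK"  # "OK" or "ERROR"


def keep_trace(spans: List[Span], slow_ms: int = 2000, baseline: float = 0.05,
               rng: Optional[random.Random] = None) -> str:
    """Return the name of the rule that keeps this trace, or "" to drop it."""
    # Rule 1: any span with ERROR status keeps the whole trace.
    if any(s.status == "ERROR" for s in spans):
        return "errors"
    # Rule 2: keep traces whose end-to-end duration exceeds the slow threshold.
    duration = max(s.end_ms for s in spans) - min(s.start_ms for s in spans)
    if duration > slow_ms:
        return "slow"
    # Rule 3: keep a random baseline of everything else.
    if (rng or random).random() < baseline:
        return "rest"
    return ""


failed = [Span("t1", 0, 10, "ERROR"), Span("t1", 2, 5)]
slow = [Span("t2", 0, 1500), Span("t2", 1000, 2600)]
fast = [Span("t3", 0, 100)]
print(keep_trace(failed), keep_trace(slow), repr(keep_trace(fast, baseline=0.0)))
```

The function needs the complete set of spans for a trace, which is why the decision has to wait for the buffering window instead of being made at trace start.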

Why this works

Head + tail together: the SDK head-samples at 100% (all spans are created and emitted), and the tail-sampling collector processor is the only thing that discards. This avoids the case where a head-sampled-away trace later becomes an error — if you never emitted the spans, you cannot retrieve them after the fact.

The trace-id as load-bearing join key

When you look at a slow-request alert and want to jump straight to the offending code, what connects the metric, the trace, the log, and the flame graph? The answer is a single 128-bit identifier, written as 32 hex characters.

Trace-id is the identifier that makes the four-signal stack feel like one tool rather than four silos.

Logs emit it as a structured field (for example, trace_id: "4bf92f3577b34da6a3ce929d0e0e4736").
Metrics emit it as histogram exemplars (a concrete trace-id attached to a bucket sample).
Traces are keyed by it natively.
Profiles attach it to each stack sample.

With consistent trace-id propagation (W3C traceparent through every service call), a query for “everything related to this specific request” returns log entries, metric exemplars, the full trace tree, and profile samples — all joined by the same hex string. Without it the four signals are four silos and the engineer manually correlates by timestamp + service name + best guess.

One 128-bit hex string is attached to all four signals — as a log field, a metric exemplar, the native trace key, and a profile stack sample — so a single query reaches every signal for one request.
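The "everything related to this specific request" query amounts to filtering each signal's store by the same key. The Python sketch below uses made-up in-memory lists in place of real backends; the record shapes, field names such as `top_frame`, and the function `everything_for` are assumptions for illustration. The first trace-id is the example value from the W3C Trace Context specification.

```python
TRACE_ID = "4bf92f3577b34da6a3ce929d0e0e4736"  # example id from the W3C Trace Context spec
OTHER_ID = "00f067aa0ba902b7" * 2              # made-up id for an unrelated request

# Hypothetical records, one list per backend.
logs = [
    {"trace_id": TRACE_ID, "body": "payment declined"},
    {"trace_id": OTHER_ID, "body": "cart loaded"},
]
exemplars = [
    {"trace_id": TRACE_ID, "metric": "http.server.request.duration", "value_s": 2.4},
]
spans = [
    {"trace_id": TRACE_ID, "name": "POST /checkout"},
    {"trace_id": TRACE_ID, "name": "charge-card"},
    {"trace_id": OTHER_ID, "name": "GET /cart"},
]
profile_samples = [
    {"trace_id": TRACE_ID, "top_frame": "serializeOrder"},
]


def everything_for(trace_id: str) -> dict:
    """Collect every record, from every signal, that carries this trace-id."""
    sources = {"logs": logs, "exemplars": exemplars,
               "spans": spans, "profiles": profile_samples}
    return {name: [r for r in records if r["trace_id"] == trace_id]
            for name, records in sources.items()}


result = everything_for(TRACE_ID)
for signal, records in result.items():
    print(signal, len(records))
```

Without a shared key, the same lookup would need a timestamp range plus a service name per backend, and would still return records from unrelated requests.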

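The 128 random bits make collisions negligibly unlikely, so the id is unique in practice. W3C standardisation means it survives vendor and language transitions. The OTel SDKs and their instrumentation libraries propagate it automatically across instrumented HTTP and gRPC calls; message queues and custom async hand-offs may need the context injected and extracted explicitly.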
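For readers who want to see what actually travels between services, here is a small Python sketch that parses and re-emits a W3C `traceparent` header of the form `00-<32 hex trace-id>-<16 hex parent span id>-<2 hex flags>`. It is a hand-rolled illustration that handles only version `00`; in a real service the OTel propagators do this for you. The names `TraceContext`, `parse_traceparent`, and `child_traceparent` are hypothetical.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version 00, 32 hex chars of trace-id, 16 hex chars of parent span id, 2 hex chars of flags
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceContext(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Extract the trace context from an incoming header, or None if it is malformed."""
    m = _TRACEPARENT.fullmatch(header.strip())
    if not m:
        return None
    trace_id, parent_id, flags = m.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero ids are invalid
    return TraceContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_traceparent(ctx: TraceContext) -> str:
    """Header for an outgoing call: same trace-id, fresh span id, same sampled flag."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex chars
    return f"00-{ctx.trace_id}-{span_id}-{'01' if ctx.sampled else '00'}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(incoming)
outgoing = child_traceparent(ctx)
```

The trace-id is copied unchanged on every hop while the span id changes, which is what lets every service's logs, exemplars, spans, and profile samples share one join key.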

Order the steps

Order the OTel data flow for one HTTP request through one service:

1 Browser fetch generates traceparent, sends it with the request
2 Service receives request; OTel SDK extracts traceparent, creates a span
3 Span attributes populated (http.route, http.response.status_code, custom attrs)
4 SDK emits the span to the local Collector via OTLP gRPC
5 Collector batches spans, applies tail sampling, ships to backend
6 Profile sample fires; SDK attaches current trace-id to the stack sample
7 Metric counters increment; histogram records duration; exemplar attaches trace-id

Quiz

Which signal carries the trace-id that lets logs, metrics, traces, and profiles join at query time?

Quiz

Why is the tail_sampling processor needed in addition to head sampling at the SDK?

OTel stack: typical production parameters

Trace head + tail sampling typical: 5% baseline + 100% errors / slow
Tail sampling decision window: 10–30 seconds
Trace volume reduction from tail sampling: 90%+
OTel profile signal status (2026): Release candidate (Q1 2026)
Logs volume reduction (mature orgs, 5 years): 40–60%
Collector DaemonSet model: 1 per node, 4 pipelines

Recall before you leave

01 Explain why the trace-id is the load-bearing identifier across all four observability signals.
02 What is the role of the OTel Collector's tail_sampling processor, and what is the canonical production policy?
03 How does switching observability vendors work in an OTel-first setup compared to a pre-OTel setup?

Recap

An OpenTelemetry production stack is three layers: application code emitting four signals via one OTel SDK, a Collector tier (local agents feeding a gateway) that applies tail sampling and routing, and interchangeable backends receiving OTLP. The trace-id propagated by the W3C traceparent header is the join key that lets all four signals answer one query; without it, the stack is four disconnected silos. Tail sampling buffers each trace for a decision window so it can keep 100% of errors and slow traces while discarding 90%+ of baseline volume; head sampling alone discards interesting traces before knowing they are interesting. By 2026 the OTel profile signal reached release candidate, completing the four-signal story and making profiling as portable as metrics or traces. Switching backends in an OTel-first setup is a collector config change, not a re-instrumentation project. When you instrument a new service, your first question should be: “Is the trace-id flowing through every signal?” If it is not, you have four silos that look like one stack.

