The OTel Collector: receivers, processors, exporters, and deployment patterns

The Collector is a YAML-configured pipeline — receivers accept telemetry, processors transform it in-flight, exporters ship it to backends. Three deployment patterns dominate: agent, gateway, and agent-to-gateway.

OBS Middle ◷ 13 min

A new backend requirement lands: route traces to vendor-A, logs to vendor-B, and redact PII from both, starting next Monday. Without a Collector, that is three separate application deploys across 50 teams. With a Collector, it is one YAML change and one Collector restart.

Receivers, processors, exporters

After this lesson, when a new routing requirement lands on Monday morning, you will know exactly which YAML block to touch — and which 50 teams you do not have to coordinate with.

The Collector is a YAML-configured pipeline with three stages:

Receivers accept incoming telemetry:

otlp — gRPC (port 4317) and HTTP (port 4318), the primary receiver for OTel-instrumented services
filelog — tails log files from the filesystem (useful for legacy apps that write logs to files, and for container stdout that the runtime writes to disk)
prometheus — scrapes /metrics endpoints (bridges Prometheus exporters into OTel)
kafka — reads telemetry from Kafka topics
Vendor-specific receivers for non-OTel data
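
As a rough sketch of how these receivers look in configuration, the fragment below declares three of them side by side. The log path, job name, and scrape target are placeholders rather than values from this lesson, and it assumes a Collector distribution that bundles filelog and prometheus (they ship with the contrib build, not the core one).

```yaml
receivers:
  # OTLP from instrumented services, on the default ports
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  # Tail plain-text log files (path is a placeholder)
  filelog:
    include: [/var/log/legacy-app/*.log]
    start_at: end

  # Scrape a Prometheus /metrics endpoint (job name and target are placeholders)
  prometheus:
    config:
      scrape_configs:
        - job_name: legacy-exporter
          scrape_interval: 30s
          static_configs:
            - targets: ["legacy-app:9100"]
```

Declaring a receiver does not start it. It only runs once a pipeline under service.pipelines lists it.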

Processors transform records in-flight:

memory_limiter — refuses new data when the Collector's memory use crosses a configured threshold, pushing back on receivers instead of letting the process run out of memory
batch — groups records into efficient batches (fewer network round trips to exporters)
resource — adds service.name and other Resource attributes (e.g., inject deployment.environment=production)
attributes — redacts, renames, adds fields; used for PII scrubbing and Semantic Convention enforcement
tail_sampling — keeps/drops traces based on the full trace context (errors, latency, business criteria) — covered in the next lesson
transform — general-purpose transformations via OTTL (OpenTelemetry Transformation Language)
k8sattributes — enriches spans/logs with Kubernetes pod, namespace, node, and deployment metadata

Together, these processors mean you can enforce redaction, cap memory usage, and sample by business criteria without touching a single service’s code. Skip memory_limiter and a large enough traffic spike can push the Collector into an OOM kill.
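
To make the processor list more concrete, here is a hedged sketch of four of them configured together. The instance suffixes (/pii, /mask_token), the user.id attribute, and the token pattern are illustrative assumptions, not part of this lesson's reference setup.

```yaml
processors:
  # Stamp every record with the deployment environment
  resource:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert

  # Attribute-level PII handling: delete one field, hash another
  attributes/pii:
    actions:
      - key: user.email
        action: delete
      - key: user.id          # hypothetical attribute name
        action: hash

  # OTTL statement: mask a token embedded in a span's URL attribute
  transform/mask_token:
    trace_statements:
      - context: span
        statements:
          - replace_pattern(attributes["url.full"], "token=[^&]+", "token=REDACTED")

  # Attach Kubernetes metadata to telemetry coming from each pod
  k8sattributes:
    extract:
      metadata:
        - k8s.pod.name
        - k8s.namespace.name
        - k8s.node.name
        - k8s.deployment.name
```

Note that k8sattributes queries the Kubernetes API, so the Collector's service account needs read access to pods and related objects for the enrichment to work.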

Exporters ship records to backends:

otlp — to another OTel-aware backend (another Collector, Grafana Tempo, Jaeger, etc.)
datadog — vendor-specific
prometheusremotewrite — to Prometheus or Grafana Mimir
loki — to Grafana Loki for logs
Vendor-specific exporters for New Relic, Honeycomb, Splunk, Elastic, etc.

Each stage is a menu you compose in YAML: pick receivers, chain processors in order, fan out to exporters — changing routing, redaction, or sampling without touching application code.
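
The opening scenario (traces to vendor-A, logs to vendor-B, PII redacted from both) might look roughly like the sketch below. The exporter names and the two endpoints are hypothetical. A real vendor would also need its own authentication headers, which are left out here.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 25
  batch: {}
  attributes/redact:
    actions:
      - key: user.email
        action: delete

exporters:
  otlphttp/vendor_a:            # hypothetical traces backend
    endpoint: https://otlp.vendor-a.example
  otlphttp/vendor_b:            # hypothetical logs backend
    endpoint: https://otlp.vendor-b.example

service:
  pipelines:
    # One pipeline per signal; each fans out to its own backend
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, attributes/redact]
      exporters: [otlphttp/vendor_a]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch, attributes/redact]
      exporters: [otlphttp/vendor_b]
```

Both pipelines reuse the same processor definitions, so the redaction rule is written once and applied to each signal.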

A minimal production-grade Collector YAML — one pipeline for traces:

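receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # OTLP over gRPC
      http:
        endpoint: 0.0.0.0:4318   # OTLP over HTTP

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80         # hard limit: 80% of available memory
    spike_limit_percentage: 25   # soft limit = 80 - 25 = 55%; refusal starts here
  batch:
    timeout: 10s                 # flush after 10 s even if the batch is not full
    send_batch_size: 512         # flush once 512 spans are queued (a trigger, not a cap)
  attributes:
    actions:
      - key: user.email          # remove this span attribute entirely
        action: delete

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true             # plaintext; only for a trusted in-cluster network

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, attributes]
      exporters: [otlp/tempo]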

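This accepts OTLP on 4317/4318; starts refusing new data once Collector memory passes the 55% soft limit (limit_percentage minus spike_limit_percentage) and forces garbage collection at the 80% hard limit; sends a batch as soon as 512 spans accumulate or 10 s elapse, whichever comes first; deletes user.email from every span; and exports OTLP to a Tempo backend over an unencrypted connection. Note that send_batch_size is a trigger, not a ceiling: capping batch size requires send_batch_max_size.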

Three deployment patterns

Agent — a Collector runs as a sidecar (one per pod) or DaemonSet (one per node). Receives telemetry from the local application, does minimal processing (resource enrichment, batching), exports directly to the backend. Cheapest in network hops, hardest to run tail sampling (each agent sees only one node’s traffic).

Gateway — applications export directly to a central pool of Collector replicas that does heavy processing (tail sampling, redaction, multi-backend routing). Easier to scale processing centrally, but every span crosses the network from the app to the gateway.

Agent-to-gateway (the production-canonical pattern) — a DaemonSet agent on every node does minimal local processing and forwards via OTLP to a centralised gateway pool that handles tail sampling, redaction, and routing. The agent adds Kubernetes metadata (pod, namespace, node) via the k8sattributes processor before forwarding.

Pattern	Collector location	Tail sampling possible?	Best for
Agent only	Sidecar per pod or DaemonSet per node	No: each agent sees only its own pod's or node's traffic	Simple setups, head sampling only
Gateway only	Central pool	Yes, provided all spans of a trace reach the same replica (a single replica or trace-aware load balancing)	Small fleets, simple topologies
Agent-to-gateway	DaemonSet + central gateway	Yes: agents route by trace_id, so each gateway replica sees full traces	Production Kubernetes, all sizes

In the agent-to-gateway pattern, the agent’s loadbalancing exporter routes by trace_id hash to ensure all spans of a trace land on the same gateway replica — necessary for tail sampling, covered in the next lesson.
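
A minimal sketch of the two tiers, assuming the gateway pool is exposed through a headless Kubernetes Service so the agent can resolve individual replica IPs. The Service hostname and the Tempo endpoint are placeholders, and the gateway's tail_sampling processor is left as a comment because it belongs to the next lesson.

```yaml
# --- Agent tier: DaemonSet, one Collector per node ---
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 25
  k8sattributes: {}
  batch: {}
exporters:
  loadbalancing:
    routing_key: traceID          # all spans of one trace go to the same replica
    protocol:
      otlp:
        tls:
          insecure: true          # assumption: plaintext inside the cluster
    resolver:
      dns:
        # placeholder headless Service that resolves to every gateway pod
        hostname: otel-gateway-headless.observability.svc.cluster.local
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [loadbalancing]
---
# --- Gateway tier: central pool of replicas ---
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 25
  batch: {}
  # tail_sampling would be defined here and placed before batch
exporters:
  otlp/tempo:
    endpoint: tempo:4317          # placeholder backend
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/tempo]
```

The headless Service matters: a regular ClusterIP Service hides the individual replicas behind one virtual IP, which would defeat per-trace routing.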

Telemetry flows left to right: receivers accept it, an ordered chain of processors (batch, filter, redact, tail-sample) transforms it in-flight, and exporters fan it out to one or more backends — all configured in YAML, not application code.

Why this works

Why does the Collector deserve to be a separate process from the application, even though the SDK could export to backends directly? Three reasons. (1) Buffering: when the backend slows or fails, the Collector queues and retries on the application’s behalf (in memory by default, and on disk when a persistent queue is configured), so the application is not blocked. A direct SDK export has only the SDK’s small in-process queue: once that fills, telemetry is dropped (data loss), and a synchronous exporter can instead stall the application’s request handler (latency regression). (2) Policy: tail sampling, redaction, and multi-backend routing belong outside application code, so platform engineers can update YAML without coordinating a redeploy across 50 teams. (3) Heterogeneous fleet: different services run different languages and SDK versions. The Collector normalises everything to OTLP and applies uniform policy regardless of upstream.
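
The buffering point can be sketched as exporter settings. This fragment assumes the file_storage extension is available in your Collector build and that the directory shown (a placeholder) already exists and is writable. The queue size and retry timings are illustrative, not recommendations.

```yaml
extensions:
  file_storage:
    directory: /var/lib/otelcol/queue   # placeholder path for the on-disk queue

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
    sending_queue:
      enabled: true
      queue_size: 5000                  # batches held while the backend is slow
      storage: file_storage             # persist the queue to disk instead of memory
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_elapsed_time: 300s            # stop retrying a batch after 5 minutes

service:
  extensions: [file_storage]
```

Without the storage line the queue lives in memory only, so anything still queued is lost if the Collector restarts.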

Quiz

A team's OTel Collector is refusing spans during a traffic spike (otelcol_processor_refused_spans is non-zero). Which processor is most likely responsible?

Quiz

Why is the agent-to-gateway pattern the production-canonical deployment for Kubernetes?

Order the steps

Order the Collector pipeline stages a span passes through in a standard trace pipeline:

1 OTLP receiver accepts the span batch from the application SDK
2 memory_limiter checks current RAM and drops if over threshold
3 k8sattributes adds pod, namespace, and node metadata
4 batch groups spans for efficient export
5 attributes redacts PII fields
6 OTLP exporter sends the batch to the backend

Recall before you leave

01
What does the memory_limiter processor do and why must it come before other processors in the pipeline?
02
Why does the agent-to-gateway pattern use a loadbalancing exporter on the agent tier instead of a simple round-robin?
03
Name three processors and explain the order in which they should appear in a production traces pipeline.

Recap

The OTel Collector is a YAML-configured pipeline with three stages: receivers (otlp, filelog, prometheus, kafka) that accept incoming telemetry; processors (memory_limiter, batch, resource, attributes, k8sattributes, tail_sampling, transform) that transform records in-flight; and exporters (otlp, datadog, prometheusremotewrite, loki) that ship records to backends. memory_limiter must come first — it drops records cheaply before expensive processing when the Collector is under memory pressure. Three deployment patterns: agent (DaemonSet per node, simple, no tail sampling), gateway (central pool, tail sampling possible), and agent-to-gateway (the production-canonical Kubernetes pattern — DaemonSet agents enrich with Kubernetes metadata and forward via trace_id-hashed routing to a central gateway that runs tail sampling and multi-backend routing). The Collector’s value is decoupling policy (redaction, routing, sampling) from instrumentation (application code) — update YAML, not code. Now when a new routing requirement lands — “send traces to vendor-A and logs to vendor-B starting Monday” — your first thought is which exporter block to add, not which teams to deploy.
