
What is OpenTelemetry: API, SDK, Collector, OTLP

OTel is four pieces stacked: an API your code calls, an SDK that builds telemetry, a Collector that routes it, and OTLP that carries it — together they let you instrument once and swap backends without rewriting code.

A company switches observability backends to cut cost. With the vendor SDK hard-wired, the migration estimate is four engineer-weeks. With OTel, it is one day — change an exporter config block, redeploy the Collector, done.

The four pieces

In ten minutes you will know which of OTel’s four pieces lives in your application, which lives outside it, and exactly where to pull the lever when you need to change vendors.

Before OTel, every observability vendor shipped a proprietary agent — Datadog’s dd-trace, New Relic’s agent, AppDynamics’s Java agent — and changing vendor meant rewriting instrumentation across every service. The CNCF’s answer is OpenTelemetry: four pieces layered so each is replaceable and each speaks a stable contract.

API — language-specific public interfaces (Java’s io.opentelemetry.api, Python’s opentelemetry.trace, Go’s go.opentelemetry.io/otel). This is what application code and third-party libraries call. It is intentionally lightweight: the default implementation is a no-op. The application code never imports a vendor library — it imports the OTel API.

SDK — the runtime piece that turns API calls into telemetry records. The SDK owns sampling decisions (which traces to keep), batching (when to flush to the exporter), and serialization (OTLP wire format). It is installed by the application owner, not the library author.

Collector — a standalone process (binary or container) that receives OTLP, runs configurable processors, and exports to one or more backends. This is the policy layer: tail sampling, redaction, multi-backend routing — all in YAML, outside the application.

OTLP — the wire format. Protobuf-encoded messages sent over gRPC or HTTP, defined in the OTel specification and kept backward-compatible across versions. Any two OTel-aware components can communicate over OTLP.

Piece	Lives in	Role	Replaceable?
API	Application code	Stable public interface; no-op by default	No — stable contract; additions stay backward-compatible
SDK	Application runtime	Builds records, samples, batches, serializes	Yes — swap SDK without touching app code
Collector	Sidecar / gateway	Process, route, export telemetry	Yes — update config, not code
OTLP	Wire format	Carries spans/metrics/logs between pieces	Stable — all pieces speak it

One vendor-neutral pipeline: the API/SDK instruments and serializes, OTLP carries the bytes, the Collector routes — and any backend plugs in at the end. This single chain replaces the per-vendor proprietary agents OTel was built to retire.
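To make the "OTLP carries the bytes" step concrete, here is a minimal sketch of one span in OTLP's JSON encoding, built with only the Python standard library. Protobuf is the encoding described above; the JSON mapping is used here only because it is human-readable. The service name, scope name, span name, and attribute are hypothetical, and the endpoint in the comment assumes a Collector listening locally on the conventional OTLP/HTTP port.

```python
import json
import secrets
import time


def build_otlp_trace_payload(service_name: str, span_name: str) -> dict:
    """Build one span in the OTLP/JSON shape (ExportTraceServiceRequest)."""
    start_ns = time.time_ns()
    end_ns = start_ns + 5_000_000  # pretend the operation took 5 ms
    return {
        "resourceSpans": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}},
                ],
            },
            "scopeSpans": [{
                "scope": {"name": "example.instrumentation"},  # hypothetical scope name
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 16 bytes -> 32 hex characters
                    "spanId": secrets.token_hex(8),    # 8 bytes -> 16 hex characters
                    "name": span_name,
                    "kind": 2,  # SPAN_KIND_SERVER
                    "startTimeUnixNano": str(start_ns),  # 64-bit integers travel as strings
                    "endTimeUnixNano": str(end_ns),
                    "attributes": [
                        {"key": "http.request.method", "value": {"stringValue": "GET"}},
                    ],
                }],
            }],
        }],
    }


payload = build_otlp_trace_payload("checkout", "GET /cart")
body = json.dumps(payload).encode("utf-8")
# An exporter would POST `body` with Content-Type: application/json to, for
# example, http://localhost:4318/v1/traces (assumed local Collector address).
print(len(body), "bytes ready to send")
```

Whatever backend sits at the end of the pipeline, this payload shape does not change; only the address it is sent to does.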

The postal metaphor

Picture a national postal system. The API is the mailbox at your house — you drop a letter in, you do not care how it gets sorted. The SDK is the local post-office staff who pick up the letter, weigh it, stamp it, put it in the right outgoing bag. The Collector is the regional sorting depot where mail from many houses meets, gets batched, filtered, and routed by destination. OTLP is the standard envelope and address format every depot understands. Change the destination country (the backend vendor)? Same envelope, different routing table. When you see a migration ticket that says “switch from Datadog to Honeycomb,” this metaphor is the map that tells you only the routing table needs rewriting — not the letters themselves.

The portability story

Bea, the platform engineer, is told the company is moving from Datadog to Honeycomb to cut cost. She panics: “Do we rewrite every service?” Sven, the backend developer, reassures her: “We are on OTel — application code calls only the OTel API, the SDK emits OTLP to the Collector, and the Collector exports to whichever backend the config says. We change one block in the Collector YAML — exporter, endpoint, API key — and redeploy. No app code changes.” A day later the migration is done.

The portability payoff in numbers: a hard-wired vendor SDK means re-instrumenting every service (~4 engineer-weeks); OTel turns the same backend switch into a one-day Collector config change.
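The one-block change can be sketched as follows. The Collector itself is configured in YAML; the Python dictionaries below only mirror that structure so the before and after can be compared in code. The exporter names, endpoint, header, and environment-variable placeholders are illustrative assumptions, not values from a real deployment, and `switch_trace_backend` is a hypothetical helper.

```python
import copy

# Mirrors the shape of a Collector config file (normally written in YAML).
# Exporter settings are placeholders; check each vendor's docs for real values.
CONFIG_BEFORE = {
    "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
    "processors": {"batch": {}},
    "exporters": {
        "datadog": {"api": {"key": "${env:DD_API_KEY}"}},
    },
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["batch"],
                "exporters": ["datadog"],
            },
        },
    },
}


def switch_trace_backend(config: dict, name: str, settings: dict) -> dict:
    """Return a copy of config whose traces pipeline exports only to `name`."""
    updated = copy.deepcopy(config)
    updated["exporters"] = {name: settings}
    updated["service"]["pipelines"]["traces"]["exporters"] = [name]
    return updated


CONFIG_AFTER = switch_trace_backend(
    CONFIG_BEFORE,
    "otlp/honeycomb",
    {
        "endpoint": "api.honeycomb.io:443",
        "headers": {"x-honeycomb-team": "${env:HONEYCOMB_API_KEY}"},
    },
)

# Receivers and processors are untouched: applications keep sending the same OTLP.
unchanged = [k for k in ("receivers", "processors") if CONFIG_BEFORE[k] == CONFIG_AFTER[k]]
print("unchanged sections:", unchanged)
print("traces now export to:", CONFIG_AFTER["service"]["pipelines"]["traces"]["exporters"])
```

Nothing in the sketch refers to application code, which is the point: the difference between the two configs is confined to the exporter block and the pipeline's exporter list.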

Why this works

The API/SDK split solves the third-party library problem. Before OTel, a library that wanted to emit telemetry had to pick a vendor (locking its users to that vendor) or build its own abstraction. With OTel, a library depends only on the OTel API package — a small set of interfaces with a no-op default — so it can emit spans without forcing any SDK on its users. The application owner installs whichever SDK they choose at deploy time. This is why “instrument once, route anywhere” is architecturally sound rather than just marketing.
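A toy model of this split, in plain Python with no OTel packages installed. The names `NoOpTracer`, `RecordingTracer`, `set_tracer`, and `charge_card` are hypothetical stand-ins, not the real OpenTelemetry classes; the sketch only illustrates the pattern of a no-op default that the application owner can replace.

```python
from contextlib import contextmanager


# --- "API" layer: the only thing a library depends on ----------------------
class NoOpTracer:
    """Default tracer: accepts calls, records nothing."""

    @contextmanager
    def start_span(self, name: str):
        yield None


_tracer = NoOpTracer()


def get_tracer():
    return _tracer


def set_tracer(tracer) -> None:
    """Called once by the application owner to install an 'SDK'."""
    global _tracer
    _tracer = tracer


# --- Third-party library: imports the API layer, never a vendor SDK --------
def charge_card(amount_cents: int) -> str:
    with get_tracer().start_span("charge_card"):
        return f"charged {amount_cents}"


# --- "SDK" layer: chosen by the application, not by the library ------------
class RecordingTracer:
    def __init__(self):
        self.finished = []

    @contextmanager
    def start_span(self, name: str):
        try:
            yield name
        finally:
            self.finished.append(name)  # the span is recorded when it ends


# Without an SDK the library still works; spans are silently dropped.
result_without_sdk = charge_card(500)

# With an SDK installed, the same library code now produces telemetry.
sdk = RecordingTracer()
set_tracer(sdk)
result_with_sdk = charge_card(500)
print(result_without_sdk == result_with_sdk, sdk.finished)
```

The library function is identical in both runs. Only the object behind `get_tracer()` changed, and that choice was made by the application, not the library.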

Quiz

Which four pieces make up the OTel architecture?

Quiz

A team has 50 microservices instrumented with OTel and wants to add a new tracing backend. Where do they make the change?

Order the steps

Order the path of a single span from application code to the backend:

1 Application code calls tracer.start_span() (OTel API)
2 OTel SDK builds the span record (start time, attributes, trace context)
3 When the span ends, SDK passes it to a batch span processor
4 Processor batches spans and hands them to an exporter
5 Exporter sends OTLP-encoded data over gRPC or HTTP to the Collector
6 Collector receives, applies processors (tail sampling, redaction), batches
7 Collector exports to one or more backends (Datadog, Honeycomb, Tempo, ...)
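The seven steps above can be traced in a toy model. Everything here is a hypothetical stand-in written with the standard library (`BatchProcessor`, `Exporter`, `Collector`, and the backend names are not real OTel classes), JSON text stands in for OTLP on the wire, and only a redaction processor is modeled, not tail sampling.

```python
import json
import time


class Collector:
    """Steps 6-7: receive, run processors, fan out to configured backends."""

    def __init__(self, backends):
        self.backends = {name: [] for name in backends}

    def receive(self, wire_bytes: bytes) -> None:
        spans = json.loads(wire_bytes)
        for span in spans:
            span["attributes"].pop("user.email", None)  # redaction processor
        for received in self.backends.values():
            received.extend(spans)


class Exporter:
    """Step 5: serialize the batch and send it to the Collector."""

    def __init__(self, collector: Collector):
        self.collector = collector

    def export(self, spans: list) -> None:
        self.collector.receive(json.dumps(spans).encode("utf-8"))


class BatchProcessor:
    """Steps 3-4: queue ended spans and hand them over in batches."""

    def __init__(self, exporter: Exporter, max_batch: int = 2):
        self.exporter = exporter
        self.max_batch = max_batch
        self.queue = []

    def on_end(self, span: dict) -> None:
        self.queue.append(span)
        if len(self.queue) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        if self.queue:
            self.exporter.export(self.queue)
            self.queue = []


def traced(processor: BatchProcessor, name: str, attributes=None) -> None:
    """Steps 1-2: the 'API call' builds a span record, then ends it."""
    span = {"name": name, "start_ns": time.time_ns(), "attributes": dict(attributes or {})}
    span["end_ns"] = time.time_ns()
    processor.on_end(span)


collector = Collector(backends=["backend_a", "backend_b"])
processor = BatchProcessor(Exporter(collector))
traced(processor, "GET /cart", {"user.email": "a@example.com"})
traced(processor, "POST /checkout")  # second span fills the batch and triggers a flush
print({name: [s["name"] for s in spans] for name, spans in collector.backends.items()})
```

Both backends receive the same two spans, and the `user.email` attribute is gone before either sees it, because redaction happened in the Collector stand-in rather than in the application.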

Complete the analogy

Fill in the blank: _______ is the standard envelope and address format every OTel-aware piece understands — change the recipient, the envelope stays the same.

Recall before you leave

01 In two sentences, what is the difference between the OTel API and the OTel SDK?
02 Why does OTel split the API from the SDK, and what concrete problem does this solve?
03 What is the vendor-neutrality contract in one sentence?

Recap

OpenTelemetry is four pieces stacked: the API (what application code imports — a stable, vendor-free interface), the SDK (the runtime that turns API calls into telemetry records, batches them, and exports OTLP), the Collector (a standalone process that receives OTLP, runs configurable processors like tail sampling and redaction, and exports to any backend), and OTLP (the protobuf wire format all pieces speak). In 2026 every major vendor accepts OTLP — Datadog, Honeycomb, Grafana Cloud, Elastic, Splunk, New Relic. The portability contract is: emit OTLP at the edge, change backends in a Collector YAML, never rewrite instrumentation. Now when you see a vendor-migration task land in the backlog, your first question is not “how much code changes” but “does our edge already speak OTLP?” — everything else follows from that answer.


