Vendor neutrality, eBPF instrumentation, the Operator, and browser/serverless OTel

True vendor neutrality lives at the application edge, not the Collector. eBPF covers breadth; SDK instrumentation covers depth. The OTel Operator turns instrumentation into a platform-team concern. Browser and serverless have OTLP/HTTP constraints.

OBS Senior ◷ 14 min

A team uses a vendor-distro Collector with custom proprietary processors. They believe they are OTel-neutral because their application emits OTLP. A year later, migrating to a new vendor requires rewriting not just the Collector config but the custom processing logic that only exists in the old vendor’s distribution. Vendor-neutrality was silently eroded.

The vendor-neutrality trap

After this lesson, when someone on your team says “we are OTel-neutral,” you will know the two questions to ask before agreeing, because neutrality can erode silently in ways a cursory check will miss.

Many vendors ship “OTel-compatible” Collector distributions — Datadog Agent, Splunk OTel Collector, New Relic distro — that include proprietary processors and exporters. Using one is fine, but with two caveats that frequently create silent lock-in.

(1) Proprietary processors: if you rely on a processor that exists only in vendor-X’s distribution (e.g., a vendor-specific routing-by-cost feature, a vendor-specific RUM correlator), your Collector config is no longer portable. Moving to vendor-Y means rewriting that processor — it does not exist upstream.

(2) Edge OTLP: the contract that defines vendor-neutrality is “does the application edge emit OTLP?” If application code calls the vendor SDK directly (dd-trace, New Relic agent) and the data is in vendor format at the application boundary, you are not vendor-neutral even if a Collector is in the middle — because the migration cost lives in the application code, in every service.

The Honeycomb position: vendor distros are fine for collection-tier conveniences, but if application instrumentation is vendor-specific, you face an instrumentation-cost migration next time.

The portability audit: run quarterly. Grep application code for vendor-SDK imports. Inspect Collector YAML for processor names that exist only in a vendor distro. Review Semantic Convention adherence per service. Vendor-neutrality is not binary — it is a posture maintained by discipline.
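As an illustration of the Collector-side check, here is a minimal sketch of a pipeline in which one processor is not portable. The name `vendorx_cost_router` is hypothetical and stands in for any processor that ships only in a single vendor's distribution; `memory_limiter` and `batch` are upstream processors.

```yaml
# Sketch of what the audit looks for in Collector YAML.
# "vendorx_cost_router" is a made-up name, not a real component.
processors:
  memory_limiter:            # upstream: portable
    check_interval: 1s
    limit_mib: 512
  batch: {}                  # upstream: portable
  vendorx_cost_router: {}    # vendor-only (hypothetical): flag this in the audit

service:
  pipelines:
    traces:
      receivers: [otlp]
      # Any name here that an upstream Collector build cannot load
      # is logic you would have to rewrite when changing vendors.
      processors: [memory_limiter, vendorx_cost_router, batch]
      exporters: [otlphttp]
```

One way to run this check, assuming you can build or download an upstream Collector, is to validate the same config against it: components that only exist in the vendor distribution will fail to load.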

Emitting OTLP is necessary but not sufficient. Lock-in hides either in app code (a vendor SDK) or in the Collector config (a proprietary processor) — the quarterly portability audit checks both layers.

Application code emits OTLP once. The Collector fans out to interchangeable backends via exporters — switching vendor changes an exporter config, not the instrumentation in every service.
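A minimal sketch of that fan-out, assuming two backends that both accept OTLP over HTTP. The endpoints, the header name, and the `vendor_a` / `vendor_b` suffixes are placeholders, not real vendor settings.

```yaml
# Gateway Collector: receive OTLP once, export to two backends.
# Endpoints and the API-key header are hypothetical placeholders.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}

exporters:
  otlphttp/vendor_a:
    endpoint: https://otlp.vendor-a.example.com
    headers:
      x-api-key: ${env:VENDOR_A_API_KEY}
  otlphttp/vendor_b:
    endpoint: https://otlp.vendor-b.example.com

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      # Switching vendors means editing this list and the exporter
      # block above; application instrumentation is untouched.
      exporters: [otlphttp/vendor_a, otlphttp/vendor_b]
```

Running both exporters side by side for a while is also a low-risk way to evaluate a new backend before cutting over.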

OTel and the wide-event question

Why does it matter whether the industry is moving toward wide-events? Because the backend you adopt today should not box you out of the data model you will want in two years.

OTel’s stable data model is still three-pillar — traces, metrics, logs as separate APIs and OTLP message types. But the industry is converging on a more unified model. A wide-event store (Honeycomb, ClickHouse) can ingest OTel logs (with trace context attached) and treat them as one stream. Future OTel work — the Profiles signal, wide-event correlation discussions in the spec working group — moves toward a more unified data model.

For senior teams in 2026, the implication: adopting OTel now is forward-compatible with both Observability 1.0 (pillar-separated) backends and Observability 2.0 (wide-event) backends. Switching backends later is a Collector config change, not an instrumentation rewrite.

eBPF auto-instrumentation: the no-code path

A growing class of OTel-compatible instrumentation runs in the kernel via eBPF — Grafana Beyla, Pixie, Datadog USM. These tools attach kernel probes that observe syscalls, HTTP traffic on sockets, and gRPC RPCs without any application code or library changes.

What eBPF sees: HTTP + gRPC + DB socket calls, latency, status codes. Service identity from the cgroup (service.name inferred from the process name or cgroup label). OTel-shaped spans with Semantic Conventions attributes.

What eBPF cannot see: business attributes — customer.segment, feature_flag, batch_size — because these live inside the application, not in the syscall. Slow-tail attribution to specific application logic is harder.

Pattern in practice: eBPF for the breadth (free coverage of every service, including vendor binaries you cannot modify), SDK instrumentation for the depth (business spans and attributes). Both emit OTLP to the same Collector, both speak Semantic Conventions, both show up in the same dashboards.

Approach	Coverage	Business attributes?	Code changes?
eBPF (Beyla, Pixie)	HTTP + gRPC + DB sockets	No	None
OTel SDK auto-instrumentation	Framework calls (HTTP, DB, queues)	No (framework-level only)	Agent flag or require hook
Manual SDK instrumentation	Business operations	Yes	Yes — per business operation

The OTel Operator: instrumentation as platform responsibility

The OTel Operator (a Kubernetes Operator under the CNCF umbrella) manages the Collector deployment declaratively.

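OpenTelemetryCollector CRD: defines agent and gateway Collector specs in YAML. The Operator creates the DaemonSet, Deployment, and Services from each custom resource. Updating the resource triggers a rolling update of the managed workloads, which keeps disruption small, although each agent pod still restarts briefly as its node is updated.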
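A minimal sketch of an agent-tier custom resource, assuming an Operator version that serves the `v1beta1` API (older versions use `v1alpha1`, where `config` is a YAML string rather than a map). The resource name and the gateway address are hypothetical.

```yaml
# Agent Collector as a DaemonSet; forwards everything to a gateway.
# "agent" and the gateway Service name are placeholders.
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: agent
spec:
  mode: daemonset            # use "deployment" for the gateway tier
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch: {}
    exporters:
      otlp:
        endpoint: gateway-collector:4317   # hypothetical gateway Service
        tls:
          insecure: true                   # in-cluster plaintext; adjust for your setup
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp]
```

Applying an edited version of this resource is what triggers the Operator to roll the DaemonSet.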

Instrumentation CRD: configures auto-instrumentation injection via annotations on pods. Example:

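apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  exporter:
    # 4317 is the OTLP/gRPC port and 4318 the OTLP/HTTP port. Language agents
    # differ in their default protocol, so check each one and override if needed.
    endpoint: http://otel-collector:4317
  propagators: [tracecontext, baggage]
  sampler:
    # Sample 25% of root traces; child spans follow the parent's decision.
    type: parentbased_traceidratio
    argument: "0.25"
  # In production, pin a specific image tag instead of :latest.
  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:latest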

A pod annotated with instrumentation.opentelemetry.io/inject-java: "true" gets the Java Agent injected via an initContainer at startup — no Dockerfile changes, no code changes.
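A minimal sketch of where that annotation goes, assuming a Deployment-managed workload. The service name and image are hypothetical. The annotation belongs on the pod template, not on the Deployment's own metadata, because the Operator's webhook acts on pods as they are created.

```yaml
# Hypothetical "checkout" service opting in to Java auto-instrumentation.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
spec:
  replicas: 1
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:
        app: checkout
      annotations:
        # Must sit on the pod template so each new pod carries it.
        instrumentation.opentelemetry.io/inject-java: "true"
    spec:
      containers:
        - name: checkout
          image: registry.example.com/checkout:1.0.0   # placeholder image
```

Existing pods are not modified in place; injection happens when a pod is (re)created, so a rollout is needed after adding the annotation.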

Strategic value: instrumentation becomes a platform-team responsibility, not a per-service responsibility. A new service deployed with the right annotation gets full auto-instrumentation without touching its Dockerfile or code. Mature platforms make this the default; opting out requires an explicit annotation.

Browser and serverless: OTLP/HTTP and cold-start constraints

Browser: The OTel JavaScript SDK supports the browser environment with @opentelemetry/sdk-trace-web — instrumenting fetch, XMLHttpRequest, document load, user interactions. Browsers emit OTLP/HTTP (not gRPC; gRPC is browser-incompatible) to a Collector endpoint configured with CORS. The resulting browser spans join the server-side trace via the W3C traceparent header propagated on outgoing requests — true end-to-end traces from user click to database query.
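A minimal sketch of the Collector side of this, assuming a single web origin. The origin `https://app.example.com` is a placeholder; list only the origins your front end is actually served from.

```yaml
# OTLP/HTTP receiver that accepts cross-origin requests from a browser app.
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins:
            - https://app.example.com   # hypothetical front-end origin
          allowed_headers:
            - "*"
          max_age: 7200                 # seconds a preflight response may be cached
```

Because this endpoint is reachable from the public internet, it is usually placed behind a gateway that applies rate limiting, rather than exposing an internal Collector directly.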

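Lambda: Cold-start cost of OTel SDK initialization is significant (50-200 ms for full Lambda instrumentation). The standard approach uses the AWS OTel Lambda Layer, which bundles the SDK and auto-instrumentation and runs a slimmed-down Collector as a Lambda extension. Initialization is paid once per cold start, so the cost is amortised across the warm invocations that reuse the same execution environment.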

Cloudflare Workers / Vercel Edge: Still maturing OTel support, with limitations on async-context propagation in the V8 isolate model.

Why this works

Why does gRPC not work in browsers? gRPC runs over HTTP/2 and depends on features that browser APIs do not expose to JavaScript, most notably HTTP trailers (where gRPC delivers its status) and direct control over HTTP/2 streams. The Fetch API and XMLHttpRequest let the browser negotiate HTTP/1.1 or HTTP/2 on its own, but page code cannot read trailers or shape frames. gRPC-Web is one workaround, but it needs a translating proxy in front of the server. OTel instead uses OTLP/HTTP as the browser transport: ordinary HTTP POST requests that work over HTTP/1.1 or HTTP/2 and reach any CORS-enabled Collector endpoint without requiring a proxy.

Quiz

A team uses a vendor-distro Collector with a custom vendor-specific routing processor. They consider themselves OTel-neutral because application code emits OTLP. Senior review — agree or disagree?

Quiz

What does eBPF instrumentation (Grafana Beyla, Pixie) observe that the OTel Java Agent cannot, and what does the Java Agent observe that eBPF cannot?

Recall before you leave

01
Articulate precisely what makes OTel 'vendor-neutral' and the two most common ways this neutrality silently erodes in production.
02
What is the strategic value of the OTel Operator, and what does it unlock for a platform team?
03
Why does browser OTel use OTLP/HTTP instead of OTLP/gRPC, and what does this mean for end-to-end trace stitching?

Recap

True vendor-neutrality requires OTLP at the application edge — not just “we use a Collector.” Two silent erosion paths: (1) vendor SDK in application code (dd-trace, New Relic agent) means the migration cost lives in every service’s code, not just the Collector config; (2) proprietary Collector processors lock the policy plane to one vendor’s distribution. Run a quarterly portability audit to detect both. eBPF instrumentation (Grafana Beyla, Pixie) attaches kernel probes to observe HTTP, gRPC, and DB sockets for any process without code changes — business attributes invisible to eBPF are added by SDK manual instrumentation. The OTel Operator manages both the Collector deployment and per-service instrumentation injection via CRDs — a pod annotation triggers auto-instrumentation with no Dockerfile or code changes. Browser OTel uses OTLP/HTTP (gRPC is browser-incompatible) and propagates W3C traceparent for true end-to-end traces. Lambda uses a Layer to amortise the 50-200 ms OTel cold-start cost. Now when a colleague claims “we are OTel-neutral,” you know to ask two things: does the application edge emit OTLP, and does the Collector config contain any processors that only exist in one vendor’s distribution?
