Observability
OTel: build a vendor-neutral pipeline
Reading about the four pieces is not the same as wiring them together under load. Stand up a small multi-service app, instrument it with OTel, route it through an agent-to-gateway Collector pipeline with tail sampling, and prove — with evidence — that you can swap backends without touching application code.
Turn the unit’s mental model into a working pipeline: emit OTLP at the edge, enrich and curate in the Collector, sample head + tail, defend Semantic Conventions, and demonstrate vendor-neutrality and Collector self-monitoring with measurements, not claims.
Instrument a 2-3 service app with OTel, route telemetry through an agent-to-gateway Collector pipeline that does tail sampling and PII redaction, then prove vendor-neutrality (swap the backend by config only) and Collector self-monitoring (a Collector failure is itself observable).
- A continuous end-to-end trace from the entry service through the downstream service and database appears in the backend, with the manual business span and its typed attributes visible and an error trace showing recordException + ERROR status.
- A diff (or two screenshots) showing the backend swap was a Collector-YAML-only change — application code and Resource config are byte-identical before and after.
- Evidence that tail sampling kept 100% of error and slow traces and only ~1% of baseline traffic — a count or query against the backend, not an assertion.
- Collector self-metrics captured under load, plus a demonstrated failure drill: kill a gateway replica and show traffic survives via the other replicas, and that a total-gateway outage fires an alert from the meta-monitoring rather than going unnoticed.
- Add a Semantic Convention governance check: a tiny linter or Collector transform rule that flags or rewrites a non-standard attribute name (http_route -> http.route) and prove it catches a deliberately-misnamed service.
- Add a persistent queue (file_storage) on the gateway exporter and demonstrate it absorbs a simulated 5-minute backend outage with zero span loss after recovery.
- Reproduce and fix a cardinality leak: enable url.full on the HTTP client instrumentation, show the metric series count exploding, then switch to http.route or strip query strings via an attributes processor and show series count bounded.
- Add browser or serverless coverage: instrument a frontend with the OTel web SDK emitting OTLP/HTTP to the gateway (CORS-enabled) and show the browser span stitched into the server trace via traceparent.
This is the loop you will run building real observability: emit OTLP at the edge with correct Semantic Conventions and well-formed manual spans, enrich and curate in an agent-to-gateway Collector with memory_limiter-first ordering and tail sampling on full traces, prove portability by swapping the backend in YAML alone, and make the Collector itself observable so its failure cannot hide. Doing it once on a toy fleet makes the production version — where the Collector is critical-path infrastructure — muscle memory.