Observability
Three pillars: multiple-choice review
Six questions that cut across the whole unit. Each one mirrors a decision you make mid-incident or at instrumentation time — not a definition to recite, but a cost tradeoff to weigh.
Confirm you can pick the right signal for a question, reason about its cost axis, and connect the three pillars through join keys — the synthesis the individual lessons built toward.
A product manager wants error rates broken down by individual customer_id across 200k active customers, queryable for the last 30 days. In an observability 1.0 (three-pillar) stack, which signal carries this data, and why not the others?
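A rough way to weigh the metrics option here is to multiply label cardinalities, since each distinct label combination becomes its own time series. The sketch below is illustrative only: the route, method, and status_class counts are assumed values, and only the 200k customer figure comes from the question.

```python
def series_count(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on time series for one metric: the product of label cardinalities."""
    total = 1
    for distinct_values in label_cardinalities.values():
        total *= distinct_values
    return total

# Hypothetical label set for a single error counter (assumed values).
base = {"route": 20, "method": 4, "status_class": 5}
# Same counter with one extra label: 200k customers, as in the question.
with_customer = {**base, "customer_id": 200_000}

print(series_count(base))           # 400
print(series_count(with_customer))  # 80000000
```

One added label takes the counter from hundreds of series to tens of millions, which is the cardinality axis the question asks you to weigh against log ingestion and trace sampling.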
Two services share a heavy endpoint. Service A holds a stable label set (route × method × status_class, about 400 series in total) but logs every request. Service B has tiny log volume but added user_id to one counter. Which service threatens its observability backend first, and on which cost axis?
A team runs tail-based sampling at the collector keeping 100% of errors and 2% of successes. Their trace bill barely drops versus 100% retention, even though stored volume fell sharply. Why?
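To reason about this scenario, it can help to separate what the collector must handle from what the backend ends up storing. The following is a minimal sketch, assuming a 1% error rate (not given in the question) and assuming cost tracks the spans the collector receives and buffers rather than the spans kept.

```python
def tail_sampling_fractions(error_rate, keep_errors=1.0, keep_successes=0.02):
    """Return (fraction of traces reaching the collector, fraction stored)."""
    # Tail sampling decides after a trace completes, so every span is
    # received and buffered by the collector whatever the final decision.
    at_collector = 1.0
    stored = error_rate * keep_errors + (1.0 - error_rate) * keep_successes
    return at_collector, stored

# The 1% error rate is an assumed figure for illustration.
at_collector, stored = tail_sampling_fractions(error_rate=0.01)
print(f"collector handles {at_collector:.0%}, backend stores {stored:.2%}")
```

Under these assumptions stored volume falls to roughly 3% while the collector still processes everything, so any cost tied to transport, buffering, or per-span ingestion stays close to its 100% retention level.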
Metrics use route='/checkout', logs use http_path='/checkout', and traces use http.route='/checkout'. The data is all correct and flowing. What actually breaks, and what is the fix?
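The mismatch can be made concrete by checking which attribute keys the three signals share before and after renaming. This is a sketch only: the ALIASES table and normalize helper are hypothetical, and the target name http.route follows the OpenTelemetry Semantic Conventions mentioned in this unit.

```python
# Hypothetical alias table mapping ad hoc keys to one shared name.
ALIASES = {"route": "http.route", "http_path": "http.route"}

def normalize(attrs: dict[str, str]) -> dict[str, str]:
    """Rename known aliases to the shared key; leave other keys untouched."""
    return {ALIASES.get(key, key): value for key, value in attrs.items()}

metric = {"route": "/checkout"}
log = {"http_path": "/checkout"}
span = {"http.route": "/checkout"}

# Before renaming, no key is common to all three signals.
shared_before = set(metric) & set(log) & set(span)
# After renaming, all three expose the same join key.
shared_after = set(normalize(metric)) & set(normalize(log)) & set(normalize(span))

print(shared_before)  # set()
print(shared_after)   # {'http.route'}
```

The values were identical all along; only the key names kept the signals from joining. In practice the renaming is better done once at instrumentation or in the pipeline than repeated in every query.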
A 40-service team spends ~2 engineer-weeks per quarter policing cardinality budgets and ~3 hours per incident on cross-pillar pivots. They also keep 5 years of low-cardinality SLO metrics for compliance. What is the senior recommendation?
During an incident the on-call finds the trace they need was never sampled — the SDK uses ParentBased(TraceIdRatioBased) but the trace_id is derived from a hash of the request path. What is the root cause, and what is the durable fix?
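The behaviour in this scenario follows from a ratio sampler being a pure function of the trace ID. Below is a simplified standalone model, not the SDK's actual implementation: ratio_sampled, path_trace_id, the 5% ratio, and the /checkout path are all assumptions made for illustration.

```python
import hashlib
import secrets

def ratio_sampled(trace_id: int, ratio: float) -> bool:
    """Simplified ratio sampler: keep the trace if its low 64 bits fall below a threshold."""
    bound = int(ratio * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound

def path_trace_id(path: str) -> int:
    # The bug from the scenario: the ID depends only on the request path.
    return int.from_bytes(hashlib.sha256(path.encode()).digest()[:16], "big")

def random_trace_id() -> int:
    # The expected behaviour: 128 random bits per trace.
    return secrets.randbits(128)

ratio = 0.05  # assumed sampling ratio

# Every request to the same path gets the same ID, hence the same decision.
path_decisions = {ratio_sampled(path_trace_id("/checkout"), ratio) for _ in range(1000)}
# With random IDs, roughly `ratio` of the traces are kept.
random_kept = sum(ratio_sampled(random_trace_id(), ratio) for _ in range(10_000))

print(len(path_decisions))   # 1: the path is always kept or always dropped
print(random_kept / 10_000)  # close to the configured ratio
```

With path-derived IDs the sampler no longer samples requests; it samples paths, so an unlucky route is never traced at all, no matter how many requests hit it.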
The through-line of the unit is one decision: the question picks the signal at its cheapest cost axis. Metrics are cheap to query but break down on cardinality; logs preserve everything but pay in ingestion bytes; traces preserve causality but must sample (and tail sampling shifts cost from storage to the collector). Join keys (service.name, trace_id, and http.route, named according to the OpenTelemetry Semantic Conventions) are what make the three compose into one navigable surface. The 2.0 wide-event model collapses all three into a single backend, and it pays off when the engineering cost of cardinality discipline and cross-pillar pivots exceeds the bill for that unified backend: an economic call, not an ideological one.