Observability
Golden signals, dashboard layout, and service mesh auto-RED
A team has five services. Each RED dashboard looks different — different metric names, different panel layouts, different label structures. When an incident crosses services, the on-call spends 30 minutes reading documentation before starting triage. A single consistent dashboard pattern across all services would reduce that to seconds.
The four golden signals
Google’s SRE book (published in 2016, after both USE and RED had been named, although it documents older practice) names four signals: Latency, Traffic, Errors, Saturation. The overlap with RED is direct: Latency = Duration, Traffic = Rate, Errors = Errors. The novel piece is Saturation: “how full” the service is relative to a known capacity limit.
Unlike USE’s Saturation (which is about physical resources), the golden-signal Saturation is at the service level: in-flight request count, queue length, active session count, connection pool occupancy. A service can be at 50% CPU but have its connection pool at 100% — the service is full, users are queueing, and USE on the host shows nothing.
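The connection-pool case above can be sketched as a ratio of occupancy to a logical limit. This is a minimal illustration, not a prescribed implementation: the `CapacityGauge` type, the gauge names, and the numbers are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityGauge:
    """One capacity limit of a service (names and values are illustrative)."""
    name: str
    in_use: int
    limit: int

    def saturation(self) -> float:
        # Fraction of the limit currently occupied.
        if self.limit <= 0:
            raise ValueError(f"{self.name}: limit must be positive")
        return self.in_use / self.limit


def service_saturation(gauges: list[CapacityGauge]) -> tuple[str, float]:
    """Return the fullest limit; it bounds the service regardless of the others."""
    worst = max(gauges, key=lambda g: g.saturation())
    return worst.name, worst.saturation()


gauges = [
    CapacityGauge("cpu_percent", in_use=50, limit=100),
    CapacityGauge("db_connection_pool", in_use=20, limit=20),
    CapacityGauge("in_flight_requests", in_use=120, limit=400),
]
print(service_saturation(gauges))  # ('db_connection_pool', 1.0)
```

Taking the maximum rather than an average reflects the point of the paragraph: one exhausted logical limit makes the service full even when the host looks idle.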
Many production teams treat RED + Saturation as the practical evolution of the SRE-book signals:
| Signal | RED | Golden signals |
|---|---|---|
| Request rate | Rate | Traffic |
| Failed requests | Errors | Errors |
| Response latency | Duration | Latency |
| Capacity headroom | (implicit in Rate plateau) | Saturation |
The senior dashboard layout
A well-structured service dashboard reads top-to-bottom during an incident:
Row 1 (RED): Rate panel | Errors panel | Duration p50/p95/p99 panel
Row 2 (USE): CPU util+PSI | Memory util+PSI | Disk %util+queue
Row 3 (Downstreams): DB response time + errors | Cache hit rate | Queue depth
Reading rhythm:
- Which RED metric moved?
- Which USE row matches the timeline?
- Which downstream dependency followed the same timeline?
- Click the exemplar to confirm in a trace.
The key discipline: RED and USE share the same time axis in the same dashboard. Cross-service correlation requires consistent label keys (service, route, region) so panels can be filtered by the same label set.
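One way to keep label keys consistent is to check them mechanically. The sketch below assumes a hypothetical inventory that maps metric names to the label keys they carry; the metric names are invented, and only the required keys (service, route, region) come from the text.

```python
# Label keys every service-level panel is expected to filter on.
REQUIRED_LABEL_KEYS = {"service", "route", "region"}


def missing_label_keys(series_labels: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each metric name to the required label keys it lacks."""
    return {
        metric: REQUIRED_LABEL_KEYS - keys
        for metric, keys in series_labels.items()
        if not REQUIRED_LABEL_KEYS <= keys
    }


# Hypothetical inventory, e.g. collected from each service's metrics endpoint.
inventory = {
    "checkout_requests_total": {"service", "route", "region", "status"},
    "checkout_db_query_seconds": {"service", "region"},
}
print(missing_label_keys(inventory))  # {'checkout_db_query_seconds': {'route'}}
```

A check like this could run in CI so that a new service cannot ship panels that break cross-service filtering.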
Alerting severity split
The mature pattern is to alert on user-facing symptoms (RED) at page-grade and on internal cause signals (USE) at warning-grade:
| Alert | Metric | Severity |
|---|---|---|
| Latency SLO breach | p99 Duration > 200 ms for 5 min | Page |
| Error rate spike | Errors > 1% for 2 min | Page |
| CPU saturation | PSI cpu some > 20% for 10 min | Warning / Slack |
| Memory pressure | PSI memory full > 0 for 5 min | Warning / Slack |
| Disk headroom | Disk usage > 90% for 5 min | Warning / Slack |
The split is deliberate. RED alerts wake humans because users are affected. USE alerts notify a slower channel so capacity teams plan ahead — not so on-call wakes up every time CPU touches 70%. This one architectural choice is among the strongest levers against alert fatigue.
| Layer | Channel | Reason |
|---|---|---|
| RED alert (user impact) | PagerDuty / on-call page | User is already unhappy |
| USE alert (resource headroom) | Slack / ticket | Plan capacity before it becomes a RED incident |
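The severity split can be sketched as a rule type that carries its layer, with the channel derived from the layer alone. This is a simplified model, not the syntax of any real alerting system: `AlertRule`, `channel_for`, and `is_firing` are hypothetical names, and the sample data is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    name: str
    layer: str          # "RED" = user-facing symptom, "USE" = internal cause
    threshold: float
    for_seconds: int    # the breach must hold for this long


def channel_for(rule: AlertRule) -> str:
    """RED pages a human; USE goes to a slower channel."""
    return "page" if rule.layer == "RED" else "slack"


def is_firing(rule: AlertRule, samples: list[tuple[int, float]], now: int) -> bool:
    """True when every sample in the trailing window breaches the threshold.

    Simplification: an empty window never fires, and gaps are not detected.
    """
    window = [v for ts, v in samples if now - rule.for_seconds <= ts <= now]
    return bool(window) and all(v > rule.threshold for v in window)


latency = AlertRule("latency_slo_breach", "RED", threshold=0.200, for_seconds=300)
cpu = AlertRule("cpu_saturation", "USE", threshold=20.0, for_seconds=600)

now = 1_000
p99_seconds = [(now - 60 * i, 0.25) for i in range(6)]   # 5 minutes above 200 ms
cpu_psi = [(now - 60 * i, 25.0 if i else 15.0) for i in range(11)]  # latest sample dipped

for rule, samples in [(latency, p99_seconds), (cpu, cpu_psi)]:
    if is_firing(rule, samples, now):
        print(f"{rule.name} -> {channel_for(rule)}")  # latency_slo_breach -> page
```

Deriving the channel from the layer, rather than setting it per rule, makes it harder for a resource alert to drift into the paging channel.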
Service mesh auto-RED
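Service meshes (Istio, Linkerd) and their sidecar proxies (such as Envoy) emit RED metrics at the sidecar without application code changes. Envoy’s HTTP connection manager stats downstream_rq_total, downstream_rq_<N>xx (for example downstream_rq_5xx), and downstream_rq_time give Rate, Errors-by-status-class, and a Duration histogram for inbound traffic; the matching upstream_rq_* stats give the same per upstream cluster. Both are labelled consistently across the fleet.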
Advantage: a polyglot estate (Node, Go, Python, JVM, Rust) gets one RED dashboard pattern even though languages disagree on Prometheus client conventions.
Catch: the sidecar sees only what the network sees. A request that the application answers with a 2xx status but a wrong result (for example, because of buggy business logic) still counts as a success in mesh metrics.
Production pattern: run both layers — sidecar RED for breadth and consistency, application-emitted RED for business-logic fidelity (e.g. a payment-success counter labelled by gateway that the sidecar cannot see). Keep label keys aligned across both layers so dashboards join them.
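The gap between the two layers can be shown with plain counters. The sketch uses in-process `Counter` objects as stand-ins for a metrics client; `record_payment`, the `checkout` service name, and the gateway names are hypothetical.

```python
from collections import Counter

# Stand-ins for real metrics; a production service would use its metrics client.
mesh_view = Counter()   # what a sidecar can count: HTTP status class only
app_view = Counter()    # what only the application knows: business outcome


def record_payment(gateway: str, http_status: int, payment_ok: bool) -> None:
    """Record one request in both views, sharing the 'checkout' service label."""
    status_class = f"{http_status // 100}xx"
    mesh_view[("checkout", status_class)] += 1
    outcome = "success" if payment_ok else "failure"
    app_view[("checkout", gateway, outcome)] += 1


record_payment("gateway_a", 200, True)
record_payment("gateway_b", 200, False)   # 200 OK, but the payment was declined
record_payment("gateway_b", 200, False)

print(mesh_view[("checkout", "2xx")])                   # 3
print(app_view[("checkout", "gateway_b", "failure")])   # 2
```

The mesh view reports three healthy requests while the application view shows two failed payments on one gateway, which is the fidelity the sidecar cannot provide.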
RED transfers across protocols
RED’s request-centric shape transfers to every protocol; only the definition of “request” and “error” changes:
| Protocol | Rate | Errors | Duration |
|---|---|---|---|
| HTTP | req/s | HTTP 5xx | p99 response time |
| gRPC | RPCs/s | status != OK | end-to-end for unary, first-byte for streaming |
| Queues (Kafka, SQS) | messages/s consumed | dead-letter rate | publish-to-consume wait time |
| Batch jobs | jobs/min | failed jobs | wall-clock per job |
| Serverless | invocations/s | error rate + throttle rate | includes cold-start tail |
For async and queue-based services, queue depth (backlog) is the Saturation signal that pure RED misses. A queue with Duration p99 of 10 s per job may have a 10-minute backlog — the user waits 10 min + 10 s. RED captures the job-processing time; Saturation captures the queue-wait before processing begins.
Why this works
Batch and async systems need a separate queue-depth metric that RED alone cannot provide. The pattern is to emit queue_depth_seconds — the age of the oldest unprocessed item in wall-clock seconds. If this grows, users are waiting longer than the job Duration suggests. This is the Saturation signal at the service level, complementing USE’s resource-level Saturation.
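A minimal sketch of the oldest-item-age signal follows, assuming an in-memory queue that stamps each item on enqueue. `AgedQueue` is a hypothetical helper, and the `now` parameters exist only so the example is deterministic; a real broker would expose this age through its own tooling.

```python
import time
from collections import deque


class AgedQueue:
    """FIFO queue that remembers when each item was enqueued."""

    def __init__(self) -> None:
        self._items: deque = deque()

    def put(self, item, now=None) -> None:
        enqueued_at = time.time() if now is None else now
        self._items.append((enqueued_at, item))

    def get(self):
        # Remove and return the oldest item.
        return self._items.popleft()[1]

    def oldest_age_seconds(self, now=None) -> float:
        """Age of the oldest unprocessed item; 0.0 when the queue is empty."""
        if not self._items:
            return 0.0
        current = time.time() if now is None else now
        return max(0.0, current - self._items[0][0])


q = AgedQueue()
q.put("job-1", now=0.0)
q.put("job-2", now=300.0)

backlog_wait = q.oldest_age_seconds(now=600.0)   # 600.0 s, i.e. 10 minutes
processing_p99 = 10.0                            # job Duration from RED
print(backlog_wait + processing_p99)             # 610.0
```

The sum approximates what the user experiences: the 10-minute wait that Saturation captures plus the 10-second processing time that RED captures.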
What does 'Saturation' mean in the four golden signals, and how does it differ from USE's Saturation?
An on-call engineer sees RED Duration p99 climb but Rate and Errors are flat. Which USE row should they check first?
Order the RED + USE incident response steps in production:
1. Page fires: the symptom description includes service name and severity
2. Open the service's RED dashboard, identify which of R / E / D moved
3. Read the time series: single spike or sustained drift?
4. Cross-reference USE on the same hosts and on direct downstream dependencies
5. If RED-D moved with USE-CPU saturation: capacity issue → scale
6. If RED-E moved with USE-Errors on a dependency: dependency failure → failover
7. If both RED and USE look fine but pager fired: revisit the alert source, suspect false positive
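The three branching steps at the end of the sequence can be sketched as a small decision function. It is an illustration of the decision order only: the `triage` name, its inputs, and the fallback message are assumptions, and real incidents rarely reduce to three booleans.

```python
def triage(red_moved: set[str], cpu_saturated: bool, dependency_errors: bool) -> str:
    """Map RED and USE observations to a first action.

    red_moved holds any of "rate", "errors", "duration".
    """
    if "duration" in red_moved and cpu_saturated:
        return "capacity issue: scale"
    if "errors" in red_moved and dependency_errors:
        return "dependency failure: fail over"
    if not red_moved and not cpu_saturated and not dependency_errors:
        return "suspect false positive: revisit the alert source"
    # Hypothetical fallback for combinations the steps do not cover.
    return "no rule matched: continue manual investigation"


print(triage({"duration"}, cpu_saturated=True, dependency_errors=False))
print(triage({"errors"}, cpu_saturated=False, dependency_errors=True))
print(triage(set(), cpu_saturated=False, dependency_errors=False))
```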
- What does Google's SRE book add to RED that RED itself does not cover?
- What is the practical advantage and limitation of service-mesh auto-RED?
- For a Kafka consumer, what are Rate, Errors, and Duration, and what extra signal is needed?
Google’s four golden signals — Latency, Traffic, Errors, Saturation — extend RED by adding a service-level Saturation dimension: how full the service is relative to its logical capacity (connection pool, in-flight count, queue depth), not just physical resources. A mature RED+USE dashboard puts RED panels in the top row, USE panels in the middle, and downstream dependency RED+USE in the bottom row, all on the same time axis with consistent label keys. Alerting discipline is the other half: RED alerts page on-call (user is affected), USE alerts go to Slack or a ticket (capacity team plans ahead). Service meshes emit RED for free at the sidecar level — consistent across languages — but only see what the network sees; application-emitted RED is needed for business-logic fidelity. RED’s shape transfers to gRPC, queues, batch jobs, and serverless with minimal adjustment — only the definition of “request” and “error” changes.