Performance
Culture, economics, and org-scale performance
A VP of Engineering inherits an org with p99 at 800 ms and LCP at 3.5 s. Every quarter, there are two or three SLO-burning incidents, each consuming 20 to 40 engineer-days. She has 6 months and $500k. The question is not how to fix the current incidents — it is how to make performance a property of the org that outlasts her tenure.
Error budgets: the operational tradeoff
Google’s SRE book formalised the tradeoff between reliability (latency included) and feature velocity as a continuous, measurable quantity: the error budget.
An SLO defines the target: “99.9% of /checkout requests under 200 ms over 30 days.” The error budget is the allowed shortfall: 0.1% of requests, which for a time-based SLO corresponds to roughly 43 minutes per 30-day month (0.001 × 43,200 minutes). When the budget is healthy, the team can ship features faster (more risk is tolerated). When the budget is exhausted, the team must focus on performance and reliability until it recovers.
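A minimal sketch of the budget arithmetic, assuming a 30-day window. The function names `time_budget` and `request_budget` are illustrative, not taken from any library; the request count is a hypothetical example.

```python
from datetime import timedelta


def time_budget(slo: float, window_days: int = 30) -> timedelta:
    """Allowed 'bad' time for a time-based SLO over the window."""
    return timedelta(days=window_days) * (1 - slo)


def request_budget(slo: float, total_requests: int) -> int:
    """Allowed slow or failed requests for a request-based SLO (rounded)."""
    return round(total_requests * (1 - slo))


# 99.9% over 30 days: 0.001 * 43,200 minutes = 43.2 minutes.
print(time_budget(0.999))                 # 0:43:12
# Hypothetical: 10 million /checkout requests in the window.
print(request_budget(0.999, 10_000_000))  # 10000 may exceed 200 ms
```

The two forms answer different questions: the time budget suits availability-style SLOs, while the request budget is what a latency SLO like the /checkout example actually consumes.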
This converts the “should we optimise or ship features?” argument into a quantitative tradeoff. Every release ships with a predicted error budget impact. Releases that would burn more than a set percentage of the remaining budget require explicit risk acceptance.
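The release rule can be sketched as a single predicate. The 25% threshold and the function name are hypothetical policy choices, not a standard; the only requirement is that predicted burn and remaining budget use the same unit.

```python
def requires_risk_acceptance(
    predicted_burn: float,
    remaining_budget: float,
    max_share: float = 0.25,  # hypothetical policy: 25% of what is left
) -> bool:
    """Return True if a release needs explicit sign-off before shipping.

    predicted_burn and remaining_budget must share a unit
    (minutes of degradation, or count of bad requests).
    """
    if remaining_budget <= 0:
        # Budget exhausted: reliability work comes first, so every
        # release needs an explicit exception to ship.
        return True
    return predicted_burn / remaining_budget > max_share


print(requires_risk_acceptance(predicted_burn=5, remaining_budget=40))   # False (12.5%)
print(requires_risk_acceptance(predicted_burn=15, remaining_budget=40))  # True (37.5%)
```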
The budget ladder: service-level SLOs sit at the top. Route-level budgets (per-page bundle size, per-endpoint query count, per-service allocation rate) sit in the middle. Feature-level budgets sit at the bottom. Every PR is accountable to the budget closest to it. When per-route and per-feature budgets are met, the headline SLO is usually met too. When sub-budgets drift, the headline SLO is what ultimately suffers, so the sub-budgets act as its early warning.
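A sketch of how a CI step might evaluate the ladder, reporting breaches from the bottom up so the budget closest to the PR is named first. The `Budget` type, the level names, and every limit and measurement below are hypothetical.

```python
from dataclasses import dataclass

# Lower number = closer to the PR; breaches are reported bottom-up.
LADDER = {"feature": 0, "route": 1, "service": 2}


@dataclass(frozen=True)
class Budget:
    level: str  # "feature", "route", or "service"
    name: str
    limit: float
    measured: float

    @property
    def breached(self) -> bool:
        return self.measured > self.limit


def breaches_bottom_up(budgets: list[Budget]) -> list[Budget]:
    """Breached budgets, closest-to-the-PR first."""
    return sorted((b for b in budgets if b.breached), key=lambda b: LADDER[b.level])


budgets = [
    Budget("service", "/checkout p99.9 latency (ms)", 200, 185),
    Budget("route", "/checkout JS bundle (KB)", 170, 182),
    Budget("route", "GET /cart query count", 5, 5),
    Budget("feature", "promo-banner allocations per request", 50, 64),
]
for b in breaches_bottom_up(budgets):
    print(f"{b.level:8} {b.name}: {b.measured} > {b.limit}")
```

In this example the service SLO still holds while two sub-budgets have already drifted, which is exactly the early-warning window the ladder is meant to expose.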
The economics of performance
Performance is a cost lever, not just a user-experience lever.
Infrastructure cost: a service with 2x better throughput needs roughly half the infrastructure for the same load, assuming the workload is compute-bound and scales horizontally. AWS bills scale with vCPU, memory, and bandwidth. Concrete examples:
- Discord’s structured logger rewrite cut per-request allocations by 90%, dropping Go GC overhead from 20% to under 2%. Infrastructure cost for the chat service fell 40%.
- Shopify’s storefront LCP optimisation (bundle audit + lazy-load) restored LCP from 4.5 s to 1.9 s on mobile. Bounce rate dropped 12%, directly attributable to page speed.
- Stripe’s server-side profile-first programme returns an estimated $5 to $10 in saved infrastructure cost for every engineer-hour invested.
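The "2x throughput, half the infrastructure" rule can be checked with a small sizing function. This is a sketch under stated assumptions: linear horizontal scaling and a 70% target utilisation; the peak load and per-instance throughput are hypothetical numbers.

```python
import math


def instances_needed(peak_rps: int, rps_per_instance: int, target_util_pct: int = 70) -> int:
    """Instances needed to serve peak load at a target utilisation.

    Assumes a compute-bound workload that scales horizontally,
    so capacity grows linearly with instance count.
    """
    usable_rps = rps_per_instance * target_util_pct / 100
    return math.ceil(peak_rps / usable_rps)


before = instances_needed(peak_rps=14_000, rps_per_instance=500)
after = instances_needed(peak_rps=14_000, rps_per_instance=1_000)  # after a 2x throughput win
print(before, after)
```

Because instance counts round up, the saving is close to, but not always exactly, 50%: a fleet sized at 3 instances drops to 2, not 1.5.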
Engineering velocity: teams with mature performance discipline spend 5 to 10% of engineering time on performance as steady-state maintenance. Teams without it spend 20 to 40%, mostly reactive, in crisis mode. The difference is roughly 15 to 30 percentage points of engineering capacity, permanently freed for product work.
Recruiting and retention: fast software is a competitive differentiator. Engineers who join teams known for performance discipline stay longer and produce more. The measurement is indirect but the correlation is strong across multiple company studies.
| Investment | Return | Payback window |
|---|---|---|
| Observability stack (~$500/mo OSS) | MTTR cut 50–80%, incidents caught earlier | First prevented incident |
| 4 CI gates (~1 week of eng time) | 90% of known regressions prevented at PR time | First quarter |
| 2x throughput improvement (1–2 months eng) | 50% infra cost reduction for that workload | 3–6 months of saved cloud bill |
| Performance culture (ongoing) | 5–10% eng time on perf vs 20–40% crisis mode | 12–24 months |
Toil reduction: converting firefighting to infrastructure
SRE’s toil framework asks: what manual work is repeated, automatable, and grows with scale? Performance firefighting is classic toil: page, manual triage, fix, and repeat three months later. The discipline loop of observability, CI gates, and runbooks converts that toil into infrastructure.
A healthy team holds toil under 50% of engineering time per SRE’s guidance. Many teams sit at 70 to 80% pre-discipline and 20 to 30% post. The investment in observability, gates, and runbooks pays back not just in fewer incidents but in reclaimed engineer-time that is permanently redirected to product work.
Measure it: track the number of performance incidents per quarter and the average engineer-hours per incident. In the first quarter of a well-run programme, these typically drop 50 to 70% from the baseline. After 12 months, they stabilise near zero for known failure classes.
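A minimal tracking sketch. The `QuarterStats` type is hypothetical, and so are the numbers: the baseline assumes 3 incidents at 30 engineer-days (8 hours each) apiece, in line with the opening scenario, and the Q1 figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class QuarterStats:
    quarter: str
    incidents: int         # performance incidents in the quarter
    engineer_hours: float  # total hours spent across those incidents

    @property
    def hours_per_incident(self) -> float:
        return self.engineer_hours / self.incidents if self.incidents else 0.0


def pct_drop(baseline: float, current: float) -> float:
    """Percentage reduction from baseline (positive = improvement)."""
    return 100.0 * (baseline - current) / baseline if baseline else 0.0


baseline = QuarterStats("baseline", incidents=3, engineer_hours=720)
q1 = QuarterStats("Q1", incidents=1, engineer_hours=160)

print(f"incidents: -{pct_drop(baseline.incidents, q1.incidents):.0f}%")
print(f"firefighting hours: -{pct_drop(baseline.engineer_hours, q1.engineer_hours):.0f}%")
```

Tracking total hours alongside incident count matters: a quarter with fewer but longer incidents can look like progress on one metric and not the other.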
Distributed ownership: avoiding the bottleneck
The anti-pattern: centralise all performance work into one “performance team.” This team becomes a bottleneck — every product team waits for it, every regression is someone else’s problem until the crisis arrives.
The pattern that scales: each role owns its domain.
- Frontend engineers: pieces 02 + 07 (hot paths + bundle budgets). Own per-route Core Web Vitals (CWV).
- Backend engineers: pieces 02 + 04 + 05 + 06 (hot paths + GC + N+1 + batching). Own per-endpoint latency and query budgets.
- SRE / DevOps: piece 01 (profile-first infrastructure, continuous profiling). Build and maintain CI gates.
- Platform engineers: piece 03 (cache vs big-O — fundamental patterns). Maintain shared observability stack.
The platform team builds the infrastructure that lets every team own their performance. Without distribution, performance degrades silently as product teams ship features without accountability. With distribution, every PR is checked against budgets, every team retros on regressions, and the platform team accelerates rather than blocks.
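One way to make distribution concrete is to route every CI budget failure to the role that owns it. This is a sketch: the budget kinds and the `OWNERS` mapping are hypothetical, following the ownership split above.

```python
from collections import defaultdict

# Hypothetical mapping from budget kind to owning role.
OWNERS = {
    "route_cwv": "frontend",
    "bundle_size": "frontend",
    "endpoint_latency": "backend",
    "query_count": "backend",
    "ci_gate": "sre",
    "observability": "platform",
}


def route_failures(failures: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group CI budget failures by owning role.

    An unknown budget kind raises KeyError instead of falling back to a
    central team: every budget must have an explicit owner.
    """
    by_owner: dict[str, list[str]] = defaultdict(list)
    for kind, detail in failures:
        by_owner[OWNERS[kind]].append(detail)
    return dict(by_owner)


print(route_failures([
    ("bundle_size", "/checkout JS bundle 182 KB > 170 KB"),
    ("query_count", "GET /cart issues 14 queries > 5"),
]))
```

Raising on an unowned budget is the design choice that matters here: a silent fallback to one team would quietly recreate the bottleneck.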
Cultural mechanics that make it stick
Three practices build durable culture:
1. Performance in every PR review, not a separate phase. The PR template includes a checklist item: “performance impact considered.” Code reviewers are trained to spot the signals covered across the seven pieces (lazy loading skipped, N+1 introduced, unnecessary allocations) and to ask about them. Quarterly engineering surveys ask “did your reviewer flag performance?”; the answers show whether the culture is sticking.
2. Engineering manager OKRs include performance. EM OKRs include “maintain or improve route SLOs” alongside delivery metrics. Senior engineer promotion criteria include “demonstrated performance improvements to systems they own.” Without this, engineers see performance work as career-distracting when it competes with feature velocity. With it, performance work is career-supporting.
3. Blameless retros after every performance incident, always ending with a new gate. Retro structure: what was the symptom, what was the root cause, what gate would have caught it earlier, who owns adding the gate. The accumulated CI gates and runbook entries become the team’s institutional memory — new engineers inherit it on day one instead of re-discovering the same failure modes.
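The retro rule ("always ending with a new gate") can be enforced by refusing to close a retro without a gate and an owner. A minimal sketch; the `PerfRetro` type and its field names are hypothetical, mirroring the four questions in the retro structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PerfRetro:
    symptom: str
    root_cause: str
    # The gate that would have caught it earlier, and who adds it.
    new_gate: Optional[str] = None
    gate_owner: Optional[str] = None

    def is_closed(self) -> bool:
        """A retro only closes once every field, including gate and owner, is set."""
        fields = (self.symptom, self.root_cause, self.new_gate, self.gate_owner)
        return all(f and f.strip() for f in fields)


retro = PerfRetro(
    symptom="/checkout p99 rose from 180 ms to 900 ms",
    root_cause="N+1 query introduced in the cart serializer",
)
print(retro.is_closed())  # False: no gate recorded yet
retro.new_gate = "CI fails if GET /cart issues more than 5 queries"
retro.gate_owner = "backend"
print(retro.is_closed())  # True
```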
Why this works
The hardest part of performance culture is making it self-sustaining after the initial push. The key mechanism is making performance “table stakes” — a property assumed in every PR, not argued about in every incident. Teams that reach this point typically have: (a) visible performance metrics on the engineering all-hands dashboard, (b) explicit gate failures in CI with clear owners, (c) a history of engineers being recognised for performance contributions in reviews. Without all three, performance culture decays within 12 to 18 months of the initial programme. With all three, it compounds.
| Metric | Value |
|---|---|
| Eng-time on perf (crisis mode, no discipline) | 20–40% |
| Eng-time on perf (steady state, with discipline) | 5–10% |
| Infrastructure cost reduction from 2x throughput improvement | ~50% |
| Stripe infrastructure ROI per eng-hour on profiling | $5–10 saved |
| Error budget (99.9% SLO over 30 days) | ~43 min/month allowed |
| Toil ratio pre-discipline (typical) | 70–80% |
| Toil ratio post-discipline (target) | under 30% |
A team's SLO is 99.9% (a 0.1% error budget). After a deploy, p99 regressions consume 80% of the month's budget in 4 days. What is the senior response?
Why does centralising all performance work in a dedicated 'performance team' fail to scale?
1. How do error budgets convert the 'optimise vs ship features' argument into a quantitative tradeoff?
2. Describe the three cultural practices that make performance discipline self-sustaining, and why each is necessary.
3. What does 'distributed ownership' of performance mean, and why does it scale better than a centralised performance team?
Error budgets, introduced in Google’s SRE book, quantify the tradeoff between reliability and feature velocity. An SLO of 99.9% over 30 days allows about 43 minutes of degradation per month; when the budget is healthy, the team can ship faster, and when it is exhausted, performance work takes priority. The economics of performance discipline are compelling: 2x throughput roughly halves the infrastructure bill for that workload, continuous profiling returns an estimated $5–10 in infrastructure savings per engineer-hour at Stripe’s scale, and teams with mature discipline spend 5–10% of time on performance versus 20–40% in crisis mode. Cultural mechanics (PR review criteria, EM OKRs, blameless retros ending with new gates) are the highest-leverage investment because they compound over time and survive team turnover. Distributed ownership prevents the centralised-team bottleneck: platform builds the infrastructure, every product team owns its layer, and performance becomes table stakes rather than a periodic crisis.