Browser & Frontend Runtime
Production observability: LoAF, INP, and the full attack surface
Your page scores 95 on Lighthouse in the office. On a user’s Pixel 4a in Jakarta with 3G throttling, the INP is 340 ms. These are two different measurements of two different things. Only one of them matters.
LoAF and INP: two complementary signals
PerformanceLongAnimationFrameTiming (LoAF; origin trial in 2023, shipped in Chrome 123 in 2024) reports any animation frame that took longer than 50 ms, with a breakdown of where the time went (rendering, blocking script, forced style and layout) plus per-script attribution. It is a diagnostic tool: it tells you what ran long, not what the user felt.
INP (Interaction to Next Paint) measures the time from a user input event (click, key press, tap) to the next paint that visibly responds. INP became a Core Web Vital in March 2024, replacing FID. A p75 INP at or below 200 ms is rated good, up to 500 ms needs improvement, and above 500 ms is poor. An INP above 200 ms usually traces to one of two pipeline problems:
- A long JS task in the input handler (or queued ahead of it) that delays the next frame
- A forced synchronous layout from reading geometry inside the handler after a DOM write
LoAF gives you the data to attribute these in production; INP gives you the metric the user feels. Together they close the loop from production telemetry back to specific pipeline diagnostics.
INP diagnosis path:
- INP > 200 ms: the user perceives sluggish interactions
- LoAF: which frame ran long, and what dominated it (JS, layout, or render)
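A minimal sketch of turning one LoAF entry into the attribution described above. The interfaces are local, trimmed-down mirrors of PerformanceLongAnimationFrameTiming and its PerformanceScriptTiming entries so the code runs outside a browser; breakDownFrame and the "largest bucket wins" rule are hypothetical choices, and the sketch assumes a script's forcedStyleAndLayoutDuration is counted inside its duration.

```typescript
// Trimmed-down local mirrors of the LoAF entry shape (not the full browser types).
interface ScriptTiming {
  duration: number; // time spent in this script entry point (ms)
  forcedStyleAndLayoutDuration: number; // forced style/layout inside it (ms)
  sourceURL: string;
}

interface LongAnimationFrame {
  startTime: number;
  duration: number;
  renderStart: number; // 0 when the frame did not render
  scripts: ScriptTiming[];
}

type Bucket = "script" | "forced-layout" | "render";

interface FrameBreakdown {
  scriptMs: number;
  forcedLayoutMs: number;
  renderMs: number;
  dominant: Bucket;
  worstScript: string | undefined;
}

function breakDownFrame(frame: LongAnimationFrame): FrameBreakdown {
  let scriptMs = 0;
  let forcedLayoutMs = 0;
  let worst: ScriptTiming | undefined = undefined;
  for (const s of frame.scripts) {
    // Assumption: forced layout is included in the script's duration,
    // so subtract it to avoid counting the same milliseconds twice.
    scriptMs += s.duration - s.forcedStyleAndLayoutDuration;
    forcedLayoutMs += s.forcedStyleAndLayoutDuration;
    if (worst === undefined || s.duration > worst.duration) worst = s;
  }
  // Rendering runs from renderStart to the end of the frame.
  const renderMs =
    frame.renderStart > 0 ? frame.startTime + frame.duration - frame.renderStart : 0;

  // Largest bucket wins; ties go to the earlier bucket in this list.
  const buckets: [Bucket, number][] = [
    ["script", scriptMs],
    ["forced-layout", forcedLayoutMs],
    ["render", renderMs],
  ];
  const [dominant] = buckets.reduce((best, b) => (b[1] > best[1] ? b : best));

  return { scriptMs, forcedLayoutMs, renderMs, dominant, worstScript: worst?.sourceURL };
}

// In a browser, entries would arrive from:
//   new PerformanceObserver((list) => { /* breakDownFrame per entry */ })
//     .observe({ type: "long-animation-frame", buffered: true });
```

Ship the breakdown alongside the INP value for the same page view, so a slow interaction can be tied to the script or layout that caused it.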
Off-main-thread scroll
Browsers have shipped compositor-thread scrolling for a decade: when the user scrolls a normal page, the compositor translates the viewport on the GPU without touching the main thread. A page can scroll smoothly even while a long JS task runs.
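The catch: a non-passive wheel, touchstart, or touchmove listener (one registered without {passive: true}) makes the compositor wait for the main thread before scrolling, because the handler might call preventDefault(). Always pass passive: true to these listeners unless you actually need preventDefault. The scroll event itself is not cancelable, so passive changes nothing there; a scroll handler costs only the main-thread work it does. Chrome already treats such listeners on window, document, and body as passive by default, but listeners on other elements still block, and the DevTools console flags them.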
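A small helper that makes passive the default, assuming listener registration is routed through one place. addScrollListener and ListenerTarget are hypothetical names; ListenerTarget is a structural stand-in for EventTarget so the sketch runs outside a browser, and real DOM targets should satisfy it.

```typescript
// Structural stand-in for EventTarget.
interface ListenerTarget {
  addEventListener(
    type: string,
    listener: (event: unknown) => void,
    options?: { passive?: boolean },
  ): void;
}

// Hypothetical helper: passive unless the caller opts out because it really
// calls preventDefault() (for example, a custom pinch-zoom gesture).
function addScrollListener(
  target: ListenerTarget,
  type: "wheel" | "touchstart" | "touchmove" | "scroll",
  listener: (event: unknown) => void,
  opts: { needsPreventDefault?: boolean } = {},
): void {
  target.addEventListener(type, listener, { passive: !opts.needsPreventDefault });
}
```

Making the opt-out explicit means every non-passive listener shows up in code review with a reason attached.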
Display locking and content-visibility
The mechanism behind content-visibility: auto is “display locking”: the browser skips rendering work for the locked subtree (no style calc, no layout, no paint) and sizes it as an intrinsic-size placeholder. When the subtree approaches the viewport (Chromium tracks this with an internal intersection observer), it unlocks and renders.
Numbers: a 10 000-row table that previously cost 1 200 ms of style + layout renders in ~10 ms with content-visibility: auto.
Trade-off: scrolling into a previously locked region briefly pauses to render it. If rows are expensive, you see a single-frame stutter at the unlock boundary. Pair with contain-intrinsic-size to give the browser a realistic placeholder size so scroll position is not jumpy.
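A sketch of applying display locking from script, assuming each row exposes style.setProperty as CSSStyleDeclaration does; lockOffscreenRows is a hypothetical helper and the 48 px height in the test is an illustrative estimate. Declaring the same two properties in a stylesheet works equally well.

```typescript
// Minimal stand-in for an element's inline style (CSSStyleDeclaration.setProperty).
interface Styled {
  style: { setProperty(property: string, value: string): void };
}

// Hypothetical helper: let the browser skip style, layout, and paint for rows
// away from the viewport, with a placeholder size so the scroll position stays stable.
function lockOffscreenRows(rows: Iterable<Styled>, estimatedRowPx: number): void {
  if (!(estimatedRowPx > 0)) throw new RangeError("estimatedRowPx must be positive");
  for (const row of rows) {
    row.style.setProperty("content-visibility", "auto");
    // "auto" remembers the last rendered size once a row has been laid out;
    // the length is the placeholder used until then.
    row.style.setProperty("contain-intrinsic-size", `auto ${estimatedRowPx}px`);
  }
}
```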
Reduced-motion as a render-budget escape valve
@media (prefers-reduced-motion: reduce) is set by users who experience motion sickness or want lower power consumption. Beyond accessibility, it is a render-budget escape valve: any compositor-driven animation can be replaced by an instant state change when reduced-motion is on, freeing the compositor of per-frame work. Battery-constrained users on a budget Android phone get a meaningful battery saving.
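One way to wire reduced motion into animation code: read the media query and collapse durations to zero, so the element jumps straight to its end state and there is nothing to interpolate frame by frame. animationDurationMs is a hypothetical helper, and the media-query function is injected so the sketch runs outside a browser; in a page you would pass (q) => window.matchMedia(q).

```typescript
type MatchMedia = (query: string) => { matches: boolean };

const REDUCED_MOTION = "(prefers-reduced-motion: reduce)";

// Hypothetical helper: an instant state change instead of an animation
// when the user has asked for reduced motion.
function animationDurationMs(matchMedia: MatchMedia, normalMs: number): number {
  return matchMedia(REDUCED_MOTION).matches ? 0 : normalMs;
}

// In a page, for example:
//   const ms = animationDurationMs((q) => window.matchMedia(q), 300);
//   panel.animate([{ opacity: 0 }, { opacity: 1 }], { duration: ms, fill: "forwards" });
```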
Web Workers and main-thread offload
When the main thread is saturated and the work itself cannot be cut, the remaining lever is to move it. Web Workers execute JS on a separate thread without DOM access; serialisation via postMessage costs roughly 1 ms/MB, which is worthwhile for CPU-heavy tasks (large JSON parse, compression, cryptography, markup processing).
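A sketch of the worker side of a JSON-parse offload, assuming a module worker and a message protocol of our own (ParseRequest, ParseResponse, handleParse, and parse.worker.ts are hypothetical). The page sends UTF-8 bytes and lists their ArrayBuffer as transferable, so the input moves to the worker without a copy; the handler is kept pure so it can be tested without a Worker.

```typescript
// Message shapes for a hypothetical JSON-parse worker.
interface ParseRequest {
  id: number;
  bytes: Uint8Array; // UTF-8 JSON; its buffer is transferred, not copied
}

type ParseResponse =
  | { id: number; ok: true; itemCount: number }
  | { id: number; ok: false; error: string };

// Worker-side handler: the CPU-heavy decode + parse happens off the main thread.
function handleParse(req: ParseRequest): ParseResponse {
  try {
    const value: unknown = JSON.parse(new TextDecoder().decode(req.bytes));
    // Reply with a small summary instead of cloning the parsed tree back.
    const itemCount = Array.isArray(value) ? value.length : 1;
    return { id: req.id, ok: true, itemCount };
  } catch (err) {
    return { id: req.id, ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Worker module (hypothetical parse.worker.ts):
//   self.onmessage = (e: MessageEvent<ParseRequest>) => self.postMessage(handleParse(e.data));
// Page:
//   const bytes = new TextEncoder().encode(jsonText);
//   worker.postMessage({ id: 1, bytes }, [bytes.buffer]); // transfer, no copy
```

If the page needs the whole parsed object back, cloning it can cost about as much as the parse, so the gain comes from also doing the follow-up work (filtering, aggregation) in the worker. When the data comes from the network, fetching it inside the worker skips the main thread entirely.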
OffscreenCanvas gives direct canvas API access from a worker, bypassing the main thread entirely. SharedArrayBuffer + Atomics provide synchronisation primitives between threads for high-frequency data (audio, real-time sensor feeds). The entry cost is high (structured cloning, message-passing architecture), but the performance ceiling is proportionally higher: a 4-core phone with a busy main thread still has 3 idle cores most applications never use.
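For the high-frequency case, a single-producer, single-consumer ring buffer over SharedArrayBuffer shows how Atomics publish indices between threads without a postMessage per sample. SharedRing is a hypothetical class; note that browsers only expose SharedArrayBuffer on cross-origin-isolated pages (served with COOP and COEP headers).

```typescript
// Hypothetical single-producer / single-consumer ring of Float32 samples.
// Layout: Int32 [writeIndex, readIndex] header (8 bytes), then `capacity` Float32 slots.
// One slot stays empty to distinguish "full" from "empty", so it holds capacity - 1 samples.
class SharedRing {
  readonly buffer: SharedArrayBuffer;
  private readonly indices: Int32Array;
  private readonly slots: Float32Array;

  constructor(buffer: SharedArrayBuffer) {
    this.buffer = buffer;
    this.indices = new Int32Array(buffer, 0, 2);
    this.slots = new Float32Array(buffer, 8);
  }

  static create(capacity: number): SharedRing {
    if (!Number.isInteger(capacity) || capacity < 2) throw new RangeError("capacity must be >= 2");
    return new SharedRing(new SharedArrayBuffer(8 + capacity * Float32Array.BYTES_PER_ELEMENT));
  }

  // Producer thread only.
  push(sample: number): boolean {
    const write = Atomics.load(this.indices, 0);
    const next = (write + 1) % this.slots.length;
    if (next === Atomics.load(this.indices, 1)) return false; // full
    this.slots[write] = sample;
    Atomics.store(this.indices, 0, next); // publish only after the sample is written
    return true;
  }

  // Consumer thread only.
  pop(): number | undefined {
    const read = Atomics.load(this.indices, 1);
    if (read === Atomics.load(this.indices, 0)) return undefined; // empty
    const sample = this.slots[read];
    Atomics.store(this.indices, 1, (read + 1) % this.slots.length);
    return sample;
  }
}

// Sharing: worker.postMessage(ring.buffer) hands over the same memory (no copy);
// the worker wraps it with new SharedRing(event.data).
```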
Service Worker and the first-paint pipeline
Service Workers are not part of the render pipeline, but they critically affect its input. A Service Worker that answers requests from cache (cache-first for static assets, network-first for API) can deliver first paint in 50 ms on return visits instead of 500 ms: about 3 frames vs 30 frames at 60 fps.
Key constraint: the Service Worker runs on its own thread, but starting it (when it intercepts the first request) adds latency before that request is answered. Keep the Service Worker thin and fast; a 200 ms startup on the fetch interception delays the first HTML byte and, with it, the entire parse pipeline.
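A sketch of the per-request routing decision a thin Service Worker might make, assuming fingerprinted static assets with fixed extensions and an /api/ prefix for API calls (both are hypothetical conventions). The decision is a cheap synchronous function, so respondWith can be called without awaiting anything first.

```typescript
type Strategy = "cache-first" | "network-first" | "network-only";

// Hypothetical conventions for this sketch.
const STATIC_ASSET = /\.(?:js|css|woff2|png|svg|webp)$/;

function chooseStrategy(method: string, url: URL): Strategy {
  if (method !== "GET") return "network-only"; // the Cache API only stores GET
  if (url.pathname.startsWith("/api/")) return "network-first";
  if (STATIC_ASSET.test(url.pathname)) return "cache-first";
  return "network-first"; // documents; an app-shell setup would cache the shell instead
}

// Inside the service worker (sketch): decide synchronously, then respond.
//   self.addEventListener("fetch", (event) => {
//     const strategy = chooseStrategy(event.request.method, new URL(event.request.url));
//     if (strategy === "cache-first") {
//       event.respondWith(caches.match(event.request).then((hit) => hit ?? fetch(event.request)));
//     }
//     // network-first: try fetch(), fall back to caches.match() on failure.
//   });
```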
Performance CI: catching regressions before merge
Budgets hold only when regressions are caught before merge. A realistic CI pipeline has three levels:
- Lighthouse CI in headless Chrome on every PR: gates on absolute lab metrics (LCP < 2.5 s, CLS < 0.1, and Total Blocking Time as the lab proxy for INP, because a page-load run has no real user interactions) and blocks merge on regression.
- Synthetic benchmarks on critical scenarios (open page, scroll list 1000×, click 5 buttons): measures p95 frame duration and p95 LoAF, compares to baseline branch.
- RUM (real user monitoring) on production: sends per-visit INP, LCP, and CLS values to Datadog or similar, which aggregates them into percentiles and alerts when p75 INP crosses 200 ms on any user segment (country, device, app version).
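A sketch of the RUM alerting step, assuming the backend receives one INP value per page view tagged with a segment key. percentile uses the nearest-rank definition; rateInp and inpAlerts are hypothetical helpers, while the 200 ms and 500 ms cut-offs are the published INP thresholds.

```typescript
type InpRating = "good" | "needs-improvement" | "poor";

// INP thresholds: <= 200 ms good, <= 500 ms needs improvement, above that poor.
function rateInp(ms: number): InpRating {
  if (ms <= 200) return "good";
  if (ms <= 500) return "needs-improvement";
  return "poor";
}

// Nearest-rank percentile: smallest sample with at least p% of samples at or below it.
function percentile(values: readonly number[], p: number): number {
  if (values.length === 0) throw new RangeError("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1]!;
}

interface InpSample {
  segment: string; // e.g. "ID/android" (hypothetical key format)
  inpMs: number;
}

// Hypothetical alert rule: every segment whose p75 INP is above the threshold.
function inpAlerts(samples: readonly InpSample[], thresholdMs = 200): { segment: string; p75: number }[] {
  const bySegment = new Map<string, number[]>();
  for (const { segment, inpMs } of samples) {
    const bucket = bySegment.get(segment) ?? [];
    bucket.push(inpMs);
    bySegment.set(segment, bucket);
  }
  const alerts: { segment: string; p75: number }[] = [];
  for (const [segment, values] of bySegment) {
    const p75 = percentile(values, 75);
    if (p75 > thresholdMs) alerts.push({ segment, p75 });
  }
  return alerts;
}
```

Computing p75 per segment, rather than globally, is what surfaces a slow device class that a fast majority would otherwise hide.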
Without all three levels, a render regression lands in production, lives there silently for months, and is discovered when a user complaint finally makes someone look.
Profiling on real hardware vs DevTools throttling. The “4× CPU slowdown” in DevTools Performance is an approximation, not a replacement for real hardware. An M2 MacBook at 4× slowdown still has a different memory model, different GPU, and different thermal throttling profile than a Pixel 6a. Profile on a physical mid-range Android phone at least once a week using Chrome for Android with remote debugging, and compare its frame durations against the throttled desktop profile. If the device is more than 30% slower, you have a mobile regression that is invisible on desktop.
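The 30% comparison can be made mechanical. A sketch, assuming per-frame durations (in ms) are exported from both the device trace and the throttled desktop trace; mobileGap is a hypothetical helper comparing nearest-rank p95 values, and the same comparison serves the synthetic benchmark's current-vs-baseline check.

```typescript
// Nearest-rank p95 of per-frame durations (ms); integer math avoids float rounding in the rank.
function p95(frameMs: readonly number[]): number {
  if (frameMs.length === 0) throw new RangeError("no frames");
  const sorted = [...frameMs].sort((a, b) => a - b);
  return sorted[Math.ceil((95 * sorted.length) / 100) - 1]!;
}

// Relative gap: 0.4 means the device's p95 frame is 40% slower than the desktop's.
function mobileGap(deviceFrames: readonly number[], throttledDesktopFrames: readonly number[]): number {
  return p95(deviceFrames) / p95(throttledDesktopFrames) - 1;
}

const MOBILE_GAP_LIMIT = 0.3; // the 30% rule of thumb
```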
Edge cases
Layer squashing is the compositor’s answer to overlap-induced layer explosion. When many adjacent non-animating elements are promoted (due to overlap with a single animated layer), the compositor squashes them into a single shared “squashed layer” bitmap. This reduces GPU memory at the cost of a larger single bitmap — if one element in the squash changes, the entire squashed layer must repaint. The squashing heuristic is not user-controllable; the only remedy is to isolate animated layers from non-animating neighbours so the overlap rule does not trigger.
Design the scrolling behaviour for a virtualised chat list that holds 50 000 messages and must hit 60 fps on a mid-range Android phone.
- Frame budget: 16.67 ms. Realistic main-thread budget after browser overhead: ~10 ms.
- Layout must not depend on off-screen rows.
- Composite-only path during scroll. Layout and paint allowed only when new rows enter the viewport.
- GPU memory: assume 200 MB available. Layer count must stay below 30 at any time.
- Resize handler must not interleave reads and writes (no forced reflow).
One design that meets these constraints:
- Virtualisation caps DOM size regardless of dataset size.
- Transform-based positioning routes scroll through the compositor.
- Scroll handler does no measurements — pure math from scrollTop.
- Will-change is scoped to the animating element and the active window.
- Resize is batched: all reads, then all writes.
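A sketch of the scroll-handler math for this design: prefix sums over row heights (measured or estimated, since chat messages vary in height) plus a binary search turn scrollTop into a row window without any DOM reads. buildOffsets, rowAt, and visibleWindow are hypothetical names, and the overscan of 5 rows is an illustrative default.

```typescript
// Prefix sums: offsets[i] is the top of row i; the last entry is the total height.
function buildOffsets(rowHeightsPx: readonly number[]): number[] {
  const offsets = [0];
  let total = 0;
  for (const h of rowHeightsPx) {
    total += h;
    offsets.push(total);
  }
  return offsets;
}

// Row containing y: the first row whose bottom edge lies below y (lower-bound search).
function rowAt(offsets: readonly number[], y: number): number {
  let lo = 0;
  let hi = offsets.length - 2; // index of the last row
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (offsets[mid + 1]! <= y) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

interface RowWindow {
  start: number; // first mounted row (inclusive)
  end: number; // last mounted row (exclusive)
  translateY: number; // px offset applied to the mounted block
}

// Pure math from scrollTop: no DOM measurements, so it cannot force layout.
function visibleWindow(
  offsets: readonly number[],
  scrollTop: number,
  viewportPx: number,
  overscan = 5,
): RowWindow {
  const rowCount = offsets.length - 1;
  if (rowCount <= 0) return { start: 0, end: 0, translateY: 0 };
  const first = rowAt(offsets, scrollTop);
  const last = rowAt(offsets, scrollTop + viewportPx);
  const start = Math.max(0, first - overscan);
  const end = Math.min(rowCount, last + 1 + overscan);
  return { start, end, translateY: offsets[start]! };
}
```

Each window update mounts rows start through end - 1 inside a spacer whose height is the last offset, and moves the mounted block with transform: translateY(...), matching the transform-based positioning above. With 50 000 rows of 40 px and a 600 px viewport, about 26 rows stay mounted.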
A click handler runs in 80 ms. The user perceives a delay before the UI updates. Which Core Web Vital measures this and what does the pipeline have to do with it?
A touchmove listener is attached without `{passive: true}`. What performance regression does this cause?
- What does LoAF report, and how does it differ from INP?
- Why do non-passive wheel/touchmove listeners cause jank?
- Name the three CI levels for catching render regressions before production.
LoAF attributes long frames; INP measures the interaction latency the user feels. A p75 INP above 200 ms usually traces to a long JS task or a forced sync layout in the input handler. Non-passive wheel and touch listeners force main-thread scrolling; passive: true restores compositor-thread scroll. content-visibility: auto display-locks off-screen subtrees, dropping a 1200 ms layout to 10 ms. @media (prefers-reduced-motion) is both an accessibility requirement and a render-budget optimisation. Web Workers offload CPU-heavy JS from the main thread; Service Workers serve first paint from cache. CI needs three layers: Lighthouse per PR, synthetic p95 LoAF on critical paths, and RUM alerts on production INP percentiles. The complete attack surface (parser-blocking scripts, oversized CSS, complex selectors, deep flex layouts, paint-heavy filters, layer overflow, layout thrash, non-passive listeners, tasks >50 ms, and long input handlers) maps one-to-one onto specific pipeline stages.