Browser & Frontend Runtime
Microtask starvation, Long Tasks, and LoAF
A user reports the page “froze completely” for several seconds. DevTools Performance shows one uninterrupted yellow scripting bar. No function call dominates the flame chart — the work keeps re-scheduling itself as microtasks and the loop never escapes.
Failure mode: microtask starvation
Microtasks drain to empty between tasks. If a microtask schedules another microtask before returning, the loop never escapes the microtask checkpoint — no rendering, no input, no further tasks. The pathological example:
function loop() {
  Promise.resolve().then(loop); // schedules itself as a microtask
}
loop();
This freezes the page indefinitely: the microtask queue is never empty, step 4 (rendering) never runs, and the user sees a frozen tab. DevTools Performance shows it as a single uninterrupted yellow scripting bar. The same pattern appears in production as accidental recursion through async chains: a .then that resubscribes to the same observable, a router middleware that re-resolves on every change, a state library that re-fires reactions inside a reaction.
Detection: long-task entries with no clear single function, scripting bar that grows without bound, INP > 1 s.
Cure: insert at least one task-level yield (setTimeout, MessageChannel, scheduler.postTask) into the chain.
- Yellow scripting bar: continuous, never breaks
- Long Tasks API entries: duration grows without bound
- INP: > 1 000 ms
- Frame counter: 0 fps during starvation
- Fix: insert a task-level yield in the chain
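The cure can be sketched as follows. This is a minimal illustration, not a drop-in fix: pump, yieldToTaskQueue, and the yieldEvery parameter are hypothetical names, and setTimeout stands in for whichever task-level primitive (setTimeout, MessageChannel, scheduler.postTask) fits the codebase.

```javascript
// Resolve in a NEW task rather than a microtask, so the event loop can leave
// the microtask checkpoint and get a chance to render and handle input.
function yieldToTaskQueue() {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Hypothetical self-rescheduling job that runs `total` steps through an async
// chain. Without the periodic task-level yield, every continuation would be a
// microtask and nothing else could run until the whole chain had finished.
async function pump(total, yieldEvery = 100) {
  let yields = 0;
  for (let step = 1; step <= total; step++) {
    await Promise.resolve(); // microtask hop, as in the starving loop
    if (step % yieldEvery === 0) {
      await yieldToTaskQueue(); // the fix: one real task boundary
      yields++;
    }
  }
  return { steps: total, yields };
}
```

How often to yield is a tuning decision: yielding on every step makes the chain slow, yielding too rarely recreates a long task.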
A `MutationObserver` callback does not run synchronously inside the DOM mutation; it is queued and delivered after the mutating script finishes. What kind of work is it queued as?
Long Tasks API and Long Animation Frames
PerformanceLongTaskTiming. Shipped in 2017, fires a PerformanceObserver entry for any task that exceeds 50 ms on the main thread. Each entry includes start time, duration, and an attribution array with the originating browsing context. Limitation: the attribution rarely points to a specific function; it tells you “something inside iframe A took 230 ms” — useful for SLO dashboards, less useful for root-cause. Use the Long Tasks API to count long tasks per session, not to debug them.
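A minimal sketch of the "count, don't debug" usage, assuming a hypothetical onSummary callback that forwards numbers to your own telemetry. summarizeLongTasks and observeLongTasks are illustrative names; the observer wiring is guarded so it does nothing where the longtask entry type is unsupported.

```javascript
// Pure helper: summarise long-task entries for a dashboard. It only reads
// `duration`, which every performance entry carries.
function summarizeLongTasks(entries) {
  let count = 0;
  let totalMs = 0;
  let worstMs = 0;
  for (const entry of entries) {
    count++;
    totalMs += entry.duration;
    worstMs = Math.max(worstMs, entry.duration);
  }
  return { count, totalMs, worstMs };
}

// Browser wiring. Returns null when "longtask" entries are not available.
function observeLongTasks(onSummary) {
  const supported =
    typeof PerformanceObserver !== "undefined" &&
    (PerformanceObserver.supportedEntryTypes || []).includes("longtask");
  if (!supported) return null;

  const observer = new PerformanceObserver((list) => {
    onSummary(summarizeLongTasks(list.getEntries()));
  });
  // buffered: true also delivers entries recorded before observe() was called.
  observer.observe({ type: "longtask", buffered: true });
  return observer;
}
```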
PerformanceLongAnimationFrameTiming (LoAF). Shipped 2023–2024 in Chromium. Fires for any animation frame that takes longer than 50 ms. A LoAF entry includes per-script attribution: an array of the scripts that ran in the frame, how each one was invoked (invokerType values such as event-listener, classic-script, module-script, user-callback), and how long each took. This is the production diagnostic that long tasks should have been from the start. Combine with INP: when INP regresses, query LoAF entries from the same session, find the offending frame, get the script attribution, deploy a fix. The full pipeline: from user-perceived metric (INP) to specific script (LoAF) to specific function (sourcemap on the script URL).
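The "find the offending frame, get the script attribution" step might look like the sketch below. slowestScript and observeLoAF are hypothetical helper names, and report is an assumed callback into your own telemetry; the fields read from each script (duration, invokerType, invoker, sourceURL, sourceFunctionName) are the ones LoAF script entries expose.

```javascript
// Pure helper: across a batch of LoAF entries, return the single script that
// ran longest, together with the attribution needed to look it up later.
function slowestScript(loafEntries) {
  let worst = null;
  for (const frame of loafEntries) {
    for (const script of frame.scripts || []) {
      if (worst === null || script.duration > worst.duration) {
        worst = {
          duration: script.duration,
          invokerType: script.invokerType,
          invoker: script.invoker,
          sourceURL: script.sourceURL,
          sourceFunctionName: script.sourceFunctionName,
          frameStart: frame.startTime,
        };
      }
    }
  }
  return worst;
}

// Browser wiring. Returns null where LoAF entries are not supported.
function observeLoAF(report) {
  const supported =
    typeof PerformanceObserver !== "undefined" &&
    (PerformanceObserver.supportedEntryTypes || []).includes("long-animation-frame");
  if (!supported) return null;

  const observer = new PerformanceObserver((list) => {
    const worst = slowestScript(list.getEntries());
    if (worst !== null) report(worst);
  });
  observer.observe({ type: "long-animation-frame", buffered: true });
  return observer;
}
```

Keeping the selection logic in a pure function makes it testable outside the browser; only the observer wiring depends on the platform.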
[Long Task] 312 ms
  attribution: same-origin
  startTime: 2456.3
  duration: 312.4
[Long Task] 287 ms
  attribution: same-origin
  startTime: 5102.7
  duration: 287.1
[INP candidate] 410 ms
  type: pointerdown -> click -> next paint
  startTime: 2401.0
  processingStart: 2456.3
  processingEnd: 2768.7
  presentationTime: 2811.0
  attribution: handleSearch (search.js:142)
A search input shows 410 ms INP and the long-task log points to handleSearch. The handler does: validate query → call setSearchTerm (Redux) → trigger debounced fetch → wait for results. Where is the 312 ms?
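One way to approach the question is to split the 410 ms into its three phases using the timestamps from the INP candidate in the log. The sketch below does only that arithmetic; inpBreakdown is a hypothetical helper, and the field names are taken from the log as shown rather than from any particular API.

```javascript
// Split an interaction into input delay, processing, and presentation delay.
function inpBreakdown({ startTime, processingStart, processingEnd, presentationTime }) {
  return {
    inputDelay: processingStart - startTime,           // waiting for the main thread
    processing: processingEnd - processingStart,       // event handlers running
    presentationDelay: presentationTime - processingEnd, // until the next paint
    total: presentationTime - startTime,
  };
}

// Values copied from the [INP candidate] entry in the log.
const parts = inpBreakdown({
  startTime: 2401.0,
  processingStart: 2456.3,
  processingEnd: 2768.7,
  presentationTime: 2811.0,
});
console.log(parts);
```

The phases come out at roughly 55 ms of input delay, 312 ms of processing, and 42 ms of presentation delay. The processing window starts at 2456.3, the same timestamp as the first long task, so the 312 ms is time spent synchronously inside the handler. The debounced fetch and the wait for results are asynchronous and fall outside that window, which points at the validation and the Redux update (and whatever synchronous work it triggers) as the likely suspects.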
scheduler.yield() and the Scheduler API
What scheduler.yield() does. The Scheduler API gives the platform a first-class yield primitive. scheduler.yield() was available as an origin trial from Chrome 115 and shipped by default in Chrome 129; Edge follows Chromium, and support in Firefox and Safari has lagged behind, so feature-detect before relying on it. Awaiting scheduler.yield() ends the current task, lets the microtask queue drain, gives input handling and rendering a chance to run, and then resumes the function in a new task. That continuation is prioritized so that other queued work of the same or lower priority does not cut in front of it. Previously, yielding via setTimeout(0) let rendering happen but lost queue position: any task scheduled in the meantime would run first. With scheduler.yield(), you can split a 200 ms task into four 50 ms chunks that still behave as one logical operation from the user’s perspective.
scheduler.postTask() with priorities. The same API exposes scheduler.postTask(callback, { priority }) with three priorities: user-blocking (input-response work), user-visible (default), background (non-urgent). Practical pattern: route input handlers through user-blocking so they jump the queue ahead of background work like analytics flushes; route background analytics through background so they yield to anything urgent. The scheduler has access to system signals (battery, thermal throttling, page visibility) that JS code does not.
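The routing pattern can be sketched with a small wrapper. postTaskCompat is a hypothetical helper, not part of the platform: it calls scheduler.postTask when the browser provides it and otherwise falls back to setTimeout, which ignores the priority entirely. The two task bodies are placeholders.

```javascript
// Schedule `callback` as its own task and return a promise for its result.
// With scheduler.postTask the priority is honoured; the setTimeout fallback
// runs tasks in submission order regardless of priority.
function postTaskCompat(callback, { priority = "user-visible" } = {}) {
  if (typeof globalThis.scheduler?.postTask === "function") {
    return globalThis.scheduler.postTask(callback, { priority });
  }
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      try {
        resolve(callback());
      } catch (err) {
        reject(err);
      }
    }, 0);
  });
}

// Input-response work goes ahead of everything else...
const urgent = postTaskCompat(() => "render results", { priority: "user-blocking" });
// ...while non-urgent work waits until nothing more important is queued.
const lazy = postTaskCompat(() => "flush analytics", { priority: "background" });
```

Because the fallback has no notion of priority, code that depends on strict ordering between priorities should not rely on it.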
Yield a long task without losing logical continuity
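A minimal sketch of the chunking pattern described for scheduler.yield(). yieldToMain and processInChunks are hypothetical names; the helper feature-detects scheduler.yield() and falls back to setTimeout(0), which yields but does not keep queue position. The 50 ms default budget mirrors the long-task threshold and is an assumption to tune.

```javascript
// Yield to the event loop. Prefer scheduler.yield() so the continuation is
// resumed ahead of other queued work; otherwise fall back to a plain timer.
function yieldToMain() {
  if (typeof globalThis.scheduler?.yield === "function") {
    return globalThis.scheduler.yield();
  }
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Run `handleItem` over every item, yielding whenever the current slice has
// used up its time budget. The caller still awaits one logical operation.
async function processInChunks(items, handleItem, budgetMs = 50) {
  const results = [];
  let yields = 0;
  let sliceStart = performance.now();
  for (const item of items) {
    results.push(handleItem(item));
    if (performance.now() - sliceStart >= budgetMs) {
      await yieldToMain();
      yields++;
      sliceStart = performance.now(); // start timing the next slice
    }
  }
  return { results, yields };
}
```

Checking elapsed time rather than counting items keeps slices bounded even when individual items vary widely in cost.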
Which specification defines the event loop, task queues, microtask queue, and the rendering steps in step-by-step detail?
Design the input pipeline for a search bar that filters 50 000 client-side items and must keep INP under 200 ms p75 on mid-range Android.
Constraints:
- Frame budget: ~10 ms after browser overhead.
- INP target: ≤200 ms p75.
- No long tasks > 50 ms on the main thread during typing.
- Filter results must update visibly within 200 ms of the most recent keystroke.
- Off-screen results may render lazily but on-screen results must be correct.
- Browser support: Chrome, Safari, Firefox (no Worker fallback drama).
Approach:
- Debounce keystrokes to coalesce work.
- Chunk the filter into ≤50 ms tasks with scheduler.yield() between.
- Use useTransition so React reconciliation does not block input.
- For very large datasets, push the filter onto a Worker so the main thread stays free for input and rendering.
- Wire LoAF telemetry so regressions are caught in production, not local dev.
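The first two steps of the design (debounce, then a chunked filter) might be combined as in the sketch below. Everything here is illustrative: createSearch, filterInSlices, the render callback, and the default waitMs and budgetMs values are assumptions, and a substring match stands in for the real filter. A stale run is abandoned as soon as a newer keystroke arrives, so only the latest query ever renders.

```javascript
// Yield between slices; uses scheduler.yield() where available.
const nextTask = () =>
  typeof globalThis.scheduler?.yield === "function"
    ? globalThis.scheduler.yield()
    : new Promise((resolve) => setTimeout(resolve, 0));

// Filter in time-boxed slices. Returns null if `isStale()` reports that a
// newer query has superseded this run.
async function filterInSlices(items, predicate, isStale, budgetMs) {
  const matches = [];
  let sliceStart = performance.now();
  for (const item of items) {
    if (predicate(item)) matches.push(item);
    if (performance.now() - sliceStart >= budgetMs) {
      await nextTask();
      if (isStale()) return null;
      sliceStart = performance.now();
    }
  }
  return matches;
}

// Returns an input handler that debounces keystrokes and renders only the
// results of the most recent query.
function createSearch(items, render, { waitMs = 120, budgetMs = 8 } = {}) {
  let latestRun = 0;
  let timer = null;
  return function onInput(query) {
    const runId = ++latestRun;
    clearTimeout(timer);
    timer = setTimeout(async () => {
      const needle = query.toLowerCase();
      const result = await filterInSlices(
        items,
        (item) => item.toLowerCase().includes(needle),
        () => runId !== latestRun,
        budgetMs
      );
      if (result !== null && runId === latestRun) render(query, result);
    }, waitMs);
  };
}
```

The debounce wait plus the filtering time has to fit inside the 200 ms visible-update constraint, so the two defaults cannot be chosen independently.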
LoAF (PerformanceLongAnimationFrameTiming) differs from PerformanceLongTaskTiming in what key way?
- 01. A search input shows 410 ms INP. LoAF reports a 312 ms script attributed to handleSearch. Walk through how to root-cause this from telemetry to fix.
- 02. What is the difference between scheduler.yield() and await Promise.resolve() as yield mechanisms?
- 03. Describe the microtask starvation pattern and give one production scenario where it appears accidentally.
Microtask starvation occurs when a microtask enqueues another microtask before returning: the loop is trapped in the microtask checkpoint and never reaches the render or input steps. It shows in DevTools as a single unbroken yellow bar and in production as INP > 1 s. The Long Tasks API (PerformanceLongTaskTiming, 2017) counts tasks exceeding 50 ms but gives only browsing-context attribution. LoAF (PerformanceLongAnimationFrameTiming, 2023–2024) fires per long frame and gives per-script attribution, making it the right tool for root-causing INP regressions in production. The Scheduler API’s scheduler.yield() (origin trial from Chrome 115, shipped by default in Chrome 129) provides a structured task-level yield that preserves queue position, while scheduler.postTask() lets you assign user-blocking, user-visible, or background priority to work so that the browser’s own scheduler, which has access to thermal and battery signals, can make better decisions than a hand-rolled priority queue.