Browser & Frontend Runtime
BeginMainFrame, compositor-driven animations, and GPU memory
A CSS animation keeps running smoothly at 60 fps while the main thread is blocked for 400 ms parsing a JSON blob. A requestAnimationFrame (rAF) animation on the same element freezes for the same 400 ms. Same hardware, same property, opposite results: one hands control to the compositor, the other does not.
The BeginMainFrame handshake
In Chromium’s renderer process, the compositor thread is the driver and the main thread is the worker. The flow each vsync:
- Compositor sends BeginMainFrame to the main thread
- Main thread runs all per-frame work: rAF callbacks, style, layout, paint setup
- Main thread sends CommitMainFrame back, handing off a new layer tree
- Compositor asks raster workers to fill any dirty tiles with bitmaps
- Compositor performs the GPU draw and displays the frame
If the main thread is too slow, the compositor does not wait: it ships the previous frame’s layer tree again (a “drawn but stale” frame). The user perceives this as a missed input or a stutter.
On a 60 Hz display, BeginMainFrame is paced by a vsync tick that arrives every ~16.67 ms (1000 ms / 60) whether the main thread is ready or not. This is the deep reason main-thread work cannot exceed the frame budget: the metronome is relentless.
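To make the budget concrete, here is a minimal sketch of the handshake as a toy model. It is not Chromium's scheduler: `simulateFrames` and `REFRESH_HZ` are hypothetical names, and the model assumes the main thread works on one frame at a time and starts the next one at the following tick.

```typescript
// Toy model of the vsync loop (an illustration, not Chromium's actual scheduler).
const REFRESH_HZ = 60; // assumed display refresh rate

type FrameResult = "fresh" | "stale";

/**
 * mainThreadWorkMs[i] is how long the main thread needs to produce commit i.
 * For every vsync tick, report whether the compositor drew a new layer tree
 * ("fresh") or re-drew the previous one because no commit arrived ("stale").
 */
function simulateFrames(mainThreadWorkMs: number[]): FrameResult[] {
  const results: FrameResult[] = [];
  for (const workMs of mainThreadWorkMs) {
    // Number of vsync intervals this piece of work occupies (at least one).
    const ticksNeeded = Math.max(1, Math.ceil((workMs * REFRESH_HZ) / 1000));
    for (let i = 0; i < ticksNeeded - 1; i++) {
      results.push("stale"); // deadline missed: previous layer tree is shown again
    }
    results.push("fresh"); // commit landed within this interval
  }
  return results;
}

// A 10 ms frame, a 40 ms frame, then another 10 ms frame.
const timeline = simulateFrames([10, 40, 10]);
// timeline is ["fresh", "stale", "stale", "fresh", "fresh"]
```

In this model a single 400 ms block occupies 24 ticks, so the compositor re-draws the old layer tree 23 times before a new commit arrives.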
Figure: BeginMainFrame flow
Compositor-driven CSS animations
When a CSS animation changes only transform or opacity on a compositor-promoted element, the browser detects this at animation-start and hands the animation’s timeline to the compositor thread. From that point the main thread is no longer involved: the compositor interpolates the value frame by frame and draws each frame from the layer’s existing bitmap, with no new commit required.
This is why a CSS animation can keep running smoothly even when the main thread is completely blocked (alert dialog, long synchronous parse, debugger pause): the compositor still runs.
The same animation written in JS via rAF cannot do this, because rAF runs on the main thread. A blocked main thread freezes the rAF animation.
Compositor-driven animations (CSS animations and transitions, or the Web Animations API, restricted to transform and opacity) are the only way to keep an animation running under main-thread pressure.
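The difference can be sketched in code. The declarative version below uses `element.animate()` from the Web Animations API, swapped in here for the CSS `@keyframes` form so the example stays in one language; browsers can run such transform-only animations on the compositor too. `CardLike`, `Raf`, `slideDeclarative` and `slideWithRaf` are hypothetical names, and the small interfaces exist only so the sketch type-checks without DOM typings.

```typescript
// Minimal structural types (assumptions for this sketch, not DOM definitions).
interface KeyframeLike { transform: string }
interface AnimOptions { duration: number; easing: string; fill: "forwards" }
interface CardLike {
  style: { transform: string };
  animate(keyframes: KeyframeLike[], options: AnimOptions): unknown;
}
type Raf = (callback: (nowMs: number) => void) => number;

const DISTANCE_PX = 200;
const DURATION_MS = 300;

// Declarative: the whole animation is described once, up front.
// The browser can then hand the timeline to the compositor thread.
function slideDeclarative(card: CardLike): void {
  card.animate(
    [{ transform: "translateY(0px)" }, { transform: `translateY(${DISTANCE_PX}px)` }],
    { duration: DURATION_MS, easing: "linear", fill: "forwards" },
  );
}

// Imperative: every frame needs a callback on the main thread.
// If the main thread is blocked, no callback runs and the card stops moving.
function slideWithRaf(card: CardLike, raf: Raf): void {
  let start: number | undefined;
  const step = (nowMs: number): void => {
    if (start === undefined) start = nowMs;
    const progress = Math.min((nowMs - start) / DURATION_MS, 1);
    card.style.transform = `translateY(${progress * DISTANCE_PX}px)`;
    if (progress < 1) raf(step); // schedule the next main-thread callback
  };
  raf(step);
}
```

In a page, `slideWithRaf(card, (cb) => requestAnimationFrame(cb))` would drive the second version. The first makes a single call and needs no further JavaScript per frame, which is what lets it survive a blocked main thread.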
GPU memory exhaustion and layer eviction
Layers are not free. Each layer is backed by a GPU bitmap costing roughly width × height × 4 bytes, so a single 1080p layer (1920 × 1080) is about 8 MB. On a phone with 256 MB of GPU memory, fifty such layers (over 400 MB) exceed the budget. When the OS starts evicting tiles:
- Browser detects the evicted tiles
- Raster workers re-rasterise them on the fly
- While rasterising, the compositor composites with incomplete tiles
- The page stutters — often worse than if no layers were requested at all
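The arithmetic behind the layer budget can be checked with a short sketch. The helper names are hypothetical, 1080p is taken as 1920 × 1080, the 256 MB figure is treated as 256 MiB, and device pixel ratio and tiling overhead are ignored.

```typescript
const BYTES_PER_PIXEL = 4; // RGBA, one byte per channel
const MIB = 1024 * 1024;

// Memory for one full-size layer bitmap.
function layerBytes(widthPx: number, heightPx: number): number {
  return widthPx * heightPx * BYTES_PER_PIXEL;
}

// How many layers of the given size fit in a memory budget.
function maxLayersWithin(budgetBytes: number, widthPx: number, heightPx: number): number {
  return Math.floor(budgetBytes / layerBytes(widthPx, heightPx));
}

const perLayer = layerBytes(1920, 1080);                  // 8,294,400 bytes, about 7.9 MiB
const fiftyLayers = 50 * perLayer;                        // 414,720,000 bytes, about 395 MiB
const fitsInBudget = maxLayersWithin(256 * MIB, 1920, 1080); // 32
```

Under these assumptions the budget is gone after 32 full-screen layers, well before fifty.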
The will-change anti-pattern causes this: setting will-change: transform on every component permanently reserves a layer per instance. A list of 100 cards, each with 5 will-change’d sub-elements, holds 500 layers.
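The correct pattern: set will-change shortly before the animation starts (for example on mouseenter, ahead of the interaction that triggers it) and remove it on animationend or transitionend. A transitionstart listener is too late to help, because the transition is already running when it fires. This keeps layer memory near zero when the user is not actively interacting.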
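A minimal sketch of the scoped pattern follows. `HintTarget` and `scopeWillChange` are hypothetical names; the interface lists only the members the helper touches, so a test double can stand in for a real element.

```typescript
// Only the members this helper needs (an assumption for the sketch).
interface HintTarget {
  style: { willChange: string };
  addEventListener(type: string, listener: () => void): void;
}

// Add the hint when interaction is likely, drop it once the animation is done.
function scopeWillChange(el: HintTarget, property: string = "transform"): void {
  // mouseenter usually precedes the click or hover change that starts the animation,
  // giving the browser time to promote and rasterise the layer.
  el.addEventListener("mouseenter", () => {
    el.style.willChange = property;
  });

  // Release the layer as soon as the animation or transition finishes.
  const clearHint = (): void => {
    el.style.willChange = "auto";
  };
  el.addEventListener("transitionend", clearHint);
  el.addEventListener("animationend", clearHint);
}
```

This sketch does not handle a hover that never triggers an animation, or a cancelled transition. A fuller version would probably also clear the hint on transitioncancel and animationcancel.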
Why this works
Why does will-change: transform cause promotion before the animation even starts? The browser needs time to rasterise the layer into GPU memory before the first frame of the animation. If it waited until the animation started, the first 1–3 frames would be blank while rasterisation ran. will-change is the hint that says “prepare now, not at animation-start.” The cost is holding the bitmap for the duration of the hint — which is why you want to remove the hint when the animation ends.
- 01 What does the compositor do if the main thread misses a BeginMainFrame deadline?
- 02 Why does a CSS transform animation survive a blocked main thread while a rAF animation does not?
- 03 What is the will-change anti-pattern and how do you fix it?
On a 60 Hz display the Chromium compositor sends BeginMainFrame every ~16.67 ms. The main thread responds with CommitMainFrame if it finishes on time; if not, the compositor ships the previous frame again as a stale frame. CSS animations on transform or opacity of promoted elements are handed to the compositor at animation-start, so the main thread is out of the loop and they keep running at 60 fps while it is blocked. rAF animations run on the main thread and freeze when it is blocked. The will-change anti-pattern reserves GPU bitmaps permanently; on a phone with 256 MB of GPU memory, 500 layers from a design system exhaust the budget and the OS evicts tiles. Scope will-change to the duration of an animation to keep GPU memory near zero at rest.