Stage costs and the renderer process model

What drives the cost of each pipeline stage, how the renderer process splits work across threads, and why the main thread is the bottleneck.

Your page parses fast on your M2 MacBook. It crawls on a mid-range Android. The useful diagnosis is rarely “the CPU is slow”; it is which thread is doing which work, and how much of that work sits on the one thread that cannot be parallelised.

The renderer process model

A modern Chromium-family browser typically runs each tab in its own renderer process. Rendering work is split across three kinds of threads inside that process, plus one separate process:

Main thread: runs HTML parsing, CSSOM construction, style, layout, paint setup, and your JavaScript. It is single-threaded by design: DOM, CSSOM, and JS state can be mutated by only one execution context at a time, because concurrent writers would make that state impossible to keep consistent without pervasive locking.
Compositor thread: assembles the layer tree and decides which layers need new bitmaps.
Raster worker threads: rasterise individual tiles in parallel.
GPU process: a separate process, outside the renderer, that takes rasterised tiles, uploads them as textures, and runs the final composite-and-display.

Firefox uses a similar split (Quantum CSS for parallel style, WebRender for GPU-driven compositing); Safari/WebKit splits the work between its WebContent process and the GPU process. The names differ, but the architecture rhymes across engines.

Renderer process internals (Chromium)

Main thread: Parse HTML → CSSOM → Style → Layout → Paint setup, plus your JavaScript
Compositor thread: assembles layer tree, decides dirty tiles
Raster workers (N): rasterise tiles in parallel
GPU process: upload textures, composite and display

Why the main thread is the bottleneck

Any work you put on the main thread (parsing a JSON blob, deserialising a Redux state, running a layout, executing a click handler) competes for the same frame budget: 16.67 ms at 60 Hz. The compositor and raster threads exist precisely so that rendering work can leave the main thread and run in parallel.

That is the architectural justification for transform/opacity animations being “free”: they reach the GPU without touching the bottleneck thread.
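A minimal sketch of a composite-only animation, assuming the Web Animations API (`Element.animate`) is available. The `AnimatableLike` interface and the `slideBy` helper are hypothetical names introduced here so the example type-checks without DOM typings. Whether the browser actually runs the animation on the compositor is its own decision; the code only avoids properties that would force layout or paint.

```typescript
// Minimal structural type so the sketch compiles without DOM typings.
// A real DOM Element satisfies it through Element.animate().
interface AnimatableLike {
  animate(
    keyframes: Array<{ transform: string }>,
    options: { duration: number; easing: string; fill: "forwards" },
  ): unknown;
}

// Slide an element horizontally by animating transform only.
// No keyframe touches top/left/width, so no frame of the animation
// asks for layout.
function slideBy(el: AnimatableLike, distancePx: number, durationMs = 300): unknown {
  return el.animate(
    [
      { transform: "translateX(0px)" },
      { transform: `translateX(${distancePx}px)` },
    ],
    { duration: durationMs, easing: "ease-out", fill: "forwards" },
  );
}

// Usage in a page (the ".drawer" selector is a placeholder):
//   slideBy(document.querySelector(".drawer")!, 240);
```

Animating `top` or `left` to get the same visual result would invalidate layout on every frame, which puts the work back on the main thread.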

Diagram: the top row is the single main thread (the bottleneck); the commit edge hands the painted result to the compositor thread, which runs Composite in parallel. Work that starts on the far side of that handoff never returns to the main thread, which is what makes transform/opacity cheap.
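To make the frame budget concrete, here is a rough sketch, assuming you already have per-task cost estimates in milliseconds. `frameBudgetMs` and `packIntoFrames` are hypothetical helpers, not browser APIs. They group main-thread work into batches that each fit one frame, so the thread can yield between batches.

```typescript
// Frame budget in milliseconds for a given refresh rate (60 Hz gives ~16.67 ms).
function frameBudgetMs(refreshHz: number): number {
  return 1000 / refreshHz;
}

// Greedily pack estimated task costs (ms) into batches that fit the budget.
// A task larger than the budget still gets its own batch: it will blow the
// frame, which marks it as a candidate for splitting or moving off-thread.
function packIntoFrames(taskCostsMs: number[], budgetMs: number): number[][] {
  const batches: number[][] = [];
  let current: number[] = [];
  let used = 0;
  for (const cost of taskCostsMs) {
    if (current.length > 0 && used + cost > budgetMs) {
      batches.push(current);
      current = [];
      used = 0;
    }
    current.push(cost);
    used += cost;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

const budgetAt60Hz = frameBudgetMs(60);
const packedBatches = packIntoFrames([4, 6, 5, 9, 20, 3], budgetAt60Hz);
// packedBatches is [[4, 6, 5], [9], [20], [3]]
```

In a real page the budget available to script is smaller than the full frame, because style, layout, and paint share the same window on the same thread.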

Stage-by-stage cost drivers

Each stage has typical levers that blow up its cost.

Parse HTML scales with document bytes. A 500 KB SSR-rendered page parses faster than a 2 MB one, simply because there is less to walk. Synchronous <script> tags block the parser until they finish downloading and executing — modern best practice puts defer or async on every external script not strictly needed for first paint.

CSSOM cost grows with stylesheet bytes and rule count. An unused 800-rule CSS framework wastes parse time even if zero rules actually match anything on the page.

Style calc cost is roughly DOM size × selectors. A 5 000-node DOM with a 2 000-rule stylesheet is up to 10 million selector-match checks in the worst case. Engines skip most of them with fast-reject structures such as a Bloom filter, but :has(), descendant combinators with no ancestor anchor, and universal selectors defeat the filter and cost more.
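The arithmetic and the fast-reject idea can be sketched as follows. This is a toy model, not any engine's implementation: `worstCaseSelectorChecks` and `AncestorFilter` are hypothetical names, and the 32-bit filter size and the two hash functions are assumptions chosen for brevity.

```typescript
// Upper bound: every rule tested against every node.
function worstCaseSelectorChecks(domNodes: number, rules: number): number {
  return domNodes * rules;
}

// Toy Bloom filter over ancestor class names, stored in one 32-bit mask.
class AncestorFilter {
  private bits = 0;

  // Two simple string hashes, each reduced to a bit position 0..31.
  private positions(key: string): number[] {
    let h1 = 5381;
    let h2 = 0;
    for (let i = 0; i < key.length; i++) {
      const c = key.charCodeAt(i);
      h1 = ((h1 * 33) ^ c) >>> 0;
      h2 = (h2 * 31 + c) >>> 0;
    }
    return [h1 % 32, h2 % 32];
  }

  add(key: string): void {
    for (const p of this.positions(key)) this.bits |= 1 << p;
  }

  // false: the key is definitely not among the ancestors, so a rule such as
  //        ".sidebar a" can be rejected without walking up the tree.
  // true:  the key may be present; the ancestors must be walked to confirm.
  mayContain(key: string): boolean {
    return this.positions(key).every((p) => (this.bits & (1 << p)) !== 0);
  }
}

const upperBound = worstCaseSelectorChecks(5000, 2000); // 10000000

const ancestorClasses = new AncestorFilter();
ancestorClasses.add("sidebar");
ancestorClasses.add("nav");
ancestorClasses.mayContain("sidebar"); // true: a Bloom filter never gives a false negative
```

The filter only helps when a selector names something an ancestor must have. A selector with nothing to look up cannot be rejected this way, so it always pays for the full match.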

Layout cost is roughly DOM depth × box dependencies. A deeply nested flexbox with auto sizing forces multiple measure passes; a flat grid with explicit cell sizes is one pass.

Paint cost is painted area × paint op count. box-shadow with a large blur radius and filter properties (blur, drop-shadow) are paint-heavy because each pixel requires multi-pixel sampling.

Composite cost is layer count × layer pixel area. Composite is cheaper than layout or paint by orders of magnitude, but only when no upstream stage has been invalidated and forced the downstream stages to re-run.
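As a rough illustration of layer count × pixel area, the sketch below estimates texture memory for a set of equally sized layers. It assumes 4 bytes per pixel (RGBA, 8 bits per channel) and rasterisation at the device pixel ratio; real engines tile and may compress, so treat the result as an order-of-magnitude figure. Both function names are hypothetical.

```typescript
// Bytes for one layer's backing texture under the assumptions above.
function layerTextureBytes(cssWidth: number, cssHeight: number, dpr: number): number {
  const deviceWidth = Math.ceil(cssWidth * dpr);
  const deviceHeight = Math.ceil(cssHeight * dpr);
  return deviceWidth * deviceHeight * 4;
}

// Total for `layerCount` layers of the same size, in mebibytes.
function totalLayerMegabytes(
  layerCount: number,
  cssWidth: number,
  cssHeight: number,
  dpr: number,
): number {
  return (layerCount * layerTextureBytes(cssWidth, cssHeight, dpr)) / (1024 * 1024);
}

// One full-screen layer on a 360 × 640 CSS-pixel phone at a device pixel ratio of 3:
const oneLayerBytes = layerTextureBytes(360, 640, 3); // 8294400
// Twenty such layers:
const twentyLayersMb = totalLayerMegabytes(20, 360, 640, 3); // about 158.2
```

The same layer count costs nine times less at a device pixel ratio of 1, which is one reason a layer budget that looks fine on a desktop monitor can hurt on a high-density phone.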

Together these cost drivers form a diagnostic checklist: when you see a slow frame, identify which stage dominates, then pull the lever that controls its cost driver. Without that mapping you’re guessing; with it, you’re reading a budget report.
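The checklist can be sketched as a lookup: find the stage that dominates the frame, then report the cost driver this section assigns to it. The `Stage` type, the `costDriver` table, and `dominantStage` are hypothetical names; the timings in the example are the 28 ms frame traced later in this lesson.

```typescript
type Stage = "parse" | "cssom" | "style" | "layout" | "paint" | "composite";

// Cost drivers as listed in this section.
const costDriver: Record<Stage, string> = {
  parse: "document bytes and parser-blocking scripts",
  cssom: "stylesheet bytes and rule count",
  style: "DOM size × selectors",
  layout: "DOM depth × box dependencies",
  paint: "painted area × paint op count",
  composite: "layer count × layer pixel area",
};

// Return the stage with the largest share of the frame and its cost driver.
// Stages missing from the trace count as 0 ms; ties go to the earlier stage.
function dominantStage(
  timingsMs: Partial<Record<Stage, number>>,
): { stage: Stage; ms: number; driver: string } {
  const order: Stage[] = ["parse", "cssom", "style", "layout", "paint", "composite"];
  let worst: Stage = "parse";
  let worstMs = timingsMs.parse ?? 0;
  for (const stage of order) {
    const ms = timingsMs[stage] ?? 0;
    if (ms > worstMs) {
      worst = stage;
      worstMs = ms;
    }
  }
  return { stage: worst, ms: worstMs, driver: costDriver[worst] };
}

const slowFrame = { parse: 1, style: 2, layout: 18, paint: 4, composite: 1 };
const verdict = dominantStage(slowFrame);
// verdict.stage is "layout", verdict.ms is 18,
// verdict.driver is "DOM depth × box dependencies"
```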

Why this works

Why is the DOM single-threaded at all? Because letting two execution contexts write the same DOM node concurrently would require synchronising every DOM, CSSOM, and JS heap access, and would still leave subtle race windows open. GUI toolkits hit the same wall: Java’s Swing requires that components be touched only from its event dispatch thread. The browser lives with the same constraint. The single-thread rule is a deliberate correctness trade-off, not an oversight.

Quiz

You change a div's `top` property in a rAF loop. Which pipeline stages re-run per frame?

Quiz

You change `transform: translateX(...)` on a div that already has its own compositor layer. Which stages run on the main thread?

Trace it

DevTools Performance panel shows a 28 ms frame. Inside: 1 ms Parse HTML, 2 ms Recalculate Style, 18 ms Layout, 4 ms Paint, 1 ms Composite Layers, 2 ms idle. The page is scrolling a list of 5000 chat messages. Where is the time?

Layout dominates at 18 ms. Most likely cause: every visible chat row is re-measured because something high in the DOM tree changed width.

Paint dominates at 4 ms

Composite at 1 ms

Idle time at 2 ms means the page is starving

Compute it

A DOM has 5000 nodes. The stylesheet has 2000 rules. Roughly how many selector-match checks does style calc perform?


Recall before you leave

01 What four threads/processes does a Chromium renderer process use?
02 Why is the main thread single-threaded?
03 What is the cost driver for style recalculation?

Recap

Rendering in Chromium has four players: the renderer’s main thread, compositor thread, and raster workers, plus the separate GPU process. Five of the six pipeline stages run on the single main thread, the same thread as your JavaScript, so every long task competes with rendering. Stage costs are predictable: parse scales with bytes, style calc with DOM × rules, layout with DOM depth × box dependencies, paint with area × ops, composite with layer count × pixel area. Composite-only animations (transform, opacity) skip the main thread entirely; that is why they are an order of magnitude cheaper than layout-triggering ones. So when you see a 28 ms frame in DevTools and the Layout bar is 18 ms wide, you know the cost driver is DOM depth or box dependencies, not JS and not paint, and you know which lever to pull.
