Browser & Frontend Runtime WEB · 04 · 07

Worker pools, Comlink, and production observability

Pool N workers sized to hardwareConcurrency − 1, dispatch jobs via a priority queue with backpressure, wrap with Comlink for ergonomics — but postMessage task-hops still cost, and every worker thread needs its own telemetry.

WEB Senior ◷ 16 min

You move the 400 ms image job to a worker. Then users start clicking rapidly, and the page accumulates fifty pending jobs, each spawning a new worker that is never cleaned up. The browser tab now uses 400 MB and gets killed on mobile.

When workers pay off — and when they do not

Spawning a worker is not free: a new realm, a new event loop, a fresh copy of any imported scripts — typically 5–20 ms of startup plus a few MB of memory. A worker is worth it when:

The task is long (tens of ms or more), so startup is amortised.
The result-transfer cost is small relative to the compute.

It is a net loss when the task is short — the postMessage round-trip and clone overhead can exceed the work itself.

For repeated small jobs: maintain a worker pool rather than spawning per job.

The worker pool

A pool amortises startup cost and bounds memory:

pool size = max(1, navigator.hardwareConcurrency − 1)  // leave one core for the main thread, never fewer than one worker
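A minimal sketch of the sizing rule, assuming the reported core count can be missing or invalid in some environments. The function name poolSize and the fallback of 4 are illustrative assumptions, not part of any browser API; in a browser the argument would be navigator.hardwareConcurrency.

```typescript
// Hypothetical helper: derive a pool size from the reported logical core count.
// In a browser, call it as poolSize(navigator.hardwareConcurrency).
function poolSize(reportedCores: number | undefined, fallback = 4): number {
  // Use the fallback (an assumed default) when the value is missing or not a positive number.
  const cores =
    typeof reportedCores === "number" && Number.isFinite(reportedCores) && reportedCores > 0
      ? Math.floor(reportedCores)
      : fallback;
  // Leave one core for the main thread, but never go below one worker.
  return Math.max(1, cores - 1);
}
```

The floor of one worker matters on single-core reports, where a plain subtraction would yield an empty pool.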

Components:

N workers, created once and reused.
A job queue — pending work waiting for a free worker.
A dispatcher — on job arrival: if a worker is free, send it; otherwise enqueue.

The pool pattern eliminates repeated startup cost and bounds total worker memory.
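The three components above can be sketched as follows. WorkerLike is a hypothetical stand-in for a real Worker (in a browser its run method would wrap postMessage and the message event); error handling, queue caps, and priorities are deliberately left out of this sketch.

```typescript
// Minimal stand-in for the part of a Worker the pool needs (hypothetical interface).
interface WorkerLike<In, Out> {
  run(input: In, done: (result: Out) => void): void;
}

class WorkerPool<In, Out> {
  private idle: WorkerLike<In, Out>[];
  private queue: { input: In; resolve: (r: Out) => void }[] = [];

  constructor(workers: WorkerLike<In, Out>[]) {
    this.idle = workers.slice(); // created once, reused for every job
  }

  get queued(): number {
    return this.queue.length;
  }

  get idleCount(): number {
    return this.idle.length;
  }

  // Dispatcher: send to a free worker if there is one, otherwise enqueue.
  submit(input: In): Promise<Out> {
    return new Promise<Out>((resolve) => {
      const worker = this.idle.pop();
      if (worker) this.dispatch(worker, input, resolve);
      else this.queue.push({ input, resolve });
    });
  }

  private dispatch(worker: WorkerLike<In, Out>, input: In, resolve: (r: Out) => void): void {
    worker.run(input, (result) => {
      resolve(result);
      const next = this.queue.shift();
      if (next) this.dispatch(worker, next.input, next.resolve); // reuse the same worker
      else this.idle.push(worker); // nothing waiting: worker becomes idle again
    });
  }
}
```

Because a finished worker pulls the next queued job itself, no worker is ever created or destroyed after construction.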

Backpressure. Without a cap, the queue grows without bound under load. Cap it, and when the cap is reached pick one of two policies:

Drop the oldest enqueued jobs.
Reject new jobs (return a rejected promise) so callers can back off.
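Both policies fit in one small bounded queue. This is a sketch under assumptions: the names BoundedQueue, OverflowPolicy, and offer are illustrative, and the caller is expected to turn a refused or dropped job into a rejected promise.

```typescript
type OverflowPolicy = "reject-new" | "drop-oldest";

class BoundedQueue<T> {
  private items: T[] = [];
  private capacity: number;
  private policy: OverflowPolicy;

  constructor(capacity: number, policy: OverflowPolicy) {
    if (capacity < 1) throw new RangeError("capacity must be at least 1");
    this.capacity = capacity;
    this.policy = policy;
  }

  get size(): number {
    return this.items.length;
  }

  // Reports what happened so the caller can reject the new job or notify the dropped one.
  offer(item: T): { accepted: boolean; dropped: T | undefined } {
    if (this.items.length < this.capacity) {
      this.items.push(item);
      return { accepted: true, dropped: undefined };
    }
    if (this.policy === "reject-new") {
      return { accepted: false, dropped: undefined };
    }
    const dropped = this.items.shift(); // evict the oldest waiting job
    this.items.push(item);
    return { accepted: true, dropped };
  }

  take(): T | undefined {
    return this.items.shift();
  }
}
```

Dropping the oldest tends to suit previews, where only the latest request matters; rejecting new work suits jobs that must not be lost silently.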

Priority routing. Not all jobs are equal — a filter preview the user is waiting for right now outranks background thumbnail generation. Use a priority-tagged queue: interactive jobs skip background ones.
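One simple way to realise a priority-tagged queue is two FIFO lanes. The sketch below assumes exactly two priority levels; the names Priority and PriorityJobQueue are illustrative.

```typescript
type Priority = "interactive" | "background";

class PriorityJobQueue<T> {
  private interactive: T[] = [];
  private background: T[] = [];

  push(job: T, priority: Priority): void {
    if (priority === "interactive") this.interactive.push(job);
    else this.background.push(job);
  }

  // Interactive jobs are always taken first; order is FIFO within each lane.
  next(): T | undefined {
    if (this.interactive.length > 0) return this.interactive.shift();
    return this.background.shift();
  }

  get size(): number {
    return this.interactive.length + this.background.length;
  }
}
```

With strict priority like this, a steady stream of interactive jobs can starve background work, so a real pool may want an ageing rule on top.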

Cancellation. If the user scrolls away before a job completes, cancel it. For a queued job: remove it. For an in-progress job, the clean way is an atomic cancellation flag in a SharedArrayBuffer: the worker checks it periodically and exits early, keeping itself alive for the next job. (In browsers, SharedArrayBuffer is only available when the page is cross-origin isolated.) The blunt way is worker.terminate(), but that destroys the worker, forcing a new spawn for the next job (back to paying startup cost).
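A sketch of the flag approach, assuming SharedArrayBuffer is available. The helper names are illustrative, and the summing loop stands in for real work; in a browser the main thread would post the flag's buffer to the worker once so both sides see the same memory.

```typescript
const CANCELLED = 1;

// One shared 32-bit slot: 0 means keep going, 1 means cancel.
function createCancelFlag(): Int32Array {
  return new Int32Array(new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT));
}

// Main-thread side.
function cancel(flag: Int32Array): void {
  Atomics.store(flag, 0, CANCELLED);
}

function reset(flag: Int32Array): void {
  Atomics.store(flag, 0, 0);
}

// Worker side: process the data in chunks and poll the flag between chunks.
function sumUnlessCancelled(data: Float64Array, flag: Int32Array, chunk = 1024): number | null {
  let total = 0;
  for (let i = 0; i < data.length; i += chunk) {
    if (Atomics.load(flag, 0) === CANCELLED) return null; // exit early, worker stays alive
    const end = Math.min(i + chunk, data.length);
    total += data.subarray(i, end).reduce((acc, v) => acc + v, 0);
  }
  return total;
}
```

The chunk size sets the trade-off: smaller chunks react to cancellation sooner but poll the flag more often.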

Tasks fan out from one queue across N reused workers (sized hardwareConcurrency − 1) and fan back in to a single result collector. The pool amortises the 5–20 ms startup across every job and bounds total memory; the queue is where backpressure and priority routing live.

Worker pool design

Pool size rule of thumb: hardwareConcurrency − 1
Worker startup: 5–20 ms + script parse
postMessage task-hop latency: ~1 ms each direction
Empty worker memory: A few MB
Worker + large library: Tens of MB
Leaked worker (forgotten terminate): Persists until tab close

The DOM-in-a-worker mistake

The single most common worker mistake is architectural: teams reach for a worker to “speed up rendering” and discover the worker cannot touch the DOM at all. A worker can compute what should render but never render it — the result must be posted back and applied by the main thread.

If the bottleneck is DOM mutation itself (10 000 nodes inserted, a giant React reconciliation), a worker does not help — the expensive part has to happen on the main thread regardless. Workers help when the bottleneck is pure computation that produces a small result:

Parse 5 MB JSON (worker) → post back 200-row array (cheap) → main thread renders 200 rows (fast). ✓
React reconciliation of 10 000 nodes (bottleneck is the commit, not derivation) — worker cannot help. ✗

The exception: OffscreenCanvas. Canvas rendering can be done from a worker. Transfer an OffscreenCanvas and the worker draws 2D or WebGL entirely off the main thread.

Comlink and the RPC illusion

Comlink makes await worker.heavyCompute(data) look like a local call by wrapping a worker in a Proxy. This is ergonomic, but the abstraction hides two costs that still matter:

Every argument is structured-cloned unless explicitly wrapped with Comlink.transfer.
Every call is a task hop each way — a round-trip message between threads.

The illusion breaks for chatty interfaces — a worker API with many small methods called in a tight loop pays a task hop per call and serialises the program on the round-trips. Design worker interfaces coarse: one call that does a batch of work and returns a batch of results, not many fine-grained calls. Same principle as network API design: minimise round-trips, maximise work per round-trip.
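A back-of-the-envelope model makes the difference concrete. It assumes the roughly 1 ms per hop quoted in this lesson and ignores clone cost; the interface and function names are hypothetical and are not part of Comlink.

```typescript
// Chatty shape: one round-trip per pixel row (hypothetical API).
interface ChattyFilterApi {
  applyToRow(row: Uint8ClampedArray): Promise<Uint8ClampedArray>;
}

// Coarse shape: one round-trip per image (hypothetical API).
interface CoarseFilterApi {
  applyToImage(rows: Uint8ClampedArray[]): Promise<Uint8ClampedArray[]>;
}

// Each awaited call pays one hop to the worker and one hop back.
function roundTripOverheadMs(calls: number, hopMs = 1): number {
  return calls * 2 * hopMs;
}

const chattyOverhead = roundTripOverheadMs(1080); // one call per row of a 1080-row image
const coarseOverhead = roundTripOverheadMs(1); // one call for the whole image
```

Under these assumptions the chatty interface spends about 2160 ms on messaging alone, while the coarse one spends about 2 ms, before any actual filtering happens.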

Production observability

Workers that look fine in local development can hide latency surprises in production: a fast dev machine understates clone cost and task-hop latency by a factor of 3–5 compared to a mid-range mobile device.

Each worker and each service worker is a separate context in DevTools. Web workers appear in the Sources panel thread list. Service workers have a dedicated panel under Application → Service Workers.

Telemetry across threads:

Instrument both sides of every postMessage with timestamps. Measure real task-hop latency and clone cost in production — local dev on a fast machine systematically understates both.
Track service worker fetch-handler duration. A slow handler delays every navigation and resource load on the page, and because it runs before the main thread sees the response, a regression there is invisible to ordinary main-thread profiling.
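The postMessage instrumentation can be sketched as a timestamped envelope. The Timed shape and helper names are assumptions for illustration. Timestamps are passed in as arguments so the helpers stay testable; in a browser each side would pass performance.timeOrigin + performance.now(), because performance.now() alone is relative to each context's own time origin.

```typescript
// Envelope carried with every message (hypothetical shape).
interface Timed<T> {
  payload: T;
  sentAt: number; // ms on a clock both threads share
}

// Sender side: wrap the payload just before postMessage.
function stamp<T>(payload: T, nowMs: number): Timed<T> {
  return { payload, sentAt: nowMs };
}

// Receiver side: call first thing in the message handler.
function hopLatencyMs<T>(msg: Timed<T>, receivedAtMs: number): number {
  return receivedAtMs - msg.sentAt;
}

// Nearest-rank percentile, e.g. to report p95 hop latency per device class.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new RangeError("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}
```

Reporting percentiles rather than averages keeps slow devices visible instead of letting fast ones hide them.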

Worker leak detection:

A component that creates a worker in useEffect without terminating it in the cleanup function leaks a worker on every remount. After a dozen navigations, you have accumulated a set of orphaned workers, all still alive and holding memory, that nobody created intentionally.
Profile with DevTools → Performance → Threads to see all active workers. Unexpected idle threads are leaks.
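One way to make leaks both harder to write and easier to count is to route worker creation through a small registry. This is a sketch: WorkerRegistry and Terminable are hypothetical names, and the React usage in the trailing comment assumes a component that owns one worker.

```typescript
// The only part of Worker this sketch relies on.
interface Terminable {
  terminate(): void;
}

class WorkerRegistry {
  private live = new Set<Terminable>();

  // Creates the worker and returns a cleanup function of the kind an effect returns.
  track<W extends Terminable>(create: () => W): { worker: W; dispose: () => void } {
    const worker = create();
    this.live.add(worker);
    const dispose = (): void => {
      // Safe to call more than once: terminate only on the first call.
      if (this.live.delete(worker)) worker.terminate();
    };
    return { worker, dispose };
  }

  // Report this in telemetry; a count that only grows suggests a leak.
  get liveCount(): number {
    return this.live.size;
  }
}

// Inside a component (React assumed, workerUrl is a placeholder):
// useEffect(() => {
//   const { worker, dispose } = registry.track(() => new Worker(workerUrl));
//   return dispose;
// }, []);
```

Returning dispose directly from the effect means the worker is terminated on unmount without any extra bookkeeping in the component.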

Pick the best fit

You need to run a 400 ms image-processing job triggered by a button click, without freezing the page. Pick the approach.

Design challenge

Design the threading architecture for a browser-based video editor: a 4K timeline, real-time filter previews, and export. It must stay at 60 fps during scrubbing and never freeze the UI.

Main thread reserved for DOM, input, and the timeline UI only.
Filter previews must update within 100 ms of a parameter change.
Export of a multi-minute clip must not block the UI and must show progress.
Large frame buffers must cross threads without per-frame clone cost.
The app must load instantly on repeat visits and survive a page reload mid-edit.
Multithreaded WASM is used for the codec.

Why this works

Why is navigator.hardwareConcurrency − 1 the pool size rule? Using all N cores for workers starves the main thread — rendering, input, and your JS all run there. Leaving one core free for the main thread keeps 60 fps animation and input handling smooth while the worker pool runs at full utilisation. On a device with 2 cores the pool is 1 worker; on an 8-core machine it is 7. This is the same reasoning as leaving one CPU for the OS scheduler in server deployments. The − 1 is heuristic, not law — workloads with very short tasks may benefit from a smaller pool (less contention per core); workloads with I/O-bound workers may benefit from a larger one. Profile first.

Recall before you leave

01 A teammate proposes moving a slow React re-render into a web worker to fix jank. Explain why this will not work and what actually will.
02 What is the Comlink task-hop problem and how do you design around it?
03 How do you detect and prevent worker leaks in a React application?

Recap

Worker pools amortise the 5–20 ms startup cost and bound memory; size the pool to navigator.hardwareConcurrency − 1. Add backpressure (cap the queue, reject or drop when full) and priority routing (interactive jobs before background). Comlink removes postMessage boilerplate but hides clone and task-hop costs, so keep worker APIs coarse to minimise round-trips. Workers cannot speed up DOM mutation, only pure computation. In production, instrument postMessage timestamps on both sides to measure real latency; track service worker fetch-handler duration, as it is invisible to main-thread profiling; watch for worker leaks in SPA components. When a worker integration works on your machine but regresses on mobile, you know which two numbers to measure first: the clone cost and the task-hop round-trip.
