Async vs blocking: code and trace reading

Read real Node handlers and a perf signal, predict how each interacts with the event loop, and pick the fix a senior engineer reaches for first.

Blocking is diagnosed in handlers and in the lag histogram, not in the abstract. Read each snippet, predict what it does to the single loop thread, and choose the change a senior engineer would make before reaching for any knob.

Goal

Practise the loop you run in every freeze incident: spot the synchronous span or the unbounded fan-out on the hot path, name why it stalls the loop, and reach for the highest-leverage fix — async API, worker thread, or concurrency cap.

Snippet 1 — the synchronous login

import bcrypt from "bcrypt";
import fs from "fs";

app.post("/login", (req, res) => {
  const policy = fs.readFileSync("./password-policy.json", "utf8"); // sync read
  const hash = bcrypt.hashSync(req.body.password, 12);              // ~250 ms CPU
  // ...verify and respond...
});

Quiz

Under login load this handler tanks throughput for every route, not just /login. What is happening, and the highest-leverage fix?
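
One possible shape of the fix, as a minimal sketch rather than the lesson's reference answer: move both the file read and the hash off the loop thread by switching to async APIs. To stay dependency-free, the sketch swaps bcrypt for Node's built-in `scrypt`. The helper names (`loadPolicy`, `hashPassword`, `verifyPassword`) and the `salt:key` storage format are assumptions made for illustration.

```javascript
import { readFile } from "node:fs/promises";
import { scrypt, randomBytes, timingSafeEqual } from "node:crypto";
import { promisify } from "node:util";

// Async scrypt does its CPU work on the libuv thread pool,
// so the event loop stays free to serve other routes meanwhile.
const scryptAsync = promisify(scrypt);

// Hypothetical helper: read the policy without blocking the loop.
// For a static file, calling this once at startup is usually enough.
function loadPolicy(path) {
  return readFile(path, "utf8").then(JSON.parse);
}

// Hypothetical helper: returns "saltHex:keyHex" (format assumed for this sketch).
async function hashPassword(password) {
  const salt = randomBytes(16);
  const key = await scryptAsync(password, salt, 64);
  return `${salt.toString("hex")}:${key.toString("hex")}`;
}

// Hypothetical helper: recompute the key with the stored salt and compare in constant time.
async function verifyPassword(password, stored) {
  const [saltHex, keyHex] = stored.split(":");
  const expected = Buffer.from(keyHex, "hex");
  const actual = await scryptAsync(password, Buffer.from(saltHex, "hex"), expected.length);
  return timingSafeEqual(actual, expected);
}
```

With the `bcrypt` package from the snippet, the promise-returning `bcrypt.hash(password, 12)` and `bcrypt.compare(password, hash)` would play the same role inside an `async` handler.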

Snippet 2 — the resize on the libuv pool

// Team's "fix" after profiling a slow image endpoint:
process.env.UV_THREADPOOL_SIZE = "32";

app.post("/resize", (req, res) => {
  const out = resizeImageSync(req.body.buffer, 1024, 768); // pure-JS pixel loop
  res.send(out);
});

Quiz

The team raised UV_THREADPOOL_SIZE to 32 expecting the resize to parallelise. It changed nothing. Why, and what is the correct fix?
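
A minimal sketch of the worker-thread route, under stated assumptions: `resizeImageSync` is not available here, so a byte-inverting loop stands in for the pure-JS pixel loop, and the worker source is inlined with `eval: true` only to keep the example in one file. The function name `invertInWorker` is hypothetical.

```javascript
import { Worker } from "node:worker_threads";

// Worker source inlined for the sketch; a real project would keep it in its own module.
// With eval: true the string runs as a CommonJS script, hence require().
const workerSource = `
  const { parentPort, workerData } = require("node:worker_threads");
  const pixels = workerData.pixels;
  const out = new Uint8Array(pixels.length);
  // Stand-in for the CPU-bound pixel loop: invert every byte.
  for (let i = 0; i < pixels.length; i++) out[i] = 255 - pixels[i];
  parentPort.postMessage(out);
`;

// Runs the loop on a separate thread and resolves with the result,
// so the main event loop keeps serving requests while pixels are processed.
function invertInWorker(pixels) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { pixels } });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

Spawning a worker per request has its own startup cost, so a production version would more likely reuse a small pool of workers. The sketch only shows where the CPU-bound loop has to run for the main thread to stay responsive.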

Snippet 3 — the worker-pool starvation

// Worker pool sized to the machine's 4 cores:
const pool = new WorkerPool({ size: 4 });

app.get("/report/:id", async (req, res) => {
  // Each report = one CPU-heavy aggregation task on the pool.
  const result = await pool.run("aggregate", req.params.id); // may take ~2 s
  res.json(result);
});

Quiz

Under a burst of report requests, every endpoint — including cheap ones that also use the pool — sees latency climb into seconds, then time out. What is the failure mode?
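
One way to keep a burst from piling up behind the pool is to put a bounded, timed queue in front of it and shed load once it is full. The sketch below assumes nothing about the `WorkerPool` API in the snippet. `BoundedLimiter` and its options (`concurrency`, `maxQueue`, `queueTimeoutMs`) are hypothetical names for illustration.

```javascript
// Hypothetical limiter: at most `concurrency` tasks run at once, at most
// `maxQueue` tasks wait, and a waiting task is rejected after `queueTimeoutMs`.
class BoundedLimiter {
  constructor({ concurrency, maxQueue, queueTimeoutMs }) {
    this.concurrency = concurrency;
    this.maxQueue = maxQueue;
    this.queueTimeoutMs = queueTimeoutMs;
    this.active = 0;
    this.waiting = [];
  }

  run(task) {
    // Shed load immediately instead of letting the queue grow without bound.
    if (this.active >= this.concurrency && this.waiting.length >= this.maxQueue) {
      return Promise.reject(new Error("queue full"));
    }
    return new Promise((resolve, reject) => {
      let timer;
      const start = () => {
        clearTimeout(timer);
        this.active++;
        Promise.resolve()
          .then(task)
          .then(resolve, reject)
          .finally(() => {
            this.active--;
            const next = this.waiting.shift();
            if (next) next();
          });
      };
      if (this.active < this.concurrency) return start();
      this.waiting.push(start);
      // Give up on tasks that have waited too long; the caller has likely timed out already.
      timer = setTimeout(() => {
        const index = this.waiting.indexOf(start);
        if (index !== -1) this.waiting.splice(index, 1);
        reject(new Error("queue timeout"));
      }, this.queueTimeoutMs);
    });
  }
}
```

In the report handler this could wrap the existing call, for example `limiter.run(() => pool.run("aggregate", req.params.id))`, answering with a 503 when the limiter rejects. Cheap endpoints would stay off the heavy pool entirely or get a separate limiter.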

Snippet 4 — measuring the freeze

import { monitorEventLoopDelay } from "node:perf_hooks";

const h = monitorEventLoopDelay();
h.enable();
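setInterval(() => {
  console.log("loop delay p99 (ms):", h.percentile(99) / 1e6); // histogram values are in nanoseconds
  h.reset(); // start a fresh window so each line reflects only the last second
}, 1000);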
// Sample output during a bad reporting request:
// loop delay p99 (ms): 812

Quiz

CPU sits at a calm ~55% while this prints a p99 loop delay of 812 ms. Which reading is correct?
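
To see why the histogram is the signal to trust, it can help to reproduce a freeze in isolation. The sketch below is an illustration under assumptions, not part of the lesson's service: `blockFor` and `measureBlockedLoop` are hypothetical helpers that hold the loop thread with a synchronous busy-wait and then read back what `monitorEventLoopDelay` recorded.

```javascript
import { monitorEventLoopDelay } from "node:perf_hooks";
import { setTimeout as sleep } from "node:timers/promises";

// Hypothetical helper: a synchronous span that holds the loop thread for `ms`.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait: no timers, I/O callbacks, or other requests can run here
  }
}

// Hypothetical helper: sample the loop, block it once, and report what was recorded.
async function measureBlockedLoop(blockMs) {
  const h = monitorEventLoopDelay({ resolution: 10 });
  h.enable();
  await sleep(50);   // let the sampler take a few baseline readings
  blockFor(blockMs); // the freeze under test
  await sleep(50);   // give the sampler a chance to record the late tick
  h.disable();
  // Histogram values are nanoseconds; convert to milliseconds.
  return { maxMs: h.max / 1e6, p99Ms: h.percentile(99) / 1e6 };
}
```

Calling `await measureBlockedLoop(200)` should report a maximum delay on the order of the blocked span. Process-wide CPU figures are averaged across cores and threads, so they can stay moderate even while one synchronous span keeps every request on the loop waiting.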

Recap

Every freeze is read in handlers and in the lag histogram. Synchronous I/O and sync crypto on the request path stall the loop and must move to async APIs, which run on the libuv pool. CPU-bound JS gains nothing from a bigger libuv pool and belongs in a worker thread. A fixed worker pool starves under a burst of heavy tasks, so bound its queue, put timeouts on queued work, and keep fast work off it. Event-loop delay, not CPU, is the metric that matches the timeouts users feel. Diagnose from the signal, fix the blocking span, then re-measure.
