Backend Architecture

Async vs blocking: unfreeze the loop

Crux: a hands-on project. Build a service that blocks its own event loop, instrument the lag, then offload CPU, bound the fan-out, and prove the tail recovered with before/after numbers.
◷ 240 min

Reading about a frozen loop is not the same as pulling a service out of one. Build a small server that blocks itself in three different ways, watch a trivial health check time out, then apply the unit’s fixes — offload, bound, backpressure — until the tail comes back, with evidence at every step.

Goal

Turn the unit’s mental model into a reproducible engineering loop: instrument event-loop lag and the tail, reproduce a self-inflicted freeze, move CPU work off the loop, bound the fan-out, and verify with before/after numbers under identical load.
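The "instrument event-loop lag" step might look like the minimal sketch below. It assumes Node.js and uses `monitorEventLoopDelay` and `performance.eventLoopUtilization` from `perf_hooks`. The helpers `blockFor` and `measureLag`, the 10 ms resolution, and the 100 ms settle time are hypothetical choices made for illustration, not part of the unit.

```javascript
const { monitorEventLoopDelay, performance } = require('node:perf_hooks');

// Hypothetical offender: busy-wait to simulate a synchronous CPU span.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin on purpose */ }
}

// Sample event-loop delay and ELU around `work`, then report the results.
async function measureLag(work, settleMs = 100) {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  const eluStart = performance.eventLoopUtilization();
  histogram.enable();
  // Let the sampling timer take a few idle samples first.
  await new Promise((resolve) => setTimeout(resolve, settleMs));
  work();
  // Yield so the delayed sample that spans the stall gets recorded.
  await new Promise((resolve) => setTimeout(resolve, settleMs));
  histogram.disable();
  const elu = performance.eventLoopUtilization(eluStart);
  return {
    p99Ms: histogram.percentile(99) / 1e6, // histogram values are nanoseconds
    maxMs: histogram.max / 1e6,
    elu: elu.utilization, // fraction of the window the loop spent busy
  };
}

// Usage: measureLag(() => blockFor(200)).then(console.log);
```

In a real service you would keep one histogram enabled for the process lifetime and export its percentiles on an interval, rather than wrapping a single call as this sketch does.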

Project
Objective

Take a deliberately loop-blocking HTTP service (your own or a starter) with a trivial /health endpoint and drive it into event-loop lag and tail-latency collapse. Then apply the unit's fixes (offload CPU, bound concurrency, honour backpressure) until p99 /health latency stays under your target and event-loop delay p99 stays under ~50 ms at sustained load, proving each step with measurements.
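The "offload CPU" fix could be sketched as follows, assuming Node.js `worker_threads`. The naive Fibonacci task, the inline worker source, and the name `fibInWorker` are hypothetical stand-ins for whatever CPU span your service actually runs.

```javascript
const { Worker } = require('node:worker_threads');

// Worker source kept inline (eval: true) so the sketch stays in one file.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  // Hypothetical CPU-bound task: naive recursive Fibonacci.
  function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
  parentPort.postMessage(fib(workerData));
`;

// Run the CPU span on a separate thread; the main loop only awaits the result.
function fibInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      // A non-zero exit without a message means the task never finished.
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// Usage: fibInWorker(35).then((value) => console.log(value));
```

Spawning a worker per request has a real startup cost, so a production version would more likely reuse a small fixed pool of workers. The sketch only shows that the blocking span leaves the main thread.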

Requirements
Acceptance criteria
  • A before/after table per route: event-loop delay p99, event-loop utilization (ELU), request p99 and p99.9, and in-flight concurrency, all measured under the same load, not estimated.
  • A demonstration that /health stays fast (p99 under target) while every offender is hammered, proving the loop no longer head-of-line-blocks the whole process.
  • Event-loop delay p99 holds under ~50 ms at sustained load and the freeze signature is gone from the lag histogram.
  • A one-paragraph write-up naming the fix used for each offender (async API vs worker thread vs concurrency cap vs backpressure) and why it ranked above tuning UV_THREADPOOL_SIZE or adding cores.
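For the before/after table, request percentiles can be computed from raw latency samples. The sketch below uses the nearest-rank definition; the names `percentile` and `summarize` are hypothetical, and load-testing tools may interpolate differently, so use the same method for both columns of the table.

```javascript
// Nearest-rank percentile over raw latency samples (any consistent unit).
function percentile(samples, p) {
  if (samples.length === 0) throw new RangeError('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  // The small epsilon guards against float noise such as 999.0000000000001.
  const rank = Math.ceil((p * sorted.length) / 100 - 1e-9);
  return sorted[Math.max(rank, 1) - 1];
}

// One table row for a route: run it once before the fix and once after.
function summarize(route, samples) {
  return {
    route,
    count: samples.length,
    p50: percentile(samples, 50),
    p99: percentile(samples, 99),
    p999: percentile(samples, 99.9),
  };
}

// Usage: console.table([summarize('/health', beforeMs), summarize('/health', afterMs)]);
```

p99.9 only means something with enough samples: with fewer than about a thousand requests per route it collapses into the maximum.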
Senior stretch
  • Add a ReDoS offender (a catastrophic-backtracking regex on a user-controlled field), show one crafted request freezes the loop for seconds, then fix it with a safe regex / input validation / match timeout and prove the freeze is gone.
  • Add a one-page on-call runbook: triage from the four panels, the question 'is this span bounded and fast or could it run tens of ms on a big input?', the offload-vs-chunk-vs-bound decision, and a verification checklist.
  • Run the service under cluster (or multiple instances behind a load balancer) and show how 'one loop is one core' changes the saturation point and tail under the same load.
  • Reproduce the same blocking workload on a second runtime (Go goroutines or Java virtual threads) and compare how the identical CPU span and fan-out manifest under a preemptive, multicore scheduler.
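For the ReDoS stretch item, a minimal sketch of an offender and one possible fix is shown below. It assumes a backtracking regex engine such as the default one in V8. The patterns, the helpers `timeMatch` and `validateField`, and the 64-character limit are hypothetical illustrations.

```javascript
// Vulnerable: nested quantifiers let a backtracking engine try
// exponentially many ways to split a run of 'a' when the match fails.
const VULNERABLE = /^(a+)+$/;
// Accepts the same strings without nesting, so failure is found quickly.
const SAFE = /^a+$/;

// Time a single synchronous match; the loop is blocked for the whole call.
function timeMatch(regex, input) {
  const start = process.hrtime.bigint();
  const matched = regex.test(input);
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  return { matched, ms };
}

// Input-validation guard: cap the length before any regex runs.
function validateField(value, maxLength = 64) {
  return typeof value === 'string' && value.length <= maxLength && SAFE.test(value);
}

// Usage: grow the crafted input a few characters at a time and watch the
// time for VULNERABLE climb while SAFE stays flat.
// const crafted = 'a'.repeat(24) + '!';
// console.log(timeMatch(VULNERABLE, crafted), timeMatch(SAFE, crafted));
```

Increase the input length gradually when you reproduce this: with this pattern each extra character roughly doubles the work, so a slightly longer input can freeze the process for a very long time.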
Recap

This is the loop you will run in every real freeze incident: instrument event-loop delay and the tail first, reproduce the self-inflicted block, then fix at the right layer — async API or worker thread for CPU work, a concurrency cap for fan-out, pipeline backpressure for streams — never a bigger libuv pool for JS CPU and never more cores for one loop. Verify with before/after numbers under identical load, with a trivial /health route as the canary. Doing it once on a toy service makes the production version muscle memory.
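The "concurrency cap for fan-out" fix could be sketched as a small helper that never keeps more than `limit` calls in flight. The name `mapWithLimit` is hypothetical, and the sketch assumes `task` returns a promise. It does not cancel remaining work when one task rejects.

```javascript
// Run `task` over `items` with at most `limit` calls in flight at once.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims one item at a time until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  const runners = Array.from(
    { length: Math.min(limit, items.length) },
    () => runner(),
  );
  await Promise.all(runners);
  return results; // same order as `items`
}

// Usage: const bodies = await mapWithLimit(urls, 8, (url) => fetchBody(url));
// (`fetchBody` stands in for your own async call.)
```

Compared with `Promise.all(items.map(task))`, which starts every call at once, this keeps in-flight concurrency at a fixed ceiling you can put in the before/after table.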

