
Backend Architecture

Blocking vs non-blocking I/O: two ways to wait

Crux: A server spends most of its life waiting on I/O. Blocking I/O parks a whole thread on each wait, so concurrency costs memory; non-blocking I/O hands the wait to the kernel and lets one thread juggle thousands of sockets through an event loop.

Time a typical request handler and the surprise is how little of it is your code. It reads a row from Postgres, calls a payment API, writes a log line, and can easily spend 95% of its wall-clock time doing nothing but waiting for those calls to come back. The whole game of backend concurrency is: what does the program do while it waits? Two answers split the entire field. One parks a thread on every wait. The other refuses to park anyone and asks the kernel to tap it on the shoulder when data is ready.

Waiting is the job

A backend is mostly an I/O machine. Disk reads, database queries, outbound HTTP, socket writes — each is slow relative to the CPU (microseconds to milliseconds, while the CPU runs billions of instructions a second). So the design question is never “how fast is my code” first; it is “how does the runtime spend the wait.” Two I/O models give opposite answers, and the choice shapes how the server scales, how much memory it eats, and how it fails under load.
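To make the gap between waiting and computing concrete, here is a minimal sketch that compares wall-clock time with CPU time for one call. The `handler` function is hypothetical, and `time.sleep` stands in for a database or HTTP round trip; the exact numbers will vary by machine.

```python
import time

def handler() -> None:
    """Hypothetical request handler: a little CPU work, then a simulated I/O wait."""
    sum(i * i for i in range(10_000))   # the part that is "your code"
    time.sleep(0.05)                    # stand-in for a database or HTTP round trip

wall_start = time.perf_counter()        # wall-clock timer
cpu_start = time.process_time()         # CPU time used by this process only
handler()
wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

print(f"wall-clock: {wall * 1000:.1f} ms, CPU: {cpu * 1000:.1f} ms")
print(f"share of time spent waiting: {1 - cpu / wall:.0%}")
```

The sleep barely registers in the CPU figure, which is the point: during that interval the process is idle and the runtime decides what, if anything, happens instead.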

Blocking I/O: one thread per connection

In the blocking model, a thread calls read() and the operating system suspends that thread until bytes arrive. The thread is parked — consuming its stack and a scheduler slot — doing nothing useful. To serve a second connection concurrently you need a second thread, a third needs a third, and so on: thread-per-connection.
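As an illustration, a thread-per-connection echo server might look like the sketch below. The function names (`handle`, `serve`, `start_server`) are made up for this example, and error handling is kept to the minimum needed to read it.

```python
import socket
import threading

def handle(conn: socket.socket) -> None:
    """Serve one connection; recv() blocks this thread until bytes arrive."""
    with conn:
        while True:
            data = conn.recv(1024)      # thread is parked here while waiting
            if not data:                # empty read: peer closed the connection
                return
            conn.sendall(data)          # echo the bytes back

def serve(listener: socket.socket) -> None:
    """Accept loop: one new thread per accepted connection."""
    while True:
        try:
            conn, _addr = listener.accept()   # blocks until a client connects
        except OSError:                       # listener was closed: stop serving
            return
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

def start_server() -> socket.socket:
    """Bind to a free local port and run the accept loop in the background."""
    listener = socket.create_server(("127.0.0.1", 0))   # port 0: OS picks a port
    threading.Thread(target=serve, args=(listener,), daemon=True).start()
    return listener
```

Each handler reads top to bottom, and every idle client still holds one parked thread for as long as it stays connected.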

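This is simple and easy to reason about: the code reads top to bottom, and each line waits for the last. But it scales by adding threads, and threads are not free. Each OS thread reserves a stack, commonly around 1–2 MB depending on the platform and runtime defaults, so 10,000 concurrent connections imply on the order of 10+ GB reserved just for stacks, plus thousands of context switches per second as the scheduler shuffles threads in and out. Only the stack pages a thread actually touches are backed by physical memory, so resident usage is usually lower than the reservation, but it still grows with every thread. The model trades memory for simplicity.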
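The stack arithmetic can be checked in a few lines. This is a back-of-the-envelope sketch that assumes one thread per connection and decimal units (1 GB = 1000 MB); `stack_reservation_gb` is a name invented for the example.

```python
def stack_reservation_gb(connections: int, stack_mb: float) -> float:
    """Stack space reserved, in GB, with one thread per connection."""
    return connections * stack_mb / 1000

for stack_mb in (1, 2):
    total = stack_reservation_gb(10_000, stack_mb)
    print(f"10,000 threads x {stack_mb} MB = {total:.0f} GB")
```

The result scales linearly with both inputs, so doubling either the connection count or the per-thread stack size doubles the reservation.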

Non-blocking I/O: one thread, many sockets

In the non-blocking model, a socket is set to non-blocking mode and read() returns immediately, either with data or with “not ready yet” (the EAGAIN/EWOULDBLOCK error). Instead of parking, the thread registers interest in many sockets with a kernel facility (epoll on Linux, kqueue on BSD/macOS) and asks one question: “which of these thousands of file descriptors are ready right now?” The kernel returns only the ready ones, at a cost that grows with the number of ready descriptors rather than the number being watched. The thread services those, then asks again. That loop is the event loop.
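The "returns immediately" behavior is easy to observe directly. The sketch below assumes a Unix-like system, where `socket.socketpair()` gives two already connected sockets that stand in for a client and a server connection; Python surfaces the "not ready yet" result as `BlockingIOError`.

```python
import socket

# A connected pair of sockets stands in for a client and a server connection.
server_side, client_side = socket.socketpair()
server_side.setblocking(False)          # switch this socket to non-blocking mode

try:
    server_side.recv(1024)              # nothing has been sent yet
    first_attempt = "data"
except BlockingIOError:                 # Python's form of EAGAIN / EWOULDBLOCK
    first_attempt = "not ready yet"

client_side.sendall(b"ping")            # now there is something to read
second_attempt = server_side.recv(1024)

print(first_attempt)
print(second_attempt)

server_side.close()
client_side.close()
```

In neither case does the calling thread get suspended: the first call fails fast instead of parking, which is what leaves the thread free to look at other sockets.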

One thread can therefore drive tens of thousands of connections, because it only ever touches sockets that have actual work. The cost is a different shape of code: you cannot read top-to-bottom and “wait” — you register a callback (or await) and the loop calls you back later. Logic that was a straight line becomes a set of continuations.
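A minimal sketch of that callback shape, using Python's standard `selectors` module (which picks epoll or kqueue where available). The helper names `watch`, `on_readable`, and `run_once` are assumptions made for this example, and writes are assumed small enough not to block.

```python
import selectors
import socket

sel = selectors.DefaultSelector()       # epoll on Linux, kqueue on BSD/macOS

def on_readable(conn: socket.socket) -> None:
    """Callback: runs only after the kernel reports that conn is readable."""
    data = conn.recv(1024)              # socket was reported ready to read
    if data:
        conn.sendall(data.upper())      # reply; small writes assumed not to block
    else:                               # empty read: peer closed the connection
        sel.unregister(conn)
        conn.close()

def watch(conn: socket.socket) -> None:
    """Set non-blocking mode and register interest, attaching the callback."""
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, data=on_readable)

def run_once(timeout: float = 1.0) -> int:
    """One turn of the loop: ask which sockets are ready, run their callbacks."""
    events = sel.select(timeout)        # returns only the ready descriptors
    for key, _mask in events:
        key.data(key.fileobj)           # call the callback stored at registration
    return len(events)
```

A real server would call `run_once()` inside a `while True:` loop. Notice that no function here waits for a specific client; the logic for one connection lives in a callback that the loop invokes whenever that connection has work.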

Why this works

Why does the kernel facility matter so much? The older way to watch many sockets is select/poll: on every call you hand the kernel the full list of descriptors and it checks each one (“ready? ready? ready?”). That costs O(n) per call, so watching 10,000 sockets means scanning 10,000 every time even if only one is ready. epoll/kqueue invert this: you register the set once, and the kernel hands back only the descriptors that became ready, so the cost tracks the number of active connections, not total connections. This is the mechanism that makes “one thread, 50,000 idle keep-alive connections” actually cheap: the idle ones cost almost nothing because the loop never visits them until they have data.
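The difference in API shape shows up even in a small experiment. This sketch assumes a Unix-like system and uses 100 socket pairs as stand-ins for mostly idle connections; the count and the index of the active connection are arbitrary choices for the example.

```python
import select
import selectors
import socket

N = 100  # hypothetical number of mostly idle connections
pairs = [socket.socketpair() for _ in range(N)]
watched = [server_side for server_side, _client in pairs]

pairs[42][1].sendall(b"x")              # only one connection has data waiting

# select(): the full list is handed to the kernel on every call and each
# descriptor is checked, so the cost per call grows with N.
readable, _, _ = select.select(watched, [], [], 1.0)

# epoll/kqueue style via selectors: register the set once...
sel = selectors.DefaultSelector()
for s in watched:
    sel.register(s, selectors.EVENT_READ)
# ...then each call reports only the descriptors that are ready.
ready = [key.fileobj for key, _mask in sel.select(timeout=1.0)]

print(len(watched), len(readable), len(ready))

sel.close()
for a, b in pairs:
    a.close()
    b.close()
```

Both calls report the same single ready socket. What differs is the work behind each call: `select.select` receives all 100 descriptors every time, while the selector was told about them once and later calls carry no per-descriptor list.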

The C10k framing and the real tradeoff

This split was named by the C10k problem (~1999): how do you serve 10,000 concurrent clients on one box? Thread-per-connection hit a memory and context-switch wall; the event-loop model — Nginx, Node.js, Netty, Redis — was the answer. The honest summary:

  • Blocking / thread-per-connection trades memory and context-switch overhead for simplicity. Great when connection counts are modest or work is CPU-heavy; the code stays linear.
  • Non-blocking / event loop trades code complexity (callbacks, continuations, no parking) for scalability under many concurrent, mostly-idle connections.

Neither is universally “faster.” For I/O-bound workloads with high concurrency, the event loop wins decisively on memory and connection count. For CPU-bound work, a single event-loop thread is no faster than any other single thread — a limit the next lessons make sharp.
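The CPU-bound limit can be seen in a small deterministic sketch using `asyncio`, chosen here only as a convenient single-threaded event loop. The task names and the chunk sizes are invented for the example; the point is the order in which the log entries appear.

```python
import asyncio

async def cpu_heavy(log: list, cooperative: bool) -> None:
    """CPU-bound task: nothing to wait on, so the loop cannot interleave it."""
    log.append("cpu start")
    for _ in range(3):
        sum(i * i for i in range(50_000))   # pure computation, no I/O
        if cooperative:
            await asyncio.sleep(0)          # explicitly hand control back
    log.append("cpu end")

async def io_task(log: list) -> None:
    """Stand-in for a quick handler that is ready to run right away."""
    log.append("io ran")

async def run(cooperative: bool) -> list:
    log: list = []
    await asyncio.gather(cpu_heavy(log, cooperative), io_task(log))
    return log

hogging = asyncio.run(run(cooperative=False))
yielding = asyncio.run(run(cooperative=True))
print(hogging)    # ['cpu start', 'cpu end', 'io ran']
print(yielding)   # ['cpu start', 'io ran', 'cpu end']
```

Without a yield point, the quick task has to wait for the whole computation to finish, even though it was ready the entire time. Yielding changes the ordering but not the total CPU work, which still runs on one thread.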

| | Blocking (thread-per-connection) | Non-blocking (event loop) |
|---|---|---|
| Waiting | Thread parked by OS | Kernel watches FDs, thread moves on |
| 10k connections | ~10+ GB stacks, many context switches | One thread, memory ~ active conns |
| Code shape | Linear, top-to-bottom | Callbacks / await, continuations |
| Scales by | Adding threads | Adding ready-event throughput |
| Best for | Modest concurrency, CPU-heavy | High concurrency, I/O-bound |
Quiz

Why does a thread-per-connection server struggle to hold 50,000 mostly-idle keep-alive connections?

Quiz

What does `epoll`/`kqueue` give the event loop that a naive `select`/`poll` scan does not?

Order the steps

Order what a non-blocking server does to serve a read on one of many sockets:

  1. Set the socket to non-blocking mode and register it with epoll/kqueue
  2. Ask the kernel which of the watched descriptors are ready
  3. Kernel returns only the ready descriptors
  4. Run the callback for each ready socket, reading the available bytes
  5. Loop back and ask the kernel again
Recall before you leave
  1. Why is 'how the runtime spends the wait' the central question for a backend, rather than raw code speed?
  2. How does blocking thread-per-connection work and what is its scaling cost?
  3. How does non-blocking I/O with an event loop serve many connections on one thread, and what does epoll/kqueue contribute?
Recap

A backend spends most of its life waiting on I/O, so the model for how it waits decides everything downstream. Blocking I/O parks a thread on each wait: linear, easy code, but each thread costs roughly 1–2 MB of stack and a scheduler slot, so thread-per-connection turns 10,000 connections into 10+ GB of stacks and a storm of context switches. That is memory traded for simplicity. Non-blocking I/O puts sockets in non-blocking mode, so reads return immediately, and registers them with epoll or kqueue so that one thread asks the kernel which descriptors are ready and services only those. It scales to tens of thousands of connections because idle ones cost almost nothing, at the price of callback- or await-shaped code. The C10k problem named this divide, and the event loop became the standard answer for high-concurrency I/O-bound servers. The next lesson opens that loop up: the ordered phases it runs, the microtask queue it drains between them, and why this concurrency is cooperative rather than parallel.
