Streaming and backpressure: when the client reads slower than you write

A response is not done when the handler returns — it is done when the bytes drain. If the client reads slower than the server writes and backpressure is ignored, the response buffers in RAM until the process dies.

A reporting service streams a 400 MB CSV export. It works in tests. In production it OOMs and restarts every afternoon. The export logic looks fine: it reads rows and calls res.write() in a tight loop. The bug is that one customer downloads over a phone tethered at 200 KB/s, a rate at which 400 MB takes about 33 minutes, and the server generates rows far faster than that phone can read them. The unread bytes pile up in the process heap until the kernel kills the process.

Buffer vs stream

Before you write a single line of streaming code, ask yourself: do you need to stream at all, and if so, do you know what happens when the client reads slower than you write? The answer decides whether you get flat memory or a nightly OOM.

There are two ways to send a body. Buffer: build the whole response in memory, set Content-Length, send it. Simple, but a 400 MB export means 400 MB resident per concurrent download. Stream: produce the body in pieces and write each piece as it is ready. On HTTP/1.1 this uses Transfer-Encoding: chunked, so the total size need not be known up front. Streaming keeps memory flat, but only if you respect backpressure.

Same response, two memory profiles: buffering costs O(body size) resident memory per download, while streaming with backpressure stays flat at the high-water mark — the choice that turns a nightly OOM into flat memory.

Backpressure: the write that says “stop”

Every writable stream has a buffer with a threshold called the high-water mark (for byte streams, Node’s default was 16 KiB through Node 20 and is 64 KiB from Node 22; fs.createReadStream() defaults to 64 KiB). The mechanism:

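res.write(chunk) returns true while the buffered bytes, including the chunk just written, are below the high-water mark: keep writing.
It returns false once the buffered bytes reach or exceed the high-water mark. The chunk is still accepted, because the mark is a threshold rather than a hard limit: stop writing and wait for the 'drain' event before continuing.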
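The two return values can be observed directly. The following is a minimal sketch, assuming a deliberately tiny high-water mark of 16 bytes and a stand-in destination (`stalled`, a hypothetical name) that never completes a write, so nothing ever leaves the buffer:

```javascript
const { Writable } = require('node:stream');

// Stand-in for a stalled client: it receives a chunk but never calls the
// callback, so no write ever completes and the buffer never drains.
const stalled = new Writable({
  highWaterMark: 16, // bytes; tiny on purpose so the threshold is easy to hit
  write(chunk, encoding, callback) {
    // Intentionally never calls callback().
  },
});

const first = stalled.write(Buffer.alloc(8));  // 8 bytes pending: below the mark
const second = stalled.write(Buffer.alloc(8)); // 16 bytes pending: mark reached
const third = stalled.write(Buffer.alloc(8));  // still accepted; 24 bytes pending

console.log(first, second, third, stalled.writableLength);
// prints: true false false 24
```

The third write shows that nothing is rejected after the mark is reached; the pending bytes simply keep growing.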

write() returning false does not mean the write failed. It means the consumer is behind and the buffer is full. If you ignore the false and keep writing anyway, the data does not vanish — it accumulates in the stream’s internal buffer in your process heap. That is the OOM. The fix is to honor the signal: pause production until 'drain'. pipe() and pipeline() do this for you automatically, which is why pipeline(source, res) is the safe default and a hand-rolled while loop of write() is the classic footgun.
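Honoring the signal by hand can be sketched as follows. This is one possible shape, not the only one; `writeAll` and `slowSink` are hypothetical names, and the sink only imitates a slow consumer by completing each write on a later turn of the event loop:

```javascript
const { Writable } = require('node:stream');
const { once } = require('node:events');

// Write every chunk to dest, pausing whenever write() reports that the
// buffer has reached the high-water mark.
async function writeAll(dest, chunks) {
  for (const chunk of chunks) {
    if (!dest.write(chunk)) {
      // once() resolves on 'drain' and rejects if dest emits 'error'.
      await once(dest, 'drain');
    }
  }
  dest.end();
}

// Hypothetical slow consumer for demonstration only.
const slowSink = new Writable({
  highWaterMark: 1024,
  write(chunk, encoding, callback) {
    setImmediate(callback);
  },
});

const rows = Array.from({ length: 1000 }, (_, i) => `${i},${i * 2}\n`);
const done = writeAll(slowSink, rows);
```

With this loop the bytes pending in the stream stay close to the high-water mark no matter how many rows there are. A production version would also need to handle a destination that closes without an error while the loop is waiting.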

Pattern	Memory under a slow client	Verdict
Buffer whole body, then send	O(body size) per connection	OK for small bodies only
`write()` in a loop, ignore return value	Grows unbounded → OOM	Bug
`write()` + wait for `'drain'` on `false`	Flat (≈ high-water mark)	Correct
`pipeline(source, res)`	Flat, handles errors + cleanup	Correct, preferred
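The preferred row of the table might look like the sketch below. `csvRows` and `streamCsv` are hypothetical names, and the generator stands in for whatever actually produces the export rows; in an HTTP handler, `dest` would be the response object:

```javascript
const { Readable } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// Hypothetical row source. A generator is only advanced when the readable
// side asks for more, so rows are produced on demand.
function* csvRows(count) {
  yield 'id,value\n';
  for (let i = 0; i < count; i++) {
    yield `${i},${i * 2}\n`;
  }
}

// Stream the rows into dest. pipeline() pauses the source when dest is
// behind, ends dest when the source finishes, and destroys both on error.
async function streamCsv(dest, count) {
  await pipeline(Readable.from(csvRows(count)), dest);
}
```

The handler contains no write() loop at all, so there is no return value to forget.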

Why the buffer fills: it is turtles down to TCP

Backpressure is not a Node invention; it is the application-level surface of a chain of flow-control windows. The client’s TCP receive window advertises how much it can accept. When the client app reads slowly, its receive buffer fills, it shrinks the advertised window, and the server’s kernel send buffer stops draining. The userland writable stream then stops draining, write() returns false, and — if you listen — your code stops producing. Each layer pushes back on the one above it. Ignoring backpressure means decoupling your production rate from this entire chain, so the gap accumulates in the one place with no flow control: your heap.
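A back-of-the-envelope calculation shows how large that gap gets. The sketch below is simple arithmetic, not a measurement: `peakBacklogBytes` is a hypothetical helper, the 50 MB/s production rate is an assumed figure, and kernel socket buffers are ignored because they are small next to the body:

```javascript
// Peak bytes stuck in the heap when a producer ignores backpressure.
// All rates are in bytes per second.
function peakBacklogBytes(bodyBytes, produceRate, drainRate) {
  if (produceRate <= drainRate) return 0; // the consumer keeps up
  const produceSeconds = bodyBytes / produceRate;
  // Everything produced, minus what the client drained in the meantime.
  return bodyBytes - drainRate * produceSeconds;
}

// The 400 MB export against the 200 KB/s phone, assuming 50 MB/s production.
console.log(peakBacklogBytes(400e6, 50e6, 200e3));
// prints: 398400000
```

Under these assumptions the producer finishes in 8 seconds, the client has read 1.6 MB by then, and roughly 398 MB sits in memory for the remaining half hour of the download.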

Why this works

Why does HTTP/2 make backpressure both more important and more subtle? HTTP/2 multiplexes many streams over one TCP connection, and it has its own flow-control windows, one per stream and one for the whole connection (each 65,535 bytes by default), on top of TCP’s window. A single slow stream stalls when its window is exhausted, but because all streams share one TCP connection, head-of-line blocking at the TCP layer can also stall unrelated streams. So an HTTP/2 server must respect two layers of flow control, and a large download that fills the shared connection can starve small concurrent requests on it, a failure that does not arise with HTTP/1.1, where each in-flight response has its own connection.

The slow-consumer attack

The same mechanism is a denial-of-service vector. Slow-read attacks request a large response and then read it a few bytes at a time, holding connections open and keeping server-side buffers occupied. Slowloris is the request-side counterpart: it sends request headers a few bytes at a time. A server that buffers per connection and ignores backpressure can be exhausted by a handful of slow clients, with no traffic volume needed. Defenses are the same as the correctness fixes plus limits: honor backpressure, cap per-connection buffering, and set write/idle timeouts so a stalled drain is abandoned rather than held forever (the subject of the next lesson).
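Abandoning a stalled drain could be sketched like this. `writeOrAbandon` is a hypothetical helper, not a Node API, and the timeout value is an assumption to be tuned per service; it relies on the `signal` option of `events.once()` to stop waiting:

```javascript
const { once } = require('node:events');

// Write one chunk; if the destination is behind, wait at most stallMs for
// 'drain'. On timeout, destroy the destination so its buffer is released.
async function writeOrAbandon(dest, chunk, stallMs) {
  if (dest.write(chunk)) return; // below the high-water mark, nothing to wait for

  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), stallMs);
  try {
    // Rejects with an AbortError if the signal fires before 'drain'.
    await once(dest, 'drain', { signal: controller.signal });
  } catch (err) {
    if (err.name === 'AbortError') dest.destroy();
    throw err;
  } finally {
    clearTimeout(timer);
  }
}
```

The caller sees a rejected promise and can stop producing, which bounds how long any single slow reader can pin memory.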

Quiz

A streaming endpoint OOMs only when a client downloads over a very slow link. The code calls res.write() in a loop and ignores its return value. What is happening?

Quiz

What does res.write() returning false actually mean?

Quiz

Why is backpressure described as the application-level surface of TCP flow control?

A slow client shrinks its TCP window, the kernel buffer stops draining, the userland buffer crosses its high-water mark, and write() returns false. Honor it and memory stays flat; ignore it and unread bytes pile up in the producer's heap until OOM.

Recall before you leave

01 What exactly does res.write() returning false signal, and what is the correct response?
02 Trace how a slow client ends up causing server-side OOM, layer by layer.
03 Why does HTTP/2 add a second layer of backpressure concern beyond TCP, and how can a slow download hurt other requests?

Recap

A response finishes when its bytes drain, not when the handler returns, and the hard case is a client that reads slower than the server writes. Buffering the whole body costs O(body size) memory per connection; streaming keeps memory flat only if you respect backpressure. The signal is write() returning false once the internal buffer reaches the high-water mark (16 KiB by default through Node 20, 64 KiB from Node 22): stop and resume on 'drain', or let pipeline() manage it. Ignoring the signal piles unread bytes in the heap until the process is OOM-killed. Backpressure is the userland surface of a chain of flow-control windows down to TCP’s receive window, and HTTP/2 adds its own stream and connection windows on top. The same mechanism is the slow-consumer DoS vector, which is why the last line of defense, timeouts, must abandon a drain that never completes. When you see a service that OOMs only under specific clients or connections, look at every res.write() call: is the return value being checked?


