WebSocket backpressure: when clients can't keep up

How slow clients fill TCP windows, bloat application send queues, and crash servers — and the high-water-mark pattern that prevents it.

Your WebSocket broadcast server handles 10,000 clients fine during testing. In production, 100 clients on slow mobile connections start lagging behind. The server’s memory climbs. GC pauses lengthen. Then the OOM killer fires and takes down the entire service. This is the canonical WebSocket failure mode — and it has nothing to do with the protocol.

The TCP receive window, from the application’s view

Before you can fix a backpressure incident, you need to understand exactly where the pressure builds — from the application queue all the way down to the kernel buffer. When you see memory climbing and GC pauses growing, this is the chain that explains why.

TCP gives every socket a receive window — the number of bytes the receiver can accept before the sender must pause. As the application on the receiving end reads bytes, the window opens; as it falls behind, the window shrinks to zero and the sender stalls.

From the server’s perspective:

Server calls send(socket, message).
The kernel copies the message to the socket’s send buffer.
TCP transmits from the send buffer, subject to the client’s advertised receive window.
If the client’s window is zero (its read buffer is full), TCP stops transmitting.
The kernel’s send() call blocks if the send buffer is full, or returns EAGAIN in non-blocking mode.

For a broadcast server with thousands of clients, this creates a critical asymmetry: fast clients drain their buffers in milliseconds; slow clients leave the server’s bytes sitting in kernel send buffers, and when those buffers fill, the server is stuck.
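The stall described above can be reproduced locally. The sketch below is an illustration under stated assumptions, not server code: it uses `socket.socketpair()` (a local stream socket pair standing in for a TCP connection to a remote client) and a hypothetical helper named `fill_send_buffer`. The peer never reads, so the kernel buffers fill and a non-blocking `send()` eventually fails with EAGAIN, which Python raises as `BlockingIOError`.

```python
import errno
import socket
from typing import Tuple


def fill_send_buffer(chunk_size: int = 64 * 1024) -> Tuple[int, int]:
    """Write to a peer that never reads until the kernel refuses more bytes.

    Returns (bytes accepted by the kernel, errno of the refusal).
    Assumption: a local socket pair stands in for a slow remote client.
    """
    server, client = socket.socketpair()
    server.setblocking(False)  # non-blocking mode: a full buffer raises instead of blocking
    chunk = b"x" * chunk_size
    sent = 0
    try:
        while True:
            try:
                sent += server.send(chunk)  # may accept fewer bytes than requested
            except BlockingIOError as exc:
                # The send buffer is full and the peer is not draining it.
                return sent, exc.errno
    finally:
        server.close()
        client.close()


if __name__ == "__main__":
    accepted, err = fill_send_buffer()
    print(accepted, "bytes accepted before", errno.errorcode[err])
```

The number of bytes accepted depends on the platform's buffer sizes, so no specific value is shown here.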

The application-level send queue

Production WebSocket servers never block on send(). Instead, they maintain an application-level send queue per connection. When the server broadcasts a message, it enqueues it for each connection and returns immediately. A background writer drains each queue into the kernel as fast as the client’s TCP window allows.

This solves the blocking problem but creates a new one: unbounded queue growth.
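A minimal sketch of such a per-connection queue, assuming a non-blocking socket and a hypothetical class name `SendQueue` (it is not taken from any particular WebSocket library, and it writes raw bytes rather than WebSocket frames). `enqueue` never blocks, and `drain` writes until the kernel refuses more bytes. Nothing in it limits how large `pending` can get, which is exactly the problem.

```python
import socket
from collections import deque


class SendQueue:
    """Per-connection application-level send queue (hypothetical sketch)."""

    def __init__(self, sock: socket.socket) -> None:
        self.sock = sock          # assumed to be in non-blocking mode
        self.pending = deque()    # messages not yet handed to the kernel
        self.queued_bytes = 0

    def enqueue(self, message: bytes) -> None:
        """Called by the broadcaster; returns immediately."""
        self.pending.append(message)
        self.queued_bytes += len(message)

    def drain(self) -> int:
        """Write as much as the kernel accepts; return the bytes written."""
        written = 0
        while self.pending:
            head = self.pending[0]
            try:
                n = self.sock.send(head)
            except BlockingIOError:
                break  # kernel send buffer is full; retry when the socket is writable
            written += n
            self.queued_bytes -= n
            if n < len(head):
                self.pending[0] = head[n:]  # keep the unsent tail at the front
            else:
                self.pending.popleft()
        return written
```

In a real server, `drain` would be called from the event loop whenever the socket becomes writable; that wiring is omitted here.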

Consider this scenario:

Server broadcasts a 100 KB message every 10 ms.
100 of 10,000 clients are slow — database queries are blocking their read thread.
Each slow client’s queue grows by 100 messages/second × 100 KB = 10 MB/second.
After 10 seconds: 100 slow clients × 10 MB/s × 10 s = 10 GB of RAM consumed by queues.
OOM killer fires.
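The arithmetic in this scenario can be checked with a few lines. This is a sketch under two assumptions: decimal units (1 KB = 1,000 bytes) and the worst case in which slow clients drain nothing at all. The function name `queued_bytes` is hypothetical.

```python
def queued_bytes(message_bytes: int, interval_ms: int,
                 slow_clients: int, seconds: int) -> int:
    """Bytes held in send queues if slow clients drain nothing for `seconds`."""
    per_client_per_second = message_bytes * 1000 // interval_ms
    return per_client_per_second * slow_clients * seconds


KB, MB, GB = 1000, 1000**2, 1000**3

per_client = queued_bytes(100 * KB, 10, slow_clients=1, seconds=1)
total = queued_bytes(100 * KB, 10, slow_clients=100, seconds=10)

print(per_client // MB, "MB/s per slow client")  # prints "10 MB/s per slow client"
print(total // GB, "GB after 10 s")              # prints "10 GB after 10 s"
```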

Backpressure failure math

Broadcast rate (example): 100 KB every 10 ms = 10 MB/s per client
RAM growth per slow client at this rate: 10 MB/s
RAM at 100 slow clients after 10 s: 10 GB
Recommended p95 queue depth target: < 5 messages
Recommended slow-client threshold: < 0.1% of connections
Recommended total queued bytes ceiling: < 5% of heap
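To compare a live server against the queue depth target, you need a percentile over per-connection depths. The sketch below uses the nearest-rank method, which is one of several common percentile definitions and is an assumption here; the helper names and the sample depths are hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[int], p: int) -> int:
    """Nearest-rank percentile: the smallest value with at least p% of values at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def p95_within_target(depths: Sequence[int], target: int = 5) -> bool:
    """True if the p95 queue depth is below the target (in messages)."""
    return percentile_nearest_rank(depths, 95) < target


healthy = [1] * 99 + [300]      # one badly lagging client out of 100
degraded = [0] * 94 + [200] * 6  # six lagging clients out of 100

print(p95_within_target(healthy))   # prints "True"
print(p95_within_target(degraded))  # prints "False"
```

A single outlier does not move p95, which is why the slow-client count and total queued bytes are tracked alongside it.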

The high-water mark pattern

The solution is a high-water mark: a per-connection queue limit that, once exceeded, triggers eviction, message dropping, or backpressure on the producer.

When a connection’s send queue exceeds the high-water mark (e.g., 100 messages or 10 MB), the server chooses:

Option A — evict the connection. Forcibly close the slow client with status code 1013 (“try again later”). The client reconnects when its network clears. Other clients are unaffected. This is the standard choice for broadcast services.

Option B — drop messages. For idempotent data (stock prices, game state snapshots) where the latest value supersedes older ones, drop old messages and only keep the newest N entries per connection. The slow client gets the current state on reconnect rather than stale history.

Option C — pause the producer. For peer-to-peer streams (video conference, file transfer), signal the sending application to pause until the queue drains below the low-water mark. This is full backpressure propagation.

The three strategies are not interchangeable. Pick by data semantics: evict slow clients on broadcast services, drop stale messages for idempotent state, and pause the producer for peer-to-peer streams.
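The three options can be sketched as one bounded queue with a selectable policy. Everything named here (`Policy`, `BoundedSendQueue`, `offer`, `pop`) is hypothetical, the limits are counted in messages rather than bytes for brevity, and actually closing the socket or signalling the producer is left to the caller.

```python
from collections import deque
from enum import Enum
from typing import Optional

TRY_AGAIN_LATER = 1013  # WebSocket close code used for eviction in the text


class Policy(Enum):
    EVICT = "evict"              # Option A
    DROP_OLDEST = "drop_oldest"  # Option B
    PAUSE = "pause"              # Option C


class BoundedSendQueue:
    def __init__(self, policy: Policy, high_water: int = 100, low_water: int = 50) -> None:
        self.policy = policy
        self.high_water = high_water
        self.low_water = low_water
        self.pending = deque()
        self.close_code: Optional[int] = None  # set once the connection is evicted
        self.paused = False                    # producers should check this before sending

    def offer(self, message: bytes) -> bool:
        """Try to queue a message; False means it was not accepted."""
        if self.close_code is not None:
            return False
        if len(self.pending) >= self.high_water:
            if self.policy is Policy.EVICT:
                self.pending.clear()
                self.close_code = TRY_AGAIN_LATER  # caller closes the socket with this code
                return False
            if self.policy is Policy.DROP_OLDEST:
                self.pending.popleft()             # newest data supersedes the oldest
            else:
                self.paused = True                 # caller tells the producer to stop
                return False
        self.pending.append(message)
        return True

    def pop(self) -> Optional[bytes]:
        """Called by the writer after the kernel accepts a message."""
        if not self.pending:
            return None
        message = self.pending.popleft()
        if self.paused and len(self.pending) < self.low_water:
            self.paused = False                    # drained below the low-water mark
        return message
```

The gap between the high-water and low-water marks keeps a paused producer from flapping on and off around a single threshold.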

The proxy idle timeout failure mode

There is a second common failure mode: a proxy between client and server has a 60-second idle timeout. If no bytes cross the TCP connection in 60 seconds, the proxy closes the TCP connection — without sending a WebSocket close frame.

Both client and server are unaware until the server tries to send the next message: send() returns EPIPE or ECONNRESET. The client sees a close event with code 1006 (abnormal closure). The connection is dead.

Fix: the server sends a ping frame every 25–30 seconds. The ping resets the proxy’s idle timer. The client must reply with a pong within a reasonable window; the server can treat clients that do not pong within 10 seconds as stale and close them.
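The timing rules for this fix can be sketched as a small state tracker. The class and method names are hypothetical, timestamps are passed in explicitly so the logic can be exercised without a real clock, and sending the ping frame itself depends on your WebSocket library and is not shown.

```python
from typing import Optional


class Heartbeat:
    """Decides when to ping and when a connection is stale (hypothetical sketch)."""

    def __init__(self, now: float, ping_interval: float = 25.0,
                 pong_timeout: float = 10.0) -> None:
        self.ping_interval = ping_interval
        self.pong_timeout = pong_timeout
        self.last_ping_sent = now  # treat connection start as the first reference point
        self.awaiting_pong_since: Optional[float] = None

    def should_ping(self, now: float) -> bool:
        """True once the interval has elapsed and no pong is outstanding."""
        return (self.awaiting_pong_since is None
                and now - self.last_ping_sent >= self.ping_interval)

    def ping_sent(self, now: float) -> None:
        self.last_ping_sent = now
        self.awaiting_pong_since = now

    def pong_received(self) -> None:
        self.awaiting_pong_since = None

    def is_stale(self, now: float) -> bool:
        """True if a ping has gone unanswered for longer than the timeout."""
        return (self.awaiting_pong_since is not None
                and now - self.awaiting_pong_since > self.pong_timeout)


hb = Heartbeat(now=0.0)
print(hb.should_ping(24.0))  # prints "False"
print(hb.should_ping(25.0))  # prints "True"
hb.ping_sent(25.0)
print(hb.is_stale(30.0))     # prints "False"
print(hb.is_stale(36.0))     # prints "True"
```

With a 25-second interval, a ping crosses the proxy well before a 60-second idle timer expires, even if one ping is delayed.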

Edge cases

The 1 MB message, 10k clients, 1% slow example. The server broadcasts a 1 MB message. For the 9,900 fast clients, TCP buffers and delivers in milliseconds. For the 100 slow clients, the message sits in the application queue. If the server broadcasts 50 such messages before detecting the slow clients, 100 clients × 50 msgs × 1 MB = 5 GB of memory bloat. Production servers detect slow-client count continuously and evict when the count exceeds 0.5% of connections or total queued bytes exceeds 5% of heap — whichever comes first.
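The "whichever comes first" trigger can be written as a single predicate. This is a sketch only: the function name `should_evict` is hypothetical, the 16 GB heap in the usage lines is an assumed figure for illustration, and units are decimal.

```python
def should_evict(slow_clients: int, connections: int,
                 queued_bytes: int, heap_bytes: int,
                 max_slow_fraction: float = 0.005,   # 0.5% of connections
                 max_heap_fraction: float = 0.05) -> bool:  # 5% of heap
    """True if either threshold from the text is exceeded."""
    if connections <= 0 or heap_bytes <= 0:
        return False
    too_many_slow = slow_clients / connections > max_slow_fraction
    too_much_queued = queued_bytes / heap_bytes > max_heap_fraction
    return too_many_slow or too_much_queued


MB, GB = 1000**2, 1000**3
HEAP = 16 * GB  # assumed heap size, for illustration only

# The example above: 100 slow clients x 50 messages x 1 MB = 5 GB queued.
queued = 100 * 50 * 1 * MB
print(queued // GB)                               # prints "5"
print(should_evict(100, 10_000, queued, HEAP))    # prints "True"
print(should_evict(10, 10_000, 100 * MB, HEAP))   # prints "False"
```

In this example the slow-client threshold (1% against a 0.5% limit) would have fired long before 50 messages accumulated.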

Trace it

Trace a backpressure incident on a WebSocket broadcast server.

The server broadcasts a 100 KB message to 10,000 clients. What does the server's TCP stack do for fast clients?

100 of the 10,000 clients are slow (database query blocking their read thread). Their kernel receive buffers fill. What does the server see when it calls send() for a slow client?

In reality, what do production servers do instead of blocking?

The server queues 100 KB per message, broadcasts every 10 ms, 100 clients are slow. How much RAM bloats per second?

What metric should the server monitor to catch this before the crash?

Debug this

Application metrics from a WebSocket broadcast service under backpressure

2026-05-15T14:32:00Z broadcast_service[5234]:
active_connections: 50234
total_queued_bytes: 8589934592  (8 GB)
slow_client_count: 127
p95_queue_depth: 156 messages
p99_queue_depth: 341 messages
broadcast_latency_p99: 18432ms
memory_usage: 15.2 GB / 16 GB heap
gc_pause_time: 1247ms

2026-05-15T14:32:15Z broadcast_service[5234]:
active_connections: 49891  (-343 in 15s)
total_queued_bytes: 10737418240  (10 GB)
slow_client_count: 412
p95_queue_depth: 287 messages
p99_queue_depth: 502 messages
broadcast_latency_p99: 32568ms
memory_usage: 15.8 GB / 16 GB heap
gc_pause_time: 2100ms

2026-05-15T14:32:30Z broadcast_service[5234]: OOM killer invoked; process killed

The service broadcasts 1 MB per second to 50k clients. Queue depth is climbing and heap is at 15.8 GB. What is the failure mode and the immediate fix?

Quiz

A proxy between client and server has a 60-second idle timeout. No messages have been sent for 59 seconds. At 60 seconds the proxy closes the TCP connection without a WebSocket close frame. What close code does the client's WebSocket implementation generate?

A stalled TCP window backs pressure up into the kernel buffer and then the app queue; unbounded growth ends in OOM unless a high-water mark evicts or drops first.

Recall before you leave

01. Why does an application-level send queue solve the blocking problem but create a new one, and how does the high-water mark address both?
02. What is the correct ping interval to defeat proxy idle timeouts, and what should happen if a client does not pong?
03. Name the three strategies for handling slow clients, and when to use each.

Recap

WebSocket’s primary failure mode is backpressure: when clients cannot read data as fast as the server sends it, TCP receive windows shrink to zero, the kernel send buffer fills, and — if the server uses application-level queues to avoid blocking — those queues grow unboundedly until the process runs out of RAM. The fix is a high-water mark per connection that triggers eviction (code 1013), message dropping, or producer pause depending on the application’s data semantics. A second failure mode is the proxy idle timeout: proxies with 60-second idle timers close quiet WebSocket connections without a close frame, generating close code 1006. Ping frames every 25–30 seconds defeat this. Key metrics to monitor: slow-client count (target below 0.1% of connections), total queued bytes (target below 5% of heap), and p99 per-connection queue depth (target below 5 messages for interactive apps). Now when you see memory climbing on a broadcast server, check slow-client count first — if it is above 0.1%, you have a backpressure event and eviction is the immediate fix.

Apply this

Put this lesson to work on a real build.

Collaborative cursors: Show every connected user's live cursor and selection in a shared document, conflict-free, over WebSocket.