
The batching window: size and wait time

Every batching system has two knobs: max-size (bytes or records) and max-wait (time). Whichever fires first sends the batch, and which one fires tells you whether you’re throughput-bound or latency-bound.

PERF Middle ◷ 10 min


You bump your Kafka producer’s linger.ms from 0 to 10 and throughput can jump as much as 10x, at the cost of up to 10ms of added latency. The Apache Kafka project made a similar call for everyone: in Kafka 4.0 (March 2025) the default linger.ms changed from 0 to 5ms after years at zero. The team’s reasoning was blunt: chasing immediacy at the sender does not give you global low latency. A tiny artificial delay buys batching efficiency that often lowers end-to-end latency. To see why a 5ms delay can make a system faster, you have to understand the two-dimensional window underneath.

Two triggers, not one knob

Every batching system you will ever tune has exactly two limits: a max-size (bytes, or record count) and a max-wait (a timer). Items accumulate in a buffer. The batch flushes the instant either limit is hit — the buffer fills to max-size, or the timer reaches max-wait. They are an OR, not an AND. In Kafka these are batch.size and linger.ms; in Redis pipelining they are your client buffer and how long you let it fill; in a database bulk insert they are rows-per-statement and a flush interval. Same shape everywhere.
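The OR between the two limits is easiest to see in code. Below is a minimal sketch, assuming a single-threaded caller that passes in the current time in milliseconds. The `Batcher` class, its `add`/`poll` methods, and the byte-count accounting are hypothetical illustrations, not Kafka’s or any client library’s API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Flush = Tuple[str, List[bytes]]  # (trigger that fired, the batch)


@dataclass
class Batcher:
    """Hypothetical size + time batching window; not any real client's API."""
    max_size: int        # flush once buffered bytes reach or cross this (cf. batch.size)
    max_wait_ms: float   # flush once the oldest buffered item has waited this long (cf. linger.ms)
    _items: List[bytes] = field(default_factory=list, init=False)
    _bytes: int = field(default=0, init=False)
    _oldest_ms: Optional[float] = field(default=None, init=False)

    def add(self, item: bytes, now_ms: float) -> Optional[Flush]:
        """Append one item; return ("size", batch) if it fills the buffer."""
        if self._oldest_ms is None:
            self._oldest_ms = now_ms          # the timer starts with the first buffered item
        self._items.append(item)
        self._bytes += len(item)
        if self._bytes >= self.max_size:      # size trigger
            return ("size", self._drain())
        return None

    def poll(self, now_ms: float) -> Optional[Flush]:
        """Call on a tick; return ("time", batch) once the oldest item has waited max_wait_ms."""
        if self._oldest_ms is not None and now_ms - self._oldest_ms >= self.max_wait_ms:
            return ("time", self._drain())    # time trigger
        return None

    def _drain(self) -> List[bytes]:
        """Hand back the buffered items and reset both the buffer and the timer."""
        batch, self._items = self._items, []
        self._bytes, self._oldest_ms = 0, None
        return batch


if __name__ == "__main__":
    w = Batcher(max_size=10, max_wait_ms=5)
    print(w.add(b"abcd", now_ms=0))         # None: 4 bytes, timer started at t=0
    print(w.add(b"efgh", now_ms=1))         # None: 8 bytes
    print(w.poll(now_ms=4))                 # None: oldest item has waited 4ms
    print(w.poll(now_ms=5))                 # ('time', [b'abcd', b'efgh'])
    print(w.add(b"0123456789", now_ms=6))   # ('size', [b'0123456789'])
```

The sketch returns each flush instead of sending it so it stays testable. A real producer also needs a background tick that calls `poll`, because when arrivals stop nothing else would notice that the timer has expired.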

The naive instinct is to ship one knob — “just set a batch size” — and senior engineers have all watched that instinct page someone at 3am. You need both, and the reason is that each one alone has a failure mode the other covers.

Why size alone stalls, and time alone overflows

Drop max-wait and keep only max-size: now a slow producer holds the batch hostage. Traffic dips at 2am, items trickle in, and the batch takes ages to reach batch.size; if traffic stops entirely, it never flushes at all. The first message of a half-full batch can sit for seconds or indefinitely, a latency that grows as load falls and has no upper bound. This is the classic head-of-line stall, and the cure for it is a timer that says “ship what you have.”

Drop max-size and keep only max-wait: now a fast producer builds a monster. Black Friday hits, items flood in, and in your 50ms window the buffer swells to something downstream cannot swallow: a request larger than the broker’s message.max.bytes, a packet that fragments, a transaction that blows the WAL, an array that OOMs the consumer. The cure is a size cap that says “ship before you get too big.” You need both because they fail in opposite directions: the size cap bounds how big a batch can get, the timer bounds how long an item can wait, and removing either re-introduces the bug the other was there to prevent.
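Both failure modes fall out of a back-of-envelope model. This is a sketch that assumes a constant arrival rate and ignores per-item granularity. `window_bounds` is a hypothetical helper, and passing `None` drops that limit.

```python
import math
from typing import Optional, Tuple


def window_bounds(rate_bytes_per_ms: float,
                  max_size: Optional[float] = None,
                  max_wait_ms: Optional[float] = None) -> Tuple[float, Optional[float]]:
    """Steady-state (first-item wait in ms, bytes per batch) at a constant arrival rate.

    Pass None to drop a limit. Returns (inf, None) when a batch never ships.
    """
    if max_size is None and max_wait_ms is None:
        raise ValueError("need at least one limit")
    # Time for the buffer to reach max_size; infinite if there is no size cap
    # or nothing arrives.
    fill_ms = math.inf
    if max_size is not None and rate_bytes_per_ms > 0:
        fill_ms = max_size / rate_bytes_per_ms
    timer_ms = max_wait_ms if max_wait_ms is not None else math.inf
    wait_ms = min(fill_ms, timer_ms)   # OR: whichever limit is reached first
    if math.isinf(wait_ms):
        return wait_ms, None           # size-only window with zero traffic: stuck forever
    return wait_ms, rate_bytes_per_ms * wait_ms


if __name__ == "__main__":
    print(window_bounds(0.5, max_size=10_000))                 # (20000.0, 10000.0): 20 s stall at 500 B/s
    print(window_bounds(1_000_000, max_wait_ms=50))            # (50, 50000000): a 50 MB batch at 1 GB/s
    print(window_bounds(0.5, max_size=10_000, max_wait_ms=5))  # (5, 2.5): both limits hold
```

With size only, the wait is max_size / rate, so halving traffic doubles the stall. With time only, the batch is rate × max_wait, so doubling traffic doubles the batch. With both limits set, neither number can exceed its cap.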

Load regime	Trigger that fires first	What it means	Tuning lever
High load	max-size (buffer fills first)	Throughput-bound; the timer never gets to run	Raise `batch.size` for fewer, fatter flushes
Low load	max-wait (timer fires first)	Latency-bound; batches are small, timer dominates	Tune `linger.ms` against your latency SLO
Right at break-even	Either, roughly together	Window is well-matched to current traffic	Leave it; re-check when traffic shape shifts

The speedup math, derived

Before reaching for a bigger window, it is worth knowing when a bigger window actually helps — and when the numbers say it never will.

Model any per-item operation as a fixed cost F (the per-call overhead — a syscall, a network round-trip, a transaction begin/commit) plus a variable cost V*n (the work proportional to payload of size n). Doing N items separately costs:

N * (F + V*n)

Doing them as one batch pays the fixed cost once and the same variable work:

F + V*(N*n)

Speedup is the ratio:

speedup = (N*F + N*V*n) / (F + V*N*n)

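Look at the two extremes. When fixed cost dominates even the whole batch’s payload (F ≫ N*V*n: tiny payloads where per-call overhead is the whole story), the N*F term swamps everything and speedup → N. Batch 100 items, go ~100x faster. When variable cost dominates (V*n ≫ F: large payloads where you’re already paying mostly for bytes), the N*V*n terms swamp both top and bottom and speedup → 1: batching buys you nothing because there was little fixed cost to amortize. Between the extremes, the per-item ratio sets a hard ceiling: divide top and bottom by N and let N grow, and speedup approaches (F + V*n) / (V*n) = 1 + F/(V*n), however large the batch. The crossover, the break-even point, is F = V*n: when one item’s variable cost equals the fixed overhead, even an unbounded batch tops out at 2x, and the further F rises above V*n, the more batching can pay.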
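The formula and its ceiling are easy to check numerically. This is a minimal sketch: the function names are hypothetical, and F and V*n can be in any consistent unit (µs below).

```python
def speedup(fixed: float, var_per_item: float, n_items: int) -> float:
    """N separate calls vs one batched call: (N*F + N*V*n) / (F + V*N*n)."""
    separate = n_items * (fixed + var_per_item)   # N * (F + V*n)
    batched = fixed + n_items * var_per_item      # F + V*(N*n)
    return separate / batched


def speedup_ceiling(fixed: float, var_per_item: float) -> float:
    """Limit of speedup as N grows without bound: 1 + F/(V*n)."""
    return 1 + fixed / var_per_item


if __name__ == "__main__":
    print(speedup(50, 1, 100))         # 34.0: F=50µs RTT, V*n=1µs, N=100
    print(speedup_ceiling(50, 1))      # 51.0: no batch size beats this
    print(speedup(1000, 0.001, 100))   # ~99.99: F dwarfs N*V*n, so speedup -> N
    print(speedup_ceiling(50, 50))     # 2.0: per-item break-even F = V*n
```

The first line matches the network example that follows. It lands at 34x rather than 100x because the batch’s total payload cost (100µs) already exceeds the single round-trip.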

A concrete number

Take a network op: 1KB packets, a 50µs round-trip per call as the fixed cost, roughly 1µs to move each 1KB item, and a 100-item batch. Sent one at a time, the round-trips alone cost 100 * 50µs = 5ms. Batched, you pay the round-trip once plus ~100µs of bytes, call it ~150µs total. That’s 5ms down to 150µs, a ~33x speedup, because here F (50µs RTT) hugely outweighs V*n (~1µs per KB). It falls short of 100x because the batch’s combined transfer time (~100µs) now exceeds the single round-trip. This is exactly the regime Redis pipelining lives in: a published benchmark sends 10,000 PINGs in 1.185s unpipelined and 0.250s pipelined, about 5x, and the gap widens as RTT grows relative to per-command work. The syscall version is identical in spirit: a syscall costs ~1–5µs, so collapsing 300k syscalls into ~4k via larger buffered writes makes the per-call overhead effectively vanish.
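Here is a sketch of the buffered-write version. It assumes each raw `write()` call would be one syscall on a real file or socket. `CountingSink` and `count_raw_writes` are hypothetical helpers, while `io.BufferedWriter` is Python’s standard buffered writer. The sink discards data, so the example runs without touching the OS.

```python
import io


class CountingSink(io.RawIOBase):
    """Raw stream that discards data and counts write() calls (a stand-in for syscalls)."""

    def __init__(self) -> None:
        super().__init__()
        self.calls = 0

    def writable(self) -> bool:
        return True

    def write(self, b) -> int:
        self.calls += 1
        return len(b)  # pretend the whole chunk was written


def count_raw_writes(records: int, record: bytes, buffer_size: int) -> int:
    """Write `records` copies of `record`; return how many raw writes reached the sink."""
    sink = CountingSink()
    if buffer_size == 0:
        for _ in range(records):
            sink.write(record)        # unbuffered: one raw call per record
    else:
        out = io.BufferedWriter(sink, buffer_size=buffer_size)
        for _ in range(records):
            out.write(record)         # lands in the in-memory buffer until it fills
        out.flush()                   # ship the partial last batch
    return sink.calls


if __name__ == "__main__":
    rec = b"x" * 9 + b"\n"            # 10-byte record
    print(count_raw_writes(300_000, rec, buffer_size=0))       # 300000
    print(count_raw_writes(300_000, rec, buffer_size=65_536))
```

With 10-byte records and a 64KB buffer, only about 3,000,000 / 65,536 raw writes (a few dozen) reach the sink instead of 300,000. The exact count depends on CPython’s buffering internals. The buffer is a size-only window, and the explicit `flush()` at the end plays the role of the timer that ships a partial batch.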

Why this works

Notice the break-even is per item, not per batch. If a single item’s payload already costs more than the fixed overhead (V*n > F), no batch size rescues you: the ceiling 1 + F/(V*n) sits below 2x, and as payloads grow you slide into the speedup → 1 regime where batching mostly adds latency. This is why batching tiny messages (logs, metrics, key lookups) is a massive win while batching already-large blobs (video chunks, big file uploads) is mostly pointless.

Batching pays big only while fixed per-call cost outweighs per-item payload cost: speedup → N when F dwarfs the whole batch’s payload, caps at 1 + F/(V·n) as batches grow, and → 1 for large payloads; at the per-item break-even F = V·n the cap is 2x.

Load decides which trigger rules

The window’s behavior is not static — it shifts with traffic. At high load items pour in and the buffer hits max-size long before the timer; size is the dominant trigger and you’re throughput-bound. At low load the trickle never fills the buffer, so the timer fires first; time is dominant and you’re latency-bound. The practical diagnostic: watch your average batch size against the configured max-size. If batches consistently flush near max-size, size is winning — raise it. If they flush small and on the timer, time is winning — and raising linger.ms only helps until the batch starts filling before the timer expires anyway.
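The diagnostic is easier to trust when you count flush triggers instead of eyeballing averages. This is a minimal sketch that assumes you can label each flush as "size" or "time" from your own instrumentation. `dominant_trigger` and its 70% threshold are hypothetical choices, not a standard.

```python
from collections import Counter
from typing import Iterable


def dominant_trigger(reasons: Iterable[str], threshold: float = 0.7) -> str:
    """Classify a batching window from per-flush trigger labels ("size" or "time").

    threshold (hypothetical) is the share of flushes one trigger must account
    for before we call the regime.
    """
    counts = Counter(reasons)
    total = counts["size"] + counts["time"]
    if total == 0:
        return "no flushes observed"
    size_share = counts["size"] / total
    if size_share >= threshold:
        return "throughput-bound: size fires first, consider raising max-size"
    if 1 - size_share >= threshold:
        return "latency-bound: timer fires first, tune max-wait against the SLO"
    return "near break-even: both triggers fire, leave it and re-check later"


if __name__ == "__main__":
    print(dominant_trigger(["size"] * 9 + ["time"]))      # throughput-bound: ...
    print(dominant_trigger(["time"] * 8 + ["size"] * 2))  # latency-bound: ...
    print(dominant_trigger(["size", "time"] * 5))         # near break-even: ...
```

Counting per-flush triggers catches bursty traffic that averages hide. For example, a window that flushes on size during bursts and on the timer in between lands in the break-even bucket.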

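Worked read of a live system: max-size 10000 bytes, max-wait 5ms, and you observe batches averaging 8000 bytes every 4ms. The averages alone don’t fit either pure regime. The buffer isn’t full on average (8000 < 10000), which looks time-dominant. Yet batches ship more often than the timer alone allows: each batch’s timer starts no earlier than the previous flush, so if every flush waited out the 5ms timer, flushes could come no closer than 5ms apart. A 4ms average therefore means some batches are filling to max-size early, likely during bursts, while quieter stretches ship smaller batches on the timer. That is moderate, bursty load near break-even. Before changing a knob, break flushes down by trigger. If timer flushes dominate, bumping max-wait grows batches and throughput, but only up to the point where the buffer fills before the timer; if size flushes dominate, raise max-size.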

Order the steps

Order what happens to one item as it flows through a size+time batching window:

1 Item arrives and is appended to the in-memory buffer
2 System checks: did this push the buffer to max-size?
3 If not full, the max-wait timer keeps running for the buffer's oldest item
4 Whichever limit is reached first — full buffer OR expired timer — triggers a flush
5 The whole batch ships as one operation, paying the fixed cost once

Sensible defaults and how to actually tune

Reasonable starting points cluster around max-wait 10–100ms and max-size 64KB–1MB. A commonly cited Kafka high-throughput recipe is batch.size 64KB–256KB with linger.ms 20–100ms. A balanced production config like batch.size=32768, linger.ms=10, compression.type=lz4, acks=1 has been reported to reach ~25k msg/s with latency under 20ms, though such figures depend heavily on message size, hardware, and cluster layout. But defaults are a starting line, not an answer. The senior move is to derive the wait from your latency SLO, not from a desire for maximum throughput: pick the largest linger.ms your p99 budget tolerates, then size the buffer so it fills near that timer at your peak expected load. Then validate the way you can’t on a whiteboard: replay real production traffic against staging, sweep the two knobs, and read the actual batch-size distribution and tail latency. Synthetic uniform load lies; production is bursty, and only a replay shows you which trigger dominates across your real traffic shape.
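The SLO-first recipe reduces to two lines of arithmetic. This is a deliberately crude sketch. `window_from_slo` is a hypothetical helper, and it assumes the batching delay can simply be subtracted from the p99 budget; tail latencies don’t really add that way, so treat the result as conservative headroom. `hard_cap_bytes` stands in for whatever limit downstream enforces.

```python
import math
from typing import Tuple


def window_from_slo(p99_budget_ms: float, other_p99_ms: float,
                    peak_rate_bytes_per_ms: float, hard_cap_bytes: int) -> Tuple[float, int]:
    """Return (max_wait_ms, max_size_bytes) following the SLO-first recipe.

    other_p99_ms: latency spent outside batching (network, broker, processing).
    hard_cap_bytes: largest batch downstream accepts. All inputs are assumptions.
    """
    max_wait = p99_budget_ms - other_p99_ms      # largest wait the budget tolerates
    if max_wait <= 0:
        raise ValueError("no latency budget left for batching")
    # Size the buffer to fill right around the timer at peak load...
    fill_at_peak = math.ceil(peak_rate_bytes_per_ms * max_wait)
    # ...but never above what downstream can swallow.
    return max_wait, min(fill_at_peak, hard_cap_bytes)


if __name__ == "__main__":
    # 50ms p99 budget, 30ms already spent elsewhere, 2 MB/s peak (2000 bytes/ms)
    print(window_from_slo(50, 30, 2000, 1_048_576))   # (20, 40000)
```

Treat the pair as the first point of the replay sweep, not the final setting. In Kafka, batch.size applies per partition, so the peak rate to plug in is the rate each partition sees, not the producer’s total.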

Quiz

Your batching system uses max-size only (no timer). Traffic drops overnight. What's the failure mode?

Quiz

For 4KB payloads where the fixed per-call cost is ~50µs and per-KB transfer is ~40µs, will a large batch give a big speedup?

Pick the best fit

A payment-confirmation service has a strict p99 latency SLO of 25ms but also needs high throughput at peak. How should it set the batching window?

High load fills max-size before the timer (throughput-bound). Low load fires the timer first on a half-full buffer (latency-bound).

Recall before you leave

01 Why do batching systems need both a size limit and a time limit, and what breaks if you drop each one?
02 Derive the speedup formula and explain where it goes to N versus 1.
03 max-size=10000 bytes, max-wait=5ms, observed batches average 8000 bytes every 4ms. What does that tell you, and what should you change?

Recap

The two-dimensional window, max-size plus max-wait, is the core of every batching system, and the two limits are an OR: whichever fires first sends the batch. You need both because they fail in opposite directions: size-only stalls a slow producer (the batch never fills, latency goes unbounded), while time-only lets a fast producer build a batch too big for downstream. Which trigger fires also diagnoses your regime: size dominant means throughput-bound (high load), time dominant means latency-bound (low load). The speedup math is (N*F + N*V*n) / (F + V*N*n). It goes to N when fixed cost dwarfs the whole batch’s payload (F ≫ N*V*n) and to 1 when variable cost dominates (V*n ≫ F, large payloads). For ever-larger batches it caps at 1 + F/(V*n), so the per-item break-even F = V*n marks a 2x ceiling. With 1KB packets over a 50µs RTT, a 100-item batch still buys ~33x. Tune by deriving max-wait from your latency SLO, sizing the buffer to fill near that timer at peak load, then validating by replaying production traffic in staging, never by chasing maximum throughput on a whiteboard. Now when you look at a batching config, your first check is which trigger is firing: if it is almost always the timer, you are latency-bound and batching may be costing you more latency than it earns back in throughput.
