The batching window: size and wait time
You bump your Kafka producer’s linger.ms from 0 to 10 and throughput jumps 10x for the cost of 10ms of added latency. Apache made the same call for everyone: in Kafka 4.0 (March 2025) the default linger.ms changed from 0 to 5ms after years at zero. The team’s reasoning was blunt — chasing immediacy at the sender does not give you global low latency. A tiny artificial delay buys batching efficiency that often lowers end-to-end latency. To know why a 5ms delay can make a system faster, you have to understand the two-dimensional window underneath.
Two triggers, not one knob
Every batching system you will ever tune has exactly two limits: a max-size (bytes, or record count) and a max-wait (a timer). Items accumulate in a buffer. The batch flushes the instant either limit is hit — the buffer fills to max-size, or the timer reaches max-wait. They are an OR, not an AND. In Kafka these are batch.size and linger.ms; in Redis pipelining they are your client buffer and how long you let it fill; in a database bulk insert they are rows-per-statement and a flush interval. Same shape everywhere.
The naive instinct is to ship one knob — “just set a batch size” — and senior engineers have all watched that instinct page someone at 3am. You need both, and the reason is that each one alone has a failure mode the other covers.
Why size alone stalls, and time alone overflows
Drop max-wait and keep only max-size: now a slow producer holds the batch hostage. Traffic dips at 2am, items trickle in, and the batch never reaches batch.size, so it never flushes. The first message of a half-full batch can sit for seconds — unbounded latency that scales inversely with load. This is the classic head-of-line stall: the cure for it is a timer that says “ship what you have.”
Drop max-size and keep only max-wait: now a fast producer builds a monster. Black Friday hits, items flood in, and in your 50ms window the buffer swells to something downstream cannot swallow: a request larger than the broker’s message.max.bytes, a packet that fragments, a transaction that blows the WAL, an array that OOMs the consumer. The cure is a size cap that says “ship before you get too big.” You need both because they fail in opposite directions: the size cap keeps each batch within what downstream can accept, the timer bounds latency, and removing either re-introduces the bug the other was there to prevent.
| Load regime | Trigger that fires first | What it means | Tuning lever |
|---|---|---|---|
| High load | max-size (buffer fills first) | Throughput-bound; the timer never gets to fire | Raise batch.size for fewer, fatter flushes |
| Low load | max-wait (timer fires first) | Latency-bound; batches are small, timer dominates | Tune linger.ms against your latency SLO |
| Right at break-even | Either, roughly together | Window is well-matched to current traffic | Leave it; re-check when traffic shape shifts |
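The two failure modes can be reproduced with a small offline replay. This is a minimal sketch, not a model of any particular client: `simulate` is a hypothetical helper, sizes are counted in items rather than bytes, and passing `None` switches a limit off.

```python
from typing import List, Optional, Tuple

Flush = Tuple[float, int, str]  # (flush_time_ms, item_count, trigger)


def simulate(arrivals_ms: List[float],
             max_size: Optional[int],
             max_wait_ms: Optional[float]) -> Tuple[List[Flush], int]:
    """Replay sorted arrival times through a size+time batching window.

    Returns (flushes, stranded), where stranded is the number of items
    still sitting in the buffer when the input ends.
    """
    flushes: List[Flush] = []
    count = 0          # items currently buffered
    opened_at = 0.0    # arrival time of the oldest buffered item
    for t in arrivals_ms:
        # The timer may have expired before this item showed up.
        if count and max_wait_ms is not None and t - opened_at >= max_wait_ms:
            flushes.append((opened_at + max_wait_ms, count, "time"))
            count = 0
        if count == 0:
            opened_at = t  # this item starts a new batch and a new timer
        count += 1
        if max_size is not None and count >= max_size:
            flushes.append((t, count, "size"))
            count = 0
    # After the last arrival, only a timer can rescue what is left.
    if count and max_wait_ms is not None:
        flushes.append((opened_at + max_wait_ms, count, "time"))
        count = 0
    return flushes, count


slow = [0.0, 1000.0, 2000.0]              # one item per second
fast = [i * 0.01 for i in range(5000)]    # 100 items per millisecond

print(simulate(slow, max_size=100, max_wait_ms=None))   # size only: nothing ever ships
print(simulate(fast, max_size=None, max_wait_ms=50.0))  # time only: one 5000-item batch
print(len(simulate(fast, max_size=100, max_wait_ms=50.0)[0]))  # both: 50 bounded batches
```

With size only, the slow trickle leaves all three items stranded. With time only, the burst ships as a single 5000-item batch. With both limits, every batch is bounded in size and in age.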
The speedup math, derived
Model any per-item operation as a fixed cost F (the per-call overhead — a syscall, a network round-trip, a transaction begin/commit) plus a variable cost V*n (the work proportional to payload of size n). Doing N items separately costs:
N * (F + V*n)
Doing them as one batch pays the fixed cost once and the same variable work:
F + V*(N*n)
Speedup is the ratio:
speedup = (N*F + N*V*n) / (F + V*N*n)
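Look at the two extremes. When fixed cost dominates (F much larger than the whole batch’s variable work N*V*n: small payloads where the per-call overhead is the whole story), the N*F term swamps the numerator, F swamps the denominator, and speedup → N. Batch 100 items, go up to ~100x faster. When variable cost dominates (F much smaller than V*n: large payloads where you’re already paying mostly for bytes), the V*N*n terms swamp everything and speedup → 1: batching buys you almost nothing because there was almost no fixed cost to amortize. In between, letting N grow takes the ratio to 1 + F/(V*n), so the ceiling is set by how large the fixed cost is relative to one item’s variable cost, not by the batch size. The crossover, the break-even point, is F = V*n: there the speedup is 2N/(N+1), just under 2x, and batching pays big only once F is many times V*n.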
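The formula is easy to check numerically. The sketch below is a direct transcription of the ratio above; the function names and the sample costs (in microseconds) are illustrative assumptions, not measurements.

```python
def speedup(n_items: int, fixed: float, variable_per_item: float) -> float:
    """(N*F + N*V*n) / (F + V*N*n), with variable_per_item standing for V*n."""
    separate = n_items * (fixed + variable_per_item)
    batched = fixed + n_items * variable_per_item
    return separate / batched


def ceiling(fixed: float, variable_per_item: float) -> float:
    """Limit of speedup() as the batch size N grows without bound."""
    return 1 + fixed / variable_per_item


# Hypothetical costs in microseconds, batch of 100 items.
print(speedup(100, fixed=50, variable_per_item=1))     # fixed cost dominates: 34.0
print(speedup(100, fixed=50, variable_per_item=50))    # break-even: about 1.98
print(speedup(100, fixed=50, variable_per_item=160))   # variable cost dominates: about 1.31
print(ceiling(fixed=50, variable_per_item=1))          # 51.0, no batch size beats this
```

The first case lands well below N = 100 because the batch’s own variable work (100 * 1µs) is no longer negligible next to the 50µs fixed cost. The ceiling tells you how much is left to gain from a bigger batch.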
A concrete number
Take a network op: 1KB packets, a 50µs round-trip per call as the fixed cost, and a 100-item batch. Sent one at a time, the round-trips alone cost 100 * 50µs = 5ms. Batched, you pay the round-trip once plus the bytes — call it ~150µs total. That’s 5ms down to 150µs, a ~33x speedup, because here F (50µs RTT) hugely outweighs V*n (the per-KB transfer time). This is exactly the regime Redis pipelining lives in: a published benchmark sends 10,000 PINGs in 1.185s unpipelined and 0.250s pipelined — ~5x — and the gap widens as RTT grows relative to per-command work. The syscall version is identical in spirit: a syscall costs ~1–5µs, so collapsing 300k syscalls into ~4k via larger buffered writes makes the per-call overhead effectively vanish.
Why this works
Notice the break-even is per item, not per batch. If a single item’s payload already costs more than the fixed overhead (V*n > F), no batch size rescues you: the speedup can never exceed 1 + F/(V*n), which is below 2x here, so you’re in the speedup → 1 regime and batching mostly adds latency. This is why batching tiny messages (logs, metrics, key lookups) is a massive win while batching already-large blobs (video chunks, big file uploads) is mostly pointless.
Load decides which trigger rules
The window’s behavior is not static — it shifts with traffic. At high load items pour in and the buffer hits max-size long before the timer; size is the dominant trigger and you’re throughput-bound. At low load the trickle never fills the buffer, so the timer fires first; time is dominant and you’re latency-bound. The practical diagnostic: watch your average batch size against the configured max-size. If batches consistently flush near max-size, size is winning — raise it. If they flush small and on the timer, time is winning — and raising linger.ms only helps until the batch starts filling before the timer expires anyway.
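Worked read of a live system: max-size 10000 bytes, max-wait 5ms, and you observe batches averaging 8000 bytes every 4ms. Batches leave about 80% full (8000 < 10000), so the size cap is not the usual trigger: moderate load, time-dominant. The 4ms average interval sitting just under the 5ms timer means some bursts still fill the buffer early, and the average arrival rate (8000 bytes / 4ms = 2000 bytes/ms) would fill 10000 bytes in 5ms, so the window is close to break-even. Bumping max-wait would grow batches and throughput, but only up to the point where the buffer fills before the timer.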
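The same read can be scripted against exported batch metrics. This is a rough sketch: `read_window` is a hypothetical helper, and the 95% "full" threshold is an assumption chosen for illustration, not a standard value.

```python
def read_window(avg_batch_bytes: float,
                avg_interval_ms: float,
                max_size_bytes: float,
                full_threshold: float = 0.95) -> dict:
    """Classify which trigger dominates from observed batch averages."""
    fill_ratio = avg_batch_bytes / max_size_bytes
    arrival_rate = avg_batch_bytes / avg_interval_ms   # bytes per ms
    # Time the buffer would need to fill at the average arrival rate.
    # Raising max-wait beyond this stops growing the batches.
    fill_time_ms = max_size_bytes / arrival_rate
    dominant = "size" if fill_ratio >= full_threshold else "time"
    return {
        "dominant": dominant,
        "fill_ratio": fill_ratio,
        "arrival_rate_bytes_per_ms": arrival_rate,
        "fill_time_ms": fill_time_ms,
    }


# The worked example: max-size 10000 bytes, batches of 8000 bytes every 4ms.
print(read_window(avg_batch_bytes=8000, avg_interval_ms=4, max_size_bytes=10000))
```

For the numbers above this reports a time-dominant window with a fill ratio of 0.8 and a fill time of 5.0ms. Averages hide burstiness, so treat the result as a first read and confirm it against the full batch-size distribution.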
Order what happens to one item as it flows through a size+time batching window:
- 1 Item arrives and is appended to the in-memory buffer
- 2 System checks: did this push the buffer to max-size?
- 3 If not full, the max-wait timer keeps running for the buffer's oldest item
- 4 Whichever limit is reached first — full buffer OR expired timer — triggers a flush
- 5 The whole batch ships as one operation, paying the fixed cost once
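The five steps above can be sketched as a small single-threaded batcher. Everything here is hypothetical naming for illustration (`Batcher`, `poll`, the injected `clock`); real clients usually run the timer on a background thread rather than asking the caller to poll.

```python
import time
from typing import Callable, List


class Batcher:
    """Flush when the buffer reaches max_bytes OR the oldest item is max_wait_s old."""

    def __init__(self,
                 flush: Callable[[List[bytes]], None],
                 max_bytes: int,
                 max_wait_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush_fn = flush
        self._max_bytes = max_bytes
        self._max_wait_s = max_wait_s
        self._clock = clock
        self._buf: List[bytes] = []
        self._size = 0
        self._deadline = 0.0

    def add(self, item: bytes) -> None:
        # Step 1: append to the in-memory buffer. The timer belongs to the
        # oldest item, so it starts only when the buffer was empty.
        if not self._buf:
            self._deadline = self._clock() + self._max_wait_s
        self._buf.append(item)
        self._size += len(item)
        # Step 2: did this item push the buffer to max-size?
        if self._size >= self._max_bytes:
            self._flush()

    def poll(self) -> None:
        # Steps 3 and 4: call this periodically; it fires the time trigger.
        if self._buf and self._clock() >= self._deadline:
            self._flush()

    def _flush(self) -> None:
        # Step 5: hand over the whole batch as one operation.
        batch, self._buf, self._size = self._buf, [], 0
        self._flush_fn(batch)


# Usage with a fake clock so the timing is deterministic.
now = [0.0]
sent: List[List[bytes]] = []
b = Batcher(sent.append, max_bytes=10, max_wait_s=0.005, clock=lambda: now[0])
for chunk in (b"abcd", b"efgh", b"ij"):
    b.add(chunk)          # third add reaches 10 bytes: size trigger
b.add(b"x")
b.poll()                  # timer has not expired yet, nothing happens
now[0] = 0.005
b.poll()                  # timer expired: time trigger ships the lone item
print(sent)
```

Injecting the clock keeps the sketch testable without sleeping. A production version would also need locking if `add` and `poll` run on different threads.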
Sensible defaults and how to actually tune
Reasonable starting points cluster around max-wait 10–100ms and max-size 64KB–1MB; Kafka’s high-throughput recipe is batch.size 64KB–256KB with linger.ms 20–100ms, and a balanced production config like batch.size=32768, linger.ms=10, compression.type=lz4, acks=1 reaches ~25k msg/s with latency under 20ms. But defaults are a starting line, not an answer. The senior move is to derive the wait from your latency SLO, not from a desire for maximum throughput: pick the largest linger.ms your p99 budget tolerates, then size the buffer so it fills near that timer at your peak expected load. Then validate the way you can’t on a whiteboard — replay real production traffic against staging, sweep the two knobs, and read the actual batch-size distribution and tail latency. Synthetic uniform load lies; production is bursty, and only a replay shows you which trigger dominates across your real traffic shape.
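The SLO-first ordering can be written down as a starting-point calculation. This is a back-of-envelope sketch under stated assumptions: `size_window` and its parameters are hypothetical, `downstream_ms` stands for whatever the rest of the path (network, broker, consumer) already spends of the p99 budget, and the result is an input to the replay sweep, not a substitute for it.

```python
from typing import Tuple


def size_window(p99_budget_ms: float,
                downstream_ms: float,
                peak_bytes_per_ms: float,
                size_cap_bytes: int) -> Tuple[float, int]:
    """Derive (max_wait_ms, max_size_bytes) from a latency budget.

    1. The wait is whatever the p99 budget has left after downstream time.
    2. The buffer is sized to fill in about that long at peak load,
       capped at what downstream can accept in one request.
    """
    max_wait_ms = max(0.0, p99_budget_ms - downstream_ms)
    fill_at_peak = int(peak_bytes_per_ms * max_wait_ms)
    max_size_bytes = min(size_cap_bytes, fill_at_peak)
    return max_wait_ms, max_size_bytes


# Hypothetical service: 25ms p99 budget, 15ms already spent downstream,
# 4000 bytes/ms at peak, and a 1MB downstream request limit.
print(size_window(25, 15, 4000, 1_000_000))   # (10.0, 40000)
```

If the computed wait comes out at zero, the budget is already consumed downstream and batching at the sender has no room to help.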
Your batching system uses max-size only (no timer). Traffic drops overnight. What's the failure mode?
For 4KB payloads where the fixed per-call cost is ~50µs and per-KB transfer is ~40µs, will a large batch give a big speedup?
A payment-confirmation service has a strict p99 latency SLO of 25ms but also needs high throughput at peak. How should it set the batching window?
- 01 Why do batching systems need both a size limit and a time limit, and what breaks if you drop each one?
- 02 Derive the speedup formula and explain where it goes to N versus 1.
- 03 max-size=10000 bytes, max-wait=5ms, observed batches average 8000 bytes every 4ms. What does that tell you, and what should you change?
The two-dimensional window, max-size plus max-wait, is the core of every batching system, and the two limits are an OR: whichever fires first sends the batch. You need both because they fail in opposite directions: size-only stalls a slow producer (the batch never fills, latency goes unbounded), while time-only lets a fast producer build a batch too big for downstream. Which trigger fires also diagnoses your regime: size dominant means throughput-bound (high load), time dominant means latency-bound (low load). The speedup math is (N*F + N*V*n) / (F + V*N*n): it approaches N when fixed cost dominates (F far above V*n, small payloads) and 1 when variable cost dominates (F far below V*n, large payloads), with break-even per item at F = V*n. In the worked example, 100 1KB packets over a 50µs RTT batch ~33x. Tune by deriving max-wait from your latency SLO, sizing the buffer to fill near that timer at peak load, then validating by replaying production traffic in staging, never by chasing maximum throughput on a whiteboard.