
GC tradeoffs: pause, throughput, heap — and object pooling

Every collector picks two of three: short pauses, high throughput, or low memory. Allocation pressure drives the cost on all three axes, so reducing it is more durable than tuning the collector. Object pooling helps only when the profile shows an allocation hotspot.

PERF Middle ◷ 14 min


Two teams both switch from G1 to ZGC. Team A sees p99 drop from 60 ms to 0.8 ms — a clear win. Team B sees pauses drop, but throughput falls 12% and memory use rises 18%. Both outcomes are correct. The tradeoff is real; knowing which axis your workload needs decides whether ZGC is the right call.

The three-way tradeoff

Every garbage collector picks two of three axes to optimise:

Collector	Pause time	Throughput	Memory overhead	Typical use
JVM ParallelGC	Poor (100s of ms)	Excellent	Low	Batch jobs
JVM G1	Good (10–50 ms)	Good	Medium	Most services
JVM ZGC	Excellent (<1 ms)	~10–15% lower	Higher	Latency-sensitive, large heaps
Go concurrent	Excellent (~0.1–1 ms)	Slightly below peak	Medium	Go services
V8 Orinoco	Good (5–15 ms)	Good	Medium	Node.js services

The wrong choice for a workload can cost 30–50% throughput or 10x p99 latency.

GC numbers to know

Go concurrent GC typical STW: ~100 µs – 1 ms
JVM G1 default pause target: 200 ms
JVM ZGC pause: <1 ms on multi-TB heaps
V8 Orinoco typical pause: 5–15 ms
Write barrier overhead: ~2–10% CPU
Allocation rate at which GC cost starts to dominate: ≥500 MB/s/core
Go GOGC default: 100 (the next cycle starts when the heap reaches roughly 2× the live heap)
GC frame share in a healthy service: <5% CPU
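The GOGC entry above can be read as simple arithmetic. The sketch below is a simplified model, not the runtime's pacer: it ignores GC roots (stacks and globals, which the Go pacer also counts since Go 1.18) and any memory limit. `heapGoal` is a hypothetical helper introduced here for illustration.

```go
package main

import "fmt"

// heapGoal approximates the heap size at which the next GC cycle starts.
// Simplified model: goal = live + live*GOGC/100.
// Assumption: GC roots and GOMEMLIMIT are ignored.
func heapGoal(liveBytes, gogc uint64) uint64 {
	return liveBytes + liveBytes*gogc/100
}

func main() {
	const mib = 1 << 20
	live := uint64(512 * mib)
	for _, gogc := range []uint64{50, 100, 200} {
		fmt.Printf("GOGC=%d live=%dMiB goal=%dMiB\n",
			gogc, live/mib, heapGoal(live, gogc)/mib)
	}
}
```

With the default of 100 the goal is twice the live heap; lowering GOGC trades CPU for memory, and raising it does the opposite.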

Allocation pressure: the upstream cause

If your service allocates 1 GB/s, GC is going to be busy regardless of which collector you ship. Allocation pressure is the upstream cause; pause time is the symptom. Reducing allocation pressure has unbounded upside — every byte you do not allocate is GC work that never has to happen. Tuning the collector has bounded upside — you are optimising the cost of work; the work does not disappear.

Tuning trades one axis for another with bounded upside; reducing allocation pressure removes the work itself and moves all three axes together — the only lever with unbounded upside.
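To put a number on allocation pressure, one option in Go is to sample the cumulative allocation counter around a piece of work. This is a minimal sketch, assuming a single-process measurement with `runtime.ReadMemStats`; `measureAlloc` and `sink` are hypothetical names, and the workload in `main` is made up for the demonstration.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// sink is package-level so the buffers below must be heap-allocated.
var sink []byte

// measureAlloc runs work once and reports how many heap bytes the whole
// process allocated during the call, plus the resulting rate in MB/s.
func measureAlloc(work func()) (allocated uint64, mbPerSec float64) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	start := time.Now()
	work()
	elapsed := time.Since(start)
	runtime.ReadMemStats(&after)

	// TotalAlloc is cumulative and never decreases when memory is freed.
	allocated = after.TotalAlloc - before.TotalAlloc
	if secs := elapsed.Seconds(); secs > 0 {
		mbPerSec = float64(allocated) / 1e6 / secs
	}
	return allocated, mbPerSec
}

func main() {
	allocated, rate := measureAlloc(func() {
		for i := 0; i < 1000; i++ {
			sink = make([]byte, 64<<10) // 64 KiB per iteration
		}
	})
	fmt.Printf("allocated %d bytes, about %.0f MB/s\n", allocated, rate)
}
```

`runtime.ReadMemStats` briefly stops the world, so this suits experiments and benchmarks rather than a hot path.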

The senior pattern: when GC is wide in a CPU profile, look at the allocation profile next. The widest leaf in the alloc profile is your target. Fix levers in priority order:

1. Eliminate the allocation (in-place mutation, struct-of-arrays)
2. Pool/reuse (sync.Pool in Go, object pools in Java/JS)
3. Escape-analyse: let the compiler stack-allocate (Go, .NET, partially JVM)
4. Shrink the allocation (smaller struct, smaller buffer, pre-sized container)
5. Move the allocation off the hot path (compute once, reuse)
6. Tune the collector
7. Switch the collector

Notice the order: elimination and pooling come before tuning and switching because they reduce the work the GC must do, while knob-turning and collector swaps merely change how the same work is distributed. If you skip steps 1–5 and jump to step 7, you will pay the throughput or memory cost of a new collector and still have the same allocation pressure.
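Two of the levers above (stack allocation through escape analysis, and pre-sizing a container) can be observed directly in Go by counting allocations per call. The following is a sketch using `testing.AllocsPerRun` from the standard library; the type and function names are hypothetical, and exact counts for the growing slice depend on the Go version's growth policy.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y, z, w int64 }

// Package-level sinks force values stored in them onto the heap.
var (
	ptrSink   *point
	sliceSink []int
)

// escapes publishes a pointer globally, so the struct is heap-allocated.
func escapes() { ptrSink = &point{x: 1} }

// staysLocal uses the struct by value; it never leaves the function,
// so no heap allocation is needed.
func staysLocal() int64 {
	p := point{x: 1, y: 2}
	return p.x + p.y
}

// grow appends into a nil slice and reallocates as capacity runs out.
func grow(n int) {
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	sliceSink = s
}

// presized allocates the backing array once, up front.
func presized(n int) {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	sliceSink = s
}

func main() {
	fmt.Println("escapes:   ", testing.AllocsPerRun(100, escapes))
	fmt.Println("staysLocal:", testing.AllocsPerRun(100, func() { _ = staysLocal() }))
	fmt.Println("grow:      ", testing.AllocsPerRun(100, func() { grow(1024) }))
	fmt.Println("presized:  ", testing.AllocsPerRun(100, func() { presized(1024) }))
}
```

The same measurement works inside a benchmark with `b.ReportAllocs()`, which is usually the more convenient place for it.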

Object pooling: when it works and when it does not

sync.Pool in Go, Apache Commons Pool in Java, and object pools in .NET are patterns for reusing allocations to reduce GC pressure. They work well when objects are expensive to create and used briefly on hot paths: bytes.Buffer values, JSON encoders, regexp objects, scratch slices.

They fail or hurt when:

Objects are cheap to create (savings smaller than pool overhead)
Objects live long (pool holds memory without freeing)
Thread-coordination overhead exceeds allocation savings

The senior rule: pool only what the profile flags as a hot allocation. Do not pre-pool. Go’s sync.Pool has an advantage: the runtime may drop idle pooled objects during garbage collection, so pooled memory is not held forever. JVM/.NET object pools typically hold their memory until it is explicitly released.
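For reference, the usual shape of a sync.Pool around a bytes.Buffer looks roughly like this. It is a minimal sketch: `bufPool` and `renderGreeting` are hypothetical names, and the function stands in for whatever hot path the allocation profile actually points at.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers; New runs only when the pool is empty.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// renderGreeting builds a string using a pooled scratch buffer.
func renderGreeting(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // clear whatever the previous user left behind
	defer bufPool.Put(buf) // return the buffer for reuse

	buf.WriteString("hello, ")
	buf.WriteString(name)
	// String copies the bytes, so the result stays valid after Put.
	return buf.String()
}

func main() {
	fmt.Println(renderGreeting("gc"))
	fmt.Println(renderGreeting("pool"))
}
```

The pool saves the buffer's backing array, not the returned string, so whether it pays off still has to be confirmed in the allocation profile.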

Why this works

Why does reducing allocations help BOTH p99 latency AND throughput, even though they are often presented as a tradeoff? Allocations cost twice: once when the runtime allocates, and once when the GC reclaims. Reducing allocations reduces both costs. p99 improves because GC runs less and pauses are shorter or less frequent. Throughput improves because the CPU previously spent on GC machinery (mark, scan, write barriers) is now available for application work. The “throughput vs latency tradeoff” applies to GC tuning (longer pauses → higher throughput); it does not apply to allocation reduction, which moves both metrics in the same direction.

Quiz

A latency-sensitive Java microservice with a 16 GB heap sees G1 pauses occasionally hitting 200 ms. The highest-leverage first step is:

Quiz

A Go service switches from sync.Pool for a small struct to always allocating fresh. The struct costs 80 ns to allocate. The pool costs 120 ns to access (lock + reset). What should you do?

Order the steps

Order the fix levers for GC-driven allocation pressure, from highest to lowest leverage:

1. Eliminate the allocation (in-place mutation, struct-of-arrays, primitives)
2. Pool / reuse (sync.Pool, ObjectPool, bytes.Buffer reset)
3. Let escape analysis stack-allocate (smaller object, scope-local)
4. Shrink the allocation (smaller struct, smaller buffer, pre-sized container)
5. Move off the hot path (cache the result, compute once)
6. Tune the collector (GOGC, MaxGCPauseMillis, max-old-space-size)
7. Switch the collector algorithm (G1 → ZGC)

Short pauses, high throughput, low memory: any collector can win two corners at once but not all three. Reducing allocation pressure is the only lever that moves all three inward.

Recall before you leave

01 Why is reducing allocation rate a more durable fix than tuning GC parameters?
02 Name three conditions where object pooling helps, and three where it hurts.

Recap

Every GC algorithm picks two of three axes: short pauses, high throughput, low memory. ZGC delivers sub-ms pauses at the cost of 10–15% throughput and more memory. ParallelGC maximises throughput with poor pauses. G1 balances all three. Regardless of which collector runs, allocation pressure is the upstream cause of GC overhead: reducing allocations moves pause time, throughput, and memory footprint in the right direction simultaneously — the only lever with unbounded upside. Object pooling is a powerful tool when the alloc profile confirms the hotspot, but adds complexity and can hurt when objects are cheap to create or long-lived. Now when you reach for a collector flag, ask yourself first: have I looked at the allocation profile? If not, that is your first step.


