GC tuning: pacing, heap shape, and allocation observability

Pacing controls when the GC fires, heap shape distinguishes live data from overhead, and allocation-rate dashboards are the leading indicator every on-call needs. Know the knobs before you touch them.

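A Go service is containerised with a 4 GB limit. GOGC is left at the default 100. Under traffic the live set climbs, and because the pacer knows nothing about the container limit it lets the heap grow toward roughly twice the live set. At around 3.8 GB the pod is OOM-killed. Adding GOMEMLIMIT=3600MiB makes the collector run harder as usage approaches 3.6 GB, which stops the OOM as long as the live set itself fits under the limit: no code change, no collector swap.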

Pacing and the soft memory limit

Modern collectors pace themselves: they predict when the next GC cycle needs to start so it finishes before the heap is exhausted. Go’s pacer (redesigned in 1.18, refined in 1.19 with GOMEMLIMIT) blends two signals — heap-growth rate and GC CPU share — into a closed-loop controller. JVM G1’s pause-time goal directs its pacer toward keeping pauses under a target.

Go knobs:

GOGC=100 (default): start a GC cycle once the heap has grown by 100% over the live heap left by the previous cycle, so the heap peaks at roughly twice the live set. Increase to defer GC (use more memory, cycle less often); decrease to cycle more often (use less memory, more CPU).
GOMEMLIMIT: soft cap on the total memory the Go runtime should use. The pacer becomes more aggressive as the process approaches the limit, spending more CPU on GC to stay under it. Because the limit is soft, the runtime can still exceed it when the live set alone does not fit. For containerised services, set it to ~90% of the container’s memory limit.
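The arithmetic behind the two knobs can be sketched in a few lines. This is a simplified model, not the runtime's actual pacer: since Go 1.18 the real heap goal also counts goroutine stacks and globals, and `heapGoal` and `softLimit` are hypothetical helper names introduced here for illustration.

```go
package main

import "fmt"

// heapGoal approximates the heap size at which the next GC cycle is due:
// the live heap plus GOGC percent of it. Simplified on purpose: the real
// pacer also accounts for GC roots such as stacks and globals.
func heapGoal(liveBytes, gogc uint64) uint64 {
	return liveBytes + liveBytes*gogc/100
}

// softLimit returns the given percentage of a container memory limit,
// i.e. a candidate value for GOMEMLIMIT.
func softLimit(containerBytes, percent uint64) uint64 {
	return containerBytes * percent / 100
}

func main() {
	const mib = 1 << 20
	live := uint64(800 * mib) // assumed live set of 800 MiB

	fmt.Printf("GOGC=100: cycle due near %d MiB\n", heapGoal(live, 100)/mib)
	fmt.Printf("GOGC=50:  cycle due near %d MiB\n", heapGoal(live, 50)/mib)
	fmt.Printf("GOMEMLIMIT=%dMiB for a 4096 MiB container\n", softLimit(4096*mib, 90)/mib)
}
```

With an 800 MiB live set, halving GOGC moves the trigger point from about 1600 MiB to about 1200 MiB, which is the memory-for-CPU trade the list above describes.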

JVM knobs:

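-Xmx: hard ceiling on the Java heap (not on total process memory).
-XX:MaxGCPauseMillis=N: G1 aims for pauses of at most N ms (default 200) by adjusting the young-generation size and how much it collects per pause; region size itself is fixed at startup. It is a goal, not a guarantee. Set too low and G1 over-collects, hurting throughput. Set too high and you accept the occasional long pause.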

Why this works

In containerised Go services, OOM kills from heap overrun are common when GOMEMLIMIT is unset. The pacer doesn’t know about the cgroup limit. Setting GOMEMLIMIT to ~90% of the container limit is the first tuning knob. Only reach for GOGC after that.
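When the environment variable cannot be set at deploy time, the same limit can be applied at startup with `debug.SetMemoryLimit` (Go 1.19+). The sketch below assumes cgroup v2, where the container limit is usually exposed at `/sys/fs/cgroup/memory.max`; the path, the 90% factor, and the helper name `parseMemoryMax` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

// Assumed cgroup v2 location of the container memory limit. cgroup v1 and
// non-Linux hosts use different paths.
const cgroupV2MemoryMax = "/sys/fs/cgroup/memory.max"

// parseMemoryMax parses the contents of memory.max. It reports false for
// "max" (no limit configured) and for input that is not a positive integer.
func parseMemoryMax(s string) (int64, bool) {
	s = strings.TrimSpace(s)
	if s == "max" {
		return 0, false
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil || n <= 0 {
		return 0, false
	}
	return n, true
}

func main() {
	// An explicit GOMEMLIMIT in the environment takes precedence.
	if os.Getenv("GOMEMLIMIT") != "" {
		fmt.Println("GOMEMLIMIT already set; leaving it alone")
		return
	}
	raw, err := os.ReadFile(cgroupV2MemoryMax)
	if err != nil {
		fmt.Println("no cgroup v2 memory limit found:", err)
		return
	}
	limit, ok := parseMemoryMax(string(raw))
	if !ok {
		fmt.Println("container has no memory limit; not setting a soft limit")
		return
	}
	soft := limit / 10 * 9 // ~90% of the container limit
	debug.SetMemoryLimit(soft)
	fmt.Printf("soft memory limit set to %d bytes (container limit %d)\n", soft, limit)
}
```

Setting `GOMEMLIMIT` in the deployment manifest is usually simpler; the programmatic route mainly helps when one image runs under several different container sizes.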

Heap shape and the live-set distinction

Total heap footprint (what shows up in RSS) is not the same as the live set, the bytes that are actually reachable. A 4 GB RSS may have only 800 MB live; the remaining 3.2 GB is garbage pending the next cycle, fragmentation, or runtime-reserved arenas.

Marking time is proportional to live bytes, not total RSS. GC cycle frequency is proportional to allocation rate. Sizing the heap to ~1.5–2x the live set is the canonical tuning target: smaller ratios trigger excessively frequent GC; larger ratios waste memory without a proportional latency win.

Both extremes are failure modes — target ~1.5–2x the live set: enough headroom that GC isn't thrashing, little enough that you're not paying for idle memory.

Runtime	Live-set metric	How to read it
Go	`/gc/heap/live:bytes` via `runtime/metrics` (Go 1.21+)	Bytes marked live by the last completed GC cycle
JVM	`MemoryUsage.getUsed()` post-GC	Old-gen used bytes after a full GC
V8 / Node	`v8.getHeapStatistics().used_heap_size`	Read after forcing a collection with `global.gc()` (requires `--expose-gc`)
.NET	`GC.GetTotalMemory(true)`	Forces collection, returns live bytes
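For the Go row, a minimal sketch of reading the live set and comparing it with the heap goal might look like this. It assumes Go 1.21 or later for `/gc/heap/live:bytes`; `readUint64` and `headroom` are hypothetical helper names, and the forced `runtime.GC()` is there only so the demo has a completed cycle to report.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/metrics"
)

// readUint64 returns a uint64-kind runtime metric, or false if this Go
// version does not export a metric with that name.
func readUint64(name string) (uint64, bool) {
	sample := []metrics.Sample{{Name: name}}
	metrics.Read(sample)
	if sample[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return sample[0].Value.Uint64(), true
}

// headroom returns heapBytes/liveBytes, or 0 when the live set is unknown.
func headroom(heapBytes, liveBytes uint64) float64 {
	if liveBytes == 0 {
		return 0
	}
	return float64(heapBytes) / float64(liveBytes)
}

func main() {
	runtime.GC() // demo only: make sure at least one cycle has completed

	live, ok := readUint64("/gc/heap/live:bytes")
	if !ok {
		fmt.Println("/gc/heap/live:bytes not available (needs Go 1.21+)")
		return
	}
	goal, _ := readUint64("/gc/heap/goal:bytes")
	fmt.Printf("live=%d B goal=%d B ratio=%.2f\n", live, goal, headroom(goal, live))
}
```

In a long-running service the same two values would be exported as gauges rather than printed, so the ratio can be tracked against the ~1.5–2x target over time.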

Workloads with large fluctuating live sets (in-memory caches, large aggregations) need a higher ratio. Small, steady-state services run fine at the lower end.

Allocation observability per runtime

Without per-runtime instrumentation you diagnose blind. When you are on-call and GC CPU is climbing, the question is always the same: which code is generating the garbage? Wiring alloc-rate dashboards before the incident is what separates a five-minute triage from a three-hour war room.

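Go: read the runtime/metrics package and scrape /debug/pprof/allocs every 30 s. Wire the rates of alloc_bytes_total and alloc_objects_total into Prometheus (in runtime/metrics the underlying counters are /gc/heap/allocs:bytes and /gc/heap/allocs:objects). GODEBUG=gctrace=1 writes one line per cycle to stderr: useful in dev, noisy in prod.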
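As a rough sketch of where an allocation-rate series comes from on the Go side, the program below samples the cumulative `runtime/metrics` allocation counters twice and turns the difference into a per-second rate. The type `allocCounters` and the helper names are hypothetical; in production the counters would normally be exported raw and the rate computed by the metrics backend.

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"time"
)

// allocCounters holds cumulative heap allocation totals since process start.
type allocCounters struct{ bytes, objects uint64 }

// readAllocCounters reads both cumulative allocation counters in one call.
func readAllocCounters() allocCounters {
	s := []metrics.Sample{
		{Name: "/gc/heap/allocs:bytes"},
		{Name: "/gc/heap/allocs:objects"},
	}
	metrics.Read(s)
	var c allocCounters
	if s[0].Value.Kind() == metrics.KindUint64 {
		c.bytes = s[0].Value.Uint64()
	}
	if s[1].Value.Kind() == metrics.KindUint64 {
		c.objects = s[1].Value.Uint64()
	}
	return c
}

// ratePerSec converts two cumulative readings into a per-second rate.
func ratePerSec(prev, cur uint64, seconds float64) float64 {
	if seconds <= 0 || cur < prev {
		return 0
	}
	return float64(cur-prev) / seconds
}

var sink [][]byte // keeps the demo allocations on the heap

func main() {
	before := readAllocCounters()
	start := time.Now()

	for i := 0; i < 1000; i++ { // stand-in for real request work
		sink = append(sink, make([]byte, 4096))
	}
	time.Sleep(100 * time.Millisecond)

	after := readAllocCounters()
	secs := time.Since(start).Seconds()
	fmt.Printf("alloc rate: %.0f B/s, %.0f objects/s\n",
		ratePerSec(before.bytes, after.bytes, secs),
		ratePerSec(before.objects, after.objects, secs))
}
```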

JVM: -Xlog:gc*:file=gc.log:time,uptime,level,tags for rotation-ready GC logs. JFR jdk.ObjectAllocationInNewTLAB events for allocation profiling. Micrometer exposes the jvm_gc_pause_seconds timer and the jvm_gc_memory_allocated_bytes_total counter to Prometheus.

V8 / Node: process.memoryUsage().heapUsed polled every 15 s; v8.getHeapStatistics() for a richer view. perf_hooks.PerformanceObserver with 'gc' type captures every pause event programmatically.

.NET: GC.GetTotalAllocatedBytes() rate + dotnet-counters monitor for live readout of gen-0-gc-count, gen-1-gc-count, gen-2-gc-count, and alloc-rate.

The cross-runtime alert pattern: an alloc rate that stays above the service-specific threshold (typically 300–500 MB/s/core) for more than 5 minutes triggers investigation. Chart alloc rate and p99 latency together: when p99 climbs in step with alloc rate, GC is the suspect; when p99 climbs while alloc rate stays flat, look elsewhere.
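The "above threshold for a sustained period" condition is normally expressed in the alerting system itself, but its logic is small enough to sketch in Go. The example assumes a 30 s scrape interval, so ten consecutive samples cover roughly five minutes, and picks 400 MB/s/core from the range above; `sustainedAlert` is a hypothetical type, not part of any library.

```go
package main

import "fmt"

// sustainedAlert fires once a value has stayed above threshold for
// `need` consecutive samples. Any sample at or below threshold resets it.
type sustainedAlert struct {
	threshold float64
	need      int
	streak    int
}

// observe records one sample and reports whether the alert is firing.
func (a *sustainedAlert) observe(v float64) bool {
	if v > a.threshold {
		a.streak++
	} else {
		a.streak = 0
	}
	return a.streak >= a.need
}

func main() {
	// Assumed: 400 MB/s/core threshold, 30 s scrapes, ~5 minutes sustained.
	alert := &sustainedAlert{threshold: 400, need: 10}

	// Synthetic series: a short spike that recovers, then a sustained climb.
	series := []float64{450, 460, 120, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500}
	for i, v := range series {
		if alert.observe(v) {
			fmt.Printf("sample %d: alloc rate %.0f MB/s/core sustained, investigate\n", i, v)
		}
	}
}
```

Requiring consecutive samples keeps a brief traffic spike from paging anyone, which matches the intent of the five-minute window.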

Object pooling: when it works and when it doesn’t

Pooling reuses allocations to reduce GC pressure. Works well for objects that are expensive to create, used briefly on hot paths: bytes.Buffer, JSON encoders, regexp objects, scratch slices.

Pooling fails when:

Objects are cheap to create (pool overhead exceeds savings).
Objects are long-lived (pool holds memory without freeing).
Thread-coordination cost exceeds allocation cost.

Go’s sync.Pool has an advantage: the GC clears pooled objects that go unused across collection cycles, so idle memory is not held forever. JVM and .NET object pools usually hold memory until explicit release, so audit their maximum pool size.
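A minimal sketch of the hot-path case with `sync.Pool` and `bytes.Buffer` follows. The function `greet` and the 64 KiB cap are assumptions for illustration; the cap shows one common way to avoid keeping unusually large buffers alive in the pool.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// maxPooledCap is an assumed cutoff: buffers that grew beyond it are
// dropped instead of pooled, so one large request does not pin memory.
const maxPooledCap = 64 << 10

var bufPool = sync.Pool{
	// New runs only when the pool has nothing to hand out.
	New: func() any { return new(bytes.Buffer) },
}

// greet builds a short string using a pooled scratch buffer.
func greet(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold a previous caller's bytes
	defer func() {
		if buf.Cap() <= maxPooledCap {
			bufPool.Put(buf)
		}
	}()

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies, so the result outlives the buffer
}

func main() {
	fmt.Println(greet("gc"))
	fmt.Println(greet("pacer"))
}
```

Forgetting the `Reset` call is the usual bug with this pattern: the second caller would see the first caller's contents.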

The senior rule: pool only what the profile flags as a hot allocation site. Don’t pool preemptively.

Quiz

A containerised Go service OOMs sporadically under traffic spikes. The heap is allowed to grow by GOGC=100 (default). What is the first knob to set?

Quiz

A JVM service's GC log shows the heap at 4 GB but only a 600 MB live set after each full GC. The heap max is set to 4 GB. What does this tell you about tuning?

Order the steps

Order the GC pressure fix levers from highest leverage to lowest:

1 Eliminate the allocation (in-place mutation, struct-of-arrays, primitives)
2 Pool / reuse the allocation (sync.Pool, ObjectPool, bytes.Buffer reset)
3 Let escape analysis stack-allocate (smaller object, scope-local)
4 Shrink the allocation (smaller struct, smaller buffer, pre-sized container)
5 Move allocation off the hot path (cache the result, compute once)
6 Tune collector knobs (GOGC, MaxGCPauseMillis, max-old-space-size)
7 Switch the collector algorithm (ParallelGC → G1 → ZGC)
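To make the code-level levers concrete, the sketch below measures one of them, the pre-sized container, with `testing.AllocsPerRun`. The function names are hypothetical, and calling the `testing` package from `main` is done only to keep the example self-contained; in a real project this measurement would normally live in a benchmark using `b.ReportAllocs()`.

```go
package main

import (
	"fmt"
	"testing"
)

var sink []int // package-level sink forces the slices onto the heap

// grow appends into a nil slice, so the backing array is reallocated
// repeatedly as it fills up.
func grow(n int) {
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	sink = s
}

// presized allocates the backing array once, up front.
func presized(n int) {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	sink = s
}

// allocsPer reports the average number of heap allocations per call.
func allocsPer(f func(int), n int) float64 {
	return testing.AllocsPerRun(100, func() { f(n) })
}

func main() {
	fmt.Printf("grow:     %.0f allocs per call\n", allocsPer(grow, 1024))
	fmt.Printf("presized: %.0f allocs per call\n", allocsPer(presized, 1024))
}
```

Measuring before and after each change is what keeps the ordering honest: a lever only counts if the allocation numbers actually move.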

Profile allocation rate first, set the heap bound, then iterate: measure pause and throughput, adjust toward ~1.5–2x the live set, and re-measure.

Recall before you leave

01
Why should heap max-size target ~1.5–2x the live set, and what happens at the extremes?
02
When does object pooling help GC pressure and when does it hurt?

Recap

GC pacing uses runtime signals to decide when the next cycle fires. In Go, GOMEMLIMIT is the first knob for containerised services — it keeps the process within its memory budget without code changes. In JVM, MaxGCPauseMillis directs G1 toward a pause target. The heap should be sized to ~1.5–2x the live set: below that, GC runs constantly; above that, memory is wasted. Allocation observability — per-runtime rate dashboards wired to Prometheus — is the leading indicator that catches GC regressions before they become SLO burns. Object pooling reduces GC pressure on hot allocation paths but hurts when objects are cheap or long-lived. Profile allocation rate first; tune collector knobs second; switch algorithms last. Now when you deploy a new containerised Go service, setting GOMEMLIMIT before anything else is no longer optional — it is the first line of defence against OOM.

Connected lessons

builds on

GC tradeoffs: pause, throughput, heap — and object pooling (middle)
