GC tuning: pacing, heap shape, and allocation observability

Crux: Pacing controls when the GC fires, heap shape distinguishes live data from overhead, and allocation-rate dashboards are the leading indicator every on-call needs. Know the knobs before you touch them.

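A Go service runs in a container with a 4 GB memory limit. GOGC is left at the default of 100 and GOMEMLIMIT is unset. Under traffic the heap grows to 3.8 GB; the pacer, which knows nothing about the container limit, keeps scheduling cycles against its GOGC target, and the pod is OOM-killed. Setting GOMEMLIMIT=3600MiB makes the collector work harder as memory approaches that figure, which stops the OOM with no code change and no collector swap, provided the live set itself fits under the limit.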

Pacing and the soft memory limit

Modern collectors pace themselves: they predict when the next GC cycle needs to start so that it finishes before the heap is exhausted. Go’s pacer (redesigned in 1.18, extended in 1.19 with GOMEMLIMIT) blends two signals, heap-growth rate and GC CPU share, into a closed-loop controller. G1 on the JVM works toward a pause-time goal instead: it sizes each collection so that pauses stay under a target.

Go knobs:

  • GOGC=100 (default): start a GC cycle once the heap has grown 100% over the live heap left by the previous cycle, that is, at roughly double the live set. Raise it to collect less often (more memory, less CPU); lower it to collect more often (less memory, more CPU).
  • GOMEMLIMIT: soft cap on the total memory the Go runtime manages. As the process approaches the limit, the pacer collects more aggressively, trading throughput for staying under the cap. It is not a hard limit: the runtime can exceed it if the live set does not fit. For containerised services, set it to ~90% of the container’s memory limit.
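
To make the GOGC arithmetic concrete, here is a minimal sketch of the heap-target formula as described in the Go GC guide (target = live heap + (live heap + GC roots) × GOGC / 100). The function name heapTarget and the 800 MiB live set are illustrative assumptions, and the real pacer adds further corrections on top of this.

```go
package main

import "fmt"

// heapTarget approximates the heap size the pacer aims for at the end of the
// next cycle: live + (live + roots) * GOGC / 100.
// This is a simplification; it ignores GOMEMLIMIT and pacer smoothing.
func heapTarget(liveBytes, rootBytes, gogc uint64) uint64 {
	return liveBytes + (liveBytes+rootBytes)*gogc/100
}

func main() {
	const MiB = 1 << 20
	live := uint64(800 * MiB) // illustrative live set, not a measurement

	for _, gogc := range []uint64{50, 100, 200} {
		target := heapTarget(live, 0, gogc)
		fmt.Printf("GOGC=%d -> heap target about %d MiB\n", gogc, target/MiB)
	}
}
```

With GC roots set to zero, GOGC=100 gives a target of twice the live set, which is where the "heap doubles" rule of thumb comes from.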

JVM knobs:

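  • -Xmx: hard ceiling on the Java heap.
  • -XX:MaxGCPauseMillis=N: G1 aims for pauses of at most N ms (default 200) by adjusting the young-generation size and how many regions it collects per pause. It is a goal, not a guarantee. Set too low and G1 over-collects, hurting throughput. Set too high and individual pauses can stretch toward the target, which shows up as the occasional latency spike.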

Why this works

In containerised Go services, OOM kills from heap overrun are common when GOMEMLIMIT is unset, because the Go runtime does not read the cgroup memory limit on its own. Setting GOMEMLIMIT to ~90% of the container limit is the first tuning step. Reach for GOGC only after that.
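
The 90% rule can also be applied at startup instead of through the environment. The sketch below assumes a cgroup v2 container where the limit is exposed at /sys/fs/cgroup/memory.max; the helper names parseCgroupMax and softLimit are hypothetical, and debug.SetMemoryLimit (Go 1.19+) is used as the programmatic equivalent of GOMEMLIMIT.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

// parseCgroupMax parses the contents of a cgroup v2 memory.max file, which
// holds either a byte count or the literal "max" (meaning no limit).
func parseCgroupMax(contents string) (int64, bool) {
	s := strings.TrimSpace(contents)
	if s == "" || s == "max" {
		return 0, false
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil || n <= 0 {
		return 0, false
	}
	return n, true
}

// softLimit returns the given percentage of the container limit, in bytes.
func softLimit(containerBytes, percent int64) int64 {
	return containerBytes * percent / 100
}

func main() {
	// Assumed path: the cgroup v2 unified hierarchy. cgroup v1 uses another file.
	data, err := os.ReadFile("/sys/fs/cgroup/memory.max")
	if err != nil {
		fmt.Println("no cgroup v2 memory file found; leaving the memory limit unchanged")
		return
	}
	limit, ok := parseCgroupMax(string(data))
	if !ok {
		fmt.Println("container has no memory limit; leaving the memory limit unchanged")
		return
	}
	target := softLimit(limit, 90)
	debug.SetMemoryLimit(target) // overrides any GOMEMLIMIT from the environment
	fmt.Printf("container limit %d bytes, soft limit set to %d bytes\n", limit, target)
}
```

Setting GOMEMLIMIT in the deployment manifest is usually simpler; a startup hook like this is mainly useful when the container limit varies between environments.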

Heap shape and the live-set distinction

Resident memory (RSS) is not the same as the live set, the bytes that are actually reachable. A process with 4 GB RSS may have only 800 MB live; the remaining 3.2 GB is garbage awaiting the next cycle, fragmentation, or memory the runtime has reserved but is not using.

Marking time is proportional to live bytes, not total RSS. For a fixed amount of headroom, GC cycle frequency is proportional to allocation rate. Sizing the heap to ~1.5–2x the live set is the canonical tuning target: smaller ratios trigger excessively frequent GC; larger ratios waste memory without a proportional latency win.

Runtime | Live-set metric | How to read it
Go | /gc/heap/live:bytes (runtime/metrics, Go 1.21+) | Bytes marked live by the last GC cycle
JVM | MemoryUsage.getUsed() post-GC | Old-gen used bytes after a full GC
V8 / Node | v8.getHeapStatistics().used_heap_size | After forcing GC with --expose-gc
.NET | GC.GetTotalMemory(true) | Forces collection, returns live bytes

Workloads with large fluctuating live sets (in-memory caches, large aggregations) need a higher ratio. Small, steady-state services run fine at the lower end.
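
As a rough illustration of the live-set distinction in Go, the sketch below reads the live heap and the total runtime-managed memory from runtime/metrics and reports their ratio. It assumes Go 1.21 or later for /gc/heap/live:bytes (older toolchains report the metric as unsupported, which the code handles); readUint64 and headroomRatio are hypothetical helper names.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/metrics"
)

// readUint64 reads a single uint64-valued runtime metric.
// It returns false if this Go version does not support the metric.
func readUint64(name string) (uint64, bool) {
	s := []metrics.Sample{{Name: name}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return s[0].Value.Uint64(), true
}

// headroomRatio returns total/live, or 0 when the live set is unknown.
func headroomRatio(totalBytes, liveBytes uint64) float64 {
	if liveBytes == 0 {
		return 0
	}
	return float64(totalBytes) / float64(liveBytes)
}

func main() {
	runtime.GC() // force one cycle so the live-heap figure is fresh

	live, okLive := readUint64("/gc/heap/live:bytes")
	total, okTotal := readUint64("/memory/classes/total:bytes")
	if !okLive || !okTotal {
		fmt.Println("metric not supported by this Go version")
		return
	}
	fmt.Printf("live heap: %d bytes, runtime-managed total: %d bytes, ratio: %.1fx\n",
		live, total, headroomRatio(total, live))
}
```

In a tiny program the ratio is dominated by fixed runtime overhead, so the number only becomes meaningful in a service with a non-trivial live set.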

Allocation observability per runtime

Without per-runtime instrumentation you diagnose blind.

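Go: the runtime/metrics package plus a /debug/pprof/allocs scrape every 30 s. Wire the rate of alloc_bytes_total and alloc_objects_total into Prometheus (the cumulative sources in runtime/metrics are /gc/heap/allocs:bytes and /gc/heap/allocs:objects). GODEBUG=gctrace=1 logs every cycle to stderr, which is useful in dev and noisy in prod.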
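
A minimal sketch of how an allocation rate can be derived from the cumulative /gc/heap/allocs:bytes counter follows. In production the Prometheus rate() function would do this arithmetic; the helpers allocatedBytes and allocRate are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"time"
)

// allocatedBytes returns the cumulative bytes allocated on the heap,
// or 0 if the metric is unavailable.
func allocatedBytes() uint64 {
	s := []metrics.Sample{{Name: "/gc/heap/allocs:bytes"}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindUint64 {
		return 0
	}
	return s[0].Value.Uint64()
}

// allocRate converts two readings of a cumulative counter into bytes/second.
func allocRate(prev, cur uint64, seconds float64) float64 {
	if seconds <= 0 || cur < prev {
		return 0
	}
	return float64(cur-prev) / seconds
}

// sink keeps the allocations below from being optimised onto the stack.
var sink []byte

func main() {
	prev, start := allocatedBytes(), time.Now()

	for i := 0; i < 1000; i++ {
		sink = make([]byte, 64<<10) // synthetic allocation load
	}

	cur := allocatedBytes()
	rate := allocRate(prev, cur, time.Since(start).Seconds())
	fmt.Printf("allocated %d bytes, about %.0f bytes/s\n", cur-prev, rate)
}
```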

JVM: -Xlog:gc*:file=gc.log:time,uptime,level,tags for rotation-ready GC logs. JFR jdk.ObjectAllocationInNewTLAB events for allocation profiling. Micrometer exposes the jvm_gc_pause_seconds timer and the jvm_gc_memory_allocated_bytes_total counter to Prometheus.

V8 / Node: process.memoryUsage().heapUsed polled every 15 s; v8.getHeapStatistics() for a richer view. perf_hooks.PerformanceObserver with 'gc' type captures every pause event programmatically.

.NET: GC.GetTotalAllocatedBytes() rate + dotnet-counters monitor for live readout of gen-0-gc-count, gen-1-gc-count, gen-2-gc-count, and alloc-rate.

The cross-runtime alert pattern: an allocation rate above the service-specific threshold (typically 300–500 MB/s per core) for more than 5 minutes triggers investigation. Plot alloc rate and p99 latency on one chart: when p99 rises in step with alloc rate, GC is the suspect; when p99 rises alone, look elsewhere.
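
The "above threshold for more than 5 minutes" condition can be sketched as a check over recent samples. This is an illustration of the logic only, assuming one sample per minute; a real deployment would normally express it as an alerting rule in the monitoring system. The function name sustainedAbove and the sample values are made up.

```go
package main

import "fmt"

// sustainedAbove reports whether the most recent `window` samples all
// exceed threshold. A single dip resets the condition.
func sustainedAbove(samples []float64, threshold float64, window int) bool {
	if window <= 0 || len(samples) < window {
		return false
	}
	for _, v := range samples[len(samples)-window:] {
		if v <= threshold {
			return false
		}
	}
	return true
}

func main() {
	const MB = 1e6
	// One sample per minute, bytes/s per core. Values are invented for the example.
	rates := []float64{280 * MB, 410 * MB, 420 * MB, 450 * MB, 430 * MB, 440 * MB}

	if sustainedAbove(rates, 400*MB, 5) {
		fmt.Println("alloc rate above threshold for 5 consecutive samples: investigate")
	} else {
		fmt.Println("alloc rate within budget")
	}
}
```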

Object pooling: when it works and when it doesn’t

Pooling reuses allocations to reduce GC pressure. It works well for objects that are expensive to create and used briefly on hot paths: bytes.Buffer, JSON encoders, regexp objects, scratch slices.

Pooling fails when:

  • Objects are cheap to create (pool overhead exceeds savings).
  • Objects are long-lived (pool holds memory without freeing).
  • Thread-coordination cost exceeds allocation cost.

Go’s sync.Pool has an advantage: the GC may drop pooled objects at each cycle, so an idle pool does not pin memory forever. JVM and .NET object pools hold memory until explicit release, so audit their max-pool-size.
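
A minimal sync.Pool sketch for the bytes.Buffer case mentioned above is shown here. The greet function and its output format are hypothetical; the pattern to note is Get, Reset, use, copy the result out, Put.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers. The GC may discard idle entries,
// so New must always be able to build a fresh one.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// greet builds a short string using a pooled scratch buffer.
func greet(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // a pooled buffer may still hold old contents
	defer bufPool.Put(buf) // return it once the result has been copied out

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies the bytes, so this is safe after Put
}

func main() {
	fmt.Println(greet("gopher"))
	fmt.Println(greet("pacer"))
}
```

Whether this beats a plain allocation depends on the workload, which is why the profile should flag the site first.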

The senior rule: pool only what the profile flags as a hot allocation site. Don’t pool preemptively.

Quiz

A containerised Go service OOMs sporadically under traffic spikes. The heap is allowed to grow by GOGC=100 (default). What is the first knob to set?

Quiz

A JVM service's GC log shows a 4 GB RSS but a 600 MB live set after each full GC. The heap max is set to 4 GB. What does this tell you about tuning?

Order the steps

Order the GC pressure fix levers from highest leverage to lowest:

  1. Eliminate the allocation (in-place mutation, struct-of-arrays, primitives)
  2. Pool / reuse the allocation (sync.Pool, ObjectPool, bytes.Buffer reset)
  3. Let escape analysis stack-allocate (smaller object, scope-local)
  4. Shrink the allocation (smaller struct, smaller buffer, pre-sized container)
  5. Move allocation off the hot path (cache the result, compute once)
  6. Tune collector knobs (GOGC, MaxGCPauseMillis, max-old-space-size)
  7. Switch the collector algorithm (ParallelGC → G1 → ZGC)

Recall before you leave

  1. Why should heap max-size target ~1.5–2x the live set, and what happens at the extremes?
  2. When does object pooling help GC pressure and when does it hurt?

Recap

GC pacing uses runtime signals to decide when the next cycle fires. In Go, GOMEMLIMIT is the first knob for containerised services: it keeps the process within its memory budget without code changes. On the JVM, MaxGCPauseMillis directs G1 toward a pause target. The heap should be sized to ~1.5–2x the live set: below that, GC runs constantly; above that, memory is wasted. Allocation observability, meaning per-runtime rate dashboards wired to Prometheus, is the leading indicator that catches GC regressions before they become SLO burns. Object pooling reduces GC pressure on hot allocation paths but hurts when objects are cheap or long-lived. Profile allocation rate first; tune collector knobs second; switch algorithms last.
