
Performance

GC algorithms: generational, concurrent, and per-runtime

Crux: The generational hypothesis (most objects die young) shapes most major production GCs. Knowing which algorithm your runtime ships, and why its tradeoffs differ from those of other runtimes, is the prerequisite for diagnosing any pause.

A Go service and a JVM service are both running at 200 MB/s allocation rate. The Go service has flat 0.5 ms pauses; the JVM service has 50 ms pauses every 10 seconds. Same allocation rate, 100x pause difference — because the collectors work entirely differently.

The generational hypothesis

The single most important empirical observation in GC research: most objects die young. A typical request handler allocates many temporary objects — parsed body, intermediate strings, response builders — serves the response, and drops them all. The objects that survive long enough to be useful are few: caches, configuration, connection pools.

Generational collectors exploit this by splitting the heap:

  • Young generation: small (a few hundred MB to a few GB), collected frequently. Nearly all allocations land here; some collectors place very large objects in a separate space instead (G1 humongous regions, the .NET LOH).
  • Old generation: large, collected rarely. Objects that survive several young-gen cycles get “promoted” (tenured) here.

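A young-gen collection traces only the live objects in the small young heap, plus the old-to-young references recorded in a remembered set, so its cost scales with survivors rather than with garbage. If 95% of objects die in young, one cheap pass reclaims 95% of allocated bytes. Old-gen collections are expensive but rare because most objects never reach old.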

Generation              | Size           | Collection frequency | Contents
Young (Eden + Survivor) | ~256 MB – 4 GB | Every few seconds    | New allocations; mostly short-lived
Old (Tenured)           | Remaining heap | Every few minutes    | Promoted survivors; mostly long-lived
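
The young/old split can be illustrated with a toy model. The sketch below is a simplified simulation, not any real collector: the `Obj` class, the `run_young_collections` function, the 64-byte object size, and the promotion age are all assumptions chosen for illustration. Each cycle allocates a batch of objects, 5% of which are long-lived, then runs a young collection that reclaims dead objects and promotes repeat survivors.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    size: int
    dies_at: int  # cycle in which the object becomes garbage
    age: int = 0  # number of young collections survived


def run_young_collections(cycles=20, per_cycle=1000, long_lived_every=20,
                          promote_age=2, obj_size=64):
    """Toy generational heap: returns (allocated, reclaimed_by_young, promoted)."""
    young, old = [], []
    allocated = 0
    reclaimed = 0
    for cycle in range(cycles):
        # Allocation phase: everything starts in the young generation.
        for i in range(per_cycle):
            long_lived = (i % long_lived_every == 0)  # 1 in 20 = 5% survive
            dies_at = cycles + 1 if long_lived else cycle
            young.append(Obj(size=obj_size, dies_at=dies_at))
            allocated += obj_size
        # Young collection: only the young list is examined.
        survivors = []
        for o in young:
            if o.dies_at <= cycle:
                reclaimed += o.size          # dead: space is reclaimed
            else:
                o.age += 1
                if o.age >= promote_age:
                    old.append(o)            # survived enough cycles: promote
                else:
                    survivors.append(o)      # stays in young for now
        young = survivors
    return allocated, reclaimed, len(old)
```

With these defaults, young collections reclaim 95% of all allocated bytes and the old generation only ever receives the 5% of objects that were long-lived, which is the effect the table above summarises.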

Concurrent marking

To keep pauses below 10 ms on multi-GB heaps, marking must happen while the application runs. The GC uses one or more background threads that walk the object graph alongside mutator (application) threads.

The challenge: the application may modify references while marking is in progress, so the GC needs a write barrier — a small piece of code that runs on every reference write to keep the collector informed. Two main strategies:

  • Snapshot-at-the-beginning (SATB): records the old reference about to be overwritten so the collector treats its target as live for this cycle. Used by G1 and Shenandoah. (ZGC takes a different route and relies mainly on load barriers.)
  • Incremental-update: records the new reference being written so it is not missed. Used by CMS, classic V8 mark-compact.

Write barrier overhead: roughly 2–10% CPU, depending on the collector and workload. That is the price of concurrent marking.
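
To see why the barrier is needed, here is a minimal single-threaded sketch of tri-color marking with a mutator that moves a reference mid-cycle. It is a teaching model under stated assumptions, not how any production collector is implemented: `Node`, `Heap`, `run`, and the barrier names `"satb"` and `"incremental"` are hypothetical, and each node holds a single outgoing reference for simplicity.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"


class Node:
    def __init__(self, name):
        self.name = name
        self.ref = None      # one outgoing reference, to keep the model small
        self.color = WHITE


class Heap:
    def __init__(self, barrier=None):
        self.barrier = barrier  # None, "satb", or "incremental"
        self.gray = []          # mark worklist

    def shade(self, node):
        """Turn a white node gray and queue it for scanning."""
        if node is not None and node.color == WHITE:
            node.color = GRAY
            self.gray.append(node)

    def write(self, obj, new):
        """Mutator store obj.ref = new, routed through the write barrier."""
        if self.barrier == "satb":
            self.shade(obj.ref)  # keep the overwritten (old) target alive
        elif self.barrier == "incremental":
            self.shade(new)      # make sure the newly stored target is seen
        obj.ref = new

    def mark_step(self):
        """Scan one gray node: shade its child, then blacken it."""
        node = self.gray.pop(0)
        self.shade(node.ref)
        node.color = BLACK


def run(barrier):
    """Return the final color of c, which stays reachable the whole time."""
    heap = Heap(barrier)
    a, b, c = Node("a"), Node("b"), Node("c")
    b.ref = c                 # initial graph: roots -> a, b and b -> c
    for root in (a, b):
        heap.shade(root)
    heap.mark_step()          # a is scanned and black; b is gray; c is white
    heap.write(a, c)          # mutator: black a now points at white c
    heap.write(b, None)       # mutator: the only gray path to c is removed
    while heap.gray:
        heap.mark_step()
    return c.color
```

Calling `run(None)` leaves `c` white even though `a` still references it, so a sweep would free a live object. Both `run("satb")` and `run("incremental")` end with `c` black: the first because the overwritten reference was recorded, the second because the newly written one was.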

Per-runtime tour

Go: concurrent tri-color non-generational mark-sweep (no compaction; the allocator is derived from tcmalloc). A cycle is triggered by heap growth rather than on a timer: the pacer starts marking early enough to finish before the heap reaches its goal, and the stop-the-world phases do so little work that pauses typically stay under 1 ms. GOGC=100 means the heap goal is twice the live heap left after the previous cycle; GOMEMLIMIT (Go 1.19+) adds a soft memory cap. Go is the counterexample to generational: the team bet that a simpler runtime plus escape analysis reducing heap pressure made generational complexity not worth it. In practice the bet has held up, with sub-ms pauses on most workloads.
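
The GOGC arithmetic can be written out as a short calculation. This is a simplified sketch, not the Go runtime's actual pacer: the function name `heap_goal` is hypothetical, the real runtime also accounts for GC roots such as stacks and globals, and GOMEMLIMIT covers total runtime memory rather than only the heap.

```python
def heap_goal(live_bytes, gogc=100, mem_limit=None):
    """Approximate heap size at which the next GC cycle should complete.

    live_bytes: heap still live after the previous cycle.
    gogc:       percentage of growth allowed over the live heap.
    mem_limit:  optional soft cap, standing in for GOMEMLIMIT.
    """
    goal = live_bytes + live_bytes * gogc // 100
    if mem_limit is not None:
        goal = min(goal, mem_limit)  # the soft cap pulls the goal down
    return goal


MiB = 1 << 20
print(heap_goal(400 * MiB) // MiB)                        # GOGC=100
print(heap_goal(400 * MiB, gogc=50) // MiB)               # GOGC=50
print(heap_goal(1024 * MiB, mem_limit=1536 * MiB) // MiB)  # capped by limit
```

Under this model a 400 MiB live heap gives a 800 MiB goal at GOGC=100 and 600 MiB at GOGC=50, while a 1 GiB live heap with a 1.5 GiB limit is collected at 1536 MiB instead of 2048 MiB. Lowering GOGC trades more frequent cycles (more CPU) for a smaller peak heap.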

JVM (modern): G1 is the default for most servers (since JDK 9): region-based, generational, concurrent marking, typical pauses 10–50 ms. ZGC (experimental in JDK 11 via JEP 333, production in JDK 15 via JEP 377) targets sub-ms pauses on heaps up to 16 TB using colored pointers + load barriers; it was non-generational until a generational mode arrived in JDK 21. Shenandoah (Red Hat) pursues similar goals; it originally used Brooks forwarding pointers and later moved the forwarding pointer into the object header. Tunable via -XX:MaxGCPauseMillis and heap-size flags.

V8 (Node.js): generational, with a Scavenger for young (semi-space copying) and Mark-Compact for old. The Orinoco project (2017+) added concurrent marking + parallel compaction, targeting ≤10 ms pauses. Old space is capped per isolate (historically about 1.4–1.5 GB on 64-bit Node; newer Node versions derive the default from available memory) and is configurable via --max-old-space-size.

.NET: workstation and server GC flavors, generational (gen 0/1/2 + LOH), with background (concurrent) collection of gen 2. Tunable via GCSettings.LatencyMode.

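CPython: reference counting (an object is freed as soon as its refcount reaches 0, so acyclic garbage causes no pause) plus a generational cycle collector for reference cycles. The GIL serialises mutation, but maintaining reference counts on every operation is often estimated to cost ~10–20% throughput. Pauses are usually small, but the cycle collector runs while holding the GIL, so a full collection over a large heap of container objects can stall the interpreter noticeably; only recent CPython releases make cycle collection incremental.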
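
The two CPython mechanisms can be observed directly with the standard `gc` and `weakref` modules. The sketch below is specific to CPython's reference-counting behavior and may behave differently on other interpreters; the `Node` class and `demo` function are illustrative names.

```python
import gc
import weakref


class Node:
    pass


def demo():
    """Return (freed_by_refcount, cycle_survived_refcount, freed_by_cycle_gc)."""
    gc.disable()  # stop the cycle collector from running on its own
    try:
        gc.collect()  # start from a clean slate

        # Case 1: no cycle. Dropping the last reference frees the object at once.
        a = Node()
        a_alive = weakref.ref(a)
        del a
        freed_by_refcount = a_alive() is None

        # Case 2: two objects referencing each other never reach refcount 0.
        x, y = Node(), Node()
        x.other, y.other = y, x
        x_alive = weakref.ref(x)
        del x, y
        cycle_survived_refcount = x_alive() is not None

        # Only the cycle collector can reclaim the pair.
        gc.collect()
        freed_by_cycle_gc = x_alive() is None
        return freed_by_refcount, cycle_survived_refcount, freed_by_cycle_gc
    finally:
        gc.enable()
```

On CPython, `demo()` returns `(True, True, True)`: the acyclic object is gone the moment its last reference is deleted, while the cyclic pair lingers until `gc.collect()` runs.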

Why this works

The generational hypothesis still informs production even in non-generational runtimes like Go. Code that allocates many short-lived objects in tight loops — zero-capacity slices that grow, fmt.Sprintf on every request, JSON encoding without pooling — creates more GC pressure than long-lived state regardless of the collector’s algorithm. The lever is always the same: reduce short-lived allocations.
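
One way to reduce short-lived allocations is to reuse a buffer across requests instead of creating a fresh one each time. The sketch below is a minimal illustration of that idea, assuming a single-threaded handler; `BufferPool` and `handle_requests` are hypothetical names, not a library API, and the payload bytes stand in for incoming request data.

```python
class BufferPool:
    """Hypothetical fixed-size buffer pool for single-threaded use."""

    def __init__(self, size):
        self.size = size
        self._free = []
        self.created = 0  # how many buffers were actually allocated

    def get(self):
        if self._free:
            return self._free.pop()   # reuse an idle buffer
        self.created += 1
        return bytearray(self.size)   # allocate only when the pool is empty

    def put(self, buf):
        self._free.append(buf)        # hand the buffer back for reuse


def handle_requests(pool, n):
    """Process n fake requests, borrowing one scratch buffer per request."""
    for i in range(n):
        buf = pool.get()
        try:
            payload = b"request-%d" % i      # stand-in for incoming data
            buf[:len(payload)] = payload     # same-length write into the buffer
        finally:
            pool.put(buf)                    # always return the buffer


pool = BufferPool(64)
handle_requests(pool, 1000)
print(pool.created)
```

Here 1000 requests share a single 64-byte buffer, so the scratch space is allocated once rather than 1000 times. The same pattern is what sync.Pool or a reset bytes.Buffer provides in Go.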

Quiz

A JVM service has 50 ms p99 from G1 pauses. Doubling the heap (-Xmx 8g → 16g) might help — why, and what is the risk?

Quiz

A V8 service running in Node.js grows to 1.4 GB then crashes. Most likely cause?

Quiz

A Python service is single-threaded due to the GIL, but still has GC overhead. What is the dominant cost?

Order the steps

Order the priority of GC-pressure fix levers, from highest leverage to lowest:

  1. Eliminate the allocation (in-place mutation, struct-of-arrays, primitives)
  2. Pool / reuse the allocation (sync.Pool, ObjectPool, bytes.Buffer reset)
  3. Let escape analysis stack-allocate (smaller object, scope-local)
  4. Shrink the allocation (smaller struct, smaller buffer, pre-sized container)
  5. Move the allocation off the hot path (cache the result, compute once)
  6. Tune the collector (GOGC, MaxGCPauseMillis, max-old-space-size)
  7. Switch the collector algorithm (ParallelGC → G1 → ZGC)

Recall before you leave

  1. Walk through the generational hypothesis and why it shapes most production GCs. Include one counterexample.
  2. What is a write barrier in concurrent GC and why is it needed?

Recap

The generational hypothesis (most objects die young) is the empirical foundation of most production GC algorithms. Generational collectors split the heap into a small, frequently-collected young generation and a large, rarely-collected old generation; most garbage is reclaimed cheaply in the young gen. Concurrent marking runs alongside the application by adding a write barrier (~2–10% CPU) to intercept reference writes. Go is the main counterexample: non-generational, concurrent tri-color, betting that escape analysis reduces heap pressure enough. JVM G1, V8 Orinoco, and the .NET GC are generational + concurrent; ZGC is concurrent and gained a generational mode in JDK 21. CPython is reference-counting with a supplementary cycle collector: acyclic garbage is freed without a pause, at the cost of per-operation refcount overhead.


Trademarks belong to their respective owners. Editorial reference only.