GC algorithms: generational, concurrent, and per-runtime

The generational hypothesis (most objects die young) shapes every major production GC. Knowing which algorithm your runtime ships — and why it makes different tradeoffs than others — is the prerequisite for diagnosing any pause.

A Go service and a JVM service both allocate at 200 MB/s. The Go service has flat 0.5 ms pauses; the JVM service has 50 ms pauses every 10 seconds. Same allocation rate, 100x pause difference, because the two collectors are designed entirely differently. By the end of this lesson you will know why that gap exists and which collector design fits your runtime.

The generational hypothesis

The single most important empirical observation in GC research: most objects die young. A typical request handler allocates many temporary objects — parsed body, intermediate strings, response builders — serves the response, and drops them all. The objects that survive long enough to be useful are few: caches, configuration, connection pools.

Generational collectors exploit this by splitting the heap:

Young generation: small (a few hundred MB to a few GB), collected frequently. Nearly all allocations land here; some runtimes place very large objects directly in the old generation or in a dedicated space.
Old generation: large, collected rarely. Objects that survive several young-gen cycles get “promoted” (tenured) here.

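A young-gen collection traces only the small young heap, and with a copying collector its cost scales with the survivors rather than with the garbage. If 95% of objects die young, one cheap pass reclaims roughly 95% of what was allocated. Old-gen collections are expensive but rare because most objects never reach the old generation.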

Generation	Size	Collection frequency	Contents
Young (Eden + Survivor)	~256 MB – 4 GB	Every few seconds	New allocations; mostly short-lived
Old (Tenured)	Remaining heap	Every few minutes	Promoted survivors; mostly long-lived
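
The young/old split described above can be illustrated with a toy simulation. This is a minimal sketch, not a model of any real collector: the function name, the 95% short-lived fraction, the 100-allocation lifetime, and the promote-on-first-survival rule are all assumptions chosen for illustration.

```python
import random


def simulate_young_gen(n_objects=100_000, young_capacity=10_000,
                       short_lived_fraction=0.95, seed=0):
    """Toy model of young-generation collection (all parameters are illustrative)."""
    rng = random.Random(seed)
    young = []  # death time of each object currently in the young generation
    reclaimed = promoted = collections = 0

    for now in range(n_objects):
        if rng.random() < short_lived_fraction:
            death = now + rng.randint(1, 100)  # dead within 100 further allocations
        else:
            death = float("inf")               # long-lived: cache, config, pool
        young.append(death)

        if len(young) == young_capacity:       # young gen is full: collect it
            collections += 1
            survivors = [d for d in young if d > now]
            reclaimed += len(young) - len(survivors)
            promoted += len(survivors)         # simplification: promote on first survival
            young = []

    return {"collections": collections, "reclaimed": reclaimed, "promoted": promoted}


if __name__ == "__main__":
    stats = simulate_young_gen()
    print(stats, f"reclaimed fraction = {stats['reclaimed'] / 100_000:.3f}")
```

Note that a few short-lived objects are still alive at the moment a collection runs and get promoted anyway. Real collectors reduce this premature promotion with survivor spaces and a tenuring threshold, which this sketch leaves out.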

Concurrent marking

To keep pauses below 10 ms on multi-GB heaps, marking must happen while the application runs. The GC uses one or more background threads that walk the object graph alongside the mutator (application) threads.

The challenge: the application may modify references while marking is in progress, so the GC needs a write barrier — a small piece of code that runs on every reference write to keep the collector informed. Two main strategies:

Snapshot-at-the-beginning (SATB): marks the old reference about to be overwritten so the collector treats it as live for this cycle. Used by G1 and Shenandoah. ZGC takes a different route and relies mainly on load barriers (see the per-runtime tour).
Incremental-update: marks the new reference being written so it is not missed. Used by CMS, classic V8 mark-compact.

Write barrier overhead: ~2–10% CPU — the price of concurrent marking.
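
The failure that a write barrier prevents can be shown on a four-object graph. The sketch below is a toy marker, not how any production collector is implemented: ToyHeap, run, and the barrier names "satb" and "incremental" are hypothetical, and the mutator is interleaved by hand instead of running on a separate thread.

```python
from collections import deque


class ToyHeap:
    """Tiny object graph with an interruptible marker (illustrative, not a real GC)."""

    def __init__(self, barrier=None):
        self.fields = {}         # object name -> {field name: target name or None}
        self.marked = set()      # objects already shaded (grey or black)
        self.grey = deque()      # shaded but not yet scanned
        self.marking = False
        self.barrier = barrier   # None, "satb", or "incremental"

    def new(self, name, **fields):
        self.fields[name] = dict(fields)

    def _shade(self, name):
        if name not in self.marked:
            self.marked.add(name)
            self.grey.append(name)

    def write(self, obj, field, target):
        """Mutator reference store, with the selected write barrier."""
        if self.marking:
            old = self.fields[obj].get(field)
            if self.barrier == "satb" and old is not None:
                self._shade(old)       # keep the overwritten referent alive
            elif self.barrier == "incremental" and target is not None:
                self._shade(target)    # make sure the new referent gets visited
        self.fields[obj][field] = target

    def start_marking(self, roots):
        self.marking = True
        for root in roots:
            self._shade(root)

    def mark_step(self):
        """Scan one grey object; returns False when no work is left."""
        if not self.grey:
            return False
        obj = self.grey.popleft()
        for target in self.fields[obj].values():
            if target is not None:
                self._shade(target)
        return True

    def finish_marking(self):
        while self.mark_step():
            pass
        self.marking = False
        return set(self.marked)


def run(barrier):
    heap = ToyHeap(barrier)
    heap.new("root", a="A", b="B")
    heap.new("A", x=None)
    heap.new("B", c="C")
    heap.new("C")

    heap.start_marking(["root"])
    heap.mark_step()             # scans root: A and B become grey
    heap.mark_step()             # scans A: A is now fully scanned, no children found

    # The mutator runs mid-cycle and moves C from B (unscanned) to A (scanned).
    heap.write("A", "x", "C")
    heap.write("B", "c", None)

    return heap.finish_marking()
```

Without a barrier, C is reachable through A but never marked, so a sweep would free a live object. With either barrier C ends up marked: SATB catches it when the old B.c reference is overwritten, incremental-update when the new A.x reference is stored.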

Per-runtime tour

When you tune or diagnose GC in production, the collector your runtime ships is the constraint you cannot change — so knowing its design is not optional.

Go: concurrent, tri-color, non-generational mark-sweep (no compaction; the allocator is derived from tcmalloc). Cycles are triggered by heap growth: the pacer starts each cycle early enough for concurrent marking to finish before the heap reaches its target, and the stop-the-world phases are typically well under 1 ms. GOGC=100 means the heap target is roughly double the live heap left by the previous cycle; GOMEMLIMIT (Go 1.19+) adds a soft memory cap. Go is the counterexample to generational: the team bet that a simpler runtime, plus escape analysis reducing heap pressure, made the generational complexity not worth it. The bet has held up empirically, with sub-ms pauses on most workloads.
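
The GOGC arithmetic can be worked through in a few lines. The sketch uses the heap-target formula from the Go GC guide (live heap plus GOGC percent of live heap and GC roots); the function names, the 400 MB live heap, and the steady-state assumption are illustrative, and GOMEMLIMIT is not modelled.

```python
def heap_target_mb(live_heap_mb, gogc=100, gc_roots_mb=0.0):
    """Target heap size for the next cycle: live + (live + roots) * GOGC / 100."""
    return live_heap_mb + (live_heap_mb + gc_roots_mb) * gogc / 100


def seconds_between_cycles(live_heap_mb, alloc_rate_mb_s, gogc=100):
    """Rough cycle spacing: headroom divided by allocation rate (steady state assumed)."""
    headroom = heap_target_mb(live_heap_mb, gogc) - live_heap_mb
    return headroom / alloc_rate_mb_s


if __name__ == "__main__":
    # Assumed workload: 400 MB live heap, 200 MB/s allocation rate.
    print(heap_target_mb(400))                         # 800.0
    print(seconds_between_cycles(400, 200))            # 2.0
    print(seconds_between_cycles(400, 200, gogc=200))  # 4.0
```

Raising GOGC trades memory for fewer cycles: at GOGC=200 the same service collects half as often but its heap target grows from 800 MB to 1200 MB.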

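JVM (modern): G1 has been the default on server-class machines since JDK 9: region-based, generational, concurrent marking, typical pauses 10–50 ms. ZGC (experimental in JDK 11 via JEP 333, production in JDK 15 via JEP 377) targets sub-ms pauses on heaps up to 16 TB using colored pointers + load barriers; a generational mode arrived in JDK 21 (JEP 439). Shenandoah (Red Hat) pursues similar goals; it originally used Brooks forwarding pointers and later moved the forwarding pointer into the object header. Tunable via -XX:MaxGCPauseMillis (a pause-time goal, not a guarantee) and heap-size flags.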

V8 (Node.js): generational Scavenger for young (semi-space copying), Mark-Compact for old. The Orinoco project (2017+) added concurrent marking + parallel compaction, targeting ≤10 ms pauses. Heap capped per-isolate (historically ~1.5 GB by default on 64-bit Node; newer versions derive the default from available memory; configurable via --max-old-space-size).

.NET — workstation/server GC, generational (gen 0/1/2 + LOH), background concurrent marking. Tunable via GCSettings.LatencyMode.

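CPython: reference counting (an object is freed as soon as its refcount reaches 0) plus a cycle collector for reference cycles. The GIL serialises mutation, but maintaining reference counts still costs ~10–20% throughput. There are usually no major STW pauses: the cycle collector is generational and does pause the interpreter while it runs, but it examines only container objects, so its pauses tend to be short unless the heap holds very many of them.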
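
The two CPython mechanisms can be observed directly with the standard gc and weakref modules. This is a small sketch that assumes it runs on CPython (other Python implementations free objects differently); Node and demo are hypothetical names.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def demo():
    """Returns (freed_by_refcount, cycle_survives_refcount, freed_by_cycle_gc)."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic cycle collection out of the experiment
    try:
        # Acyclic object: freed the moment its last reference goes away.
        a = Node()
        a_ref = weakref.ref(a)
        del a
        freed_by_refcount = a_ref() is None

        # Two objects pointing at each other: refcounts never reach zero.
        x, y = Node(), Node()
        x.other, y.other = y, x
        x_ref = weakref.ref(x)
        del x, y
        cycle_survives_refcount = x_ref() is not None

        gc.collect()  # explicit cycle collection still works while gc is disabled
        freed_by_cycle_gc = x_ref() is None
    finally:
        if was_enabled:
            gc.enable()
    return freed_by_refcount, cycle_survives_refcount, freed_by_cycle_gc


if __name__ == "__main__":
    print(demo())
```

On CPython all three values come back True: plain reference counting handles the acyclic case immediately, and only the cycle collector can reclaim the pair that refer to each other.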

Same allocation pressure, ~100x range in pauses: this spread is what 'which collector your runtime ships' actually buys you.

Why this works

The generational hypothesis still informs production even in non-generational runtimes like Go. Code that allocates many short-lived objects in tight loops — zero-capacity slices that grow, fmt.Sprintf on every request, JSON encoding without pooling — creates more GC pressure than long-lived state regardless of the collector’s algorithm. The lever is always the same: reduce short-lived allocations.

Quiz

A JVM service has 50 ms p99 from G1 pauses. Doubling the heap (-Xmx 8g → 16g) might help — why, and what is the risk?

Quiz

A V8 service running in Node.js grows to 1.4 GB then crashes. Most likely cause?

Quiz

A Python service is single-threaded due to the GIL, but still has GC overhead. What is the dominant cost?

Order the steps

Order the priority of GC-pressure fix levers, from highest leverage to lowest:

1 Eliminate the allocation (in-place mutation, struct-of-arrays, primitives)
2 Pool / reuse the allocation (sync.Pool, ObjectPool, bytes.Buffer reset)
3 Let escape analysis stack-allocate (smaller object, scope-local)
4 Shrink the allocation (smaller struct, smaller buffer, pre-sized container)
5 Move the allocation off the hot path (cache the result, compute once)
6 Tune the collector (GOGC, MaxGCPauseMillis, max-old-space-size)
7 Switch the collector algorithm (ParallelGC → G1 → ZGC)
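
Lever 2 (pool / reuse) can be sketched as a free list of scratch buffers. This is a language-neutral illustration written in Python, not a stand-in for sync.Pool or ObjectPool: BufferPool and handle_request are hypothetical, the pool is not thread-safe, and the sketch counts buffer creations without claiming a measured speedup.

```python
class BufferPool:
    """Minimal free-list pool of fixed-size scratch buffers (hypothetical helper)."""

    def __init__(self, size):
        self.size = size
        self._free = []
        self.allocations = 0     # fresh buffers created so far

    def get(self):
        if self._free:
            return self._free.pop()
        self.allocations += 1
        return bytearray(self.size)

    def put(self, buf):
        self._free.append(buf)   # caller must not touch buf after returning it


def handle_request(pool, payload):
    """Uppercase a payload using a pooled scratch buffer instead of a fresh one."""
    if len(payload) > pool.size:
        raise ValueError("payload larger than pooled buffer")
    buf = pool.get()
    try:
        n = len(payload)
        buf[:n] = payload        # same-length slice assignment: buffer is not resized
        return bytes(buf[:n]).upper()
    finally:
        pool.put(buf)


if __name__ == "__main__":
    pool = BufferPool(64)
    for _ in range(1000):
        handle_request(pool, b"ping")
    print(pool.allocations)      # scratch buffers created across 1000 requests
```

A thousand sequential requests share a single scratch buffer here. The same shape applies to the pooling primitives named in the list, where the saving is one heap allocation per request on the hot path.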

Allocations land in the young gen; most die there and are reclaimed cheaply. The few survivors are promoted to the old gen, collected rarely.

Recall before you leave

01 Walk through the generational hypothesis and why it shapes most production GCs. Include one counterexample.
02 What is a write barrier in concurrent GC and why is it needed?

Recap

The generational hypothesis (most objects die young) is the empirical foundation of most production GC algorithms. Generational collectors split the heap into a small, frequently-collected young generation and a large, rarely-collected old generation; most garbage is reclaimed cheaply in the young gen. Concurrent marking runs alongside the application by adding a write barrier (~2–10% CPU) to intercept reference writes. Go is the main counterexample: non-generational, concurrent tri-color, betting that escape analysis reduces heap pressure enough. JVM G1, V8 (Scavenger plus Orinoco), and .NET are generational with concurrent marking; ZGC is concurrent and gained a generational mode in JDK 21. CPython is reference-counting with a supplementary cycle collector: usually no major STW pauses, but per-operation refcount overhead. Now when you see a GC pause profile you have not encountered before, your first question should be: which generation did this collection target, and was concurrent marking able to run?
