
GC internals: tri-color invariant, write barriers, and per-runtime deep-dives

Tri-color marking is the formal core of concurrent tracing GC. Write barriers maintain its invariant. Go’s pacer, ZGC’s colored pointers, and V8 Orinoco each implement the same ideas differently, and knowing the implementation shapes how you write allocation-safe hot paths.


A JVM service migrates from G1 to ZGC. Pauses drop from 60 ms to sub-millisecond on a 16 GB heap — but throughput drops 12% and memory use climbs 18%. Understanding why requires knowing what colored pointers are and what load barriers cost.

Tri-color marking and the write barrier

The tri-color abstraction (Dijkstra et al., 1978) is the formal foundation of concurrent GC. Objects are classified into three colors:

White — not yet visited; candidate for collection if marking ends while still white.
Grey — visited, but children not yet fully scanned.
Black — visited, all children scanned; considered live.

Marking moves grey objects to black by scanning their children and greying each unvisited child. When no grey objects remain, all white objects are unreachable and may be reclaimed.
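The marking loop described above can be sketched on a toy object graph. This is a minimal stop-the-world illustration, not any runtime's implementation; the `Object` type, the `Mark` function, and the sample graph are all made up for the example.

```go
package main

import "fmt"

type Color int

const (
	White Color = iota // not yet visited
	Grey               // visited, children not yet scanned
	Black              // visited, all children scanned
)

// Object is a hypothetical heap object: a name plus outgoing references.
type Object struct {
	Name  string
	Refs  []*Object
	Color Color
}

// Mark greys the roots, then repeatedly scans a grey object, greys its
// white children, and blackens it. It returns how many objects ended black.
func Mark(roots []*Object) int {
	var grey []*Object // worklist of grey objects
	shade := func(o *Object) {
		if o != nil && o.Color == White {
			o.Color = Grey
			grey = append(grey, o)
		}
	}
	for _, r := range roots {
		shade(r)
	}
	marked := 0
	for len(grey) > 0 {
		o := grey[len(grey)-1]
		grey = grey[:len(grey)-1]
		for _, child := range o.Refs {
			shade(child)
		}
		o.Color = Black
		marked++
	}
	return marked
}

func main() {
	a := &Object{Name: "a"}
	b := &Object{Name: "b"}
	c := &Object{Name: "c"}
	d := &Object{Name: "d"} // never referenced from a root
	a.Refs = []*Object{b, c}
	b.Refs = []*Object{c}
	c.Refs = []*Object{a} // a cycle is handled by the color check

	fmt.Println("marked:", Mark([]*Object{a}))
	for _, o := range []*Object{a, b, c, d} {
		fmt.Printf("%s reclaimable=%v\n", o.Name, o.Color == White)
	}
}
```

Only `d` is still white when the worklist drains, so it is the only object a sweep could reclaim.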

The fundamental invariant: a black object must never directly reference a white object. Suppose a mutator thread stores a reference to a white object into a field of a black object that has already been scanned, and then removes the last reference to it from any grey or white object. The collector never revisits the black object, so the white object stays unmarked in the GC’s view while remaining reachable in the program’s. The collector would reclaim live memory: silent heap corruption.

SATB vs incremental-update barriers

The write barrier prevents invariant violations by intercepting every reference write:

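Snapshot-at-the-beginning (SATB): the barrier marks the old reference about to be overwritten, ensuring it survives this cycle. The collector behaves as if it snapshotted the heap at GC start. Used by G1, Shenandoah, and ZGC (through the store barriers of its generational mode; the original ZGC marked through its load barrier instead). Go’s hybrid barrier combines this Yuasa-style deletion barrier with a Dijkstra-style insertion barrier, so that goroutine stacks never need re-scanning.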

Incremental-update (Dijkstra-style): the barrier marks the new reference being written into a black object, ensuring the newly pointed-to object is scanned before the cycle ends. Used by CMS and classic V8 mark-compact.

SATB is more conservative — it may preserve objects that became garbage during the cycle (floating garbage, reclaimed in the next cycle). But it gives stronger guarantees about marking termination and is simpler to reason about. Incremental-update may require a re-marking phase to fix up changes missed during concurrent marking.
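To make the difference concrete, the sketch below replays one problematic interleaving on a toy heap under three policies: no barrier, SATB, and incremental-update. Everything here (`Collector`, `Write`, `Run`, the single-field `Object`) is hypothetical scaffolding for illustration; real barriers are compiler-emitted and considerably more involved.

```go
package main

import "fmt"

type Color int

const (
	White Color = iota
	Grey
	Black
)

type Barrier int

const (
	NoBarrier Barrier = iota
	SATB
	IncrementalUpdate
)

// Object is a toy heap object with a single reference field.
type Object struct {
	Field *Object
	Color Color
}

// Collector holds the grey worklist and the barrier policy in force.
type Collector struct {
	grey    []*Object
	barrier Barrier
}

func (c *Collector) shade(o *Object) {
	if o != nil && o.Color == White {
		o.Color = Grey
		c.grey = append(c.grey, o)
	}
}

// scanOne blackens one grey object; it reports false when none remain.
func (c *Collector) scanOne() bool {
	if len(c.grey) == 0 {
		return false
	}
	o := c.grey[len(c.grey)-1]
	c.grey = c.grey[:len(c.grey)-1]
	c.shade(o.Field)
	o.Color = Black
	return true
}

// Write performs dst.Field = val on behalf of the mutator.
func (c *Collector) Write(dst, val *Object) {
	switch c.barrier {
	case SATB:
		c.shade(dst.Field) // keep the value being overwritten alive
	case IncrementalUpdate:
		if dst.Color == Black {
			c.shade(val) // expose the new target to the marker
		}
	}
	dst.Field = val
}

// Run replays the interleaving and returns the final color of the object
// that the mutator moved from a grey holder to a black holder.
func Run(kind Barrier) Color {
	hidden := &Object{}
	a := &Object{}              // will be black before the mutator runs
	b := &Object{Field: hidden} // still grey when the mutator runs
	gc := &Collector{barrier: kind}
	gc.shade(b)
	gc.shade(a)
	gc.scanOne() // scans a (last pushed); a is now black

	gc.Write(a, hidden) // black object now references a white object
	gc.Write(b, nil)    // the only grey path to it is removed

	for gc.scanOne() {
	}
	return hidden.Color
}

func main() {
	fmt.Println("no barrier, object lost:", Run(NoBarrier) == White)
	fmt.Println("SATB, object kept:", Run(SATB) == Black)
	fmt.Println("incremental-update, object kept:", Run(IncrementalUpdate) == Black)
}
```

Both barriers rescue the object, but at different moments: SATB shades it when the old reference is deleted, incremental-update when the new reference is installed into the black object.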

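Both add overhead to every reference write. The 2–10% CPU commonly quoted is a rough whole-program range and depends heavily on how pointer-write-heavy the workload is; it is the price of concurrent marking without long stop-the-world pauses.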

Barrier type	What it marks	Used by	Side effect
SATB	Old reference (pre-write)	G1, Shenandoah, ZGC, Go	Floating garbage (one cycle delay)
Incremental-update	New reference (post-write)	CMS, classic V8	May need re-mark phase

Why this works

Write barriers matter for write-heavy hot paths. A service that writes millions of references per second (e.g. updating a large in-memory graph) pays the barrier cost on every write. On most CRUD services this is negligible; on graph-mutation or event-sourcing workloads it shows up in profiles as runtime.wbBufFlush (Go) or similar GC frame names. Know your write pattern before claiming the barrier is free.
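One way to see what "know your write pattern" means in Go: the write barrier applies to pointer stores into the heap, not to integer stores. The sketch below relinks the same list in two representations, one linked by pointers and one by indices into a backing slice. The types and helper names are invented for the example, and whether the index form pays off in practice is something to confirm with a profile rather than assume.

```go
package main

import "fmt"

// PtrNode links to its successor with a pointer. While marking is active,
// every store to Next goes through the write barrier.
type PtrNode struct {
	Value int
	Next  *PtrNode
}

// IdxNode links by index into a backing slice. Storing an int32 is not a
// pointer write, so no write barrier is involved.
type IdxNode struct {
	Value int
	Next  int32
}

const none int32 = -1 // sentinel for "no successor"

func reversePtr(head *PtrNode) *PtrNode {
	var prev *PtrNode
	for cur := head; cur != nil; {
		next := cur.Next
		cur.Next = prev // pointer store
		prev, cur = cur, next
	}
	return prev
}

func reverseIdx(nodes []IdxNode, head int32) int32 {
	prev := none
	for cur := head; cur != none; {
		next := nodes[cur].Next
		nodes[cur].Next = prev // integer store
		prev, cur = cur, next
	}
	return prev
}

// buildPtr returns the list 1 -> 2 -> ... -> n.
func buildPtr(n int) *PtrNode {
	var head *PtrNode
	for v := n; v >= 1; v-- {
		head = &PtrNode{Value: v, Next: head}
	}
	return head
}

// buildIdx returns the same list stored in a slice, plus the head index.
func buildIdx(n int) ([]IdxNode, int32) {
	nodes := make([]IdxNode, n)
	if n == 0 {
		return nodes, none
	}
	for i := range nodes {
		nodes[i] = IdxNode{Value: i + 1, Next: int32(i + 1)}
	}
	nodes[n-1].Next = none
	return nodes, 0
}

func main() {
	p := reversePtr(buildPtr(5))
	nodes, h := buildIdx(5)
	h = reverseIdx(nodes, h)
	fmt.Println(p.Value, nodes[h].Value)
}
```

The index form also leaves the collector fewer pointers to trace, at the cost of manual bounds and lifetime management, so it is a trade rather than a free win.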

Go’s pacer redesign

If you have ever wondered why a Go service’s GC behaviour changed noticeably between minor versions, the pacer redesign is usually the answer.

Go 1.18’s GC pacer rewrite (proposal 44167, by Michael Knyszek) replaced heuristics with a closed-loop control system. The old pacer estimated when to start the next GC cycle so it would finish just before the heap doubled; it had instability at high allocation rates and made poor decisions on cgo-heavy workloads.

The new pacer uses a PI controller (proportional-integral) on two signals: heap-growth rate and GC CPU utilisation. The controller targets GC finishing just before the heap reaches the goal (GOGC-derived), with integral feedback preventing sustained drift.
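The role of the integral term is easiest to see in a toy loop. The sketch below is not the Go runtime's pacer: the gains, the 0.25 target, the constant disturbance, and the one-line "plant" model are all assumptions chosen to show why proportional-only feedback settles off-target while PI feedback removes the sustained drift.

```go
package main

import "fmt"

// PI is a textbook discrete proportional-integral controller.
type PI struct {
	Kp, Ki   float64
	integral float64
}

// Update returns the control output for one step.
func (c *PI) Update(target, measured float64) float64 {
	err := target - measured
	c.integral += err // accumulated error drives out steady-state offset
	return c.Kp*err + c.Ki*c.integral
}

// simulate runs the loop against a hypothetical plant in which the next
// measurement equals the controller output plus a constant disturbance.
// It returns the final measured value.
func simulate(kp, ki float64, steps int) float64 {
	const target = 0.25      // e.g. a desired GC CPU fraction (illustrative)
	const disturbance = 0.10 // unmodelled constant load (illustrative)
	c := &PI{Kp: kp, Ki: ki}
	measured := 0.0
	for i := 0; i < steps; i++ {
		measured = c.Update(target, measured) + disturbance
	}
	return measured
}

func main() {
	fmt.Printf("P only: %.4f\n", simulate(0.5, 0.0, 200))
	fmt.Printf("PI:     %.4f\n", simulate(0.5, 0.2, 200))
}
```

With these particular gains the proportional-only loop settles at 0.15, a permanent error of 0.10, while the PI loop converges to the 0.25 target.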

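GOMEMLIMIT (added in Go 1.19) integrates into the pacer: as the process approaches the limit, the pacer pulls GC forward, accepting higher GC CPU, to avoid exceeding it. The limit is soft: the runtime caps GC CPU time (at roughly 50%) and will overshoot the limit rather than thrash, so GOMEMLIMIT reduces the risk of OOM kills but does not eliminate it. While memory use stays well under the limit, pacing is driven by GOGC as usual.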
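The interaction between GOGC and the memory limit can be approximated with a little arithmetic. The heap-goal formula below follows the one given in the Go GC guide (live heap plus a GOGC percentage of live heap and GC roots); the way the limit is folded in is a deliberate simplification, and the function names and the `nonHeap` parameter are assumptions for this sketch.

```go
package main

import "fmt"

// heapGoal follows the formula in the Go GC guide:
//   goal = live + (live + roots) * GOGC / 100
// All sizes are in bytes.
func heapGoal(live, roots, gogc int64) int64 {
	return live + (live+roots)*gogc/100
}

// effectiveGoal is a simplified model of the memory limit: the heap may not
// grow past the limit minus whatever the process uses outside the heap.
// The real runtime accounts for more sources than this single number.
func effectiveGoal(gogcGoal, memLimit, nonHeap int64) int64 {
	limitGoal := memLimit - nonHeap
	if limitGoal < gogcGoal {
		return limitGoal // the limit pulls the next cycle forward
	}
	return gogcGoal
}

func main() {
	const MiB = 1 << 20
	goal := heapGoal(400*MiB, 10*MiB, 100)
	fmt.Println("GOGC goal (MiB):", goal/MiB)
	fmt.Println("with 700 MiB limit (MiB):", effectiveGoal(goal, 700*MiB, 50*MiB)/MiB)
	fmt.Println("with 2 GiB limit (MiB):", effectiveGoal(goal, 2048*MiB, 50*MiB)/MiB)
}
```

In this model the limit only matters once the GOGC-derived goal would exceed it, which matches the behaviour described above: far from the limit, GOGC alone sets the pace.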

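Senior production advice: set GOMEMLIMIT to ~90% of the container’s memory limit; leave GOGC at the default 100 unless profiling shows a specific reason to change it. GOGC=off without a memory limit is only safe for memory-bounded batch jobs that deallocate via process exit; combined with GOMEMLIMIT it means the GC runs only as the limit is approached, which suits processes that have the container’s memory to themselves.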
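The 90% rule can be applied from code as well as from the environment. The sketch below uses `debug.SetMemoryLimit` from `runtime/debug` (available since Go 1.19); the 2 GiB container limit is a made-up figure, and the helper names are assumptions. In a real service the container limit would come from the orchestrator or the cgroup filesystem.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// memoryLimitFor returns roughly 90% of the container limit, leaving
// headroom for memory the Go runtime does not manage.
func memoryLimitFor(containerLimitBytes int64) int64 {
	return containerLimitBytes / 10 * 9
}

// applyMemoryLimit sets the soft limit and returns the value now in force.
// Passing a negative value to SetMemoryLimit reads the limit without
// changing it.
func applyMemoryLimit(containerLimitBytes int64) int64 {
	debug.SetMemoryLimit(memoryLimitFor(containerLimitBytes))
	return debug.SetMemoryLimit(-1)
}

func main() {
	const containerLimit = int64(2) << 30 // hypothetical 2 GiB container
	effective := applyMemoryLimit(containerLimit)
	fmt.Printf("soft memory limit: %d MiB\n", effective>>20)
}
```

Setting the `GOMEMLIMIT` environment variable (for example `GOMEMLIMIT=1800MiB`) achieves the same thing without a code change, which is usually the easier option for containerised deployments.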

The redesign reduced pause variance by ~50% on real workloads. Reading: Knyszek’s GopherCon 2022 talk on the pacer redesign.

ZGC and colored pointers

ZGC (JEP 333, JDK 11 experimental; production in JDK 15 via JEP 377) achieves sub-millisecond pauses on heaps up to 16 TB using two innovations:

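Colored pointers pack metadata bits into the 64-bit pointer itself. The original ZGC layout used bits 0–41 for the address (capping the heap at 4 TB) and bits 42–45 for metadata: two mark bits, a remapped bit, and a finalizable bit. Later releases widened the address field to reach the 16 TB maximum. At any moment one color is “good”; any other color is “bad” and signals that the reference still needs marking or relocation work.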
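The packing itself is plain bit manipulation. The sketch below follows the bit positions given above (address in bits 0–41, metadata in bits 42–45) and borrows ZGC's bit names for readability; the helper functions are invented for the example, and real ZGC operates on machine pointers inside the JVM rather than on integers like this.

```go
package main

import "fmt"

const (
	addressBits = 42
	addressMask = uint64(1)<<addressBits - 1 // bits 0-41

	// One metadata bit per color, in bits 42-45 (illustrative layout).
	Marked0     = uint64(1) << 42
	Marked1     = uint64(1) << 43
	Remapped    = uint64(1) << 44
	Finalizable = uint64(1) << 45

	colorMask = Marked0 | Marked1 | Remapped | Finalizable
)

// withColor returns the pointer with its metadata bits replaced by color.
func withColor(ptr, color uint64) uint64 {
	return ptr&addressMask | color
}

// address strips the metadata bits, leaving the plain heap address.
func address(ptr uint64) uint64 {
	return ptr & addressMask
}

// isGood reports whether the pointer carries exactly the current good color.
func isGood(ptr, good uint64) bool {
	return ptr&colorMask == good
}

func main() {
	p := withColor(0x1234, Marked0)
	fmt.Printf("raw=%#x address=%#x\n", p, address(p))
	fmt.Println("good while Marked0 is current:", isGood(p, Marked0))
	fmt.Println("good after the GC flips to Remapped:", isGood(p, Remapped))
}
```

The point of the design is the last line: when the collector changes which color counts as good, every existing pointer becomes "bad" at once without the collector touching a single one of them.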

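Load barriers run whenever the application loads a reference from the heap (not on every dereference of a reference already held in a register or local). If the color is “bad,” the barrier takes a slow path that marks or relocates as needed and updates the stored pointer in place (self-healing), so the next load of that field takes the fast path. Because the barrier runs inline on every reference load, the application participates in GC’s work incrementally instead of waiting for a big stop-the-world phase.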
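A toy version of the fast path and slow path looks like this. The `Heap` type, its forwarding map, and the two color constants are assumptions made for the sketch; a real load barrier is a few machine instructions emitted by the JIT, and it heals the slot with an atomic compare-and-swap rather than a plain store.

```go
package main

import "fmt"

const (
	addressMask = uint64(1)<<42 - 1
	colorMask   = uint64(0xF) << 42
	colorOld    = uint64(1) << 42 // hypothetical previous good color
	colorNew    = uint64(1) << 44 // hypothetical current good color
)

// Heap is a toy collector state: the current good color plus a forwarding
// table from old addresses to relocated addresses.
type Heap struct {
	good       uint64
	forwarding map[uint64]uint64
	slowPaths  int // how many loads took the slow path
}

// Load reads the reference stored in slot and returns a usable address.
func (h *Heap) Load(slot *uint64) uint64 {
	p := *slot
	if p == 0 {
		return 0 // null needs no barrier work
	}
	if p&colorMask == h.good {
		return p & addressMask // fast path: color is good
	}
	// Slow path: resolve relocation, then heal the slot in place.
	h.slowPaths++
	addr := p & addressMask
	if to, ok := h.forwarding[addr]; ok {
		addr = to
	}
	*slot = addr | h.good // real ZGC uses a compare-and-swap here
	return addr
}

func main() {
	h := &Heap{good: colorNew, forwarding: map[uint64]uint64{0x1000: 0x2000}}
	field := uint64(0x1000) | colorOld // stale reference to a moved object

	fmt.Printf("first load:  %#x\n", h.Load(&field))
	fmt.Printf("second load: %#x\n", h.Load(&field))
	fmt.Println("slow paths taken:", h.slowPaths)
}
```

The second load hits the fast path because the first one rewrote the field, which is why the steady-state cost of the barrier is mostly the color check.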

The result: marking, relocation, and reference processing all happen concurrently. STW phases are limited to root scanning — sub-millisecond even on multi-TB heaps.

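The tradeoff: load barriers cost ~5–15% CPU on read-heavy workloads. ZGC does not support compressed oops, so every reference takes 64 bits, and it needs heap headroom to keep allocating while it collects concurrently; both raise footprint relative to G1. Non-generational ZGC also multi-maps the heap for fast relocation, which inflates virtual memory (and the RSS some tools report) without using more physical memory. The 12% throughput drop and 18% memory increase in the opening scenario are expected ZGC costs, not bugs.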

The GC three-way tradeoff: ZGC buys sub-millisecond pauses by spending throughput and memory. No collector wins on pause time, throughput, and footprint all at once.

Generational ZGC (JEP 439, JDK 21+) adds a young generation, closing most of the throughput gap with G1. Production teams on JDK 21+ should evaluate generational ZGC when migrating.

V8 Orinoco

V8’s Orinoco project (2017+) moved V8’s GC from mostly stop-the-world to mostly concurrent. Key pieces:

Concurrent marking: marking runs on background threads alongside JavaScript execution. A Dijkstra-style (incremental-update) write barrier maintains consistency with the mutator.
Parallel compaction: multiple threads move objects in parallel during the STW compaction phase, reducing its duration.
Young-gen scavenger parallelism: multiple threads evacuate the young heap in parallel.

Result: typical web workloads see pauses ≤10 ms, with most marking work hidden in the background. Memory overhead: ~5–15% for marking infrastructure (write barriers, marking worklist).

Node.js inherits Orinoco by default. Tuning is via --max-old-space-size (old heap cap) and --max-semi-space-size (young heap, affects minor GC frequency). Major Orinoco changes can shift performance characteristics across Node versions — engineering teams should track V8 release notes when upgrading Node.

Quiz

A service migrated from G1 to ZGC sees pauses drop from 60 ms to <1 ms but throughput drops 12% and RSS grows 18%. Is this expected?

Quiz

Why does Go's hybrid write barrier include a SATB-style deletion barrier instead of relying on an incremental-update barrier alone?

Order the steps

Order the steps a ZGC load barrier takes when reading a pointer with a 'bad' color:

1 Mutator loads a reference from a heap field
2 Inline load barrier checks the pointer's color bits
3 Color is 'bad' — object has been relocated or is pending work
4 Barrier triggers slow path: looks up the forwarding table
5 Barrier updates the pointer in-place to the new address
6 Mutator proceeds with the corrected (healed) pointer

Marking moves objects white → grey → black. When no grey remain, all still-white objects are unreachable. The invariant: a black object must never directly reference a white one — the write barrier enforces it.

Recall before you leave

01
Explain the tri-color invariant and the role of the write barrier in maintaining it during concurrent marking.
02
What problem did Go 1.18's pacer redesign solve, and what is GOMEMLIMIT's role?

Recap

Tri-color marking classifies objects as white, grey, or black and maintains the invariant that no black object directly references a white one. The write barrier enforces this invariant during concurrent marking by intercepting every reference write: SATB marks the old reference (used by G1, Shenandoah, ZGC, and as half of Go’s hybrid barrier); incremental-update marks the new one (used by CMS, classic V8). Both add a cost to every reference write, commonly quoted at 2–10% CPU. ZGC extends this with colored pointers (metadata bits packed into 64-bit pointers) and load barriers that heal stale pointers inline, achieving sub-millisecond pauses at the cost of ~5–15% throughput and elevated memory. Go’s pacer redesign (1.18) replaced heuristics with a PI controller; GOMEMLIMIT (1.19) gives containerised services a soft memory cap the pacer respects. V8 Orinoco brought concurrent marking and parallel compaction to reduce JavaScript GC pauses to ≤10 ms. Knowing which barrier your runtime uses shapes how you write write-heavy hot paths. When runtime.wbBufFlush dominates a Go profile, you now know what it is: your hot path writes too many references per second to ignore the barrier cost.

