Browser & Frontend Runtime WEB · 03 · 06

TurboFan's speculative engine and the deopt-loop trap

TurboFan's sea-of-nodes IR, escape analysis, polymorphic inlining, speculative guards, the FeedbackVector in depth, Sparkplug's non-optimising pass, and diagnosing deopt-loops with --trace-deopt.

WEB Senior ◷ 18 min

A performance-critical function deopts repeatedly in production: each cycle pays for a TurboFan recompile of roughly 100 ms and then deopts again. The hot path ends up orders of magnitude slower than if it had never been optimised. This is the deopt-loop, and it starts from one misunderstood guard.

TurboFan’s speculative optimisation engine

When a function is hot enough and its FeedbackVector is stable enough, V8 hands it to TurboFan — and the optimisations TurboFan applies are what let a hot JS inner loop compete with hand-written C.

TurboFan builds a sea-of-nodes graph IR and then performs:

Aggressive inlining: call sites with known targets are inlined, eliminating call overhead and enabling cross-function optimisations.
Escape analysis: objects that never escape the function are replaced by their individual fields (scalar replacement) or elided entirely, so there is no allocation and no GC pressure.
Polymorphic inlining: up to 4 known call targets are inlined behind an upfront hidden-class check; the generic slow path handles rare shapes.
Range analysis: a variable known to stay in 0..255 can be kept as an untagged machine integer; a loop index known to be bounded avoids overflow guards.
Type-narrowing: iterative refinement of types through the graph, enabling tighter machine code.
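As a rough illustration of the escape-analysis item, the sketch below contrasts a temporary object that never leaves its function with one that does. The function names are hypothetical, and whether V8 actually removes the allocation depends on inlining and tiering decisions that this snippet cannot observe.

```javascript
// Hypothetical example: `lengthSquared` and `makePoint` are illustrative names.

// The temporary `p` never leaves this function: it is not returned,
// stored on another object, or passed to a call that keeps it.
// An optimiser with escape analysis may replace it with two plain locals.
function lengthSquared(x, y) {
  const p = { x, y };
  return p.x * p.x + p.y * p.y;
}

// Here the object is returned to the caller, so it escapes and
// has to exist as a real heap object.
function makePoint(x, y) {
  return { x, y };
}

let total = 0;
for (let i = 0; i < 1000; i++) {
  total += lengthSquared(i, 1); // candidate for allocation removal
}
const kept = makePoint(3, 4);   // allocation is required
```

Both versions compute the same values; the difference is only in whether the engine is free to skip the allocation.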

At every speculative assumption, TurboFan installs a guard: “this is a Smi” before arithmetic, “this object has hidden class HCx” before property access. A failed guard triggers a deopt. The trade-off: TurboFan code is often ~1.5–3× faster than Maglev code while the guards hold; when they keep failing, the overhead is worse than never compiling with TurboFan at all.

The speculation cycle. A single deopt costs microseconds and is survivable. The trap is the deopt-loop: when the violating value keeps recurring, V8 re-optimises and deopts again and again, which is slower than never optimising.
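A minimal sketch of the kind of input pattern that can break a Smi guard, assuming a hypothetical hot function called `scale`. The snippet only shows the values involved; it does not prove that any particular V8 version will optimise or deoptimise here.

```javascript
// Hypothetical hot function: the name and inputs are illustrative.
function scale(value) {
  return value * 2 + 1;
}

// Warm-up phase: only small integers (Smis) are observed, so an
// optimising tier may specialise `scale` for integer arithmetic.
let acc = 0;
for (let i = 0; i < 10000; i++) {
  acc += scale(i);
}

// Later inputs violate that assumption. A fractional number and a
// string are both "not a Smi", so a Smi guard would fail on them.
const fromDouble = scale(0.5);  // fractional number
const fromString = scale("21"); // string, coerced to a number by `*`
```

Running a file like this with `node --trace-deopt` is one way to check whether a deopt is reported; the output format and the tiering thresholds vary between V8 versions.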

The FeedbackVector in depth

If you want to understand why a function refuses to stay optimised despite multiple recompiles, the FeedbackVector is where you look — it is the engine’s memory of what it observed, and a single polluted slot is enough to block TurboFan.

Every function has an associated FeedbackVector — a fixed-size array in the GC heap, one slot per IC site (property load, call, binary op). Slots store:

One hidden class pointer (monomorphic)
A small array of classes (polymorphic)
MegamorphicSentinel (given up)

All four tiers touch it: Ignition records feedback as it runs, Sparkplug reads minimal feedback (type), Maglev reads richer data (class chains, call target frequencies), and TurboFan reads everything plus loop trip counts and branch probabilities.

FeedbackVector pollution: a single megamorphic slot can explain why V8 cannot keep a function optimised even after deopt-and-reopt cycles. %DebugPrintFeedback(fn) in d8 (run with --allow-natives-syntax, which any %-prefixed intrinsic requires) dumps the vector; inspecting it is the starting point for diagnosing persistent deopt behaviour.
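The slot states described above can be modelled with a toy class. This is only a sketch: V8's real FeedbackVector is internal engine state, the `FeedbackSlot` class is hypothetical, object shape is approximated by key order, and the limit of 4 is an assumption borrowed from the "up to 4" figure given for polymorphic inlining.

```javascript
// Toy model only: none of this is V8 API.
// MAX_POLYMORPHIC is an assumption taken from the lesson's "up to 4".
const MAX_POLYMORPHIC = 4;

class FeedbackSlot {
  constructor() {
    this.shapes = [];        // distinct shape keys seen so far
    this.megamorphic = false;
  }

  // Record the "shape" of an object, approximated here by its key order.
  record(obj) {
    if (this.megamorphic) return;
    const shape = Object.keys(obj).join(",");
    if (this.shapes.includes(shape)) return;
    if (this.shapes.length === MAX_POLYMORPHIC) {
      this.megamorphic = true; // in this model the slot never narrows again
      this.shapes = [];
      return;
    }
    this.shapes.push(shape);
  }

  get state() {
    if (this.megamorphic) return "megamorphic";
    if (this.shapes.length === 0) return "uninitialized";
    return this.shapes.length === 1 ? "monomorphic" : "polymorphic";
  }
}

const slot = new FeedbackSlot();
slot.record({ id: 1, name: "a" });
const afterOne = slot.state;
slot.record({ name: "b", id: 2 }); // same keys, different order
const afterTwo = slot.state;
for (const extra of ["c", "d", "e"]) {
  slot.record({ id: 3, [extra]: true });
}
const afterMany = slot.state;
slot.record({ id: 4, name: "z" }); // a well-behaved object arrives too late
const afterCleanInput = slot.state;
```

The last call is the point of the model: once the slot has given up, feeding it consistent objects again does not restore the monomorphic state, which is why one polluted site can keep hurting a function.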

Sparkplug: the JIT that does not optimise

Sparkplug (V8 9.1, May 2021) emits machine code from bytecode in a single linear pass with no SSA, no inlining, no instruction scheduling. For each Ignition bytecode, Sparkplug emits a fixed block of native instructions — approximately 1.5–2× faster than Ignition because the interpreter dispatch loop overhead (table lookup, indirect jump) is replaced by direct execution. Compile speed: ~1ms/kB bytecode. Documented average benefit: 5–15% on real workloads. Purpose: give hot-but-not-hot-enough-for-Maglev functions a speedup without paying Maglev’s compile cost.
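The difference between dispatching every bytecode on every run and emitting a fixed block per bytecode once can be sketched with a toy accumulator machine. Everything here is an analogy under stated assumptions: the opcode names are borrowed loosely and their semantics simplified, and generating JavaScript source with `new Function` stands in for emitting native code.

```javascript
// Toy accumulator machine; semantics are simplified for illustration.
const program = [
  ["LdaSmi", 2], // acc = 2
  ["AddSmi", 3], // acc += 3
  ["MulSmi", 4], // acc *= 4
  ["Return"],
];

// Interpreter: decodes and dispatches every instruction on every run.
function interpret(code) {
  let acc = 0;
  for (const [op, arg] of code) {
    switch (op) {
      case "LdaSmi": acc = arg; break;
      case "AddSmi": acc += arg; break;
      case "MulSmi": acc *= arg; break;
      case "Return": return acc;
      default: throw new Error(`unknown opcode ${op}`);
    }
  }
  return acc;
}

// Baseline "compiler": one linear pass, one fixed template per opcode,
// no analysis. Dispatch happens once here instead of on every run.
const templates = {
  LdaSmi: (arg) => `acc = ${arg};`,
  AddSmi: (arg) => `acc += ${arg};`,
  MulSmi: (arg) => `acc *= ${arg};`,
  Return: () => `return acc;`,
};

function baselineCompile(code) {
  const body = code.map(([op, arg]) => templates[op](arg)).join("\n");
  return new Function(`let acc = 0;\n${body}`);
}

const compiled = baselineCompile(program);
const interpreted = interpret(program);
const fromCompiled = compiled();
```

The compiled function contains no loop and no switch; the per-instruction decision was made once at compile time, which is the overhead the paragraph above says Sparkplug removes.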

Maglev: SSA, but cheap

Maglev (2023) fills the gap between Sparkplug and TurboFan. Pipeline: bytecode → Maglev IR (SSA) → linear-scan register allocator → native code. The IR is simpler than TurboFan’s (no effect chains, no graph rewriting passes); the register allocator is linear scan rather than graph-colouring. Maglev does some speculative optimisation — specialises property loads and arithmetic on observed types — but skips escape analysis, polymorphic inlining, and type-narrowing iteration. Result: ~10ms compile, ~50–70% of TurboFan code quality, correct tradeoff for medium-hot code.

Senior-tier V8 numbers

Sparkplug shipped: V8 9.1 (May 2021)
Maglev shipped: Chrome 117 / V8 11.7 (2023)
TurboFan compile time: ~100 ms / function
Maglev compile time: ~10 ms / function
IC slot size: 16 bytes (8B class + 8B handler)
Sparkplug compile rate: ~1 ms / kB bytecode
TurboFan vs Maglev speed: TurboFan ~1.5–3× faster when stable
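These figures imply a break-even point: the extra compile time only pays off after enough calls. A back-of-the-envelope sketch follows, using the compile times from the list above and two loudly hypothetical inputs (a 2 µs call in Maglev code and a 2× TurboFan speedup).

```javascript
// Compile costs come from the list above; per-call costs are hypothetical.
const TURBOFAN_COMPILE_MS = 100;
const MAGLEV_COMPILE_MS = 10;
const maglevCallUs = 2;    // assumed cost of one call in Maglev code
const turbofanSpeedup = 2; // assumed, inside the 1.5–3× range

// Number of calls needed before the faster code repays its compile cost.
function breakEvenCalls(extraCompileMs, slowCallUs, fastCallUs) {
  const savingPerCallUs = slowCallUs - fastCallUs;
  return Math.ceil((extraCompileMs * 1000) / savingPerCallUs);
}

const turbofanCallUs = maglevCallUs / turbofanSpeedup;
const calls = breakEvenCalls(
  TURBOFAN_COMPILE_MS - MAGLEV_COMPILE_MS,
  maglevCallUs,
  turbofanCallUs
);
console.log(`break-even after ${calls} calls`);
```

Under these assumptions the answer is 90,000 calls. The model ignores that V8 usually compiles on a background thread, so treat it as intuition for why a deopt shortly after each recompile never lets the investment pay off, not as a measurement.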

Trace it

A perf-critical function deopt-loops in production. Trace and fix.

Step 1: a deopt-loop means the function keeps tiering down and then back up. What in the --trace-deopt output would confirm it?

Step 2: the reason is 'not a Smi'. What does that mean?

Step 3: find the offending line. How?

Step 4: fix?

Step 5: prevent recurrence?
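One possible shape for steps 4 and 5, assuming the non-Smi values come from inconsistent input data (for example, a producer that sometimes sends numbers as strings). The record layout and `normaliseItem` are hypothetical; `processItem` is reused only as a name for the hot function.

```javascript
// Hypothetical records: some producers send `qty` as a string.
const rawItems = [{ qty: 3 }, { qty: "4" }, { qty: 5 }];

// Normalise once at the boundary so the hot path sees one representation.
// Rejecting bad input here also guards against the problem recurring.
function normaliseItem(item) {
  const qty = Number(item.qty);
  if (!Number.isInteger(qty)) {
    throw new TypeError(`qty must be an integer, got ${item.qty}`);
  }
  return { qty };
}

// Hot path: only ever called with an integer `qty`.
function processItem(item) {
  return item.qty * 2;
}

const results = rawItems.map(normaliseItem).map(processItem);
```

An integer is not automatically a Smi (very large integers fall outside the Smi range), so this narrows the input rather than guaranteeing the guard holds; confirming with --trace-deopt after the change is still needed.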

Debug this

Diagnose the V8 deopt log — what is the root cause?

log

[deoptimizing (DEOPT eager): begin 0x... <JSFunction processItem (sfi = 0x...)> (opt #42) @14, FP to SP delta: 128, caller sp: 0x...]
          ;;; deoptimize at <main.js:42:18>, not a Smi
 bytecode position 14
[deoptimizing (eager): end 0x... processItem  @14 => node=4, pc=0x..., caller sp=0x..., took 0.012 ms]
[marking 0x... <JSFunction processItem (sfi = 0x...)> for non-concurrent optimization]
[compiling method 0x... <JSFunction processItem (sfi = 0x...)> using TurboFan]
[deoptimizing (DEOPT eager): begin 0x... <JSFunction processItem (sfi = 0x...)> (opt #43) @14, FP to SP delta: 128, caller sp: 0x...]
          ;;; deoptimize at <main.js:42:18>, not a Smi
 bytecode position 14

The same function keeps deoptimizing with reason 'not a Smi' at line 42:18. What is happening and how do you fix it?
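Spotting the repetition by eye gets hard in a long log, so a small counter can help. The sketch below assumes the log format matches the excerpt above (it differs between V8 versions); `countDeopts` is a hypothetical helper, and the sample lines are copied from that excerpt with its elided addresses left as they are.

```javascript
// Sample lines copied from the log above (addresses are elided there too).
const log = `
[deoptimizing (DEOPT eager): begin 0x... <JSFunction processItem (sfi = 0x...)> (opt #42) @14, FP to SP delta: 128, caller sp: 0x...]
          ;;; deoptimize at <main.js:42:18>, not a Smi
[compiling method 0x... <JSFunction processItem (sfi = 0x...)> using TurboFan]
[deoptimizing (DEOPT eager): begin 0x... <JSFunction processItem (sfi = 0x...)> (opt #43) @14, FP to SP delta: 128, caller sp: 0x...]
          ;;; deoptimize at <main.js:42:18>, not a Smi
`;

const BEGIN = /\[deoptimizing \(DEOPT \w+\): begin .*?<JSFunction (\S+)/;
const REASON = /;;; deoptimize at <([^>]+)>, (.+)$/;

// Count deopts per (function, source position, reason).
function countDeopts(text) {
  const counts = new Map();
  let current = null; // function named by the most recent "begin" line
  for (const line of text.split("\n")) {
    const begin = BEGIN.exec(line);
    if (begin) {
      current = begin[1];
      continue;
    }
    const reason = REASON.exec(line);
    if (reason && current) {
      const key = `${current} @ ${reason[1]}: ${reason[2]}`;
      counts.set(key, (counts.get(key) || 0) + 1);
      current = null;
    }
  }
  return counts;
}

const counts = countDeopts(log);
for (const [key, n] of counts) {
  console.log(`${n}x ${key}`);
}
```

A key whose count keeps growing, with the same position and the same reason, is the signature of a deopt-loop rather than a one-time deopt.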

Quiz

V8 has FOUR execution tiers (the Ignition interpreter plus three JIT compilers) and concurrent GC. What is the deepest performance reason these are necessary instead of one good optimiser?

Quiz

A function processes 10M items per frame and is monomorphic in tests but megamorphic in production. What is the most likely cause?

Why this works

Why does TurboFan use a sea-of-nodes IR instead of a traditional basic-block CFG? Sea-of-nodes represents both control flow and data flow as edges in the same graph — there is no notion of “instruction order” until scheduling. This enables more aggressive optimisations: escape analysis can hoist allocations out of loops, range analysis propagates bounds across branches, and dead-code elimination works at the expression level, not just the basic-block level. The cost: the compiler is significantly more complex and harder to debug. But for a dynamically-typed language where the compiler must reason about types observed at runtime rather than declared at compile time, the flexibility is worth it.
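Expression-level dead-code elimination on a dependency graph can be sketched in a few lines. This is a toy: the node names and the `liveNodes` helper are hypothetical, control and effect edges are left out, and nothing here mirrors V8's real node classes.

```javascript
// Toy graph: each node lists the nodes it depends on.
const nodes = {
  a:      { op: "param",  inputs: [] },
  b:      { op: "param",  inputs: [] },
  sum:    { op: "add",    inputs: ["a", "b"] },
  unused: { op: "mul",    inputs: ["a", "a"] }, // result never consumed
  ret:    { op: "return", inputs: ["sum"] },
};

// Walk backwards from the return node; anything not reached is dead.
function liveNodes(graph, root) {
  const live = new Set();
  const stack = [root];
  while (stack.length > 0) {
    const id = stack.pop();
    if (live.has(id)) continue;
    live.add(id);
    stack.push(...graph[id].inputs);
  }
  return live;
}

const live = liveNodes(nodes, "ret");
const dead = Object.keys(nodes).filter((id) => !live.has(id));
console.log("dead nodes:", dead);
```

Because nodes are tied together by dependencies instead of by position in a block, the unused multiplication drops out without any reasoning about instruction order.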

Recall before you leave

01. What is TurboFan's sea-of-nodes IR and what optimisations does it enable?
02. Explain FeedbackVector 'pollution' and how to diagnose it.
03. What distinguishes a deopt-loop from a one-time deopt?

Recap

TurboFan builds a sea-of-nodes IR and applies escape analysis, polymorphic inlining, range narrowing, and aggressive type specialisation. Every speculative assumption becomes a guard instruction; guard failure triggers deoptimization. A single deopt costs microseconds; a deopt-loop — where TurboFan recompiles and deopts on every call — costs more than never optimising. The FeedbackVector is the profile data that drives every tier: Ignition writes it, TurboFan reads it to know what shapes and types to specialise on. FeedbackVector pollution (megamorphic slots) can block TurboFan entirely. Sparkplug (V8 9.1) is the baseline tier that costs ~1ms/kB and delivers 1.5–2× over Ignition; Maglev (2023) adds SSA-based specialisation at ~10ms compile time, reaching 50–70% of TurboFan quality. The right mental model: tiers are not a waterfall but a ladder with lifts — functions can be on different tiers simultaneously, and the tiering heuristics continuously re-evaluate based on FeedbackVector data. Now when you see a hot function performing orders of magnitude below expectations, reach for --trace-deopt first: if you see the same function deoptimising and recompiling repeatedly, you have found a deopt-loop, and the fix lives in the data model — not in the engine.
