Browser & Frontend Runtime

TurboFan's speculative engine and the deopt-loop trap

Crux: TurboFan's sea-of-nodes IR, escape analysis, polymorphic inlining, speculative guards, the FeedbackVector in depth, Sparkplug's non-optimising pass, and diagnosing deopt-loops with --trace-deopt.

A performance-critical function deopts repeatedly in production — every call triggers a 100ms TurboFan recompile, then deopts again. Your hot path is orders-of-magnitude slower than uncompiled. This is the deopt-loop, and it starts from one misunderstood guard.

TurboFan’s speculative optimisation engine

TurboFan builds a sea-of-nodes graph IR and then performs:

  • Aggressive inlining — call sites with known targets are inlined, eliminating call overhead and enabling cross-function optimisations.
  • Escape analysis — objects that never escape the function scope are stack-allocated or elided entirely (zero allocation, no GC pressure).
  • Polymorphic inlining — up to 4 known call targets are inlined with an upfront class-check; the generic slow path handles rare shapes.
  • Range analysis — a variable known to be 0..255 stays in a byte register; a loop index known bounded avoids overflow guards.
  • Type-narrowing — iterative refinement of types through the graph, enabling tighter machine code.
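As a rough illustration of the code shapes these passes target, consider the sketch below. The function and class names are hypothetical, and whether V8 actually applies each optimisation depends on its version, flags, and the feedback it has collected.

```javascript
// Escape-analysis candidate: `p` never leaves the function, so an optimising
// compiler may replace it with two plain numbers and skip the allocation.
function lengthSquared(x, y) {
  const p = { x, y }; // temporary object, not returned or stored anywhere
  return p.x * p.x + p.y * p.y;
}

// Polymorphic call site: `shapes[i].area()` sees two targets here. With few
// observed targets, both may be inlined behind a check on the receiver.
class Square {
  constructor(s) { this.s = s; }
  area() { return this.s * this.s; }
}
class Rect {
  constructor(w, h) { this.w = w; this.h = h; }
  area() { return this.w * this.h; }
}

function totalArea(shapes) {
  let sum = 0;
  // Bounded index: a candidate for range analysis.
  for (let i = 0; i < shapes.length; i++) {
    sum += shapes[i].area();
  }
  return sum;
}

const total = totalArea([new Square(2), new Rect(2, 3), new Square(1)]);
console.log(lengthSquared(3, 4), total);
```

The results are the same whether or not any optimisation happens; only the cost of producing them changes.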

At every speculative assumption, TurboFan installs a guard instruction: “this is a Smi” before arithmetic, “this object has hidden class HCx” before property access. A failed guard triggers a deopt, and execution falls back to a lower tier. The cost-benefit: when guards hold, TurboFan code is often markedly faster than Maglev code (roughly 1.5–3× by the figures in this lesson); when they keep breaking, the total overhead is worse than never compiling with TurboFan at all.
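A minimal sketch of how a Smi guard can be set up and then broken. The function `add` and the warm-up count are assumptions chosen for illustration; V8's real tiering thresholds are not fixed and are not modelled here.

```javascript
// Hypothetical hot function: after warm-up with small integers, an optimising
// tier may specialise `a + b` for Smi inputs and guard that assumption.
function add(a, b) {
  return a + b;
}

// Warm-up: only small integers are observed at this call site.
let warm = 0;
for (let i = 0; i < 100000; i++) {
  warm = add(i, 1);
}

// A non-Smi argument breaks the assumption. If the function was optimised
// for Smis, this call is where a "not a Smi" deopt would be reported.
const result = add(0.5, 1);

// The result is correct either way: guards change speed, never semantics.
console.log(warm, result);
```

Running the file with `node --trace-opt --trace-deopt` may show the optimisation and the later deopt; whether and when they appear depends on the V8 version.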

The FeedbackVector in depth

Every function has an associated FeedbackVector: a fixed-size array in the GC heap with one slot per IC site (property load, call, binary op). Each slot holds one of:

  • One hidden class pointer (monomorphic)
  • A small array of classes (polymorphic)
  • MegamorphicSentinel (given up)

All four tiers use it: Ignition records feedback as it executes, Sparkplug reads minimal feedback (type), Maglev reads richer data (class chains, call target frequencies), and TurboFan reads everything plus loop trip counts and branch probabilities.

FeedbackVector pollution: a single megamorphic slot on the hot path can explain why V8 cannot keep a function optimised, even after repeated deopt-and-reopt cycles. %DebugPrintFeedback(fn) in d8 (natives syntax requires --allow-natives-syntax) dumps the vector; inspecting it is the starting point for diagnosing persistent deopt behaviour.
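The slot states above can be pictured with a toy model. This is not V8's implementation: the helper names are invented, the ordered list of property names stands in for a hidden class, and the limit of four shapes before going megamorphic is an assumption for the sketch.

```javascript
// Toy model of one feedback slot. NOT V8's data structure.
const MAX_POLYMORPHIC = 4; // assumed limit for this sketch

function createSlot() {
  return { state: "uninitialized", shapes: [] };
}

// Stand-in for a hidden class: own property names in insertion order.
function shapeOf(obj) {
  return Object.keys(obj).join(",");
}

function recordFeedback(slot, obj) {
  if (slot.state === "megamorphic") return slot; // never recovers in this model
  const shape = shapeOf(obj);
  if (!slot.shapes.includes(shape)) slot.shapes.push(shape);
  if (slot.shapes.length === 1) {
    slot.state = "monomorphic";
  } else if (slot.shapes.length <= MAX_POLYMORPHIC) {
    slot.state = "polymorphic";
  } else {
    slot.state = "megamorphic";
    slot.shapes = [];
  }
  return slot;
}

const slot = createSlot();
recordFeedback(slot, { x: 1, y: 2 });
const afterOne = slot.state; // "monomorphic"
recordFeedback(slot, { y: 2, x: 1 }); // same fields, different order: new shape
const afterTwo = slot.state; // "polymorphic"
for (const extra of [{ a: 1 }, { b: 1 }, { c: 1 }]) recordFeedback(slot, extra);
const afterFive = slot.state; // "megamorphic"
```

The second call is the one to watch in real code: objects built with the same fields in a different order count as a different shape, which is a common way for a site that looks monomorphic to drift.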

Sparkplug: the JIT that does not optimise

Sparkplug (V8 9.1, May 2021) emits machine code from bytecode in a single linear pass with no SSA, no inlining, no instruction scheduling. For each Ignition bytecode, Sparkplug emits a fixed block of native instructions — approximately 1.5–2× faster than Ignition because the interpreter dispatch loop overhead (table lookup, indirect jump) is replaced by direct execution. Compile speed: ~1ms/kB bytecode. Documented average benefit: 5–15% on real workloads. Purpose: give hot-but-not-hot-enough-for-Maglev functions a speedup without paying Maglev’s compile cost.
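A toy analogy for why removing the dispatch loop helps, assuming a made-up three-opcode stack machine. The opcodes and helper names are invented for this sketch; they are not Ignition bytecodes, and Sparkplug emits machine code rather than JavaScript source.

```javascript
// Invented opcodes for the sketch.
const PUSH = 0, ADD = 1, MUL = 2;

// Interpreter: every step pays for a fetch plus a switch dispatch.
function interpret(code) {
  const stack = [];
  for (let pc = 0; pc < code.length; pc++) {
    const op = code[pc];
    switch (op) {
      case PUSH: stack.push(code[++pc]); break;
      case ADD: stack.push(stack.pop() + stack.pop()); break;
      case MUL: stack.push(stack.pop() * stack.pop()); break;
      default: throw new Error("unknown opcode " + op);
    }
  }
  return stack.pop();
}

// Baseline "compiler": one linear pass that maps each opcode to a fixed
// snippet, with no analysis and no reordering.
function compileBaseline(code) {
  const lines = ["const stack = [];"];
  for (let pc = 0; pc < code.length; pc++) {
    switch (code[pc]) {
      case PUSH: lines.push(`stack.push(${Number(code[++pc])});`); break;
      case ADD: lines.push("stack.push(stack.pop() + stack.pop());"); break;
      case MUL: lines.push("stack.push(stack.pop() * stack.pop());"); break;
      default: throw new Error("unknown opcode " + code[pc]);
    }
  }
  lines.push("return stack.pop();");
  return new Function(lines.join("\n"));
}

const program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL]; // (2 + 3) * 4
const interpreted = interpret(program);
const compiled = compileBaseline(program)();
console.log(interpreted, compiled);
```

Both paths compute the same value; the compiled version simply has no per-opcode dispatch left at run time, which is the only saving a non-optimising baseline tier aims for.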

Maglev: SSA, but cheap

Maglev (2023) fills the gap between Sparkplug and TurboFan. Pipeline: bytecode → Maglev IR (SSA) → linear-scan register allocator → native code. The IR is simpler than TurboFan’s (no effect chains, no graph rewriting passes); the register allocator is linear scan rather than graph-colouring. Maglev does some speculative optimisation — specialises property loads and arithmetic on observed types — but skips escape analysis, polymorphic inlining, and type-narrowing iteration. Result: ~10ms compile, ~50–70% of TurboFan code quality, correct tradeoff for medium-hot code.

Senior-tier V8 numbers

  • Sparkplug shipped: V8 9.1 (May 2021)
  • Maglev shipped: V8 11.7, Chrome M117 (2023)
  • TurboFan compile time: ~100 ms / function
  • Maglev compile time: ~10 ms / function
  • IC slot size: 16 bytes (8 B class + 8 B handler)
  • Sparkplug compile rate: ~1 ms / kB bytecode
  • TurboFan vs Maglev speed: TurboFan ~1.5–3× faster when stable
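A back-of-envelope way to read these numbers is to ask how many calls an optimised function must survive before its compile cost is repaid. This is a simplified model, not a V8 heuristic; the per-call cost of 0.01 ms and the 2× speedup are assumptions picked for illustration.

```javascript
// Calls needed for the time saved per call to cover the compile time.
function breakEvenCalls(compileMs, perCallMsBefore, speedup) {
  const savedPerCallMs = perCallMsBefore * (1 - 1 / speedup);
  return Math.ceil(compileMs / savedPerCallMs);
}

// Assumptions: 100 ms TurboFan compile (figure from the list above),
// 0.01 ms per call in the lower tier, TurboFan code 2x faster.
const callsToRepay = breakEvenCalls(100, 0.01, 2);
console.log(callsToRepay); // about 20,000 calls under these assumptions
```

Under this model, a function that deopts before reaching the break-even count has paid for a compile it never recouped, and a deopt-loop repeats that loss on every cycle.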
Trace it

A perf-critical function deopt-loops in production. Trace and fix.

  1. Deopt-loop means continuous tier-down then tier-up. What in --trace-deopt would confirm?
  2. Reason is 'not a Smi'. What does that mean?
  3. Find the offending line. How?
  4. Fix?
  5. Prevent recurrence?
Debug this

Diagnose the V8 deopt log — what is the root cause?

[deoptimizing (DEOPT eager): begin 0x... <JSFunction processItem (sfi = 0x...)> (opt #42) @14, FP to SP delta: 128, caller sp: 0x...]
          ;;; deoptimize at <main.js:42:18>, not a Smi
 bytecode position 14
[deoptimizing (eager): end 0x... processItem  @14 => node=4, pc=0x..., caller sp=0x..., took 0.012 ms]
[marking 0x... <JSFunction processItem (sfi = 0x...)> for non-concurrent optimization]
[compiling method 0x... <JSFunction processItem (sfi = 0x...)> using TurboFan]
[deoptimizing (DEOPT eager): begin 0x... <JSFunction processItem (sfi = 0x...)> (opt #43) @14, FP to SP delta: 128, caller sp: 0x...]
          ;;; deoptimize at <main.js:42:18>, not a Smi
 bytecode position 14

The same function keeps deoptimizing with reason 'not a Smi' at line 42:18. What is happening and how do you fix it?
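One plausible shape of this problem, sketched under stated assumptions. The real processItem is not shown in the log, so its body, the `count` field, and the `normaliseItem` helper below are all hypothetical.

```javascript
// Hypothetical reconstruction: assume main.js:42:18 is arithmetic on a
// field that is usually, but not always, a small integer.
function processItem(item) {
  return item.count * 2;
}

// Boundary normalisation (one possible fix): coerce once, outside the hot
// path, so processItem always sees the same numeric representation.
// Math.round changes fractional values, so this only fits integer data.
function normaliseItem(raw) {
  return { count: Math.round(Number(raw.count)) };
}

// Mixed input as it might arrive from JSON or a form: ints, a double, a string.
const rawItems = [{ count: 1 }, { count: 2.0000001 }, { count: "3" }];
const results = rawItems.map(normaliseItem).map(processItem);
console.log(results);
```

If the fractional values are meaningful, rounding is the wrong fix; the alternative is to keep the data as it is and route integer and non-integer inputs through separate functions so each call site sees stable types.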

Quiz

V8 has four execution tiers (the Ignition interpreter plus three JIT compilers) and a concurrent GC. What is the deepest performance reason these are necessary instead of one good optimiser?

Quiz

A function processes 10M items per frame and is monomorphic in tests but megamorphic in production. What is the most likely cause?

Why this works

Why does TurboFan use a sea-of-nodes IR instead of a traditional basic-block CFG? Sea-of-nodes represents both control flow and data flow as edges in the same graph; there is no notion of “instruction order” until scheduling. This enables more aggressive optimisations: escape analysis can eliminate allocations inside loops, range analysis propagates bounds across branches, and dead-code elimination works at the expression level, not just the basic-block level. The cost: the compiler is significantly more complex and harder to debug. But for a dynamically-typed language where the compiler must reason about types observed at runtime rather than declared at compile time, the flexibility is worth it.
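Expression-level dead-code elimination can be sketched on a toy dependency graph. This is not TurboFan's data structure: the node names and the `liveNodes` helper are invented, and real sea-of-nodes graphs also carry control and effect edges that are omitted here.

```javascript
// Toy graph: each node lists the nodes it depends on, and no node has a
// position in any instruction sequence.
const nodes = {
  a: { op: "param", inputs: [] },
  b: { op: "param", inputs: [] },
  sum: { op: "add", inputs: ["a", "b"] },
  unused: { op: "mul", inputs: ["a", "a"] }, // computed but never consumed
  ret: { op: "return", inputs: ["sum"] },
};

// Keep only what the root node transitively depends on.
function liveNodes(graph, root) {
  const live = new Set();
  const worklist = [root];
  while (worklist.length > 0) {
    const id = worklist.pop();
    if (live.has(id)) continue;
    live.add(id);
    worklist.push(...graph[id].inputs);
  }
  return live;
}

const live = liveNodes(nodes, "ret");
const dead = Object.keys(nodes).filter((id) => !live.has(id)); // ["unused"]
console.log(dead);
```

Because liveness is decided per node rather than per block, the unused multiplication is dropped without any reasoning about which basic block it would have been placed in.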

Recall before you leave

  1. What is TurboFan's sea-of-nodes IR and what optimisations does it enable?
  2. Explain FeedbackVector 'pollution' and how to diagnose it.
  3. What distinguishes a deopt-loop from a one-time deopt?

Recap

TurboFan builds a sea-of-nodes IR and applies escape analysis, polymorphic inlining, range narrowing, and aggressive type specialisation. Every speculative assumption becomes a guard instruction; guard failure triggers deoptimization. A single deopt costs microseconds; a deopt-loop — where TurboFan recompiles and deopts on every call — costs more than never optimising. The FeedbackVector is the profile data that drives every tier: Ignition writes it, TurboFan reads it to know what shapes and types to specialise on. FeedbackVector pollution (megamorphic slots) can block TurboFan entirely. Sparkplug (V8 9.1) is the baseline tier that costs ~1ms/kB and delivers 1.5–2× over Ignition; Maglev (2023) adds SSA-based specialisation at ~10ms compile time, reaching 50–70% of TurboFan quality. The right mental model: tiers are not a waterfall but a ladder with lifts — functions can be on different tiers simultaneously, and the tiering heuristics continuously re-evaluate based on FeedbackVector data.

