awesome-everything

Browser & Frontend Runtime

Inline caches, IC states, and deoptimization

Crux: How V8's per-call-site inline caches track hidden classes, the monomorphic/polymorphic/megamorphic state machine, deoptimization triggers, and a worked example of a hidden-class transition breaking a hot IC.

A single source line such as p.x can execute in a handful of CPU cycles or in around 100, depending on how many different object shapes have passed through it. The structure that decides this is the inline cache (IC), and its state is one of the most consequential performance numbers that DevTools does not show by default.

How inline caches work

Every property access in JS is potentially polymorphic: the object could have any shape. V8 keeps a small cache per access site: the first time the site runs, V8 records the hidden class it saw and the property's offset for that class. The next time the same site runs with an object of the same hidden class, V8 checks the class and reads the property at the recorded offset directly, in 5–10 CPU cycles. There is no hashtable lookup, just one class comparison and one MOV instruction.
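The lookup-then-cache behaviour can be sketched as a toy model. This is a minimal illustration, not V8's implementation: Shape, makeObject and MonomorphicIC are hypothetical names, a shape is a plain name-to-offset table, and on a miss this toy simply overwrites its single entry, whereas a real V8 IC would go polymorphic.

```javascript
// Toy model of a monomorphic inline cache (hypothetical names, not V8 code).
// A Shape stands in for a hidden class: it maps property names to slot offsets.
class Shape {
  constructor(names) {
    this.offsets = new Map(names.map((name, i) => [name, i]));
  }
}

// A toy object: a shape pointer plus a flat array of slot values.
function makeObject(shape, values) {
  return { shape, slots: values };
}

// One cache per access site, e.g. the site that evaluates p.x.
class MonomorphicIC {
  constructor(name) {
    this.name = name;
    this.cachedShape = null; // hidden class seen on the last miss
    this.cachedOffset = -1;  // property offset for that class
    this.hits = 0;
    this.misses = 0;
  }

  load(obj) {
    if (obj.shape === this.cachedShape) {
      // Fast path: one pointer comparison, then a read at a fixed offset.
      this.hits++;
      return obj.slots[this.cachedOffset];
    }
    // Slow path: look the name up in the shape's table, then cache it.
    this.misses++;
    const offset = obj.shape.offsets.get(this.name);
    this.cachedShape = obj.shape;
    this.cachedOffset = offset;
    return obj.slots[offset];
  }
}

const pointShape = new Shape(["x", "y"]);
const ic = new MonomorphicIC("x");
let sum = 0;
for (let i = 0; i < 1000; i++) {
  sum += ic.load(makeObject(pointShape, [i, -i]));
}
console.log(sum, ic.hits, ic.misses); // 499500 999 1
```

Only the first access pays for the table lookup; every later object with the same shape takes the one-comparison path.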

The IC state machine

The IC state lives in the FeedbackVector slot for that access site (one source line can contain several sites). Transitions are one-way towards megamorphic:

State         | Hidden classes seen | Cost                           | Notes
uninitialized | 0                   | n/a                            | Before the first execution
monomorphic   | 1                   | 5–10 cycles                    | Fast direct access
polymorphic   | 2–4                 | Linear chain of class checks   | Still fast
megamorphic   | 5+                  | Generic lookup, ~50–100 cycles | 10–50× slower
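The state machine in the table above can be modelled as a small class. A sketch under stated assumptions: InlineCache, makeShape and MAX_POLYMORPHIC are hypothetical names, the polymorphic case is a linear search over up to 4 cached classes, and the fifth distinct class flips the site to megamorphic for good.

```javascript
// Toy model of the IC state machine (an illustration, not V8 source code).
// MAX_POLYMORPHIC = 4 matches the "2–4" polymorphic row; the 5th distinct
// hidden class sends the site megamorphic, and it never comes back.
const MAX_POLYMORPHIC = 4;

class InlineCache {
  constructor() {
    this.state = "uninitialized";
    this.entries = []; // { shape, offset } pairs, checked in order
  }

  load(obj, name) {
    if (this.state !== "megamorphic") {
      // Monomorphic or polymorphic: a linear chain of class checks.
      for (const entry of this.entries) {
        if (entry.shape === obj.shape) return obj.slots[entry.offset];
      }
    }
    // Miss (or megamorphic): generic lookup in the shape's property table.
    const offset = obj.shape.offsets.get(name);
    this.record(obj.shape, offset);
    return obj.slots[offset];
  }

  record(shape, offset) {
    if (this.state === "megamorphic") return; // one-way: stays megamorphic
    this.entries.push({ shape, offset });
    if (this.entries.length === 1) {
      this.state = "monomorphic";
    } else if (this.entries.length <= MAX_POLYMORPHIC) {
      this.state = "polymorphic";
    } else {
      this.state = "megamorphic";
      this.entries = []; // stop tracking individual classes
    }
  }
}

// Hypothetical shapes that all contain "x", each at a different offset.
function makeShape(padding) {
  const names = [];
  for (let j = 0; j < padding; j++) names.push("pad" + j);
  names.push("x");
  return { offsets: new Map(names.map((n, i) => [n, i])) };
}

const site = new InlineCache();
const states = [];
const values = [];
for (let k = 0; k < 6; k++) {
  const shape = makeShape(k);
  const slots = new Array(shape.offsets.size).fill(0);
  slots[shape.offsets.get("x")] = 42;
  values.push(site.load({ shape, slots }, "x"));
  states.push(site.state);
}
console.log(states.join(" -> "));
// monomorphic -> polymorphic -> polymorphic -> polymorphic -> megamorphic -> megamorphic
```

Every load still returns the right value; what changes from state to state is only how much work the site does to find it.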

A function can have hundreds of IC slots. One megamorphic site does not slow down the whole function, but a megamorphic property access inside a hot loop pays the generic-lookup cost on every iteration of that loop.
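In real JavaScript the difference comes down to how many hidden classes reach one access site. The sketch below assumes V8 gives each of six object-literal layouts (different property sets or orders) its own hidden class. That matches how V8 builds hidden classes, but the script cannot observe it without V8 internals. makeMixed, Point, sumMixed and sumUniform are illustrative names.

```javascript
// Six object literals with different property sets or orders. Each layout is
// expected to get its own hidden class, although every object has an "x".
function makeMixed(i) {
  switch (i % 6) {
    case 0: return { x: i };
    case 1: return { x: i, y: 0 };
    case 2: return { y: 0, x: i };
    case 3: return { x: i, z: 0 };
    case 4: return { z: 0, x: i };
    default: return { w: 0, x: i };
  }
}

// One class with a fixed constructor: every instance shares one hidden class.
class Point {
  constructor(x) {
    this.x = x;
    this.y = 0;
  }
}

// Two textually separate functions, so each loop has its own IC site for .x.
function sumMixed(items) {
  let total = 0;
  for (let i = 0; i < items.length; i++) total += items[i].x; // sees 6 classes
  return total;
}

function sumUniform(items) {
  let total = 0;
  for (let i = 0; i < items.length; i++) total += items[i].x; // sees 1 class
  return total;
}

const N = 60000;
const mixedTotal = sumMixed(Array.from({ length: N }, (_, i) => makeMixed(i)));
const uniformTotal = sumUniform(Array.from({ length: N }, (_, i) => new Point(i)));
console.log(mixedTotal === uniformTotal); // true
```

Both loops compute the same total. The only difference is how many hidden classes reach the .x site, so timing them (with a larger N and repeated runs) is how to measure the cost on a given V8 version.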

IC state costs
  • Monomorphic access: 5–10 cycles
  • Megamorphic threshold: ≥5 shapes per IC slot
  • Megamorphic slowdown: 10–50× vs monomorphic
  • IC slot size in FeedbackVector: 16 bytes (8 B class + 8 B handler, assuming uncompressed 64-bit pointers)
  • Generic lookup cost: 50–100 cycles
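Turning these figures into wall-clock time makes the gap concrete. A back-of-envelope sketch, assuming a 3 GHz core, 10 million accesses and nothing else in the loop. accessTimeMs is a hypothetical helper.

```javascript
// Back-of-envelope cost of a run of property accesses.
// ASSUMPTIONS: a 3 GHz core, and the access is the only work in the loop.
function accessTimeMs(accesses, cyclesPerAccess, clockGHz = 3) {
  const cyclesPerMs = clockGHz * 1e6; // 1 GHz = 1e9 cycles/s = 1e6 cycles/ms
  return (accesses * cyclesPerAccess) / cyclesPerMs;
}

const accesses = 1e7;
const monoMs = accessTimeMs(accesses, 5);   // fast end of the monomorphic range
const megaMs = accessTimeMs(accesses, 100); // slow end of the generic lookup
console.log(monoMs.toFixed(1), megaMs.toFixed(1)); // 16.7 333.3
```

The resulting 20× ratio falls inside the 10–50× range listed above. The absolute times change with the clock speed, but the ratio does not.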

Deoptimization

When TurboFan optimises a function, it bakes in assumptions: “this argument is always a smi”, “this object’s property z is always a number”, “this loop never overflows 32-bit indices”. A guard is inserted at each assumption. If at runtime any guard fails, V8 deoptimizes: throws away the compiled code, reconstructs the interpreter state from the optimised frame, and resumes execution in Ignition (or Sparkplug). The function may climb back up later, but the deopt itself costs ~microseconds.
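The guard-then-fallback pattern can be modelled in a few lines. This toy only shows the control flow, with SpeculativeAdd and isSmi as hypothetical names. The fast path runs while its assumption holds, and the first violating call switches permanently to a generic path that produces the same result. It assumes the 31-bit smi range of pointer-compressed 64-bit builds and ignores frame reconstruction entirely.

```javascript
// Toy model of a guarded speculative fast path (not V8's deoptimizer).
// ASSUMPTION: "smi" means an integer in the 31-bit range used by 64-bit
// V8 builds with pointer compression.
const SMI_MIN = -(2 ** 30);
const SMI_MAX = 2 ** 30 - 1;

function isSmi(v) {
  return Number.isInteger(v) && v >= SMI_MIN && v <= SMI_MAX;
}

class SpeculativeAdd {
  constructor() {
    this.tier = "optimized"; // pretend TurboFan already compiled this
    this.deopts = 0;
  }

  call(a, b) {
    if (this.tier === "optimized") {
      // Guard for the baked-in assumption "both arguments are smis".
      if (isSmi(a) && isSmi(b)) {
        return a + b; // specialised integer add
      }
      // Guard failed: drop the optimized tier and fall back.
      this.tier = "interpreter";
      this.deopts++;
    }
    // Generic path: full JS semantics (strings, doubles, objects, ...).
    return a + b;
  }
}

const add = new SpeculativeAdd();
const results = [add.call(1, 2), add.call(40, 2), add.call("4", 2), add.call(1, 2)];
console.log(add.tier, add.deopts); // interpreter 1
```

Both paths compute the same value, which is the property deoptimization must preserve; only the speed differs. Unlike V8, this toy never re-optimizes.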

Common deopt triggers:

  • Passing a different type to a previously-typed argument
  • Indexing past array bounds
  • Hitting NaN in a numeric operation
  • Accessing a deleted property
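Three of these triggers can be provoked deliberately and watched with V8's --trace-opt and --trace-deopt flags under node. In this sketch, double, third and readY are illustrative functions and the warm-up count of 100000 is an arbitrary assumption. Whether V8 actually optimizes and then deopts each one depends on its version and heuristics.

```javascript
// Run with: node --trace-opt --trace-deopt deopt-demo.js
// Each function is warmed up with one kind of input so TurboFan can
// specialise it, then called once with input that breaks the assumption.
// Whether V8 actually optimizes and deopts here depends on its heuristics
// and version; any trace lines come from V8, not from this script.

function double(n) {
  return n * 2;
}

function third(arr) {
  return arr[2];
}

function readY(p) {
  return p.y;
}

let sink = 0;
for (let i = 0; i < 100000; i++) {
  sink += double(i & 0xff);         // always a small integer
  sink += third([i, i + 1, i + 2]); // always in bounds
  sink += readY({ x: i, y: 1 });    // always the same hidden class
}

// 1. A different type for a previously integer-only argument.
const d = double("21"); // "21" * 2 converts the string, giving 42
// 2. Indexing past the end of the array.
const t = third([1, 2]); // undefined
// 3. delete gives the object a different hidden class before the access.
const p = { x: 0, y: 7 };
delete p.y;
const y = readY(p); // undefined

console.log(sink > 0, d, t, y); // true 42 undefined undefined
```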

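The V8 flags --trace-opt and --trace-deopt (passed directly to d8 or node, or to Chrome through --js-flags) log optimizations and deopts. In d8 or node with --allow-natives-syntax, %GetOptimizationStatus(fn) returns a bitfield describing the function's current optimization state.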

Trace one hidden-class transition that breaks an IC
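A minimal sketch of such a trace follows, with Point and sumX as illustrative names. C0 to C3 are made-up labels for hidden classes, not anything V8 prints. The deopt described in the comments is the expected V8 behaviour, not something the script itself checks.

```javascript
// Illustrative trace: C0..C3 are made-up labels for hidden classes.
class Point {
  constructor(x, y) {
    this.x = x; // C0 -> C1: adds x
    this.y = y; // C1 -> C2: adds y; every new Point ends on C2
  }
}

function sumX(points) {
  let total = 0;
  for (const p of points) total += p.x; // the hot IC site
  return total;
}

const points = [];
for (let i = 0; i < 10000; i++) points.push(new Point(i, i));

// Phase 1: only C2 reaches p.x, so the site is monomorphic and TurboFan
// can compile it as "check class is C2, load x at a fixed offset".
let before = 0;
for (let run = 0; run < 100; run++) before = sumX(points);

// Phase 2: adding a property after construction moves these objects
// from C2 to a new class C3 that the optimized code has never seen.
for (let i = 0; i < points.length; i += 100) points[i].label = "tagged";

// On the first C3 object the class check fails: expect a deopt, the IC
// records C3 and turns polymorphic (C2, C3), and sumX may be re-optimized.
const after = sumX(points);
console.log(before === after); // true
```

Initialising label in the constructor (for example this.label = null) keeps every Point on a single hidden class, so the p.x site stays monomorphic.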

Quiz

A hot loop processes objects from 6 different constructors. What IC state does the property-access site reach?

Order the steps

Order the steps of a property access at a TurboFan-compiled monomorphic IC:

  1. TurboFan-compiled function receives an object pointer in a register
  2. Load the hidden-class pointer from the object header
  3. Compare the loaded hidden class to the class expected at compile time
  4. If equal: read the property at the precomputed offset (one MOV)
  5. If not equal: deoptimize and fall back to a lower tier
  6. The lower tier records the new hidden class; the IC may transition state
  7. The function may be re-optimised with updated feedback
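These seven steps can be traced with a toy model that keeps two things separate: the classes baked into compiled code and the feedback gathered by the lower tier. ToyFunction and its methods are hypothetical names, and a string label stands in for a hidden class.

```javascript
// Toy walk-through of the ordered steps (a model, not V8 internals).
class ToyFunction {
  constructor() {
    this.feedback = [];   // classes the lower tier has recorded (step 6)
    this.compiled = null; // classes baked into optimized code, or null
    this.deopts = 0;
    this.log = [];
  }

  optimize() {
    // Step 7: (re-)optimize with the feedback gathered so far.
    this.compiled = [...this.feedback];
    this.log.push("optimize:" + this.compiled.join("|"));
  }

  getX(obj) {
    if (this.compiled !== null) {
      // Steps 1-3: read the object's class and compare it with the expected ones.
      if (this.compiled.includes(obj.cls)) {
        return obj.x; // Step 4: fast read at the precomputed offset
      }
      // Step 5: class mismatch, so deoptimize and fall back to the lower tier.
      this.compiled = null;
      this.deopts++;
      this.log.push("deopt:" + obj.cls);
    }
    // Step 6: the lower tier records the class; the IC may change state.
    if (!this.feedback.includes(obj.cls)) this.feedback.push(obj.cls);
    return obj.x;
  }
}

const f = new ToyFunction();
f.getX({ cls: "A", x: 1 });              // lower tier records A
f.optimize();                            // compiled code expects A only
f.getX({ cls: "A", x: 2 });              // fast path
f.getX({ cls: "B", x: 3 });              // check fails: deopt, then record B
f.optimize();                            // re-optimized with A|B feedback
const last = f.getX({ cls: "B", x: 4 }); // fast path again
console.log(f.log.join(", ")); // optimize:A, deopt:B, optimize:A|B
```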
Quiz

A function was TurboFan-compiled and ran at 5µs. After a refactor it now runs at 250µs. What is the most likely V8-level cause?

Why this works

Why is the megamorphic threshold 5 and not some other number? V8's polymorphic IC handler checks up to 4 hidden classes as a linear chain of compare-and-branch steps. Each extra class lengthens that chain, so beyond 4 the checks would cost more than the specialisation saves. At 5+, V8 falls back to a generic runtime lookup that handles all shapes uniformly. The limit of 4 is an implementation constant tuned for the average JS workload; it is not a fundamental hardware limit.

Recall before you leave
  1. Describe the IC state machine: what triggers each transition?
  2. What is a deoptimization guard and when does it fire?
  3. A call site has seen 3 hidden classes. How do you bring it back to monomorphic?
Recap

An inline cache is a per-call-site stub that records the hidden class V8 saw and the property offset for that class. When the same hidden class arrives again, the property is read at the precomputed offset in 5–10 cycles — no lookup. The IC state machine transitions from uninitialized → monomorphic → polymorphic → megamorphic as more classes are observed. Megamorphic (5+ classes) forces a generic runtime lookup costing 50–100 cycles per access — 10–50× slower than monomorphic. Deoptimization happens when a TurboFan guard fires at runtime; the compiled code is discarded and execution falls back to Ignition. A single deopt is expensive but survivable; a deopt-loop — where TurboFan recompiles the same function only to deopt it on every call — is a catastrophic throughput killer.


Trademarks belong to their respective owners. Editorial reference only.