What makes a hot path: symptom vs cause

A wide flame-graph frame names where time accumulates, not why. The same leaf can hide five different bottlenecks — each demanding a different fix.

The profile is done. One flame-graph frame is wide. Two engineers want to switch template engines. A third engineer asks: “Wide from what? CPU work, allocations, lock contention, or a syscall?” Only one of those four has “switch template engines” as the right fix.

What a hot path is

In ten minutes you will know how to tell apart five very different problems that all look like “a wide frame” — and why the wrong diagnosis wastes more time than skipping optimisation entirely.

A hot path is a sequence of calls the program spends most of its time in. The profile shows it as a stack of wide frames climbing from a leaf back to a top-level entry. The leaf names a function; the question is why that function is expensive.

Modern hardware turns the same “1 second of CPU” into very different problems depending on what the CPU was actually doing: executing instructions, waiting for memory, waiting for a lock, waiting for a syscall to return. The diagnosis decides which family of fix applies.
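One rough way to see the difference between executing and waiting is to time the same call with two clocks. The sketch below is Python, standard library only; the names measure, busy_work and waiting_work are made up for illustration. It compares wall-clock time with process CPU time: when the two diverge, the function was mostly waiting rather than computing.

```python
import time


def measure(fn, *args):
    """Return (wall_seconds, cpu_seconds) for a single call of fn."""
    wall_start = time.perf_counter()   # wall-clock time, includes waiting
    cpu_start = time.process_time()    # CPU time of this process, excludes sleep
    fn(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def busy_work(n=200_000):
    # Executes instructions: CPU time should track wall-clock time.
    return sum(i * i for i in range(n))


def waiting_work(seconds=0.2):
    # Blocks without computing: wall-clock time passes, CPU time barely moves.
    time.sleep(seconds)


if __name__ == "__main__":
    for fn in (busy_work, waiting_work):
        wall, cpu = measure(fn)
        print(f"{fn.__name__}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

This is a coarse check, not a profiler: it cannot distinguish a lock from a syscall, and it says nothing about cache misses, which count as CPU time.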

One wide frame is a single symptom that fans out into five causes — and each branch ends in a different fix, so the diagnosis must precede the change.

Applying the wrong fix to the right hotspot is the second most common waste in performance work — after optimising the wrong hotspot entirely (covered in the profile-first unit).

The waiting room metaphor

A doctor’s waiting room is full. That tells you the room is busy — not why. Are patients waiting for the doctor, the lab, paperwork, or parking? Each has a different fix: more doctors, faster lab, fewer forms, more parking.

A wide flame-graph frame is the same: the room is full; ask what people are waiting on inside.

Wide frame shows	What it actually means	Where to look next
High self-time in user function	Function does real CPU work	Inspect the algorithm or data layout
Wide GC frames near leaf	Caller allocates a lot	Switch to allocation profile
Wide in wall-clock, narrow in CPU	Function waits — lock or syscall	Capture off-CPU or mutex profile
Interpreter frame where JIT should be	JIT deoptimised — fell back to baseline	Stabilise object shapes / types

Bea and Sven: one frame, two readings

Bea finds processOrder at 35% of CPU and wants to rewrite the loop. Sven looks closer: most of that 35% is in runtime.scanobject (the garbage collector's scan work), reached from inside the loop. The loop allocates heavily. The fix is to reuse objects, for example with sync.Pool, not to write a new algorithm.

The flame graph showed the symptom. The cause was one level deeper.
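To make the "reuse instead of allocate" idea concrete, here is a minimal Python sketch of a buffer pool. It is only an analogue of the pooling idea, not Go's sync.Pool, and BufferPool and process_orders are hypothetical names. The counter shows how many scratch buffers were actually created across the loop.

```python
class BufferPool:
    """Tiny free list of reusable bytearrays (hypothetical helper)."""

    def __init__(self, size):
        self.size = size
        self.free = []
        self.created = 0  # number of buffers actually allocated

    def get(self):
        if self.free:
            return self.free.pop()
        self.created += 1
        return bytearray(self.size)

    def put(self, buf):
        self.free.append(buf)


def process_orders(orders, pool):
    """Copy each order (bytes) into a pooled scratch buffer; return bytes handled."""
    total = 0
    for order in orders:
        buf = pool.get()  # reuse a buffer instead of allocating one per iteration
        try:
            n = min(len(order), len(buf))
            buf[:n] = order[:n]  # same-length slice assignment, buffer is not resized
            total += n
        finally:
            pool.put(buf)  # hand the buffer back for the next iteration
    return total


if __name__ == "__main__":
    pool = BufferPool(64)
    orders = [b"order-%d" % i for i in range(1000)]
    print(process_orders(orders, pool), "bytes handled,", pool.created, "buffer(s) created")
```

Python still allocates small temporaries elsewhere in this loop, so the sketch shows only the shape of the fix: the expensive scratch object is created once and reused, which is what reduces pressure on the collector.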

A scenario: regex on every request

A search endpoint shows regex.test as a wide leaf. Two engineers want to switch regex engines. A third looks at the parent: the regex is recompiled on every request because the pattern is built from a template string inside the handler. The fix is to compile it once at startup and reuse it. The leaf pointed at the right area; the bug was in how the caller used the regex, not in the leaf itself.
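The same caller-side bug can be sketched in Python. The template, prefix and function names below are assumptions made up for illustration. One version rebuilds the pattern on every call, the other builds it once at import time.

```python
import re

# Hypothetical pattern template and search prefix.
SEARCH_TEMPLATE = r"\b{prefix}\w*"
PREFIX = "ord"


def matches_per_request(text):
    # Rebuilds the pattern string and asks for a compiled regex on every call.
    pattern = re.compile(
        SEARCH_TEMPLATE.format(prefix=re.escape(PREFIX)), re.IGNORECASE
    )
    return pattern.search(text) is not None


# Built once, when the module is imported ("at startup").
ORDER_RE = re.compile(SEARCH_TEMPLATE.format(prefix=re.escape(PREFIX)), re.IGNORECASE)


def matches_precompiled(text):
    # Reuses the compiled pattern; the per-request work is only the search.
    return ORDER_RE.search(text) is not None
```

Both functions return the same answers; only the per-call work differs. Python's re module caches recently compiled patterns internally, so the gap here is smaller than in an engine that compiles from scratch each time, but the per-request version still pays for string formatting and a cache lookup on every call.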

Why this works

The leaf is the dashboard warning light: it says “something is wrong here.” The fix may be inside the function (rewrite the algorithm), in the caller (don’t call so often), in a callee (real cost one level down), or in the surrounding context (allocate less, lock less, fewer syscalls). Senior engineers read the whole neighbourhood, not just the leaf.
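Reading the neighbourhood usually starts with asking who calls the leaf. As a small sketch, Python's built-in cProfile and pstats can print the callers of a function. The workload functions here (expensive_leaf, caller_a, caller_b) are hypothetical stand-ins for real code.

```python
import cProfile
import io
import pstats


def expensive_leaf(n):
    return sum(i * i for i in range(n))


def caller_a():
    # One large call.
    return expensive_leaf(50_000)


def caller_b():
    # Many small calls: the cost comes from frequency, not from one call.
    total = 0
    for _ in range(50):
        total += expensive_leaf(1_000)
    return total


def callers_report(func_name):
    """Profile the workload and return the 'was called by' table for func_name."""
    profiler = cProfile.Profile()
    profiler.enable()
    caller_a()
    caller_b()
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.print_callers(func_name)  # func_name is matched as a regex
    return out.getvalue()


if __name__ == "__main__":
    print(callers_report("expensive_leaf"))
```

The report lists each caller with its call count and time, which helps separate "one path calls this too often" from "every path pays the same cost".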

Quiz

A flame graph shows a wide leaf frame. What is the FIRST question to ask?

Quiz

Why is 'wide frame = bottleneck' an incomplete reading of a flame graph?

Order the steps

Order the steps of attacking a hot path the senior way:

1 Open the profile and find the widest leaf frame by self-time
2 Read the parent chain — is the leaf called from one path or many?
3 Classify the work: CPU instructions, allocation, cache miss, waiting (lock or syscall), or JIT deopt
4 Form one hypothesis about the fix that matches the classification
5 Apply ONLY that change in isolation
6 Capture a new profile under the same load and diff against baseline
7 Verify both the local hotspot shrank AND the headline metric improved
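The last two steps can be expressed as a small check. This is a sketch under assumptions: profiles are represented as plain dictionaries mapping a frame name to its share of samples, the headline metric is a p95 latency in milliseconds, and the verdict labels are illustrative rather than a standard vocabulary.

```python
def verify_fix(before, after, hotspot, p95_before_ms, p95_after_ms):
    """Compare two profiles and the headline metric after a single change.

    before/after: dict of frame name -> share of samples (0.0 to 1.0).
    """
    hotspot_shrank = after.get(hotspot, 0.0) < before.get(hotspot, 0.0)
    headline_improved = p95_after_ms < p95_before_ms

    if hotspot_shrank and headline_improved:
        return "keep"
    if hotspot_shrank:
        # The frame got narrower but users see no gain: look at where the time went.
        return "investigate"
    return "revert"
```

A hotspot that shrinks while the headline metric stays flat often means the cost moved to another frame, which is why step 7 asks for both conditions.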

Complete the analogy

Fill in the blank: a wide flame-graph frame names the _______; the cause may sit one level above (in the caller), one level below (in a callee), or in what the function is actually doing.

Hot path cost = call frequency × per-call cost. A few functions, executed very frequently, form a small slice of the codebase that dominates total runtime.
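A quick arithmetic sketch shows why frequency matters as much as per-call cost. The function names and numbers below are made up for illustration, not measurements.

```python
def total_cost_ms(calls, per_call_us):
    """Total time in milliseconds: call count times per-call cost in microseconds."""
    return calls * per_call_us / 1000.0


# Hypothetical workload: (number of calls, cost per call in microseconds).
WORKLOAD = {
    "parse_config": (1, 40_000.0),    # slow, but runs once
    "render_row": (2_000_000, 0.5),   # cheap, but runs two million times
}


def rank_by_total_cost(workload):
    """Return function names ordered from most to least total time."""
    return sorted(workload, key=lambda name: total_cost_ms(*workload[name]), reverse=True)


if __name__ == "__main__":
    for name in rank_by_total_cost(WORKLOAD):
        print(name, total_cost_ms(*WORKLOAD[name]), "ms")
```

With these assumed numbers the cheap function totals 1000 ms and the slow one 40 ms, so the frequently called function is the hot path even though each call looks harmless.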

Recall before you leave

01 In one paragraph: why is naming the hot function not enough, and what else do you need to read from the profile before you can fix it?
02 Give two concrete examples where the fix is in the caller rather than in the wide leaf itself.

Recap

A hot path is the sequence of calls where the program spends most of its time. The flame graph’s wide leaf names the function, but the cause may be in the caller (too many calls), a callee (real cost one level down), or in what the function does (CPU work vs allocation vs waiting). The diagnosis question — which of the five shapes is this hotspot — must precede the fix choice. The next lessons cover each of the five shapes and their fix families. Now when you see a wide frame, your first move is a question, not a rewrite.
