Observability OBS · 07 · 05

How flame graphs are built from samples, and the production workflows that use them

Identical stacks collapse, alphabetical sorting groups parents with children, and width is sample count — once you know the algorithm you never misread the x-axis as time again. Profiling integrates with SLO burn, deploy diff, and capacity planning.

OBS · Middle · 15 min


A senior engineer at a conference asks: “which function runs before this one in the flame graph?” They point to two adjacent frames at the same level. The answer is: neither — the x-axis is alphabetical. If you do not know this, you will waste hours on the wrong hypothesis.

How the flame graph is built from samples

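Each sample is a list of function names from leaf (currently executing) to root (program entry). At the end of a profiling window, the graph is built in four steps:

1. All samples are collected.
2. Identical stacks collapse into one entry and their counts add up, so a hotter path ends up as a wider column.
3. For rendering, all unique stacks are sorted alphabetically by their root-to-leaf path (root frame first, then the next frame down, and so on), so children of the same parent end up next to each other.
4. At each level, adjacent frames with the same name and the same parent merge into one rectangle whose width is proportional to the number of samples passing through it.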

Together these four steps explain why the x-axis is alphabetical and not time: the sort in step 3 groups related paths, not sequential events. Without this understanding, every frame you see to the left of another will tempt you into a false “this ran first” conclusion.
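The four steps can be sketched in a few dozen lines of Python. This is a minimal sketch, not flamegraph.pl itself. It assumes each sample arrives as a leaf-first list of function names and stores stacks root-first as `root;child;leaf` strings, which is the folded format flamegraph.pl consumes. It also sorts by the list of frames rather than by the raw string, so a parent's children always stay contiguous. The function names `collapse` and `layout` and the sample stacks are hypothetical.

```python
from collections import Counter


def collapse(samples):
    """Steps 1-2: turn leaf-first samples into root-first folded stacks and
    count identical stacks, so a hotter path gets a bigger count."""
    return Counter(";".join(reversed(sample)) for sample in samples)


def layout(folded):
    """Steps 3-4: sort stacks frame by frame and emit (depth, name, x, width)
    rectangles. Adjacent frames sharing the same parent path merge into one
    rectangle, so width is the number of samples passing through that frame."""
    rects = []
    open_frames = []  # open_frames[depth] = [name, x_start, width]
    x = 0
    for stack in sorted(folded, key=lambda s: s.split(";")):
        count = folded[stack]
        frames = stack.split(";")
        # Length of the prefix this stack shares with the frames still open.
        shared = 0
        while (shared < len(frames) and shared < len(open_frames)
               and open_frames[shared][0] == frames[shared]):
            shared += 1
        # Open frames deeper than the shared prefix end here: emit them.
        while len(open_frames) > shared:
            depth = len(open_frames) - 1
            name, start, width = open_frames.pop()
            rects.append((depth, name, start, width))
        # Frames in the shared prefix grow wider by this stack's count.
        for frame in open_frames:
            frame[2] += count
        # Frames not shared yet start a new rectangle at the current x.
        for name in frames[shared:]:
            open_frames.append([name, x, count])
        x += count
    # Emit whatever is still open at the end.
    while open_frames:
        depth = len(open_frames) - 1
        name, start, width = open_frames.pop()
        rects.append((depth, name, start, width))
    return sorted(rects)


# Hypothetical samples in the order they were taken (leaf first).
samples = [
    ["deflate", "compress", "main"],
    ["deflate", "compress", "main"],
    ["parse", "main"],
    ["deflate", "compress", "main"],
    ["flush", "main"],
]
for rect in layout(collapse(samples)):
    print(rect)
```

Here `main` is 5 wide because every sample passes through it, and `compress` and `deflate` are 3 wide. `flush` lands at x = 3, left of `parse` at x = 4, even though it was sampled last.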

The result reads top-to-bottom: pick a leaf frame (top), check its width, walk down to see what calls it.

The most expensive misread

The x-axis position is alphabetical order of the full stack path — not time, not call order. A frame appearing to the left of another at the same level tells you nothing about which one ran first. It tells you only that its stack path comes earlier alphabetically.

This is the most common mistake engineers make when first reading flame graphs. If you see two wide frames side by side and think “A runs, then B runs,” you are drawing a false conclusion. Both could be called by the same parent at different points; both could be from unrelated code paths; both could be parallelised.
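A toy comparison makes the difference concrete. Assume each event is a hypothetical `(start_time, folded_path)` pair. A trace-style view orders events by start time. A flame-graph view throws the timestamps away and orders the unique paths alphabetically, frame by frame. The helpers `trace_order` and `flame_order` are illustrative, not any tool's API.

```python
def trace_order(events):
    """Trace/Gantt-style view: order events by their start timestamp."""
    return [path for _, path in sorted(events)]


def flame_order(events):
    """Flame-graph view: drop timestamps, keep unique paths, sort frame by frame."""
    return sorted({path for _, path in events}, key=lambda p: p.split(";"))


# zeta_step really ran first, then alpha_step.
events = [(0.0, "main;handler;zeta_step"), (1.5, "main;handler;alpha_step")]
print(trace_order(events))  # ['main;handler;zeta_step', 'main;handler;alpha_step']
print(flame_order(events))  # ['main;handler;alpha_step', 'main;handler;zeta_step']
```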

If you need time order, the right tool is a trace view (Gantt-style span timeline). Flame graphs answer “what” is hot; traces answer “when” in the request each step ran.

Same stacks, opposite x-axis. Left-right position in a flame graph is alphabetical, so it never means call order — that is the most expensive misread. When you actually need execution order, switch to a trace timeline.

So the read is purely vertical: scan for the widest plateau. Width is sample count, so a wide frame is where the CPU actually spent its time, and that is your optimisation target. A tall but narrow tower is a deep call chain that is rarely on-CPU; it is a red herring, no matter how dramatic its height looks. The flat top of a wide stack is the leaf actually doing the work: no child frames sit above it, so every sample it holds is self time spent in that function.

Read vertically, not left-to-right. Width is sample count: the wide compress -> deflate plateau is where CPU time is actually spent (optimise the flat-top leaf). The tall background-flush tower is deep but thin — four frames high, almost no samples — a red herring.
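To find the flat-top leaf programmatically, split each function's width into two counts. Total samples count every sample where the function is anywhere on the stack. Self samples count only those where it is the leaf. This sketch assumes a folded profile mapping `root;...;leaf` strings to counts. The frame names loosely mirror the figure but are hypothetical, as is `total_and_self`.

```python
from collections import Counter


def total_and_self(folded):
    """Return (total, self) sample counts per function name.

    total: samples where the function appears anywhere on the stack (its width).
    self:  samples where the function is the leaf (the flat top doing the work).
    """
    total, self_counts = Counter(), Counter()
    for stack, count in folded.items():
        frames = stack.split(";")
        for name in set(frames):  # set() so recursion is not double-counted
            total[name] += count
        self_counts[frames[-1]] += count
    return total, self_counts


folded = {
    "main;compress;deflate": 80,
    "main;compress": 5,
    "main;parse": 13,
    "main;background_flush;flush_batch;write_block;fsync": 2,
}
total, self_counts = total_and_self(folded)
print(self_counts.most_common(1))  # [('deflate', 80)]
print(total["fsync"], total["compress"])  # 2 85
```

The background-flush chain is the deepest stack, but it never exceeds 2 samples, so it ranks near the bottom on both counts.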

Profile workflows in production

SLO burn drilldown: the SLO alert fires → click the linked profile → the time range is pre-filtered to the burn window → CPU and off-CPU flame graphs side by side → identify the function that changed → map it to the deploy that introduced it. When the bug ran on CPU, this path can take under 90 seconds from page to git blame.
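Pre-filtering to the burn window amounts to slicing timestamped samples before aggregating them. A minimal sketch, assuming each sample is a hypothetical `(unix_seconds, folded_stack)` pair. Real backends do this server-side through their own query APIs, and `window_profile` and `hottest_leaf` are illustrative names.

```python
from collections import Counter


def window_profile(samples, start, end):
    """Folded profile built only from samples with start <= ts < end."""
    return Counter(stack for ts, stack in samples if start <= ts < end)


def hottest_leaf(folded):
    """(function, samples) for the leaf with the most self samples, or None."""
    leaves = Counter()
    for stack, count in folded.items():
        leaves[stack.split(";")[-1]] += count
    top = leaves.most_common(1)
    return top[0] if top else None


samples = [
    (100, "main;handler;db_query"),
    (101, "main;handler;db_query"),
    (205, "main;handler;regex_match"),  # inside the burn window
    (206, "main;handler;regex_match"),
    (207, "main;handler;db_query"),
    (208, "main;handler;regex_match"),
]
burn = window_profile(samples, start=200, end=300)
print(hottest_leaf(burn))  # ('regex_match', 3)
```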

Deploy regression detection: capture a profile of both the pre-deploy and post-deploy versions under comparable load, then diff them. A differential flame graph colours frames by relative change: red for frames that grew, blue for frames that shrank, and white (neutral) for unchanged ones. New wide red frames that were absent before the deploy are the regression. Production-grade continuous-profiling backends such as Pyroscope and Datadog build this in as a comparison view: pick two versions or time windows and the backend renders the diff.
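The core of a differential comparison can be sketched as follows, under several assumptions. Both profiles are folded dicts, and counts are normalised to each profile's total so that a longer capture does not look like a regression. The comparison runs per function on self samples, whereas real differential flame graphs colour every frame in every stack. The 1-percentage-point threshold for calling a change red or blue is an arbitrary choice. `diff_profiles` is a hypothetical helper.

```python
from collections import Counter


def self_share(folded):
    """Fraction of all samples in which each function is the leaf."""
    total = sum(folded.values())
    shares = Counter()
    for stack, count in folded.items():
        shares[stack.split(";")[-1]] += count / total
    return shares


def diff_profiles(before, after, threshold=0.01):
    """Map each function to (change in self share, colour)."""
    b, a = self_share(before), self_share(after)
    result = {}
    for fn in set(b) | set(a):
        delta = a[fn] - b[fn]  # Counter returns 0 for absent functions
        if delta > threshold:
            colour = "red"    # grew after the deploy
        elif delta < -threshold:
            colour = "blue"   # shrank after the deploy
        else:
            colour = "white"  # roughly unchanged
        result[fn] = (delta, colour)
    return result


before = {"main;handler;json_encode": 60, "main;handler;db_query": 40}
after = {
    "main;handler;json_encode": 30,
    "main;handler;db_query": 40,
    "main;handler;regex_compile": 30,  # new after the deploy
}
for fn, (delta, colour) in sorted(diff_profiles(before, after).items()):
    print(f"{fn:15} {delta:+.2f} {colour}")
```

Normalising by share answers "did this function take a bigger slice of the CPU?". If the two captures ran for the same duration under the same load, comparing raw counts instead would also show whether total CPU grew.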

Profile-as-data: queries beyond flame graphs

Profiles are time-series of stack samples — backends increasingly let engineers query them like a database:

“Top 10 functions by self-CPU across all services for the past hour” → capacity planning.
“Find all profiles where function X appears in the top 5” → impact assessment before deleting a slow library.
“Group flame graphs by Kubernetes node” → spot hot nodes.
“Alert when a new function appears in the top 5 after a deploy” → automated regression detection.
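Treating profiles as data mostly means keeping labels next to each stack and aggregating over them. The sketch below assumes a hypothetical `Sample` record with `service` and `node` labels. It implements three of the queries above over in-memory lists: fleet-wide top-N by self samples, grouping by node, and "new in the top N after a deploy". A real backend would run equivalent queries in its own query language. All names here are illustrative.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    service: str
    node: str
    stack: str  # folded, root-first: "main;handler;parse"
    count: int


def top_self(samples, n):
    """Top-n functions by self samples across every service."""
    counts = Counter()
    for s in samples:
        counts[s.stack.split(";")[-1]] += s.count
    return [fn for fn, _ in counts.most_common(n)]


def cpu_by_label(samples, label):
    """Total samples grouped by a label such as 'node' or 'service'."""
    counts = Counter()
    for s in samples:
        counts[getattr(s, label)] += s.count
    return counts


def new_in_top(before, after, n=5):
    """Functions in the post-deploy top n that were not in the pre-deploy top n."""
    return sorted(set(top_self(after, n)) - set(top_self(before, n)))


before = [
    Sample("api", "node-a", "main;handler;json_encode", 40),
    Sample("api", "node-b", "main;handler;db_query", 30),
    Sample("worker", "node-a", "main;loop;compress", 20),
]
after = before + [Sample("api", "node-b", "main;handler;regex_compile", 50)]
print(top_self(after, 2))              # ['regex_compile', 'json_encode']
print(dict(cpu_by_label(after, "node")))  # {'node-a': 60, 'node-b': 80}
print(new_in_top(before, after, n=2))  # ['regex_compile']
```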

Workflow	Trigger	Action	Output
SLO burn drill	Alert	Filter profile to burn window	Hot function identified <90 s
Deploy regression	Deploy	Diff pre vs post profiles	New hot frame highlighted red
Capacity planning	Quarterly	Top-N functions fleet-wide	Optimisation candidates ranked
Trace-id drill	Slow span in trace	Filter profile by trace-id	Flame graph for that request

Why this works

Why differential profiles catch what dashboards miss. A standard latency dashboard shows p99 going up after a deploy. But is the new code path 5% slower or 50% slower? And which function changed? The dashboard cannot say. For CPU-bound regressions, a differential profile answers both: how much the red frames grew shows the severity, and the frame name plus its parent give the location. Teams that diff profiles automatically on every deploy can catch regressions within minutes instead of waiting for a customer-reported incident.

Quiz

An engineer reads a flame graph and concludes function A runs before function B because A is to the left of B at the same level. What is the misunderstanding?

Quiz

A deploy just went out. The team wants to know if it regressed CPU performance. Which profiling workflow is most direct?

Recall before you leave

01. Explain why the x-axis of a flame graph is alphabetical instead of temporal, and what tool to use if you need time order.
02. What is a differential flame graph and what problem does it solve?
03. Name three ways to query profiles as data (beyond just viewing flame graphs).

Recap

Flame graphs are built by aggregating identical stacks, sorting them alphabetically by stack path, and rendering rectangles whose width is proportional to sample count. The x-axis encodes alphabetical grouping, never time, which is why the left-right position of two frames tells you nothing about execution order; use a trace timeline when you need time order. Differential flame graphs compare two profiles (before and after a deploy) and colour each frame by how much it changed; they are the most direct way to catch CPU regressions at deploy time. Profiles are time-series data: modern backends support querying them across services, diffing by version, grouping by node, and alerting on shape changes, which turns profiling from a debugging tool into a continuous quality signal. Now when you see two adjacent frames on the same row and wonder which ran first, you know the answer is neither, and you know to open a trace view if execution order is what you actually need.



