
Observability

How flame graphs are built from samples, and the production workflows that use them

Crux: Identical stacks collapse, alphabetical sorting groups parents with children, and width is sample count. Once you know the algorithm, you never misread the x-axis as time again. Profiles also feed SLO burn drilldowns, deploy diffs, and capacity planning.

A senior engineer at a conference points to two adjacent frames at the same level of a flame graph and asks: “which function runs before this one?” The flame graph cannot answer that, because its x-axis is alphabetical, not temporal. If you do not know this, you will waste hours on the wrong hypothesis.

How the flame graph is built from samples

Each sample is a list of function names from leaf (currently executing) to root (program entry). After a profiling window:

  1. All samples are collected.
  2. Identical stacks collapse into one column — their counts add up, making the column wider.
  3. For rendering, all unique stacks are sorted alphabetically by their full path, starting at the root function, so children of the same parent end up next to each other.
  4. For each level, rectangles are emitted with width proportional to the count.

The result reads top-to-bottom: pick a leaf frame (top), check its width, walk down to see what calls it.
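
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular flame graph tool: the function names `collapse` and `layout` are hypothetical, and stacks are assumed to be stored root first (the reverse of the leaf-to-root sample order described above) so that sorting groups children under their parent.

```python
from collections import Counter

def collapse(samples):
    """Steps 1-2: count identical stacks. Each sample is a tuple of frames, root first."""
    return Counter(tuple(s) for s in samples)

def layout(folded):
    """Steps 3-4: sort stacks by path, then emit (depth, name, x, width) rectangles."""
    rects = []
    open_frames = []  # one [name, x_start, width] entry per depth of the current path
    x = 0
    for stack, count in sorted(folded.items()):
        # Count how many leading frames this stack shares with the previous one.
        shared = 0
        while (shared < len(stack) and shared < len(open_frames)
               and open_frames[shared][0] == stack[shared]):
            shared += 1
        # Close the frames that are no longer on the path, deepest first.
        for depth in range(len(open_frames) - 1, shared - 1, -1):
            name, start, width = open_frames.pop()
            rects.append((depth, name, start, width))
        # Shared ancestors get wider; new frames open at the current x offset.
        for frame in open_frames:
            frame[2] += count
        for name in stack[shared:]:
            open_frames.append([name, x, count])
        x += count
    # Close whatever is still open after the last stack.
    for depth in range(len(open_frames) - 1, -1, -1):
        name, start, width = open_frames.pop()
        rects.append((depth, name, start, width))
    return sorted(rects)

# Hypothetical samples, invented for illustration.
samples = [
    ("main", "parse", "read"),
    ("main", "render"),
    ("main", "parse", "read"),
    ("main", "parse", "tokenize"),
]
rects = layout(collapse(samples))
```

Width here is a raw sample count; a renderer would scale it to pixels. Note that nothing in `layout` looks at when a sample was taken.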

The most expensive misread

The x-axis position is alphabetical order of the full stack path — not time, not call order. A frame appearing to the left of another at the same level tells you nothing about which one ran first. It tells you only that its stack path comes earlier alphabetically.

This is the most common mistake engineers make when first reading flame graphs. If you see two wide frames side by side and think “A runs, then B runs,” you are drawing a false conclusion. Both could be called by the same parent at different points; both could be from unrelated code paths; both could be parallelised.
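
A small experiment makes the point concrete. In this sketch (hypothetical function and frame names, stacks stored root first), the samples are captured with `b_work` running before `a_work`, yet `a_work` still lands on the left, and shuffling the capture order changes nothing.

```python
import random
from collections import Counter

def column_order(samples):
    """Left-to-right order of collapsed stacks: sorted by path, not by arrival."""
    return sorted(Counter(samples).items())

# Hypothetical capture order: b_work ran first, a_work ran afterwards.
timeline = [("main", "b_work")] * 3 + [("main", "a_work")] * 2

# Shuffle the capture order with a fixed seed; the layout input is unchanged.
shuffled = timeline[:]
random.Random(0).shuffle(shuffled)

same_layout = column_order(timeline) == column_order(shuffled)
leftmost = column_order(timeline)[0][0]
```

Because collapsing discards arrival order, `same_layout` is true for any shuffle, and `leftmost` is the `a_work` stack even though it ran last.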

If you need time order, the right tool is a trace view (Gantt-style span timeline). Flame graphs answer “what” is hot; traces answer “when” in the request each step ran.

Profile workflows in production

SLO burn drilldown: SLO alert fires → click link → time range pre-filtered to burn window → CPU + off-CPU flame graphs side by side → identify changed function → blame the deploy. When the regression is visible in a profile, this path can take under 90 seconds from pager to git blame.
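
The "pre-filtered to burn window" step amounts to collapsing only the samples whose timestamps fall inside the alert's time range. The sketch below assumes samples are stored as `(timestamp, stack)` pairs; that shape, the function name `profile_for_window`, and the data are all hypothetical, and real backends expose this through their own query interfaces.

```python
from collections import Counter
from datetime import datetime, timedelta

def profile_for_window(samples, start, end):
    """Collapse only the samples captured in the half-open window [start, end)."""
    return Counter(stack for ts, stack in samples if start <= ts < end)

# Hypothetical data: one sample per minute; the alert reports the burn began at 12:02.
t0 = datetime(2024, 1, 1, 12, 0)
samples = [
    (t0, ("main", "handle")),
    (t0 + timedelta(minutes=1), ("main", "handle")),
    (t0 + timedelta(minutes=2), ("main", "handle", "retry_loop")),
    (t0 + timedelta(minutes=3), ("main", "handle", "retry_loop")),
]
burn = profile_for_window(samples, t0 + timedelta(minutes=2), t0 + timedelta(minutes=4))
```

The resulting counts can then be rendered as a flame graph that contains only what ran during the burn.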

Deploy regression detection: Capture a profile on both the pre-deploy and post-deploy version under comparable load. Diff them: a differential flame graph colours frames by relative change — red for frames that grew, blue for frames that shrank, white for unchanged. New wide red frames that were absent before the deploy are the regression. Production-grade continuous-profile backends (Pyroscope, Datadog) bake this in: “compare versions” picks two commits or time windows and renders the diff.
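
As a rough sketch of what a diff computes, the code below compares two collapsed profiles stack by stack and assigns the colours described above. Normalising each profile to its share of total samples and using a 1% threshold for "unchanged" are assumptions of this sketch, as are the function names; real tools make their own choices here.

```python
def diff_profiles(before, after):
    """Return {stack: change in share of samples}; positive means the stack grew."""
    total_before = sum(before.values()) or 1
    total_after = sum(after.values()) or 1
    stacks = set(before) | set(after)
    return {
        s: after.get(s, 0) / total_after - before.get(s, 0) / total_before
        for s in stacks
    }

def colour(delta, threshold=0.01):
    """Red for growth, blue for shrinkage, white inside the (assumed) threshold."""
    if delta > threshold:
        return "red"
    if delta < -threshold:
        return "blue"
    return "white"

# Hypothetical collapsed profiles (stack -> sample count), root first.
pre = {("main", "parse"): 80, ("main", "render"): 20}
post = {
    ("main", "parse"): 50,
    ("main", "parse", "regex_compile"): 30,  # absent before the deploy
    ("main", "render"): 20,
}
colours = {stack: colour(d) for stack, d in diff_profiles(pre, post).items()}
```

In this example the `regex_compile` stack is the new red frame: it did not exist before the deploy and now holds 30% of samples.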

Profile-as-data: queries beyond flame graphs

Profiles are time-series of stack samples — backends increasingly let engineers query them like a database:

  • “Top 10 functions by self-CPU across all services for the past hour” → capacity planning.
  • “Find all profiles where function X appears in the top 5” → impact assessment before deleting a slow library.
  • “Group flame graphs by Kubernetes node” → spot hot nodes.
  • “Alert when a new function appears in the top 5 after a deploy” → automated regression detection.
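
Two of the queries above can be sketched over in-memory data. The profile records, field names, and function names below are hypothetical, and "self-CPU" is approximated as the number of samples in which a function is the leaf frame; a real backend would run the equivalent query through its own query language.

```python
from collections import Counter

def self_counts(folded):
    """Self samples per function: the leaf frame of each stack owns the sample."""
    counts = Counter()
    for stack, n in folded.items():
        counts[stack[-1]] += n
    return counts

def top_functions(profiles, n=10):
    """Top-n functions by self samples across all given profiles."""
    total = Counter()
    for p in profiles:
        total.update(self_counts(p["stacks"]))
    return total.most_common(n)

def new_in_top(before, after, n=5):
    """Functions in the top-n after a deploy that were not in the top-n before."""
    old = {f for f, _ in self_counts(before).most_common(n)}
    return [f for f, _ in self_counts(after).most_common(n) if f not in old]

# Hypothetical fleet: two services, stacks stored root first.
profiles = [
    {"service": "api", "stacks": {("main", "parse"): 5, ("main", "render"): 2}},
    {"service": "worker", "stacks": {("run", "parse"): 4, ("run", "gzip"): 3}},
]
fleet_top = top_functions(profiles, n=2)

pre = {("main", "parse"): 5, ("main", "render"): 2}
post = {("main", "parse"): 5, ("main", "regex_compile"): 4, ("main", "render"): 2}
entrants = new_in_top(pre, post, n=2)
```

`fleet_top` is the capacity-planning ranking, and a non-empty `entrants` list is the condition an automated regression alert would fire on.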

Workflow          | Trigger            | Action                        | Output
------------------+--------------------+-------------------------------+--------------------------------
SLO burn drill    | Alert              | Filter profile to burn window | Hot function identified <90 s
Deploy regression | Deploy             | Diff pre vs post profiles     | New hot frame highlighted red
Capacity planning | Quarterly          | Top-N functions fleet-wide    | Optimisation candidates ranked
Trace-id drill    | Slow span in trace | Filter profile by trace-id    | Flame graph for that request

Why this works

Why differential profiles catch what dashboards miss. A standard latency dashboard shows p99 going up after a deploy. But is the new code path 5% slower or 50% slower? And which function changed? The dashboard cannot say. A differential profile answers both: the width of the red frames is the severity; the frame name and its parent are the location. Teams that do automated profile diffs on every deploy catch regressions in minutes rather than after a customer-reported incident.

Quiz

An engineer reads a flame graph and concludes function A runs before function B because A is to the left of B at the same level. What is the misunderstanding?

Quiz

A deploy just went out. The team wants to know if it regressed CPU performance. Which profiling workflow is most direct?

Recall before you leave

  1. Explain why the x-axis of a flame graph is alphabetical instead of temporal, and what tool to use if you need time order.
  2. What is a differential flame graph and what problem does it solve?
  3. Name three ways to query profiles as data (beyond just viewing flame graphs).

Recap

Flame graphs are built by aggregating identical stacks, sorting them alphabetically by stack path starting at the root function, and rendering rectangles whose width is proportional to sample count. The x-axis encodes alphabetical grouping, never time, which is why left-right position between frames tells you nothing about execution order. Use a trace timeline when you need time order. Differential flame graphs compare two profiles (before and after a deploy) and colour frames by change; they are the most direct way to catch CPU regressions at deploy time. Profiles are time-series data: modern backends support querying them across services, diffing by version, grouping by node, and alerting on shape changes, which turns profiling from a debugging tool into a continuous quality signal.
