Observability OBS · 07 · 10

Profiling: from SLO to flame graph

Hands-on project — stand up continuous profiling on a small polyglot service, drill a planted slow path from SLO to flame graph in under 30 seconds, and catch a deploy regression with a differential profile.

OBS · Senior · 240 min

Reading about the SLO-to-profile workflow is not the same as running it with a pager going off. Stand up continuous profiling on a small service, plant realistic slow paths, and prove you can go from a burning SLO to the exact hot function in under 30 seconds, then catch a deploy regression before it reaches users.

Goal

Turn the unit’s mental model into a reproducible engineering loop: instrument continuous, trace-correlated profiling, pick the right profile type for each symptom, drill from alert to flame graph, and gate deploys with a differential profile, with evidence at every step.
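The trace-id correlation step of the drill can be sketched in a few lines of Python. This assumes a hypothetical in-memory sample format in which every profiler sample carries its call stack (root first) and the trace_id label that was active when it was taken; real agents store labels in their own formats. The helpers fold_for_trace and hottest_leaf, and the function names in the sample data, are illustrative only.

```python
from collections import Counter

def fold_for_trace(samples, trace_id):
    """Keep only samples labelled with trace_id and fold them into
    collapsed-stack counts ("root;child;leaf" -> number of samples)."""
    folded = Counter()
    for tid, stack in samples:  # hypothetical (trace_id, stack_tuple) pairs
        if tid == trace_id:
            folded[";".join(stack)] += 1
    return folded

def hottest_leaf(folded):
    """Return (function, samples) for the function that appears as the
    leaf (self time) most often in the folded stacks."""
    self_counts = Counter()
    for stack, count in folded.items():
        self_counts[stack.split(";")[-1]] += count
    return self_counts.most_common(1)[0]

# Illustrative samples for two requests; function names are made up.
samples = [
    ("t1", ("main", "handle", "parse_json")),
    ("t1", ("main", "handle", "score_items")),
    ("t1", ("main", "handle", "score_items")),
    ("t2", ("main", "handle", "parse_json")),
]
print(hottest_leaf(fold_for_trace(samples, "t1")))  # ('score_items', 2)
```

Filtering before folding is what makes the drill fast: the flame graph then contains only the slow request's samples instead of everything the service did in that window.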

Project

Objective

Stand up continuous profiling with trace-id correlation on a small service (your own or a starter), plant a compute hotspot and an off-CPU wait, and prove a sub-30-second SLO-to-flame-graph drill plus a deploy-regression catch — all with measured evidence, not assertion.
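A minimal Python sketch of the two planted slow paths, using only the standard library: planted_hotspot burns CPU in a tight loop, planted_wait uses time.sleep as a stand-in for a blocking I/O call, and cpu_wall_ratio compares time.process_time (CPU time) with time.perf_counter (wall time) for a single call. The function names and loop sizes are illustrative assumptions, not part of any starter service.

```python
import time

def planted_hotspot(n=1_000_000):
    """Compute hotspot: a tight arithmetic loop that keeps the thread on-CPU."""
    total = 0
    for i in range(n):
        total += i * i % 7
    return total

def planted_wait(seconds=0.2):
    """Off-CPU wait: time.sleep stands in for a blocking I/O call.
    The thread is parked, so a CPU profiler collects almost no samples here."""
    time.sleep(seconds)

def cpu_wall_ratio(fn, *args):
    """Process CPU time divided by wall-clock time for one call of fn."""
    cpu0, wall0 = time.process_time(), time.perf_counter()
    fn(*args)
    cpu = time.process_time() - cpu0
    wall = time.perf_counter() - wall0
    return cpu / wall

print(f"hotspot CPU/wall: {cpu_wall_ratio(planted_hotspot):.2f}")
print(f"wait    CPU/wall: {cpu_wall_ratio(planted_wait, 0.2):.2f}")
```

On an otherwise idle machine the hotspot should report a ratio close to 1 and the wait a ratio close to 0. That gap is why the I/O-bound request needs an off-CPU/block profile rather than a CPU profile.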

Requirements

Acceptance criteria

A measured CPU overhead figure for the running profiler, taken against a profiler-off baseline under the same load (target ~2-5%), with the sample rate and symbolization mode you used noted alongside it.
Two side-by-side profiles for the same I/O-bound request: a CPU profile showing near-zero on-CPU time and an off-CPU/block profile attributing the wait to the blocking call, with the CPU/wall ratio stated.
A timed SLO-to-flame-graph drill recording (or annotated screenshots) showing alert -> trace-id-filtered flame graph -> named hot function, with the elapsed time written down.
A differential flame graph (or compare-versions view) that clearly shows the regressed frame as new/grown after the deploy, plus a one-paragraph write-up naming the regressed function and how the diff localised it where a latency chart could not.
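To see how a differential profile localises a regression, here is a Python sketch that works on collapsed-stack text (one "root;child;leaf count" entry per line, the format flamegraph.pl reads). It reduces each profile to self samples per function and compares shares before and after a deploy. The helper names and the function names in the sample data are hypothetical.

```python
from collections import Counter

def self_samples(folded_lines):
    """Parse collapsed stacks ("root;child;leaf 42") into self samples per leaf."""
    counts = Counter()
    for line in folded_lines:
        stack, _, n = line.rpartition(" ")
        counts[stack.split(";")[-1]] += int(n)
    return counts

def diff_profiles(before, after):
    """Return (function, before_share, after_share) rows, largest growth first.

    Shares of each profile's total are compared instead of raw counts, so a
    busier or quieter window after the deploy is not mistaken for a regression.
    A before_share of 0.0 marks a frame that is new in this version."""
    b_total, a_total = sum(before.values()), sum(after.values())
    rows = [(fn, before[fn] / b_total, after[fn] / a_total)
            for fn in before.keys() | after.keys()]
    rows.sort(key=lambda r: (-(r[2] - r[1]), r[0]))  # ties broken by name
    return rows

# Illustrative profiles: the deploy adds dedupe_items under score_items.
before = self_samples(["main;handle;parse_json 30", "main;handle;score_items 70"])
after = self_samples(["main;handle;parse_json 30", "main;handle;score_items 40",
                      "main;handle;score_items;dedupe_items 30"])
for fn, b, a in diff_profiles(before, after):
    print(f"{fn:<14} {b:6.1%} -> {a:6.1%}")
```

A latency chart would only show that requests got slower; the diff names dedupe_items as a new self-CPU frame. Comparing per function is a simplification: a full differential flame graph keeps the whole stack, so it also shows which caller the new frame sits under.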

Senior stretch

Add a CI gate: on each deploy, capture a 5-minute canary profile, diff it against the previous version, post the top-5 changed functions as a PR comment, and fail the build if a new function enters the top 5 by self-CPU or the self-CPU share of any top-5 frame grows by more than 15%.
Add a second runtime (e.g. a JVM or Python service) under the same eBPF agent and document the symbolization differences — clean Go frames vs [unknown]/partial JVM/Python frames — then fix one by adding a language-aware profiler.
Demonstrate a sampling blind spot: add a function that is called very frequently but runs for less than one sampling interval per call (10 ms at 100 Hz), show that it is under-represented at 100 Hz, and recover it by raising the sample rate or using targeted instrumentation.
Write a one-page on-call runbook: from a pager, which profile to open per symptom (CPU vs off-CPU vs heap), how to filter by trace-id, how to read frame width (share of samples) given that the x-axis is sorted alphabetically rather than by time, and the differential-profile deploy check.
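The pass/fail rule from the CI-gate stretch item can be sketched as a pure Python function over per-function self-CPU sample counts from the previous version and the canary. Assumptions, labelled here and in the code: counts are compared as shares of each profile's total so canary traffic volume does not skew the result; "a new function enters the top 5" is read as "a function that was not in the previous top 5"; and capturing the canary profile and posting the PR comment are left to your CI and profiling tooling. The names top_n and gate, and the sample data, are hypothetical.

```python
def top_n(counts, n=5):
    """Top-n functions by self-CPU samples (ties broken by name), ignoring zeros."""
    ranked = sorted((kv for kv in counts.items() if kv[1] > 0),
                    key=lambda kv: (-kv[1], kv[0]))
    return [fn for fn, _ in ranked[:n]]

def gate(before, after, n=5, max_growth=0.15):
    """Apply the stretch-goal rule; an empty list means the deploy passes.

    Fails when a function that was not in the previous top n appears in the
    canary's top n, or when a top-n function's share of self-CPU samples grows
    by more than max_growth relative to the previous version."""
    # Assumption: compare shares of total samples, not raw counts.
    b_total, a_total = sum(before.values()), sum(after.values())
    old_top = set(top_n(before, n))
    failures = []
    for fn in top_n(after, n):
        if fn not in old_top:  # assumption: "new" means new to the top n
            failures.append(f"{fn}: entered top {n}")
            continue
        b_share, a_share = before[fn] / b_total, after[fn] / a_total
        if a_share > b_share * (1 + max_growth):
            failures.append(f"{fn}: self-CPU share grew {a_share / b_share - 1:.0%}")
    return failures

previous = {"score_items": 50, "parse_json": 20, "render": 15, "auth": 10, "log": 5}
canary = {"score_items": 50, "parse_json": 30, "render": 10, "auth": 5, "log": 5}
print(gate(previous, canary))
```

A non-empty list is the signal to fail the build, and the same reasons can be appended to the PR comment alongside the top-5 changed functions.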

Recap

This is the loop you will run in every real profiling incident: keep continuous, trace-correlated profiling always on at 2-5% overhead, pick the profile type the symptom demands (CPU for compute, off-CPU for waiting, heap for memory), drill from SLO to a trace-id-filtered flame graph in seconds, catch regressions with a differential profile at deploy time, and treat every profile as a security-sensitive artefact. Doing it once on a small service makes the production version muscle memory.
