
Observability

Profiling: from SLO to flame graph

Crux: Hands-on project. Stand up continuous profiling on a small polyglot service, drill a planted slow path from SLO to flame graph in under 30 seconds, and catch a deploy regression with a differential profile.
◷ 240 min

Reading about the SLO-to-profile workflow is not the same as running it under a pager. Stand up continuous profiling on a small service, plant a realistic slow path, and prove you can go from a burning SLO to the exact hot function in under 30 seconds — then catch a deploy regression before it reaches users.

Goal

Turn the unit’s mental model into a reproducible engineering loop: instrument continuous + trace-correlated profiling, pick the right profile type for each symptom, drill from alert to flame graph, and gate deploys with a differential profile — with evidence at every step.
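The "alert to flame graph" drill can be sketched in a few lines, assuming profile samples arrive as `(trace_id, stack)` pairs with the stack listed root-first. That sample shape and the `hottest_frames` helper are illustrative assumptions, not the format of any particular profiler.

```python
from collections import Counter


def hottest_frames(samples, trace_id, top=3):
    """Return the top leaf frames for one trace as (name, share of samples).

    samples: iterable of (trace_id, stack), where stack is a root-first
    tuple of frame names. This shape is an assumption for illustration.
    """
    # Self time is attributed to the leaf (last) frame of each sampled stack.
    self_counts = Counter(
        stack[-1] for tid, stack in samples if tid == trace_id and stack
    )
    total = sum(self_counts.values())
    return [(name, count / total) for name, count in self_counts.most_common(top)]


samples = [
    ("t1", ("main", "handle", "hash")),
    ("t1", ("main", "handle", "hash")),
    ("t1", ("main", "handle", "json")),
    ("t2", ("main", "idle")),
]
print(hottest_frames(samples, "t1"))
```

Filtering by trace-id first is what keeps the drill short: the flame graph then contains only the stacks sampled while the slow request was running.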

Project
Objective

Stand up continuous profiling with trace-id correlation on a small service (your own or a starter), plant a compute hotspot and an off-CPU wait, and prove a sub-30-second SLO-to-flame-graph drill plus a deploy-regression catch — all with measured evidence, not assertion.
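As a starting point, the two planted slow paths might look like the sketch below. The handler, the route names (`/hash`, `/wait`) and the function names are hypothetical; `time.sleep` merely stands in for a real blocking call such as a database query.

```python
import hashlib
import time


def planted_cpu_hotspot(rounds: int = 200_000) -> str:
    """Compute-bound slow path: repeated hashing keeps the thread on-CPU."""
    digest = b"seed"
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def planted_offcpu_wait(seconds: float = 0.2) -> None:
    """Wait-bound slow path: the sleep stands in for a blocking I/O call."""
    time.sleep(seconds)


def handle_request(path: str) -> str:
    """Hypothetical handler that routes a request to one planted path."""
    if path == "/hash":
        return planted_cpu_hotspot()
    if path == "/wait":
        planted_offcpu_wait()
        return "waited"
    return "ok"
```

Keeping the two paths on separate routes makes it easy to drive each symptom on its own and check that the CPU profile lights up for one and the off-CPU profile for the other.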

Requirements
Acceptance criteria
  • A measured overhead figure for the running profiler (target ~2-5%), with the sample rate and symbolization mode you used noted alongside it.
  • Two side-by-side profiles for the same I/O-bound request: a CPU profile showing near-zero on-CPU time and an off-CPU/block profile attributing the wait to the blocking call, with the CPU/wall ratio stated.
  • A timed SLO-to-flame-graph drill recording (or annotated screenshots) showing alert -> trace-id-filtered flame graph -> named hot function, with the elapsed time written down.
  • A differential flame graph (or compare-versions view) that clearly shows the regressed frame as new/grown after the deploy, plus a one-paragraph write-up naming the regressed function and how the diff localised it where a latency chart could not.
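Two of the numbers asked for above, the CPU/wall ratio and the profiler overhead, can be computed with the standard library alone. This is a minimal sketch: the helper names are made up, and the overhead formula assumes you measured CPU seconds for the same workload with the profiler off and then on.

```python
import time


def cpu_wall_ratio(fn, *args) -> float:
    """Run fn once and return process CPU time divided by wall-clock time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


def overhead_percent(baseline_cpu_s: float, profiled_cpu_s: float) -> float:
    """Relative extra CPU used with the profiler on, as a percentage."""
    return (profiled_cpu_s - baseline_cpu_s) / baseline_cpu_s * 100.0


# A sleep is almost entirely off-CPU, so the ratio should be close to zero.
print(f"CPU/wall for a 0.2 s sleep: {cpu_wall_ratio(time.sleep, 0.2):.3f}")
```

A ratio near 1 points at compute and a CPU profile; a ratio near 0 means the request spent its time waiting, which is what the off-CPU/block profile should attribute to the blocking call.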
Senior stretch
  • Add a CI gate: on each deploy, capture a 5-minute canary profile, diff it against the previous version, post the top-5 changed functions as a PR comment, and fail the build if a new function enters the top 5 by self-CPU or any top-5 frame grows more than 15%.
  • Add a second runtime (e.g. a JVM or Python service) under the same eBPF agent and document the symbolization differences — clean Go frames vs [unknown]/partial JVM/Python frames — then fix one by adding a language-aware profiler.
  • Demonstrate a sampling blind spot: add a function that is called very frequently but runs for less than one sampling interval per call (under 10 ms at 100 Hz), show it is under-represented at 100 Hz, and recover it by raising the sample rate or using targeted instrumentation.
  • Write a one-page on-call runbook: from a pager, which profile to open per symptom (CPU vs off-CPU vs heap), how to filter by trace-id, how to read a flame graph (frame width is the share of samples; the x-axis is sorted alphabetically and is not a timeline), and the differential-profile deploy check.
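The CI gate rule from the first stretch item can be sketched as a pure function over two profiles. The sketch assumes each profile has already been reduced to a mapping of function name to self-CPU share, that "grows more than 15%" means relative growth of that share, and that "new" means absent from the previous version's top 5. The function names and sample data are invented for illustration.

```python
def top_n(profile, n=5):
    """Names of the n functions with the largest self-CPU share."""
    ranked = sorted(profile.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]


def gate(before, after, n=5, max_growth=0.15):
    """Return the reasons to fail the build; an empty list means pass."""
    old_top = set(top_n(before, n))
    reasons = []
    for name in top_n(after, n):
        if name not in old_top:
            reasons.append(f"{name}: new in top {n}")
            continue
        old, new = before[name], after[name]
        if old > 0 and (new - old) / old > max_growth:
            reasons.append(f"{name}: grew {(new - old) / old:.0%}")
    return reasons


# Self-CPU shares in percent of total samples (made-up numbers).
before = {"a": 40, "b": 20, "c": 15, "d": 10, "e": 8, "f": 7}
after = {"a": 40, "b": 24, "c": 15, "d": 10, "e": 2, "regex_compile": 9}
print(gate(before, after))
# -> ['b: grew 20%', 'regex_compile: new in top 5']
```

Comparing shares rather than raw sample counts keeps the gate from firing just because the canary saw more or less traffic than the previous version.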
Recap

This is the loop you will run in every real profiling incident: keep continuous, trace-correlated profiling always on at roughly 2-5% overhead, pick the profile type the symptom demands (CPU for compute, off-CPU for waiting, heap for memory), drill from SLO to a trace-id-filtered flame graph in seconds, catch regressions with a differential profile at deploy time, and treat every profile as a security-sensitive artefact. Doing it once on a small service turns the production version into muscle memory.

