Observability

Continuous profiling: always-on flame graphs with eBPF and trace-id correlation

Crux: Continuous profiling runs always-on at 2-5% overhead, so when an SLO burns, the flame graph is already saved. eBPF captures polyglot stacks without language hooks; trace-id correlation drills from any slow span to the exact function in 30 seconds.

An SLO burns at 2:14 AM. The pager wakes the on-call. Traditional profiling requires them to SSH in, reproduce the issue under load, capture a profile, and parse it — at 2:14 AM, under pressure. Continuous profiling already has the flame graph for 2:14 AM waiting in a dashboard.

Traditional vs continuous profiling

Traditional (on-demand) profiling: SSH in, run perf record or hit /debug/pprof/profile, gather data, analyse, leave. Cost: only during capture. Limitation: requires you to be present and the issue to be actively occurring. For intermittent issues or incidents that self-resolve, you lose the evidence.

Continuous profiling: an agent runs on every host or container, sampling 100 times per second continuously, batching and shipping compressed profiles every 10-15 seconds to a backend. The backend stores them indexed by service, host, and time range. Storage: ~50-200 MB/day per service, ~1.5-6 GB/month. Overhead: 2-5% CPU. The critical win: when an SLO burns, the profile of the burning minutes is already saved.
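The sample, batch and ship loop can be sketched in a few lines. This is a minimal in-process illustration in Python, not an eBPF agent. It assumes CPython, where the underscore-prefixed `sys._current_frames()` returns the current frame of every thread, and it folds each stack into the `root;...;leaf` text format that flame-graph tools consume. `run_agent` and the `ship` callback are hypothetical names; a real agent would compress each batch and upload it to the backend instead of handing it to a function.

```python
import sys
import threading
import time
from collections import Counter
from typing import Callable

SAMPLE_HZ = 100        # samples per second, as in the text
FLUSH_EVERY_S = 15.0   # batch window, within the 10-15 s range in the text


def fold_stack(frame) -> str:
    """Walk a frame chain from leaf to root and return 'root;...;leaf'."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return ";".join(reversed(names))


def sample_once(counts: Counter, skip_ident: int) -> None:
    """Snapshot every thread's stack except the sampler's own."""
    for ident, frame in sys._current_frames().items():
        if ident != skip_ident:
            counts[fold_stack(frame)] += 1


def run_agent(ship: Callable[[Counter], None], stop: threading.Event,
              hz: int = SAMPLE_HZ, flush_every_s: float = FLUSH_EVERY_S) -> None:
    """Sample at roughly `hz` and hand a batch to `ship` every `flush_every_s` seconds."""
    me = threading.get_ident()
    batch: Counter = Counter()
    next_flush = time.monotonic() + flush_every_s
    # Event.wait returns True once stop is set, which ends the loop.
    while not stop.wait(1.0 / hz):
        sample_once(batch, me)
        if time.monotonic() >= next_flush:
            ship(batch)  # a real agent would compress and upload here
            batch = Counter()
            next_flush += flush_every_s
    if batch:
        ship(batch)  # flush the partial batch on shutdown
```

At 100 Hz with a 15-second window, each batch holds up to 1,500 samples per thread, which is why identical stacks are aggregated into counts before shipping. Because this sketch also snapshots blocked threads, it measures wall-clock time rather than CPU time.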

eBPF: language-agnostic profiling

Traditional language profilers (Go pprof, JFR, async-profiler, py-spy) require language-specific support: the runtime must expose stack-walking APIs, or the profiler must know how to read its internal frame structures. For Python, Ruby, and older PHP interpreters, this depends on hooks the runtime team must provide.

eBPF profilers (Pyroscope eBPF mode, Parca) read stacks from the kernel side: the kernel’s perf_event_open syscall sets up a timer-driven sampling event, and a BPF program attached to that event captures user-space stacks at each sample. This means:

  • Works for any language, any binary, with no application code change.
  • One agent covers all services on the host — Go, Java, Node, Python.
  • Catches third-party library overhead that language-specific profilers might miss.

The catch: symbol resolution. The kernel sees only memory addresses; the profiler maps them back to function names using ELF symbol tables, debug info (DWARF), and JIT-emitted symbol maps for V8 or the JVM. Most production eBPF profilers handle this, but [unknown] frames appear when a binary has been stripped of its symbols, or when JIT-compiled code changes faster than its symbol map is updated.
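Symbolization is essentially an interval lookup: given symbols sorted by start address, find the last one that starts at or before the sampled address, then check that the address actually falls inside it. The sketch below models that step with hypothetical symbols and addresses. Real profilers build the table from ELF symbol tables, DWARF, or JIT perf map files (such as `/tmp/perf-<pid>.map`), none of which this code parses.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    start: int  # first address of the function
    size: int   # length of the function in bytes
    name: str


class SymbolTable:
    """Address -> function-name lookup over symbols sorted by start address."""

    def __init__(self, symbols: list[Symbol]) -> None:
        self._syms = sorted(symbols, key=lambda s: s.start)
        self._starts = [s.start for s in self._syms]

    def resolve(self, addr: int) -> str:
        # Index of the last symbol whose start address is <= addr.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0:
            sym = self._syms[i]
            if addr < sym.start + sym.size:
                return sym.name
        # No covering symbol: stripped binary, JIT code without a map, etc.
        return "[unknown]"


def symbolize(stack: list[int], table: SymbolTable) -> list[str]:
    """Turn a raw address stack (leaf first) into function names."""
    return [table.resolve(a) for a in stack]
```

An address that lands in a gap between known symbols, or beyond the last one, comes back as `[unknown]`, which is exactly what shows up in a flame graph when the symbol source is missing.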

Cross-language profiler coverage

Language | Native profilers | eBPF coverage
--- | --- | ---
Go | pprof (built-in) | Full: frame pointers are standard
Java | JFR, async-profiler | Partial: needs JIT symbol maps
Python | py-spy, cProfile | Limited: interpreter frames are opaque
Node.js | --prof, clinic.js, 0x | Partial: V8 needs the --perf-prof flag
Rust / C / C++ | perf, pprof-rs | Full when compiled with frame pointers

Trace-id correlation: from slow span to flame graph in 30 seconds

Each profile sample can carry the trace-id of the request being processed at the moment of sampling, read from thread-local context. When a slow trace appears in the trace view, the profile samples carrying that trace-id (and only those) can be selected and rendered as a flame graph for that specific request.

This is the bridge between “where did time go in the request” (trace span) and “what code ate the CPU” (profile). The workflow:

  1. SLO alert fires: p99 latency is over budget.
  2. Open the trace view, find the slow spans, and note the trace-id.
  3. Open the profile view filtered by that trace-id; the flame graph for that exact request appears.
  4. Find the widest plateau at the top of the flame graph (the function with the most self time); that is usually the function to fix.
  5. Done in under 60 seconds.
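Picking the function to fix from a flame graph means separating total time (frame width, which includes every callee) from self time (samples where the function was the leaf). Frames near the root, such as `main` or the request handler, are always wide because they contain everything above them. The sketch below works on folded stacks (`root;...;leaf` mapped to sample counts); the helper names and the function names in the usage note are made up for illustration.

```python
from collections import Counter


def self_time(folded: Counter) -> Counter:
    """Samples in which each function was the leaf (top of the flame graph)."""
    out: Counter = Counter()
    for stack, n in folded.items():
        leaf = stack.rsplit(";", 1)[-1]
        out[leaf] += n
    return out


def total_time(folded: Counter) -> Counter:
    """Samples in which each function appears anywhere on the stack (frame width)."""
    out: Counter = Counter()
    for stack, n in folded.items():
        for fn in set(stack.split(";")):  # count recursive frames once per sample
            out[fn] += n
    return out


def hottest_function(folded: Counter) -> tuple[str, float]:
    """Function with the most self time, and its share of all samples."""
    fn, n = self_time(folded).most_common(1)[0]
    return fn, n / sum(folded.values())
```

For example, with `{"main;handle;parse_json": 60, "main;handle;render": 25, "main;handle": 15}`, both `main` and `handle` have a total of 100 samples, but `hottest_function` returns `("parse_json", 0.6)`.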

OpenTelemetry’s profile signal (stabilising in 2025-2026) standardises this linkage. Production-grade observability platforms (Datadog, Grafana with Pyroscope, Honeycomb) ship this drilldown out of the box.

Profile storage economics

Continuous profiling: cost and storage
  • Profile size per 30-second capture: ~50-500 KB compressed
  • Profiles per hour (15-second intervals): 240
  • Storage per service per day: ~50-200 MB
  • Storage per service per month: ~1.5-6 GB
  • Fleet of 200 services: 300 GB - 1.2 TB per month
  • Object storage cost: ~$0.02/GB-month, so ~$6-25 for one month of fleet data
  • Pyroscope 2.0 storage improvement: ~3x vs v1, via symbol deduplication
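The storage figures follow from simple arithmetic on the per-day volume. This sketch reproduces them in decimal units (1 GB = 1,000 MB), taking ~50-200 MB/day per service as given. The cost covers one month of fleet data at ~$0.02 per GB-month; retention longer or shorter than a month scales the stored volume and the bill accordingly. The helper names are illustrative.

```python
SECONDS_PER_HOUR = 3600
DAYS_PER_MONTH = 30
MB_PER_GB = 1000  # decimal units, matching the storage figures above


def profiles_per_hour(interval_s: int) -> int:
    """How many profiles an agent ships per hour at a fixed interval."""
    return SECONDS_PER_HOUR // interval_s


def monthly_gb(mb_per_day: float, services: int = 1) -> float:
    """GB accumulated in one month for `services` services."""
    return mb_per_day * DAYS_PER_MONTH * services / MB_PER_GB


def monthly_cost_usd(gb_stored: float, usd_per_gb_month: float = 0.02) -> float:
    """Object-storage cost of keeping `gb_stored` for one month."""
    return gb_stored * usd_per_gb_month
```

With 200 services, `monthly_gb(50, 200)` and `monthly_gb(200, 200)` give 300 GB and 1,200 GB, which cost about $6 and $24 per month respectively.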

Pyroscope 2.0 (released April 2026) cut storage 3x by deduplicating symbols across profiles from the same binary — function names and source paths are shared in a common symbol table instead of repeated in every profile.
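The deduplication idea can be illustrated with string interning: every distinct function name goes into one shared table, and each profile stores only integer ids. This is a generic sketch of the technique under that assumption, not Pyroscope's actual storage format; `SharedSymbols`, `encode_profile`, and `decode_profile` are hypothetical names. The pprof format already interns strings within a single profile; the saving described above comes from sharing one table across many profiles of the same binary.

```python
from dataclasses import dataclass, field


@dataclass
class SharedSymbols:
    """One table shared by many profiles: each distinct name is stored once."""
    names: list[str] = field(default_factory=list)
    index: dict[str, int] = field(default_factory=dict)

    def intern(self, name: str) -> int:
        """Return the id for `name`, adding it to the table on first sight."""
        if name not in self.index:
            self.index[name] = len(self.names)
            self.names.append(name)
        return self.index[name]


def encode_profile(stacks: dict[str, int], table: SharedSymbols) -> dict[tuple[int, ...], int]:
    """Replace function names in folded stacks with integer ids from `table`."""
    return {tuple(table.intern(fn) for fn in stack.split(";")): n
            for stack, n in stacks.items()}


def decode_profile(encoded: dict[tuple[int, ...], int], table: SharedSymbols) -> dict[str, int]:
    """Rebuild folded stacks from ids using the shared table."""
    return {";".join(table.names[i] for i in ids): n for ids, n in encoded.items()}
```

Two profiles from the same binary that both contain `main;handle;parse_json` store those three names once in the shared table, and each profile keeps only the tuple `(0, 1, 2)`.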

Retention strategy: 7 days full-fidelity for active debugging, 30 days downsampled (one profile per 5 minutes), 90 days for long-term trend analysis. Budget-conscious teams cap at 14 days fine + 60 days coarse.
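Downsampling to one profile per 5 minutes can be done by summing the stack counts of every profile that falls in the same 5-minute window. That keeps the relative weight of each stack while discarding finer time resolution. This is an assumed approach sketched for illustration, not a description of a specific backend; `downsample` is a hypothetical helper, and timestamps are Unix seconds.

```python
from collections import Counter

BUCKET_S = 300  # 5 minutes, the coarse resolution in the retention strategy


def downsample(profiles: list[tuple[int, Counter]],
               bucket_s: int = BUCKET_S) -> dict[int, Counter]:
    """Merge (unix_ts, folded-stack counts) profiles into one profile per bucket."""
    merged: dict[int, Counter] = {}
    for ts, counts in profiles:
        start = ts - ts % bucket_s  # align to the start of the bucket
        merged.setdefault(start, Counter()).update(counts)
    return merged
```

With profiles shipped every 15 seconds, each 5-minute bucket replaces 20 profiles with one.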

Quiz

An eBPF profiler shows many '[unknown]' frames for a Python service. What is the cause?

Quiz

What does trace-id correlation in continuous profiling enable that a standalone CPU profile cannot provide?

Recall before you leave
  1. What is the critical operational advantage of continuous profiling over on-demand profiling during incidents?
  2. Why does an eBPF profiler work for Go and Rust but produce [unknown] frames for Python?
  3. How does trace-id correlation work mechanically?

Recap

Continuous profiling agents run on every host, sample stacks 100 times per second, and ship compressed profiles every 10-15 seconds to a backend like Pyroscope or Parca. At 2-5% overhead, this is affordable enough to leave always-on. eBPF agents capture stacks from the kernel side without language-specific hooks: one agent per host covers Go, Java, Node, and Python, though interpreter-based runtimes need extra support for accurate symbol resolution. Trace-id labels on every sample enable a flame graph filtered to one specific request in under 30 seconds. Pyroscope 2.0’s symbol deduplication cut storage costs 3x, keeping per-service monthly storage under 10 GB. The SLO → trace → profile workflow can cut time-to-diagnosis for a CPU-bound incident to under 90 seconds.
