
Observability

Profile types: CPU, memory, off-CPU, mutex — which one to reach for

Crux: CPU profiling only sees code that is running. Off-CPU, block, and mutex profiles cover the time a request spends waiting, which can be 96% of it (480 ms of a 500 ms span in the example below), and heap profiles cover memory. You need all four types to diagnose a slow service reliably.

A span takes 500 ms. You open the CPU profile — the service used 20 ms of CPU. Where did the other 480 ms go? CPU profiling is blind to waiting. If you stop there, you will never find the answer.

CPU profiles: what they see and what they miss

CPU profiling samples the call stack while the thread is on the CPU — running instructions. A request that spends 20 ms computing and 480 ms waiting on a database query will show 20 ms in the CPU profile and leave 480 ms invisible.

This is the most important constraint in profiling: CPU profiles only see functions that are consuming the processor. Anything the program waits for — I/O, network, locks, scheduling — is off-CPU and invisible to a CPU profiler.

Memory and allocation profiles

Heap profilers sample allocations, not CPU. Go's heap profiler takes, on average, one sample per 512 KiB allocated (runtime.MemProfileRate). The distance between samples is randomized, so sampling behaves like a Poisson process, and each sample records the allocating stack. The result is a flame graph where width is bytes (allocated or still in use), not CPU time. This finds memory hotspots: a function allocating 100 MB/s shows up wide.
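
The arithmetic behind the sampling can be sketched as follows. At the default rate, a function allocating 100 MiB/s yields about 200 samples per second. Each sample is then scaled back up into an estimate of the true bytes: the sketch divides by the sampling probability 1 - exp(-size/rate). This mirrors what we understand Go's runtime/pprof to do when it writes heap profiles, but it is not that package's code. defaultRate, expectedSamplesPerSecond and unsample are hypothetical names for this example.

```go
package main

import (
	"fmt"
	"math"
)

// defaultRate mirrors Go's default runtime.MemProfileRate: on average one
// sample per 512 KiB allocated.
const defaultRate = 512 * 1024

// expectedSamplesPerSecond estimates how many heap samples a call site that
// allocates bytesPerSecond contributes at the given sampling rate.
func expectedSamplesPerSecond(bytesPerSecond, rate float64) float64 {
	return bytesPerSecond / rate
}

// unsample turns sampled counts into estimated totals. An allocation of
// avgSize bytes is sampled with probability 1 - exp(-avgSize/rate), so
// dividing by that probability gives an unbiased estimate.
func unsample(count, size int64, rate float64) (estCount, estBytes int64) {
	if count == 0 || size == 0 {
		return 0, 0
	}
	avgSize := float64(size) / float64(count)
	scale := 1 / (1 - math.Exp(-avgSize/rate))
	return int64(float64(count) * scale), int64(float64(size) * scale)
}

func main() {
	fmt.Printf("100 MiB/s -> %.0f samples/s\n", expectedSamplesPerSecond(100<<20, defaultRate))

	// Small objects are rarely sampled, so each sample stands for many of them.
	n, b := unsample(1, 64, defaultRate)
	fmt.Printf("one 64 B sample -> ~%d objects, ~%d bytes\n", n, b)

	// Large objects are almost always sampled, so the scale is close to 1.
	n, b = unsample(1, 4<<20, defaultRate)
	fmt.Printf("one 4 MiB sample -> ~%d objects, ~%d bytes\n", n, b)
}
```

The design choice is a trade-off. Sampling keeps the overhead low enough to leave the profiler on permanently, and the scaling keeps the flame graph widths proportional to real bytes.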

Memory leak detection with heap profiles:

  1. Take a heap profile.
  2. Wait 30-60 minutes.
  3. Take another heap profile.
  4. Diff them (go tool pprof -base baseline.heap current.heap).
  5. Functions whose in-use memory grew between the two snapshots are leak candidates.
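
Steps 1 to 4 can be scripted from inside a Go program with runtime/pprof. This is a minimal sketch: leakyCache and handleRequest are hypothetical stand-ins for a real leak, and a loop of 500 requests replaces the 30-60 minute wait.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// leakyCache is a hypothetical stand-in for a real leak: entries are added
// on every request and never evicted.
var leakyCache [][]byte

func handleRequest() {
	leakyCache = append(leakyCache, make([]byte, 64<<10))
}

// heapSnapshot writes a heap profile to path and returns the file size.
func heapSnapshot(path string) (int64, error) {
	f, err := os.Create(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	// The heap profile reflects the most recently completed GC, so force one
	// to make the in-use numbers current.
	runtime.GC()
	if err := pprof.WriteHeapProfile(f); err != nil {
		return 0, err
	}
	info, err := f.Stat()
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	if _, err := heapSnapshot("baseline.heap"); err != nil { // step 1
		panic(err)
	}
	for i := 0; i < 500; i++ { // step 2: stands in for 30-60 minutes of traffic
		handleRequest()
	}
	if _, err := heapSnapshot("current.heap"); err != nil { // step 3
		panic(err)
	}
	fmt.Println("step 4: go tool pprof -base baseline.heap current.heap")
}
```

Inside the pprof diff, `top` should put handleRequest at the head of the in-use growth, which is step 5.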

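Allocation profiles count every sampled allocation, including short-lived objects the GC has already reclaimed; heap (in-use) profiles snapshot live memory. Both are useful: the first explains GC pressure, the second explains memory growth. In Go, a single heap profile carries both views; pick one with go tool pprof -sample_index=alloc_space or -sample_index=inuse_space. JVM equivalents: async-profiler with -e alloc, JFR allocation events. Python: tracemalloc, memray.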
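
The difference between the two views shows up even without a profiler. This sketch uses runtime.ReadMemStats as a stand-in for the two profile views:

- TotalAlloc grows with every allocation, which corresponds to the allocation view.
- HeapAlloc after a GC reflects only live objects, which corresponds to the in-use view.

churn and allocatedVsLive are hypothetical helpers, and the buffer sizes are arbitrary.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink is a package-level variable so the buffers escape to the heap
// instead of being stack-allocated by the compiler.
var sink []byte

// churn allocates n short-lived buffers; each one becomes garbage on the next iteration.
func churn(n, size int) {
	for i := 0; i < n; i++ {
		sink = make([]byte, size)
	}
	sink = nil
}

// allocatedVsLive reports the bytes allocated during churn (what an
// allocation profile sees) and the change in live heap afterwards (what an
// in-use heap snapshot sees).
func allocatedVsLive(n, size int) (allocated uint64, liveDelta int64) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	churn(n, size)
	runtime.GC()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc, int64(after.HeapAlloc) - int64(before.HeapAlloc)
}

func main() {
	allocated, live := allocatedVsLive(1000, 64<<10)
	fmt.Printf("allocated: %d MiB, live heap change: %d KiB\n", allocated>>20, live>>10)
}
```

Roughly 62 MiB passes through the allocator, but almost none of it is still live afterwards. An allocation profile would show churn as wide, while an in-use snapshot would barely show it.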

Off-CPU profiles

Brendan Gregg’s 2013 work on off-CPU analysis identified the gap: CPU profiles miss everything a process waits for. eBPF implementations hook into the kernel scheduler’s context-switch events. When the scheduler removes a thread from the CPU (because it blocks on I/O, sleeps, or waits on a lock), the kernel captures the thread’s stack. That stack represents where the wait started. When the thread is scheduled back onto a CPU, the elapsed time is attributed to that stack.

The off-CPU flame graph shows wait time the same way a CPU flame graph shows running time. For an I/O-bound service, the off-CPU profile is the only profile that explains anything — the CPU profile just says “service was idle.”

Block and mutex profiles

Block profile (Go: runtime.SetBlockProfileRate): time spent waiting on synchronisation primitives — channels, condition variables, WaitGroups. More focused than off-CPU because it targets language-level synchronisation.

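Mutex profile (Go: runtime.SetMutexProfileFraction): lock contention specifically. Reports which code held a lock while others were waiting for it, attributed at unlock time. Both the block and mutex profiles are off by default in Go and only start recording once a non-zero rate or fraction is set.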
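
Here is a minimal Go sketch that does three things. It switches both profiles on at full rate, provokes one blocked channel receive and one contended sync.Mutex, and checks that each profile recorded something. contend and profileCounts are hypothetical names. A rate or fraction of 1 records every event, which is fine for a demo but can be costly on a busy production service.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

// contend produces one blocking channel receive and one contended mutex.
func contend() {
	ch := make(chan struct{})
	go func() {
		time.Sleep(20 * time.Millisecond)
		close(ch)
	}()
	<-ch // blocked receive: recorded in the block profile

	var mu sync.Mutex
	var wg sync.WaitGroup
	mu.Lock()
	wg.Add(1)
	go func() {
		defer wg.Done()
		mu.Lock() // waits while the caller holds the lock
		mu.Unlock()
	}()
	time.Sleep(20 * time.Millisecond) // hold the lock so the goroutine has to wait
	mu.Unlock()                       // contended unlock: recorded in the mutex profile
	wg.Wait()
}

// profileCounts enables both profiles at full rate, runs the workload, and
// returns how many records each profile holds.
func profileCounts() (block, mutex int) {
	runtime.SetBlockProfileRate(1)     // record every blocking event (0 disables)
	runtime.SetMutexProfileFraction(1) // record every contention event (0 disables)
	contend()
	return pprof.Lookup("block").Count(), pprof.Lookup("mutex").Count()
}

func main() {
	block, mutex := profileCounts()
	fmt.Printf("block records: %d, mutex records: %d\n", block, mutex)

	// Save the mutex profile for inspection with: go tool pprof mutex.pprof
	f, err := os.Create("mutex.pprof")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if err := pprof.Lookup("mutex").WriteTo(f, 0); err != nil {
		panic(err)
	}
}
```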

| Profile type | Width measures | When to reach for it |
|---|---|---|
| CPU | CPU time consumed | CPU usage is high, slow response |
| Heap / Allocation | Bytes allocated | GC pressure, OOM, memory growth |
| Off-CPU | Time spent waiting (all causes) | Slow request but low CPU usage |
| Block | Time on sync primitives | Go goroutine contention suspected |
| Mutex | Lock contention time | High lock contention suspected |

Choosing the right profile from the CPU/wall-time ratio

The diagnostic shortcut: look at CPU time vs wall-clock time for the slow request.

  • CPU/wall ≈ 100%: computation bottleneck — CPU profile.
  • CPU/wall < 30%: the bottleneck is off-CPU — off-CPU / block profile or trace spans.
  • Memory growing steadily: heap / allocation profile.
  • Threads contending on a lock: mutex profile.

A Java service that is GC-thrashing is a classic allocation-profile case. The symptom is a high heap allocation rate with frequent old-generation collections. In the allocation flame graph, the widest frame will be the function allocating at the highest rate. This is often string concatenation in logging code that does not use parameterized formatting.

Quiz

A request spends 50 ms on CPU and 450 ms waiting on a DB query. Which profile type would show you the DB wait?

Quiz

A Java service is OOMing on certain endpoints. Its CPU profile looks normal. Which profile type to reach for?

Recall before you leave
  1. Why does Go's heap profiler sample one allocation per ~512 KiB instead of recording every allocation?
  2. Explain why a CPU flame graph is not enough to diagnose an I/O-bound service.
  3. What is the procedure to detect a memory leak with heap profiles?

Recap

Four profile types cover the full request lifecycle: CPU (what is running), heap or allocation (what is being allocated), off-CPU (what is waiting on I/O or scheduling), and block or mutex (what is waiting on locks). CPU profiling only sees code that is actively on the processor: a request waiting 480 ms on a DB query will show only the 20 ms of compute in a CPU profile. The CPU/wall-time ratio is the diagnostic signal: under 30% means the bottleneck is off-CPU. Go’s heap profiler samples on average once per 512 KiB allocated, which keeps always-on memory profiling affordable. Combining all four types gives a complete picture; using only CPU profiling in an I/O-bound service guarantees you find the wrong bottleneck.
