Sampling vs instrumentation profiling: why 99 Hz wins in production

Crux: Instrumentation wraps every function call for exact data but collapses under production load; sampling captures one stack every 10 ms at bounded overhead, making it the only viable production profiling strategy.

Your staging server runs under a profiler without trouble. The same profiler in production raises latency by 40%. The profiler didn't change; the load did. Understanding why is the difference between a tool you can leave running all the time and one you can only reach for in emergencies.

Two ways to profile

Instrumentation profiling wraps every function entry and exit with timing code, so the runtime measures exactly how many times each function was called and exactly how long it ran. The cost is proportional to call frequency: at 10 million calls per second, you pay for 10 million timed calls per second. On a dev machine under light load this is fine; under production traffic it can add 20-100% overhead. Instrumentation suits small, isolated benchmarks; it is not viable for always-on production profiling.
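The wrapping described above can be sketched with a decorator. This is a minimal illustration, not how any particular profiler is implemented; `instrumented`, `call_counts`, `total_seconds`, `add` and `run` are hypothetical names chosen for the example.

```python
import time
from collections import defaultdict
from functools import wraps

# Per-function call counts and cumulative wall time, filled in by the wrapper.
call_counts = defaultdict(int)
total_seconds = defaultdict(float)


def instrumented(func):
    """Wrap func so every entry and exit is timed (hypothetical helper)."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # timing code at function entry
        try:
            return func(*args, **kwargs)
        finally:
            # Timing code at function exit; this runs once per call,
            # so the bookkeeping cost grows with the call rate.
            total_seconds[func.__name__] += time.perf_counter() - start
            call_counts[func.__name__] += 1
    return wrapper


@instrumented
def add(a, b):
    return a + b


def run(n):
    total = 0
    for i in range(n):
        total = add(total, i)
    return total


result = run(10_000)
print(call_counts["add"], total_seconds["add"])
```

The call count is exact, which is the appeal of instrumentation. The price is that the two `perf_counter` reads and the dictionary updates are paid on every single call, and for a function as small as `add` that bookkeeping can easily outweigh the work being measured.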

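Sampling profiling works differently: a timer fires N times per second and the profiler captures whatever call stack is running at that moment. The cost is N stack walks per second, regardless of how many function calls happen between them. At 100 Hz that is 100 stack walks per second, each costing ~5-20 microseconds, or about 0.5-2 ms of CPU time per second for the walks alone. Overall overhead is usually quoted at roughly 0.5-2% on a busy CPU, a figure that presumably also covers work beyond the walk itself, such as handling the timer interrupt and recording the sample. Modern profilers (Go pprof, Linux perf, Java async-profiler, py-spy, eBPF-based profilers) are all sampling-based, so their cost is bounded by the sample rate rather than by the workload.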
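The two cost models can be compared with a few lines of arithmetic. The sketch below uses the figures from the text (10 million calls per second, 100 Hz, 5-20 microseconds per stack walk); the 50 ns per-call timing cost is an assumption made up for illustration, and `overhead_fraction` is a hypothetical helper. It counts only the timing code or the stack walks themselves, so the sampling result comes out below the overall overhead figure quoted above.

```python
NS_PER_SECOND = 1_000_000_000


def overhead_fraction(events_per_second, cost_per_event_ns):
    """CPU-seconds spent on profiling per second of wall time."""
    return events_per_second * cost_per_event_ns / NS_PER_SECOND


# Instrumentation: one timed call per function call.
# ASSUMPTION: 50 ns of timing code per call (illustrative, not measured).
instrumentation = overhead_fraction(10_000_000, 50)

# Sampling: one stack walk per sample, independent of the call rate.
sampling_low = overhead_fraction(100, 5_000)    # 5 us per walk
sampling_high = overhead_fraction(100, 20_000)  # 20 us per walk

print(f"instrumentation: {instrumentation:.1%}")
print(f"sampling walks:  {sampling_low:.2%} to {sampling_high:.2%}")
```

Doubling the call rate doubles the first number and leaves the second untouched, which is the whole argument for sampling in production.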

The key statistical property

At a sample rate of R Hz, a function that uses X% of CPU will appear in roughly X% of samples, regardless of how many times it was called. The statistic reported is “fraction of CPU time,” not “call count.” This is exactly the right metric for finding bottlenecks: you want to know which function is on the CPU most often, not which function is called most often.
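A small deterministic simulation makes the property concrete. The workload here is invented for the example: every millisecond, a hypothetical `hot_loop` runs once for 700 microseconds and a hypothetical `tiny_helper` is called 300 times for 1 microsecond each. The sampler fires roughly 100 times per second, at an interval chosen so it does not line up with the workload's cycle.

```python
from collections import Counter

CYCLE_US = 1_000             # ASSUMPTION: the workload repeats every 1 ms
HOT_LOOP_US = 700            # one long call per cycle (70% of CPU time)
SAMPLE_INTERVAL_US = 10_007  # ~100 Hz, not a multiple of the cycle length

# Calls per second implied by the invented workload above.
calls_per_second = {"hot_loop": 1_000, "tiny_helper": 300_000}


def function_on_cpu(t_us):
    """Return which function is running at time t_us."""
    position = t_us % CYCLE_US
    return "hot_loop" if position < HOT_LOOP_US else "tiny_helper"


def profile(duration_us):
    """Sample the simulated timeline and count hits per function."""
    samples = Counter()
    for t_us in range(0, duration_us, SAMPLE_INTERVAL_US):
        samples[function_on_cpu(t_us)] += 1
    return samples


samples = profile(30_000_000)  # a 30-second profile
total = sum(samples.values())
shares = {name: count / total for name, count in samples.items()}
print(shares)
```

`tiny_helper` is called 300 times more often than `hot_loop`, yet it shows up in about 30% of samples because that is its share of CPU time. The sample counts track time on CPU, not call frequency.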

The consequence: sampling at 100 Hz is sufficient to find functions consuming more than a few percent of CPU. On one busy CPU, a function using 10% of CPU will appear in about 10 of the 100 samples taken each second. A 30-second profile therefore contains about 300 samples of that function, plenty for a reliable estimate.
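How reliable "plenty" is can be estimated with the binomial standard error. This is a rough sketch that assumes samples are independent and that a single CPU is busy for the whole profile; `sample_estimate` is a hypothetical helper.

```python
import math


def sample_estimate(cpu_share, sample_rate_hz, duration_s):
    """Expected hits and standard error of the estimated CPU share.

    ASSUMPTION: each sample independently lands in the function with
    probability cpu_share (a binomial model).
    """
    total_samples = sample_rate_hz * duration_s
    expected_hits = cpu_share * total_samples
    std_error = math.sqrt(cpu_share * (1 - cpu_share) / total_samples)
    return expected_hits, std_error


hits, err = sample_estimate(0.10, 100, 30)
print(f"expected hits: {hits:.0f}, share: 10% +/- {err:.2%}")
```

Under these assumptions the 10% function is pinned down to within about half a percentage point (one standard error). Because the error shrinks with the square root of the sample count, a longer profile helps more cheaply than a higher sample rate.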

Sample rate choices in the wild

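Linux perf is conventionally run at 99 Hz (perf record -F 99), not 100 Hz. The rate is passed explicitly; it is a convention rather than perf's built-in default. The choice is deliberate: sampling at 100 Hz can accidentally synchronise with periodic activity such as kernel timers, so the profiler keeps sampling the same point in a cycle and produces misleading results. 99 Hz avoids that lockstep. Go pprof defaults to 100 Hz. eBPF-based continuous profilers typically use 19 or 49 Hz to minimise impact on very busy containers.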
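The lockstep problem can be reproduced in a simulation. The periodic task below is hypothetical: it fires every 10 ms and runs for 2 ms, so its true CPU share is 20%. Sampling at exactly 100 Hz sees it either always or never, depending on where the sampling clock happens to start; 99 Hz drifts through the cycle and recovers the right answer.

```python
TIMER_PERIOD_NS = 10_000_000  # ASSUMPTION: periodic task fires every 10 ms
BUSY_NS = 2_000_000           # and runs for 2 ms each time (true share 20%)
NS_PER_SECOND = 1_000_000_000


def observed_share(sample_rate_hz, duration_s=30, phase_ns=0):
    """Fraction of samples that land inside the periodic task."""
    n_samples = sample_rate_hz * duration_s
    hits = 0
    for k in range(n_samples):
        # Time of the k-th sample, in integer nanoseconds.
        t_ns = phase_ns + k * NS_PER_SECOND // sample_rate_hz
        if t_ns % TIMER_PERIOD_NS < BUSY_NS:
            hits += 1
    return hits / n_samples


lockstep_hit = observed_share(100)                       # always inside the task
lockstep_miss = observed_share(100, phase_ns=5_000_000)  # never inside the task
offset_rate = observed_share(99)                         # close to the true 20%

print(lockstep_hit, lockstep_miss, round(offset_rate, 3))
```

Both 100 Hz runs are confidently wrong, and nothing in their output hints at it. Moving the rate one hertz off the round number is enough to break the alignment.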

| Approach | Cost model | Typical overhead | Use case |
|---|---|---|---|
| Instrumentation | Per function call | 20-100% | Dev benchmarks, targeted micro-benchmarks |
| Sampling (Go pprof) | Per sample (100/s) | 0.5-2% | Production on-demand or continuous |
| Sampling (eBPF) | Per sample (19-49/s) | 1-3% | Continuous, polyglot fleets |
| Continuous profiler (full) | Sampling + batching + shipping | 2-5% | Always-on production fleet |

Sample rates and overhead

  • Go pprof default sample rate: 100 Hz
  • Linux perf conventional rate (-F 99): 99 Hz (avoids timer sync)
  • eBPF profiler typical rate: 19 or 49 Hz
  • Stack walk cost per sample: ~5-20 μs
  • Overhead at 100 Hz, busy CPU: ~0.5-2%
  • Continuous profiler (full pipeline): 2-5% CPU

When the claimed 2-5% overhead becomes 12%

A continuous profiler claims 2% overhead; you measure 12% in production. The most common causes:

  • Sample rate was accidentally set 10x higher than default.
  • Stack walking is expensive because the runtime executes interpreted or JIT-compiled code that requires symbol resolution at sample time (Python, JVM without native hooks).
  • The agent is doing heavy symbol decompression or compression on the application thread instead of asynchronously.
  • Average stack depth is 120+ frames (deep middleware chains), so each sample costs proportionally more to walk.

Always check profiler config before assuming the tool is misbehaving.

Quiz

A service makes 5 million function calls per second. An instrumentation profiler adds 1 μs per call. A sampling profiler runs at 100 Hz and costs 10 μs per sample. Which adds less overhead?

Quiz

Linux perf is conventionally run at 99 Hz, not 100 Hz. Why?

Recall before you leave

  1. Why is instrumentation profiling impractical for always-on production profiling?
  2. A sampling profiler reports a function at 8% of samples. What does this mean in terms of CPU usage?
  3. Why is Linux perf usually run at 99 Hz instead of 100 Hz?

Recap

Instrumentation profiling wraps every function call, giving exact data but collapsing under production load as overhead scales with call frequency. Sampling profiling fires a timer N times per second and captures the current stack, so overhead is bounded regardless of how many functions are called between samples. At 100 Hz and ~10 μs per stack walk, the walks themselves cost about 1 ms of CPU per second, and overall overhead is typically quoted at 0.5-2%; a full continuous profiler pipeline (with batching and shipping) brings the total to 2-5%. The statistical property that makes sampling powerful: a function consuming X% of CPU appears in roughly X% of samples, so you get CPU share estimates without touching every call. Linux perf is conventionally run at 99 Hz to avoid synchronising with periodic kernel timers, a subtle correctness detail every senior engineer should know.


Trademarks belong to their respective owners. Editorial reference only.