
Performance

Classify and fix: matching bottleneck families to remedies

Crux: Each bottleneck belongs to one of eight families. Naming the family takes seconds from the profile; picking the wrong family wastes days. Amdahl sets the ceiling on any fix before you write a line of code.

Two engineers look at the same flame graph. One sees “GC is hot” and rewrites the allocator. The other sees “GC is hot because of a bulk upstream payload” and rolls back one deploy. The first spends a week; the second spends 35 minutes. The difference is classification — naming the bottleneck family before touching the code.

The eight bottleneck families

The seven pieces of this chapter map to a small vocabulary of bottleneck families. When you classify a hotspot correctly, the fix set follows immediately.

| Family | Profile signal | Chapter piece | Primary fix |
|---|---|---|---|
| CPU-algorithmic | High self-time, low off-CPU | 02 hot-paths | Better algorithm or data structure |
| Allocation-bound | mallocgc / scanobject high | 04 GC | Reduce allocations, reuse, pools |
| Cache-bound | High LLC-miss perf counters | 03 cache-vs-bigo | Data layout, access pattern |
| Lock-bound | High off-CPU on sync wait | 02 hot-paths | Lock-free structures, sharding |
| I/O-bound (N+1) | Many short spans in trace | 05 N+1 | Batch queries, eager-load |
| Syscall-bound | High syscall overhead in profile | 06 batching | Batch writes, vectorised I/O |
| JIT-deopt | Deopt frames in JS/JVM profile | 02 hot-paths | Monomorphic shapes, typed arrays |
| Bundle-bound | RUM shows parse/compile time | 07 bundle-budgets | Code-split, lazy-load, tree-shake |

Amdahl before you code

Amdahl’s law: if fraction f of execution time is spent in the hotspot, the maximum speedup from fixing it is 1 / (1 - f).
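
The ceiling is the limiting case of the general form of Amdahl's law. As a sketch, write s for the speedup applied to the hotspot itself (s is a symbol introduced here for illustration; it is not used elsewhere in this piece):

```latex
S(f, s) = \frac{1}{(1 - f) + \dfrac{f}{s}},
\qquad
\lim_{s \to \infty} S(f, s) = \frac{1}{1 - f}
```

A finite s always lands below the ceiling. Making a 40% hotspot twice as fast (s = 2) gives 1 / (0.6 + 0.2) = 1.25x overall, not 1.67x.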

If the hotspot is 40% of CPU (f = 0.4), the best possible speedup is 1 / 0.6 = 1.67x. If your SLO needs a 3x improvement, fixing this hotspot alone cannot get you there. Return to the profile and look for additional contributors.

This calculation takes 30 seconds. It prevents weeks of work on the wrong target.
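
The 30-second check can also be written down as a tiny helper. This is a minimal sketch; the function names `amdahl_ceiling` and `can_reach_target` are made up for illustration and do not come from any library.

```python
def amdahl_ceiling(f: float) -> float:
    """Upper bound on overall speedup if a hotspot taking fraction f of time vanished."""
    if not 0.0 <= f < 1.0:
        raise ValueError("f must be in [0, 1)")
    return 1.0 / (1.0 - f)


def can_reach_target(f: float, required_speedup: float) -> bool:
    """True if removing the hotspot entirely could deliver the required speedup."""
    return amdahl_ceiling(f) >= required_speedup


if __name__ == "__main__":
    # The 40% hotspot from the text, checked against an SLO that needs 3x.
    print(f"{amdahl_ceiling(0.40):.2f}x")
    print(can_reach_target(0.40, 3.0))
```

The check is deliberately optimistic: it assumes the hotspot disappears completely. If even that best case misses the target, no amount of tuning inside the hotspot will reach it.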

Example: A checkout service has p99 = 800 ms and a target of under 200 ms. The profile shows:

  • json.Marshal: 28% CPU
  • runtime.scanobject: 22% CPU
  • pgx.Query: 18% (measured from trace spans; this is I/O wait, not CPU)

Amdahl on Marshal alone: 1 / (1 - 0.28) = 1.39x. That brings 800 ms to ~576 ms, still above target. Amdahl on Marshal + scanobject combined (50%): 1 / 0.5 = 2x. That brings it to ~400 ms, still above target. Adding the I/O path (68% total): 1 / 0.32 = 3.1x. That brings it to ~256 ms: close, but still above target.
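
The cumulative arithmetic above can be reproduced with a short script. It is a sketch under the same simplification the text makes: each frame's share of the profile is treated as its share of p99 latency. The helper name `cumulative_projections` is illustrative.

```python
P99_MS = 800.0      # current p99 from the example
TARGET_MS = 200.0   # target p99 from the example

# Shares taken from the example profile.
hotspots = [
    ("json.Marshal", 0.28),
    ("runtime.scanobject", 0.22),
    ("pgx.Query", 0.18),
]


def cumulative_projections(p99_ms, shares):
    """Yield (frames removed so far, Amdahl ceiling, best-case p99 in ms)."""
    removed = 0.0
    names = []
    for name, share in shares:
        removed += share
        names.append(name)
        ceiling = 1.0 / (1.0 - removed)
        yield " + ".join(names), ceiling, p99_ms * (1.0 - removed)


rows = list(cumulative_projections(P99_MS, hotspots))
for label, ceiling, best_ms in rows:
    verdict = "meets target" if best_ms <= TARGET_MS else "above target"
    print(f"{label}: {ceiling:.2f}x -> {best_ms:.0f} ms ({verdict})")
```

Every row stays above the target even in the best case, which is the signal to look for a cause outside the three frames.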

Real root cause: all three hot frames share an upstream that suddenly returns 10x more data. Fix the upstream; all three frames shrink together. Total gain: 6x. The Amdahl calculation made it clear that fixing any single family in isolation was insufficient.

Cross-layer compounding

Real bottlenecks rarely sit in one layer. A slow checkout page might be 30% JS bundle parse on the client, 20% N+1 queries in the backend, 15% GC allocation pressure, and 35% backend compute. Fixing any one layer in isolation gives only Amdahl-bounded wins on the total.
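
To see how tight the per-layer bound is, the split above can be fed into a generalised Amdahl calculation. This is a sketch that assumes the four shares are disjoint and add up to the whole page time; `overall_speedup` is a hypothetical helper name.

```python
# Shares of the slow checkout page, taken from the paragraph above.
layers = {
    "bundle parse": 0.30,
    "N+1 queries": 0.20,
    "GC allocation pressure": 0.15,
    "backend compute": 0.35,
}


def overall_speedup(shares, local_speedups):
    """Generalised Amdahl: each layer's share shrinks by its own local speedup.

    Layers missing from local_speedups are left unchanged (speedup 1.0).
    """
    remaining = sum(
        share / local_speedups.get(name, 1.0) for name, share in shares.items()
    )
    return 1.0 / remaining


# Ceiling per layer: pretend that one layer becomes free (infinite local speedup).
for name in layers:
    ceiling = overall_speedup(layers, {name: float("inf")})
    print(f"only {name}: at most {ceiling:.2f}x")
```

No single layer in this split can deliver even 2x on its own, which is why the layers have to be reasoned about together.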

The compound effect is the real win. Two engineers working in parallel, one on the bundle and one on queries and GC, each delivering a 2x improvement, can give up to a combined 4x on the headline metric when the gains multiply, as they do when one fix also shrinks the work another layer has to do. Without cross-layer reasoning, the conversation stalls: “we already optimised our part.”

The chapter’s classification vocabulary lets engineers from frontend, backend, and infra describe their bottleneck in a way the other layers understand. Without the shared vocabulary, “the backend is slow” and “no, the frontend is slow” is the whole conversation.

Why this works

The fix-family table above is not a ranking. Cache-bound bottlenecks in tight compute loops (piece 03) can give 10x improvements on CPU workloads. Allocation-bound regressions (piece 04) are often the sneakiest because GC pauses add tail-latency variance that Amdahl, which reasons about average time shares, underestimates. When classifying, check whether the bottleneck contributes to p50 (the typical user) or p99 (the worst-case user); the fix priority differs.
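
The p50 versus p99 distinction is easy to see on a toy sample. The latencies below are invented for illustration, and `percentile` is a simple nearest-rank helper written for this sketch rather than a library function.

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in ms: 97 normal requests, 3 that hit a GC pause.
latencies = [20] * 97 + [200] * 3

print("mean:", mean(latencies))
print("p50:", percentile(latencies, 50))
print("p99:", percentile(latencies, 99))
```

In this sample the pauses barely move the mean and leave p50 untouched, while p99 is ten times higher. A frame with a small average share can therefore still own the tail.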

Classification pays off

  • Time to classify from a flame graph: 30–90 seconds
  • Time lost fixing the wrong family: 1–5 days
  • Amdahl ceiling if bottleneck is 25% of CPU: 1.33x max
  • Amdahl ceiling if bottleneck is 80% of CPU: 5x max
  • Typical multi-family compound gain: 4–8x
  • Single-family fix typical gain: 1.3–2.5x

Quiz

A service is allocation-bound (GC pressure 25%) AND has an 800 KB JS bundle. Which should be fixed first?

Quiz

A flame graph shows runtime.scanobject at 22% and runtime.mallocgc at 11%. Amdahl on the combined GC frames (33%) gives a maximum 1.49x speedup. The SLO requires 3x. What is the correct next step?

Order the steps

Order the steps for classifying and fixing a bottleneck from profile to verified result:

  1. Open the profile and identify the hottest frame by self-time
  2. Name the family: CPU, allocation, cache, lock, I/O, syscall, JIT, bundle
  3. Apply Amdahl to check whether fixing this family can reach the SLO target
  4. If the Amdahl ceiling is insufficient, return to the profile and find additional contributors
  5. Pick the fix technique from the family's playbook
  6. Re-profile under the same load; confirm both the local frame and the headline metric improved
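
Steps 3 and 4 can be sketched as a small loop: keep adding the next-hottest frame until the combined Amdahl ceiling reaches the required speedup. The sketch assumes the frame shares are disjoint and additive; `contributors_needed` is a hypothetical helper, and the required speedups passed in are illustrative.

```python
def contributors_needed(profile, required_speedup):
    """Return the hottest frames to fix, in order, until the Amdahl ceiling reaches the target.

    profile maps frame name -> fraction of total time.
    Returns None if even removing every listed frame cannot reach the target.
    """
    picked = []
    removed = 0.0
    for name, share in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
        picked.append(name)
        removed += share
        if removed >= 1.0 or 1.0 / (1.0 - removed) >= required_speedup:
            return picked
    return None


# Shares from the checkout example earlier in this piece.
profile = {"json.Marshal": 0.28, "runtime.scanobject": 0.22, "pgx.Query": 0.18}

for required in (1.3, 1.9, 3.0, 4.0):
    print(required, contributors_needed(profile, required))
```

A `None` result means the listed frames cannot reach the target even if all of them vanish, so the next move is a wider profile or a look upstream rather than a deeper fix.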
Recall before you leave
  1. Why does naming the bottleneck family matter before writing any fix?
  2. A checkout service has three hot families: allocation 28%, I/O 22%, and CPU-algorithmic 18%. What does Amdahl say about each, and what does that imply?
  3. What is cross-layer compounding, and why does the shared classification vocabulary enable it?
Recap

Every bottleneck belongs to one of eight families: CPU-algorithmic, allocation, cache, lock, I/O (N+1), syscall (batching), JIT-deopt, or bundle. Naming the family from the profile takes under two minutes and directs you to the correct fix playbook immediately. Amdahl’s law (maximum speedup equals 1 divided by 1 minus the hotspot fraction) sets a hard ceiling on any single fix before you write a line of code. If the Amdahl ceiling is below the SLO target, fixing that hotspot alone is not enough; return to the profile and find additional contributors. Real production bottlenecks typically span multiple layers; fixing two or three families in parallel typically compounds to a 4–8x gain, against 1.3–2.5x for a single-family fix. The shared vocabulary of families lets frontend, backend, and infra engineers describe and quantify their layer’s contribution so parallel work can be coordinated without confusion.

