
Hot paths: diagnose and fix two shapes

Hands-on project — seed a service with two different-shaped hotspots, diagnose each from the right profile, apply the matching fix, and prove the wins with before/after numbers under identical load.

PERF Senior ◷ 240 min


Reading about the five shapes is not the same as pulling a service out of two of them at once. Build a service with hotspots of deliberately different shapes, then run the full loop on each: classify from the right profile, locate the fix via the parent/child chain, apply one change, and prove it with numbers, without guessing which tool to reach for.

Goal

Turn the unit’s decision tree into a reproducible engineering loop: instrument, capture the right profile per hotspot, classify it into a shape, locate the fix layer, apply the matching fix family, and verify both the local frame and the headline metric — twice, on two different shapes.
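The classification step of that loop can be sketched as a small function. This is a minimal illustration only: the signal names, the thresholds, the shape labels other than allocation-bound, memory-bound, and false-sharing, and the order of the checks (which follows the category questions in the senior stretch) are all assumptions, not values defined by this unit. Calibrate them against your own runtime and hardware.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical per-hotspot measurements gathered from profiles and counters."""
    gc_share: float            # fraction of on-CPU samples in GC/allocator frames
    ipc: float                 # instructions per cycle
    cache_miss_rate: float     # cache misses / cache references, as a fraction
    off_cpu_share: float       # fraction of wall time spent blocked or waiting
    kernel_share: float        # fraction of on-CPU samples in kernel frames
    interpreter_share: float   # fraction of on-CPU samples in interpreter frames
    ipc_drops_with_cores: bool = False  # IPC gets worse as cores are added


def classify(s: Signals) -> str:
    """Walk the category questions in order; the first match wins.

    All thresholds below are illustrative assumptions.
    """
    if s.gc_share > 0.20:
        return "allocation-bound"
    if s.ipc < 1.0 and s.ipc_drops_with_cores:
        return "false-sharing"
    if s.ipc < 1.0 and s.cache_miss_rate > 0.05:
        return "memory-bound"
    if s.off_cpu_share > 0.50:
        return "off-CPU"
    if s.kernel_share > 0.30:
        return "kernel-bound"
    if s.interpreter_share > 0.30:
        return "interpreter-bound"
    return "unclassified"


if __name__ == "__main__":
    hotspot = Signals(gc_share=0.35, ipc=1.8, cache_miss_rate=0.02,
                      off_cpu_share=0.05, kernel_share=0.04,
                      interpreter_share=0.0)
    print(classify(hotspot))
```

Writing the tree down as code forces each branch to name its deciding signal, which is the same one-line justification the acceptance criteria ask for.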

Project


Objective

Take an HTTP service (your own or the starter described below) seeded with at least two hotspots of DIFFERENT shapes: one allocation-bound, and one that is either memory-bound or suffering from false sharing. Bring each under control by diagnosing the shape first, then applying only the matching fix family, and prove every step with measurements under identical load.

Requirements

Acceptance criteria

A before/after table per endpoint: p99, p99.9, CPU%, allocation rate, and (for the memory-bound path) IPC and cache-miss rate — measured under identical load, not estimated.
Each fix is justified by its classification: a one-line statement of the shape, the deciding signal, and why the chosen fix family matched (and why the obvious wrong fix — e.g. 'switch libraries' or 'rewrite the algorithm' — would have under-delivered).
Re-captured evidence shows the shape resolved: the allocation path's GC frames and allocation rate dropped by at least 50%; the memory-bound path's IPC rose above 2 and its cache-miss rate fell below ~5% (or the false-sharing path's IPC recovered and stopped worsening as cores were added).
The fix-and-verify check passes for both: the local frame shrank AND the headline latency (p99) improved; if a headline did not move, a written note identifies the next-widest hotspot the fix unmasked.
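The numeric checks in these criteria can be automated so the before/after table is computed rather than eyeballed. The sketch below is one possible shape for that, assuming you have already exported raw per-request latencies and counter readings from two runs under identical load. The `Run` record and the function names are hypothetical, and nearest-rank is only one of several percentile definitions; use whichever your load tool reports, but use the same one for both runs.

```python
import math
from dataclasses import dataclass


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


@dataclass
class Run:
    """Hypothetical record of one load-test run for one endpoint."""
    latencies_ms: list
    alloc_mb_per_s: float
    ipc: float
    cache_miss_rate: float  # fraction, e.g. 0.05 for 5%


def alloc_shape_resolved(before: Run, after: Run) -> bool:
    # Allocation rate dropped by at least 50%.
    return after.alloc_mb_per_s <= 0.5 * before.alloc_mb_per_s


def memory_shape_resolved(after: Run) -> bool:
    # IPC rose above 2 and the cache-miss rate fell below 5%.
    return after.ipc > 2.0 and after.cache_miss_rate < 0.05


def headline_improved(before: Run, after: Run) -> bool:
    # The headline metric here is p99 latency.
    return percentile(after.latencies_ms, 99) < percentile(before.latencies_ms, 99)


if __name__ == "__main__":
    # Made-up numbers purely to exercise the functions.
    before = Run([10.0] * 98 + [80.0, 120.0], 400.0, 0.7, 0.18)
    after = Run([9.0] * 98 + [30.0, 45.0], 150.0, 2.3, 0.03)
    for name, run in (("before", before), ("after", after)):
        print(name, percentile(run.latencies_ms, 99),
              percentile(run.latencies_ms, 99.9), run.alloc_mb_per_s, run.ipc)
    print(alloc_shape_resolved(before, after), headline_improved(before, after))
```

If `headline_improved` returns false while the local check passes, that is the case the last criterion describes: write the note naming the next-widest hotspot instead of stacking another fix.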

Senior stretch

Add a one-page on-call triage runbook: page → profile in 60 s → widest leaf → category decision tree (GC frames? low IPC + cache misses? off-CPU? kernel frames? interpreter frames?) → parent/child read → fix-family lookup → diff-verify checklist. Write it so the steps carry across languages and runtimes in a polyglot fleet.
Add a security gate to the runbook: before optimising any path, check whether it touches auth, crypto comparison, or input validation; demonstrate it by attempting (and rejecting) an early-exit 'optimisation' of a constant-time token compare.
Add tail-latency monitoring: per-endpoint latency histograms sliced to p99.9, and show that a deliberately injected intermittent stall (a periodic GC or lock spike) is invisible on the CPU% dashboard but obvious on the p99.9 panel.
Add a PR-time CI gate: load-test a canary, diff its profile against main, and fail the build if any function's share of self-time grows by more than 30% relative to its share on main.
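The comparison at the heart of the CI gate in the last item might look like the sketch below. It assumes each profile has already been reduced to a mapping of function name to self-time sample count; how you export that from your profiler is left open. The function names and the `min_share` noise floor are assumptions added for illustration, while the 30% relative threshold comes from the item above.

```python
def self_time_shares(self_samples):
    """Convert {function: self-time sample count} into {function: share of total}."""
    total = sum(self_samples.values())
    if total <= 0:
        raise ValueError("profile contains no samples")
    return {fn: count / total for fn, count in self_samples.items()}


def find_regressions(main_samples, canary_samples,
                     max_relative_growth=0.30, min_share=0.01):
    """Return (function, main_share, canary_share) for every function whose
    self-time share grew by more than max_relative_growth relative to main.

    min_share is an assumed noise floor: functions below it on the canary
    are ignored so tiny frames cannot fail the build on jitter alone.
    """
    main = self_time_shares(main_samples)
    canary = self_time_shares(canary_samples)
    flagged = []
    for fn, share in canary.items():
        if share < min_share:
            continue
        base = main.get(fn, 0.0)
        if base == 0.0:
            # Function is hot on the canary but absent from main: always flag.
            flagged.append((fn, base, share))
        elif (share - base) / base > max_relative_growth:
            flagged.append((fn, base, share))
    return sorted(flagged)


def gate_exit_code(main_samples, canary_samples):
    """0 when the gate passes, 1 when any function regressed."""
    return 1 if find_regressions(main_samples, canary_samples) else 0


if __name__ == "__main__":
    # Made-up sample counts purely to exercise the gate.
    main = {"parse": 400, "encode": 300, "route": 300}
    canary = {"parse": 600, "encode": 200, "route": 200}
    for fn, base, share in find_regressions(main, canary):
        print(f"{fn}: {base:.1%} -> {share:.1%}")
    raise SystemExit(gate_exit_code(main, canary))
```

Comparing shares rather than raw sample counts keeps the gate meaningful when the two runs collected different totals, but it also means one function shrinking raises every other function's share; the noise floor and identical load on both runs help keep that from producing false failures.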

Recap

This is the loop you run in every real incident, done twice on purpose: instrument first, capture the right profile per hotspot, classify the shape from the deciding signal (not the flame-graph width), locate the fix via the parent/child chain, apply only the matching fix family, and verify both the local frame and the headline metric under identical load. Doing it on two different shapes at once is what turns the five-shape model from a table you memorised into a decision tree you reach for under pressure.
