Performance
Hot paths: multiple-choice review
Six questions that cut across the whole unit. Each one mirrors a call you make mid-incident with a profile open: a diagnosis to commit to before you spend a sprint on the wrong fix, not a definition to recite.
Confirm you can chain the unit together: read a wide leaf as a symptom, classify it into one of the shapes, locate the fix via the parent/child chain, and respect the constraints production adds (hardware counters, tail latency, and the security gate).
A flame graph shows renderTemplate at 28% self-time. The allocation profile then shows fmt.Sprintf at 62% of alloc bytes inside it. Two engineers want to switch template engines. What is the correct read?
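For reference when checking your answer, here is a minimal Go sketch of the kind of local change the allocation profile points toward, assuming the fmt.Sprintf calls inside renderTemplate build small string fragments. The function names and the markup are hypothetical, not taken from any real service. The append-style version writes into a buffer the caller can reuse, so the template engine stays untouched.

```go
package main

import (
	"fmt"
	"strconv"
)

// renderRowSprintf is a hypothetical stand-in for a formatting call inside
// renderTemplate. Sprintf boxes its arguments into interfaces and returns a
// freshly allocated string on every call.
func renderRowSprintf(name string, count int) string {
	return fmt.Sprintf("<li>%s: %d</li>", name, count)
}

// renderRowAppend produces the same bytes by appending to dst. When the
// caller reuses dst across rows, most per-call allocations can be avoided.
func renderRowAppend(dst []byte, name string, count int) []byte {
	dst = append(dst, "<li>"...)
	dst = append(dst, name...)
	dst = append(dst, ": "...)
	dst = strconv.AppendInt(dst, int64(count), 10)
	return append(dst, "</li>"...)
}

func main() {
	buf := make([]byte, 0, 64) // reusable scratch buffer
	buf = renderRowAppend(buf, "widgets", 42)
	// Both paths must render identically before any profile diff is trusted.
	fmt.Println(string(buf) == renderRowSprintf("widgets", 42)) // prints true
}
```

Whether this is worth doing depends on the profile diff afterwards: the sketch only shows that the fix can live inside the frame, not how much it saves.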
Two wide leaves look identical on the flame graph. perf stat shows leaf A at IPC 3.0 with a 1% cache-miss rate, and leaf B at IPC 0.4 with an 18% cache-miss rate. How do the fixes differ?
json.Marshal is at 28% CPU. In one service it has a single dominant parent (a request logger); in another it has twenty thin parents (handlers). Where does each fix belong?
A lock-free counter array uses atomic adds with per-thread indices, yet under load IPC collapses to 0.42, cache-miss rate hits 76%, and throughput gets worse as you add threads. Diagnosis and fix?
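For reference when checking your answer, the sketch below shows one common remedy when per-thread counters turn out to share cache lines: pad each counter so neighbours land on separate lines. It is an illustration under assumptions, not the unit's implementation. The 64-byte line size is an assumption (common on x86-64, but some CPUs use 128), and paddedCounter, counterStride, and countParallel are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"unsafe"
)

// cacheLineSize is an assumption; check the target CPU before relying on it.
const cacheLineSize = 64

// paddedCounter occupies a full assumed cache line, so two adjacent slots in
// a slice are always 64 bytes apart and never share a 64-byte line.
type paddedCounter struct {
	n int64
	_ [cacheLineSize - 8]byte
}

// counterStride reports the distance in bytes between adjacent counters.
func counterStride() uintptr {
	return unsafe.Sizeof(paddedCounter{})
}

// countParallel gives each worker its own padded slot and sums them at the end.
func countParallel(workers, perWorker int) int64 {
	counters := make([]paddedCounter, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(slot *paddedCounter) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				atomic.AddInt64(&slot.n, 1)
			}
		}(&counters[w])
	}
	wg.Wait()

	var total int64
	for i := range counters {
		total += atomic.LoadInt64(&counters[i].n)
	}
	return total
}

func main() {
	fmt.Println(counterStride(), countParallel(8, 100000)) // prints 64 800000
}
```

The padding trades memory for isolation. Confirming that it helped still means re-running the hardware counters and checking that throughput now scales with thread count.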
A Node flame graph shows InterpreterCallStub frames dominating a function that should be hot, and latency spikes that do not correlate with traffic. After a 'better algorithm' rewrite, nothing improves. Why, and what is the actual fix?
An engineer makes a token-comparison hot path 3x faster by adding an early-exit branch on the first mismatching byte. The flame frame shrank and p50 improved. Why is this still wrong?
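For reference when checking your answer, here is a minimal Go sketch of a comparison whose running time does not depend on where the first mismatch occurs, using the standard library's crypto/subtle package. The tokensEqual name is hypothetical. Note that ConstantTimeCompare returns immediately when the lengths differ, so this sketch assumes token length is not itself secret.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// tokensEqual compares two tokens without an early exit on the first
// mismatching byte. For equal-length inputs, the time taken depends on the
// length only, not on the contents.
func tokensEqual(got, want string) bool {
	return subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}

func main() {
	fmt.Println(tokensEqual("s3cret", "s3cret"), tokensEqual("s3cret", "s3creT")) // prints true false
}
```

This version will look slower in a flame graph than the early-exit one. That is the intended trade, and it is the reason changes to paths like this go through security review rather than being judged on the profile alone.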
The unit is one decision tree. A wide leaf names the symptom, not the cause. Classify it into a shape: CPU, allocation, cache, lock, syscall, JIT deopt, or one of the cases that only hardware counters reveal (false sharing, native-bridge). Locate the fix via the parent chain (one caller vs many) and the child chain (work here vs one level down). Reach for hardware counters and top-down microarchitecture analysis (TMA) when CPU-bound and memory-bound leaves look identical on the flame graph. Gate any change to constant-time or security-sensitive paths through review. Diagnose first, fix one thing, prove it with a profile diff, and watch tail latency, not just the mean.