Performance
Cache vs big-O: multiple-choice review
Six questions that cut across the whole unit. None is a definition to recite — each is the judgement call you make when a “fast” algorithm loses to a “slow” one and big-O has nothing to say about it.
Confirm you can connect the memory hierarchy, cache-line behaviour, false sharing, branch prediction, and data layout into one diagnostic instinct: read the profile, name the constant-factor cause, pick the matching fix.
A linked list of 1M nodes iterates ~17x slower than an array of 1M ints, though both are O(N). Which statement best explains the gap?
A team flips a matrix-multiply inner loop from row-major to column-major order on a 10 000×10 000 double matrix and sees a ~9x slowdown. The arithmetic is identical. Root cause?
A counter array is made lock-free with atomics and gets slower than the locked version as threads scale. perf shows IPC 0.4 and a 72% cache-miss rate. Diagnosis?
Sorting an array before summing the elements above a threshold yields a 2–3x speedup, despite the extra O(N log N) sort. Why does it pay back?
A hot ML loop runs 5x slower than a reference C++ port; perf shows IPC 0.8 and 25% L3 miss rate. Someone proposes adding SIMD intrinsics. What is the correct first move?
perf stat on a hot loop reports IPC 0.40, branch-miss 25%, and LLC-miss 47%. You can fix one thing first. Which, and why?
The unit’s through-line is one decision tree: big-O sizes how cost grows with N, but the memory hierarchy sets the constant — and that constant swings ~100x with layout. Contiguous beats pointer-chasing; loop order must match storage order; per-thread writes need their own cache line; unpredictable branches want sorting or branchless code; one-field-at-a-time work wants SoA. When a profile is in front of you, read miss rates and IPC first, name the constant-factor cause, and apply the matching fix — measure, never guess.