awesome-everything RU
↑ Back to the climb

Engineering Practice

On-call: free-recall review

Crux Free-recall prompts across the on-call unit. Answer each in your own words first, then reveal the model answer and compare — symptom alerting, burn rate, fatigue, load caps, escalation, and toil.
Your altitude — climbing toward senior
ZeroJuniorMiddleSenior
You are at senior altitude — in orbit
◷ 13 min

Retrieval beats re-reading. For each prompt, say or write a full answer from memory before you open the model answer — the effort of recall is what makes the on-call discipline stick when you are actually holding the pager.

Goal

Reconstruct the unit’s core mechanisms — why symptoms beat causes, what burn-rate alerting does, the anatomy of alert fatigue, the load caps, the page lifecycle, and how toil reduction keeps a rotation sustainable — without looking back.

Recall before you leave
  1. 01
    Why is symptom-based alerting higher-leverage than cause-based alerting, and what is the one exception?
  2. 02
    Explain burn-rate alerting and why multi-window, multi-burn-rate beats a raw error-rate threshold.
  3. 03
    Walk through the mechanism of alert fatigue and why you can't fix it by telling the responder to be more careful.
  4. 04
    State Google SRE's on-call load caps and the reasoning that makes each cap a reliability lever, not just a humane one.
  5. 05
    Describe the lifecycle of a well-run page from definition to closure, and where each on-call mechanism plugs in.
  6. 06
    What is toil, why does reducing it matter for on-call specifically, and how do the metrics MTTA, MTTR, page volume, and % actionable steer that reduction?
Recap

If you could reconstruct each answer from memory, you hold the unit’s spine: page on symptoms and SLO burn rate, not causes; burn-rate alerting matches urgency to real budget threat; alert fatigue is a structural failure cured by deletion, not discipline; the load caps (≈2 incidents/shift, ≤50% ops, ≤25% on-call) protect the engineering that keeps the rotation quiet; every page runs a defined lifecycle with a runbook and timed escalation; and toil reduction, steered by MTTA, MTTR, page volume, and % actionable, is what keeps it all sustainable.

Continue the climb ↑On-call: alert rules and budget math
shortcuts expand
search
K
prev piece
k
next piece
j
cycle tier
t
this menu
?
sources3
expand
  1. 01
  2. 02
  3. 03

Trademarks belong to their respective owners. Editorial reference only.