Engineering Practice ENG · 07 · 08

On-call: free-recall review

Free-recall prompts across the on-call unit. Answer each in your own words first, then reveal the model answer and compare — symptom alerting, burn rate, fatigue, load caps, escalation, and toil.

ENG Senior ◷ 13 min

Level

FoundationsJuniorMiddleSenior

Retrieval beats re-reading. For each prompt, say or write a full answer from memory before you open the model answer — the effort of recall is what makes the on-call discipline stick when you are actually holding the pager.

Goal

Reconstruct the unit’s core mechanisms — why symptoms beat causes, what burn-rate alerting does, the anatomy of alert fatigue, the load caps, the page lifecycle, and how toil reduction keeps a rotation sustainable — without looking back.

Recall before you leave

01
Why is symptom-based alerting higher-leverage than cause-based alerting, and what is the one exception?
02
Explain burn-rate alerting and why multi-window, multi-burn-rate beats a raw error-rate threshold.
03
Walk through the mechanism of alert fatigue and why you can't fix it by telling the responder to be more careful.
04
State Google SRE's on-call load caps and the reasoning that makes each cap a reliability lever, not just a humane one.
05
Describe the lifecycle of a well-run page from definition to closure, and where each on-call mechanism plugs in.
06
What is toil, why does reducing it matter for on-call specifically, and how do the metrics MTTA, MTTR, page volume, and % actionable steer that reduction?

Recap

If you could reconstruct each answer from memory, you hold the unit’s spine: page on symptoms and SLO burn rate, not causes; burn-rate alerting matches urgency to real budget threat; alert fatigue is a structural failure cured by deletion, not discipline; the load caps (≈2 incidents/shift, ≤50% ops, ≤25% on-call) protect the engineering that keeps the rotation quiet; every page runs a defined lifecycle with a runbook and timed escalation; and toil reduction, steered by MTTA, MTTR, page volume, and % actionable, is what keeps it all sustainable. Now when you carry the pager for real, you have the instinct to delete before you add — and you know which number tells you the rotation is quietly dying before anyone burns out.

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.