awesome-everything RU
↑ Back to the climb

Engineering Practice

On-call: multiple-choice review

Crux Multiple-choice synthesis across the on-call unit — symptom vs cause alerting, burn-rate pages, alert fatigue, load caps, runbooks, escalation, and the metrics that keep a rotation honest.
Your altitude — climbing toward senior
ZeroJuniorMiddleSenior
You are at senior altitude — in orbit
◷ 13 min

Six questions that cut across the whole unit. None is a definition to recite — each mirrors a decision a senior makes about what fires a pager at 4 a.m. and what must not.

Goal

Confirm you can connect the unit’s spine: actionable symptom/SLO alerts on the pager, everything else demoted, load capped, runbooks linked, and the rotation measured by % actionable.

Quiz

A disk-usage threshold pages on-call 40 times a month and self-resolves nearly every time. Why is this more dangerous than having no alert at all?

Quiz

You must decide what pages a human. Which condition belongs on the pager, and what is the principle?

Quiz

Google SRE caps a shift at about two incidents and operational work at 50% of an SRE's time. What breaks if you ignore the caps?

Quiz

Which metric most directly signals a rotation degrading toward burnout?

Quiz

A page fires but the primary responder is stuck and hasn't acknowledged. What is the right structural mechanism, and what is the runbook's role here?

Quiz

A change is proposed to reduce noise: snooze the noisiest alerts and add Alertmanager group_by, inhibit_rules, and a 30s group_wait. What does this buy you, and what does it not?

Recap

The through-line of the unit is one rule applied at every layer: a page must be actionable or it is deleted. Page on symptoms and SLO burn rate, not causes like CPU or disk; demote the rest to tickets and dashboards. Cap load (≈2 incidents/shift, ≤50% ops time) so prevention work survives. Link a runbook to every page, escalate on a timer when a responder is stuck, and use grouping/inhibition to collapse storms — but remember routing only de-duplicates. Measure MTTA, MTTR, and page volume, and steer by the north star: % actionable.

Continue the climb ↑On-call: free-recall review
shortcuts expand
search
K
prev piece
k
next piece
j
cycle tier
t
this menu
?
sources3
expand
  1. 01
  2. 02
  3. 03

Trademarks belong to their respective owners. Editorial reference only.