Automate the mechanical, reserve humans for judgement

Every deterministic check a human performs in review is wasted senior attention. Push format, lint, types, and security scanning into a blocking CI gate that runs before review, so the human budget lands on design, intent, and correctness — the things no tool can decide.

ENG Senior ◷ 17 min

A team’s PRs averaged eleven comments each, and roughly seven of those eleven were about formatting, import order, or naming case. The two senior reviewers were spending most of a day a week typing “nit: spacing” and “use camelCase here.” Then someone added Prettier and ESLint as a blocking CI check that ran before the review request even fired. Overnight the average dropped to four comments, and all four were about behavior: an off-by-one in pagination, a missing await, a dropped error case, a question about a money rounding rule. Nothing changed about how careful the reviewers were. The machine simply stopped handing them the machine’s job.

The dividing line is decidability, not importance

The rule isn’t “automate the unimportant stuff.” Some automated checks — a security scanner catching a SQL injection, a type checker catching a null deref — catch things far more important than most human comments. The real dividing line is decidability: can a rule be evaluated mechanically with a stable answer? Formatting, import order, naming conventions, unused variables, lint rules, type errors, known-vulnerable dependencies, and a large class of bug patterns are all decidable — a tool gives the same verdict every time, with no context required. Everything decidable belongs in CI, running before the human ever opens the diff, and blocking merge when it fails.
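As a minimal sketch of what "decidable" means in practice, here is a naming-case check written as a pure function: the same identifiers always produce the same verdict, and no knowledge of the ticket or the design is needed. The function name `check_camel_case` and the exact regular expression are illustrative assumptions, not the rule set of any particular linter.

```python
import re

# Hypothetical rule: identifiers start with a lowercase letter and
# contain only letters and digits (no underscores).
CAMEL_CASE = re.compile(r"^[a-z][a-zA-Z0-9]*$")


def check_camel_case(identifiers):
    """Return the identifiers that violate the camelCase rule above."""
    return [name for name in identifiers if not CAMEL_CASE.match(name)]


names = ["pageSize", "user_id", "retryCount", "TotalAmount"]
print(check_camel_case(names))  # ['user_id', 'TotalAmount']
```

Nothing in the function depends on who wrote the code or why, which is what makes it safe to hand to a machine. A question like "is pagination the right place for this logic?" cannot be written this way.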

What’s left for the human is the undecidable residue, which is precisely what review is for: Is this the right design? Does it actually do what the ticket intended? Will the next engineer understand it in six months? Is the error path correct, the money path exact, the auth boundary sound? Google’s code review guidance (“What to look for in a code review”) puts design first, calling it “the most important thing to cover,” because design choices compound and are expensive to undo, while a misplaced brace costs nothing and a formatter fixes it for free. Every minute a senior spends on a decidable check is a minute stolen from the undecidable ones only they can do, and it’s a recurring theft: the same nit, every PR, forever, until you automate it once.

Gate	Catches	Decidable?	When it runs
Formatter (Prettier, gofmt)	Layout, spacing, quotes	Yes — auto-fixes	Pre-commit + CI block
Linter / type checker	Unused vars, null derefs, lint rules	Yes	CI block, before review
SAST / dependency scan	Injection, known CVEs, secrets	Mostly — needs tuning	CI block, before review
Human reviewer	Design, intent, maintainability, correctness	No — needs judgement	After gates are green

Shift left: the gate runs before the human, and it must be cheap to obey

Sequencing is half the value. A linter that only runs after a human approves is theatre — the nit was already typed. The gate must run before the review request, ideally also pre-commit on the author’s machine, so the author fixes mechanical issues themselves and the reviewer receives a diff where every decidable check is already green. This is the same shift-left logic as trunk-based CI: catch the cheap class of problem at the earliest, cheapest point, and free the expensive resource (human judgement, end-to-end testing) for the problems that can only be caught late. The author-preparation effect from the SmartBear study reinforces it — a clean, pre-checked diff is one the author has already had to look at carefully, which itself drives defect density down.
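The sequencing rule can be sketched as a small gate that runs every decidable check first and only then decides whether a review request should fire. This is an illustration under stated assumptions: `run_gate`, `should_request_review`, and `failing_checks` are hypothetical names, and each check is modelled as a plain callable returning True or False rather than a real formatter or linter invocation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    passed: bool


def run_gate(checks: Dict[str, Callable[[], bool]]) -> List[CheckResult]:
    """Run every check and record its verdict (no short-circuit, so the
    author sees all mechanical failures in one pass)."""
    return [CheckResult(name, bool(check())) for name, check in checks.items()]


def should_request_review(results: List[CheckResult]) -> bool:
    """The review request fires only when every decidable check is green."""
    return all(result.passed for result in results)


def failing_checks(results: List[CheckResult]) -> List[str]:
    """Names of the checks the author still has to fix."""
    return [result.name for result in results if not result.passed]


# Stand-in checks; a real gate would shell out to the actual tools.
example = run_gate({"format": lambda: True, "lint": lambda: True, "types": lambda: False})
print(should_request_review(example))  # False
print(failing_checks(example))         # ['types']
```

Running all checks instead of stopping at the first failure is a design choice: it keeps the gate cheap to obey, because the author gets one complete list rather than a new failure on every push.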

But automation has a failure mode that will quietly destroy it: false positives. A static-analysis tool that flags ten non-issues for every real one trains engineers to ignore it, and then it catches nothing because everyone reflexively dismisses its output — the same alert-fatigue dynamic as a pager that cries wolf. Google’s Tricorder program treated this as the central design constraint, holding analyzers to a strict effective-false-positive rate under 10% and giving developers a one-click “not useful” feedback channel to kill noisy checks. The lesson is that “turn on more static analysis” is not free: a noisy analyzer is worse than none, because it spends trust you can’t easily rebuild. Tune for signal, make the fix obvious, and let developers prune checks that don’t earn their place.
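One rough way to act on that constraint is to track, per analyzer, how often developers mark its findings "not useful" and to flag any analyzer at or above the 10% line for pruning or retuning. The sketch below assumes a simple ratio of "not useful" clicks to findings shown; that ratio, the `AnalyzerFeedback` record, and the analyzer names are illustrative assumptions, not Tricorder's actual measurement or data.

```python
from dataclasses import dataclass
from typing import List

# Threshold taken from the text: analyzers should stay under 10%.
MAX_EFFECTIVE_FP_RATE = 0.10


@dataclass
class AnalyzerFeedback:
    name: str
    findings_shown: int
    not_useful_clicks: int


def effective_fp_rate(feedback: AnalyzerFeedback) -> float:
    """Approximate the effective false-positive rate as the share of
    findings that developers marked 'not useful' (an assumed proxy)."""
    if feedback.findings_shown == 0:
        return 0.0
    return feedback.not_useful_clicks / feedback.findings_shown


def analyzers_to_prune(feedback: List[AnalyzerFeedback],
                       threshold: float = MAX_EFFECTIVE_FP_RATE) -> List[str]:
    """Return analyzers whose rate is not strictly under the threshold."""
    return [fb.name for fb in feedback if effective_fp_rate(fb) >= threshold]


# Made-up numbers, for illustration only.
sample = [
    AnalyzerFeedback("unused-import", findings_shown=200, not_useful_clicks=6),
    AnalyzerFeedback("magic-number", findings_shown=150, not_useful_clicks=45),
]
print(analyzers_to_prune(sample))  # ['magic-number']
```

The point of computing this per analyzer rather than for the whole toolchain is that one noisy check can be removed without discarding the trusted ones.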

Why this works

The deeper reason to automate the mechanical isn’t reviewer time — it’s calibration of the whole team’s model of what review means. When a junior’s first PRs come back full of spacing nits, they learn that review is about surface, and they review others the same way, and the culture ossifies around trivia while design flaws sail through. When the machine owns all of that and every human comment is about behavior or design, the team learns that review is for the things that matter. Automation doesn’t just save attention; it teaches everyone where attention is supposed to go.

Automation redirects review, it doesn’t replace it

The seductive overreach is to conclude that if tools catch bugs, tools can replace review. They can’t — they shift the boundary, they don’t erase it. A linter has no concept of whether a feature solves the user’s actual problem; a SAST tool can’t tell you the retry path isn’t idempotent; an AI reviewer can summarize a diff but still can’t certify that the design will survive next quarter’s requirements. What automation does is redirect the human: by clearing the decidable layer, it concentrates scarce judgement on the undecidable layer where the expensive defects live. A team that wires up formatters, linters, type checks, and tuned SAST as a blocking pre-review gate isn’t reviewing less — it’s reviewing only the things worth a human, which is the entire point of the practice.

The end state is a tiered pipeline. Tier 0 is the author’s own pre-commit checks and annotated diff. Tier 1 is the blocking CI gate — format, lint, types, tests, dependency and security scans — that a PR must pass before a human is even asked. Tier 2 is human review, now spending its full, undiluted budget on design, intent, correctness, and knowledge transfer. Each tier catches what’s cheapest to catch at that layer and passes the irreducible residue up. The single highest-leverage move most teams haven’t fully made is simply ensuring no human ever types a comment a machine could have made.
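The tiering described above can be sketched as a lookup that sends each concern to the earliest tier able to decide it and leaves everything else for the human. The tier names, the `CONCERN_TIERS` mapping, and the concern labels are hypothetical; a real team would draw the mapping from its own toolchain.

```python
from enum import IntEnum
from typing import Dict, List


class Tier(IntEnum):
    PRE_COMMIT = 0    # author's own local checks
    CI_GATE = 1       # blocking gate before the review request
    HUMAN_REVIEW = 2  # judgement: design, intent, correctness


# Assumed mapping of decidable concerns to the cheapest tier that catches them.
CONCERN_TIERS: Dict[str, Tier] = {
    "formatting": Tier.PRE_COMMIT,
    "lint": Tier.PRE_COMMIT,
    "types": Tier.CI_GATE,
    "tests": Tier.CI_GATE,
    "dependency_scan": Tier.CI_GATE,
    "security_scan": Tier.CI_GATE,
}


def tier_for(concern: str) -> Tier:
    """Anything without a mechanical owner falls through to human review."""
    return CONCERN_TIERS.get(concern, Tier.HUMAN_REVIEW)


def residue_for_human(concerns: List[str]) -> List[str]:
    """The concerns that reach the reviewer once earlier tiers have run."""
    return [c for c in concerns if tier_for(c) == Tier.HUMAN_REVIEW]


raised = ["formatting", "design", "types", "retry_idempotency"]
print(residue_for_human(raised))  # ['design', 'retry_idempotency']
```

Defaulting unknown concerns to human review is deliberate in this sketch: a concern only leaves the reviewer's plate once someone has shown a tool can decide it.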

Pick the best fit

Your PRs are dominated by style and formatting comments, and seniors are burning hours on nits. What's the right structural fix?

Quiz

What's the actual dividing line for what to automate versus leave to a human reviewer?

Quiz

Why did Google's Tricorder hold analyzers to an effective-false-positive rate under 10%?

Order the steps

Order a tiered pipeline so human attention only ever sees the undecidable layer:

1 Author runs pre-commit format/lint locally and annotates the diff
2 Blocking CI gate runs format, lint, types, tests, tuned SAST — must be green
3 Review request fires only after every decidable check passes
4 Human spends full budget on design, intent, correctness, maintainability
5 Noisy checks that draw 'not useful' feedback are pruned to protect trust

Each gate catches what's cheapest at that layer. The human never sees a decidable issue — full budget goes to design and correctness.

Recall before you leave

01
A teammate says 'automate the unimportant checks, humans do the important ones.' Why is 'importance' the wrong dividing line, and what's the right one?
02
What are the two failure modes to avoid when leaning on automated gates, and what does the tiered end state look like?

Recap

The reason to automate the mechanical parts of review isn’t merely to save reviewer time; it’s to put the right class of problem at the right layer and to teach the whole team where attention belongs. The dividing line is decidability, not importance: any rule a machine can evaluate with a stable, context-free answer (formatting, import order, naming, unused variables, lint, types, known-vulnerable dependencies, and a wide band of bug patterns) belongs in a blocking CI gate, ideally mirrored pre-commit, that runs before the review request so the author clears it and the reviewer never sees it. That leaves the human with the undecidable residue that review actually exists for: is the design right, does it match intent, will it be maintainable, are the error, money, and auth paths correct. Sequencing matters, because a gate that runs after approval is theatre, and so does signal: an untuned analyzer that cries wolf gets ignored and protects nothing, which is why Google’s Tricorder held analyzers to an effective-false-positive rate under 10% and gave developers a one-click “not useful” channel to prune noisy checks. The seductive overreach is to think tools can replace review; they only redirect it, concentrating scarce judgement where the expensive defects hide. The mature end state is tiered (author pre-checks, then a blocking CI gate, then human review on design and correctness alone), and the single highest-leverage rule is that no human should ever type a comment a machine could have made.

Connected lessons

builds on

Giving and receiving review without the friction (middle)

unlocks

Review anti-patterns and scaling review across an org (senior)

deepens into

Review anti-patterns and scaling review across an org (senior)
