Engineering Practice ENG · 03 · 10

Code review: redesign a review process

Hands-on project — redesign a real repo's code-review process: add a blocking pre-review gate, a small-PR norm, a severity-tagged comment convention, and latency/outcome metrics, then prove the change with before/after numbers.

ENG Senior ◷ 240 min

Reading about review is not the same as fixing a review process that’s leaking defects and stalling PRs. Take a real repository — your team’s, an open-source project, or a deliberately messy one you set up — diagnose where its review pipeline puts human attention in the wrong place, and re-engineer it using the unit’s full model, proving each change moves a metric.

Goal

Turn the unit’s mental model into a working review pipeline: automate the decidable into a blocking pre-review gate, make PRs small, make feedback triageable, route and time-box the queue, and verify the redesign with before/after latency and quality numbers — not opinions.
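A blocking pre-review gate can be as small as one function that decides whether a PR is allowed to reach a human. The sketch below is one possible shape, not a prescribed implementation: `PullRequest`, `gate_failures`, `ready_for_human_review`, the check names, and the 400-line ceiling are all illustrative assumptions. In a real repository the check results would come from your CI system.

```python
from dataclasses import dataclass, field

# Assumed ceiling for a single PR; tune it to the size norm your team adopts.
MAX_CHANGED_LINES = 400


@dataclass
class PullRequest:
    """Minimal stand-in for PR data; the field names are hypothetical."""
    number: int
    changed_lines: int
    # Results of decidable checks, e.g. {"format": True, "lint": False}.
    checks: dict[str, bool] = field(default_factory=dict)


def gate_failures(pr: PullRequest) -> list[str]:
    """Return the reasons a PR is blocked; an empty list means it may proceed."""
    failures = [
        f"check failed: {name}"
        for name, passed in sorted(pr.checks.items())
        if not passed
    ]
    if pr.changed_lines > MAX_CHANGED_LINES:
        failures.append(
            f"too large: {pr.changed_lines} changed lines (limit {MAX_CHANGED_LINES})"
        )
    return failures


def ready_for_human_review(pr: PullRequest) -> bool:
    """A PR reaches a reviewer only when every decidable check has passed."""
    return not gate_failures(pr)
```

The gate reports every failure at once instead of stopping at the first, so the author can fix them all in a single push before asking for a reviewer's time.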

Project

Objective

Take a repository with a real or simulated review process and redesign it so human attention lands only on the undecidable layer — measuring that pickup latency drops and the mechanical-comment share falls, without lowering defect detection.

Requirements

Acceptance criteria

A before/after table: median time-to-first-review, median time-to-merge, and the percentage of review comments that are mechanical vs substantive — measured from PR data on both runs, not estimated.
Evidence that the gate works: a screenshot or log of a PR blocked by the gate before any human reviewed it, plus a measured drop in the mechanical-comment share (humans no longer typing comments a machine could have made).
At least one genuinely large change shipped as a reviewable stacked-diff chain, with each PR in the 200–400 LOC band and a note on the seams chosen.
A one-page write-up naming, for each change (gate, size norm, comment convention, routing/SLA, review shape), which lesson principle it applies and which metric it moved — and an honest note on any metric that did NOT improve and why.
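The three numbers in the before/after table can be computed from exported PR records with a few lines of code. This is a minimal sketch under stated assumptions: the `PRRecord` fields and the `summarize` helper are hypothetical, and the mechanical vs substantive counts are assumed to come from hand-labelling each review comment, since no tool can classify them reliably for you.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class PRRecord:
    """One merged PR; the field names are illustrative, not a real API."""
    opened: datetime
    first_review: datetime
    merged: datetime
    mechanical_comments: int   # hand-labelled: a tool could have made these
    substantive_comments: int  # hand-labelled: needed human judgement


def hours_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def summarize(prs: list[PRRecord]) -> dict[str, float]:
    """Compute one column of the before/after table for a set of PRs."""
    mechanical = sum(p.mechanical_comments for p in prs)
    total = mechanical + sum(p.substantive_comments for p in prs)
    return {
        "median_hours_to_first_review": median(
            hours_between(p.opened, p.first_review) for p in prs
        ),
        "median_hours_to_merge": median(
            hours_between(p.opened, p.merged) for p in prs
        ),
        # Guard against a run with no review comments at all.
        "mechanical_comment_pct": 100 * mechanical / total if total else 0.0,
    }
```

Run `summarize` once on the baseline PRs and once on the PRs merged after the redesign, then put the two dictionaries side by side. Medians are used instead of means so that one PR that sat for weeks does not dominate the comparison.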

Senior stretch

Add an outcome metric to guard against gaming: track escaped defects (post-merge bugs or reverts per PR) across both runs to confirm the latency win didn't come from skimming — measuring outcomes, not activity, the way the anti-patterns lesson demands.
Build a review-load dashboard: pending reviews per person and per-PR age, so the org can rebalance before anyone becomes the bottleneck reviewer from the scaling lesson.
Pilot post-commit (ship-then-review) on one explicitly low-stakes, well-tested path among trusted committers, with feature flags and fast rollback, and document the trust/automation preconditions that made it safe there but wrong for a money or auth path.
Run a calibration exercise: have two reviewers independently review the same medium PR and diff their comments, then reconcile the severity labels — surfacing where the team's model of 'blocking vs nit' diverges and tightening the convention.
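For the calibration exercise, comparing the two reviewers' comments is easier when severity is machine-readable. The sketch below assumes a convention where each comment body starts with a `blocking:`, `suggestion:`, or `nit:` prefix. That label set and the helper names are illustrative assumptions, so substitute whatever tags your team agrees on. It also assumes at most one comment per file and line from each reviewer.

```python
import re

# Assumed convention: a comment body starts with "<severity>:".
SEVERITY_PREFIX = re.compile(r"^\s*(blocking|suggestion|nit)\s*:", re.IGNORECASE)


def severity(body: str) -> str:
    """Extract the severity tag from a comment body, or 'untagged'."""
    match = SEVERITY_PREFIX.match(body)
    return match.group(1).lower() if match else "untagged"


def label_map(comments):
    """Map (path, line) to severity for an iterable of (path, line, body)."""
    return {(path, line): severity(body) for path, line, body in comments}


def calibration_diff(reviewer_a, reviewer_b):
    """Compare two independent reviews of the same PR.

    Returns (disagreements, only_a, only_b):
    - disagreements: locations both reviewers flagged with different severity
    - only_a / only_b: locations flagged by just one reviewer
    """
    a, b = label_map(reviewer_a), label_map(reviewer_b)
    disagreements = {
        loc: (a[loc], b[loc]) for loc in a.keys() & b.keys() if a[loc] != b[loc]
    }
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    return disagreements, only_a, only_b
```

The `disagreements` entries are the ones to reconcile in the calibration session: the same line judged blocking by one reviewer and a nit by the other is exactly where the team's model diverges. Untagged comments show up as `untagged`, which flags places where the convention was not followed.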

Recap

This is the loop you run when a real review process is failing. Baseline it with numbers. Push the decidable class into a blocking pre-review gate so no human types a comment a machine could have made. Keep PRs small and stack the genuinely large ones. Make feedback triageable by attaching a severity and a proposed fix to each comment. Route reviews to a team, and time-box pickup rather than completion. Choose the review shape by stakes and trust. Then prove the result with before/after latency and comment-split metrics, guarded by an escaped-defect outcome so you know the speed didn’t come from skimming. Doing this once on a real repo turns the unit’s model into something you can install on any team.
