Engineering Practice ENG · 03 · 05

Review anti-patterns and scaling review across an org

Rubber-stamps, bikeshedding, the bottleneck reviewer, and the metric trap all fail the same way: review stops being about design. Scaling fixes route ownership, time-box pickup, balance load, and sometimes move review continuous (pairing) or post-commit on a high-trust base.

ENG Senior ◷ 17 min

Level

FoundationsJuniorMiddleSenior

At a company that had grown from 30 to 200 engineers, one staff engineer owned the payments service in CODEOWNERS, so every PR touching it required his approval. He had 40-odd reviews queued daily; the team learned that the way to ship was to make him comfortable, so they kept diffs small and boring — and he, drowning, started skimming and stamping. A migration that changed the settlement currency rounding slipped through with a “LGTM, nice tests.” The tests were nice. They asserted the old behavior. Two days later finance flagged a six-figure discrepancy across one settlement batch. The review process hadn’t failed because the reviewer was bad — it failed because one human had been made the wall the whole org queued behind, and a wall under that much load stops reading.

The anti-patterns are all the same failure: review stops being about design

This arc has built one claim across four lessons: review exists for the undecidable — design, intent, correctness, knowledge transfer — and everything mechanical belongs in a gate. Every anti-pattern is a way that claim quietly inverts. Rubber-stamping — “LGTM” in eleven seconds without reading — is review-as-ritual, and it correlates directly with the large PRs from the size lesson: a 1,600-line diff exceeds working memory, so the only honest options are a free 90-minute block nobody has or a stamp, and under deadline the stamp wins. Bikeshedding (Parkinson’s law of triviality: people argue hardest about the bike-shed color because everyone has an opinion on paint, nobody on the reactor) is the opposite surface, same rot: ten comments about naming and import order while the design sails through unexamined — which is exactly the class the previous lesson said to automate away, freeing the human for the reactor instead of the shed.

Then the people-shaped ones. Power-tripping / non-actionable feedback — “this whole approach is wrong” with no alternative, or comments that score points rather than improve code — ties straight back to the giving-review lesson: a comment that states neither its severity nor its fix is a guess the author must decode, and a reviewer who uses review to assert dominance teaches everyone to ship to the reviewer’s taste rather than to the system’s needs. Underneath all of them sits the deepest one: review-as-gatekeeping versus review-as-knowledge-sharing. If review’s purpose is “a senior decides who’s allowed to merge,” it concentrates power, breeds the bottleneck, and makes every comment a verdict. If its purpose is “two engineers build shared understanding of this change,” it distributes knowledge, grows reviewers, and makes comments collaborative. Same activity, opposite cultures — and the gatekeeping framing is what produces almost every other anti-pattern.

Anti-pattern	Root cause	Structural fix
Rubber-stamping (LGTM, unread)	PR too big to read; deadline pressure	Small/stacked PRs; decouple approval from speed
Bikeshedding / nitpicking	Trivia is easy to opine on; style not automated	Automate the mechanical; tag nits as non-blocking
Bottleneck reviewer	One owner gates everything; no routing	CODEOWNERS spread, round-robin, grow reviewers
Power-tripping / non-actionable	Review framed as verdict, not collaboration	Severity + fix in every comment; knowledge-share framing
Metric trap (count comments/PRs)	Goodhart: the target gets gamed	Measure outcomes (latency, escaped defects), not activity

Scaling: route, time-box, balance — don’t add a gate, add throughput

When you join a team that has grown past ~50 engineers, the review problems you’ll encounter won’t look like laziness — they’ll look like structure. A practice that works at 20 engineers breaks at 200 not because people get worse but because the queue changes shape. The first lever is routing: CODEOWNERS (or equivalent) maps paths to the people who actually own them, so a PR is auto-assigned to a competent reviewer instead of waiting for someone to notice it. The trap inside routing is the one in the Hook — make a single person own a hot path and you’ve built a bottleneck by construction. The fix is to spread ownership: list a team, not a person; use round-robin or load-balanced assignment so the same staff engineer doesn’t accumulate 40 reviews a day while juniors who could handle the routine 80% sit idle and never grow into reviewers. Most day-to-day PRs — bug fixes, features inside established patterns, test additions — need a competent reviewer, not the senior; reserve the bottleneck-prone expert for the architecturally significant and security-sensitive minority, and pair a junior with them on those so review skill propagates instead of concentrating.

The second lever is time-boxing pickup. Google’s guidance is to respond to a review within one business day, and the size lesson showed why the response, not the reviewing, is the dominant cost: a PR in review is blocked work. An SLA on first response (say, within 4 hours during local hours) plus a review-buddy rotation that guarantees someone is watching the queue keeps pickup latency bounded — but note the sharp edge: an SLA on completion of a large diff just pressures the reviewer to skim, trading the latency win for a rubber stamp. SLAs work on pickup, not on quality. Make load visible — a dashboard of who has how many pending reviews — and the org can rebalance before anyone drowns, which is the cheap prevention for the Hook’s failure.

▸Why this works

Why does scaling so reliably resurrect the rubber stamp? Because every scaling pressure pushes service time and queue length up while the human’s attention budget stays fixed at a few hundred lines. More engineers means more PRs per reviewer; a hot owned path means a deeper queue on one person; a deadline means less time per review. None of those change the cognitive limit — they just push more volume through the same fixed pipe, and the overflow valve is always the same one: read less, stamp more. That’s why the scaling answers are all about reducing what any one human must serve (route, balance, shrink, move continuous) rather than exhorting people to review harder. You cannot scale a fixed attention budget by asking it to try.

When async review isn’t the right shape at all: pairing and post-commit

For some work the right move isn’t a faster async review — it’s a different shape of review. Pair or mob programming is continuous review with zero async latency: the navigator reviews each line as it’s typed, so there’s no PR to queue, no pickup wait, and feedback bandwidth is orders of magnitude higher than comments left hours later. It’s the strongest substitute for review on genuinely hard or high-risk work — gnarly design, a critical migration, onboarding someone into a complex subsystem — because the design conversation happens before the code exists rather than after. The cost is two people’s time on one task and that it’s exhausting to sustain all day, so teams pair on the hard 20% and async-review the routine rest; pairing also doesn’t auto-produce the written audit trail an async PR leaves, which some compliance regimes require.

Post-commit (ship-then-review) review removes pickup latency from the merge path entirely: the author merges to trunk and reviewers comment afterward, with concerns addressed in follow-ups. This trades pre-merge gating for velocity, and it is only safe on a high-trust base with a strong automated floor — exhaustive CI, good tests, fast rollback, feature flags. The reason it can work is that the gate that actually protects trunk in a mature trunk-based shop is the test suite, not the human; Facebook’s predictive test selection, for instance, catches over 99.9% of regressions while running a third of the tests, which is the kind of automated confidence that makes post-commit review a reasonable risk for low-stakes changes among trusted committers. It is not a license to skip thinking — it relocates the human review to after a machine-verified merge, and it is the wrong default for security-sensitive code, untrusted contributors, or any change the test suite can’t meaningfully cover. Choose the shape by stakes and trust, not by habit.

Pick the best fit

Your org grew to 200 engineers and one staff engineer is CODEOWNER for the payments service — 40 reviews/day, and quality is now slipping into rubber stamps. What's the highest-leverage fix?

Quiz

A manager starts rewarding engineers for the number of review comments they leave, to encourage thoroughness. What's the predictable result?

Quiz

When is post-commit (ship-then-review) review a reasonable choice rather than a reckless one?

Order the steps

Order how a single owned hot path turns into a rubber-stamp incident at scale:

1 Org grows; one staff engineer is sole CODEOWNER for the payments path
2 Their queue swells to ~40 reviews/day — a fixed attention budget, way overloaded
3 Drowning, they start skimming and stamping 'LGTM, nice tests'
4 A money-path change slips through with tests that assert the old behavior
5 Fix the structure: spread ownership, load-balance, pair to grow reviewers

Rubber stamp PR too big → skim + LGTM → design ships unread

Bikeshedding style not automated → trivia dominates → design ignored

Bottleneck reviewer single CODEOWNER → 40/day queue → stamps to survive

Power-tripping verdict framing → no severity/fix → round-trips

Metric trap count comments → Goodhart → nits up, design still slips

All five share the same root: review stops being about design. Fix: small PRs, automate style, spread ownership, tag comments, measure escaped defects.

Recall before you leave

01
Name the major code-review anti-patterns and explain why they're really one failure, tying each back to an earlier lesson in the arc.
02
How do you scale code review across a growing org, and when do you change the shape of review rather than speed it up?

Recap

This is the synthesis of the whole code-review arc. Review exists for the undecidable — design, intent, correctness, knowledge transfer — and every anti-pattern is that purpose quietly inverting. Rubber-stamping is review-as-ritual and rides on the large PRs that overflow a fixed attention budget; bikeshedding is the same rot on the opposite surface, arguing paint while the design ships unexamined, which is exactly the mechanical class the previous lesson says to automate. Power-tripping and non-actionable feedback are the giving-review lesson failing — a comment without severity and a fix is a guess, and review used as a verdict teaches people to ship to a person’s taste. The root framing is gatekeeping versus knowledge-sharing: ‘a senior decides who may merge’ concentrates power and breeds the bottleneck, while ‘two engineers build shared understanding’ distributes it and grows reviewers. The metric trap (Goodhart) closes the set: measure comment or PR count and it gets gamed while quality slips. Scaling answers all reduce what any one human must serve, because scaling only pushes more volume through a pipe whose width is fixed. Route ownership with CODEOWNERS to a team, not a person, load-balanced, reserving the expert for the security-sensitive minority and pairing juniors in to propagate skill. Time-box pickup with an SLA on first response (one business day) and visible load dashboards — never an SLA on completion, which just buys a skim. And for the right work, change the shape: pairing is continuous review with no async latency for the hard 20%, and post-commit review trades pre-merge gating for velocity only on a high-trust base with a strong automated floor, for low-stakes changes, never for money or security paths. The thread through it all: review’s job is design, intent, and knowledge transfer; the mechanical is automated; size and latency are the levers; and at scale you route, time-box, balance, and sometimes move review continuous or post-commit — you do not scale a fixed attention budget by asking it to try harder. Now when you see a single reviewer drowning under 40 daily requests, or a team measuring quality by comment count, you’ll recognize the structural failure underneath — and know which lever to reach for.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 5 done

Connected lessons

builds on

Automate the mechanical, reserve humans for judgementsenior

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.