Engineering Practice
Review anti-patterns and scaling review across an org
At a company that had grown from 30 to 200 engineers, one staff engineer owned the payments service in CODEOWNERS, so every PR touching it required his approval. He had 40-odd reviews queued daily; the team learned that the way to ship was to make him comfortable, so they kept diffs small and boring — and he, drowning, started skimming and stamping. A migration that changed the settlement currency rounding slipped through with a “LGTM, nice tests.” The tests were nice. They asserted the old behavior. Two days later finance flagged a six-figure discrepancy across one settlement batch. The review process hadn’t failed because the reviewer was bad — it failed because one human had been made the wall the whole org queued behind, and a wall under that much load stops reading.
The anti-patterns are all the same failure: review stops being about design
This arc has built one claim across four lessons: review exists for the undecidable — design, intent, correctness, knowledge transfer — and everything mechanical belongs in a gate. Every anti-pattern is a way that claim quietly inverts. Rubber-stamping — “LGTM” in eleven seconds without reading — is review-as-ritual, and it correlates directly with the large PRs from the size lesson: a 1,600-line diff exceeds working memory, so the only honest options are a free 90-minute block nobody has or a stamp, and under deadline the stamp wins. Bikeshedding (Parkinson’s law of triviality: people argue hardest about the bike-shed color because everyone has an opinion on paint, nobody on the reactor) is the opposite surface, same rot: ten comments about naming and import order while the design sails through unexamined — which is exactly the class the previous lesson said to automate away, freeing the human for the reactor instead of the shed.
Then the people-shaped ones. Power-tripping / non-actionable feedback — “this whole approach is wrong” with no alternative, or comments that score points rather than improve code — ties straight back to the giving-review lesson: a comment that states neither its severity nor its fix is a guess the author must decode, and a reviewer who uses review to assert dominance teaches everyone to ship to the reviewer’s taste rather than to the system’s needs. Underneath all of them sits the deepest one: review-as-gatekeeping versus review-as-knowledge-sharing. If review’s purpose is “a senior decides who’s allowed to merge,” it concentrates power, breeds the bottleneck, and makes every comment a verdict. If its purpose is “two engineers build shared understanding of this change,” it distributes knowledge, grows reviewers, and makes comments collaborative. Same activity, opposite cultures — and the gatekeeping framing is what produces almost every other anti-pattern.
| Anti-pattern | Root cause | Structural fix |
|---|---|---|
| Rubber-stamping (LGTM, unread) | PR too big to read; deadline pressure | Small/stacked PRs; decouple approval from speed |
| Bikeshedding / nitpicking | Trivia is easy to opine on; style not automated | Automate the mechanical; tag nits as non-blocking |
| Bottleneck reviewer | One owner gates everything; no routing | CODEOWNERS spread, round-robin, grow reviewers |
| Power-tripping / non-actionable | Review framed as verdict, not collaboration | Severity + fix in every comment; knowledge-share framing |
| Metric trap (count comments/PRs) | Goodhart: the target gets gamed | Measure outcomes (latency, escaped defects), not activity |
Scaling: route, time-box, balance — don’t add a gate, add throughput
A practice that works at 20 engineers breaks at 200 not because people get worse but because the queue changes shape. The first lever is routing: CODEOWNERS (or equivalent) maps paths to the people who actually own them, so a PR is auto-assigned to a competent reviewer instead of waiting for someone to notice it. The trap inside routing is the one in the Hook — make a single person own a hot path and you’ve built a bottleneck by construction. The fix is to spread ownership: list a team, not a person; use round-robin or load-balanced assignment so the same staff engineer doesn’t accumulate 40 reviews a day while juniors who could handle the routine 80% sit idle and never grow into reviewers. Most day-to-day PRs — bug fixes, features inside established patterns, test additions — need a competent reviewer, not the senior; reserve the bottleneck-prone expert for the architecturally significant and security-sensitive minority, and pair a junior with them on those so review skill propagates instead of concentrating.
The second lever is time-boxing pickup. Google’s guidance is to respond to a review within one business day, and the size lesson showed why the response, not the reviewing, is the dominant cost: a PR in review is blocked work. An SLA on first response (say, within 4 hours during local hours) plus a review-buddy rotation that guarantees someone is watching the queue keeps pickup latency bounded — but note the sharp edge: an SLA on completion of a large diff just pressures the reviewer to skim, trading the latency win for a rubber stamp. SLAs work on pickup, not on quality. Make load visible — a dashboard of who has how many pending reviews — and the org can rebalance before anyone drowns, which is the cheap prevention for the Hook’s failure.
Why this works
Why does scaling so reliably resurrect the rubber stamp? Because every scaling pressure pushes service time and queue length up while the human’s attention budget stays fixed at a few hundred lines. More engineers means more PRs per reviewer; a hot owned path means a deeper queue on one person; a deadline means less time per review. None of those change the cognitive limit — they just push more volume through the same fixed pipe, and the overflow valve is always the same one: read less, stamp more. That’s why the scaling answers are all about reducing what any one human must serve (route, balance, shrink, move continuous) rather than exhorting people to review harder. You cannot scale a fixed attention budget by asking it to try.
When async review isn’t the right shape at all: pairing and post-commit
For some work the right move isn’t a faster async review — it’s a different shape of review. Pair or mob programming is continuous review with zero async latency: the navigator reviews each line as it’s typed, so there’s no PR to queue, no pickup wait, and feedback bandwidth is orders of magnitude higher than comments left hours later. It’s the strongest substitute for review on genuinely hard or high-risk work — gnarly design, a critical migration, onboarding someone into a complex subsystem — because the design conversation happens before the code exists rather than after. The cost is two people’s time on one task and that it’s exhausting to sustain all day, so teams pair on the hard 20% and async-review the routine rest; pairing also doesn’t auto-produce the written audit trail an async PR leaves, which some compliance regimes require.
Post-commit (ship-then-review) review removes pickup latency from the merge path entirely: the author merges to trunk and reviewers comment afterward, with concerns addressed in follow-ups. This trades pre-merge gating for velocity, and it is only safe on a high-trust base with a strong automated floor — exhaustive CI, good tests, fast rollback, feature flags. The reason it can work is that the gate that actually protects trunk in a mature trunk-based shop is the test suite, not the human; Facebook’s predictive test selection, for instance, catches over 99.9% of regressions while running a third of the tests, which is the kind of automated confidence that makes post-commit review a reasonable risk for low-stakes changes among trusted committers. It is not a license to skip thinking — it relocates the human review to after a machine-verified merge, and it is the wrong default for security-sensitive code, untrusted contributors, or any change the test suite can’t meaningfully cover. Choose the shape by stakes and trust, not by habit.
Your org grew to 200 engineers and one staff engineer is CODEOWNER for the payments service — 40 reviews/day, and quality is now slipping into rubber stamps. What's the highest-leverage fix?
A manager starts rewarding engineers for the number of review comments they leave, to encourage thoroughness. What's the predictable result?
When is post-commit (ship-then-review) review a reasonable choice rather than a reckless one?
Order how a single owned hot path turns into a rubber-stamp incident at scale:
- 1 Org grows; one staff engineer is sole CODEOWNER for the payments path
- 2 Their queue swells to ~40 reviews/day — a fixed attention budget, way overloaded
- 3 Drowning, they start skimming and stamping 'LGTM, nice tests'
- 4 A money-path change slips through with tests that assert the old behavior
- 5 Fix the structure: spread ownership, load-balance, pair to grow reviewers
- 01Name the major code-review anti-patterns and explain why they're really one failure, tying each back to an earlier lesson in the arc.
- 02How do you scale code review across a growing org, and when do you change the shape of review rather than speed it up?
This is the synthesis of the whole code-review arc. Review exists for the undecidable — design, intent, correctness, knowledge transfer — and every anti-pattern is that purpose quietly inverting. Rubber-stamping is review-as-ritual and rides on the large PRs that overflow a fixed attention budget; bikeshedding is the same rot on the opposite surface, arguing paint while the design ships unexamined, which is exactly the mechanical class the previous lesson says to automate. Power-tripping and non-actionable feedback are the giving-review lesson failing — a comment without severity and a fix is a guess, and review used as a verdict teaches people to ship to a person’s taste. The root framing is gatekeeping versus knowledge-sharing: ‘a senior decides who may merge’ concentrates power and breeds the bottleneck, while ‘two engineers build shared understanding’ distributes it and grows reviewers. The metric trap (Goodhart) closes the set: measure comment or PR count and it gets gamed while quality slips. Scaling answers all reduce what any one human must serve, because scaling only pushes more volume through a pipe whose width is fixed. Route ownership with CODEOWNERS to a team, not a person, load-balanced, reserving the expert for the security-sensitive minority and pairing juniors in to propagate skill. Time-box pickup with an SLA on first response (one business day) and visible load dashboards — never an SLA on completion, which just buys a skim. And for the right work, change the shape: pairing is continuous review with no async latency for the hard 20%, and post-commit review trades pre-merge gating for velocity only on a high-trust base with a strong automated floor, for low-stakes changes, never for money or security paths. The thread through it all: review’s job is design, intent, and knowledge transfer; the mechanical is automated; size and latency are the levers; and at scale you route, time-box, balance, and sometimes move review continuous or post-commit — you do not scale a fixed attention budget by asking it to try harder.