Engineering Practice ENG · 03 · 02

PR size sets review latency and detection at once

Big PRs don't get more scrutiny; they get less, because attention is a fixed budget. Detection peaks at 200–400 LOC and collapses past it, while large diffs sit longest waiting for a free hour. Small PRs are the one lever that cuts latency and raises quality together.

ENG Middle ◷ 16 min


Priya opens two PRs the same morning. The first is a 40-line bug fix; Sam picks it up between meetings, reads every line, leaves one real question, approves — done before lunch. The second is a 1,600-line “refactor + new export pipeline” she’d been polishing on a branch for nine days. It sits. Nobody has a free 90-minute block, so it waits two days for a reviewer, then gets eleven comments — ten about naming, one about a log line — and an LGTM. The pipeline’s batching logic, the actual risk, is never discussed. Three weeks later it silently drops every event that arrives during a retry. The small PR got more real review than the big one, in a tenth of the time.

Detection peaks in a narrow band, then falls off a cliff

The SmartBear study at Cisco (2,500 reviews over 3.2 million lines, still the largest of its kind) found the sweet spot is 200–400 lines of code per review, examined over no more than 60–90 minutes, for a defect yield of 70–90%: of ten defects present, you’d find seven to nine. Two thresholds bound that band. Past ~400 lines, detection drops sharply because the reviewer can no longer hold the change in working memory. Pace matters independently: reviewers who went slower than ~400 LOC/hour were above average at finding defects, but past ~450–500 LOC/hour, defect density came in below average in 87% of cases. Detection also collapses after 60–90 minutes of continuous reviewing, because concentration is a depleting resource, not a constant.

The non-intuitive consequence: a big PR doesn’t earn proportionally more scrutiny for being big. Attention doesn’t scale with diff size; the human’s cognitive budget is fixed at a few hundred lines. So the 1,600-line PR gets less effective review than the 160-line one, not more. This is the mechanism behind the rubber stamp — the reviewer comments on what’s locally visible (a name, a log line) and approves, because that’s the only thing that fits the budget. Real design flaws hide in giant diffs precisely because giant diffs are the ones nobody can fully read.

PR size	What the reviewer actually does	Detection	Typical pickup
`< 200 LOC`	Reads fully in a 5-min gap; real questions	High	Minutes
`200–400 LOC`	Sweet spot: 60–90 min, full attention	70–90% yield	~1 hour
`400–800 LOC`	Attention thins; skimming begins	Falling	Hours
`> 800 LOC`	Rubber stamp: nits + LGTM, design unread	Collapses	Days (waits for a free block)
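The bands in the table can be written down as a small check. The following is a minimal sketch, assuming the thresholds quoted in this lesson; the function names and band labels are hypothetical and not taken from any review tool.

```python
# Thresholds come from the figures quoted in this lesson.
# The names below are illustrative assumptions, not a real tool's API.
SWEET_SPOT_MAX_LOC = 400
THINNING_MAX_LOC = 800
MAX_PACE_LOC_PER_HOUR = 400   # pace at or below which detection stayed above average
MAX_SESSION_MINUTES = 90      # upper end of the 60-90 minute window


def size_band(loc: int) -> str:
    """Map a diff size to the table row it falls in."""
    if loc < 200:
        return "small"
    if loc <= SWEET_SPOT_MAX_LOC:
        return "sweet spot"
    if loc <= THINNING_MAX_LOC:
        return "thinning"
    return "rubber-stamp risk"


def minutes_needed(loc: int, pace: int = MAX_PACE_LOC_PER_HOUR) -> float:
    """Minutes needed to read `loc` lines at `pace` LOC per hour."""
    return loc / pace * 60


def fits_one_session(loc: int) -> bool:
    """True if the diff can be read at a careful pace within one session."""
    return minutes_needed(loc) <= MAX_SESSION_MINUTES
```

Under these assumptions, Priya's 1,600-line PR needs `minutes_needed(1600)`, which is 240 minutes at a careful pace. That is far more than one 90-minute session, so the only way to "finish" it in one sitting is to read faster than the pace at which detection holds up.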

Latency is a queueing problem, and big PRs starve at the front

A PR in review is blocked work — the author can’t merge, often can’t cleanly start the next thing, and every hour it sits is frozen progress. The dominant component of turnaround usually isn’t the reviewing itself; it’s the pickup latency, the gap before anyone looks. And pickup is a queueing phenomenon driven by task size. Google’s own data is the cleanest illustration: developers wait a median under one hour for small changes but around five hours for very large ones, because a reviewer can find a five-minute slot to read a small CL between meetings, but a 1,500-line change needs a contiguous block that almost never appears on a busy calendar. The big PR doesn’t just review worse — it queues longer, because its service time exceeds the gaps in everyone’s day.

This is why “review faster” is almost always the wrong instruction. You can’t will a free 90-minute block into existence, and pressuring reviewers to make one means they rush and skim — trading the latency win for a detection loss. The variable you actually control is the service time: a 200-line PR fits the slots that already exist. Shrinking PR size attacks both halves of turnaround — shorter queue and shorter review — without asking anyone to work faster or skip steps.
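The service-time argument can be made concrete with a toy calendar. This is a sketch only: the gap list is invented for illustration, and `first_fitting_gap` is a hypothetical helper, not a model of any real scheduler.

```python
from typing import Optional


def first_fitting_gap(
    review_minutes: float,
    gaps: list[tuple[float, float]],
) -> Optional[float]:
    """Return hours until the first free gap long enough for the review.

    `gaps` holds (hours_from_now, length_in_minutes) pairs sorted by start.
    Returns None when no gap on the calendar is long enough.
    """
    for starts_in_hours, length_minutes in gaps:
        if length_minutes >= review_minutes:
            return starts_in_hours
    return None


# Hypothetical reviewer calendar: three short gaps today, one long block tomorrow.
calendar = [(0.5, 10), (2.0, 15), (4.5, 30), (26.0, 120)]

small_pr_pickup = first_fitting_gap(5, calendar)    # 0.5 hours: fits the first gap
large_pr_pickup = first_fitting_gap(90, calendar)   # 26.0 hours: waits for the long block
```

Nothing about the reviewer changed between the two calls. Only the service time did, and that alone moved pickup from half an hour to the next day.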

▸Why this works

The deep point is that PR size couples two things people usually treat as a tradeoff. “Thorough but slow” versus “fast but shallow” feels like a dial you have to position. It isn’t — for PR size, both move the same way. A smaller diff is reviewed faster (it fits the gaps in a calendar) and more thoroughly (it fits working memory). The only real cost is the author’s discipline to decompose the work, which is exactly the skill trunk-based development demands too. Quality and flow are the same lever here, not opposite ends of one.

Decompose the batch: stacking, not big-banging

The objection is real: some changes are genuinely large. A migration plus the code that uses it, a refactor that touches forty call sites — you can’t always ship 200 lines. The answer is to decompose the batch, not to give up and merge a monolith. Split along seams that each stand alone and improve code health independently: land the pure refactor first (behavior-preserving, easy to verify), then the new feature on top of the clean base, then the wiring. Stacked diffs make this practical — a chain of small, dependent PRs each reviewed in its own readable window, so the reviewer never faces more than a few hundred lines at once even though the whole change is large.
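One way to plan such a stack is to group the ordered pieces of work into consecutive PRs under a line budget. The sketch below is a simple greedy grouping, assuming the work has already been cut along standalone seams; the change names, sizes, and the `plan_stack` helper are all hypothetical.

```python
def plan_stack(
    changes: list[tuple[str, int]],
    budget: int = 400,
) -> list[list[tuple[str, int]]]:
    """Group ordered changes into consecutive PRs of at most `budget` lines.

    Order is preserved because each PR in a stack depends on the ones below it.
    A single change larger than the budget gets a PR to itself, so it stays
    visible as something that still needs splitting by hand.
    """
    stack: list[list[tuple[str, int]]] = []
    current: list[tuple[str, int]] = []
    current_size = 0
    for name, loc in changes:
        # Close the current PR if adding this change would exceed the budget.
        if current and current_size + loc > budget:
            stack.append(current)
            current, current_size = [], 0
        current.append((name, loc))
        current_size += loc
    if current:
        stack.append(current)
    return stack


# Hypothetical breakdown of a 1,600-line change, refactor first, wiring later.
work = [
    ("rename helpers", 180),
    ("extract batcher", 210),
    ("export pipeline core", 390),
    ("retry handling", 240),
    ("wiring", 120),
    ("migration", 460),
]
prs = plan_stack(work)
sizes = [sum(loc for _, loc in pr) for pr in prs]  # [390, 390, 360, 460]
```

The last PR still exceeds the budget, which is the point of surfacing it: the grouping cannot split a piece that was never cut along a seam, so the 460-line migration is the author's next decomposition job.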

This is the same batch-size principle from lean that drives trunk-based development: small batches surface problems early and cheaply; large batches hide them until they detonate together. A 1,600-line PR is one enormous batch that can’t be partially approved — it’s all-or-nothing, and “all” means rubber-stamped. Four 400-line PRs are four small batches, each of which gets real attention, fast pickup, and the option to be rejected or fixed in isolation. The author pays a little decomposition cost up front; the team avoids the much larger cost of an unreadable diff shipping an unreviewed flaw.
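A batch is easier to keep small if its size is measured before review starts. The sketch below sums the counts from `git diff --numstat` output, which prints added lines, deleted lines, and path separated by tabs, with `-` in both count columns for binary files. Counting added plus deleted lines as the PR's size and using 400 as the limit are assumptions for illustration; the file paths are made up.

```python
def changed_lines(numstat: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output.

    Each line has three tab-separated fields: added, deleted, path.
    Binary files report "-" for both counts and are skipped.
    """
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # blank or malformed line
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary file: no line counts
        total += int(added) + int(deleted)
    return total


def should_split(numstat: str, limit: int = 400) -> bool:
    """True when the diff is larger than the assumed review budget."""
    return changed_lines(numstat) > limit


# Hypothetical numstat output for a branch.
sample = (
    "120\t30\tsrc/export/batcher.py\n"
    "300\t45\tsrc/export/pipeline.py\n"
    "-\t-\tdocs/diagram.png\n"
)
total = changed_lines(sample)  # 495
```

Here `should_split(sample)` is true because 495 changed lines exceed the 400-line limit. Whether such a check blocks the merge or only prints a warning is a team decision this sketch does not make.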

Pick the best fit

Your team's median PR pickup time is creeping toward a day, and post-merge defects are rising. The diffs are getting larger. What's the highest-leverage fix?

Quiz

Why does a 1,600-line PR typically get LESS effective review than a 160-line one?

Quiz

Per Google's data, why do small changes get picked up in under an hour while large ones wait ~5 hours?

Order the steps

Order how a too-large PR turns into a rubber stamp:

1 Author batches a feature + refactor + migration into one 1,600-line PR
2 No reviewer has a free 90-min block, so it queues for two days
3 Reviewer finally opens it; the diff exceeds working memory
4 They comment on locally visible trivia (names, a log line) and approve
5 The real design flaw ships unreviewed and surfaces weeks later

Small PRs fit calendar gaps and working memory — fast pickup and high detection move together. Large PRs exceed both limits: they queue for days and get rubber-stamped.

Recall before you leave

01
A teammate argues 'a bigger PR gives the reviewer more context, so it should catch more bugs.' Explain why the data says the opposite.
02
Why is 'review faster' usually the wrong instruction, and what variable should you change instead?

Recap

PR size is the master variable of code review because it sets latency and detection at the same time, and in the same direction. Detection peaks at 200–400 LOC reviewed over 60–90 minutes — a 70–90% yield — and falls off a cliff past ~400 lines and past ~450–500 LOC/hour, because a reviewer’s attention is a fixed cognitive budget that doesn’t scale with diff size; a 1,600-line PR therefore gets less real scrutiny than a 160-line one, and the rubber stamp (nits plus LGTM) is the predictable result. Latency tells the same story from the queue’s side: pickup dominates turnaround and scales with size, so Google sees small CLs picked up in under an hour and very large ones waiting around five, because a small diff fits a five-minute gap while a big one needs a block nobody has. That’s why “review faster” and SLAs backfire — they pressure the symptom and force skimming. The real lever is service time: keep PRs small, and decompose genuinely large work into stacked diffs so even a big change is reviewed as a sequence of readable windows. Small PRs cut latency and raise quality together; the only cost is the author’s discipline to decompose. Now when you see a PR ballooning past 400 lines, that’s the moment to stop and split — not after the rubber stamp ships the flaw.


