
The green-trunk gate is the whole bargain

A shared trunk means one break halts everyone, so trunk-based needs a fast, blocking CI gate. At scale a naive pre-merge gate races itself, so teams adopt merge queues that test against the projected trunk — but one FIFO queue bottlenecks past ~20-30 devs.

ENG Senior ◷ 17 min


A 40-engineer team goes trunk-based and within a month declares it a failure: trunk is red half the day, and when it’s red nobody can pull a clean base or merge on green, so the whole team stalls. The diagnosis isn’t “trunk-based doesn’t scale” — it’s that they adopted the risk (everyone on one branch) without the mitigation (a gate that keeps that branch green). Trunk-based without a fast, blocking CI gate isn’t trunk-based; it’s a shared branch with no brakes.

The shared trunk concentrates risk

Trunk-based development’s strength, everyone integrating into one branch, is also its single point of failure. With long-lived branches, one person’s broken branch is one person’s problem. With a shared trunk, a broken trunk is everyone’s problem: nobody can pull a clean base to start work, nobody can merge because their PR builds on red, and any “quick fix” lands on top of existing breakage, which makes the failure harder to diagnose. The practice deliberately trades isolated, deferred pain for shared, immediate risk, and that trade only pays off if the trunk is kept green.

So the gate is not an add-on; it’s the other half of the bargain. The rule is no green, no merge: every change runs the gating build and test suite before it lands on trunk, and a red result blocks the merge. This is why trunk-based and continuous integration are the same practice described from two angles: “merge to trunk daily” is only safe because “an automated gate keeps trunk always releasable” holds at the same time. DORA’s research treats trunk-based development as a required practice for continuous integration rather than a standalone branching choice; adopt the branching model without the gate and you get the red-trunk failure mode instead of the benefit.

The gate must be fast, or people route around it

A gate’s effectiveness decays with its latency. If the build takes 45 minutes, developers start batching changes to amortize the wait, which lengthens branches, which brings back the drift from lesson 1. They cut corners, skip running tests locally, or push to merge on red “just this once.” A slow gate quietly destroys the very behavior it exists to enforce. The target is minutes: a build fast enough that integrating small and often is the path of least resistance. Speed comes from test parallelization, tiering (fast unit tests block the merge; slow end-to-end suites run post-merge or nightly), caching, and ruthlessly cutting the suite’s slowest offenders.
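As a rough illustration of why tiering plus parallelization matters, the sketch below estimates gate latency for a set of suites. The suite names, durations, and shard count are made-up numbers, and the even-sharding assumption is optimistic; real shards are rarely perfectly balanced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suite:
    name: str
    minutes: float   # total serial runtime (hypothetical figure)
    blocking: bool   # True: gates the merge; False: runs post-merge or nightly


def gate_latency(suites, shards):
    """Approximate wall-clock minutes for the blocking tier.

    Assumes the blocking work splits perfectly evenly across `shards`
    parallel workers, which is an idealization.
    """
    blocking_total = sum(s.minutes for s in suites if s.blocking)
    return blocking_total / shards


# Hypothetical suites: only the fast tiers block the merge.
SUITES = [
    Suite("unit", 12.0, blocking=True),
    Suite("integration", 18.0, blocking=True),
    Suite("end_to_end", 45.0, blocking=False),
]

# Everything gating, run serially on one worker.
all_blocking_serial = sum(s.minutes for s in SUITES)

# Only the blocking tier gating, spread over six shards.
tiered_sharded = gate_latency(SUITES, shards=6)
```

With these assumed durations, gating on everything serially costs 75 minutes per merge, while gating only on the blocking tier across six shards costs about 5. The end-to-end suite still runs, just not in the path of every merge.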

When trunk does go red despite the gate (a flaky test slipped through, or a failure appears only in one environment), the senior reflex is to stop the line: fixing or reverting the trunk takes priority over new feature work, because every minute it’s red taxes the whole team. git bisect makes finding the offending commit cheap by binary-searching the history, and a small, frequently integrated history is exactly what makes bisect fast and the revert clean.
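The binary search behind bisecting can be sketched in a few lines. This is a simplified model, not Git's implementation: it assumes a linear history, a deterministic test, a known-good first commit, a known-bad last commit, and that history stays bad once it goes bad. The commit names and the `is_good` check are hypothetical.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the first commit for which is_good() is False.

    Assumes commits[0] is good, commits[-1] is bad, and there is exactly
    one good-to-bad transition in between.
    """
    lo, hi = 0, len(commits) - 1   # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid               # breakage is after mid
        else:
            hi = mid               # breakage is at or before mid
    return commits[hi]


# Hypothetical history of 64 commits where c41 introduced the breakage.
history = [f"c{i}" for i in range(64)]
checked = []


def is_good(commit: str) -> bool:
    checked.append(commit)         # record how many builds were needed
    return int(commit[1:]) < 41


culprit = first_bad_commit(history, is_good)
```

Here the culprit is found after six test runs instead of up to sixty-two. With Git itself, the equivalent workflow is `git bisect start`, marking one revision with `git bisect bad` and one with `git bisect good`, then letting `git bisect run` drive a test command.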

Gate design	When tests run	Can trunk break?	Scaling limit
Post-merge CI only	After landing on trunk	Yes (detected late)	Breaks constantly past a few devs
Pre-merge (test the PR branch)	Before merge, against a possibly stale base	Yes (two green PRs can conflict semantically)	Races itself at high merge rate
Merge queue (test against projected trunk)	Against trunk + PRs ahead in queue	Not from merge races (only changes green against the projected final state land)	Single FIFO lane bottlenecks past ~20-30 devs; flaky tests stall it

At scale the gate races itself — enter merge queues

A naive pre-merge gate has a subtle hole. You test PR-A against trunk: green. I test PR-B against the same trunk: green. We both merge. But A and B were never tested together — they can conflict semantically and break trunk even though each passed. At a few merges a day this is rare; at dozens it’s constant. The fix is a merge queue: instead of merging directly, PRs enter a queue and CI tests each against the projected state of trunk — trunk plus all PRs ahead of it in line. A PR only lands if it’s green against the exact state it will create. To keep throughput up, queues batch: test several queued PRs together as one combined candidate and merge the whole batch if green, falling back to bisecting the batch if it fails.
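The race is easy to reproduce in a toy model. In the sketch below, a "tree" is just a set of defined functions and a set of call sites, and `check` stands in for CI; the PR contents and function names are invented for illustration. PR-A renames a function, PR-B adds a call to the old name: each is green against the same trunk, yet the combination is red.

```python
def check(tree: dict) -> bool:
    """Stand-in for CI: every call site must reference a defined function."""
    return all(callee in tree["defs"] for callee in tree["calls"])


def apply(tree: dict, pr: dict) -> dict:
    """Return a new tree with the PR applied (textual conflicts are not modeled)."""
    defs = (tree["defs"] - pr.get("remove_defs", set())) | pr.get("add_defs", set())
    calls = tree["calls"] | pr.get("add_calls", set())
    return {"defs": defs, "calls": calls}


trunk = {"defs": {"send_email"}, "calls": set()}
pr_a = {"remove_defs": {"send_email"}, "add_defs": {"send_mail"}}  # rename
pr_b = {"add_calls": {"send_email"}}                                # new caller of old name

# Naive pre-merge gate: each PR is tested against the same trunk.
naive_a = check(apply(trunk, pr_a))
naive_b = check(apply(trunk, pr_b))
combined = check(apply(apply(trunk, pr_a), pr_b))   # what trunk looks like after both merge


def land_via_queue(trunk: dict, queue: list):
    """Serial merge queue: test each PR against trunk plus everything already landed."""
    landed = []
    for name, pr in queue:
        candidate = apply(trunk, pr)     # the projected trunk for this PR
        if check(candidate):
            trunk = candidate
            landed.append(name)
    return trunk, landed


final_trunk, landed = land_via_queue(trunk, [("PR-A", pr_a), ("PR-B", pr_b)])
```

Both naive checks pass while the combined state fails, which is the hole in the pre-merge gate. The queue lands only PR-A, rejects PR-B against the projected state, and trunk stays green. Real queues test speculatively and in parallel; this serial loop shows only the invariant.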

This is how large teams keep a single trunk green at high volume. GitHub reports that its monorepo went from about a thousand merged PRs a month in 2016 to more than 2,500 a month by 2021, and that with its merge queue hundreds of engineers now merge thousands of PRs a month with roughly a third less average wait to ship a change. But the queue has limits a senior must plan for. A single FIFO queue becomes a bottleneck past roughly 20-30 active developers: unrelated changes compete for one lane, and slow CI stalls everything behind it. Worse, a single flaky test is catastrophic in a queue: at 50+ PRs a day it can stall the line for hours, because a nondeterministic failure rejects batches that were actually fine. This is why flake quarantine and queue parallelism (independent lanes for independent code) become first-order infrastructure concerns at scale, not nice-to-haves.
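A back-of-the-envelope model shows how quickly flakiness compounds. The sketch below assumes each flaky test fails independently at a fixed rate and that every spurious failure wastes one full CI run; the rates, run counts, and durations are illustrative inputs, not measurements.

```python
def spurious_failure_rate(flake_rates):
    """Probability that one CI run fails although the code is fine.

    Assumes each flaky test fails independently at its own rate.
    """
    p_all_pass = 1.0
    for p in flake_rates:
        p_all_pass *= (1.0 - p)
    return 1.0 - p_all_pass


def expected_wasted_minutes(runs_per_day, run_minutes, flake_rates):
    """Expected CI minutes per day spent on runs that failed only due to flakes.

    In a single-lane queue these minutes also delay everything queued behind.
    """
    return runs_per_day * spurious_failure_rate(flake_rates) * run_minutes


# Illustrative inputs: 50 queue runs a day, 15-minute gate.
one_flaky = expected_wasted_minutes(50, 15, [0.02])
five_flaky = expected_wasted_minutes(50, 15, [0.02] * 5)
```

Under these assumptions one test that flakes 2% of the time costs about 15 minutes a day, and five such tests cost more than an hour. The model likely undercounts the real stall, since it ignores the extra runs needed to bisect a rejected batch and to re-queue the PRs that were ejected by mistake.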

Why this works

Notice the gate’s quality requirement is symmetrical with the flag discipline of the last lesson. A flaky test in a merge queue is the gate’s version of a stale flag: a piece of the system that’s supposed to give a clear signal but instead emits noise, and the noise compounds at scale until it halts everyone. Trunk-based at scale is less about the branching model and more about keeping every signal in the system — the gate, the queue, the flags — trustworthy. A signal you can’t trust is worse than no signal, because people route around it.

Pick the best fit

At 50 PRs/day on one trunk, trunk breaks a few times a week even though every PR was green before merge. What's the right fix?

Quiz

Why is trunk-based development inseparable from a fast, blocking CI gate?

Quiz

Two PRs each pass CI against the current trunk, then both merge — and trunk breaks. What happened, and what prevents it?

Order the steps

Order how a merge queue lands a change safely at scale:

1 PR passes its own CI and is approved
2 PR enters the merge queue rather than merging directly
3 CI tests it (often batched) against trunk + the PRs ahead in line
4 Green against the projected final state → the batch merges to trunk
5 On failure, bisect the batch to eject the offending PR and re-run
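The five steps above can be sketched as a small recursive routine. This is a minimal model under stated assumptions: trunk is a list of PR names, `ci` is any function that says whether a given trunk state is green, and the failing PR in the example is hypothetical. It is not how any particular merge queue product is implemented.

```python
from typing import Callable, List, Tuple

CI = Callable[[List[str]], bool]   # returns True if the given trunk state is green


def land_batch(trunk: List[str], batch: List[str], ci: CI) -> Tuple[List[str], List[str]]:
    """Try to land a batch of approved PRs; return (new_trunk, ejected_prs)."""
    if not batch:
        return trunk, []
    candidate = trunk + batch              # step 3: projected trunk for the whole batch
    if ci(candidate):                      # step 4: green, so the batch merges
        return candidate, []
    if len(batch) == 1:                    # step 5: culprit isolated, eject it
        return trunk, list(batch)
    mid = len(batch) // 2                  # step 5: bisect the batch and re-run each half
    trunk, ejected_left = land_batch(trunk, batch[:mid], ci)
    trunk, ejected_right = land_batch(trunk, batch[mid:], ci)
    return trunk, ejected_left + ejected_right


# Hypothetical CI: any state containing PR-3 is red.
def ci(state: List[str]) -> bool:
    return "PR-3" not in state


queue = ["PR-1", "PR-2", "PR-3", "PR-4"]   # steps 1-2: approved PRs waiting in the queue
new_trunk, ejected = land_batch(["base"], queue, ci)
```

Every state that becomes trunk has passed `ci`, so trunk never goes red; the cost of a bad PR is extra CI runs rather than a broken branch. This simple version re-tests some states it has already seen, which a production queue would try to avoid.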

No green, no merge. Red blocks trunk access until fixed or reverted; the gate is inseparable from the shared branch.

Recall before you leave

01 Why does a team that 'went trunk-based' end up with a permanently red trunk, and what's the missing piece?
02 Explain why a naive pre-merge gate races itself at scale and how merge queues fix it, including the limits of queues.

Recap

The shared trunk is trunk-based development’s strength and its single point of failure: when everyone integrates into one branch, a broken trunk blocks the whole team from a clean base and from merging, so the practice is inseparable from a fast, blocking CI gate (no green, no merge) that keeps trunk always releasable. The gate must run in minutes, because a slow gate pushes developers to batch changes and reabsorb the drift trunk-based exists to remove, and when trunk does go red the senior reflex is to stop the line and fix or revert before new work, using git bisect on the small, frequent history to find the culprit fast. At scale a naive pre-merge gate races itself: two PRs each green against the same base can still break trunk when combined. Merge queues close that hole by testing each PR against the projected trunk and batching for throughput, the technique GitHub uses to keep one monorepo green while hundreds of engineers merge thousands of PRs a month. But queues are not free: a single FIFO queue bottlenecks past about 20-30 developers, and one flaky test can stall it for hours, making flake control and parallel lanes core infrastructure. The through-line: trunk-based at scale is the discipline of keeping every signal (gate, queue, flags) trustworthy, because a signal people can’t trust gets routed around. Now when you see trunk going red frequently despite a pre-merge gate, you know what to investigate first: are two PRs racing each other against the same stale base?


