Engineering Practice ENG · 04 · 05

Flag debt and rollout discipline

Trunk-based trades branch debt for flag debt: a release flag left on after rollout is a long-lived branch hiding in an if, and N stale flags make 2^N possible configurations, almost none of them tested. The discipline that makes the trade pay off is cleanup: delete the branch on merge, the flag on rollout.

ENG Senior ◷ 17 min


A team that adopted trunk-based two years ago opens their flag dashboard: 400 flags, and nobody can say which still matter. A “new checkout” flag has been at 100% for a year — but the old code path is still there behind the else, untested, occasionally hit by a config typo. An incident traces to two flags whose combination nobody ever tried. They escaped merge hell and walked straight into a different prison: the flags they never deleted. Trunk-based didn’t remove the debt. It changed its shape.

The debt didn’t vanish — it changed shape

The previous lessons made flags the hero: they decouple deploy from release and make daily integration possible. The senior caveat is that every flag is also a branch in your runtime. A long-lived git branch splits your codebase into divergent histories; a long-lived flag splits your running system into divergent execution paths. You moved the fork from version control into an if statement, which is harder to see and harder to delete, because it’s live in production and someone might be depending on it.

So a release flag that outlives its feature is not neutral leftover — it is a long-lived branch in disguise, with the same drift problem reappearing as untested code paths. The old path behind the else rots: nobody runs it, nobody updates it, but it’s still reachable. Trunk-based only delivers its benefit if you remove flags as aggressively as you remove branches: delete the branch on merge, delete the flag on full rollout. Cleanup isn’t hygiene you do when you have time; it’s the other half of the trade that makes the whole practice net-positive.
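A minimal Python sketch of what “rotting behind the else” looks like in practice. Every name here (`checkout_total_flagged`, `legacy_checkout_total`, the `new_checkout` key, flags passed as a plain dict) is hypothetical, and the legacy path’s bug is invented to stand in for drift. Note how a typo in the flag key silently routes traffic to the old path, exactly the kind of hit described in the opening example.

```python
def new_checkout_total(cart):
    """Current path: sums line items in integer cents."""
    return sum(item["price_cents"] * item["qty"] for item in cart)


def legacy_checkout_total(cart):
    """Old path behind the else: never updated since rollout, ignores qty (hypothetical drift)."""
    return sum(item["price_cents"] for item in cart)


def checkout_total_flagged(cart, flags):
    # Release flag still in the code after reaching 100%.
    # A missing or misspelled key falls back to False and reaches the stale path.
    if flags.get("new_checkout", False):
        return new_checkout_total(cart)
    else:
        return legacy_checkout_total(cart)


def checkout_total(cart):
    # After cleanup: the flag check is gone and only one path remains.
    # In a real cleanup, legacy_checkout_total is deleted too; it is kept here for comparison.
    return new_checkout_total(cart)


cart = [{"price_cents": 250, "qty": 2}, {"price_cents": 100, "qty": 1}]
print(checkout_total_flagged(cart, {"new_checkout": True}))  # 600
print(checkout_total_flagged(cart, {"new_chekout": True}))   # 350: typo lands on the legacy path
print(checkout_total(cart))                                  # 600
```

Deleting the flag removes the `else` along with it, so there is no longer a reachable path that nobody tests.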

Why stale flags explode combinatorially

You might think ten forgotten flags is merely ten times the problem of one. It’s far worse than that. The cost of flag debt isn’t linear in the number of flags — it’s exponential. Each independent boolean flag doubles the number of possible runtime configurations of your system. With 3 flags you have 2³ = 8 configurations; with 10 flags, 1,024; with 20, over a million. You test a handful of those paths and ship; the rest are live but unverified. Most incidents from flag debt are interaction bugs: flag A is fine, flag B is fine, but the state where both are on (or one on and one half-ramped) was never exercised and does something wrong. This is the same 2^N untested-configuration problem that stale branches would cause if you kept them all alive — it just hides inside conditionals instead of in git branch -a.
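A minimal Python sketch of an interaction bug, assuming two hypothetical discount flags and a hypothetical rule that no order may be discounted more than 20%. The team tests the baseline and each flag alone; all three pass. `all_configs` enumerates the full 2^N space with `itertools.product`, and the single configuration nobody exercised is the one that breaks the rule.

```python
from itertools import product

MAX_DISCOUNT_PCT = 20  # hypothetical business rule: never discount more than 20%


def all_configs(flag_names):
    """Every on/off assignment of the given boolean flags: 2**N dicts."""
    return [dict(zip(flag_names, values))
            for values in product([False, True], repeat=len(flag_names))]


def price_cents(base_cents, flags):
    """Hypothetical checkout price; each flag was written and tested on its own."""
    price = base_cents
    if flags["spring_sale"]:
        price = price * 85 // 100  # 15% off
    if flags["loyalty_discount"]:
        price = price * 90 // 100  # 10% off
    return price


def violates_cap(base_cents, flags):
    """True if the combined discount exceeds MAX_DISCOUNT_PCT."""
    floor = base_cents * (100 - MAX_DISCOUNT_PCT) // 100
    return price_cents(base_cents, flags) < floor


flag_names = ["spring_sale", "loyalty_discount"]
tested = [  # what the team actually exercised: baseline plus each flag alone
    {"spring_sale": False, "loyalty_discount": False},
    {"spring_sale": True, "loyalty_discount": False},
    {"spring_sale": False, "loyalty_discount": True},
]
untested = [c for c in all_configs(flag_names) if c not in tested]
failing = [c for c in all_configs(flag_names) if violates_cap(10_000, c)]

print(untested)  # [{'spring_sale': True, 'loyalty_discount': True}]
print(failing)   # [{'spring_sale': True, 'loyalty_discount': True}]
print([len(all_configs([f"f{i}" for i in range(n)])) for n in (3, 10)])  # [8, 1024]
```

With two flags the gap is one configuration and easy to spot. At 20 flags `all_configs` would return 1,048,576 dicts, which is the point: exhaustive testing stops being an option long before a flag dashboard reaches 400.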

This is why the lifetime taxonomy from lesson 3 is load-bearing. Release flags must be short-lived precisely because they’re the ones that silently become permanent. The mitigation is lifecycle discipline: give each release flag an owner and an expiry, track flag age the way you track branch age, fail the build or alert when a release flag outlives its expected life, and make “remove the flag and the dead path” a required closing step of every rollout — not an optional follow-up ticket that never gets prioritized.

Active release flags	Possible runtime configs (2^N)	Realistically tested	Interaction-bug surface
3	8	A few	Small, manageable
10	1,024	Still a few	Most configs never run
20	1,048,576	A vanishing fraction	Interaction bugs are inevitable

Read those rows together: at 3 flags you can still reason about the system; at 20 you cannot. A codebase that ignores cleanup can end up far past the last row (the team in the opening example had 400 flags). Without an owner and an expiry on every release flag from day one, the codebase keeps sliding toward the bottom of the table while nobody notices.
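The lifecycle rules above (an owner, an expiry, a failing build when a release flag outlives its life) can be sketched as a small CI check. This is a minimal Python sketch under stated assumptions: the `Flag` record, the `"release"`/`"ops"` kind labels, and `lifecycle_violations` are hypothetical, and a real registry would come from your flag service or a checked-in config file rather than a literal list.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Flag:
    name: str
    kind: str                # hypothetical labels: "release" vs a long-lived kind such as "ops"
    owner: str               # team or person accountable for deleting the flag
    expires: Optional[date]  # required for release flags


def lifecycle_violations(flags, today):
    """List release flags that break the owner/expiry policy; empty means the build may pass."""
    problems = []
    for flag in flags:
        if flag.kind != "release":
            continue  # long-lived flag kinds are governed by a different policy
        if not flag.owner:
            problems.append(f"{flag.name}: release flag has no owner")
        if flag.expires is None:
            problems.append(f"{flag.name}: release flag has no expiry date")
        elif today > flag.expires:
            days = (today - flag.expires).days
            problems.append(f"{flag.name}: expired {days} days ago, owner {flag.owner or 'unassigned'}")
    return problems


registry = [
    Flag("new_checkout", "release", "payments-team", date(2024, 3, 1)),
    Flag("search_v2", "release", "", date(2030, 1, 1)),
    Flag("recs_kill_switch", "ops", "sre", None),
]

problems = lifecycle_violations(registry, today=date(2024, 6, 1))
for problem in problems:
    print(problem)
# new_checkout: expired 92 days ago, owner payments-team
# search_v2: release flag has no owner

exit_code = 1 if problems else 0  # a CI job would exit with this code to fail the build
```

Running the check on every build turns “remove the flag” from a ticket that never gets prioritized into a red pipeline someone has to answer.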

Progressive delivery, and an honest read of the data

Done right, the flag lifecycle is also a delivery technique. Progressive delivery is the discipline of releasing through controlled, measured stages (internal → 1% → canary cohort → 100%), with automated guardrails that halt or roll back the ramp if error rate, latency, or business metrics regress. The flag is the actuator, the metrics are the sensor, and the guardrails are the controller. This is the operational maturity trunk-based development is building toward: release as a closed feedback loop, not a one-way push.
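A minimal Python sketch of that closed loop, assuming purely percentage-based stages rather than the named internal and canary cohorts. `bucket`, `is_enabled`, `ramp`, the SHA-256 bucketing scheme, and the `GUARDRAILS` thresholds are all hypothetical; a real ramp would soak at each stage and query a monitoring system instead of the `fake_metrics` stub. Hashing the flag name with the user ID keeps assignment stable, so a user enabled at 5% stays enabled at 25%.

```python
import hashlib

GUARDRAILS = {"error_rate": 0.01, "p99_latency_ms": 800}  # hypothetical thresholds


def bucket(flag_name, user_id):
    """Stable bucket in 0..9999 per (flag, user), so the enabled set only grows as the ramp grows."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def is_enabled(flag_name, user_id, percent):
    """True if this user falls inside the current rollout percentage (0 to 100)."""
    return bucket(flag_name, user_id) < percent * 100


def ramp(stages, set_percent, read_metrics, guardrails):
    """Step through rollout stages; on any guardrail breach, roll back to 0% and stop."""
    reached = 0
    for percent in stages:
        set_percent(percent)      # the flag is the actuator
        metrics = read_metrics()  # the metrics are the sensor
        breached = [name for name, limit in guardrails.items() if metrics[name] > limit]
        if breached:
            set_percent(0)        # automated rollback
            return {"final_percent": 0, "halted_at": percent, "breached": breached}
        reached = percent
    return {"final_percent": reached, "halted_at": None, "breached": []}


# Simulated run: errors spike once more than 25% of traffic reaches a latent bug.
state = {"percent": 0}


def set_percent(p):
    state["percent"] = p


def fake_metrics():
    return {"error_rate": 0.002 if state["percent"] <= 25 else 0.03, "p99_latency_ms": 420}


result = ramp([1, 5, 25, 50, 100], set_percent, fake_metrics, GUARDRAILS)
print(result)  # {'final_percent': 0, 'halted_at': 50, 'breached': ['error_rate']}
```

The ramp never reaches 100% here, so the flag stays and the bug gets fixed. On a clean run it ends at 100%, and the next step of the lifecycle is deleting the flag.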

It’s also worth being precise about what the famous numbers prove. DORA’s research consistently ranks trunk-based development among the strongest correlates of elite delivery. The 2024 report (39,000+ respondents) found that only ~19% of teams reach the elite tier, that elite performers deploy on the order of 182× more frequently, with 127× faster lead times and far lower change-failure rates, and that they are markedly more likely to practice trunk-based development. But that is a correlation, and it runs through the prerequisites: trunk-based only delivers when the fast green gate, the flags, and the cleanup discipline are all present. Adopt the branching model alone and you get the failure modes from these five lessons: drift and painful merges if branches linger, a red trunk if the gate is missing, or a combinatorial flag swamp if cleanup is skipped. The lever isn’t the branch model; it’s the whole disciplined system around it.

Why this works

The clean symmetry of this unit: a long-lived branch and a stale flag are the same bug in two locations. Both are forks that should have been temporary, both accumulate untested divergence, and both impose a 2^N cost if you let them pile up: branches in your history, flags in your runtime. Trunk-based development is, at root, a single discipline applied to both: keep forks short-lived and delete them the moment they’ve served their purpose. Get that discipline, alongside the fast green gate, and you have the system the DORA numbers correlate with; skip it and you’ve just relocated the mess.

Pick the best fit

A feature has been at 100% rollout and stable for two weeks. The release flag is still in the code. What should happen?

Quiz

Why is a release flag left on after full rollout described as 'a long-lived branch hiding in an if'?

Quiz

Why is flag debt described as exponential rather than linear in the number of flags?

Order the steps

Order a disciplined release flag's full lifecycle:

1 Create the release flag with an owner and an expiry date
2 Ship the work to trunk dark behind the off flag, integrating daily
3 Progressively ramp 1% → 100% with metric guardrails that can auto-halt
4 Hold at stable 100% briefly to confirm no regression
5 Delete the flag and the dead old path as the closing step of the rollout

Skipping the final step leaves the old path rotting in an else — a long-lived branch hiding in an if.

Recall before you leave

01 Explain the claim that trunk-based 'trades branch debt for flag debt' and why that trade is only worth it with cleanup discipline.
02 What does DORA's data actually prove about trunk-based, and what's the honest causal story?

Recap

Trunk-based development doesn’t eliminate debt; it changes its shape from branch debt to flag debt, because every flag is a branch in your runtime. A release flag left on after full rollout is a long-lived branch hiding in an if, with the old path rotting untested but still reachable.

The cost is exponential, since each independent boolean flag doubles the runtime configurations. Twenty flags is over a million states, of which you test a handful, so interaction bugs (A fine, B fine, both-on never tried) become inevitable: the same 2^N untested-configuration problem stale branches would cause, relocated into conditionals.

The discipline that makes the whole practice net-positive is cleanup run as the closing step of the rollout: give release flags owners and expiries, track flag age like branch age, and delete the flag and dead path the moment a feature is stable at 100%, ideally as the end of a progressive-delivery ramp where metrics guard each step and can auto-halt.

And the DORA numbers (~19% elite, 182× deployment frequency, 127× faster lead times) correlate with trunk-based only when the fast green gate, the flags, and the cleanup discipline are all present. The lever was never the branching model alone, but the disciplined system around it that keeps every fork, in history or in runtime, short-lived.

Now when you see a release flag still in the code two weeks after a feature reached 100%, you know it isn’t harmless leftover: it’s a long-lived branch hiding in an if, and removing it is the step that completes the rollout.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.


Connected lessons

builds on

The green-trunk gate is the whole bargain (Senior)
