Engineering Practice ENG · 09 · 01

Putting it together: practices are one feedback loop, not a checklist

TDD, contract tests, review, trunk-based dev, flags, on-call, and postmortems are not independent rituals — they form one loop where each enables the next. Adopt one without its dependencies and it backfires. Optimize the whole loop with DORA, not one ceremony.

A team reads the DORA report, picks the headline practice — “elite teams deploy on every commit to trunk” — and mandates it on Monday. By Wednesday the trunk is red half the day: there is no fast test suite to gate merges, so every push lands broken work on top of broken work. The “fix” is a Friday afternoon hotfix behind a feature flag. Six months later there are 340 flags, none cleaned up, and a stale one fires in prod during Black Friday. They adopted the ceremony and skipped everything that made it safe.

The practices form a loop, not a list

The earlier lessons each looked like a standalone discipline: TDD and property tests, contract testing, code review, trunk-based development, feature flags, on-call, blameless postmortems. The senior insight is that they are not seven independent boxes to tick. They are one closed feedback system, and the order matters because each link feeds the next.

Tests and contract checks produce a CI signal fast and trustworthy enough that merging to trunk many times a day is safe. Safe, frequent merging is trunk-based development. Trunk-based development plus feature flags lets you deploy unfinished or risky code dark, decoupling deploy from release. Frequent small deploys mean that whatever still slips through has a tiny blast radius and is caught by on-call. On-call feeds the incident into a blameless postmortem, whose action items become new tests, new contract assertions, or a new flag default, closing the loop back at the start. Pull one link out and the chain does not just weaken; the downstream practice starts actively hurting you.
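To make "deploy dark, release later" concrete, here is a minimal Python sketch. `FlagStore`, the `new-discount-engine` flag, and `price_with_discount` are all hypothetical, and a real team would use a flag service rather than an in-memory dict. The new path is in production from the first deploy, while turning it on (release) or off (kill switch during an incident) is a config change, not a redeploy.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FlagStore:
    """Hypothetical in-memory flag store standing in for a real flag service."""

    flags: Dict[str, bool] = field(default_factory=dict)

    def is_enabled(self, name: str) -> bool:
        # Unknown or unset flags default to off, so new code paths ship dark.
        return self.flags.get(name, False)

    def set(self, name: str, enabled: bool) -> None:
        self.flags[name] = enabled


def price_with_discount(store: FlagStore, total: float) -> float:
    # The risky new path is already deployed but only runs when the flag is on.
    if store.is_enabled("new-discount-engine"):
        return round(total * 0.9, 2)  # hypothetical new pricing logic
    return total  # existing behavior remains the default


store = FlagStore()
print(price_with_discount(store, 100.0))  # deployed dark: 100.0
store.set("new-discount-engine", True)    # release: a config change, no redeploy
print(price_with_discount(store, 100.0))  # released: 90.0
store.set("new-discount-engine", False)   # kill switch during an incident
print(price_with_discount(store, 100.0))  # back to 100.0
```

Defaulting unknown flags to off is the design choice that makes dark deploys safe: forgetting to configure a flag leaves the old behavior in place.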

The seam failures: one practice without its dependencies

Ask yourself: if you had to pick one practice to skip, which dependency would you be quietly removing from everything downstream? The failures are not subtle, and they are predictable from the dependency graph. Each practice has a precondition; skip the precondition and the practice inverts from asset to liability.

Practice adopted	Required dependency	Failure if skipped
Trunk-based development	Fast, trustworthy CI (tests + contracts)	Broken trunk: everyone integrates on top of red, builds stall daily
Feature flags	A cleanup/lifecycle discipline (TTL)	Flag debt: hundreds of dead toggles, combinatorial test matrix, a stale flag fires in prod
Frequent deploys	On-call + fast rollback/flag-kill	High deploy frequency with no recovery path → high MTTR, every deploy a gamble
Postmortems	Owned, tracked action items	Theater: a doc nobody acts on, the same outage recurs next quarter
Code review	Small batches (from trunk-based + CI)	2000-line PRs get a rubber-stamp `LGTM`; review catches nothing
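The table is effectively a dependency graph, and its failure column can be read off mechanically. A minimal Python sketch, assuming a hypothetical `predicted_failures` helper and shortened labels for each practice, dependency, and failure mode:

```python
from typing import Dict, List, Set, Tuple

# practice -> (required dependency, failure if skipped); labels shortened from the table.
DEPENDENCIES: Dict[str, Tuple[str, str]] = {
    "trunk-based development": ("fast, trustworthy CI", "broken trunk"),
    "feature flags": ("flag cleanup discipline", "flag debt"),
    "frequent deploys": ("on-call + fast rollback", "high MTTR"),
    "postmortems": ("owned action items", "postmortem theater"),
    "code review": ("small batches", "rubber-stamp reviews"),
}


def predicted_failures(adopted: Set[str], in_place: Set[str]) -> List[str]:
    """List the failure mode of every adopted practice whose dependency is missing."""
    failures = []
    for practice in sorted(adopted):  # sorted for deterministic output
        dependency, failure = DEPENDENCIES[practice]
        if dependency not in in_place:
            failures.append(f"{practice} without {dependency}: {failure}")
    return failures


# The Monday-mandate team: trunk-based dev plus flags, with neither dependency in place.
for line in predicted_failures({"trunk-based development", "feature flags"}, set()):
    print(line)
```

A fuller model would also chain dependencies (small batches themselves come from trunk-based development plus CI); this sketch deliberately leaves that out.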

Taken together, the five rows show that a single missing dependency propagates silently into every practice built on it. Flag debt is the most quietly expensive of these failures. A flag is supposed to be short-lived: Unleash models a lifecycle of initial → pre-live → live → completed → archived, where completed means “rollout done, code removal pending.” Skip the removal step and toggles accumulate. Each live boolean flag doubles the number of possible paths through the code, so N independent flags imply up to 2^N combinations, a test matrix you can never fully cover. The classic outage is a short-lived “release toggle” treated as a permanent “permission toggle”: it is never removed, sits forgotten for a year, and a config change flips it during peak traffic.
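A flag registry that records lifecycle stage and an expiry makes flag debt visible before it fires. This Python sketch is an illustration, not Unleash's API: the `Stage` values reuse the lifecycle stage names above, while `Flag`, its per-flag `ttl`, `stale_flags`, `flag_states`, and the flag names are hypothetical choices for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from itertools import product
from typing import Dict, List


class Stage(Enum):
    # Names mirror the Unleash lifecycle stages described above.
    INITIAL = "initial"
    PRE_LIVE = "pre-live"
    LIVE = "live"
    COMPLETED = "completed"  # rollout done, code removal pending
    ARCHIVED = "archived"


@dataclass
class Flag:
    name: str
    stage: Stage
    created: date
    ttl: timedelta  # hypothetical per-flag expiry chosen by the team


def stale_flags(flags: List[Flag], today: date) -> List[str]:
    """Names of flags past their TTL whose code has not been removed (not archived)."""
    return [
        f.name
        for f in flags
        if f.stage is not Stage.ARCHIVED and today - f.created > f.ttl
    ]


def flag_states(names: List[str]) -> List[Dict[str, bool]]:
    """Every on/off assignment of the given flags: 2**len(names) of them."""
    return [dict(zip(names, values)) for values in product((False, True), repeat=len(names))]


flags = [
    Flag("checkout-v2", Stage.COMPLETED, date(2023, 11, 1), timedelta(days=30)),
    Flag("search-rerank", Stage.LIVE, date(2024, 11, 20), timedelta(days=30)),
    Flag("old-banner", Stage.ARCHIVED, date(2023, 1, 1), timedelta(days=30)),
]
print(stale_flags(flags, today=date(2024, 11, 29)))  # ['checkout-v2']
print(len(flag_states(["a", "b", "c"])))  # 8
```

Enumerating states only works for a handful of flags. With the opening scenario's 340 flags, exhaustive coverage would need 2**340 combinations, a 103-digit number, which is why removing flags, rather than testing every combination, is the practical control.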

Optimize the loop, not the ceremony

The trap is local optimization: a team proudly reports “we deploy 40 times a day” while change-fail rate quietly climbs to 30%. DORA exists to stop exactly this — it measures the whole loop with four metrics that must move together, not one in isolation:

Lead time for changes — commit to running in production.
Deployment frequency — how often you ship to prod.
Change failure rate — share of deploys that cause a degraded service or need remediation.
MTTR / failed-deployment recovery time — how fast you restore service after a bad change.
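The four metrics can be computed from a plain log of deploys. This Python sketch assumes a hypothetical `Deploy` record with commit, deploy, and restore timestamps, and uses medians for lead time and recovery time. DORA's published benchmarks come from survey responses, so treat these numbers as team-internal approximations rather than the official definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Deploy:
    commit_time: datetime  # when the change was committed
    deploy_time: datetime  # when it was running in production
    failed: bool  # degraded service or needed remediation
    restored_time: Optional[datetime] = None  # when service was restored, if it failed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_metrics(deploys: List[Deploy], period_days: float) -> Dict[str, float]:
    if not deploys:
        raise ValueError("need at least one deploy in the period")
    failures = [d for d in deploys if d.failed]
    recoveries = [
        hours(d.restored_time - d.deploy_time)
        for d in failures
        if d.restored_time is not None
    ]
    return {
        # Lead time for changes: commit to running in production (median hours).
        "lead_time_hours": median([hours(d.deploy_time - d.commit_time) for d in deploys]),
        # Deployment frequency: deploys per day over the period.
        "deploys_per_day": len(deploys) / period_days,
        # Change failure rate: share of deploys that failed.
        "change_failure_rate": len(failures) / len(deploys),
        # Recovery time: median hours from failed deploy to restored service (0.0 if none failed).
        "recovery_hours": median(recoveries) if recoveries else 0.0,
    }


t0 = datetime(2024, 6, 3, 9, 0)
deploys = [
    Deploy(t0, t0 + timedelta(hours=2), failed=False),
    Deploy(t0, t0 + timedelta(hours=4), failed=True, restored_time=t0 + timedelta(hours=5)),
    Deploy(t0, t0 + timedelta(hours=6), failed=False),
    Deploy(t0, t0 + timedelta(hours=8), failed=False),
]
print(dora_metrics(deploys, period_days=2))
```

Computing all four from the same log is the point: a dashboard that shows only `deploys_per_day` cannot reveal the trade the next paragraph warns about.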

When a team celebrates one of these numbers while ignoring the others, you are watching a local optimization in progress. The spread between elite and low performers in the 2024 DORA report makes the case: elites deploy roughly 182x more often, have lead times about 127x faster, recover from failed deployments about 2,293x faster, and have change failure rates about 8x lower. These are not in tension. Speed and stability rise together, because the loop that makes you fast (small batches, strong CI, flags, fast rollback) is the same loop that makes you safe. The 2024 report even found a “high” cluster whose change failure rate was worse than that of a slower but more careful “medium” cluster, evidence that chasing deploy frequency alone, with no stability metric to govern it, degrades the system.
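One way to catch a local optimization automatically is to compare two periods and flag any case where one metric improved while another got worse. A minimal Python sketch; the metric keys, the `changed` and `is_local_optimization` helpers, and the "before" numbers are hypothetical, while the "after" numbers reuse the “40 deploys a day, 30% change-fail rate” team from above.

```python
from typing import Dict, List

# For each metric, whether a higher value is better.
HIGHER_IS_BETTER: Dict[str, bool] = {
    "deploys_per_day": True,
    "lead_time_hours": False,
    "change_failure_rate": False,
    "recovery_hours": False,
}


def changed(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, List[str]]:
    """Split metrics into those that improved and those that regressed."""
    result: Dict[str, List[str]] = {"improved": [], "regressed": []}
    for metric, higher_is_better in HIGHER_IS_BETTER.items():
        delta = after[metric] - before[metric]
        if delta == 0:
            continue  # unchanged metrics count as neither
        better = delta > 0 if higher_is_better else delta < 0
        result["improved" if better else "regressed"].append(metric)
    return result


def is_local_optimization(before: Dict[str, float], after: Dict[str, float]) -> bool:
    """True when some metric improved while another regressed."""
    c = changed(before, after)
    return bool(c["improved"]) and bool(c["regressed"])


before = {"deploys_per_day": 1, "lead_time_hours": 48, "change_failure_rate": 0.10, "recovery_hours": 4}
after = {"deploys_per_day": 40, "lead_time_hours": 2, "change_failure_rate": 0.30, "recovery_hours": 9}
print(changed(before, after))
print(is_local_optimization(before, after))  # True
```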

Why this works

The most stubborn myth here is that speed and stability trade off against each other. The data says the opposite: the same capabilities that shrink lead time (tiny batches, trustworthy automated tests, deploy/release decoupling via flags) also shrink change failure rate and MTTR, because small changes are easier to review, easier to reason about, and trivial to roll back. A team that is fast and unstable has not over-optimized speed; it has skipped the dependencies (CI, flags, recovery) that make speed safe.

Reading the loop in an incident

Walk a real incident through the loop and the dependencies become concrete. A bad change ships. Because batches are small (trunk-based + CI), the change is one PR, not fifty, so the on-call engineer bisects to it in minutes, not hours; that is low MTTR. Because the risky path was behind a flag, recovery is flipping the flag off, not a code rollback and redeploy. The blameless postmortem asks not “who pushed it” but “which link let it through.” Was the gap a missing contract test? Then the action item is a contract assertion. Was it an untested flag combination? Then the action item is a flag-cleanup policy plus a safe default. Each answer feeds a specific upstream practice. A postmortem with no owned action item is the seam that breaks the loop most quietly, because the postmortem is the link that turns an incident into a permanent improvement.
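Small batches pay off at exactly this step. If each deploy is one small PR, finding the bad one among a day's deploys is a binary search needing about log2(n) checks, and the culprit it lands on is a single reviewable change (`git bisect` applies the same idea to commits). A minimal Python sketch; the `first_bad` helper, the deploy IDs, and the `is_bad` predicate are hypothetical stand-ins for whatever check the on-call engineer actually runs.

```python
from typing import Callable, Sequence, Tuple


def first_bad(changes: Sequence[str], is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Binary-search ordered changes for the first bad one.

    Assumes the newest change is known bad and that every change after the
    culprit is also bad (the usual bisect precondition). Returns the culprit
    and how many checks it took.
    """
    if not changes:
        raise ValueError("no changes to search")
    lo, hi = 0, len(changes) - 1  # changes[hi] is known bad
    checks = 0
    while lo < hi:
        mid = (lo + hi) // 2
        checks += 1
        if is_bad(changes[mid]):
            hi = mid  # culprit is at mid or earlier
        else:
            lo = mid + 1  # culprit is after mid
    return changes[lo], checks


# Fifty one-PR deploys in a day; deploy-37 introduced the bug and every later one carries it.
deploys = [f"deploy-{i}" for i in range(50)]
culprit, checks = first_bad(deploys, lambda d: int(d.split("-")[1]) >= 37)
print(culprit, checks)  # deploy-37 5
```

For fifty deploys the search needs at most six checks. Had those fifty PRs shipped as one batched deploy, the same search would only tell you the batch was bad, and the real hunt would start inside it.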

Pick the best fit

Leadership wants 'elite DORA numbers in one quarter.' A team currently deploys weekly with a slow flaky test suite and no flags. What's the senior sequencing?

Quiz

A team adopts trunk-based development but still has a 25-minute, frequently-flaky test suite. What is the predictable outcome?

Quiz

Your team deploys 40x/day and brags about it, but change-fail rate is 28% and MTTR is creeping up. What does DORA say you've done wrong?

Order the steps

Order the feedback loop so each link enables the next:

1 Tests + contract checks give a fast, trustworthy CI signal
2 Trustworthy CI makes merging to trunk many times a day safe (trunk-based)
3 Trunk-based + feature flags decouple deploy from release; small risky changes ship dark
4 Small frequent deploys mean what slips through has a tiny blast radius, caught by on-call
5 Blameless postmortem turns the incident into action items: new tests, contracts, flag defaults — closing the loop

Each link is only safe once the previous link exists. Skip CI → broken trunk. Skip flag cleanup → flag debt fires in prod. Skip postmortem action items → same outage recurs. DORA tracks all four metrics together because they rise and fall as one.

Recall before you leave

01
Explain why mandating trunk-based development before fixing CI predictably backfires, and what the correct sequencing is.
02
Why does DORA insist on four metrics moving together, and what goes wrong when a team optimizes deploy frequency alone?

Recap

The seven practices of this track are not a checklist of independent rituals; they are one closed feedback loop, and the order is load-bearing. Tests and contract checks produce a CI signal trustworthy enough that merging to trunk many times a day is safe; trunk-based development plus feature flags decouples deploy from release; small frequent deploys keep the blast radius of any escape tiny, where on-call catches it; and a blameless postmortem with owned action items turns that incident into new tests, contracts, and flag defaults, closing the loop.

Pull a dependency out and the downstream practice inverts: trunk-based without CI gives a broken trunk, flags without cleanup give flag debt and a stale toggle firing in prod, frequent deploys without a recovery path give high MTTR, and postmortems without action items give repeat outages. The capstone discipline is to stop optimizing single ceremonies and instead optimize the whole loop, governed by the four DORA metrics together. The data shows speed and stability rising together when the loop is intact, while a team that pushes one metric with a link missing watches the others regress. Now when you hear a team brag about one DORA number while another quietly regresses, you will know which link they skipped, and what to ask first.
