Engineering Practice ENG · 01 · 04

When TDD pays off and when it actively hurts

TDD’s return scales with how stable the spec is and how long the code lives. On exploratory spikes and fast-moving UI it costs you 15–22% more upfront time to lock in a design you’re about to throw away. The senior move is spike-then-rebuild, and the trap to avoid is test-induced design damage.

ENG Senior ◷ 16 min


A team is asked to integrate a new recommendation vendor in two days to decide whether it’s worth a contract. A senior insists on full TDD. They spend a day writing tests against an API whose response shape they’re still guessing, mocking endpoints they don’t yet understand. On day two the vendor’s real responses don’t match the mocked ones at all — paginated, differently nested, rate-limited in ways the docs omitted. Every test is rewritten or deleted. The “discipline” cost them a day of pinning down a design they had no business committing to, on code whose entire purpose was to be thrown away after the decision. TDD didn’t fail; it was applied to the one situation it punishes.

The cost is real and front-loaded

Be honest about the price first. Controlled measurements put test-first at roughly 15–22% more upfront time than writing the same feature test-after, and the gap between studies is itself the lesson: academic experiments on well-specified tasks tend to show TDD gains, while several industrial reports on real, messy work show productivity losses. The variable that explains the split is how well you knew the answer before you started. TDD front-loads the cost of pinning down behavior. When the behavior is knowable and the code will live for years, that front-loading is an investment — design feedback now, a regression net forever. When the behavior is unknown and the code is short-lived, the same front-loading is pure waste: you pay to specify something you’re about to learn is wrong.

So the return on TDD is governed by two variables, not one. Spec stability: can you state the correct behavior before writing the code? Code lifetime: will this code be read and changed for months, or deleted next week? High on both — a billing engine, an auth flow, a long-lived domain core — and TDD’s return is at its peak. Low on both — a spike, a one-off migration script, a UI you’ll redesign twice before launch — and the discipline charges you for certainty you don’t have about code that won’t survive.

Code situation	Spec stable?	Long-lived?	TDD verdict
Billing / auth / domain core	Yes	Yes	Full TDD — peak payoff
Exploratory spike / vendor probe	No	No (delete after)	No tests — spike, then rebuild
Fast-churning UI / layout	No (design moving)	Maybe	Test logic, not pixels; defer UI tests
One-off migration script	Maybe	No	Test the dangerous parts only
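As a minimal sketch, the table can be read as a function of its two inputs. The `Situation` type, the `Answer` enum and the verdict strings are hypothetical names for illustration. The four listed rows map exactly. Combinations the table does not list (for example, a stable spec on short-lived code) fall through to the last branch, and that default is an assumption of this sketch, not a rule from the lesson.

```python
from dataclasses import dataclass
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    MAYBE = "maybe"


@dataclass(frozen=True)
class Situation:
    spec_stable: Answer  # can you state the correct behavior before coding?
    long_lived: Answer   # will this code be read and changed for months?


def tdd_verdict(s: Situation) -> str:
    # High on both: upfront cost buys design feedback and a lasting regression net.
    if s.spec_stable is Answer.YES and s.long_lived is Answer.YES:
        return "full TDD"
    # Low on both: spend the time learning, then rebuild the keeper with TDD.
    if s.spec_stable is Answer.NO and s.long_lived is Answer.NO:
        return "no tests: spike, then rebuild"
    # Spec still moving (e.g. churning UI): test stable logic, defer volatile tests.
    if s.spec_stable is Answer.NO:
        return "test logic, not pixels"
    # Assumed default for everything else, including the migration-script row:
    # test only the parts that can do real damage.
    return "test the dangerous parts only"


print(tdd_verdict(Situation(Answer.YES, Answer.YES)))   # prints "full TDD"
print(tdd_verdict(Situation(Answer.MAYBE, Answer.NO)))  # prints "test the dangerous parts only"
```

The point of writing it this way is that neither input alone decides the verdict; you have to answer both questions.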

Spike, then rebuild — don’t TDD the unknown

The right tool for unknown territory is the spike: a quick, throwaway program written with no tests whose only job is to answer a question — does this vendor work, is this algorithm fast enough, what shape is this data. You optimize for learning speed, not correctness, and you throw it away. Then, once the spike has converted unknowns into knowns, you rebuild the keeper version with TDD, now that you can actually state the spec. Trying to TDD the spike inverts the order: you specify before you understand, and you pay the 15–22% tax to lock in guesses you’re about to discard. The senior failure mode here is misplaced rigor — applying the discipline hardest exactly where its precondition (a knowable spec) is absent.
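To make the rebuild step concrete, here is a sketch of what the keeper version of the opening story’s vendor integration might look like. It is written test-first, but only after a spike revealed the real response shape. The page layout (`data.items` plus a `next_cursor` field), the function name `fetch_all_recommendations` and the `max_pages` guard are all hypothetical. A real vendor’s pagination will differ, and rate limiting is left out.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical shape the spike uncovered:
#   {"data": {"items": [...]}, "next_cursor": "opaque-token" or None}
Page = Dict[str, Any]
FetchPage = Callable[[Optional[str]], Page]


def fetch_all_recommendations(fetch_page: FetchPage, max_pages: int = 100) -> List[Any]:
    """Keeper version: follow cursors until the vendor stops returning one."""
    items: List[Any] = []
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        # The spike showed items nested under "data", not at the top level.
        items.extend(page["data"]["items"])
        cursor = page.get("next_cursor")
        if cursor is None:
            return items
    # Guard against a vendor that keeps handing back cursors forever.
    raise RuntimeError(f"more than {max_pages} pages; stopping")


# Tests written first during the rebuild (runnable with pytest), now that the
# spec can actually be stated.
def test_follows_cursor_until_exhausted():
    pages = {
        None: {"data": {"items": [1, 2]}, "next_cursor": "p2"},
        "p2": {"data": {"items": [3]}, "next_cursor": None},
    }
    assert fetch_all_recommendations(lambda cursor: pages[cursor]) == [1, 2, 3]


def test_single_page_needs_no_cursor():
    page = {"data": {"items": ["a"]}, "next_cursor": None}
    assert fetch_all_recommendations(lambda cursor: page) == ["a"]


def test_stops_on_endless_cursor():
    looping = {"data": {"items": []}, "next_cursor": "same"}
    try:
        fetch_all_recommendations(lambda cursor: looping, max_pages=3)
    except RuntimeError:
        return
    raise AssertionError("expected RuntimeError")
```

The spike code itself never appears. It answered the question and was deleted, and the tests encode what it taught.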

UI is the other classic trap, and it’s worth being precise rather than dogmatic. The problem isn’t “UI can’t be tested” — it’s that fast-moving presentation changes daily while logic is stable. Test the logic (the reducer, the formatter, the validation, the state machine) hard; defer or thin out tests that assert exact markup and pixels while the design churns, because those break on every redesign and protect nothing durable. Pin the UI down with tests once it stabilizes, not while it’s still being argued over in Figma.
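Here is a sketch of what “test the logic hard” can look like, assuming a hypothetical shopping-cart screen. The reducer and the price formatter are plain functions with no markup, so their tests survive any number of redesigns. `CartState`, `cart_reducer` and `format_total` are illustrative names, not part of any framework.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CartState:
    items: Tuple[Tuple[str, int], ...] = ()  # sorted (sku, quantity) pairs
    error: str = ""


def cart_reducer(state: CartState, action: dict) -> CartState:
    """Pure state transition: no markup involved, so redesigns don't break its tests."""
    kind = action.get("type")
    if kind == "add":
        if action["qty"] <= 0:
            # Validation failure: keep the items, record the error.
            return CartState(items=state.items, error="quantity must be positive")
        merged = dict(state.items)
        merged[action["sku"]] = merged.get(action["sku"], 0) + action["qty"]
        return CartState(items=tuple(sorted(merged.items())))  # success clears any error
    if kind == "remove":
        return CartState(items=tuple(i for i in state.items if i[0] != action["sku"]))
    return state  # unknown actions leave the state untouched


def format_total(cents: int) -> str:
    """Format a non-negative amount in cents as dollars, e.g. 1999 -> "$19.99"."""
    if cents < 0:
        raise ValueError("cents must be non-negative")
    return f"${cents // 100}.{cents % 100:02d}"


# Logic tests: nothing here asserts on markup, layout, or pixels.
s = cart_reducer(CartState(), {"type": "add", "sku": "mug", "qty": 2})
s = cart_reducer(s, {"type": "add", "sku": "mug", "qty": 1})
assert s.items == (("mug", 3),)
assert cart_reducer(s, {"type": "add", "sku": "pen", "qty": 0}).error == "quantity must be positive"
assert cart_reducer(s, {"type": "remove", "sku": "mug"}).items == ()
assert format_total(1999) == "$19.99"
```

Once the layout stops moving, component or snapshot tests can be layered on top. These logic tests do not have to wait for that.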

Test-induced design damage is a real failure, not an excuse

DHH’s 2014 argument — “test-induced design damage” — names a genuine failure mode you have to take seriously even if you practice TDD. It’s when the drive to make code unit-testable pushes you to add indirection that serves the test, not the user: extracting a layer, introducing an interface and a mock, splitting a cohesive thing into pieces solely so a test can isolate it. The result is more abstraction, more files, more ceremony, and a design that’s worse for everyone except the test. The honest position from the “Is TDD Dead?” conversations is the synthesis: TDD’s design pressure is valuable when it surfaces a real coupling problem (as in lesson 01’s god object), and damaging when you contort the design to satisfy a testability rule for its own sake. The skill is telling those apart — listen to the test when it reveals a design smell, ignore it when it’s demanding ceremony.
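Here is a toy illustration of the distinction, with hypothetical names throughout. The first pair wraps a one-line calculation in an interface that exists only so a test can swap it out. The second pair shows a test exposing a hidden dependency on the system clock. Making `today` a parameter improves the design for every caller, not just the test. DHH’s original examples concerned Rails models and databases, so this is a simplified stand-in, not his code.

```python
import math
from datetime import date
from typing import Protocol


# Damage: an interface plus a production class invented only so a test can
# swap in a fake. Nothing in production needs a second discount policy.
class DiscountPolicy(Protocol):
    def rate(self) -> float: ...


class StandardDiscountPolicy:
    def rate(self) -> float:
        return 0.10


def discounted_total_ceremonial(subtotal: float, policy: DiscountPolicy) -> float:
    return subtotal * (1 - policy.rate())


# Same behavior without the ceremony: the calculation was already pure,
# so a test can just pass values in and check the result.
def discounted_total(subtotal: float, rate: float = 0.10) -> float:
    return subtotal * (1 - rate)


# Pressure worth heeding: this version hides a dependency on the system clock,
# so a test cannot pin down "today" without patching globals.
def is_overdue_hidden(due: date) -> bool:
    return date.today() > due


# Making "today" explicit fixes a real coupling; callers benefit, not just tests.
def is_overdue(due: date, today: date) -> bool:
    return today > due


assert math.isclose(discounted_total(200.0), 180.0)
assert is_overdue(date(2024, 1, 31), today=date(2024, 2, 1))
assert not is_overdue(date(2024, 1, 31), today=date(2024, 1, 31))
```

Both refactors were prompted by a test. Only the second one made the code better for anyone other than the test.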

Why this works

The deep reason “always TDD” and “TDD is dead” are both wrong is that TDD is a bet on certainty, and certainty isn’t uniform across your codebase. Where you know the spec and the code will live for years, paying upfront to nail behavior and leaving a regression net is a great trade. Where you don’t know the spec, paying upfront to specify guesses is negative-value, and the spike exists precisely to buy the certainty TDD assumes you already have. The senior skill isn’t a fixed answer; it’s reading which regime you’re in — stable-and-durable versus unknown-and-disposable — and switching tools at the boundary instead of applying one ritual to both.

Pick the best fit

You have two days to evaluate an unfamiliar vendor API and decide whether to sign. How do you approach the code?

Quiz

Which two variables most determine whether TDD pays off on a given piece of code?

Quiz

What exactly is 'test-induced design damage,' and how is it different from TDD's legitimate design pressure?

Order the steps

Order the senior spike-then-rebuild workflow for unknown territory:

1 Recognize the spec is unknown and the code is disposable
2 Write a quick spike with no tests, optimizing for learning speed
3 Use the spike to convert unknowns into a knowable spec
4 Throw the spike away
5 Rebuild the keeper with full TDD now that the spec is stated

Two regimes: stable spec + long-lived code → full TDD for peak payoff. Unknown spec + throwaway code → spike first (no tests, optimize for learning), discard it, then rebuild the keeper with TDD once the spec is knowable.

Recall before you leave

01 A senior insists on full TDD for a two-day vendor evaluation. Why is that the wrong call, and what's the right one?
02 Reconcile 'TDD's design pressure is valuable' (lesson 01) with DHH's 'test-induced design damage.' How do you tell them apart?

Recap

TDD is a bet on certainty, and certainty isn’t uniform across a codebase, which is why both “always TDD” and “TDD is dead” are wrong. The price is real and front-loaded: roughly 15–22% more upfront time than test-after. The split between academic gains and industrial losses is explained by one thing, how well you knew the answer before starting.

Two variables govern the return: spec stability (can you state correct behavior up front?) and code lifetime (months of changes, or deleted next week?). High on both, as with billing, auth, or a long-lived domain core, is peak payoff: design feedback now, a regression net forever. Low on both, as with an exploratory spike, a vendor probe, or a one-off script, means paying to specify guesses about disposable code. There the right tool is the spike: a quick, test-free, throwaway program that converts unknowns into a knowable spec, after which you rebuild the keeper with TDD. UI follows the same logic: test the stable logic hard, and defer pixel-and-markup tests while the design churns.

DHH’s test-induced design damage is a genuine failure mode to hold alongside lesson 01’s design pressure. Indirection that serves the test rather than the user is damage; a test that surfaces a real coupling smell is pressure worth heeding. The senior skill is telling the two apart.

Next time you start a task, ask two questions before writing the first test: do I know the spec well enough to state it, and will this code still exist in six months? Your answers decide whether to reach for TDD or for a throwaway spike first.


Connected lessons

builds on

Property-based testing: invariants over examples, with shrinking (middle)

unlocks

Mutation testing: the honest metric for test quality (senior)

deepens into

Mutation testing: the honest metric for test quality (senior)
