Engineering Practice

The integration-testing dilemma

Crux: Spinning every service up together works for two services and collapses for twenty. An e2e suite is slow, flaky on any hop, and grows ~N², so teams mute it and it catches nothing. The test pyramid breaks at the service boundary, where unit tests mock the very thing that breaks.
◷ 15 min

At 02:14 the orders service starts throwing 500s. A provider team renamed a field from total_cents to amount_cents and shipped it — a clean, well-tested change, green on their own suite. Nothing in their CI knew that three downstream consumers read total_cents. The team had a “full integration environment,” but it had been red for nine days and everyone had stopped looking at it. The first real signal anyone got was a pager, in production, on a Friday. The fix took ten minutes. Finding out it had broken cost an outage — because the only test that could have caught it was the one nobody trusted anymore.

The intuitive answer doesn’t scale

Ask “will these two services still talk to each other?” and the obvious answer is: run them both, send a real request, check the response. That instinct is correct for two services. The trouble is what happens as the system grows. With N services that call each other, the number of interaction pairs you’d want to exercise grows roughly with N², and an end-to-end test of any one journey needs every hop in that journey to be healthy at the same moment. A request that fans out through gateway → orders → pricing → inventory → payments only passes its e2e test when all five services, plus their databases and queues, are simultaneously up, migrated, and seeded with the right data.
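To put a rough number on that growth, the sketch below counts the possible caller/callee pairs among N services. It assumes the worst case, where every service may call every other one; `interaction_pairs` is a hypothetical helper written for this illustration, not part of any library.

```python
def interaction_pairs(n_services: int, directed: bool = True) -> int:
    """Count possible service-to-service pairs in a fully connected system.

    directed=True counts A->B and B->A separately (caller and callee differ);
    directed=False counts each pair of services once.
    """
    if n_services < 0:
        raise ValueError("n_services must be non-negative")
    ordered = n_services * (n_services - 1)
    return ordered if directed else ordered // 2


for n in (2, 5, 20):
    # Columns: services, directed pairs, undirected pairs
    print(n, interaction_pairs(n), interaction_pairs(n, directed=False))
```

Real call graphs are usually sparser than this upper bound, but the quadratic shape is what makes exhaustive pairwise coverage through e2e journeys impractical: 2 services give 2 directed pairs, 20 services give 380.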

That “all healthy at once” requirement is the quiet killer. Each service has, say, a 98% chance of being green in the shared environment at any moment. String five together and the journey is green only ~90% of the time; string twelve together and you’re under 80%. The environment isn’t broken because of your change — it’s broken because someone else’s migration is half-applied, or a seed script failed, or a dependency is mid-deploy. You inherit all of that noise on every run.
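The arithmetic behind those figures is a plain product of probabilities. The sketch below assumes each hop is green independently with the same probability, which is a simplification (real failures are often correlated); `journey_green_probability` is a hypothetical name used only here.

```python
def journey_green_probability(per_service_green: float, hops: int) -> float:
    """Probability that every hop is healthy at once, assuming independence."""
    if not 0.0 <= per_service_green <= 1.0:
        raise ValueError("per_service_green must be between 0 and 1")
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return per_service_green ** hops


for hops in (1, 5, 12):
    print(f"{hops:>2} hops: {journey_green_probability(0.98, hops):.1%}")
```

With a 98% per-service figure, five hops come out at about 90.4% and twelve hops at about 78.5%, matching the ~90% and under-80% numbers above.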

Slow, flaky, and therefore ignored

Those two forces — combinatorial setup and shared fragility — produce the three symptoms every team on a big e2e suite recognizes. Slow: booting a half-dozen services, a database, and a message broker before a single assertion runs pushes suites to 40+ minutes, so they move out of the inner loop and into a nightly job nobody watches. Flaky: because any hop can time out, a large fraction of failures — often cited around 1-in-5 runs — have nothing to do with the change under test. Ignored: a test that cries wolf four times for every real bug gets muted, retried-until-green, or quarantined. A muted test catches nothing, which is strictly worse than no test, because it still costs you the runtime and gives false comfort.

This is why mature platforms invert the classic advice. Netflix and Spotify famously reshaped the test “pyramid” into a honeycomb or diamond at the service layer: a thin cap of true end-to-end journeys (the guidance is roughly 5-10% of total tests), with the bulk of cross-service confidence pushed down into something faster and more isolated. The question this whole unit answers is: what is that faster, more isolated thing?

| Test layer | What it boots | Speed / determinism | Catches the 02:14 rename? |
|---|---|---|---|
| Unit test | Nothing; the boundary is mocked | Milliseconds, deterministic | No: the mock returns the old shape |
| End-to-end | All services + DB + queue together | Minutes, flaky on any hop | Yes, if the env is green, which it isn’t |
| The missing layer | One side, against a recorded agreement | Seconds, deterministic | Yes, and at the author’s desk |

The boundary is exactly where the pyramid has a hole

Here is the subtle part. The classic test pyramid says “lots of unit tests, fewer integration tests, very few e2e.” Inside a monolith that works, because a unit test exercises real in-process calls between modules. Across a network boundary it springs a leak: a unit test of the orders consumer mocks the pricing provider. The mock returns whatever shape the consumer’s author believed pricing returns. The day pricing renames a field, the consumer’s unit tests stay green — they’re asserting against the author’s stale belief, not against reality. The thing most likely to break (the wire contract between services) is the one thing unit tests deliberately stub out.
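A minimal sketch of that failure mode, using Python's standard `unittest.mock`. The function `order_total_cents`, the `get_price` method, and the `RenamedPricingProvider` class are hypothetical stand-ins invented for this example; only the field names `total_cents` and `amount_cents` come from the incident described earlier.

```python
from unittest.mock import Mock


def order_total_cents(pricing_client, order_id: str) -> int:
    """Consumer code under test: reads `total_cents` from the pricing response."""
    response = pricing_client.get_price(order_id)
    return response["total_cents"]


def test_order_total_with_mocked_pricing() -> None:
    # The mock encodes what the consumer's author believes pricing returns.
    pricing = Mock()
    pricing.get_price.return_value = {"total_cents": 1299}
    assert order_total_cents(pricing, "order-1") == 1299


class RenamedPricingProvider:
    """Stand-in for the real provider after the rename has shipped."""

    def get_price(self, order_id: str) -> dict:
        return {"amount_cents": 1299}


# The unit test never touches the real provider, so it still passes.
test_order_total_with_mocked_pricing()

# The same consumer code against the renamed response fails at runtime.
try:
    order_total_cents(RenamedPricingProvider(), "order-1")
except KeyError as missing:
    print(f"production path fails: missing field {missing}")
```

Nothing in the unit test is wrong in isolation. It simply verifies the consumer against a copy of the provider's shape that nobody is obliged to keep current.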

So you’re squeezed from both sides. Unit tests are fast and reliable but blind to the boundary by construction. End-to-end tests can see the boundary but are too slow and too flaky to gate every deploy. The renamed-field outage falls straight through the gap. What’s needed is a test that checks the boundary specifically — the shape and semantics of the requests and responses two services exchange — without booting both services together. That is the shape of the problem the rest of this unit solves.
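As a rough illustration of what such a boundary check could look like, the sketch below compares a provider response against a recorded list of fields the consumer relies on. Everything here is an assumption made for the example: the `ORDERS_EXPECTS_FROM_PRICING` agreement, the `currency` field, and the `contract_violations` helper are hypothetical, and real contract-testing tools do considerably more. It checks shape only, not semantics.

```python
# Hypothetical recorded agreement: the fields (and types) the orders consumer
# relies on in a pricing response. In practice this would live in a shared file.
ORDERS_EXPECTS_FROM_PRICING = {"total_cents": int, "currency": str}


def contract_violations(expected: dict, response: dict) -> list:
    """Return human-readable mismatches between an agreement and a response."""
    problems = []
    for field, expected_type in expected.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(response[field]).__name__}"
            )
    return problems


# Run on the provider side alone, against the provider's own response payloads.
before_rename = {"total_cents": 1299, "currency": "EUR"}
after_rename = {"amount_cents": 1299, "currency": "EUR"}

print(contract_violations(ORDERS_EXPECTS_FROM_PRICING, before_rename))
print(contract_violations(ORDERS_EXPECTS_FROM_PRICING, after_rename))
```

The first call reports no problems; the second reports the missing `total_cents` field. The point of the design is that the check needs only one service and a small recorded artifact, so it can run in seconds in the provider's own CI.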

Why this works

“Just keep the integration environment green” sounds like a discipline problem, but it’s a structural one. A shared environment’s uptime is the product of every service’s individual uptime, so it degrades multiplicatively as you add services — and its health is owned by everyone, which means it’s owned by no one. Telling teams to try harder doesn’t change the math. The only durable fix is to stop requiring all services to be simultaneously healthy to learn whether two of them agree.

Pick the best fit

A platform has 18 services with dense HTTP dependencies. The e2e suite takes 45 minutes and fails ~20% of runs for reasons unrelated to the change. The team wants reliable cross-service compatibility feedback. What's the most sound direction?

Quiz

Why does an end-to-end suite's reliability degrade as you add more services to a journey?

Quiz

Why do unit tests fail to catch a provider renaming a field that a consumer reads?

Order the steps

Order how an e2e-only strategy decays into the 02:14 outage:

  1. Two services integrate; a shared e2e environment is stood up to test them
  2. More services join; the journey needs all of them healthy at once
  3. Suite slows past 40 minutes and flakes ~1-in-5 for unrelated reasons
  4. The team mutes or stops watching the env; it sits red for days
  5. A provider renames a field; the only test that would catch it is muted; prod pages at 02:14
Recall before you leave
  1. A colleague says 'our integration environment just needs more discipline to stay green.' Explain why that's a structural problem, not a discipline one.
  2. Where exactly does the test pyramid 'break' at a service boundary, and why does that let a field rename reach production?
Recap

The instinct to test integration by running every service together is right for two services and wrong for twenty. End-to-end suites grow combinatorially: with N interacting services the pairs to exercise scale with N², and any one journey passes only when every hop is healthy at the same instant, so a shared environment’s reliability is the product of its parts and degrades multiplicatively as you add services. The result is the familiar trio: slow (suites of 40+ minutes), flaky (~1-in-5 failures unrelated to the change), and therefore muted. A muted test catches nothing while still costing runtime. The classic pyramid doesn’t save you, because at a network boundary unit tests mock the provider and assert against the author’s stale belief about its shape, leaving them green the day the provider renames a field. So you’re squeezed between a fast-but-blind layer and a sees-it-but-untrusted layer, and a cross-service rename falls through the gap into a 2 a.m. pager. What’s needed is a layer that checks the boundary itself (the shape and semantics of what two services exchange) without requiring both to be booted together. That layer is contract testing, and building it up is what the rest of this unit does.

Connected lessons
Continue the climb ↑ Consumer-driven contracts: the consumer states the truth

Trademarks belong to their respective owners. Editorial reference only.