Engineering Practice ENG · 02 · 05

Evolving contracts safely and the limits of contract testing

Break a provider safely with expand-then-contract, let pending and WIP pacts absorb new expectations without failing the build, and know the hard limits: contracts test shape not business logic, need known consumers, and never replace a few real e2e smoke tests.

ENG Senior ◷ 17 min

The pricing team needs to rename amount_cents to price_minor_units, a field three downstream consumers read. Sam’s instinct is to do it in one PR: rename the field, update the schema, ship. Provider verification goes red immediately: the broker holds the current consumer pacts, all three still expect amount_cents, and verification against the renamed provider fails for each of them. Good, the gate caught it. But now Sam is stuck: he can’t deploy without breaking three teams, and he can’t make three teams change in lockstep with him. The contract gate told him the change is breaking; it did not tell him how to ship it anyway. The whole quarter, contract tests have quietly replaced a dozen flaky cross-service e2e suites, and that success is exactly why nobody noticed the one bug they never caught: last month pricing returned a structurally perfect amount_cents: -500 for a refund, every contract passed, and checkout charged a negative balance. The shape was right. The number was wrong. No contract test on earth asserts that.

Expand-then-contract: how to break a provider without breaking consumers

A consumer-driven contract gate has a built-in asymmetry that feels like a trap the first time you hit it: it will block a breaking provider change, but blocking is not the same as enabling. You still have to ship the rename. The technique is expand-then-contract (also called parallel change or additive change), and it turns one breaking change into three non-breaking deploys, each of which keeps every contract green.

Expand. Deploy a provider that supports both the old and the new shape simultaneously. For Sam’s rename, that means returning amount_cents and price_minor_units with the same value. Every existing pact still verifies — consumers read amount_cents and it’s still there — so can-i-deploy says yes and the provider ships with zero consumer changes.
Migrate. One at a time, on their own schedule, each consumer updates its test to read price_minor_units, which regenerates its pact to expect the new field. The provider already returns it, so each migrated consumer’s pact verifies immediately. There is no lockstep: three teams migrate across three sprints if they want.
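Contract. Once no consumer version deployed to production has a pact that still references amount_cents, the provider deploys a version that drops the old field. can-i-deploy is the check: it passes for the field-dropping provider version only when the broker’s matrix shows that version verified against the pact of every consumer version recorded in production. Nothing depends on the old field anymore, so the removal is non-breaking.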

The discipline is that you never delete in the same step you add. The window where both shapes exist is the price of zero-downtime evolution; the broker’s can-i-deploy is what tells you when that window can safely close. Skip the expand step and you are back to coordinating a flag-day rename across three teams — the exact pain contracts exist to eliminate.
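The three deploys can be sketched as a toy model. This is a minimal illustration, not the broker's real algorithm: a pact is reduced to the set of response fields a consumer reads, can_deploy is a hypothetical stand-in for the can-i-deploy check, and the consumer names are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Pact:
    """Hypothetical, simplified pact: only the response fields a consumer reads."""
    consumer: str
    reads: frozenset[str]


def can_deploy(provider_fields: set[str], production_pacts: list[Pact]) -> bool:
    """Toy stand-in for can-i-deploy: every production pact must be satisfied."""
    return all(pact.reads <= provider_fields for pact in production_pacts)


OLD, NEW = "amount_cents", "price_minor_units"
consumers = ["checkout", "invoicing", "reporting"]  # assumed consumer names

# All three consumers start out reading the old field.
pacts = [Pact(name, frozenset({OLD})) for name in consumers]

# One-step rename: blocked, because every pact still reads amount_cents.
rename_in_one_step = can_deploy({NEW}, pacts)

# 1. Expand: the provider returns both fields, so no existing pact breaks.
expand_ok = can_deploy({OLD, NEW}, pacts)

# 2. Migrate: consumers switch one at a time. After each migration we record
#    whether the expanded provider is still deployable, and whether a provider
#    without the old field could be deployed yet.
migrate_ok = []
drop_ok = []
for i, name in enumerate(consumers):
    pacts[i] = Pact(name, frozenset({NEW}))
    migrate_ok.append(can_deploy({OLD, NEW}, pacts))
    drop_ok.append(can_deploy({NEW}, pacts))

# 3. Contract: only after the last migration does dropping the old field pass.
```

In this model the expanded provider stays deployable after every single migration, while the field-dropping provider is blocked until the last consumer has moved. That is the window the real matrix tracks for you.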

Pending and WIP pacts: don’t let a not-yet-built feature redden the provider

Expand-then-contract handles provider-led change. The mirror problem is consumer-led change: a consumer adds a new expectation (a new endpoint, a new field) before the provider has implemented it. The consumer publishes the new pact, the provider’s next verification run replays it, the provider doesn’t satisfy it yet — and the provider’s main build goes red over a feature the provider team hasn’t even started. That is backwards: a consumer’s work-in-progress should not break an unrelated provider’s pipeline.

Pending pacts fix this. When a consumer publishes contract content the provider has never successfully verified, the broker flags it as pending. The provider still runs verification against it and still reports the result back to the consumer — so the consumer gets honest feedback — but a pending failure does not change the provider build’s exit code. The build stays green. The moment that pact is verified successfully once, it transitions out of pending; from then on, a failure is a real regression and will fail the build. Enable it with enablePending: true on the provider’s verification config.
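The pending rule can be modelled in a few lines. This is a simplified sketch of the behaviour described above, not Pact's implementation: Verification, run_build, and the pact ids are hypothetical, and "pending" is reduced to "this provider has never verified this pact content successfully".

```python
from dataclasses import dataclass


@dataclass
class Verification:
    """One pact's verification outcome in a provider build (simplified model)."""
    pact_id: str  # hypothetical identifier for a pact's content
    passed: bool


def run_build(results: list[Verification], verified_before: set[str]) -> int:
    """Return the build's exit code and record first-time successes.

    A pact counts as pending while its id is absent from verified_before.
    """
    exit_code = 0
    for result in results:
        pending = result.pact_id not in verified_before
        if result.passed:
            verified_before.add(result.pact_id)  # leaves pending for good
        elif not pending:
            exit_code = 1  # regression on a previously verified pact
        # A pending failure is still reported, but the build stays green.
    return exit_code


history = {"checkout-v1"}  # pacts this provider has already verified

# Monday: checkout publishes v2 with an expectation the provider lacks.
monday = run_build(
    [Verification("checkout-v1", True), Verification("checkout-v2", False)],
    history,
)

# Later: the provider implements the feature, so v2 verifies once.
implemented = run_build([Verification("checkout-v2", True)], history)

# Later still: a change breaks v2. It is no longer pending, so the build fails.
regression = run_build([Verification("checkout-v2", False)], history)
```

The same failing result produces a green build on Monday and a red build after the first successful verification, which is the whole point of the flag.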

WIP pacts build on top of pending. Pending still requires the provider to be told which pacts to pull (by tag or branch). WIP pacts (includeWipPactsSince with a date) make the provider automatically pull in any new, not-yet-verified pact applicable to it — no config change per consumer feature. For WIP pacts the pending flag is hardcoded on, so they can never fail the build either. Together they let a consumer push a new expectation on Monday, see real verification feedback, and the provider team picks it up when they’re ready — without a red build pressuring either side into a flag day.

Pact state	What it means	Effect on provider build
Verified (normal)	Provider has satisfied this pact before	Failure fails the build (real regression)
Pending	New/changed content never yet verified, pulled by tag	Verified + reported, but failure does NOT fail the build
WIP	Any new unverified pact, auto-pulled since a date	Pending flag hardcoded on — never fails the build
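The difference between explicitly selected pacts and WIP pacts can be sketched as a selection function. This is an illustrative model only: PublishedPact, select_pacts, the branch names, and the dates are all assumptions, the real broker's rules are more detailed, and the sketch assumes pending is enabled.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PublishedPact:
    """Hypothetical, simplified record of a pact held by the broker."""
    consumer: str
    branch: str
    published: date
    verified_before: bool


def select_pacts(
    pacts: list[PublishedPact],
    selected_branches: set[str],
    wip_since: date | None = None,
) -> list[tuple[PublishedPact, bool]]:
    """Return (pact, pending) pairs the provider build would verify."""
    chosen = []
    for pact in pacts:
        if pact.branch in selected_branches:
            # Explicitly selected: pending only if never verified before.
            chosen.append((pact, not pact.verified_before))
        elif (
            wip_since is not None
            and pact.published >= wip_since
            and not pact.verified_before
        ):
            # WIP: auto-pulled, and always treated as pending.
            chosen.append((pact, True))
    return chosen


broker = [
    PublishedPact("checkout", "main", date(2024, 1, 10), True),
    PublishedPact("invoicing", "feat/discounts", date(2024, 3, 4), False),
    PublishedPact("reporting", "feat/old-idea", date(2023, 6, 1), False),
]

without_wip = select_pacts(broker, {"main"})
with_wip = select_pacts(broker, {"main"}, wip_since=date(2024, 3, 1))
```

With the WIP date set, the invoicing feature-branch pact is picked up without any selector change and arrives pending. The reporting pact predates the cutoff and stays out.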

Bi-directional contract testing: compare two artifacts, run no provider code

Classic consumer-driven verification has a cost most teams underweight: the provider must execute the consumer’s pacts. The provider runs its real service, sets up each providerState, replays every interaction, and asserts the responses. That requires the provider team to wire up state handlers and run consumer tests they don’t own — friction that grows with every consumer.

Bi-directional contract testing (BDCT) decouples the two sides. The provider publishes the artifact it already produces — its OpenAPI spec — to the broker, with no provider-side test execution at all. Each consumer publishes its pact as usual. The broker then statically compares the consumer’s pact against the provider’s OpenAPI: does the spec contain every endpoint, field, and type the pact expects? Compatibility is a document-vs-document check, not a code run. That is the appeal — the provider adds no test code, just publishes a spec it likely maintains anyway, and the broker tells both sides whether they fit.

The trade is trust placed in the spec. CDC verifies the consumer’s expectations against the provider’s actual running behavior; BDCT verifies them against the provider’s declared behavior. If the OpenAPI spec lies — describes a field the code doesn’t return, or omits one it does — BDCT passes a contract that CDC would have failed. BDCT is only as honest as the spec, so mature BDCT setups generate or validate the spec from the provider’s own tests rather than hand-maintaining it. Note also that BDCT is a PactFlow/SmartBear feature, not part of the OSS Pact Broker.
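A document-versus-document check, and the trust gap it leaves, can be illustrated with a deliberately tiny model. Both artifacts are reduced to "operation -> response field -> JSON type"; the compare function, the paths, and the field values are hypothetical, and a real comparison against an OpenAPI document covers far more (request bodies, status codes, nested schemas).

```python
# Hypothetical, heavily simplified stand-ins for the two published artifacts.
Operations = dict[str, dict[str, str]]


def compare(pact: Operations, spec: Operations) -> list[str]:
    """Statically list everything the pact expects that the spec lacks."""
    problems = []
    for operation, fields in pact.items():
        if operation not in spec:
            problems.append(f"{operation}: not in spec")
            continue
        for name, json_type in fields.items():
            declared = spec[operation].get(name)
            if declared is None:
                problems.append(f"{operation}: field {name} not in spec")
            elif declared != json_type:
                problems.append(
                    f"{operation}: {name} is {declared}, pact expects {json_type}"
                )
    return problems


provider_spec = {
    "GET /prices/{id}": {
        "amount_cents": "integer",
        "price_minor_units": "integer",
        "currency": "string",
    }
}
checkout_pact = {"GET /prices/{id}": {"price_minor_units": "integer"}}
reporting_pact = {
    "GET /prices/{id}": {"amount_cents": "string"},
    "GET /refunds/{id}": {"status": "string"},
}

checkout_result = compare(checkout_pact, provider_spec)
reporting_result = compare(reporting_pact, provider_spec)

# The trust gap: no running code was consulted. If the deployed provider
# never shipped price_minor_units, the comparison above still passes.
actual_response = {"amount_cents": 1999, "currency": "EUR"}
spec_lies = checkout_result == [] and "price_minor_units" not in actual_response
```

The checkout pact passes on paper even though the assumed deployed response omits the field. That is the case where classic verification against the running provider would have failed.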

Why this works

Why “no provider code execution” is the whole pitch. In a large estate, the expensive part of CDC is provider verification: every provider must stand up, seed N provider states, and replay M consumer interactions on every relevant build. BDCT replaces that with a string-and-type comparison of two static documents in the broker, which is near-instant and needs no test environment. You buy back provider-side CI time and remove the coordination of state handlers — at the cost of trusting the OpenAPI spec to match the deployed code. That is why the senior move is to derive the spec from real provider tests: it restores the ground-truth that classic CDC had for free, while keeping BDCT’s cheap comparison.

The hard limits: shape is not semantics, and known consumers are not all consumers

Contract testing is bounded by what a contract can express, and a senior earns trust by naming those bounds before someone discovers them in production.

It tests shape and interaction, never business logic or semantics. A contract asserts amount_cents is an integer; it cannot assert the integer is the correct price, is non-negative, or matches the order total. The refund that returned -500 had a perfect shape and broke checkout. Contract tests catch structural breaks between services; they are blind to wrong values, wrong calculations, corrupted data, and broken multi-step workflows.
It needs known consumers. A consumer-driven contract is the union of what known consumers declare. For a public API with unknown, external consumers, there is no one to author the pacts and no way to enumerate what’s depended on — so CDC simply doesn’t apply. (Spec-first or BDCT against a published OpenAPI is the better fit there, but even that can’t know which external clients read which field.)
It adds real infrastructure. A broker to run and secure, pact publishing wired into every consumer’s CI, verification wired into every provider’s CI, can-i-deploy gates, deployment recording. That overhead is justified between a handful of internal services that change often; it is overkill for two services that talk via a stable, rarely-changing interface.
It can give false confidence when provider states diverge from real data. Verification runs against the data you set up in each providerState. If your fixtures are tidier than production — no nulls, no legacy rows, no the-currency-was-once-stored-as-a-string mess — every contract passes while the real provider, fed real data, returns shapes your fixtures never produced. Green contracts then certify a fiction.
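The first and last limits above are easy to see in a small sketch. The shape check below is a hypothetical stand-in for what a contract asserts, is_chargeable is an assumed business rule that lives outside any contract, and the legacy row is an invented example of production data that no tidy fixture would contain.

```python
def matches_shape(body: dict, shape: dict[str, type]) -> bool:
    """Contract-style check: each expected field is present with the right type."""
    return all(isinstance(body.get(field), kind) for field, kind in shape.items())


def is_chargeable(body: dict) -> bool:
    """Hypothetical business rule that a contract does not express."""
    return body["amount_cents"] >= 0


PRICE_SHAPE = {"amount_cents": int, "currency": str}

# The tidy fixture a provider state might set up.
fixture = {"amount_cents": 1999, "currency": "EUR"}

# The refund from the incident: right shape, wrong number.
refund = {"amount_cents": -500, "currency": "EUR"}

# Assumed legacy production row: the amount was once stored as a string.
legacy_row = {"amount_cents": "1999", "currency": "EUR"}

fixture_passes = matches_shape(fixture, PRICE_SHAPE)
refund_passes = matches_shape(refund, PRICE_SHAPE)
refund_is_valid = is_chargeable(refund)
legacy_passes = matches_shape(legacy_row, PRICE_SHAPE)
```

The refund passes the shape check and fails only the business rule, and the legacy row fails the shape check but never appears in verification because the fixture is cleaner than production. Both gaps are what a thin layer of real end-to-end smoke tests is for.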

This is why the senior judgment call is not “contracts or e2e” but both, in proportion. Contract tests can correctly replace most cross-service e2e for known internal consumers — they are faster, more stable, isolate the failing pair, and don’t need a full environment. But they are not a substitute for a few real end-to-end smoke tests that exercise the actual deployed services together on real-ish data. Keep a thin layer of e2e for the things only a real run catches: business-logic correctness across hops, data that doesn’t match your fixtures, auth and config wired end to end. The pyramid is many contract tests, a few e2e smoke tests — not all of one.

Order the steps

Order the expand-then-contract steps to rename a provider field without breaking three consumers:

1 Deploy a provider that returns BOTH the old field and the new field with the same value
2 can-i-deploy passes for the provider — every existing consumer pact still reads the old field
3 Each consumer, on its own schedule, switches its test to read the new field, regenerating its pact
4 The broker's matrix confirms no production consumer pact references the old field anymore
5 Deploy a provider that drops the old field — the removal is now non-breaking

Quiz

A consumer publishes a pact for an endpoint the provider hasn't built yet. With pending pacts enabled, what happens on the provider's main build?

Quiz

Every contract test passes, yet pricing returned amount_cents: -500 for a refund and checkout charged a negative balance. Why didn't contract testing catch it?

Pick the best fit

Contract tests now cover all your known internal service pairs. How should they sit alongside end-to-end tests?

Never delete in the same step you add. can-i-deploy confirms when the old field is safe to drop — once no production consumer pact references it.

Recall before you leave

01 A provider needs to rename a field three consumers read, but the contract gate blocks it. Walk through how to ship it, and how pending/WIP pacts handle the mirror case of a consumer-led change.
02 Contrast bi-directional contract testing with classic consumer-driven verification, and state the hard limits that mean contracts can't replace all end-to-end tests.

Recap

A contract gate blocks a breaking provider change but doesn’t ship it for you; expand-then-contract does, turning one breaking change into three green deploys: deploy a provider supporting both old and new shapes, migrate each consumer on its own schedule, then drop the old field once can-i-deploy confirms via the matrix that nothing in production still reads it, never deleting in the same step you add.

The mirror problem is a consumer publishing an expectation before the provider builds it: pending pacts let the provider verify and report that content while a failure stays non-blocking until the pact is verified once, after which a failure is a real regression; WIP pacts extend this by auto-pulling any new unverified pact since a date with pending hardcoded on, so consumer work-in-progress never reddens the provider build.

Bi-directional contract testing decouples the sides differently: the provider publishes only its OpenAPI spec and the broker statically compares each pact against it with no provider code run. That is cheap and low-friction but only as honest as the spec, so derive the spec from real provider tests.

The hard limits bound all of it: contracts verify shape and interaction, never business logic or semantics, so a perfectly shaped wrong value like amount_cents: -500 sails through; they need known consumers and so don’t fit public APIs with unknown clients; they add broker and CI overhead worth paying only between fast-changing internal services; and they give false confidence when provider-state fixtures are cleaner than production data.

The senior judgment call is proportion, not exclusivity: contracts correctly replace most cross-service e2e for known internal consumers, but a few real end-to-end smoke tests on the deployed services remain irreplaceable for catching the value correctness, fixture-divergent data, and wired-up auth that only a real run reveals.
Now when a teammate says “all contracts are green, let’s remove e2e,” you’ll be the one who asks: but what about the -500 refund?

Connected lessons

Builds on: can-i-deploy and contract versioning: the deployment-safety gate (Senior)
