Engineering Practice ENG · 01 · 03

Property-based testing: invariants over examples, with shrinking

Example tests only check the cases you thought of. Property tests assert an invariant over generated input — round-trip, oracle, metamorphic — and shrink failures to a minimal counterexample, finding the .005-class bugs no human enumerates.

ENG Middle ◷ 17 min

A payments service has 94% line coverage and eight months of green CI. Then a customer in Türkiye is double-charged. The bug: a money formatter that rounded half-to-even on one path and half-up on another, and the two disagreed only on values ending in .005. Every unit test used round numbers — 19.99, 100.00 — so every test passed. Nobody had written 0.005, 2.675, or a randomly generated decimal, because nobody thought to. The suite was green and the behavior was broken at the same time, for eight months, because the examples and the blind spots were written by the same person.

An example-based test is one fixed input mapped to one expected output: reverse([1,2,3]) === [3,2,1]. It documents a single case, and you write the cases you can imagine — which means the suite’s blind spots are precisely your blind spots. The money bug is the canonical shape of this: every example was a “nice” number, so the half-even versus half-up disagreement on .005 values lived undisturbed in the gap between the examples. No amount of additional round-number examples would ever close it, because the gap is defined by what you didn’t think to type.
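The gap can be made concrete with a minimal sketch. It assumes two hypothetical formatting paths, `format_invoice` and `format_receipt`, which are not from the original service and only illustrate the half-even versus half-up split using Python's standard `decimal` module.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal("0.01")

def format_invoice(amount: Decimal) -> str:
    """Hypothetical path A: rounds half-to-even."""
    return str(amount.quantize(CENT, rounding=ROUND_HALF_EVEN))

def format_receipt(amount: Decimal) -> str:
    """Hypothetical path B: rounds half-up."""
    return str(amount.quantize(CENT, rounding=ROUND_HALF_UP))

# Example-based tests with "nice" numbers: both paths agree, so these pass.
for nice in ("19.99", "100.00"):
    assert format_invoice(Decimal(nice)) == format_receipt(Decimal(nice)) == nice

# The input nobody typed: an exact tie at the third decimal place.
odd = Decimal("0.005")
invoice, receipt = format_invoice(odd), format_receipt(odd)
print(invoice, receipt)  # the two paths now disagree by one cent
```

Adding more round-number examples to the loop would keep passing; only an input with a tie at the third decimal place exposes the disagreement.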

A property-based test inverts the relationship. Instead of stating an input-output pair, you state something that must be true for all valid inputs — an invariant — and the framework generates hundreds of inputs to attack it. “Reversing a list twice returns the original list.” “Parsing then serializing yields the original value.” Hypothesis (Python), fast-check (JS/TS), and QuickCheck (Haskell, the grandfather of the family) throw generated values at the property: empty, single-element, huge, duplicate-heavy, and crucially the boundary values that sit where bugs live. fast-check runs numRuns: 100 per property by default, tunable to 1000+ in CI. You are no longer limited to the inputs your imagination produced.
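As a rough illustration of the idea, here is a hand-rolled sketch using only Python's standard library. The helper `for_all` and the generator `int_list` are hypothetical names invented for this example; a real framework such as Hypothesis or fast-check does the same job with far better generators.

```python
import random

def for_all(generate, prop, runs=100, seed=0):
    """Check prop on `runs` generated inputs; return the first failing input or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        value = generate(rng)
        if not prop(value):
            return value
    return None

def int_list(rng):
    """Lists of small ints, from empty up to 20 elements, duplicates likely."""
    return [rng.randint(-5, 5) for _ in range(rng.randint(0, 20))]

def reverse_twice_is_identity(xs):
    # A true invariant: reversing twice gives back the original list.
    return list(reversed(list(reversed(xs)))) == xs

def reverse_is_identity(xs):
    # A deliberately false "invariant": only palindromes satisfy it.
    return list(reversed(xs)) == xs

ok = for_all(int_list, reverse_twice_is_identity)   # no counterexample expected
bad = for_all(int_list, reverse_is_identity)        # some non-palindromic list
```

The true property survives every generated list, while the false one is refuted by the first non-palindromic list the generator produces.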

The four property shapes that actually catch bugs

Stating a useful invariant is a design act, and a weak property (“the result is a number”) tests almost nothing while looking thorough. There are four shapes worth memorizing, because most real properties are an instance of one. Round-trip: decode(encode(x)) === x, for any serialize/parse, compress/decompress, or save/load pair. This is the most reached-for shape, and it would likely have surfaced the money bug early, provided the generator emits decimals with a third decimal place. Invariant: a fact that always holds after the operation, such as sort(xs).length === xs.length, the output being ordered, or a total being non-negative. Oracle: a trusted reference agrees with the new code, for example a rewrite versus the old function, or the fast path versus the slow path. Metamorphic: a relation between two related runs, such as search(q) results ⊇ search(q + " AND x").

| Property shape | Form of the assertion | Where it fits |
| --- | --- | --- |
| Round-trip | `decode(encode(x)) === x` | JSON serialize/parse, compress, save/load, money format |
| Invariant | a fact that always holds after the op | `sort(xs).length === xs.length`; output ordered |
| Oracle | new impl agrees with a trusted reference | rewrite vs old fn; fast path vs slow path |
| Metamorphic | relation between two related runs | `search(q)` ⊇ `search(q + " AND x")` |
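The four shapes can be sketched side by side in a few lines of Python. This is an illustration under stated assumptions, not a framework: the generator `gen_ints`, the reference `slow_max`, and the toy `search` function are all hypothetical names made up for the example, and a plain loop over a seeded `random.Random` stands in for the framework's input generation.

```python
import json
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

def gen_ints():
    """Hypothetical generator: lists of 0 to 15 small integers."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 15))]

def slow_max(xs):
    """Trusted but naive reference implementation (the oracle)."""
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best

def search(items, lo, hi=None):
    """Hypothetical search: items >= lo, optionally also <= hi."""
    return [x for x in items if x >= lo and (hi is None or x <= hi)]

checked = 0
for _ in range(200):
    xs = gen_ints()
    # Round-trip: decode(encode(x)) == x
    assert json.loads(json.dumps(xs)) == xs
    # Invariant: sorting keeps the length and yields ordered output
    ys = sorted(xs)
    assert len(ys) == len(xs)
    assert all(a <= b for a, b in zip(ys, ys[1:]))
    # Oracle: the built-in agrees with the trusted reference
    if xs:
        assert max(xs) == slow_max(xs)
    # Metamorphic: adding a condition can only narrow the result set
    assert set(search(xs, 0, 50)) <= set(search(xs, 0))
    checked += 1
```

Note that none of the four assertions states an expected output for a specific input; each one states a relation that must hold for whatever the generator produces.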

Shrinking is why the counterexample is debuggable

The reason property testing is usable and not just noise is shrinking. When the framework finds a failing input it does not hand you the giant random value it happened to generate; it automatically searches for a simpler input that still fails, typically by repeatedly trying smaller candidates (shorter lists, numbers closer to zero) and keeping any that still violate the property. fast-check might generate a failing list of 40 random integers, then shrink it down to [23, 22], a minimal pair that still violates the property. Hypothesis shrinks in the same spirit and replays the final minimal example before reporting, so that a failure which does not reproduce is flagged as flaky instead of being presented as a clean counterexample.
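To make the mechanism tangible, here is a deliberately naive greedy shrinker in plain Python. It is a sketch, not how fast-check or Hypothesis actually shrink (their strategies are more sophisticated), and the names `simpler`, `shrink`, and `fails` are hypothetical. The property under attack is the intentionally false claim that every list is already sorted.

```python
def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))

def fails(xs):
    """The (deliberately false) property 'every list is sorted' is violated."""
    return not is_sorted(xs)

def simpler(xs):
    """Yield simpler variants: drop one element, or move one element toward 0."""
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]
    for i, x in enumerate(xs):
        if x != 0:
            yield xs[:i] + [0] + xs[i + 1:]            # jump straight to zero
            step = 1 if x > 0 else -1
            yield xs[:i] + [x - step] + xs[i + 1:]     # one step toward zero

def shrink(value, still_fails):
    """Greedy shrink: keep taking the first simpler variant that still fails."""
    improved = True
    while improved:
        improved = False
        for candidate in simpler(value):
            if still_fails(candidate):
                value = candidate
                improved = True
                break
    return value

original = [483, -29, 0, 17, -6, 92]
minimal = shrink(original, fails)
print(minimal)  # [0, -1]
```

Every accepted step either shortens the list or moves a value closer to zero, so the loop terminates, and the result is a counterexample from which no single simplification still fails.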

That is the difference between a usable bug report and an unusable one. Without shrinking you get “it failed on [483, -29, 0, 17, -6, 92, ...]” and bisect by hand; with shrinking you get “it failed on [0, -1]” and the root cause is often obvious on sight. Reproduction is handled separately. fast-check prints the seed (and the path to the counterexample) with the failure report, and re-running with those values replays the exact failing case on your laptop. Hypothesis saves failing examples to a local example database and replays them on the next run, and also supports pinning a run to a fixed seed. Either way, the otherwise-scary “random tests” become deterministic to debug. The generator finds the bug; the shrinker makes it cheap to understand; the seed makes it reproducible.
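The role of the seed can be sketched with the standard library alone. The runner `run_property` and the deliberately false property `all_distinct` below are hypothetical; the point is only that when every generated input is derived from one seeded generator, the same seed yields the same failure.

```python
import random

def run_property(prop, seed, runs=100):
    """Generate inputs from a seeded RNG; report the first failure with its seed."""
    rng = random.Random(seed)
    for run in range(runs):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(0, 40))]
        if not prop(xs):
            return {"seed": seed, "run": run, "input": xs}
    return None

def all_distinct(xs):
    # Deliberately false property: generated lists often contain duplicates.
    return len(set(xs)) == len(xs)

report = run_property(all_distinct, seed=1234)
# Re-running with the reported seed reproduces the identical failing input.
replay = run_property(all_distinct, seed=report["seed"])
```

This only holds because the property and the generator read nothing except the seeded `rng`; any hidden source of randomness would break the replay.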

Why this works

The senior objection to property tests is real: generated randomness is a flake risk. A property that secretly depends on the system clock, an unseeded Math.random, or external state can pass 999 runs and fail the 1000th, turning CI red on an unrelated commit. The discipline that defuses it is twofold: make the property pure and let the framework own all input generation, and pin the seed the moment a flake appears so the failure is reproducible rather than a coin toss. A property test you can’t reproduce on demand isn’t a test; it’s a rumor. Reproducibility, not raw cleverness, is what makes generated testing trustworthy in CI.
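One way to keep a time-dependent property pure is to pass the current time in as a generated argument instead of reading the clock. The function `is_expired` and its parameters are hypothetical and exist only for this sketch.

```python
import random

def is_expired(expires_at: int, now: int) -> bool:
    """Pure: the current time is an argument, not read from the system clock."""
    return now >= expires_at

rng = random.Random(42)  # all inputs, including "now", come from the seeded RNG
failures = []
for _ in range(1000):
    issued = rng.randint(0, 2_000_000_000)   # generated timestamp, epoch seconds
    ttl = rng.randint(1, 86_400)             # generated lifetime, up to one day
    # A freshly issued token must not be expired yet...
    if is_expired(issued + ttl, now=issued):
        failures.append((issued, ttl))
    # ...and it must be expired exactly when its lifetime has elapsed.
    if not is_expired(issued + ttl, now=issued + ttl):
        failures.append((issued, ttl))
```

Because nothing here touches wall-clock time or global state, a failing `(issued, ttl)` pair would fail again on every rerun with the same seed.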

Pick the best fit

You're rewriting a battle-tested but slow CSV parser used company-wide. How do you test the rewrite for highest leverage?

Quiz

A teammate says '94% coverage — why would property tests find anything new?' What's the precise answer?

Quiz

A property fails on a randomly generated input. Why does shrinking matter, and how is it different from the printed seed?

Order the steps

Order the lifecycle of a property test that catches the money bug:

1 State the invariant: parse(format(x)) === x for any decimal x
2 The framework generates hundreds of decimals, including boundary values
3 A generated value ending in .005 violates the property
4 Shrinking reduces it to the minimal failing value, e.g. 0.005
5 Pin the printed seed, replay deterministically, and fix the rounding

The framework generates hundreds of inputs and checks the invariant on each. If it holds, the next input is tried. If it fails, the shrinker finds the minimal counterexample; a seed is printed so the exact failure replays deterministically.
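The lifecycle above can be sketched end to end in plain Python. Several assumptions are baked in and should be read as such: the invariant is expressed here as "the two rounding paths agree", which is one way to state it for the bug described; amounts are generated as integer thousandths so that third-decimal values appear; `round_path_a`, `round_path_b`, `find_failure`, and `shrink` are hypothetical names; and the shrinker is a naive upward scan, not what a real framework does.

```python
import random
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal("0.01")

def round_path_a(x):
    return x.quantize(CENT, rounding=ROUND_HALF_EVEN)   # hypothetical path A

def round_path_b(x):
    return x.quantize(CENT, rounding=ROUND_HALF_UP)     # hypothetical path B

def paths_agree(mills):
    """Invariant: both paths round the same amount to the same cents."""
    x = Decimal(mills) / 1000          # mills = thousandths of a currency unit
    return round_path_a(x) == round_path_b(x)

def find_failure(seed, runs=500):
    """Generate amounts from 0.000 to 1000.000; return the first that fails."""
    rng = random.Random(seed)
    for _ in range(runs):
        mills = rng.randint(0, 1_000_000)
        if not paths_agree(mills):
            return mills
    return None

def shrink(mills):
    """Naive shrinker: the smallest non-negative amount that still fails."""
    for candidate in range(mills):
        if not paths_agree(candidate):
            return candidate
    return mills

SEED = 2024                                   # pinned so the run can be replayed
found = find_failure(SEED)
minimal = None if found is None else Decimal(shrink(found)) / 1000
replayed = find_failure(SEED)                 # same seed, same counterexample
print(minimal)  # 0.005
```

With the minimal value in hand, the fix is to make both paths use the same rounding mode, after which `find_failure` should return `None` for any seed.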

Recall before you leave

01
We have 94% coverage. Make the case for adding property tests anyway.
02
Walk through how a property test reports a failure usefully, from generation to a reproducible fix.

Recap

Example-based tests map one fixed input to one expected output, so you only test the cases you can imagine — which means the suite’s blind spots are precisely your own, the way the money formatter stayed green for eight months because every example was a round number and the .005 disagreement lived in the gap between them. Property-based testing inverts this: you state an invariant true for all valid inputs and the framework generates hundreds of them — empty, huge, duplicate-heavy, boundary — to attack it, with fast-check at 100 runs by default and Hypothesis and QuickCheck doing the same across languages. The four shapes worth knowing are round-trip (decode∘encode is identity, the most reached-for and the one that catches the money bug), invariant (a fact always true after the op), oracle (a trusted reference agrees, ideal for rewrites), and metamorphic (a relation between related runs). Shrinking is what makes a failure debuggable: the framework reduces a 40-element failing array to [23, 22], and a printed seed replays the exact run. The cost is real — properties are harder to write, slower, and flaky if impurity leaks in — so the discipline is purity, framework-owned generation, and pinning the seed when a flake appears. Now when you see a function that serializes, formats, or transforms data, your first question should be: can I state a round-trip invariant and let the generator find what my examples missed?
