Crux Read real test code — a first failing test, a fragile mock-heavy test, a property, and a surviving mutant — and pick the fix or diagnosis a senior makes first.
Your altitude — climbing toward senior
ZeroJuniorMiddleSenior
You are at senior altitude — in orbit
◷ 14 min
Test quality is judged in the test file, not the coverage report. Read each snippet the way you’d read a colleague’s PR, then choose the change a senior would make first.
Goal
Practise the eye you bring to a real review: spot the test that asserts structure instead of behavior, write the failing test that drives design, state a property that beats examples, and read what a surviving mutant is telling you.
Snippet 1 — the first failing test
// Driving a not-yet-built refund rule, test-first.test("refund for a fully-shipped order is rejected", () => { const order = anOrder({ status: "SHIPPED" }); const result = refundPolicy.evaluate(order); expect(result.allowed).toBe(false); expect(result.reason).toBe("ALREADY_SHIPPED");});
Quiz
Completed
Before refundPolicy exists, what does writing this test first buy you, and why is it shaped well?
Heads-up The red step is the load-bearing one. Calling evaluate(order) before it exists forces the interface decisions — name, arguments, return shape, failure mode — at the cheapest possible moment, before any call sites depend on them.
Heads-up That couples the test to internal call structure — the fragile-test anti-pattern. It would break on a harmless refactor and could stay green while the refund outcome is wrong. Assert the outcome, not the dance.
Heads-up A single passing example proves one case, not the absence of bugs. The red step's product is design feedback; broad bug-finding is what properties and mutation testing add later.
Snippet 2 — the fragile test
test("checkout charges the customer", () => { const tax = mock(TaxRule); // pure, owned by us const gateway = mock(PaymentGateway); // third-party network when(tax.apply(100)).thenReturn(108); checkout.pay(cart, tax, gateway); verify(tax.apply(100)).once(); // assert the call happened verify(gateway.charge(108)).once();});
Quiz
Completed
A behavior-preserving refactor of checkout turns this test red, yet a real total-computation bug once shipped with it green. What's the root cause?
Heads-up The matchers are fine; the design of the test is the defect. Asserting interactions on owned, pure code is the over-mocking trap regardless of how the matchers are written.
Heads-up Concurrency isn't the issue. The test breaks on refactor because it asserts which calls happened, and misses bugs because it stubbed the result it should have computed.
Heads-up Coverage is irrelevant here — the line ran. The bug slipped through because the asserted total was a stubbed constant (108), never a real computation. Assert the computed state with the real rule.
Snippet 3 — the property
// fast-check: rewriting a battle-tested CSV parser, oldParse is the trusted reference.fc.assert(fc.property(fc.string(), (raw) => { expect(newParse(raw)).toEqual(oldParse(raw)); // oracle property}), { numRuns: 1000 });
Quiz
Completed
Which statement about this property is correct?
Heads-up A round-trip checks parse∘serialize against identity and can pass while both are wrong the same way. An oracle compares against an independent trusted reference, so it catches divergences a round-trip can't.
Heads-up Reproducibility comes from the printed seed, not the run count. Re-run with the seed and you replay the exact failing case; numRuns just sets how hard the generator attacks.
Heads-up fc.string() generates arbitrary strings including quotes, commas, and newlines — exactly the quoted-comma and embedded-newline horrors that break real CSV parsers. That breadth is the point.
Snippet 4 — the surviving mutant
// Production code under mutation testing:function withinBudget(amount, limit) { return amount <= limit; // Stryker mutated this to: amount < limit}// The full suite still passed after the mutation. Mutant SURVIVED.
Quiz
Completed
The '<=' → '<' mutant survived with 100% line coverage. What does that tell you, and what's the fix?
Heads-up They differ at exactly amount === limit. An equivalent mutant has no input that distinguishes it; this one does, so it's a real gap, not noise to dismiss.
Heads-up The premise is 100% line coverage — the line ran every time. It survived because no assertion distinguished the boundary case, which is precisely what coverage can't measure.
Heads-up Coverage is already maxed and still missed it. The fix is a specific assertion at amount === limit; chasing a coverage number that's already 100% changes nothing.
Recap
Every test in this unit is read the same way. The first failing test is design feedback — it fixes the interface and asserts an observable outcome, so it survives refactors. The fragile test over-mocks owned, pure code: it breaks on refactor and ships bugs green behind stubbed results — use real objects and assert state. The oracle property lets a trusted reference define correctness over generated input and shrinks to a minimal counterexample. And a surviving mutant is a named missing assertion — most often a boundary the suite never fed. Diagnose from the test, fix the assertion, re-run to confirm the mutant is killed.