Outbox pattern: build a crash-safe event pipeline

Hands-on project — build an order service that eliminates the dual write with a transactional outbox, ship events with a polling relay, and prove no event is lost or double-applied under injected crashes.

Level: Senior · Duration: 240 min

Reading about the dual-write gap is not the same as watching an event vanish and then closing the hole yourself. Build a small order service, prove it loses events the naive way, then add a transactional outbox plus an idempotent consumer and show — with crashes injected on purpose — that nothing is lost and nothing is applied twice.

Goal

Turn the unit’s mental model into a working pipeline: write the business row and an outbox row in one transaction, ship rows with a crash-safe polling relay, scale the relay without double-publishing, and verify the at-least-once guarantee holds end to end under injected failures.
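The first step of that goal, writing the business row and the outbox row in one transaction, can be sketched as follows. This is a minimal illustration rather than the lesson's reference solution: it uses Python's built-in sqlite3 module so it runs anywhere, and the table layout, the `place_order` function, and the `crash_between_writes` flag are all assumptions made for the example.

```python
import json
import sqlite3
import uuid


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical orders + outbox schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER NOT NULL);
        CREATE TABLE outbox (
            event_id     TEXT PRIMARY KEY,
            aggregate_id TEXT NOT NULL,
            type         TEXT NOT NULL,
            payload      TEXT NOT NULL,
            sent         INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def place_order(conn: sqlite3.Connection, total_cents: int,
                crash_between_writes: bool = False) -> str:
    """Insert the order and its OrderPlaced outbox row atomically."""
    order_id = str(uuid.uuid4())
    # `with conn:` commits if the block succeeds and rolls back if it raises,
    # so either both rows become visible or neither does.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, total_cents) VALUES (?, ?)",
            (order_id, total_cents),
        )
        if crash_between_writes:
            # Injected failure: simulates the process dying between the writes.
            raise RuntimeError("injected crash between business and outbox write")
        conn.execute(
            "INSERT INTO outbox (event_id, aggregate_id, type, payload) "
            "VALUES (?, ?, ?, ?)",
            (str(uuid.uuid4()), order_id, "OrderPlaced",
             json.dumps({"order_id": order_id, "total_cents": total_cents})),
        )
    return order_id
```

Calling `place_order(conn, 500, crash_between_writes=True)` raises and leaves both tables unchanged. That is the property the naive dual write lacks: there is no state in which an order exists without its pending event.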

Project

Objective

Build an order service that publishes an OrderPlaced event for every committed order with zero lost events under crash injection, using a transactional outbox and a polling relay, and prove the at-least-once guarantee with an idempotent consumer that never double-applies.
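One way to get a consumer that never double-applies is to record each event id in an inbox table inside the same transaction as the side effect. The sketch below assumes that approach. The `inbox` and `revenue` tables and the `handle_order_placed` function are hypothetical names chosen for illustration, and sqlite3 stands in for whatever database the consumer really uses.

```python
import sqlite3


def open_consumer_db() -> sqlite3.Connection:
    """In-memory consumer database: a dedupe table plus one side-effect table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE inbox (
            event_id     TEXT PRIMARY KEY,
            processed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE revenue (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL);
        INSERT INTO revenue (id, total_cents) VALUES (1, 0);
        """
    )
    return conn


def handle_order_placed(conn: sqlite3.Connection, event_id: str,
                        total_cents: int) -> bool:
    """Apply the event once; return False if it was a duplicate delivery."""
    with conn:  # dedupe record and side effect commit together or not at all
        cur = conn.execute(
            "INSERT OR IGNORE INTO inbox (event_id) VALUES (?)", (event_id,)
        )
        if cur.rowcount == 0:
            # The event id is already in the inbox: a redelivery, so do nothing.
            return False
        # The side effect is deliberately NOT naturally idempotent (an
        # increment), so any double-apply would show up in the total.
        conn.execute(
            "UPDATE revenue SET total_cents = total_cents + ? WHERE id = 1",
            (total_cents,),
        )
    return True
```

Delivering the same `event_id` twice returns `True` and then `False`, and the running total only moves once. The dedupe depends on the event id being stable across redeliveries, which is why the id is generated when the outbox row is written and not when the relay publishes.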

Requirements

Acceptance criteria

A before/after comparison: the naive dual write loses at least one event under crash injection; the outbox version loses zero across the same crash schedule.
Evidence (logs or a counter) that the relay delivered at least one event more than once under crash injection, AND that the side effect was still applied exactly once — proving the consumer's idempotency.
With multiple relay replicas running, no event id is published by two replicas in the same pass: each replica claims its batch with SELECT ... FOR UPDATE SKIP LOCKED, so the claimed batches are disjoint.
A short write-up: where the dual-write gap was, why one local transaction closes it, why delivery is still at-least-once, and how the consumer dedupes.
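The duplicate-delivery evidence asked for above can be produced with a relay that publishes first and marks the row sent second, plus a crash injected between those two steps. The following is a single-relay sketch under stated assumptions: the schema, `relay_pass`, and the `crash_after` parameter are invented for the example, and `publish` is any callable standing in for a real broker client. SQLite has no SKIP LOCKED, so the multi-replica claim is not shown here; on PostgreSQL the SELECT would run inside a transaction with FOR UPDATE SKIP LOCKED appended.

```python
import sqlite3
from typing import Callable, Iterable, Optional


class InjectedCrash(Exception):
    """Stands in for the relay process dying mid-pass."""


def open_outbox(event_ids: Iterable[str]) -> sqlite3.Connection:
    """In-memory outbox pre-filled with unsent events, in insertion order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE outbox ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " event_id TEXT NOT NULL UNIQUE,"
        " sent INTEGER NOT NULL DEFAULT 0)"
    )
    with conn:
        conn.executemany(
            "INSERT INTO outbox (event_id) VALUES (?)", [(e,) for e in event_ids]
        )
    return conn


def relay_pass(conn: sqlite3.Connection, publish: Callable[[str], None],
               batch_size: int = 10, crash_after: Optional[str] = None) -> None:
    """Publish unsent rows oldest first, marking each sent only after publish."""
    rows = conn.execute(
        "SELECT event_id FROM outbox WHERE sent = 0 ORDER BY seq LIMIT ?",
        (batch_size,),
    ).fetchall()
    for (event_id,) in rows:
        publish(event_id)
        if event_id == crash_after:
            # Crash window: the broker has the event, the outbox does not know.
            raise InjectedCrash(event_id)
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE event_id = ?", (event_id,))


delivered = []  # every publish call, duplicates included
conn = open_outbox(["e1", "e2", "e3"])
try:
    relay_pass(conn, delivered.append, crash_after="e1")
except InjectedCrash:
    pass  # "restart" the relay
relay_pass(conn, delivered.append)
# delivered is now ["e1", "e1", "e2", "e3"]: nothing lost, e1 delivered twice.
```

Publishing before marking the row sent is what makes the relay at-least-once instead of at-most-once. Reversing the order would turn the same crash into a lost event.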

Senior stretch

Swap the polling relay for a CDC relay (Debezium tailing the WAL, or your DB's logical replication) and measure the end-to-end latency drop versus polling at a 500ms interval.
Add outbox cleanup that does not hurt the write path: either batched deletes of old sent rows or daily partitions reaped with DROP PARTITION, and show the unsent-row poll stays fast as the table grows.
Preserve per-aggregate ordering: key events by aggregate id so all events for one order land on one partition and one worker, and demonstrate ordering holds even with multiple relays.
Add an inbox-retention sweep (drop processed event ids older than N hours) and reason about the window: too short risks re-applying a very late duplicate, too long bloats the dedupe table.
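For the per-aggregate ordering item, the core idea is that the partition is a pure function of the aggregate id. The sketch below illustrates only that idea. `partition_for` and `route` are hypothetical helpers, and the SHA-256 based hash is an arbitrary stable choice; a real broker client usually performs this hashing itself when given a message key.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def partition_for(aggregate_id: str, num_partitions: int) -> int:
    """Map an aggregate id to a partition with a hash that is stable across runs."""
    # Python's built-in hash() is salted per process for strings, so it is
    # unsuitable here; a cryptographic digest is overkill but deterministic.
    digest = hashlib.sha256(aggregate_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def route(events: Iterable[Tuple[str, int]],
          num_partitions: int) -> Dict[int, List[Tuple[str, int]]]:
    """Append each (aggregate_id, sequence) event to its aggregate's partition."""
    partitions: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for aggregate_id, seq in events:
        partitions[partition_for(aggregate_id, num_partitions)].append(
            (aggregate_id, seq)
        )
    return partitions


# Interleaved events for two orders, as two relays might emit them.
events = [("order-a", 1), ("order-b", 1), ("order-a", 2), ("order-b", 2), ("order-a", 3)]
routed = route(events, num_partitions=4)
```

Because every event for `order-a` lands in the same list in arrival order, a single worker per partition sees them in sequence. This only holds if the relays also hand events for one aggregate to the broker in outbox order, which is the part the stretch task asks you to demonstrate.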

Recap

This is the pipeline you will reach for whenever a write must reliably trigger a notification. First prove that the naive dual write loses events. Then write the business row and an outbox row in one local transaction so the intent to publish is durable, ship rows with a relay that claims disjoint batches via SKIP LOCKED, and make consumers idempotent on a stable event id so the unavoidable at-least-once duplicate is a no-op. Build it once with crashes injected on purpose, and the production version, one that never silently drops a write, becomes muscle memory.
