Queues, Streams, Eventing

Delivery guarantees: build a crash-proof payment consumer

Crux: Hands-on project. Build a payment consumer that survives crashes with zero double-charges, then prove effectively-once processing under injected failures with before/after duplicate counts.
◷ 240 min

Reading about double-charges is not the same as stopping one. Build a small payment consumer on an at-least-once queue, drive it into every failure leg with injected crashes and timeout misconfigurations, and harden it until the duplicate count is exactly zero — with evidence at every step.

Goal

Turn the unit’s mental model into a reproducible engineering loop: reproduce duplicates and loss on purpose, add an INSERT-first transactional dedup plus an Idempotency-Key, close the producer-side dual-write with an outbox, and prove effectively-once under chaos with before/after numbers.
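
As a rough illustration of the INSERT-first transactional dedup named above, here is a minimal sketch. It assumes SQLite (standard-library `sqlite3`) standing in for the real database, and the table names `processed` and `charges` and the function `handle` are hypothetical. The dedup row is inserted first, in the same transaction as the business write, so a redelivered message trips the primary-key constraint and the whole transaction rolls back before any second charge is recorded.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    # isolation_level=None puts the driver in autocommit mode so the
    # transaction boundaries below are explicit BEGIN / COMMIT / ROLLBACK.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE charges (order_id TEXT NOT NULL, amount_cents INTEGER NOT NULL)"
    )
    return conn


def handle(conn: sqlite3.Connection, message_id: str, order_id: str, amount_cents: int) -> str:
    """Process one delivery; return 'charged' or 'duplicate'. Ack in both cases."""
    conn.execute("BEGIN")
    try:
        # INSERT-first: claim the message id before doing any business write.
        conn.execute("INSERT INTO processed (message_id) VALUES (?)", (message_id,))
        conn.execute(
            "INSERT INTO charges (order_id, amount_cents) VALUES (?, ?)",
            (order_id, amount_cents),
        )
        conn.execute("COMMIT")
        return "charged"
    except sqlite3.IntegrityError:
        # UNIQUE violation: this message was already processed. Roll back and ack.
        conn.execute("ROLLBACK")
        return "duplicate"
    except Exception:
        conn.execute("ROLLBACK")
        raise  # do not ack; let the broker redeliver


conn = open_db()
print(handle(conn, "msg-1", "order-1", 500))  # first delivery
print(handle(conn, "msg-1", "order-1", 500))  # redelivery of the same message
```

In this sketch the "duplicate" branch is the place where a dedup_hit_rate counter would be incremented. The sketch only covers writes inside the database; a call to an external payment provider sits outside the transaction and still needs a stable Idempotency-Key.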

Project
Objective

Build a payment-processing consumer on an at-least-once queue (SQS, RabbitMQ, or Kafka) that achieves effectively-once: zero double-charges and zero lost events under injected consumer crashes, visibility-timeout expiry, and producer-side failures — proven with measured duplicate and loss counts, not assertions.
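
To get a "before" number, it helps to reproduce the double-charge on purpose. The following is a minimal sketch of a naive consumer, with an in-memory list standing in for the at-least-once queue; `Crash`, `naive_handle`, and `run` are hypothetical names, and a raised exception stands in for a killed process. The handler performs the charge and then dies before the ack, so the broker redelivers and the order is charged twice.

```python
from collections import Counter


class Crash(Exception):
    """Simulated consumer process death."""


def naive_handle(charges: Counter, order_id: str, crash_before_ack: bool) -> None:
    charges[order_id] += 1  # side effect happens with no dedup check
    if crash_before_ack:
        raise Crash("killed after charge, before ack")


def run(queue: list[str], crash_on_first_attempt: set[str]) -> Counter:
    charges: Counter = Counter()
    attempts: Counter = Counter()
    pending = list(queue)
    while pending:
        order_id = pending.pop(0)
        attempts[order_id] += 1
        crash = order_id in crash_on_first_attempt and attempts[order_id] == 1
        try:
            naive_handle(charges, order_id, crash)
        except Crash:
            pending.append(order_id)  # never acked, so the broker redelivers
    return charges


charges = run(["order-1", "order-2", "order-3"], crash_on_first_attempt={"order-2"})
double_charges = sum(count - 1 for count in charges.values())
print(dict(charges), "double charges:", double_charges)
```

The `double_charges` figure is the kind of count that would go in the naive column of the before/after table.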

Requirements
Acceptance criteria
  • A before/after table: double-charge count and lost-event count, measured under an identical chaos run (kill consumer mid-processing, expire the timeout, drop a publish) — naive version vs hardened version. Hardened must show zero of both.
  • A demonstration that killing the consumer AFTER the DB commit but BEFORE the ack causes a redelivery that is silently deduplicated (UNIQUE violation -> rollback -> ack), with the dedup_hit_rate metric registering the catch.
  • A demonstration that a dropped/failed publish on the producer side does NOT lose the event, because the outbox row stays pending and the sender republishes it.
  • A short write-up mapping each fix to the failure leg it closes (Leg 1 / Leg 3 / dual-write / timeout) and naming why consumer idempotency — not a broker setting — is the load-bearing guarantee.
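
The outbox criterion above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: SQLite stands in for the real database, and the `orders` and `outbox` tables and the `place_order` and `send_pending` functions are hypothetical. The order row and the outbox row are committed in one transaction, and the sender marks a row as sent only after the publish call returns, so a failed publish leaves the row pending for the next pass.

```python
import sqlite3
from typing import Callable


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions
    conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount_cents INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def place_order(conn: sqlite3.Connection, order_id: str, amount_cents: int) -> None:
    """Write the business row and the event row atomically (no dual-write)."""
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO orders (order_id, amount_cents) VALUES (?, ?)", (order_id, amount_cents))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (order_id,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def send_pending(conn: sqlite3.Connection, publish: Callable[[str], None]) -> int:
    """Publish pending rows in order; return how many were marked sent."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    sent = 0
    for row_id, payload in rows:
        try:
            publish(payload)
        except Exception:
            break  # row stays pending; stop here so later rows keep their order
        conn.execute("UPDATE outbox SET status = 'sent' WHERE id = ?", (row_id,))
        sent += 1
    return sent


def broker_down(payload: str) -> None:
    raise ConnectionError("publish dropped")


conn = open_db()
place_order(conn, "order-1", 500)
published: list[str] = []
print(send_pending(conn, broker_down))       # dropped publish, nothing marked sent
print(send_pending(conn, published.append))  # next pass republishes the pending row
print(published)
```

A crash between the publish and the status update would make the sender publish the same event twice on the next pass. That is expected in this design: the outbox gives at-least-once publishing, and the consumer-side dedup absorbs the repeat.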
Senior stretch
  • Add Kafka idempotent producer + transactions on the within-Kafka path and measure the throughput cost (~3% vs ~20-30%); then show the cross-system Postgres write STILL needs the consumer dedup, proving where the Kafka transaction boundary ends.
  • Build a chaos harness that randomly kills the consumer at each step (before charge, after charge before commit, after commit before ack) on a loop for 1000 messages, and assert charges-per-order == 1 for every order at the end.
  • Add a one-page on-call runbook: how to read a dedup_hit_rate spike, the SQS visibility-timeout rule, the DLQ redrive checklist (snapshot, sample-audit, rate-limit), and the duplicate-vs-loss decision tree.
  • Swap the outbox poller for CDC (Debezium reading the WAL) and compare latency and operational load against the polling sender.
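
The chaos harness from the stretch list might look roughly like the sketch below. Everything here is a stand-in chosen for illustration: an in-memory deque plays the at-least-once queue, SQLite plays the database, `FakeProvider` is a hypothetical payment provider that honors an idempotency key, and a raised `Crash` plays a killed process. The harness injects a crash with probability `p` at each of the three steps, requeues any unacked message, and reports charges per order at the end.

```python
import random
import sqlite3
from collections import Counter, deque


class Crash(Exception):
    """Simulated consumer process death at a named step."""


class FakeProvider:
    """Hypothetical payment provider that honors an idempotency key."""

    def __init__(self) -> None:
        self.calls = 0
        self.effective: Counter = Counter()  # order_id -> charges that moved money
        self._seen: set = set()

    def charge(self, idempotency_key: str, order_id: str) -> None:
        self.calls += 1
        if idempotency_key not in self._seen:  # a repeated key is a no-op
            self._seen.add(idempotency_key)
            self.effective[order_id] += 1


def maybe_crash(rng: random.Random, p: float, step: str) -> None:
    if rng.random() < p:
        raise Crash(step)


def handle(conn, provider, rng, p, order_id, stats) -> None:
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO processed (message_id) VALUES (?)", (order_id,))
        maybe_crash(rng, p, "before charge")
        provider.charge(idempotency_key=order_id, order_id=order_id)
        maybe_crash(rng, p, "after charge, before commit")
        conn.execute("COMMIT")
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # already processed: count the catch, then ack
        stats["dedup_hits"] += 1
        return
    except Crash:
        conn.execute("ROLLBACK")  # a dead process loses its open transaction
        raise
    maybe_crash(rng, p, "after commit, before ack")


def run_chaos(n: int = 1000, p: float = 0.2, seed: int = 7):
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
    provider = FakeProvider()
    stats: Counter = Counter()
    queue = deque(f"order-{i}" for i in range(n))
    while queue:
        order_id = queue.popleft()
        try:
            handle(conn, provider, rng, p, order_id, stats)
        except Crash:
            stats["redeliveries"] += 1
            queue.append(order_id)  # no ack, so the message comes back
    stats["processed_rows"] = conn.execute("SELECT COUNT(*) FROM processed").fetchone()[0]
    return provider.effective, stats


charges, stats = run_chaos()
assert all(count == 1 for count in charges.values()), "double charge detected"
assert len(charges) == stats["processed_rows"] == 1000, "lost event detected"
print(dict(stats))
```

Two details carry the result in this sketch. The idempotency key is derived from the order, so it stays stable across redeliveries, which is what makes the "after charge, before commit" crash safe. The dedup row is what makes the "after commit, before ack" crash safe. A real harness would kill the actual process (for example with SIGKILL) rather than raise an exception, since an exception still lets cleanup code run.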
Recap

This is the loop you will run in every real delivery-guarantee incident: reproduce the duplicate or loss on purpose, identify the failure leg, apply the structural fix (INSERT-first transactional dedup, stable Idempotency-Key, outbox for dual-write, timeout sized to processing), and verify with measured before/after counts under chaos — never assertions. Build it once on a toy payment consumer and the production version becomes muscle memory: at-least-once delivery, effectively-once processing, correctness enforced in the consumer.


Trademarks belong to their respective owners. Editorial reference only.