Crux Read real consumer code, SQL, and a metrics line, predict the delivery-guarantee behaviour, and pick the highest-leverage fix a senior engineer makes first.
Delivery-guarantee bugs are found in the consumer code, the SQL, and the metrics line — rarely in the broker. Read each snippet, predict where the duplicate or the loss comes from, then choose the fix a senior engineer reaches for first.
Goal
Practise the loop you run in every duplicate-charge incident: read the handler, find the crash window or the race, and reach for the structural fix rather than nudging a timeout.
Snippet 1 — the SELECT-then-act consumer
    def handle(msg):
        if db.execute("SELECT 1 FROM processed WHERE msg_id=%s", msg.id):
            return ack(msg)  # already done, skip
        stripe.charge(msg.amount)  # side effect
        db.execute("INSERT INTO processed (msg_id) VALUES (%s)", msg.id)
        ack(msg)
Quiz
Two consumers receive the same message during a rebalance. What happens, and what is the highest-leverage fix?
Heads-up There is no UNIQUE constraint shown, and even with one the charge happens BEFORE the INSERT: both consumers charge the card before either INSERT runs. The fix is to INSERT first, under a UNIQUE constraint, inside the same transaction as the side effect.
Heads-up At-least-once explicitly allows concurrent redelivery after a visibility timeout or during a rebalance. Assuming single delivery is the root error.
Heads-up A longer timeout shrinks the window but the SELECT/act race is a correctness bug independent of timing; a rebalance re-opens it regardless of timeout. Make the dedup atomic.
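The race and its fix can be replayed deterministically. The sketch below is a minimal illustration under stated assumptions, not a reference implementation: it uses SQLite through Python's sqlite3 module, a hypothetical ledger table stands in for a side effect that lives in the same database, and the helper names (racy_interleaving, claim_first, atomic_replay) are invented for the example. The two deliveries are replayed one after the other; under real concurrency the unique index is what arbitrates between them.

```python
import sqlite3


def racy_interleaving():
    """Replay SELECT-then-act with both SELECTs running before either INSERT."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE processed (msg_id TEXT)")  # no UNIQUE, as in Snippet 1
    db.execute("CREATE TABLE ledger (msg_id TEXT, amount INTEGER)")

    def seen():
        row = db.execute(
            "SELECT 1 FROM processed WHERE msg_id = ?", ("msg-1",)
        ).fetchone()
        return row is not None

    a_seen = seen()  # consumer A checks: not processed yet
    b_seen = seen()  # consumer B checks before A has inserted: same answer
    for already_seen in (a_seen, b_seen):
        if not already_seen:
            # The side effect runs once per consumer that passed the check.
            db.execute("INSERT INTO ledger VALUES (?, ?)", ("msg-1", 100))
            db.execute("INSERT INTO processed (msg_id) VALUES (?)", ("msg-1",))
    return db.execute("SELECT COUNT(*) FROM ledger").fetchone()[0]


def claim_first(db, msg_id, amount):
    """INSERT the dedup row first; the UNIQUE key lets only one delivery proceed."""
    try:
        with db:  # one transaction: commit on success, rollback on exception
            db.execute("INSERT INTO processed (msg_id) VALUES (?)", (msg_id,))
            db.execute("INSERT INTO ledger VALUES (?, ?)", (msg_id, amount))
        return "processed"
    except sqlite3.IntegrityError:
        return "duplicate"  # already claimed: skip the work, ack the broker


def atomic_replay():
    """Deliver the same message twice to the claim-first handler."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE processed (msg_id TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE ledger (msg_id TEXT, amount INTEGER)")
    outcomes = [claim_first(db, "msg-1", 100) for _ in range(2)]
    return outcomes, db.execute("SELECT COUNT(*) FROM ledger").fetchone()[0]
```

The racy replay writes two ledger rows for one message, while the claim-first replay writes one and reports the second delivery as a duplicate. An external call such as a card charge cannot join a database transaction, so it still needs an idempotency key of its own.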
Snippet 2 — the transactional dedup
    BEGIN;
    INSERT INTO processed (msg_id) VALUES ('msg-7a3f');  -- UNIQUE(msg_id)
    UPDATE orders SET status = 'paid' WHERE id = 'O-123';
    COMMIT;
    -- on UNIQUE violation: ROLLBACK, log "duplicate", ack the broker
Quiz
The consumer COMMITs this transaction, then crashes before acking the broker. On redelivery, what happens?
Heads-up The UNIQUE violation on the INSERT aborts the transaction before the UPDATE can re-apply. INSERT-first is exactly what prevents the second side effect.
Heads-up The crash happened BEFORE the ack, so the broker still has the message and redelivers it — no loss. This is the safe ordering: commit work, then ack.
Heads-up There is only one consumer at a time here (the original crashed). Even concurrently, the UNIQUE constraint serialises them via a violation, not a deadlock.
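The commit-then-crash-before-ack window can be walked through in a few lines. This is a hedged sketch rather than the lesson's own code: it assumes SQLite via Python's sqlite3 module, models the broker as a plain list of unacked message IDs, and adds a hypothetical times_paid column purely so the replay can show whether the UPDATE ran twice.

```python
import sqlite3


def setup():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE processed (msg_id TEXT PRIMARY KEY)")
    db.execute(
        "CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT, times_paid INTEGER)"
    )
    db.execute("INSERT INTO orders VALUES ('O-123', 'pending', 0)")
    db.commit()
    return db


def handle(db, msg_id, order_id):
    """Apply the work at most once; either outcome means the message can be acked."""
    try:
        with db:  # INSERT and UPDATE commit together or not at all
            db.execute("INSERT INTO processed (msg_id) VALUES (?)", (msg_id,))
            db.execute(
                "UPDATE orders SET status = 'paid', times_paid = times_paid + 1 "
                "WHERE id = ?",
                (order_id,),
            )
        return "applied"
    except sqlite3.IntegrityError:
        return "duplicate"  # work was committed by an earlier delivery


def replay_crash_before_ack():
    db = setup()
    unacked = ["msg-7a3f"]  # hypothetical broker state: messages awaiting an ack

    first = handle(db, unacked[0], "O-123")
    # The process dies here: the transaction is committed, the ack was never sent,
    # so the message is still in `unacked` and the broker redelivers it.

    second = handle(db, unacked[0], "O-123")
    unacked.pop(0)  # this time the ack goes through

    times_paid = db.execute(
        "SELECT times_paid FROM orders WHERE id = 'O-123'"
    ).fetchone()[0]
    return first, second, times_paid, unacked
```

In this replay the first delivery is applied, the redelivery is reported as a duplicate, the order is marked paid exactly once, and the queue ends empty. Reversing the order (ack, then commit) would turn the same crash into a lost message instead.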
Snippet 3 — the external-API consumer
    def handle(msg):
        resp = stripe.charge(msg.amount, idempotency_key=msg.id)  # external dedup
        db.execute("UPDATE intents SET charge_id=%s WHERE msg_id=%s", resp.id, msg.id)
        ack(msg)
Quiz
The Stripe call succeeds, then the process dies before the UPDATE and the ack. The broker redelivers minutes later. What is the actual behaviour?
Heads-up The Idempotency-Key is derived from msg.id, not from the DB row. Stripe deduplicates on its side regardless of whether your UPDATE ran; it returns the cached charge.
Heads-up On redelivery the same key returns the cached charge, the UPDATE finally lands, and the ack clears the message. The intent row plus the stable key make the retry safe.
Heads-up For a repeated key within the TTL, Stripe replays the saved response of the first request, which here is the original HTTP 200 and charge body; it is not an error path.
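The retry path can be sketched without calling a real payment API. Everything below is an assumption made for illustration: FakeGateway is a hypothetical in-memory stand-in that mimics key-based dedup and is not Stripe's client library, the intents dictionary stands in for the intents table, and the crash is simulated with an exception.

```python
class FakeGateway:
    """Hypothetical payment API that saves the first response per idempotency key."""

    def __init__(self):
        self._saved = {}
        self.charges_created = 0

    def charge(self, amount, idempotency_key):
        if idempotency_key not in self._saved:
            # First time this key is seen: create the charge and save the response.
            self.charges_created += 1
            self._saved[idempotency_key] = {
                "id": f"ch_{self.charges_created}",
                "amount": amount,
            }
        # Repeated key: replay the saved response, no new charge.
        return self._saved[idempotency_key]


def handle(gateway, intents, msg, crash_after_charge=False):
    resp = gateway.charge(msg["amount"], idempotency_key=msg["id"])
    if crash_after_charge:
        raise RuntimeError("process died before the UPDATE and the ack")
    intents[msg["id"]] = resp["id"]  # stands in for UPDATE intents SET charge_id=...
    return True  # ack


def replay():
    gateway, intents = FakeGateway(), {}
    msg = {"id": "msg-7a3f", "amount": 100}
    try:
        handle(gateway, intents, msg, crash_after_charge=True)
    except RuntimeError:
        pass  # no ack was sent, so the broker redelivers
    acked = handle(gateway, intents, msg)  # redelivery with the same key
    return gateway.charges_created, intents, acked
```

The replay ends with one charge created, the charge ID recorded against the message, and the message acked. The property doing the work is that the key comes from the message ID, which is identical on every redelivery; a key generated per attempt would defeat the dedup.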
Snippet 4 — the dedup metric
    metric: dedup_check_hit_rate
    steady state : 0.05%
    14:02-14:12  : 8.0%   (no deploy in this window)
    14:13+       : 0.06%
Quiz
Reading this metric, what most likely happened — and is it a correctness emergency?
Heads-up A hit-rate SPIKE means the UNIQUE constraint is firing MORE — duplicates are being caught, not leaking. A failing table would show errors or a drop in successful inserts, not a hit-rate rise.
Heads-up Idempotent-producer retries are deduplicated at the broker before the consumer ever sees them, so they cannot move the consumer-side dedup_check_hit_rate. The cause is redelivery to the consumer.
Heads-up Exactly-once delivery was never in effect — the system is at-least-once by design. A transient redelivery spike that the idempotent consumer absorbs is expected behaviour, not a broken guarantee.
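The reading of the metric can be written down as a small triage rule. This is a sketch under assumptions, not an established alerting policy: the 10x-over-baseline threshold, the function names, and the duplicate_side_effects input (for example, a count of orders paid twice from a reconciliation query) are all hypothetical.

```python
def hit_rate_pct(duplicates_caught, messages_checked):
    """Dedup hit rate as a percentage of messages that went through the check."""
    if messages_checked == 0:
        return 0.0
    return 100.0 * duplicates_caught / messages_checked


def spiking_windows(samples, baseline_pct, factor=10.0):
    """Return the labels of windows whose hit rate exceeds factor times baseline."""
    return [label for label, pct in samples if pct > baseline_pct * factor]


def triage(spiking, duplicate_side_effects):
    """Separate a leaked duplicate from a redelivery spike that dedup absorbed."""
    if duplicate_side_effects > 0:
        return "correctness emergency: duplicates leaked past dedup"
    if spiking:
        return "absorbed redelivery spike: look for a rebalance or timeout cause"
    return "steady state"


samples = [("14:02-14:12", 8.0), ("14:13+", 0.06)]
flagged = spiking_windows(samples, baseline_pct=0.05)
verdict = triage(flagged, duplicate_side_effects=0)
```

With the figures from the metric line, only the 14:02-14:12 window is flagged, and with no duplicate side effects found the verdict is an absorbed redelivery spike. The hit rate alone cannot distinguish the two cases, which is why the rule takes a second, independent signal.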
Recap
Every delivery-guarantee incident is read in code, SQL, and metrics: SELECT-then-act races under concurrent delivery; INSERT-first in one transaction makes redelivery harmless; an Idempotency-Key derived from a stable message ID extends dedup across an external API; and a dedup_check_hit_rate spike means the broker is redelivering, not that dedup failed. Diagnose the crash window or the race, apply the structural fix (atomic dedup, stable idempotency key), then confirm the duplicate path is closed, all before touching a single timeout.