Crux Read real pipeline config and consumer code — outbox plus CDC, offset-commit order, dedup, and DLQ handling — predict the behaviour, and pick the fix a senior would make first.
The pipeline’s bugs do not live in prose — they live in a transaction boundary, a commit order, a missing unique constraint, and a connector config. Read each snippet across the order pipeline and choose the fix a senior engineer reaches for first.
Goal
Practise reading the seams of an event-driven pipeline the way you read an incident: spot where atomicity, ordering, idempotency, or DLQ handling is silently broken, and name the highest-leverage correction.
Snippet 1 — the outbox write
-- order handler, one DB transaction
BEGIN;
  INSERT INTO orders (id, customer_id, status, total_cents)
    VALUES ('ord_9f', 'cust_3a', 'placed', 4200);
  INSERT INTO outbox (id, aggregate_id, type, payload, created_at)
    VALUES ('evt_7c', 'ord_9f', 'order.placed', '{"order_id":"ord_9f"}', now());
COMMIT;

-- separately, a relay later:
--   SELECT * FROM outbox WHERE sent = false ...
--   publish to Kafka, then UPDATE outbox SET sent = true
Quiz
What guarantee does this structure give, and what must downstream consumers do as a result?
Heads-up Only the two INSERTs share a transaction. The relay's publish and the sent-flag UPDATE are separate, non-atomic steps against two systems, so a crash between them re-publishes — at-least-once, not exactly-once.
Heads-up The relay re-publishes on a crash-before-mark, so the same event id can arrive twice. Without consumer dedup, order.placed gets processed twice.
Heads-up Mark-then-publish just converts a duplicate into a lost event: crash after the UPDATE and before publish, and the row looks sent but nothing was emitted. Publish-then-mark plus idempotent consumers is the safe choice.
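The publish-then-mark relay and the consumer-side dedup discussed above can be sketched as follows. This is a minimal illustration, not the pipeline's real code: it assumes an in-memory SQLite database standing in for the order store and a plain Python list standing in for the Kafka topic. The names `place_order`, `relay_once`, `crash_before_mark`, and `DedupConsumer` are hypothetical, and the `sent` column is inferred from the relay comment in the snippet.

```python
import sqlite3

def setup():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
    db.execute(
        "CREATE TABLE outbox (id TEXT PRIMARY KEY, aggregate_id TEXT, "
        "type TEXT, sent INTEGER NOT NULL DEFAULT 0)"
    )
    return db

def place_order(db, order_id, event_id):
    # One transaction: the order row and the outbox row commit together or not at all.
    with db:
        db.execute("INSERT INTO orders VALUES (?, 'placed')", (order_id,))
        db.execute(
            "INSERT INTO outbox (id, aggregate_id, type) VALUES (?, ?, 'order.placed')",
            (event_id, order_id),
        )

def relay_once(db, topic, crash_before_mark=False):
    # Publish-then-mark: two separate steps against two systems, so not atomic.
    rows = db.execute(
        "SELECT id, aggregate_id, type FROM outbox WHERE sent = 0 ORDER BY rowid"
    ).fetchall()
    for event_id, aggregate_id, event_type in rows:
        topic.append({"id": event_id, "order_id": aggregate_id, "type": event_type})
        if crash_before_mark:
            return  # simulated crash: the event is published but the row is still unsent
        with db:
            db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (event_id,))

class DedupConsumer:
    def __init__(self):
        self.seen = set()  # stand-in for a table with a unique constraint on event id
        self.processed = []

    def handle(self, event):
        if event["id"] in self.seen:
            return False  # duplicate delivery, skipped
        self.seen.add(event["id"])
        self.processed.append(event["order_id"])
        return True

db = setup()
topic = []
place_order(db, "ord_9f", "evt_7c")
relay_once(db, topic, crash_before_mark=True)  # publishes, then "crashes"
relay_once(db, topic)                          # restart: re-publishes, then marks
consumer = DedupConsumer()
results = [consumer.handle(event) for event in topic]
```

The topic ends up holding the same event id twice, which is the at-least-once behaviour; the consumer processes it once because the event id acts as the dedup key. In a real consumer the dedup record and the side effect would need to be written in the same transaction for this to hold across consumer crashes.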
This connector tails the outbox table and routes events to a Kafka topic keyed by aggregate_id. Two settings interact badly with a low-traffic deployment plus a stalled consumer. Which is the real production hazard?
Heads-up Routing by aggregate_id (the order id) is exactly right — it keys each order's events to one partition for per-order ordering. That is the intended design, not the hazard.
Heads-up Capturing the purpose-built outbox table is the deliberate choice — you control its shape and avoid coupling consumers to raw table schemas. Capturing every domain table is the alternative, not a bug here.
Heads-up The EventRouter SMT exists precisely to route outbox events by a field; route.by.field is a supported, common setting. The hazard is the slot/heartbeat interaction, not the routing.
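The slot/heartbeat interaction can be made concrete with a small pre-deploy check. This is a hedged sketch that assumes a Debezium PostgreSQL connector: `heartbeat.interval.ms` (where 0 disables heartbeats) and the EventRouter class name are Debezium settings, and `max_slot_wal_keep_size` (where -1 means unlimited) is a PostgreSQL server setting. The function `find_slot_hazards` and the example values are hypothetical and are not the lesson's actual connector config.

```python
def find_slot_hazards(connector_config, max_slot_wal_keep_size_mb):
    """Return warnings for the heartbeat / WAL-retention combination.

    connector_config: dict of connector properties (string keys and values).
    max_slot_wal_keep_size_mb: the PostgreSQL setting, -1 meaning unlimited.
    """
    warnings = []
    heartbeat_ms = int(connector_config.get("heartbeat.interval.ms", "0"))
    if heartbeat_ms == 0:
        warnings.append(
            "heartbeat.interval.ms is 0: with little traffic on the captured "
            "table, the replication slot position may not advance"
        )
    if max_slot_wal_keep_size_mb == -1:
        warnings.append(
            "max_slot_wal_keep_size is unlimited: a stalled slot can retain "
            "WAL until the disk fills"
        )
    return warnings

# Illustrative config only; values are assumptions for the example.
risky_config = {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "table.include.list": "public.outbox",
    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
    "heartbeat.interval.ms": "0",
}
safer_config = dict(risky_config, **{"heartbeat.interval.ms": "10000"})

risky_warnings = find_slot_hazards(risky_config, max_slot_wal_keep_size_mb=-1)
safer_warnings = find_slot_hazards(safer_config, max_slot_wal_keep_size_mb=10240)
```

Neither setting is a problem in isolation on a busy, healthy system. The hazard is the combination: nothing advances the slot during quiet periods, and nothing bounds how much WAL it may pin.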
Snippet 3 — the consumer commit order
for msg in consumer:                      # at-least-once Kafka consumer
    event = parse(msg.value)
    consumer.commit()                     # commit offset FIRST
    charge_payment(event.order_id,        # then do the side effect
                   event.amount_cents)
Quiz
This payment consumer commits the offset before charging. What is the failure mode, and what is the correct ordering?
Heads-up It prevents duplicates by losing messages: a crash after the commit drops the charge entirely. For payments a lost charge is worse than a duplicate; commit-after-process plus idempotency is the right combination.
Heads-up Parse order is irrelevant to the guarantee. The defect is committing the offset before the side effect runs, which turns a crash into a lost charge.
Heads-up Kafka redelivers from the committed offset. Since the offset was already committed, the lost charge's event is never redelivered — it is gone.
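The difference between the two orderings can be shown with a small simulation. This is a sketch under stated assumptions rather than real consumer code: a Python list stands in for the partition log, an integer for the committed offset, and a dict keyed by event id for an idempotent payment ledger. `Crash`, `consume`, and the `crash_on` parameter are hypothetical names used only to inject a failure between the two steps.

```python
class Crash(Exception):
    """Simulated process death between the commit and the side effect."""

def charge_payment(ledger, event):
    # Idempotent: the event id is the idempotency key, so a replay is a no-op.
    ledger.setdefault(event["id"], event["amount_cents"])

def consume(log, committed, ledger, commit_first, crash_on=None):
    """Consume from the committed offset and return the new committed offset."""
    for offset in range(committed, len(log)):
        event = log[offset]
        try:
            if commit_first:
                committed = offset + 1          # commit the offset first
                if offset == crash_on:
                    raise Crash
                charge_payment(ledger, event)   # side effect never runs on crash
            else:
                charge_payment(ledger, event)   # side effect first
                if offset == crash_on:
                    raise Crash
                committed = offset + 1          # commit only after success
        except Crash:
            return committed
    return committed

log = [
    {"id": "evt_1", "order_id": "ord_1", "amount_cents": 4200},
    {"id": "evt_2", "order_id": "ord_2", "amount_cents": 1500},
]

# Commit-first: crash on offset 0, then restart from the committed offset.
lossy_ledger = {}
offset = consume(log, 0, lossy_ledger, commit_first=True, crash_on=0)
consume(log, offset, lossy_ledger, commit_first=True)

# Commit-after: the same crash causes a replay, which idempotency absorbs.
safe_ledger = {}
offset = consume(log, 0, safe_ledger, commit_first=False, crash_on=0)
consume(log, offset, safe_ledger, commit_first=False)
```

With commit-first, the restart resumes past `evt_1`, so that charge never happens. With commit-after, `evt_1` is delivered twice, and the idempotency key keeps the ledger at one charge per event.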
Snippet 4 — DLQ handling
def handle(event):
    try:
        process(event)          # idempotent side effect
    except Exception:
        send_to_dlq(event)      # any failure -> DLQ
    consumer.commit()
Quiz
This handler sends every failure straight to the DLQ on the first exception. What goes wrong, and what is the senior fix?
Heads-up It floods the DLQ with transient failures that idempotent retry would have cleared, turning automatic recovery into manual toil and hiding genuinely unprocessable messages among the noise.
Heads-up Committing after the message is safely in the DLQ is correct — it stops the poison message from blocking the partition. The defect is DLQ-ing on the first failure with no retry budget.
Heads-up Idempotent process is required regardless: at-least-once means replays happen on crash-before-commit, and the DLQ does not dedupe live retries. Keep idempotency; add a retry budget.
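A retry budget with quarantine on exhaustion might look like the sketch below. It assumes failures can be classified into retryable and non-retryable; the `TransientError` and `PermanentError` classes, the `handle_with_budget` function, and its parameters are hypothetical names, and the offset commit is left to the caller once the function returns.

```python
import time

class TransientError(Exception):
    """Retryable failure, e.g. a timeout or a briefly unavailable dependency."""

class PermanentError(Exception):
    """Failure that retrying cannot fix, e.g. a malformed payload."""

def handle_with_budget(event, process, send_to_dlq,
                       max_attempts=3, base_delay_s=0.0, sleep=time.sleep):
    """Retry transient failures up to max_attempts, then quarantine.

    Returns "processed" or "dlq"; in both cases the caller may commit the
    offset. Any other exception propagates, so the offset stays uncommitted
    and the message is redelivered.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            process(event)  # must be idempotent: it can run more than once
            return "processed"
        except PermanentError as exc:
            send_to_dlq(event, reason=repr(exc), attempts=attempt)
            return "dlq"
        except TransientError as exc:
            if attempt == max_attempts:
                send_to_dlq(event, reason=repr(exc), attempts=attempt)
                return "dlq"
            sleep(base_delay_s * 2 ** (attempt - 1))  # exponential backoff

dlq = []

def send_to_dlq(event, reason, attempts):
    dlq.append({"event": event, "reason": reason, "attempts": attempts})

calls = {"flaky": 0}

def flaky(event):
    # Fails twice with a transient error, then succeeds.
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise TransientError("timeout")

def poison(event):
    raise PermanentError("unparseable payload")

def always_down(event):
    raise TransientError("dependency unavailable")

flaky_result = handle_with_budget({"id": "evt_1"}, flaky, send_to_dlq)
poison_result = handle_with_budget({"id": "evt_2"}, poison, send_to_dlq)
down_result = handle_with_budget({"id": "evt_3"}, always_down, send_to_dlq)
```

The transient failure clears within the budget and never reaches the DLQ, while the poison message is quarantined on its first attempt. Recording the attempt count and reason with each DLQ entry keeps genuinely unprocessable messages distinguishable from exhausted retries.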
Recap
Every seam of the pipeline is read in code and config. The outbox makes the two INSERTs atomic, but the relay's publish-then-mark is at-least-once, so consumers must dedupe. A CDC connector keyed by aggregate_id preserves per-order ordering, but heartbeat.interval.ms = 0 with no slot cap is a disk-fill outage waiting on a stall. Committing the offset before the side effect is at-most-once and silently loses charges, so commit after processing and stay idempotent. DLQ-ing on the first exception buries poison messages under recoverable transients, so retry with a budget and quarantine only on exhaustion. Read the transaction boundary, the commit order, the unique constraint, and the connector config: that is where the pipeline's correctness actually lives.