Crux: Read real saga snippets (a compensating action, an orchestrator step, a retried handler, and a semantic lock), predict the failure in each, and pick the highest-leverage fix.
◷ 14 min
Saga bugs do not look like saga bugs in code — they look like an ordinary handler that double-charges under retry, or a compensation that assumes a rollback it never gets. Read each snippet and pick the fix a senior would make before shipping.
Goal
Practise the read you do in every saga review: spot the missing idempotency key, the compensation that is really a rollback in disguise, and the orchestrator step that loses the in-flight workflow on a crash.
Snippet 1 — the compensating action
// T2 already committed: the card was charged and a payment row written.
// C2 is supposed to undo it after a later step fails.
func compensateCharge(ctx context.Context, orderID string) error {
    // delete the payment row so it looks like the charge never happened
    return db.Exec(ctx, "DELETE FROM payments WHERE order_id = $1", orderID)
}
Quiz
Why is this compensation wrong, and what should it do instead?
Heads-up Consistency is with the outside world, not just the local table. The gateway still holds the customer's money; deleting the row only hides the charge and breaks reconciliation.
Heads-up Soft-deleting is better record-keeping, but it still does not move the money back. A compensation is a new forward action against the gateway, not a local-row edit.
Heads-up There is no transaction to roll back — T2 committed durably long ago. The only undo for a real charge is a real refund.
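The point above, that a compensation is a new forward action against the gateway, can be sketched as follows. This is a minimal illustration, not the lesson's reference answer: `FakeGateway`, its `refund` method, the `refund:<order_id>` key format, and the `REFUNDED` status are all hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class FakeGateway:
    """Hypothetical in-memory stand-in for a real payment gateway."""
    held: dict = field(default_factory=dict)     # charge id -> amount the gateway holds
    refunds: dict = field(default_factory=dict)  # idempotency key -> refund id

    def refund(self, charge_id: str, idempotency_key: str) -> str:
        # A repeated key returns the earlier refund instead of refunding twice.
        if idempotency_key in self.refunds:
            return self.refunds[idempotency_key]
        self.held[charge_id] = 0
        refund_id = f"re_{len(self.refunds) + 1}"
        self.refunds[idempotency_key] = refund_id
        return refund_id


def compensate_charge(gateway: FakeGateway, payments: dict, order_id: str) -> None:
    """C2: reverse the charge at the gateway, then record the refund locally."""
    row = payments[order_id]
    if row["status"] == "REFUNDED":
        return  # compensation already ran; a retry is a no-op
    refund_id = gateway.refund(row["charge_id"], idempotency_key=f"refund:{order_id}")
    # Keep the row so reconciliation can match the charge to its refund.
    row["status"] = "REFUNDED"
    row["refund_id"] = refund_id


gateway = FakeGateway(held={"ch_1": 4200})
payments = {"order-7": {"charge_id": "ch_1", "amount": 4200, "status": "CHARGED"}}
compensate_charge(gateway, payments, "order-7")
compensate_charge(gateway, payments, "order-7")  # simulated retry of the compensation
```

The payment row survives with a refund reference, so the local ledger and the gateway stay reconcilable, and running the compensation twice still produces a single refund.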
Snippet 2 — the orchestrator step
# Orchestrator drives the saga forward, step by step, in memory.
def run_order_saga(order):
    book_flight(order)            # T1
    book_hotel(order)             # T2
    try:
        charge_card(order)        # T3
    except StepFailed:
        cancel_hotel(order)       # C2
        cancel_flight(order)      # C1
    # progress lives only in this call frame
Quiz
The orchestrator process crashes right after book_hotel returns but before charge_card. What is the failure, and what fixes it?
Heads-up A restart does not re-enter a function that was mid-execution; the call frame is gone. Without persisted state the booked flight and hotel are orphaned.
Heads-up Retry policy is a separate concern. The crash here happens between steps, so even a perfect retry never runs — the lost in-memory state is the real defect.
Heads-up The steps span separate services and their own databases; no single transaction spans them. That is the whole reason a saga exists. Durable step state, not a global transaction, is the fix.
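One way to picture durable step state is an orchestrator that records the next step to run after every completed step and reads that record on startup. The sketch below assumes a hypothetical `saga_state` table and uses an in-memory SQLite database purely so it runs standalone; a real orchestrator would need a store that survives the process. The `crash_before` parameter exists only to simulate the crash.

```python
import sqlite3

STEPS = ["book_flight", "book_hotel", "charge_card"]


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS saga_state ("
        "saga_id TEXT PRIMARY KEY, next_step INTEGER NOT NULL)"
    )
    return db


def run_order_saga(db, saga_id, actions, crash_before=None):
    # Resume from the persisted position instead of from the top.
    row = db.execute(
        "SELECT next_step FROM saga_state WHERE saga_id = ?", (saga_id,)
    ).fetchone()
    next_step = row[0] if row else 0
    for i in range(next_step, len(STEPS)):
        if STEPS[i] == crash_before:
            raise RuntimeError("simulated crash")
        actions[STEPS[i]]()
        # Record progress durably before moving to the next step.
        db.execute(
            "INSERT OR REPLACE INTO saga_state (saga_id, next_step) VALUES (?, ?)",
            (saga_id, i + 1),
        )
        db.commit()


calls = []
actions = {name: (lambda n=name: calls.append(n)) for name in STEPS}
db = open_store()
try:
    run_order_saga(db, "saga-1", actions, crash_before="charge_card")
except RuntimeError:
    pass  # the process "died" between book_hotel and charge_card
run_order_saga(db, "saga-1", actions)  # the "restart" picks up at charge_card
```

A crash between running a step and committing its progress row would re-run that step on resume, so this design still relies on each step being idempotent.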
Snippet 3 — the idempotent step
// Message delivery is at-least-once, so this handler can be invoked
// more than once for the same saga step.
func handleChargeCard(msg ChargeMsg) error {
    amount := msg.Amount
    if err := gateway.Charge(msg.CustomerID, amount); err != nil {
        return err
    }
    return savePaymentRow(msg.OrderID, amount)
}
Quiz
With at-least-once delivery this handler has a real-money bug. What is it, and what is the minimal fix?
Heads-up Gateways only dedupe when you give them an idempotency key. With none supplied, two identical Charge calls are two distinct charges. The handler must supply the key.
Heads-up End-to-end exactly-once is largely a myth across a network — brokers give at-least-once and you make consumers idempotent. The fix lives in the handler, not the broker setting.
Heads-up Reordering still races: two redeliveries can both pass the check before either writes. You need an atomic dedupe: a unique idempotency key enforced by the gateway, or a unique DB constraint.
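A handler that supplies its own idempotency key might look like the sketch below. It assumes the key can be derived deterministically from the saga step (here a hypothetical `charge:<order_id>` format) and that the gateway dedupes on that key; `FakeGateway` is an in-memory stand-in, not any real gateway's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChargeMsg:
    order_id: str
    customer_id: str
    amount: int


@dataclass
class FakeGateway:
    """Hypothetical gateway that dedupes charges by idempotency key."""
    charges: dict = field(default_factory=dict)  # idempotency key -> charge id
    total_charged: int = 0

    def charge(self, customer_id: str, amount: int, idempotency_key: str) -> str:
        if idempotency_key in self.charges:
            return self.charges[idempotency_key]  # replay: no second charge
        self.total_charged += amount
        charge_id = f"ch_{len(self.charges) + 1}"
        self.charges[idempotency_key] = charge_id
        return charge_id


def handle_charge_card(gateway: FakeGateway, payments: dict, msg: ChargeMsg) -> None:
    # Same saga step -> same key, no matter how many times the message arrives.
    key = f"charge:{msg.order_id}"
    charge_id = gateway.charge(msg.customer_id, msg.amount, idempotency_key=key)
    # Keyed by order id, so a redelivery cannot add a second payment row.
    payments.setdefault(msg.order_id, {"charge_id": charge_id, "amount": msg.amount})


gateway = FakeGateway()
payments = {}
msg = ChargeMsg(order_id="order-7", customer_id="cust-1", amount=2500)
handle_charge_card(gateway, payments, msg)
handle_charge_card(gateway, payments, msg)  # at-least-once redelivery
```

The key is derived from the message rather than generated per invocation; a fresh random key on each delivery would defeat the dedupe.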
Snippet 4 — the semantic lock
-- Saga A starts working an order. Saga B may touch the same order
-- concurrently because nothing is locked across saga steps.
UPDATE orders SET status = 'PENDING_PAYMENT' WHERE id = $1;
-- ... later steps of saga A run, then on success ...
UPDATE orders SET status = 'CONFIRMED' WHERE id = $1;
Quiz
What is the PENDING_PAYMENT status doing here, and what must other sagas do to make it work?
Heads-up The row lock lasts only for that single UPDATE statement, not across the saga's steps. The status is the cross-step lock — but only if other sagas actually honour it.
Heads-up It guarantees nothing on its own — it is advisory. Enforcement is in the application code that reads the status and chooses to skip or wait.
Heads-up A semantic lock approximates isolation in application logic; it does not restore database-level isolation. Sagas remain ACID-minus-I — the marker just lets cooperating sagas avoid each other.
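To illustrate how cooperating sagas can honour the marker, the sketch below makes acquiring it a conditional update: a saga proceeds only if its own `UPDATE` actually changed the row. The starting status `NEW` and the helper names `try_acquire` and `confirm` are assumptions made for this example, and SQLite stands in for whatever database holds the `orders` table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
db.execute("INSERT INTO orders VALUES ('order-7', 'NEW')")
db.commit()


def try_acquire(db: sqlite3.Connection, order_id: str) -> bool:
    """Take the semantic lock only if no other saga already holds it."""
    cur = db.execute(
        "UPDATE orders SET status = 'PENDING_PAYMENT' "
        "WHERE id = ? AND status = 'NEW'",
        (order_id,),
    )
    db.commit()
    return cur.rowcount == 1  # zero rows changed means another saga got there first


def confirm(db: sqlite3.Connection, order_id: str) -> None:
    """Release the lock on success by moving to the final status."""
    db.execute(
        "UPDATE orders SET status = 'CONFIRMED' "
        "WHERE id = ? AND status = 'PENDING_PAYMENT'",
        (order_id,),
    )
    db.commit()


saga_a_acquired = try_acquire(db, "order-7")  # saga A takes the marker
saga_b_acquired = try_acquire(db, "order-7")  # saga B sees it and must skip or wait
confirm(db, "order-7")
final_status = db.execute(
    "SELECT status FROM orders WHERE id = 'order-7'"
).fetchone()[0]
```

The check and the write happen in one statement, so two sagas cannot both observe `NEW` and both proceed. A saga that ignores the return value, or writes to the order without going through `try_acquire`, still bypasses the lock entirely.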
Recap
Every saga defect in this set is a known shape: a compensation that deletes a local record instead of issuing a real reversing action; an orchestrator that keeps progress only in memory and loses the workflow on a crash; a step that double-charges because at-least-once delivery met a non-idempotent handler; and a status field that only works as a lock if every saga honours it. Read for the missing idempotency key, the rollback-in-disguise compensation, and the unpersisted step — that is where saga bugs actually live.