Queues, Streams, Eventing
Designing UX over async backends: optimistic UI, pending states, read-your-writes
A user clicks “Publish”, the spinner stops, the toast says “Published!” — and the post is nowhere. Support tickets pile up. The bug: the POST returned 202 Accepted and dropped the work onto a Kafka topic; a consumer writes the post ~800ms later. The frontend treated 202 as 200, refetched the list before the consumer ran, and got an empty result it then trusted. The UI was not slow. It was confidently wrong about a write that had not happened yet.
The consistency window is now your problem
The moment a write stops being synchronous — it goes onto a queue, an event bus, a worker — there is a gap between “the server accepted it” and “the server will return it on read.” That gap is the consistency window. Under normal load it might be 50–300ms. Under backpressure, a lagging consumer, or a partition rebalance, it stretches to seconds. Your backend team treats this as a throughput win. Your users experience it as the app forgetting what they just did.
The first thing this breaks is read-your-own-writes: the guarantee that I can see my own change immediately, even if other users can’t yet. The naive flow (POST, then refetch the list) violates it routinely, because the refetch races the consumer and usually wins. The user’s own edit vanishes from their own screen. This is the single most-reported “ghost bug” on async backends, and it is not random: it is the predictable outcome of trusting a read that happened inside the consistency window.
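One way to keep read-your-own-writes through a racing refetch is to overlay the client's unconfirmed writes on whatever the server returns. The following is a minimal sketch, not a library API: the `Post` shape and the helpers `overlayPending` and `prunePending` are hypothetical, and it assumes ids are generated on the client so the local echo can be matched to the eventual server row.

```typescript
// Hypothetical shape for illustration. Assumption: ids are client-generated,
// so the local echo and the eventual server row share the same id.
interface Post {
  id: string;
  title: string;
}

// Keep only the local writes the server does not return yet.
function prunePending(serverPosts: Post[], pending: Post[]): Post[] {
  return pending.filter(
    (local) => !serverPosts.some((remote) => remote.id === local.id),
  );
}

// Overlay unconfirmed local writes on a (possibly stale) server read.
// A pending post stays on screen until the server list contains its id.
function overlayPending(serverPosts: Post[], pending: Post[]): Post[] {
  return [...prunePending(serverPosts, pending), ...serverPosts];
}
```

Rendering `overlayPending(refetched, pending)` instead of the raw refetch means a read that lands inside the consistency window can no longer erase the user's own item, and the list does not show a duplicate once the consumer has written the row.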
Optimistic UI: apply locally, reconcile later
The senior reflex is optimistic UI: apply the change to local state the instant the user acts, send the request in the background, and reconcile when the truth arrives. The justification is not aesthetic — it is the perception thresholds Nielsen documented in 1993 and that still hold: ~100ms feels instantaneous (direct manipulation), ~1s keeps the user in flow but they notice the wait, and ~10s is the limit before attention breaks and they assume a crash. An async pipeline easily blows past 1s end-to-end, so you can’t make the real round-trip fast — you make the perceived one ~0ms by rendering the predicted result immediately.
The contract has three parts, and the third is the one juniors skip:
- Apply the predicted change to local state and show it.
- Send the request; keep enough context to undo.
- Reconcile — on success, replace the prediction with server truth (it may differ); on failure, roll back to the snapshot you took before applying.
TanStack Query encodes exactly this in useMutation: onMutate cancels in-flight refetches, snapshots the cache, and writes the optimistic value (returning the snapshot as context); onError restores that snapshot; onSettled calls invalidateQueries to refetch real state. React 19 ships useOptimistic for the lighter, transition-scoped version. The shape is the same everywhere: snapshot → predict → reconcile-or-rollback.
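The snapshot → predict → reconcile-or-rollback shape can be sketched without any framework. `OptimisticValue` below is a hypothetical helper written for illustration only; it is not part of TanStack Query or React, and it leaves out the network call so the state transitions stay visible.

```typescript
// Framework-free sketch. OptimisticValue is a hypothetical helper,
// not a TanStack Query or React API.
class OptimisticValue<T> {
  private value: T;
  private snapshot: T | undefined = undefined;
  private pending = false;

  constructor(initial: T) {
    this.value = initial;
  }

  get current(): T {
    return this.value;
  }

  get isPending(): boolean {
    return this.pending;
  }

  // Snapshot the confirmed value, then show the prediction immediately.
  predict(predicted: T): void {
    if (!this.pending) {
      this.snapshot = this.value; // keep the last confirmed value for rollback
    }
    this.value = predicted;
    this.pending = true;
  }

  // Success: server truth replaces the prediction, even when it differs.
  reconcile(serverValue: T): void {
    this.value = serverValue;
    this.snapshot = undefined;
    this.pending = false;
  }

  // Failure: restore the snapshot taken before predicting.
  rollback(): void {
    if (!this.pending) return;
    this.value = this.snapshot as T;
    this.snapshot = undefined;
    this.pending = false;
  }
}
```

A caller would invoke `predict` when the user acts, then `reconcile` from the success path or `rollback` from the error path of whatever sends the request.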
Where this breaks
Optimistic UI silently assumes your client can predict the server’s result. That holds for “add this todo” but breaks for “the server assigns the order number,” “the server applies a discount,” or “another user already changed this.” When the prediction and the reconciled truth diverge, you must show the corrected value — not keep the optimistic one. An optimistic update that never reconciles is just a lie with better latency.
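To illustrate a prediction that diverges from the reconciled truth, here is a small sketch in which the server assigns the order number and applies a discount. The `Order` shape, the `tmp-1` style temporary id, and `reconcileOrder` are assumptions made for the example.

```typescript
// Hypothetical shapes for illustration only.
interface Order {
  id: string; // temporary client id while optimistic, server-assigned afterwards
  totalCents: number;
  optimistic: boolean;
}

interface ReconcileResult {
  orders: Order[];
  corrected: boolean; // true when the server total differs from the prediction
}

// Replace the optimistic entry with server truth and report whether the
// predicted total was wrong, so the UI can show the corrected value.
function reconcileOrder(
  orders: Order[],
  tempId: string,
  server: { id: string; totalCents: number },
): ReconcileResult {
  let corrected = false;
  const next: Order[] = orders.map((order) => {
    if (order.id !== tempId) return order;
    corrected = order.totalCents !== server.totalCents;
    return { id: server.id, totalCents: server.totalCents, optimistic: false };
  });
  return { orders: next, corrected };
}
```

When `corrected` is true, the UI has something to tell the user (for example, that a discount was applied) rather than silently swapping numbers.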
Pending states beat fake success
When you can’t safely predict the outcome (payment captured? video transcoded? export ready?), the honest move is an explicit pending/processing state, not a fake “done.” Show that the work is in flight, give it an affordance — a progress step, an ETA, a “we’ll email you” — and let the UI converge when the event lands. The cardinal sin is collapsing 202 Accepted into a success toast: you’re telling the user something completed that is still sitting in a queue.
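A sketch of keeping 202 Accepted distinct from a completed write at the point where the response is turned into UI state. The `WriteUiState` type, the function name, and the message strings are illustrative assumptions.

```typescript
// Hypothetical UI state for a write; names and messages are illustrative.
type WriteUiState =
  | { kind: "done"; message: string }
  | { kind: "pending"; message: string }
  | { kind: "error"; message: string };

// Map an HTTP status to what the user may be told.
// 202 means "accepted for processing", not "completed".
function uiStateForWrite(status: number): WriteUiState {
  if (status === 202) {
    return { kind: "pending", message: "Publishing..." };
  }
  if (status >= 200 && status < 300) {
    return { kind: "done", message: "Published!" };
  }
  return { kind: "error", message: "Could not publish. Try again." };
}
```

Checking for 202 before the general 2xx range is the whole point: a single `response.ok` test would put both cases on the success path.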
Two failure modes live here. First, the infinite spinner: the UI waits for an event that never arrives (consumer crashed, message dead-lettered) and spins forever. Every pending state needs a timeout + reconciliation path — after N seconds, poll the authoritative endpoint or surface “still processing, check back.” Second, double-submit on retry: the user sees no feedback, clicks again, and now two messages are on the queue. Without an idempotency key, that’s two charges, two orders, two emails.
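The timeout plus reconciliation path for a pending state can be sketched as a small pure state machine. Everything here is an assumption for illustration: the state and event names, the `needs-check` state, and the 10-second value of `PENDING_TIMEOUT_MS`, which in practice would come from your own consumer-lag data.

```typescript
type PendingState =
  | { kind: "pending"; startedAtMs: number }
  | { kind: "confirmed" }
  | { kind: "failed"; reason: string }
  | { kind: "needs-check"; message: string }; // timed out: poll or tell the user

type PendingEvent =
  | { type: "event-arrived"; ok: boolean; reason?: string }
  | { type: "tick"; nowMs: number };

// Assumption: 10 seconds stands in for "N"; choose it from real lag measurements.
const PENDING_TIMEOUT_MS = 10000;

function nextPendingState(state: PendingState, event: PendingEvent): PendingState {
  // Confirmed and failed are terminal; late or duplicate events are ignored.
  if (state.kind === "confirmed" || state.kind === "failed") return state;

  if (event.type === "event-arrived") {
    // A late event still resolves a state that already timed out.
    return event.ok
      ? { kind: "confirmed" }
      : { kind: "failed", reason: event.reason ?? "unknown error" };
  }

  // Tick: stop spinning once the timeout has elapsed.
  if (state.kind === "pending" && event.nowMs - state.startedAtMs >= PENDING_TIMEOUT_MS) {
    return { kind: "needs-check", message: "Still processing, check back." };
  }
  return state;
}
```

Entering `needs-check` is the cue to poll the authoritative endpoint or show the "still processing" message, so a crashed consumer cannot leave the spinner running forever.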
| Situation | Right UX pattern | Why |
|---|---|---|
| Result is predictable (add todo, like, rename) | Optimistic UI + rollback | Perceived latency ~0ms; reconcile on settle |
| Result unpredictable (payment, transcode, export) | Explicit pending state + timeout | Can’t fake a value you don’t know; never show “done” early |
| Refetch right after a write | Local echo / sticky read | Refetch races the consumer and wins → read-your-writes violation |
| User retries / double-clicks | Idempotency key per intent | Same key = at-most-once effect; no double charge |
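The idempotency-key row can be illustrated with an in-memory stand-in for the consumer's dedupe store. `ChargeRequest`, `ChargeProcessor`, and the `charge-N` id format are hypothetical; a real system would persist the key-to-result mapping and handle concurrent duplicates.

```typescript
interface ChargeRequest {
  idempotencyKey: string; // generated once per user intent, reused on every retry
  amountCents: number;
}

// Hypothetical in-memory stand-in for a consumer with a dedupe store.
class ChargeProcessor {
  // Prototype-free object so arbitrary keys cannot collide with built-ins.
  private results: Record<string, string> = Object.create(null);
  private nextChargeNumber = 1;
  chargesMade = 0;

  handle(request: ChargeRequest): string {
    const existing = this.results[request.idempotencyKey];
    if (existing !== undefined) {
      return existing; // duplicate delivery or retry: return the first result
    }
    const chargeId = `charge-${this.nextChargeNumber}`;
    this.nextChargeNumber += 1;
    this.chargesMade += 1;
    this.results[request.idempotencyKey] = chargeId;
    return chargeId;
  }
}
```

The client side of the contract is that the key is created when the user expresses the intent (the first click) and sent unchanged on every retry. Generating a fresh key per request would defeat the dedupe.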
When two async updates collide
Eventual consistency means two writes can be in flight at once — yours and a teammate’s, or your phone’s and your laptop’s. Reconciliation has to pick a winner, and the strategy is a real tradeoff. Last-write-wins (LWW) stamps each write with a timestamp and keeps the latest; it’s trivial and stateless but silently drops the loser’s data and depends on synchronized clocks — clock skew between nodes can make an older edit win. Merge logic (field-level or three-way) preserves both sides when changes don’t overlap but needs domain rules. CRDTs (conflict-free replicated data types) let replicas merge in any order to the same result with no coordination — the foundation of real-time collab editors — but they accumulate metadata and can grow unbounded without garbage collection. For a “rename” LWW is fine; for a shared document, LWW destroys text, so you reach for a CRDT.
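Two of these strategies are small enough to sketch side by side. The code below is illustrative only: `Stamped`, `lww`, `Doc`, and `threeWayMerge` are hypothetical names, the merge assumes all three versions share the same set of fields, and a CRDT is deliberately left out because a realistic one does not fit in a few lines.

```typescript
interface Stamped<T> {
  value: T;
  timestampMs: number; // taken from the writer's clock, which may be skewed
}

// Last-write-wins: keep the write with the larger timestamp. Ties keep `a`.
function lww<T>(a: Stamped<T>, b: Stamped<T>): Stamped<T> {
  return b.timestampMs > a.timestampMs ? b : a;
}

type Doc = Record<string, string>;

interface MergeResult {
  merged: Doc;
  conflicts: string[]; // fields both sides changed to different values
}

// Field-level three-way merge against a common base version.
// Assumption: base, mine, and theirs all have the same keys.
function threeWayMerge(base: Doc, mine: Doc, theirs: Doc): MergeResult {
  const merged: Doc = {};
  const conflicts: string[] = [];
  for (const key of Object.keys(base)) {
    const mineChanged = mine[key] !== base[key];
    const theirsChanged = theirs[key] !== base[key];
    if (mineChanged && theirsChanged && mine[key] !== theirs[key]) {
      conflicts.push(key);
      merged[key] = base[key]; // hold the base value until a rule or a human decides
    } else {
      merged[key] = mineChanged ? mine[key] : theirs[key];
    }
  }
  return { merged, conflicts };
}
```

Calling `lww` with an older edit stamped 5000 by a fast clock and a newer edit stamped 2000 by a slow one returns the older edit, which is the clock-skew failure described above. `threeWayMerge` keeps both sides when they touched different fields and reports the field name when they did not.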
The frontend’s job is to make the chosen strategy legible: if LWW dropped the user’s change, say so and offer to reapply; if a merge happened, show what merged; surface a 409 Conflict as a real choice (“their version / your version / merge”), not a silent overwrite.
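A sketch of turning a 409 Conflict into an explicit choice rather than an overwrite. The `SaveOutcome` type and `interpretSave` are hypothetical, and the sketch assumes the server's response carries its current version of the value (`serverValue`), which your API may or may not do.

```typescript
type ConflictChoice = "keep-mine" | "keep-theirs" | "merge";

type SaveOutcome =
  | { kind: "saved"; value: string }
  | { kind: "conflict"; mine: string; theirs: string; choices: ConflictChoice[] }
  | { kind: "error"; status: number };

// Assumption: the response body includes the server's current value.
function interpretSave(status: number, mine: string, serverValue: string): SaveOutcome {
  if (status === 409) {
    // Keep both versions so the UI can present them side by side.
    return {
      kind: "conflict",
      mine,
      theirs: serverValue,
      choices: ["keep-mine", "keep-theirs", "merge"],
    };
  }
  if (status >= 200 && status < 300) {
    return { kind: "saved", value: serverValue };
  }
  return { kind: "error", status };
}
```

Because the conflict case retains the user's own text in `mine`, nothing is lost before they have decided what to do with it.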
A user clicks 'Mark complete' on a task that goes through a queue (consumer writes ~500ms later). Pick the UX.
A POST that drops work on a queue returns 202 Accepted in 40ms; a consumer writes the row ~700ms later. The UI POSTs then immediately refetches the list. What does the user see?
A 'Pay' button shows a spinner. The event confirming the charge never arrives (consumer crashed). With no extra handling, what happens?
Order an optimistic mutation against an async backend (TanStack Query's contract):
- 1 onMutate: cancel in-flight refetches so they can't clobber the optimistic value
- 2 onMutate: snapshot the current cache (so you can roll back)
- 3 onMutate: write the predicted value to the cache and return the snapshot as context
- 4 onError: restore the snapshot from context — roll back the prediction
- 5 onSettled: invalidateQueries to reconcile with real server state
- 01 Why does 'POST then immediately refetch' break read-your-own-writes on an async backend, and how do you fix it without giving up the refresh?
- 02 When is optimistic UI the wrong tool, and what do you do instead?
Async backends move the consistency window — the gap between “accepted” and “readable” — onto the frontend, and a naive UI either lies or shows stale data. The two honest tools split by predictability: when the client can predict the result, use optimistic UI — apply locally, snapshot, send, then reconcile with server truth on settle or roll back on failure — which buys ~0ms perceived latency against the 100ms/1s/10s perception thresholds. When the result is the server’s to decide, show an explicit pending state with a timeout and reconciliation so a never-arriving event can’t spin forever, and never collapse a 202 Accepted into a success toast. Read-your-own-writes breaks when a refetch races the consumer, so echo the change locally instead of trusting an immediate racing read. Make retries idempotent with a key so a double-click can’t double-charge. And when two async writes collide, choose reconciliation deliberately — last-write-wins for simple fields (accepting it drops the loser and trusts clocks), CRDTs for shared documents — then make the outcome legible to the user instead of silently overwriting their work.