Crux: Read real producer and consumer snippets, predict the ordering behaviour, and pick the highest-leverage fix a senior engineer would make first.
Ordering bugs live in the producer config and the consumer’s handler, not in the broker. Read the code, predict where order breaks, and choose the fix a senior engineer would reach for first.
Goal
Practise the loop you run on every ordering incident: read the partition key and producer config, find where two messages can race or swap, and reach for the structural fix — key choice, idempotent producer, version guard — before blaming the broker.
Snippet 1 — the partition key
// Events for one account must stay ordered relative to each other.
ProducerRecord<String, Event> record =
    new ProducerRecord<>("account-events", UUID.randomUUID().toString(), event);
producer.send(record);
Quiz
The requirement is per-account ordering. What does this key choice actually do, and what is the fix?
Heads-up Kafka preserves order only within a partition. A random key spreads one account's events across partitions, so there is no partition that holds them in sequence. The key must be the consistency boundary.
Heads-up A topic per account does not scale and is unnecessary. One topic partitioned by accountId gives per-account order and parallelism across accounts. The defect is the random key.
Heads-up acks controls durability, not which partition a key lands on. A random key still scatters the account's events; only keying by accountId fixes ordering.
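A minimal sketch of why the key matters, under stated assumptions: `partition_for` is a hypothetical stand-in that hashes with CRC32, not Kafka's actual partitioner, and the partition count and account ID are made up. It only illustrates that a hash of the key picks the partition, so a random key scatters one account's events while the account ID pins them.

```python
import random
import uuid
import zlib

NUM_PARTITIONS = 6  # assumed partition count for this sketch


def partition_for(key: str) -> int:
    """Stand-in hash partitioner: the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partitions_used(keys):
    """Return the set of partitions a list of record keys would land on."""
    return {partition_for(k) for k in keys}


rng = random.Random(0)   # seeded so the sketch is repeatable
account_id = "acct-42"   # hypothetical account

# Buggy keying: a fresh random UUID per event, as in Snippet 1.
random_keys = [str(uuid.UUID(int=rng.getrandbits(128))) for _ in range(50)]
# Fixed keying: every event for the account uses the account ID as the key.
account_keys = [account_id] * 50

scattered = partitions_used(random_keys)
pinned = partitions_used(account_keys)
print(f"random keys: {len(scattered)} partitions; accountId key: {len(pinned)} partition")
```

With the account ID as the key, every event for that account shares one partition, which is the only place the log can hold them in sequence.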
The key is correct and everything is on one partition. Why can two messages still swap, and what is the one-line fix?
Heads-up retries=0 trades reordering for lost messages on transient failures, which is worse. The real fix is the idempotent producer, which uses per-partition sequence numbers to preserve order across retries (with at most five in-flight requests per connection).
Heads-up acks=all is the durable, correct setting and does not cause reordering. The reorder comes from non-idempotent retries with multiple in-flight requests.
Heads-up A single partition guarantees order only if the producer cannot let a retried batch overtake a later one — which is exactly what non-idempotent, multi-in-flight retries allow.
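The race described above can be sketched with a toy single-partition log. This is not Kafka: `Broker` and `send_two_in_flight` are hypothetical names, and the sequence check is a simplified model of what an idempotent producer and broker do. Two batches are in flight, and the first attempt at batch 0 fails transiently.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Toy single-partition log, just enough to show the retry race."""
    idempotent: bool
    log: list = field(default_factory=list)
    next_seq: int = 0  # next sequence number this partition expects

    def append(self, seq, payload, fail=False):
        if fail:
            return "transient_error"   # e.g. a timeout; nothing was written
        if self.idempotent:
            if seq < self.next_seq:
                return "duplicate"     # already written: ack without appending
            if seq > self.next_seq:
                return "out_of_order"  # gap detected: reject so it is retried in order
            self.next_seq += 1
        self.log.append(payload)
        return "ok"


def send_two_in_flight(broker):
    """Send batches 0 and 1 back to back; the first attempt at batch 0 fails."""
    pending = [(0, "deposit"), (1, "withdraw")]
    first_attempt = True
    while pending:
        retry = []
        for seq, payload in pending:
            fail = first_attempt and seq == 0
            result = broker.append(seq, payload, fail=fail)
            if result in ("transient_error", "out_of_order"):
                retry.append((seq, payload))
        first_attempt = False
        pending = retry
    return broker.log


print(send_two_in_flight(Broker(idempotent=False)))  # ['withdraw', 'deposit']
print(send_two_in_flight(Broker(idempotent=True)))   # ['deposit', 'withdraw']
```

Without the sequence check, the retried batch 0 lands behind batch 1. With it, batch 1 is rejected until batch 0 has been written, so the retry cannot overtake. In a real producer the corresponding change is the single setting enable.idempotence=true.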
Snippet 3 — the consumer handler
def handle(msg):
    user = db.get(msg.user_id)
    user.name = msg.new_name  # last write wins on whatever arrives last
    db.save(user)
Quiz
Consumers are concurrent on an at-least-once queue. Two updates for one user can arrive out of order or twice. What is the strongest fix?
Heads-up A transaction makes one update atomic but does not order two independent messages. An older update can still commit after a newer one and overwrite it; you need a version guard, not just atomicity.
Heads-up Dedup stops applying the same message twice, but two different updates for one user can still apply in the wrong order. Dedup plus a version check is what you need.
Heads-up Funnelling every message through one consumer or one global lock serializes unrelated users to fix a per-user problem, collapsing throughput. Partition by user_id and version-guard instead.
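A minimal sketch of a version guard, assuming each message carries a per-user `version` counter assigned by the producer. That field is not in Snippet 3; it and the `NameChanged` type are hypothetical, and the dict stands in for the database.

```python
from dataclasses import dataclass


@dataclass
class NameChanged:
    user_id: str
    new_name: str
    version: int  # assumed: a per-user counter set by the producer


# In-memory stand-in for the database: user_id -> (version, name)
store = {}


def handle(msg: NameChanged) -> bool:
    """Apply msg only if it is newer than the stored state. Returns True if applied."""
    current_version, _ = store.get(msg.user_id, (0, None))
    if msg.version <= current_version:
        return False  # stale or duplicate delivery: skip it
    store[msg.user_id] = (msg.version, msg.new_name)
    return True


# Version 2 arrives first, then the older version 1, then a redelivery of version 2.
arrivals = [
    NameChanged("u1", "Bea", 2),
    NameChanged("u1", "Al", 1),
    NameChanged("u1", "Bea", 2),
]
results = [handle(m) for m in arrivals]
print(results, store["u1"])  # [True, False, False] (2, 'Bea')
```

The same comparison covers both failure modes: the older update is rejected as stale and the redelivery is rejected as a duplicate. The sketch is single-threaded; with concurrent consumers the compare and the write would need to be one atomic conditional update in the database.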
Snippet 4 — the DLQ replay
# Nightly job: drain the dead-letter queue back into the main topic
for failed in dead_letter_queue.drain_all():
    main_topic.produce(failed.key, failed.value)  # full speed, no throttle
Quiz
The consumer is idempotent and version-guarded. Even so, what does this replay do to live ordering, and how do you run it safely?
Heads-up Reusing the same key preserves which partition the replayed messages hit, but they still land after newer events, which is a reorder. Applying them is safe only because of the version guard, and you still throttle to protect live throughput.
Heads-up DLQ replay is a normal recovery tool. With idempotent, version-guarded consumers the disorder is harmless on apply; the fix is to rate-limit the replay, not to forbid it.
Heads-up Dropping the key scatters the replayed messages across partitions, making things worse. Keep the key, rely on the version guard, and throttle the replay.
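One way to throttle the replay is a fixed pause between sends. The sketch below is illustrative only: `replay_throttled`, its parameters, and the `(key, value)` tuple shape are assumptions, and `produce` stands in for whatever client call the real job uses. The `sleep` argument is injectable so the pacing can be checked without waiting.

```python
import time


def replay_throttled(messages, produce, max_per_second, sleep=time.sleep):
    """Re-publish (key, value) pairs at no more than max_per_second.

    The original key is kept so each message returns to its own partition.
    Returns the number of messages sent.
    """
    if max_per_second <= 0:
        raise ValueError("max_per_second must be positive")
    interval = 1.0 / max_per_second
    sent = 0
    for key, value in messages:
        if sent:
            sleep(interval)  # pause between sends, not before the first one
        produce(key, value)
        sent += 1
    return sent


# Usage with stand-ins: record what would be produced and how long we would pause.
sent_log, pauses = [], []
count = replay_throttled(
    [("acct-1", "e1"), ("acct-2", "e2"), ("acct-1", "e3")],
    produce=lambda k, v: sent_log.append((k, v)),
    max_per_second=10,
    sleep=pauses.append,
)
print(count, sent_log, pauses)
```

A fixed interval is the simplest pacing; it caps the extra load the replay adds to live partitions. The replayed events still arrive after newer ones, so the consumer's version guard remains what makes applying them safe.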
Recap
Ordering is read in the producer and consumer, not the broker. A random partition key scatters an entity’s events across partitions; the key must be the consistency boundary, such as accountId. A non-idempotent producer with multiple in-flight requests reorders at the source, fixed by enable.idempotence=true. Concurrent consumers on an at-least-once queue need a per-entity version guard, not just a transaction or dedup. DLQ replay reorders against live traffic: it is safe to apply under a version guard, but only when rate-limited. Fix the structure first, then verify the order holds under load.