Queues, Streams, Eventing QUE · 02 · 10

Kafka partitions: break and fix per-key ordering

Hands-on project — build a keyed order pipeline, prove per-key ordering, break it with an in-place partition increase, then migrate to a larger topic the safe way with evidence at every step.

Level: Senior · Time: 240 min

Reading that partition count is a one-way door is not the same as watching per-key ordering evaporate on your own cluster. Build a small order pipeline, prove ordering holds, break it on purpose with an in-place partition bump, then migrate to a larger topic the safe way — with evidence at every step.

Goal

Turn the unit’s mental model into a reproducible engineering loop: key a topic for ordering, verify the guarantee, reproduce the partition-increase failure that breaks it, and execute a dual-write migration that preserves order — measuring consumer lag and ordering correctness throughout.
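
The modulo step behind this mental model can be sketched in a few lines. This is a minimal illustration, assuming keyed records are placed by hash(key) mod partition count. `partition_for` and `moved_keys` are hypothetical helpers, and SHA-256 is only a stand-in for the murmur2 hash that Kafka's default partitioner applies to the serialized key, so the partition numbers printed here will not match a real cluster.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition as hash(key) mod N.

    Assumption: SHA-256 is a stand-in hash for illustration only; Kafka's
    default partitioner uses murmur2 on the serialized key bytes.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def moved_keys(keys, old_count: int, new_count: int) -> dict:
    """Return {key: (old_partition, new_partition)} for keys that change partition."""
    moved = {}
    for key in keys:
        old = partition_for(key, old_count)
        new = partition_for(key, new_count)
        if old != new:
            moved[key] = (old, new)
    return moved


order_ids = [f"order-{i}" for i in range(1000)]  # hypothetical orderIds
moved = moved_keys(order_ids, old_count=6, new_count=8)
print(f"{len(moved)} of {len(order_ids)} keys change partition going from 6 to 8")
for key in sorted(moved)[:5]:
    old, new = moved[key]
    print(f"  {key}: partition {old} -> {new}")
```

Any key that appears in `moved` has its earlier events on one partition and its later events on another, which is where the per-key guarantee stops applying.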

Project

Objective

Build an order-events pipeline that requires strict per-order ordering, demonstrate that an in-place partition increase breaks it, then scale capacity safely via a new-topic migration — proving each claim with consumer output and lag/rebalance metrics, not assertions.
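
One way to turn "proving each claim with consumer output" into something countable is a small ordering validator. The sketch below assumes each event carries an orderId and a per-order sequence number assigned by the producer; the class name, field names, and sample events are hypothetical and not taken from the project's starter code.

```python
from dataclasses import dataclass, field


@dataclass
class OrderingValidator:
    """Tracks the highest sequence seen per orderId and records regressions."""

    last_seen: dict = field(default_factory=dict)   # order_id -> (seq, partition)
    violations: list = field(default_factory=list)

    def observe(self, order_id: str, seq: int, partition: int) -> None:
        prev = self.last_seen.get(order_id)
        if prev is not None and seq <= prev[0]:
            # Event arrived at or behind one already processed for this order.
            self.violations.append({
                "order_id": order_id, "seq": seq, "partition": partition,
                "prev_seq": prev[0], "prev_partition": prev[1],
            })
            return
        self.last_seen[order_id] = (seq, partition)


validator = OrderingValidator()
# (order_id, seq, partition) as a consumer might log them; values are made up.
for event in [("order-1", 0, 2), ("order-2", 0, 4), ("order-1", 1, 2),
              ("order-1", 3, 7), ("order-1", 2, 2)]:
    validator.observe(*event)

print(f"violations: {len(validator.violations)}")
for violation in validator.violations:
    print(violation)
```

Recording the partition alongside the sequence number is what lets a violation name both the old and the new partition of a rehashed key. Note that `seq <= prev` also flags duplicates; if redelivery is expected in your setup, relaxing it to `<` is a reasonable variation.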

Requirements

Acceptance criteria

A baseline run on 6 partitions with the ordering validator reporting zero per-order violations under concurrent load.
Evidence (assignment dump or per-consumer lag) that with 8 consumers in a single group on 6 partitions, exactly 2 consumers own no partitions.
A reproduced in-place partition increase showing a non-zero ordering-violation count for rehashed keys, with the specific orderIds and their old/new partitions identified.
A completed dual-write migration to a larger v2 topic with zero ordering violations across the cutover, plus a lag/rebalance snapshot showing the cutover was controlled.
A one-paragraph write-up contrasting the in-place resize and the migration: why the modulo change breaks ordering, and why the migration preserves each key's history.
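
The contrast asked for in the write-up can be rehearsed offline before touching a cluster. The toy model below is a sketch under stated assumptions, not the project's migration procedure: topics are plain dicts of partition lists, SHA-256 stands in for Kafka's murmur2, consumers on the newest partitions are assumed to run ahead of the others, and the migration is simplified to "v1 is fully drained before v2 is read" (the dual-write overlap and deduplication at cutover are not modeled). All function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, n: int) -> int:
    # Stand-in hash (SHA-256), not Kafka's murmur2; only the modulo matters here.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big") % n


def produce(events, n, topic=None):
    """Append (key, seq) events to a topic modeled as {partition: [events]}."""
    topic = defaultdict(list) if topic is None else topic
    for key, seq in events:
        topic[partition_for(key, n)].append((key, seq))
    return topic


def drain(topic, partition_order):
    """Consume whole partitions one after another, in the given order."""
    return [event for p in partition_order for event in topic.get(p, [])]


def count_violations(consumed) -> int:
    """Count events whose seq is lower than one already seen for the same key."""
    last, bad = {}, 0
    for key, seq in consumed:
        if seq < last.get(key, -1):
            bad += 1
        last[key] = max(seq, last.get(key, -1))
    return bad


keys = [f"order-{i}" for i in range(200)]
before = [(k, s) for s in (0, 1) for k in keys]   # produced before the change
after = [(k, s) for s in (2, 3) for k in keys]    # produced after the change
newest_first = range(7, -1, -1)  # models consumers on new partitions running ahead

# In-place resize: same topic, partition count 6 -> 8 between the two batches.
resized = produce(after, 8, produce(before, 6))
in_place = count_violations(drain(resized, newest_first))

# Migration: v1 keeps 6 partitions and is fully drained before v2 (8) is read.
v1, v2 = produce(before, 6), produce(after, 8)
migrated = count_violations(drain(v1, newest_first) + drain(v2, newest_first))

print(f"in-place resize violations: {in_place}")
print(f"migration violations:       {migrated}")
```

In the resized topic a rehashed key's later events can be consumed before its earlier ones, because they sit on different partitions with independent progress. In the migration model each topic keeps one partition count for its whole life, so a key never changes partition within a topic, and the only cross-topic ordering to enforce is the single v1-then-v2 cutover.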

Senior stretch

Inject a hot key (one whale orderId carrying 60%+ of traffic) and show one consumer pegged while others idle; then apply key-salting and show the load spreading — and document the per-entity ordering you traded away.
Switch the group from the eager RangeAssignor to CooperativeStickyAssignor and enable static membership by giving each consumer a stable group.instance.id; measure rebalance duration during a rolling restart before and after, and report the reduction.
Add a Kafka Streams (or otherwise stateful) consumer with a partition-keyed state store and show how the in-place partition increase corrupts its local state (rehashed keys arrive at an instance whose store holds no history for them), then confirm the migration path keeps the store consistent.
Sweep partition counts (6 / 50 / 200) on the same hardware and chart producer throughput and p99 produce latency, finding the point where more partitions stop helping for your broker.
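
For the hot-key stretch, the effect of key-salting on partition load can be estimated before running it against a broker. This is a minimal sketch, assuming a single whale key at 60% of traffic, 6 partitions, and 16 salts appended as a `#n` suffix; the key names, the salt count, and the helper functions are hypothetical, and SHA-256 again stands in for Kafka's murmur2.

```python
import hashlib
from collections import Counter


def partition_for(key: str, n: int) -> int:
    # Stand-in hash (SHA-256), not Kafka's murmur2; only the modulo matters here.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big") % n


def traffic(total: int = 10_000):
    """Yield orderIds where one whale key carries 6 of every 10 events."""
    for i in range(total):
        yield "order-whale" if i % 10 < 6 else f"order-{i}"


def salted_key(order_id: str, event_index: int, hot_keys: set, salts: int) -> str:
    """Append a rotating salt to hot keys only; other keys are left unchanged."""
    if order_id in hot_keys:
        return f"{order_id}#{event_index % salts}"
    return order_id


def max_share(keys, n: int) -> float:
    """Fraction of all events that land on the busiest partition."""
    load = Counter(partition_for(k, n) for k in keys)
    return max(load.values()) / sum(load.values())


plain = list(traffic())
salted = [salted_key(k, i, {"order-whale"}, salts=16) for i, k in enumerate(plain)]

print(f"busiest partition share, plain keys:  {max_share(plain, 6):.2f}")
print(f"busiest partition share, salted keys: {max_share(salted, 6):.2f}")
```

The trade-off to document is visible in the key itself: `order-whale#3` and `order-whale#4` may land on different partitions, so the whale's events are ordered only within each salt, no longer across the whole order.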

Recap

This is the loop you will run before any real Kafka capacity change: key for the ordering you need, prove the guarantee with a validator, reproduce the failure mode on a toy cluster so you trust it, and scale via a new-topic dual-write migration rather than an in-place resize — measuring lag, rebalances, and ordering violations at each step. Watching per-key order break once, on purpose, is what makes the one-way door real instead of a slogan, and turns the safe migration into muscle memory.
