Queues, Streams, Eventing QUE · 02 · 01

Kafka partitions: the unit of parallelism, ordering, and a one-way door

A topic splits into N partitions; the key''''s hash picks one; ordering holds only within a partition; one consumer per partition caps your parallelism. Partition count is a decision you can''''t take back.

QUE Junior ◷ 17 min

Level

FoundationsJuniorMiddleSenior

Already know this unit? Take a 1-minute quick check →

The order-events topic was lagging, so someone bumped it from 6 partitions to 12 to “add throughput.” It worked for ten minutes. Then support flooded in: a cancel for an order arrived and got processed before the create. Same orderId, two events, now landing on different partitions because hash(orderId) % 12 no longer equals hash(orderId) % 6. Per-key ordering — the one guarantee the whole pipeline relied on — silently evaporated the instant the partition count changed. There is no rollback that puts the old keys back.

A topic is N independent logs, and the key picks the log

Why does choosing the wrong key silently destroy your ordering guarantee while everything else looks healthy? Because a single formula — hash(key) % N — is doing three separate jobs at once, and most engineers only notice one of them.

A Kafka topic is not one stream — it is N partitions, each an independent append-only log with its own offsets. When a producer sends a keyed record, the default partitioner computes murmur2(key) % N to choose exactly which partition the record lands in. Same key, same partition, every time — as long as N never changes. Records with no key are spread across partitions (round-robin or sticky batching) since they have nothing to pin them.

This single mechanism is doing three jobs at once, and conflating them is where designs go wrong:

Distribution — spreading load across brokers and disks.
Ordering — Kafka guarantees order only within a single partition, never across the topic. There is no global order.
Co-location — every record for one key is in one place, so a consumer sees that key’s full history in sequence.

So “ordering” in Kafka really means per-key ordering, and you get it for free only because all of a key’s records hash to the same partition. The choice of key is therefore the choice of what stays ordered: key by orderId and an order’s events are sequenced; key by customerId and you serialize everything for a customer (and risk one big customer overwhelming a partition — more on that below). When you inherit a Kafka pipeline and find an ordering bug, the first thing to check is not the brokers or the consumer code — it is the key the producer is using.

Consumer groups: one partition, one consumer

On the read side, a consumer group splits the partitions among its members so each partition is owned by exactly one consumer in the group at a time. That assignment is the source of Kafka’s parallelism — and its hardest cap. If a topic has 6 partitions, at most 6 consumers in a group do useful work. Start a 7th and it sits idle with nothing assigned. Your maximum consumer parallelism is permanently bounded by the partition count.

This is why partition count is a capacity-planning decision, not a tuning knob. You pick N up front to cover the parallelism you’ll need at peak, because raising it later is the dangerous move this lesson keeps circling back to.

Both orderId=A events hash to the same partition (P1), so the key's history stays ordered. Each partition is owned by exactly one consumer in the group; with 3 partitions and 2 consumers, one consumer owns two — and a 3rd consumer would sit idle, since parallelism is capped by partition count.

A fixed key is deterministic: hash(key) % N picks one partition, and the producer routes every record for that key to it — which is exactly why per-key ordering holds, and why changing N reroutes it.

If you have…	…then	Consequence
6 partitions, 4 consumers	Some consumers own 2 partitions	Fine — headroom to scale to 6
6 partitions, 6 consumers	1:1 mapping	Max parallelism reached
6 partitions, 8 consumers	2 consumers get nothing	Idle capacity — partitions cap you
A hot key on 1 partition	1 consumer overloaded, others idle	Skew — adding consumers won’t help

Together these rows say one thing: the partition count is the dial that controls how much parallelism is even theoretically possible. Add consumers without adding partitions and you buy nothing. Ignore skew and even a perfectly balanced assignment still bottlenecks on one overloaded key.

The one-way door: changing N rehashes everything

You can increase a topic’s partition count. You cannot decrease it, and increasing it is rarely as cheap as it looks. The problem is the partitioner: hash(key) % N is a function of N. Change N from 6 to 12 and the modulo result changes for most keys, so a key that always went to partition 2 now goes to partition 8. Future records for that key route to the new partition while its history sits in the old one — the two streams are no longer ordered relative to each other, and any consumer that built per-key state (a running balance, a state-machine position) now reads a split, out-of-order history.

For stateful Kafka Streams apps this is worse than a hiccup: the changelog/state store is keyed by partition, so a partition-count change can effectively corrupt the local state. When you see a Kafka Streams job producing wrong results after a “routine” partition bump, this split-history corruption is the first thing to suspect. The standard senior move is therefore not to resize in place. You over-provision partitions up front for expected growth, or — when you truly must scale — you create a new topic with the larger count and migrate (dual-write to old and new, drain the old, cut consumers over). It is a migration, not a config edit.

▸Why this works

Why can’t you decrease partitions? A partition is an ordered log with committed offsets and possibly compacted state. Merging two logs would have to interleave their records into one timeline, but there is no correct interleaving — their offsets and orderings are independent. Kafka refuses rather than silently destroy ordering, so the count only ratchets upward. Treat it as permanent.

Hot partitions, rebalances, and the cost of too many

Two production failure modes dominate. The first is skew: if one key (a whale customer, a viral product) dominates traffic, all of it hashes to one partition, so one consumer is pegged at 100% while the rest idle. More partitions and more consumers do nothing — the bottleneck is a single key on a single partition. The fix is at the key level (salt the key, split the hot entity), not the partition level.

The second is rebalancing. When a consumer joins, leaves, or is presumed dead (a missed heartbeat, a long GC pause, a slow deploy), the group reassigns partitions. The classic “eager” protocol is stop-the-world: every consumer revokes all its partitions and consumption halts across the whole group for the duration — often seconds, sometimes longer in a rebalance storm. Two mitigations matter: incremental cooperative rebalancing (Kafka 2.4+) reassigns only the partitions that must move so most consumers keep processing, and static membership (KIP-345) gives each consumer a stable group.instance.id so a quick restart doesn’t trigger a rebalance at all.

Numbers ground the tradeoff. A single partition handles roughly tens of MB/s, so throughput scales with partition count — up to a point. Push past ~1,000 partitions per broker and producer throughput and p99 latency degrade sharply (Confluent’s own tests show throughput collapsing as you climb from hundreds toward 10K partitions). Under the legacy ZooKeeper metadata plane, a broker failure meant leader elections that scaled with partition count, pushing controller failover into the 5–7 second range with thousands of partitions; KRaft cut that to under a second and lifts the practical ceiling into the hundreds of thousands of partitions per cluster. The senior balance: enough partitions for parallelism and headroom, not so many that failover latency and per-partition overhead dominate.

The metadata plane — not the brokers — is what really caps your partition count: KRaft turns a 5–7s, ~1,000-partition limit into a sub-second, hundreds-of-thousands ceiling.

Pick the best fit

An order-events topic (keyed by orderId, 6 partitions) is lagging at peak and you need more consumer throughput. Per-order ordering must hold. Pick the move.

Quiz

A topic has 8 partitions. A consumer group running 12 consumers is still lagging. What's the real constraint?

Quiz

Why does raising a keyed topic from 6 to 12 partitions risk breaking ordering?

Order the steps

Order what happens when a producer sends a keyed record and a consumer group reads it:

1 Producer computes hash(key) % N to pick one partition
2 Record is appended to that partition's log at the next offset
3 The group assigns that partition to exactly one consumer
4 That consumer reads the partition in offset order, seeing the key's history in sequence
5 If the consumer dies, a rebalance reassigns its partitions to another member

Recall before you leave

01
A teammate wants to bump a live keyed topic from 6 to 24 partitions to fix consumer lag. Explain the risk and the safer path.
02
What is a stop-the-world rebalance, why does it hurt, and what reduces it?

Recap

A Kafka topic is N independent partitions, and the default partitioner sends each keyed record to hash(key) % N — so a key always lands on the same partition, which is why ordering is guaranteed only within a partition and never across the topic. On the read side a consumer group gives each partition to exactly one consumer, so the partition count is a hard ceiling on consumer parallelism: extra consumers just sit idle. That makes the count a capacity decision, and a near-permanent one — you can raise it but never lower it, and raising it rehashes most keys, splitting each key’s history across old and new partitions and silently breaking per-key ordering (and corrupting Kafka Streams state). So you over-provision up front or migrate to a new topic rather than resize in place. In production the pain shows up as hot-partition skew (one key pegs one consumer while the rest idle — fix at the key level) and rebalances (stop-the-world pauses of seconds, tamed by incremental cooperative rebalancing and static membership). A partition does tens of MB/s, but past ~1,000 partitions per broker throughput and p99 degrade, and on legacy ZooKeeper controller failover ran 5–7 seconds with thousands of partitions — KRaft cuts that under a second. The whole game is choosing N to buy parallelism and ordering you can live with for a long time. Now when you see lag climbing on a keyed topic, you’ll know to check the partition count, the consumer count, the key distribution, and the rebalance logs — in that order — before touching any config.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 5 done

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.

Apply this

Put this lesson to work on a real build.

Collaborative cursorsShow every connected user's live cursor and selection in a shared document, conflict-free, over WebSocket.