Queues, Streams, Eventing

Kafka partitions: the unit of parallelism, ordering, and a one-way door

Crux: A topic splits into N partitions; the key’s hash picks one; ordering holds only within a partition; one consumer per partition caps your parallelism. Partition count is a decision you can’t take back.

The order-events topic was lagging, so someone bumped it from 6 partitions to 12 to “add throughput.” It worked for ten minutes. Then support flooded in: a cancel for an order arrived and got processed before the create. Same orderId, two events, now landing on different partitions because hash(orderId) % 12 no longer equals hash(orderId) % 6. Per-key ordering — the one guarantee the whole pipeline relied on — silently evaporated the instant the partition count changed. There is no rollback that puts the old keys back.

A topic is N independent logs, and the key picks the log

A Kafka topic is not one stream — it is N partitions, each an independent append-only log with its own offsets. When a producer sends a keyed record, the default partitioner computes murmur2(key) % N to choose exactly which partition the record lands in. Same key, same partition, every time — as long as N never changes. Records with no key are spread across partitions (round-robin or sticky batching) since they have nothing to pin them.
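
The routing rule above can be sketched in a few lines. This is an illustration only: `pick_partition` is a hypothetical helper, not a Kafka API, and SHA-256 stands in for murmur2, which the real client applies to the serialized key bytes.

```python
import hashlib

def pick_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition index as hash(key) % N.

    SHA-256 is a stand-in hash; Kafka's default partitioner uses murmur2.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# While N stays fixed, the same key always maps to the same partition.
first = pick_partition("order-1001", 6)
second = pick_partition("order-1001", 6)
```

Any deterministic hash gives the same property; only the concrete partition numbers differ from what a real cluster would choose.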

This single mechanism is doing three jobs at once, and conflating them is where designs go wrong:

  • Distribution — spreading load across brokers and disks.
  • Ordering — Kafka guarantees order only within a single partition, never across the topic. There is no global order.
  • Co-location — every record for one key is in one place, so a consumer sees that key’s full history in sequence.

So “ordering” in Kafka really means per-key ordering, and you get it for free only because all of a key’s records hash to the same partition. The choice of key is therefore the choice of what stays ordered: key by orderId and an order’s events are sequenced; key by customerId and you serialize everything for a customer (and risk one big customer overwhelming a partition — more on that below).

Consumer groups: one partition, one consumer

On the read side, a consumer group splits the partitions among its members so each partition is owned by exactly one consumer in the group at a time. That assignment is the source of Kafka’s parallelism — and its hardest cap. If a topic has 6 partitions, at most 6 consumers in a group do useful work. Start a 7th and it sits idle with nothing assigned. Your maximum consumer parallelism is permanently bounded by the partition count.

This is why partition count is a capacity-planning decision, not a tuning knob. You pick N up front to cover the parallelism you’ll need at peak, because raising it later is the dangerous move this lesson keeps circling back to.

| If you have… | …then | Consequence |
|---|---|---|
| 6 partitions, 4 consumers | Some consumers own 2 partitions | Fine: headroom to scale to 6 |
| 6 partitions, 6 consumers | 1:1 mapping | Max parallelism reached |
| 6 partitions, 8 consumers | 2 consumers get nothing | Idle capacity: partitions cap you |
| A hot key on 1 partition | 1 consumer overloaded, others idle | Skew: adding consumers won’t help |
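
The first three rows can be reproduced with a toy assignor. This is a simplified sketch, assuming a plain round-robin deal; `assign_round_robin` is a hypothetical function, and Kafka's real assignors (range, round-robin, sticky) differ in detail but obey the same one-owner-per-partition rule.

```python
def assign_round_robin(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Deal partitions out to consumers one at a time (simplified assignor)."""
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment

four = assign_round_robin(6, [f"c{i}" for i in range(4)])
six = assign_round_robin(6, [f"c{i}" for i in range(6)])
eight = assign_round_robin(6, [f"c{i}" for i in range(8)])

# Consumers beyond the partition count end up with an empty assignment.
idle = [c for c, parts in eight.items() if not parts]
```

With 8 consumers and 6 partitions, `idle` holds the two members that received nothing, which is the "idle capacity" row.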

The one-way door: changing N rehashes everything

You can increase a topic’s partition count. You cannot decrease it, and increasing it is rarely as cheap as it looks. The problem is the partitioner: hash(key) % N is a function of N. Change N from 6 to 12 and the modulo result changes for roughly half of all keys (doubling leaves each key either where it was or exactly 6 partitions higher; a change that is not a multiple moves even more), so a key that always went to partition 2 may now go to partition 8. Existing records are not moved: future records for that key route to the new partition while its history sits in the old one. The two logs are no longer ordered relative to each other, and any consumer that built per-key state (a running balance, a state-machine position) now reads a split, out-of-order history.
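
To see how many keys a resize disturbs, the sketch below counts the keys whose partition changes when N goes from 6 to 12. It assumes SHA-256 as a stand-in for murmur2 and synthetic `order-<i>` keys; `pick_partition` is a hypothetical helper.

```python
import hashlib

def pick_partition(key: str, num_partitions: int) -> int:
    """hash(key) % N, with SHA-256 standing in for Kafka's murmur2."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

keys = [f"order-{i}" for i in range(10_000)]

# Keys whose target partition differs between N=6 and N=12.
moved = [k for k in keys if pick_partition(k, 6) != pick_partition(k, 12)]
moved_fraction = len(moved) / len(keys)

# When N doubles, a key either stays put or jumps exactly 6 partitions up.
jumps_ok = all(
    pick_partition(k, 12) in (pick_partition(k, 6), pick_partition(k, 6) + 6)
    for k in keys
)
```

Every key in `moved` now has history on one partition and future records on another, which is exactly the split the paragraph describes.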

For stateful Kafka Streams apps this is worse than a hiccup: state stores and their changelog topics are partitioned to match the input topic, so a partition-count change can effectively corrupt the local state. The standard senior move is therefore not to resize in place. You over-provision partitions up front for expected growth, or, when you truly must scale, you create a new topic with the larger count and migrate (dual-write to old and new, drain the old, cut consumers over). It is a migration, not a config edit.
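
The dual-write phase of such a migration can be modelled in memory. This is a toy sketch, not a migration tool: `InMemoryTopic` and `dual_write` are hypothetical names, SHA-256 stands in for murmur2, and the cutover of consumers and the handling of duplicates are left out.

```python
import hashlib

def pick_partition(key: str, num_partitions: int) -> int:
    """hash(key) % N, with SHA-256 standing in for Kafka's murmur2."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

class InMemoryTopic:
    """Hypothetical stand-in for a topic: a list of append-only partition logs."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: list[list[tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def send(self, key: str, value: str) -> None:
        index = pick_partition(key, len(self.partitions))
        self.partitions[index].append((key, value))

old_topic = InMemoryTopic(6)   # left at its original count, never resized
new_topic = InMemoryTopic(12)  # created with the larger count from day one

def dual_write(key: str, value: str) -> None:
    """Migration phase: every record is produced to both topics."""
    old_topic.send(key, value)
    new_topic.send(key, value)

for event in ["create", "pay", "cancel"]:
    dual_write("order-1001", event)

# In the new topic the key's full history sits on one partition, in send order.
target = pick_partition("order-1001", 12)
history = [value for key, value in new_topic.partitions[target] if key == "order-1001"]
```

Because the new topic never changes its own N, each key keeps a single home there, and the old topic keeps serving existing consumers until they are cut over.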

Why this works

Why can’t you decrease partitions? A partition is an ordered log with committed offsets and possibly compacted state. Merging two logs would have to interleave their records into one timeline, but there is no correct interleaving — their offsets and orderings are independent. Kafka refuses rather than silently destroy ordering, so the count only ratchets upward. Treat it as permanent.

Hot partitions, rebalances, and the cost of too many

Two production failure modes dominate. The first is skew: if one key (a whale customer, a viral product) dominates traffic, all of it hashes to one partition, so one consumer is pegged at 100% while the rest idle. More partitions and more consumers do nothing — the bottleneck is a single key on a single partition. The fix is at the key level (salt the key, split the hot entity), not the partition level.
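
Key salting can be sketched as follows. The bucket count, the `#` separator, and the helper names are assumptions for illustration, and SHA-256 again stands in for murmur2. Note the tradeoff: ordering now holds only within each salted sub-key, so the consumer must tolerate or re-merge the split.

```python
import hashlib
import random

def pick_partition(key: str, num_partitions: int) -> int:
    """hash(key) % N, with SHA-256 standing in for Kafka's murmur2."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

SALT_BUCKETS = 4  # assumed fan-out for the hot key

def salted_key(key: str, rng: random.Random) -> str:
    """Append a random bucket suffix so one hot key becomes several sub-keys."""
    return f"{key}#{rng.randrange(SALT_BUCKETS)}"

rng = random.Random(42)
sub_keys = {salted_key("whale-customer", rng) for _ in range(1000)}

# The hot key's traffic now spreads over at most SALT_BUCKETS partitions.
partitions_hit = {pick_partition(k, 12) for k in sub_keys}
```

Salting only the known hot keys, rather than every key, keeps strict per-key ordering for the rest of the traffic.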

The second is rebalancing. When a consumer joins, leaves, or is presumed dead (a missed heartbeat, a long GC pause, a slow deploy), the group reassigns partitions. The classic “eager” protocol is stop-the-world: every consumer revokes all its partitions and consumption halts across the whole group for the duration, often seconds, sometimes longer in a rebalance storm. Two mitigations matter. Incremental cooperative rebalancing (Kafka 2.4+) reassigns only the partitions that must move, so most consumers keep processing. Static membership (KIP-345, Kafka 2.3+) gives each consumer a stable group.instance.id, so a restart that completes within the session timeout doesn’t trigger a rebalance at all.
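
Both mitigations are consumer settings. The sketch below builds a settings dictionary in the style accepted by librdkafka-based clients such as confluent-kafka; the broker address, group name, instance id, and timeout value are placeholders, and `consumer_config` is a hypothetical helper. The Java client names the same assignor by class, `org.apache.kafka.clients.consumer.CooperativeStickyAssignor`.

```python
def consumer_config(group_id: str, instance_id: str) -> dict:
    """Build consumer settings for cooperative rebalancing and static membership."""
    return {
        "bootstrap.servers": "localhost:9092",  # placeholder broker address
        "group.id": group_id,
        # Incremental cooperative rebalancing: move only the partitions that must move.
        "partition.assignment.strategy": "cooperative-sticky",
        # Static membership: a stable identity that survives a quick restart.
        "group.instance.id": instance_id,
        # A restart must finish inside this window to avoid triggering a rebalance.
        "session.timeout.ms": 30000,
    }

config = consumer_config("order-events-workers", "order-events-worker-1")
```

Each running instance needs its own unique `group.instance.id`, typically derived from a stable host or pod name.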

Numbers ground the tradeoff. A single partition handles roughly tens of MB/s, so throughput scales with partition count — up to a point. Push past ~1,000 partitions per broker and producer throughput and p99 latency degrade sharply (Confluent’s own tests show throughput collapsing as you climb from hundreds toward 10K partitions). Under the legacy ZooKeeper metadata plane, a broker failure meant leader elections that scaled with partition count, pushing controller failover into the 5–7 second range with thousands of partitions; KRaft cut that to under a second and lifts the practical ceiling into the hundreds of thousands of partitions per cluster. The senior balance: enough partitions for parallelism and headroom, not so many that failover latency and per-partition overhead dominate.
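
A commonly cited rule of thumb turns those numbers into a starting partition count: take the larger of target throughput divided by per-partition produce rate and by per-partition consume rate, then add headroom. The figures below are assumed for illustration, not measurements, and `estimate_partitions` is a hypothetical helper.

```python
import math

def estimate_partitions(
    target_mb_s: float,
    produce_mb_s_per_partition: float,
    consume_mb_s_per_partition: float,
    headroom: float = 1.5,
) -> int:
    """Estimate a partition count from throughput targets plus growth headroom."""
    needed = max(
        target_mb_s / produce_mb_s_per_partition,
        target_mb_s / consume_mb_s_per_partition,
    )
    return math.ceil(needed * headroom)

# Assumed: 200 MB/s peak, 20 MB/s produce and 10 MB/s consume per partition.
estimate = estimate_partitions(200, 20, 10)
```

Here the consumer side is the bottleneck (20 partitions needed), and the 1.5x headroom lifts the estimate to 30; measure your own per-partition rates before trusting any such figure.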

Pick the best fit

An order-events topic (keyed by orderId, 6 partitions) is lagging at peak and you need more consumer throughput. Per-order ordering must hold. Pick the move.

Quiz

A topic has 8 partitions. A consumer group running 12 consumers is still lagging. What's the real constraint?

Quiz

Why does raising a keyed topic from 6 to 12 partitions risk breaking ordering?

Order the steps

Order what happens when a producer sends a keyed record and a consumer group reads it:

  1. Producer computes hash(key) % N to pick one partition
  2. Record is appended to that partition's log at the next offset
  3. The group assigns that partition to exactly one consumer
  4. That consumer reads the partition in offset order, seeing the key's history in sequence
  5. If the consumer dies, a rebalance reassigns its partitions to another member

Recall before you leave

  1. A teammate wants to bump a live keyed topic from 6 to 24 partitions to fix consumer lag. Explain the risk and the safer path.
  2. What is a stop-the-world rebalance, why does it hurt, and what reduces it?

Recap

A Kafka topic is N independent partitions, and the default partitioner sends each keyed record to hash(key) % N, so a key always lands on the same partition while N is fixed. That is why ordering is guaranteed only within a partition and never across the topic. On the read side a consumer group gives each partition to exactly one consumer, so the partition count is a hard ceiling on consumer parallelism: extra consumers just sit idle. That makes the count a capacity decision, and a near-permanent one: you can raise it but never lower it, and raising it remaps many keys (about half when the count doubles), splitting each moved key’s history across old and new partitions and silently breaking per-key ordering (and corrupting Kafka Streams state). So you over-provision up front or migrate to a new topic rather than resize in place. In production the pain shows up as hot-partition skew (one key pegs one consumer while the rest idle; fix it at the key level) and rebalances (stop-the-world pauses of seconds, tamed by incremental cooperative rebalancing and static membership). A partition does tens of MB/s, but past ~1,000 partitions per broker throughput and p99 latency degrade, and on legacy ZooKeeper controller failover ran 5–7 seconds with thousands of partitions; KRaft cuts that to under a second. The whole game is choosing N to buy parallelism and ordering you can live with for a long time.
