awesome-everything

Queues, Streams, Eventing

Kafka partitions: code and config reading

Crux: read real producer code, consumer config, partition math, and a rebalance log; predict the behaviour; then pick the highest-leverage fix.
◷ 14 min

Partition bugs are diagnosed in producer code, consumer config, and rebalance logs — not in prose. Read each snippet, predict what Kafka does with it, and choose the fix a senior engineer would make first.

Goal

Practise the loop you run in every Kafka incident: read the producer key, the consumer config, the partition math, and the rebalance log, then reach for the fix that respects ordering and parallelism rather than papering over it.

Snippet 1 — the producer key

// Order events: created, paid, shipped, cancelled all flow through here
ProducerRecord<String, OrderEvent> record =
    new ProducerRecord<>("order-events", event.getType(), event);
//                                       ^^^^^^^^^^^^^^^^^
//                                       key = event TYPE, not order id
producer.send(record);
Quiz

A consumer must process each order's events in order (created before cancelled). With this keying, what happens, and what is the fix?
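To make the ordering consequence concrete, here is a minimal Python simulation of how the producer routes these events. It uses a stand-in crc32 partitioner rather than Kafka's murmur2. The event list and the helpers `route` and `partitions_touched` are hypothetical. The only rule taken from Kafka is "same key, same partition, and order is kept only within a partition".

```python
import zlib
from collections import defaultdict

# Stand-in partitioner (crc32, NOT Kafka's murmur2): same key -> same partition.
def partition_for(key: str, num_partitions: int) -> int:
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Hypothetical event stream in send order: (order_id, event_type)
EVENTS = [
    ("A-1", "created"), ("B-2", "created"), ("A-1", "paid"),
    ("B-2", "cancelled"), ("A-1", "shipped"), ("A-1", "cancelled"),
]

def route(events, key_fn, num_partitions=6):
    """Append each event to its partition's log; order is preserved only within a log."""
    logs = defaultdict(list)
    for order_id, event_type in events:
        p = partition_for(key_fn(order_id, event_type), num_partitions)
        logs[p].append((order_id, event_type))
    return logs

def partitions_touched(logs, order_id):
    """Set of partitions that hold at least one event for this order."""
    return {p for p, log in logs.items() if any(o == order_id for o, _ in log)}

by_type = route(EVENTS, lambda order_id, event_type: event_type)
by_order = route(EVENTS, lambda order_id, event_type: order_id)

# Keyed by type: A-1's events can spread over up to 4 partitions, unordered relative to each other.
print("by type :", sorted(partitions_touched(by_type, "A-1")))
# Keyed by order id: all of A-1's events share one partition, in send order.
print("by order:", sorted(partitions_touched(by_order, "A-1")))
```

In the Java producer, the corresponding change is to pass the order id as the record key, for example `new ProducerRecord<>("order-events", event.getOrderId(), event)`, where `getOrderId()` is a hypothetical accessor on `OrderEvent`. Keying by type has a second cost. With only four distinct keys, at most four partitions ever receive data, so at most four consumers in the group do useful work.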

Snippet 2 — the partition math

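import zlib

# Stand-in for Kafka's default keyed partitioner, which computes
# toPositive(murmur2(keyBytes)) % N. Python's built-in hash() is salted per
# process for str, so a deterministic hash (crc32) is used here instead.
def partition_for(key, N):
    return zlib.crc32(key.encode("utf-8")) % N

# orderId "A-4711" before and after a partition increase (values illustrative)
partition_for("A-4711", 6)    # e.g. -> 2
partition_for("A-4711", 12)   # e.g. -> 8     # same key, different partition!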
Quiz

A topic was raised from 6 to 12 partitions while live. Reading this math, what is the consequence for key A-4711, and why can't you undo it?
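The snippet above only sketches the partitioner. To check the doubling behaviour against the real routing rule, the sketch below ports Kafka's `Utils.murmur2` to Python, along with the `toPositive(...) % numPartitions` step used for keyed records. It was written from the Java source and is not part of any official client. Verify a few keys against your own producer before relying on it. The key range `A-0` to `A-999` is made up.

```python
MASK32 = 0xFFFFFFFF

def murmur2(data: bytes) -> int:
    """Python port of Kafka's Utils.murmur2; returns the 32-bit result as an unsigned int."""
    m, r = 0x5BD1E995, 24
    length = len(data)
    h = (0x9747B28C ^ length) & MASK32            # seed ^ length
    for i in range(0, length - length % 4, 4):     # full 4-byte little-endian blocks
        k = data[i] | (data[i + 1] << 8) | (data[i + 2] << 16) | (data[i + 3] << 24)
        k = (k * m) & MASK32
        k ^= k >> r
        k = (k * m) & MASK32
        h = ((h * m) & MASK32) ^ k
    tail, rem = length - length % 4, length % 4    # 0-3 trailing bytes (Java switch fall-through)
    if rem >= 3:
        h ^= data[tail + 2] << 16
    if rem >= 2:
        h ^= data[tail + 1] << 8
    if rem >= 1:
        h ^= data[tail]
        h = (h * m) & MASK32
    h ^= h >> 13
    h = (h * m) & MASK32
    h ^= h >> 15
    return h

def kafka_partition(key: str, num_partitions: int) -> int:
    """toPositive(murmur2(keyBytes)) % numPartitions, key UTF-8 encoded as StringSerializer does."""
    return (murmur2(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions

keys = [f"A-{i}" for i in range(1000)]
moved = [k for k in keys if kafka_partition(k, 6) != kafka_partition(k, 12)]
print(f"{len(moved)} of {len(keys)} keys change partition going from 6 to 12")
```

Because h % 12 is always either h % 6 or h % 6 + 6, doubling the partition count sends roughly half of all keys to a new partition, and always to p + 6. That matches the move from 2 to 8 in the snippet. Records already written stay on their old partition, so a key's history is now split across two partitions.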

Snippet 3 — the consumer config

# Consumer group: payments-processor, 4 instances behind a rolling deploy
group.id=payments-processor
session.timeout.ms=10000
heartbeat.interval.ms=3000
# group.instance.id is NOT set
partition.assignment.strategy=org.apache.kafka.clients.consumer.RangeAssignor
Quiz

Every rolling deploy causes a multi-second consumption stall across the whole group. Reading this config, what is the cause, and what is the lowest-risk change?
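To see why an eager assignor turns one restart into a group-wide stall, the sketch below models a RangeAssignor-style split of a hypothetical 12-partition topic across the four instances. It compares what the eager protocol revokes with the partitions that actually change owner when `c2` restarts. This is a simplified single-topic model, not the client's assignor code. The member names and partition count are assumptions.

```python
def range_assign(members, num_partitions):
    """RangeAssignor-style split of one topic: members sorted, contiguous ranges,
    and the first (partitions % members) members get one extra partition."""
    members = sorted(members)
    per_member, extra = divmod(num_partitions, len(members))
    owner, start = {}, 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)
        for p in range(start, start + count):
            owner[p] = member
        start += count
    return owner  # partition -> member

def rebalance_cost(before, after):
    """Eager protocol: every member revokes everything it owns before reassignment.
    'moved' lists partitions whose owner differs under this same target assignment."""
    eager_revoked = len(before)
    moved = sorted(p for p in before if before[p] != after.get(p))
    return eager_revoked, moved

group = ["c1", "c2", "c3", "c4"]
steady = range_assign(group, 12)
c2_restarting = range_assign([m for m in group if m != "c2"], 12)

eager, moved = rebalance_cost(steady, c2_restarting)
print(f"eager: {eager} partitions paused; only {len(moved)} change owner: {moved}")
```

Under eager rebalancing all 12 partitions stop, although only 4 move. Without `group.instance.id`, each instance bounce typically triggers two such rebalances: one when it leaves and one when it rejoins. A stable `group.instance.id` per instance lets a member that returns within `session.timeout.ms` (10 s here) take back its previous assignment with no rebalance. `CooperativeStickyAssignor` limits each rebalance to the partitions that move, and its stickier target may move even fewer than this range model. Moving off an eager assignor takes two rolling bounces: first list both assignors, then drop the old one.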

Snippet 4 — the rebalance log

[Consumer clientId=c3, groupId=payments] (Re-)joining group
[Consumer clientId=c3, groupId=payments] Lost previously assigned partitions order-events-2, order-events-5
[Consumer clientId=c3] Revoke previously assigned partitions order-events-2, order-events-5
... 7 such cycles in 90 seconds ...
[Consumer clientId=c3] Member c3 sending LeaveGroup request due to consumer poll timeout has expired
Quiz

Reading this log — repeated join/revoke cycles ending in a poll-timeout LeaveGroup — what is the failure mode, and what is the first fix?
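A storm like this usually comes down to arithmetic. If processing one `poll()` batch takes longer than `max.poll.interval.ms`, the client leaves the group and the rebalance hands its partitions out again. The uncommitted slow batch is then typically fetched and processed again. The sketch below checks that budget using the Java consumer defaults (`max.poll.interval.ms=300000`, `max.poll.records=500`). The 800 ms per-record latency and the 50 % headroom are hypothetical numbers, not values taken from the log.

```python
MAX_POLL_INTERVAL_MS = 300_000   # Kafka consumer default for max.poll.interval.ms
MAX_POLL_RECORDS = 500           # Kafka consumer default for max.poll.records

def worst_case_batch_ms(max_poll_records: int, slow_record_ms: int) -> int:
    """Time between two poll() calls if every record in a full batch is slow."""
    return max_poll_records * slow_record_ms

def safe_max_poll_records(max_poll_interval_ms: int, slow_record_ms: int,
                          headroom_pct: int = 50) -> int:
    """Largest batch whose worst case uses at most headroom_pct of the poll budget (minimum 1)."""
    return max(1, max_poll_interval_ms * headroom_pct // (100 * slow_record_ms))

slow_record_ms = 800  # hypothetical p99 latency of a downstream call per record
batch_ms = worst_case_batch_ms(MAX_POLL_RECORDS, slow_record_ms)
verdict = "poll timeout, member leaves" if batch_ms > MAX_POLL_INTERVAL_MS else "ok"
print(f"full batch: {batch_ms} ms vs budget {MAX_POLL_INTERVAL_MS} ms -> {verdict}")
print("safe max.poll.records:", safe_max_poll_records(MAX_POLL_INTERVAL_MS, slow_record_ms))
```

Lowering `max.poll.records` to fit the budget, or moving slow calls off the polling thread, addresses the cause. Raising `max.poll.interval.ms` alone only postpones the timeout if per-record latency keeps growing. Heartbeats are sent from a background thread, so `session.timeout.ms` does not catch slow processing; `max.poll.interval.ms` does.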

Recap

Every partition incident is read in code, config, and logs: the producer key decides what stays ordered (key by the entity, not the event type); hash(key) % N means a partition increase reroutes live keys and can never be undone; an eager assignor plus a missing group.instance.id makes every deploy a stop-the-world rebalance; and a join/revoke loop ending in poll-timeout is a rebalance storm from slow processing, not a broker or partition problem. Read the key and the config first, fix the structural cause, then confirm against lag and rebalance metrics.
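For the final check against lag, per-partition lag is the log-end offset minus the group's committed offset. The sketch below computes it from two offset maps and reports what share of the total sits on the worst partition, a quick signal for the hot partition that type-based keys produce. The offset numbers are invented for illustration. Fetching real offsets (for example with the Java consumer's `endOffsets` and `committed` calls) is left out.

```python
# Hypothetical offsets for a 4-partition topic: partition -> offset
END_OFFSETS = {0: 1_200, 1: 1_150, 2: 98_000, 3: 1_210}
COMMITTED = {0: 1_190, 1: 1_150, 2: 41_000, 3: 1_200}

def partition_lag(end_offsets: dict, committed: dict) -> dict:
    """Lag per partition = log-end offset minus the group's committed offset."""
    return {p: end - committed[p] for p, end in end_offsets.items()}

def hottest_share(lag: dict) -> float:
    """Fraction of total lag on the single worst partition (0.0 when nothing lags)."""
    total = sum(lag.values())
    return max(lag.values()) / total if total else 0.0

lag = partition_lag(END_OFFSETS, COMMITTED)
print("lag per partition:", lag)
print(f"worst partition holds {hottest_share(lag):.1%} of total lag")
```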


Trademarks belong to their respective owners. Editorial reference only.