
Kafka exactly-once semantics: idempotent producer and transactions

How Kafka's idempotent producer eliminates producer-retry duplicates at 3% cost, and how transactions extend exactly-once across multi-partition writes and offset commits at 20–30% cost.


A Kafka Streams job enriched order events and wrote results to an output topic. During a rolling broker restart, the job retried a batch that had already been written, producing 3,000 duplicate output records downstream. The fix was one config line: enable.idempotence=true. The feature had existed since 2017. Nobody had turned it on.

Layer 1 — Idempotent producer (KIP-98, Kafka 0.11.0)

Why does this matter beyond “turn on the flag and forget”? Because each of the three layers costs something different and covers a different failure class. Choosing the wrong layer means paying the high price while leaving the actual duplicate path open.

The idempotent producer eliminates duplicates caused by producer retries to the broker.

On startup, the producer is assigned a unique producer-ID by the broker. Every message sent to a partition carries a monotonically increasing sequence number scoped to that producer-ID and partition. The broker tracks the last-seen sequence number per (producer-ID, partition) pair.

When a producer retries — because the network dropped the ack — the broker sees the same (producer-ID, sequence) again. It recognises the duplicate, silently acks it, and discards the write. Out-of-order sequences (possible if a retry arrives late) are rejected.
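
The dedup rule above can be sketched as a toy model of a single partition. This is a simplification, not Kafka's broker code: it assumes one record per request, and the class name PartitionLog and the returned status strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy model of one partition on a broker (not Kafka's real implementation)."""
    records: list = field(default_factory=list)
    # Last sequence number accepted from each producer ID on this partition.
    last_seq: dict = field(default_factory=dict)

    def append(self, producer_id: int, seq: int, value: str) -> str:
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            # Already written: ack again, but do not append a second copy.
            return "duplicate-acked"
        if seq != last + 1:
            # A gap means an earlier write is missing: reject.
            return "out-of-order-rejected"
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return "appended"


log = PartitionLog()
results = [
    log.append(producer_id=7, seq=0, value="order-1"),
    log.append(producer_id=7, seq=1, value="order-2"),
    log.append(producer_id=7, seq=1, value="order-2"),  # retry after a lost ack
    log.append(producer_id=7, seq=3, value="order-4"),  # seq 2 never arrived
]
print(results)
print(log.records)
```

The retried write is acknowledged but the log still holds only two records, which is the behaviour the producer relies on when it resends after a dropped ack.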

Enabled by: enable.idempotence=true
Overhead: ~3% throughput
Available since: Kafka 0.11.0 (mid-2017)
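
The flag interacts with a few other producer settings. In the Java client, idempotence requires acks=all, retries greater than 0, and at most 5 in-flight requests per connection. A minimal sketch of a pre-flight check follows, assuming the config is held as a plain dict; the helper name idempotence_conflicts and the localhost:9092 address are hypothetical, not part of any Kafka library.

```python
def idempotence_conflicts(config: dict) -> list:
    """Return the settings that conflict with enable.idempotence=true."""
    problems = []
    if str(config.get("enable.idempotence", "false")).lower() != "true":
        return problems  # idempotence is off, nothing to check
    # Missing keys are treated as compatible; only explicit conflicts are reported.
    if str(config.get("acks", "all")) not in ("all", "-1"):
        problems.append("acks must be all")
    if int(config.get("retries", 1)) < 1:
        problems.append("retries must be greater than 0")
    if int(config.get("max.in.flight.requests.per.connection", 5)) > 5:
        problems.append("max.in.flight.requests.per.connection must be at most 5")
    return problems


producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder address
    "enable.idempotence": "true",
    "acks": "all",
    "retries": 2147483647,
    "max.in.flight.requests.per.connection": 5,
}

print(idempotence_conflicts(producer_config))
print(idempotence_conflicts({**producer_config, "acks": "1"}))
```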

This solves Leg 1 (producer-to-broker) duplicates only. It does not help if the consumer crashes after processing a record but before committing its offset.

Layer 2 — Transactional producer (KIP-98)

Transactions extend exactly-once to the full read-process-write pipeline within Kafka.

A transactional producer wraps a batch of writes across multiple partitions plus a consumer-offset commit into one atomic unit managed by a transaction coordinator broker. Either all writes and the offset commit become visible to downstream consumers, or none do.

Typical Kafka Streams pattern:

beginTransaction()
  consume from input-partition P0 at offset 42
  produce to output-partition P1
  sendOffsetsToTransaction(group-id, {P0: offset 43})
commitTransaction()

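If the job crashes mid-flight, the open transaction is aborted when the restarted producer re-registers its transactional.id (or when the transaction times out). The input offset is not advanced. The partial output is rolled back: consumers with isolation.level=read_committed skip aborted records. The job reprocesses from offset 42 and writes the output again in a new transaction, so committed readers see each result exactly once.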
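
The crash-and-restart path can be walked through with a toy in-memory model. This is a sketch under simplifying assumptions, not the Kafka client API: the ToyKafka class and its methods are hypothetical, there is a single output log, and the abort is triggered by hand where a real coordinator would do it.

```python
class ToyKafka:
    """Toy model: one output log plus committed consumer offsets."""

    def __init__(self):
        self.output = []            # list of (txn_id, value), including aborted writes
        self.txn_state = {}         # txn_id -> "open" | "committed" | "aborted"
        self.offsets = {}           # input partition -> next offset to read
        self._staged_offsets = {}   # txn_id -> offsets waiting for commit

    def begin(self, txn_id):
        self.txn_state[txn_id] = "open"
        self._staged_offsets[txn_id] = {}

    def produce(self, txn_id, value):
        # Records land in the log right away, tagged with their transaction.
        self.output.append((txn_id, value))

    def send_offsets(self, txn_id, offsets):
        self._staged_offsets[txn_id].update(offsets)

    def commit(self, txn_id):
        # Output records and input offsets become visible together.
        self.offsets.update(self._staged_offsets.pop(txn_id))
        self.txn_state[txn_id] = "committed"

    def abort(self, txn_id):
        self._staged_offsets.pop(txn_id, None)
        self.txn_state[txn_id] = "aborted"

    def read_committed(self):
        return [v for t, v in self.output if self.txn_state[t] == "committed"]


kafka = ToyKafka()

# First attempt: the job crashes after producing but before committing.
kafka.begin("txn-1")
kafka.produce("txn-1", "enriched-order-42")
kafka.send_offsets("txn-1", {"P0": 43})
kafka.abort("txn-1")  # stands in for the abort that follows the crash

# Restart: the offset for P0 was never advanced, so offset 42 is processed again.
resume_from = kafka.offsets.get("P0", 42)
kafka.begin("txn-2")
kafka.produce("txn-2", f"enriched-order-{resume_from}")
kafka.send_offsets("txn-2", {"P0": resume_from + 1})
kafka.commit("txn-2")

print(len(kafka.output), kafka.read_committed(), kafka.offsets)
```

The log physically contains two copies of the result, but a committed reader sees one, and the input offset moves to 43 only with the successful commit.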

Enabled by: transactional.id=my-app-v1
Consumer must set: isolation.level=read_committed
Overhead: ~20–30% throughput (two-phase commit between coordinator and partition leaders)

Roughly an eightfold gap separates the two layers. Paying the ~25% transaction cost to fix a duplicate that the ~3% idempotent producer already covers is the classic over-spend.
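
To put the two overhead figures side by side, here is the arithmetic on a hypothetical baseline of 100,000 messages per second. The baseline is an assumption for illustration; the 3% and 25% figures are the lesson's estimates, with 25% taken as the midpoint of 20–30%.

```python
def throughput_after(baseline_msgs_per_sec: int, overhead_pct: int) -> int:
    """Throughput left after subtracting a percentage overhead (integer math)."""
    return baseline_msgs_per_sec * (100 - overhead_pct) // 100


baseline = 100_000  # hypothetical baseline, messages per second

idempotent = throughput_after(baseline, 3)
transactional = throughput_after(baseline, 25)
cost_ratio = round(25 / 3, 1)

print(idempotent, transactional, cost_ratio)
```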

The broker dedups the retried (PID, seq) write before it lands. The transaction then makes the output write and the offset commit atomic; a read_committed consumer sees the records only after the commit.

Kafka exactly-once: three layers

Idempotent producer

Eliminates: producer-retry duplicates (Leg 1)

Config: enable.idempotence=true | Cost: ~3%

Transactions

Eliminates: partial-write + offset-skip (within Kafka only)

Config: transactional.id + read_committed | Cost: ~20–30%

Consumer dedup

Eliminates: cross-system duplicates (Kafka → Postgres, Kafka → Stripe)

Config: ON CONFLICT DO NOTHING + Idempotency-Key | Cost: one DB write

What Kafka transactions do NOT cover

Kafka transactions are atomic within Kafka. The moment you write to Postgres or call Stripe, you leave the transaction boundary. If the Kafka offset commits but the Postgres write fails (or vice versa), you have a partial-write gap.

For Kafka-to-DB pipelines, the correct pattern is still consumer-side dedup with an idempotent DB write: ON CONFLICT DO NOTHING, with the record's topic, partition, and offset as part of the primary key (an offset is unique only within one partition). In that design the 20–30% Kafka transaction cost is replaced by one cheap DB unique-constraint check. Many production stream-to-DB pipelines choose this hybrid: at-least-once Kafka delivery + idempotent DB consumer.
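
A minimal sketch of the idempotent DB write, using Python's built-in sqlite3 as a stand-in for Postgres so the example runs without a server (it needs SQLite 3.24 or newer for ON CONFLICT). The table name enriched_orders, its columns, and the apply_record helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE enriched_orders (
        topic        TEXT    NOT NULL,
        part         INTEGER NOT NULL,
        kafka_offset INTEGER NOT NULL,
        payload      TEXT    NOT NULL,
        PRIMARY KEY (topic, part, kafka_offset)
    )
    """
)


def apply_record(topic: str, part: int, offset: int, payload: str) -> None:
    """Write one consumed record; a redelivered record hits the key and is skipped."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO enriched_orders (topic, part, kafka_offset, payload) "
            "VALUES (?, ?, ?, ?) "
            "ON CONFLICT (topic, part, kafka_offset) DO NOTHING",
            (topic, part, offset, payload),
        )


apply_record("orders", 0, 42, "order-42 enriched")
apply_record("orders", 0, 42, "order-42 enriched")  # redelivery after a consumer crash
apply_record("orders", 0, 43, "order-43 enriched")

row_count = conn.execute("SELECT COUNT(*) FROM enriched_orders").fetchone()[0]
print(row_count)
```

Because the write is safe to repeat, the consumer can commit its Kafka offset after the DB write and simply replay on a crash, with no Kafka transaction involved.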

Quiz

Kafka's idempotent producer eliminates which class of duplicates?

Quiz

A Kafka Streams job uses transactions and writes both to a Kafka output topic and a Postgres table. Does the Kafka transaction guarantee exactly-once for the Postgres write?

Recall before you leave

01
What two values does the Kafka broker track to deduplicate idempotent producer retries?
02
What does isolation.level=read_committed do on a Kafka consumer?
03
In which Kafka version and year did KIP-98 ship?

Recap

Kafka’s exactly-once semantics are built in three layers. The idempotent producer (enable.idempotence=true, ~3% cost) assigns producer-IDs and per-partition sequence numbers so the broker can deduplicate retries transparently. Transactions (transactional.id + read_committed consumers, ~20–30% cost) wrap multi-partition writes and offset commits into one atomic unit managed by a coordinator broker, enabling the full read-process-write-commit cycle inside Kafka to be exactly-once. But transactions are scoped to Kafka: any write to Postgres or an external API exits the transaction boundary and requires its own idempotency mechanism, typically ON CONFLICT DO NOTHING with the record's topic, partition, and offset as part of the key. When you see a Kafka pipeline writing to both a topic and a database, you will know immediately that the Kafka transaction alone is not enough: the DB write needs its own idempotency layer.

