Change-Data Capture: streaming the write-ahead log without filling the disk

CDC tails the database's replication log so every committed insert/update/delete becomes an event: lower latency than polling and no app changes, but the replication slot can retain WAL until the primary's disk fills and writes halt.

3am page: “Primary database is read-only.” Not a crash — Postgres ran out of disk and stopped accepting writes to protect itself. The culprit was not the app. A Kafka Connect worker had OOM’d six hours earlier, the Debezium connector died, and its replication slot quietly pinned every WAL segment since. On a busy OLTP box producing ~30 GB of WAL per hour, the data volume went from 40% to full overnight. The fix was one line — drop the dead slot — but nobody knew the slot existed until the disk was gone.

By the end of this lesson you’ll know exactly why a stalled connector can take down your primary database, and what two controls to put in place before you ever ship CDC to production.

Reading the log instead of asking the table

Every relational database already writes a durable, ordered record of every committed change: the write-ahead log (WAL in Postgres, the binlog in MySQL). It exists for crash recovery and replication. Change-Data Capture hijacks that log for a second purpose — instead of polling a table (“any rows newer than my last cursor?”) or having the app dual-write to both the database and a message bus, a CDC connector tails the replication log and emits one event per row change.

The mechanism in Postgres: you create a logical replication slot backed by a logical-decoding output plugin (pgoutput, shipped since Postgres 10). The slot is a server-side bookmark — it records the LSN (log sequence number) the consumer has confirmed, and the server promises not to recycle WAL past that point. A connector like Debezium opens a replication connection, decodes the WAL into structured change events, and ships each one — typically to Kafka — as an insert, update, or delete with a before and after image of the row.
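To make the moving parts concrete, here is a minimal sketch of the JSON you might register with Kafka Connect (via its REST API, `POST /connectors`) to start a Debezium Postgres connector. The property keys are Debezium's; every value (connector name, host, credentials, slot, publication, table) is a hypothetical placeholder, and the snippet only builds and prints the payload rather than sending it.

```python
import json

# Hypothetical registration payload. All values below are placeholders
# chosen for illustration; only the property keys come from Debezium.
connector = {
    "name": "orders-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",             # logical-decoding output plugin
        "slot.name": "orders_cdc_slot",        # the server-side bookmark
        "publication.name": "orders_cdc_pub",  # which tables pgoutput publishes
        "database.hostname": "db.example.internal",
        "database.port": "5432",
        "database.user": "cdc_reader",
        "database.password": "<secret>",
        "database.dbname": "shop",
        "topic.prefix": "shop",                # Debezium 2.x property name
        "table.include.list": "public.orders",
    },
}

payload = json.dumps(connector, indent=2)
print(payload)
```

Note that `slot.name` is the thing to write down: it is the name you will later look for in `pg_replication_slots` when something goes wrong.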

The payoff is real. Polling every 5 minutes means up to 5 minutes of latency and a row inserted-then-deleted between polls is missed entirely. Log-based CDC delivers changes within milliseconds of commit, captures every intermediate state, and adds zero query load to your tables because it reads the log, not the rows. And it needs no application change at all — the app keeps doing plain INSERT/UPDATE/DELETE.
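The "missed entirely" claim is easy to see in a toy simulation. The sketch below uses a made-up commit history and a 5-minute poll; the names (`Change`, `table_at`, `keys_seen_by_polling`) are hypothetical and no real database is involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    t: int    # commit time in seconds on a toy clock
    op: str   # "insert", "update" or "delete"
    key: int


# Row 2 is inserted and deleted between two polls.
history = [
    Change(t=10, op="insert", key=1),
    Change(t=70, op="insert", key=2),
    Change(t=130, op="delete", key=2),
    Change(t=200, op="update", key=1),
]


def table_at(t):
    """Keys a SELECT would see at time t."""
    rows = set()
    for change in history:
        if change.t > t:
            break
        if change.op == "insert":
            rows.add(change.key)
        elif change.op == "delete":
            rows.discard(change.key)
    return rows


def keys_seen_by_polling(interval, until):
    """Union of everything a poller observes at each poll tick."""
    seen = set()
    for t in range(interval, until + 1, interval):
        seen |= table_at(t)
    return seen


def keys_seen_by_log():
    """A log reader sees every committed change, so every key."""
    return {change.key for change in history}


polled = keys_seen_by_polling(interval=300, until=600)
logged = keys_seen_by_log()
print("polling saw:", polled, "log saw:", logged)
```

The poller never learns that row 2 existed; the log reader gets both its insert and its delete.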

The connector reads the WAL/binlog (not the tables), so events flow in LSN order. Delivery is at-least-once on resume — sinks must be idempotent.

The slot is a loaded gun pointed at your disk

Here is the part that turns a feature into a 3am outage. The slot’s promise (“I won’t recycle WAL you haven’t confirmed”) is unconditional by default. If the consumer stalls, lags, or dies, the server keeps every WAL segment since the slot’s restart_lsn, forever, even as new writes pile more on. WAL is not a small thing: a busy OLTP system can generate 20–50 GB of WAL per hour, so a dead slot can eat 100 GB of free disk in two to five hours.

A stalled slot only adds WAL; there is no decay term. At ~30 GB/hour, 180 GB of headroom is gone in six hours, which is why the disk fills overnight.
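Because retention is linear, the time you have is just headroom divided by WAL rate. Here is that arithmetic as a small sketch; the function name and the 300 GB volume size are assumptions picked to match the 40%-used, ~30 GB/hour incident from the opening.

```python
def hours_until_full(disk_gb, used_gb, wal_gb_per_hour):
    """Hours until a stalled slot fills the volume, assuming a constant WAL rate."""
    if wal_gb_per_hour <= 0:
        raise ValueError("WAL rate must be positive")
    headroom_gb = disk_gb - used_gb
    return headroom_gb / wal_gb_per_hour


# Hypothetical 300 GB volume at 40% used (120 GB), ~30 GB of WAL per hour.
incident = hours_until_full(disk_gb=300, used_gb=120, wal_gb_per_hour=30)

# A tighter box: 100 GB free at the top of the 20-50 GB/hour range.
worst_case = hours_until_full(disk_gb=200, used_gb=100, wal_gb_per_hour=50)

print(incident, worst_case)
```

Whatever number comes out is your real response-time budget, so the lag alert has to fire well inside it.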

The insidious failure modes are the quiet ones:

Consumer down: a connector crash or Kafka Connect OOM stops confirmations; WAL grows until the disk is full, Postgres rejects all writes, and you have a full outage.
Long-running transaction: logical decoding cannot advance past an open transaction, so restart_lsn is frozen and WAL accumulates even though the connector looks healthy.
A low-traffic database: counterintuitively, a quiet database can bloat too. WAL is shared by the whole server, so if the captured tables see no changes while other tables or databases keep writing, the connector has nothing new to confirm and the slot never advances. This is why Debezium can emit periodic heartbeats (heartbeat.interval.ms) to nudge it forward.

What unites all three: the slot’s promise is unconditional and Postgres has no way to know whether the consumer is temporarily slow or dead for good. Without monitoring, you find out when the disk is gone.

Approach	Latency	App change	Main risk
Poll a table on a cursor	Up to the poll interval (minutes)	Query + cursor logic	Misses intra-interval changes; query load
App dual-write (DB + bus)	Immediate	Yes — every write path	No atomicity: one write succeeds, the other fails
Outbox + polling relay	Poll interval (often sub-second)	Yes — write to outbox table	Relay polling load; outbox table growth
Log-based CDC (Debezium)	Milliseconds	None	Slot retains WAL → disk-full halts the primary

The senior mitigation is two-layered. First, alert on slot lag: query pg_replication_slots and watch the byte distance between confirmed_flush_lsn and the current WAL position, so you act before disk pressure. Second, cap the damage with max_slot_wal_keep_size (Postgres 13+, unlimited by default) so Postgres will invalidate a runaway slot rather than die. Capping has its own tradeoff, because an invalidated slot means the connector must re-snapshot, but a re-snapshot beats a primary outage.
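The "byte distance" is simple to compute once you know that a Postgres LSN such as `2B/00000000` is a 64-bit position written as two hex halves. The sketch below shows the query you could run (it uses the real `pg_wal_lsn_diff` and `pg_current_wal_lsn` functions) plus the same arithmetic in plain Python. The helper names and the 10 GiB alert threshold are assumptions; pick a threshold from your own headroom and WAL rate.

```python
# Query to run against the primary; shown as a string, not executed here.
LAG_QUERY = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical'
"""

GIB = 1024 ** 3


def lsn_to_int(lsn):
    """Convert a textual LSN like '2B/00000000' to its 64-bit integer position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def slot_lag_bytes(current_wal_lsn, confirmed_flush_lsn):
    """Bytes of WAL the consumer has not yet confirmed."""
    return lsn_to_int(current_wal_lsn) - lsn_to_int(confirmed_flush_lsn)


def should_alert(lag_bytes, threshold_bytes=10 * GIB):
    """Hypothetical alert rule: page once unconfirmed WAL crosses the threshold."""
    return lag_bytes >= threshold_bytes


healthy = slot_lag_bytes("28/00000400", "28/00000000")
stalled = slot_lag_bytes("2B/00000000", "28/00000000")
print(healthy, should_alert(healthy))
print(stalled, should_alert(stalled))
```

The second layer is a single setting in postgresql.conf, for example `max_slot_wal_keep_size = '50GB'`; the value is yours to choose, and it should be larger than the lag at which the alert fires so that a human gets a chance before the slot is invalidated.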

Why this works

CDC and the outbox pattern are not rivals; they compose. The outbox pattern solves atomicity — write your business row and an event row in the same transaction so they commit together. But you still need something to ship the outbox rows out. A polling relay is the simple option; pointing Debezium at the outbox table is the log-based one (Debezium even has an outbox event router). So “CDC vs outbox” is the wrong axis: the real choice is whether to capture your domain tables directly or capture a purpose-built outbox table that you control the shape of.
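The atomicity half of the outbox pattern fits in a few lines. This sketch uses Python's built-in sqlite3 purely as a stand-in for Postgres so it runs anywhere; the `orders` and `outbox` schemas and the `place_order` function are hypothetical. The point is only that both inserts share one transaction, so a crash between them leaves neither behind.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    aggregate_id INTEGER NOT NULL,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL
);
""")


def place_order(order_id, fail=False):
    """Write the business row and its event row in one transaction."""
    # `with conn` commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,)
        )
        if fail:
            raise RuntimeError("simulated crash between the two writes")
        conn.execute(
            "INSERT INTO outbox (aggregate_id, event_type, payload) VALUES (?, ?, ?)",
            (order_id, "OrderPlaced", json.dumps({"order_id": order_id})),
        )


place_order(1)
try:
    place_order(2, fail=True)
except RuntimeError:
    pass

orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
events = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
print(orders, events)
```

Shipping the `outbox` rows out is then a separate job: a polling relay, or a CDC connector pointed at that one table.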

Snapshot, then stream — and the moment they meet

A connector starting on an existing database has a cold-start problem: the WAL only contains recent changes, but a downstream consumer needs the current state of every row. So Debezium does a snapshot then stream: read the whole table (or a filtered set) to seed initial state, then switch to tailing the log from the LSN captured at snapshot start.

The snapshot is where load and locking bite. A classic blocking snapshot on MySQL takes a global read lock (FLUSH TABLES WITH READ LOCK) for consistency, which can block writes for an uncomfortable window on a large database; Debezium’s snapshot.locking.mode=minimal narrows that by holding the lock only while schema and metadata are read, then reading the rows inside a REPEATABLE READ transaction. Debezium’s incremental snapshot (the signal-based, watermark approach) is the modern answer: it snapshots in chunks while streaming continues, can be paused and resumed, and never takes a long lock, at the cost of more moving parts. For a multi-terabyte table, the difference is between a multi-hour write freeze and a background trickle.
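The watermark idea is easier to trust once you see the dedupe rule. This is a simplified sketch of one chunk window, not Debezium's actual implementation: the function name and event shape are hypothetical, with `"r"` used for snapshot reads and `"u"` for updates. A chunk is read between a low and a high watermark; any key that also shows up in the log during that window is dropped from the chunk, because the log event is at least as fresh as the snapshot read.

```python
def merge_chunk_window(chunk_rows, events_in_window):
    """Combine one snapshot chunk with the log events seen while it was read.

    chunk_rows: dict of key -> row, read between the low and high watermark.
    events_in_window: log events observed between the watermarks, in log order.
    Returns the events to emit: streamed events first, then surviving reads.
    """
    buffer = dict(chunk_rows)
    out = []
    for event in events_in_window:
        # The log already carries this key's latest state, so the
        # snapshot copy may be stale and must not be emitted after it.
        buffer.pop(event["key"], None)
        out.append(event)
    for key, row in buffer.items():
        out.append({"op": "r", "key": key, "after": row})
    return out


chunk = {
    1: {"status": "placed"},
    2: {"status": "placed"},
    3: {"status": "placed"},
}
window = [{"op": "u", "key": 2, "after": {"status": "shipped"}}]

emitted = merge_chunk_window(chunk, window)
for event in emitted:
    print(event)
```

Row 2 is emitted once, as the update from the log, and its stale "placed" snapshot copy never reaches the consumer.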

Deletes, ordering, and exactly-once that isn’t

Three more sharp edges that bite in production:

Capturing deletes needs the right REPLICA IDENTITY. A Postgres DELETE only logs the primary key by default, so the change event’s before image is just the key. To capture the full pre-delete row you must set REPLICA IDENTITY FULL on the table — which makes every update log the entire old row, increasing WAL volume. Debezium also emits a tombstone (a null-value record after a delete) so compacted Kafka topics can drop the key entirely. Forget the tombstone semantics and your compacted topic keeps ghosts.
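Here is a toy model of why the tombstone matters. The compaction below is a deliberate simplification (real Kafka keeps tombstones around for a retention period before removing them), and the function names and record shapes are hypothetical. Records are `(key, value)` pairs; a tombstone is a record whose value is `None`.

```python
def delete_records(key, before):
    """The two records a delete produces: the delete event, then a tombstone."""
    return [
        (key, {"op": "d", "before": before, "after": None}),
        (key, None),  # tombstone: same key, null value
    ]


def compact(log):
    """Toy compaction: keep the latest value per key, drop tombstoned keys."""
    latest = {}
    for key, value in log:
        latest[key] = value
    return {k: v for k, v in latest.items() if v is not None}


log = [
    (1, {"op": "c", "before": None, "after": {"id": 1, "status": "placed"}}),
    (2, {"op": "c", "before": None, "after": {"id": 2, "status": "placed"}}),
]
# With the default replica identity, `before` carries only the key columns.
log += delete_records(2, before={"id": 2})

with_tombstone = compact(log)
without_tombstone = compact(log[:-1])  # pretend the tombstone was dropped

print(sorted(with_tombstone), sorted(without_tombstone))
```

Without the tombstone, key 2 survives compaction forever as a delete event: the ghost. On the source side, the full pre-delete row only appears in `before` after `ALTER TABLE ... REPLICA IDENTITY FULL`.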

Ordering is per-key, not global. Debezium keys each change event by the row’s primary key, so every change to a given row lands in the same Kafka partition and stays in order. Across partitions, and across the per-table topics, there is no global order, so a consumer joining two tables cannot assume it sees their changes in commit order.
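A quick sketch of what "per-key" buys you. The hash here is `zlib.crc32`, used only as a deterministic stand-in for Kafka's own partitioner, and the keys and states are made up. What carries over to the real system is the shape of the guarantee: same key, same partition, same relative order.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Deterministic key -> partition mapping (crc32 as a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Changes in commit order, interleaved across two rows.
events = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "paid"),
    ("order-2", "cancelled"),
    ("order-1", "shipped"),
]

partitions = defaultdict(list)
for key, state in events:
    partitions[partition_for(key)].append((key, state))


def states_for(key):
    """States of one key, in the order its partition holds them."""
    return [s for k, s in partitions[partition_for(key)] if k == key]


print(states_for("order-1"))
print(states_for("order-2"))
```

Each row's history reads back in commit order, but nothing here tells a consumer whether `order-2` was cancelled before or after `order-1` was paid once the two keys sit in different partitions.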

Delivery is at-least-once; design for it. The slot/LSN bookmark means that on crash the connector resumes from the last confirmed position, so it may re-emit events it had already sent after that point. Debezium 2.x can do exactly-once to Kafka via Kafka transactions (it relies on Kafka Connect’s exactly-once source support being enabled), but the moment your consumer reads and acts, you are back to at-least-once end-to-end. The only safe stance: consumers must be idempotent. Dedupe on the event’s LSN/key, or make the downstream operation a no-op on replay.
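One way to get idempotency is to remember the highest LSN applied per key and skip anything at or below it. The sketch below assumes each event carries a `key`, an `lsn` that increases with every change to that key, an `op`, and an `after` image; the class name and event shape are hypothetical, and in a real sink the LSN bookkeeping would need to be stored atomically with the data.

```python
class IdempotentSink:
    """Applies change events at most once per (key, lsn)."""

    def __init__(self):
        self.rows = {}         # key -> current row image
        self.applied_lsn = {}  # key -> highest LSN already applied

    def apply(self, event):
        """Apply the event; return False if it was a replay and was skipped."""
        key, lsn = event["key"], event["lsn"]
        if lsn <= self.applied_lsn.get(key, -1):
            return False
        if event["op"] == "d":
            self.rows.pop(key, None)
        else:
            self.rows[key] = event["after"]
        self.applied_lsn[key] = lsn
        return True


stream = [
    {"key": 1, "lsn": 100, "op": "c", "after": {"status": "placed"}},
    {"key": 1, "lsn": 110, "op": "u", "after": {"status": "paid"}},
]

sink = IdempotentSink()
# Simulate a connector restart that re-emits both events.
results = [sink.apply(event) for event in stream + stream]
print(results, sink.rows)
```

The replayed events are recognized and dropped, so the row does not flip back to "placed" on the second pass.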

Pick the best fit

You need a search index kept fresh from a high-write Postgres `orders` table, with no app code changes and sub-second freshness. Pick the capture approach.

Quiz

A Debezium connector's Kafka Connect worker OOMs and stays down for hours. What happens to the source Postgres?

Quiz

Why must a CDC consumer be idempotent?

Order the steps

Order how a Debezium connector starts capturing an existing Postgres table:

1 Create a logical replication slot + publication; record the current LSN
2 Snapshot: read existing rows to seed initial state (blocking, or incremental in chunks)
3 Switch to streaming: tail the WAL from the LSN captured at snapshot start
4 Decode each change into an insert/update/delete event with before/after images
5 Confirm consumed LSN back to the slot so Postgres can recycle old WAL

Recall before you leave

01 Explain to a teammate why a logical replication slot can take down the primary database, and what you'd put in place before shipping CDC.
02 Why is CDC delivery effectively at-least-once, and how does that change how you write the consumer?

Recap

Change-Data Capture turns the database’s own write-ahead log into an event stream: instead of polling a table or dual-writing from the app, a connector like Debezium tails the WAL via a logical replication slot and emits every insert, update, and delete within milliseconds of commit, with no application change and no query load on your tables. That power comes attached to one dangerous mechanism — the slot retains WAL until the consumer confirms it, so a stalled, lagging, or dead consumer (or a long-running transaction) grows WAL until the disk fills and the primary stops accepting writes, a real outage class on busy systems doing tens of GB of WAL per hour. You manage it by alerting on slot lag and capping with max_slot_wal_keep_size. Starting up means snapshot-then-stream, where blocking snapshots can lock large tables and incremental snapshots trade simplicity for a lock-free background load. Deletes need the right REPLICA IDENTITY and tombstone handling, ordering is per-key not global, and because resume is at-least-once your consumers must be idempotent. Now when you see a new CDC connector being proposed, your first questions are: what’s the alert on slot lag, and what’s the max_slot_wal_keep_size? Those two controls are what separate CDC as a feature from CDC as a 3am outage.
