Change-Data Capture: streaming the write-ahead log without filling the disk

Crux: CDC tails the database’s replication log so every committed insert/update/delete becomes an event. You get lower latency than polling and need no app changes, but the replication slot can grow WAL until the primary halts.

3am page: “Primary database is read-only.” Not a crash — Postgres ran out of disk and stopped accepting writes to protect itself. The culprit was not the app. A Kafka Connect worker had OOM’d six hours earlier, the Debezium connector died, and its replication slot quietly pinned every WAL segment since. On a busy OLTP box producing ~30 GB of WAL per hour, the data volume went from 40% to full overnight. The fix was one line — drop the dead slot — but nobody knew the slot existed until the disk was gone.

Reading the log instead of asking the table

Every relational database already writes a durable, ordered record of every committed change: the write-ahead log (WAL in Postgres, the binlog in MySQL). It exists for crash recovery and replication. Change-Data Capture hijacks that log for a second purpose — instead of polling a table (“any rows newer than my last cursor?”) or having the app dual-write to both the database and a message bus, a CDC connector tails the replication log and emits one event per row change.

The mechanism in Postgres: you create a logical replication slot backed by a logical-decoding output plugin (pgoutput, shipped since Postgres 10). The slot is a server-side bookmark — it records the LSN (log sequence number) the consumer has confirmed, and the server promises not to recycle WAL past that point. A connector like Debezium opens a replication connection, decodes the WAL into structured change events, and ships each one — typically to Kafka — as an insert, update, or delete with a before and after image of the row.
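To make the "bookmark" concrete, here is a minimal sketch of how an LSN can be read as a byte position. Postgres prints an LSN as two hexadecimal numbers separated by a slash (the high and low 32 bits of a 64-bit offset), so the distance between two LSNs is a byte count. The function names and the sample positions are hypothetical, chosen only for illustration.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(current_lsn: str, confirmed_lsn: str) -> int:
    """Bytes of WAL written since the position the consumer confirmed."""
    return lsn_to_int(current_lsn) - lsn_to_int(confirmed_lsn)


# Hypothetical positions: where the server is now vs. what the slot confirmed.
behind = lag_bytes("16/B374D848", "16/B3740000")
print(f"consumer is {behind} bytes behind")
```

On the server itself the same subtraction is available as the built-in `pg_wal_lsn_diff` function, so this helper is mainly useful when LSNs arrive as strings in events or logs.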

The payoff is real. Polling every 5 minutes means up to 5 minutes of latency, and a row inserted then deleted between polls is missed entirely. Log-based CDC delivers changes within milliseconds of commit, captures every intermediate state, and adds no query load to your tables because it reads the log, not the rows (decoding the log still costs some CPU and I/O on the primary). And it needs no application change at all: the app keeps doing plain INSERT/UPDATE/DELETE.
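The missed-row claim can be illustrated with a toy simulation. This is a sketch with an in-memory dict standing in for the table and a list standing in for the replication log; none of the names come from a real library.

```python
def apply_changes(table, changes):
    """Apply committed changes in order and return the log of what happened."""
    log = []
    for op, key, value in changes:
        if op == "delete":
            table.pop(key, None)
        else:  # insert or update
            table[key] = value
        log.append((op, key, value))
    return log


def poll_new_rows(previous_read, current_read):
    """A poller can only see the difference between two reads of the table."""
    return [
        ("insert", key, current_read[key])
        for key in sorted(current_read)
        if key not in previous_read
    ]


table = {}
before = dict(table)  # what the previous poll saw
log = apply_changes(
    table,
    [("insert", 1, "paid"), ("insert", 2, "pending"), ("delete", 2, None)],
)
polled = poll_new_rows(before, table)

print("poller saw:", polled)  # row 2 never appears
print("log saw:", log)        # all three changes, in commit order
```

Row 2 lived and died between two polls, so the poller has no trace of it, while the log holds both its insert and its delete.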

The slot is a loaded gun pointed at your disk

Here is the part that turns a feature into a 3am outage. The slot’s promise, “I won’t recycle WAL you haven’t confirmed”, is unconditional by default. If the consumer stalls, lags, or dies, the server keeps every WAL segment since the slot’s restart_lsn for as long as the slot exists, even as new writes pile more on. WAL is not a small thing: a busy OLTP system can generate 20–50 GB of WAL per hour, so with, say, 60 GB of free space a dead slot fills the disk in roughly one to three hours.
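The arithmetic is worth doing for your own volume. A minimal sketch, assuming a constant WAL rate and a hypothetical 60 GB of free space (the rates are the 20–50 GB per hour range quoted above):

```python
def hours_until_full(free_gb: float, wal_gb_per_hour: float) -> float:
    """Time until a stalled slot fills the disk, assuming a constant WAL rate."""
    if wal_gb_per_hour <= 0:
        raise ValueError("WAL rate must be positive")
    return free_gb / wal_gb_per_hour


# Hypothetical headroom of 60 GB at the low, middle, and high end of the range.
for rate in (20, 30, 50):
    print(f"{rate} GB/h -> {hours_until_full(60, rate):.1f} h until full")
```

The useful output is not the number itself but the comparison with how long it takes someone to notice and respond to a dead connector.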

The insidious failure modes are the quiet ones:

  • Consumer down: a connector crash or Kafka Connect OOM stops confirmations; WAL grows until the disk fills, Postgres rejects all writes, and you have a full outage.
  • Long-running transaction: logical decoding cannot advance past an open transaction, so restart_lsn is frozen and WAL accumulates even though the connector looks healthy.
  • A low-traffic database: counterintuitively, a quiet captured database can bloat too. WAL is shared by the whole Postgres instance, so if the captured tables see no changes while other databases or uncaptured tables keep writing, the connector has nothing new to confirm and the slot’s position stands still while WAL piles up. This is why Debezium can emit periodic heartbeats (heartbeat.interval.ms) to nudge the slot forward.

| Approach | Latency | App change | Main risk |
|---|---|---|---|
| Poll a table on a cursor | Up to the poll interval (minutes) | Query + cursor logic | Misses intra-interval changes; query load |
| App dual-write (DB + bus) | Immediate | Yes: every write path | No atomicity: one write succeeds, the other fails |
| Outbox + polling relay | Poll interval (often sub-second) | Yes: write to outbox table | Relay polling load; outbox table growth |
| Log-based CDC (Debezium) | Milliseconds | None | Slot retains WAL → disk-full halts the primary |

The senior mitigation is two-layered. First, alert on slot lag: query pg_replication_slots and watch the byte distance between the current WAL position and each slot’s restart_lsn (the position that actually pins WAL), alongside confirmed_flush_lsn, so you act before disk pressure. Second, cap the damage with max_slot_wal_keep_size (Postgres 13+) so Postgres will invalidate a runaway slot rather than die. Capping has its own tradeoff: an invalidated slot means the connector must re-snapshot. But a re-snapshot beats a primary outage.
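A slot-lag alert can be sketched in two parts: a query against pg_replication_slots, and a pure function that turns each row into a severity. The query uses the built-in `pg_wal_lsn_diff` and `pg_current_wal_lsn` functions and the `wal_status` column (Postgres 13+). The `SlotRow` type, the `classify` function, and the thresholds are hypothetical, and running the query (for example through a database driver) is left out so the sketch stays self-contained.

```python
from dataclasses import dataclass
from typing import Optional

# Query one might schedule against the primary.
SLOT_LAG_SQL = """
SELECT slot_name, active, wal_status,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
"""


@dataclass
class SlotRow:
    slot_name: str
    active: bool
    wal_status: Optional[str]  # 'reserved', 'extended', 'unreserved', or 'lost'
    retained_bytes: int


def classify(row: SlotRow, warn_bytes: int, crit_bytes: int) -> str:
    """Map one pg_replication_slots row to an alert severity."""
    if row.wal_status == "lost":
        return "invalidated"  # required WAL is gone; the consumer must re-snapshot
    if row.retained_bytes >= crit_bytes:
        return "critical"
    if row.retained_bytes >= warn_bytes or not row.active:
        return "warning"  # an inactive slot is worth a look even at low lag
    return "ok"


GIB = 1024 ** 3
rows = [
    SlotRow("debezium_orders", True, "reserved", 200 * 1024 ** 2),
    SlotRow("debezium_dead", False, "extended", 45 * GIB),
]
for row in rows:
    print(row.slot_name, classify(row, warn_bytes=5 * GIB, crit_bytes=20 * GIB))
```

Sensible thresholds depend on free disk and WAL rate; the point of the sketch is that an inactive slot should raise a flag long before the byte count looks alarming.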

Why this works

CDC and the outbox pattern are not rivals; they compose. The outbox pattern solves atomicity — write your business row and an event row in the same transaction so they commit together. But you still need something to ship the outbox rows out. A polling relay is the simple option; pointing Debezium at the outbox table is the log-based one (Debezium even has an outbox event router). So “CDC vs outbox” is the wrong axis: the real choice is whether to capture your domain tables directly or capture a purpose-built outbox table that you control the shape of.
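The atomicity half of the outbox pattern can be shown in a few lines. This sketch uses Python's built-in sqlite3 module as a stand-in for Postgres; the table layout, the `place_order` function, and the `OrderPlaced` event type are hypothetical.

```python
import json
import sqlite3


def place_order(conn, order_id, total, fail=False):
    """Write the business row and its event row in one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "INSERT INTO orders (id, total) VALUES (?, ?)", (order_id, total)
        )
        if fail:
            raise RuntimeError("simulated failure between the two writes")
        conn.execute(
            "INSERT INTO outbox (aggregate_id, type, payload) VALUES (?, ?, ?)",
            (order_id, "OrderPlaced", json.dumps({"id": order_id, "total": total})),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "aggregate_id INTEGER, type TEXT, payload TEXT)"
)

place_order(conn, 1, 42.0)
try:
    place_order(conn, 2, 9.99, fail=True)  # neither row should survive
except RuntimeError:
    pass

orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
events = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
print(orders, events)
```

The failed call leaves no order without an event and no event without an order, which is exactly what a dual-write to a database and a bus cannot guarantee. Shipping the outbox rows onward is then the relay's or the connector's job.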

Snapshot, then stream — and the moment they meet

A connector starting on an existing database has a cold-start problem: the WAL only contains recent changes, but a downstream consumer needs the current state of every row. So Debezium does a snapshot then stream: read the whole table (or a filtered set) to seed initial state, then switch to tailing the log from the LSN captured at snapshot start.
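The handoff from snapshot to stream can be sketched as follows. This is a simplified model, not Debezium's implementation: LSNs are plain integers, the snapshot is a dict, and the `bootstrap` function is hypothetical. The idea it shows is that changes at or before the snapshot position are already reflected in the snapshot, so only later ones are applied.

```python
def bootstrap(snapshot_rows, snapshot_lsn, wal_events):
    """Seed state from a snapshot, then apply only changes after its LSN."""
    state = dict(snapshot_rows)
    for lsn, op, key, row in sorted(wal_events, key=lambda event: event[0]):
        if lsn <= snapshot_lsn:
            continue  # already visible in the snapshot
        if op == "d":
            state.pop(key, None)
        else:  # "c" (create) or "u" (update)
            state[key] = row
    return state


snapshot = {1: "alice", 2: "bob"}  # table contents as of LSN 100
events = [
    (90, "u", 1, "alice"),   # before the snapshot: skipped
    (110, "c", 3, "carol"),  # after the snapshot: applied
    (120, "d", 2, None),     # after the snapshot: applied
]
state = bootstrap(snapshot, snapshot_lsn=100, wal_events=events)
print(state)
```

Getting the boundary wrong in either direction is the classic bug: start streaming too late and you lose changes, too early and you must tolerate replays.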

The snapshot is where load and locking bite. A classic blocking snapshot on MySQL takes a global read lock (FLUSH TABLES WITH READ LOCK) for consistency, which can freeze writes for an uncomfortable window. Debezium’s snapshot.locking.mode=minimal narrows that by holding the lock only while the connector reads schema and metadata, then reading the rows inside a consistent-snapshot transaction. Debezium’s incremental snapshot (the signal-based, watermark approach) is the modern answer: it snapshots in chunks while streaming continues, can be paused and resumed, and never takes a long lock, at the cost of more moving parts. For a multi-terabyte table, the difference is between a multi-hour write freeze and a background trickle.
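The core of the watermark approach is a deduplication step: a chunk of rows is read between a low and a high watermark written to the log, and any key that also changed in the log between those watermarks is dropped from the chunk, because the log event is newer. The sketch below shows only that step under simplified assumptions; `merge_chunk` and the data shapes are hypothetical, not Debezium's API.

```python
def merge_chunk(chunk, window_events):
    """Merge one snapshot chunk with log events seen between its watermarks.

    chunk: dict of key -> row read from the table between the watermarks.
    window_events: list of (key, row_or_None) log events in that same window.
    """
    remaining = dict(chunk)
    out = []
    for key, row in window_events:
        remaining.pop(key, None)  # the log event supersedes the chunk's copy
        out.append(("stream", key, row))
    # Chunk rows that did not change in the window are emitted as snapshot reads.
    out.extend(("snapshot", key, row) for key, row in remaining.items())
    return out


chunk = {1: "v1", 2: "v1", 3: "v1"}
window = [(2, "v2"), (4, "v1")]  # key 2 changed and key 4 appeared mid-chunk
merged = merge_chunk(chunk, window)
print(merged)
```

Key 2 is emitted once, from the stream, so a consumer never sees the stale chunk copy arrive after the newer change.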

Deletes, ordering, and exactly-once that isn’t

Three more sharp edges that bite in production:

Capturing deletes needs the right REPLICA IDENTITY. A Postgres DELETE only logs the primary key by default, so the change event’s before image is just the key. To capture the full pre-delete row you must set REPLICA IDENTITY FULL on the table — which makes every update log the entire old row, increasing WAL volume. Debezium also emits a tombstone (a null-value record after a delete) so compacted Kafka topics can drop the key entirely. Forget the tombstone semantics and your compacted topic keeps ghosts.
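On the consumer side, a delete shows up as two records: the delete event, then the tombstone. A minimal sketch of a handler that keeps an in-memory view, assuming the Debezium change-event envelope (`op`, `before`, `after`) has already been deserialized into a dict without a schema wrapper. The `apply_record` function and the sample rows are hypothetical.

```python
def apply_record(view, key, value):
    """Apply one change record to an in-memory view keyed by primary key."""
    if value is None:
        # Tombstone: null value after a delete, kept for log compaction.
        view.pop(key, None)
        return "tombstone"
    if value["op"] == "d":
        # With default REPLICA IDENTITY, "before" carries only the key columns.
        view.pop(key, None)
        return "delete"
    # "c" (create), "u" (update), or "r" (snapshot read) all carry "after".
    view[key] = value["after"]
    return "upsert"


view = {}
results = [
    apply_record(view, 7, {"op": "c", "before": None,
                           "after": {"id": 7, "status": "new"}}),
    apply_record(view, 7, {"op": "d", "before": {"id": 7}, "after": None}),
    apply_record(view, 7, None),
]
print(results, view)
```

A consumer that dereferences `value["op"]` without the `None` check fails on the first tombstone, which is a common first-deploy surprise.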

Ordering is per-key, not global. Debezium preserves order within a table/key by routing each key to the same Kafka partition. Across partitions there is no global order — so a consumer joining two tables cannot assume it sees their changes in commit order.
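Per-key ordering falls out of key-based partitioning, which this sketch imitates. It uses `zlib.crc32` as a stand-in hash (Kafka's default partitioner uses a different hash function), and the keys and partition count are hypothetical.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Pick a partition from the key's hash, so equal keys always collide."""
    return zlib.crc32(key) % num_partitions


events = [
    ("orders:1", "c"),
    ("customers:9", "u"),
    ("orders:1", "u"),
    ("orders:1", "d"),
]

partitions = {}
for key, op in events:
    partitions.setdefault(partition_for(key.encode(), 3), []).append((key, op))

# Every event for one key sits in one partition, in the order it was produced.
order_1_ops = [
    op for records in partitions.values() for key, op in records
    if key == "orders:1"
]
print(order_1_ops)
```

Nothing in this scheme relates the position of `customers:9` to the `orders:1` events, which is why a consumer joining the two streams cannot assume commit order across them.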

Delivery is at-least-once; design for it. The slot/LSN bookmark means that on crash the connector resumes from the last confirmed position — but it may re-emit events committed after that point. Debezium 2.x can do exactly-once to Kafka via Kafka transactions, but the moment your consumer reads and acts, you are back to at-least-once end-to-end. The only safe stance: consumers must be idempotent — dedupe on the event’s LSN/key, or make the downstream operation a no-op on replay.
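One way to make a consumer idempotent is to remember the highest LSN applied per key and skip anything at or below it. The sketch below applies a non-idempotent operation (adding a delta to a balance) safely under replay. The class, the integer LSNs, and the event shape are hypothetical.

```python
class IdempotentApplier:
    """Applies balance deltas at most once per (key, LSN)."""

    def __init__(self):
        self.last_lsn = {}  # key -> highest LSN already applied
        self.balance = {}   # key -> current balance

    def apply(self, key, lsn, delta):
        if lsn <= self.last_lsn.get(key, -1):
            return False  # replayed event: already applied, skip it
        self.balance[key] = self.balance.get(key, 0) + delta
        self.last_lsn[key] = lsn
        return True


applier = IdempotentApplier()
batch = [("acct-1", 100, 50), ("acct-1", 101, -20), ("acct-2", 102, 10)]

first_pass = [applier.apply(*event) for event in batch]
# The connector restarts and re-emits the same events.
replay = [applier.apply(*event) for event in batch]
print(first_pass, replay, applier.balance)
```

In a real consumer the stored LSN and the effect must be written in the same transaction; otherwise a crash between the two reopens the duplicate window.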

Pick the best fit

You need a search index kept fresh from a high-write Postgres `orders` table, with no app code changes and sub-second freshness. Pick the capture approach.

Quiz

A Debezium connector's Kafka Connect worker OOMs and stays down for hours. What happens to the source Postgres?

Quiz

Why must a CDC consumer be idempotent?

Order the steps

Order how a Debezium connector starts capturing an existing Postgres table:

  1. Create a logical replication slot + publication; record the current LSN
  2. Snapshot: read existing rows to seed initial state (blocking, or incremental in chunks)
  3. Switch to streaming: tail the WAL from the LSN captured at snapshot start
  4. Decode each change into an insert/update/delete event with before/after images
  5. Confirm consumed LSN back to the slot so Postgres can recycle old WAL
Recall before you leave
  1. Explain to a teammate why a logical replication slot can take down the primary database, and what you'd put in place before shipping CDC.
  2. Why is CDC delivery effectively at-least-once, and how does that change how you write the consumer?
Recap

Change-Data Capture turns the database’s own write-ahead log into an event stream: instead of polling a table or dual-writing from the app, a connector like Debezium tails the WAL via a logical replication slot and emits every insert, update, and delete within milliseconds of commit, with no application change and no query load on your tables. That power comes attached to one dangerous mechanism — the slot retains WAL until the consumer confirms it, so a stalled, lagging, or dead consumer (or a long-running transaction) grows WAL until the disk fills and the primary stops accepting writes, a real outage class on busy systems doing tens of GB of WAL per hour. You manage it by alerting on slot lag and capping with max_slot_wal_keep_size. Starting up means snapshot-then-stream, where blocking snapshots can lock large tables and incremental snapshots trade simplicity for a lock-free background load. Deletes need the right REPLICA IDENTITY and tombstone handling, ordering is per-key not global, and because resume is at-least-once your consumers must be idempotent. Get the operational discipline right and CDC is the lowest-latency, least-invasive way to fan database changes out to the rest of your system.


Trademarks belong to their respective owners. Editorial reference only.