Queues, Streams, Eventing QUE · 06 · 09

Change data capture: config and query reading

Read a Debezium config, replication-slot SQL, and a slot-lag log line, predict the CDC behaviour, and pick the highest-leverage fix.

QUE Senior ◷ 14 min

CDC problems are diagnosed in the connector config and the database’s own catalog views. Read the config, the SQL, and the monitoring line, then choose the fix a senior engineer would make first.

Goal

Practise the loop you run in every CDC incident: read the connector setup and the slot’s state in the catalog, predict where WAL retention or delivery breaks, and reach for the highest-leverage fix.

Snippet 1 — the Debezium connector config

{
  "name": "orders-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "slot.name": "orders_slot",
    "publication.name": "orders_pub",
    "table.include.list": "public.orders",
    "snapshot.mode": "initial",
    "heartbeat.interval.ms": "0",
    "tombstones.on.delete": "true"
  }
}

Quiz

This connector captures a low-traffic orders table. Which setting is the latent disk-filler, and why?
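A check along these lines could catch this class of setting before a connector is deployed. It is a minimal sketch, not part of Debezium: the helper name `heartbeat_warnings` is hypothetical, and it assumes the connector JSON has the shape shown in Snippet 1. It relies only on the documented behaviour that `heartbeat.interval.ms` defaults to `0`, which means no heartbeat messages are sent.

```python
import json

# Connector JSON from Snippet 1, trimmed to the keys this check reads.
CONNECTOR_JSON = """
{
  "name": "orders-cdc",
  "config": {
    "slot.name": "orders_slot",
    "table.include.list": "public.orders",
    "heartbeat.interval.ms": "0"
  }
}
"""


def heartbeat_warnings(connector: dict) -> list[str]:
    """Return a warning when heartbeats are disabled (hypothetical helper)."""
    config = connector.get("config", {})
    # A missing key behaves like "0", and 0 means heartbeats are off.
    interval_ms = int(config.get("heartbeat.interval.ms", "0"))
    if interval_ms > 0:
        return []
    slot = config.get("slot.name", "<unnamed>")
    return [
        f"slot {slot}: heartbeat.interval.ms={interval_ms} sends no heartbeats, "
        "so an idle captured table cannot advance the slot"
    ]


warnings = heartbeat_warnings(json.loads(CONNECTOR_JSON))
for warning in warnings:
    print(warning)
```

The check only flags the setting. Choosing a suitable interval is a separate decision that depends on how much WAL the rest of the database writes.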

Snippet 2 — inspecting the slot in the catalog

SELECT slot_name,
       active,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
WHERE slot_name = 'orders_slot';
--  slot_name   | active | retained_wal
-- -------------+--------+--------------
--  orders_slot | t      | 47 GB

Quiz

The slot is active = t but retaining 47 GB of WAL. What does this tell you, and what is the first move?

Snippet 3 — the slot-lag alert log line

WARN  slot=orders_slot active=true retained_wal_bytes=50465865728
      confirmed_flush_lsn=6F/A2000000 current_wal_lsn=7B/40000000
      disk_used_pct=88 -> ALERT: slot lag exceeds 40GB threshold

Quiz

This alert fires at 88% disk. Reading the fields, what is the correct interpretation and response order?
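One way to read the fields is to turn the two LSNs into byte positions and compare the gap with `retained_wal_bytes`. The sketch below does that arithmetic with the values from the alert line. It assumes `retained_wal_bytes` is measured from `restart_lsn`, as in the Snippet 2 query; the log line itself does not say so. The function name `lsn_to_int` is illustrative.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '6F/A2000000' to a byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the high 32 bits, the part after is the low 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


# Values copied from the alert line in Snippet 3.
confirmed_flush = lsn_to_int("6F/A2000000")
current_wal = lsn_to_int("7B/40000000")
retained_wal_bytes = 50465865728

# WAL the consumer has not yet confirmed.
unconfirmed_bytes = current_wal - confirmed_flush

# Extra WAL held behind confirmed_flush_lsn.
# Assumption: retained_wal_bytes is current_wal_lsn minus restart_lsn.
held_before_confirm = retained_wal_bytes - unconfirmed_bytes

print(f"retained:    {retained_wal_bytes / GIB:.2f} GiB")   # 47.00 GiB
print(f"unconfirmed: {unconfirmed_bytes / GIB:.2f} GiB")    # 46.47 GiB
print(f"restart_lsn trails confirmed_flush_lsn by {held_before_confirm / MIB:.0f} MiB")  # 544 MiB
```

`pg_size_pretty` reports in powers of 1024, so the 47 GB in Snippet 2 and the 50465865728 bytes in this alert are the same figure. Almost all of the retained WAL sits beyond `confirmed_flush_lsn`, which points at the consumer not confirming progress rather than at the database holding back already-confirmed WAL.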

Snippet 4 — preparing a table for full delete capture

-- With the default replica identity, DELETE events carry only the primary-key columns in `before`.
ALTER TABLE public.orders REPLICA IDENTITY FULL;

Quiz

You run this so DELETE events carry the full pre-delete row. What is the side effect a senior engineer flags before merging?

Recap

Every CDC incident is read from the connector config and the slot’s catalog state. Setting heartbeat.interval.ms to 0 disables heartbeats, so a slot that captures a low-traffic table stops advancing while WAL from the rest of the database accumulates until the disk fills. An active slot can still retain tens of GB when a long-running transaction pins restart_lsn, so check pg_stat_activity before dropping anything. The LSN gap reported by pg_replication_slots is the number to alert on, with max_slot_wal_keep_size (available from PostgreSQL 13) as the backstop. REPLICA IDENTITY FULL buys full pre-delete row images at the cost of larger UPDATE and DELETE records in the WAL. Read the slot’s state first, fix the cause of non-advancement, and treat the disk as the clock.
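Treating the disk as the clock can be made concrete with a rough headroom estimate. The sketch below is illustrative only: `hours_until_full` is a hypothetical helper, and the 500 GiB volume size and 2 GiB/hour WAL growth rate are assumed numbers that do not come from the lesson. Only the 88% usage figure is taken from the Snippet 3 alert. It also assumes retained WAL is the only thing growing on the volume.

```python
GIB = 1024 ** 3


def hours_until_full(disk_total_bytes: int, used_pct: int,
                     wal_growth_bytes_per_hour: int) -> float:
    """Estimate hours until the volume fills at a constant WAL growth rate."""
    if wal_growth_bytes_per_hour <= 0:
        raise ValueError("growth rate must be positive")
    # Free space left on the volume, using integer arithmetic.
    free_bytes = disk_total_bytes * (100 - used_pct) // 100
    return free_bytes / wal_growth_bytes_per_hour


# used_pct=88 is from the alert line; volume size and growth rate are assumptions.
hours = hours_until_full(
    disk_total_bytes=500 * GIB,
    used_pct=88,
    wal_growth_bytes_per_hour=2 * GIB,
)
print(f"{hours:.1f} hours of headroom")
```

With these assumed inputs the estimate is 30 hours. The real inputs come from the volume's size and the observed rate at which the slot's retained WAL grows between two samples.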
