Queues, Streams, Eventing

Change data capture: config and query reading

Crux: Read a Debezium config, replication-slot SQL, and a slot-lag log line; predict the CDC behaviour; and pick the highest-leverage fix.

CDC problems are diagnosed in the connector config and the database’s own catalog views. Read the config, the SQL, and the monitoring line, then choose the fix a senior engineer would make first.

Goal

Practise the loop you run in every CDC incident: read the connector setup and the slot’s state in the catalog, predict where WAL retention or delivery breaks, and reach for the highest-leverage fix.

Snippet 1 — the Debezium connector config

{
  "name": "orders-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "slot.name": "orders_slot",
    "publication.name": "orders_pub",
    "table.include.list": "public.orders",
    "snapshot.mode": "initial",
    "heartbeat.interval.ms": "0",
    "tombstones.on.delete": "true"
  }
}
Quiz

This connector captures a low-traffic orders table. Which setting is the latent disk-filler, and why?
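A toy model makes the "why" concrete. Debezium's documentation describes heartbeats as the way a connector on quiet tables gets to commit offsets and report its latest LSN back to the slot. This Python sketch assumes, as a simplification, that the connector confirms the server's current LSN only when it emits a captured change or (if `heartbeat.interval.ms` is above zero) a heartbeat. It counts time in whole hours and ignores Debezium's offset-commit batching. The function name, the 2 GiB/hour server-wide WAL rate, and the single captured change are hypothetical.

```python
GiB = 1024 ** 3


def retained_wal(wal_per_hour: int, hours: int, capture_hours: set[int], heartbeat_every: int) -> int:
    """Toy model: bytes of WAL between the server's current LSN and the slot's last confirmed LSN.

    Hypothetical simplification: the connector confirms the current LSN only when it emits
    a captured change or a heartbeat; heartbeat_every=0 means heartbeats are disabled.
    """
    current = 0
    confirmed = 0
    for hour in range(1, hours + 1):
        current += wal_per_hour  # WAL written server-wide, mostly by tables this slot does not capture
        captured = hour in capture_hours
        beat = heartbeat_every > 0 and hour % heartbeat_every == 0
        if captured or beat:
            confirmed = current  # the connector acknowledges everything it has read so far
    return current - confirmed


# One orders change in hour 3, then silence for the rest of the day.
print(retained_wal(2 * GiB, 24, {3}, heartbeat_every=0) // GiB)  # 42
print(retained_wal(2 * GiB, 24, {3}, heartbeat_every=1) // GiB)  # 0
```

In this model any positive heartbeat interval caps retained WAL at roughly one interval's worth of server-wide WAL. With heartbeats disabled, the slot is pinned at the last captured change however busy the rest of the server is.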

Snippet 2 — inspecting the slot in the catalog

SELECT slot_name,
       active,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
WHERE slot_name = 'orders_slot';
--  slot_name   | active | retained_wal
-- -------------+--------+--------------
--  orders_slot | t      | 47 GB
Quiz

The slot is active = t but retaining 47 GB of WAL. What does this tell you, and what is the first move?
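Because the slot is active, the consumer is connected. The usual suspect is a long-running transaction holding `restart_lsn` back, so the triage starts in `pg_stat_activity` rather than with dropping the slot. This minimal Python sketch assumes rows fetched with `SELECT pid, state, xact_start FROM pg_stat_activity` (real columns). It uses a hypothetical one-hour threshold and invented sample rows, and it lists backends whose open transaction exceeds the threshold, oldest first.

```python
from datetime import datetime, timedelta, timezone


def long_transactions(rows, now, threshold):
    """Return (pid, state, age) for backends whose open transaction exceeds threshold, oldest first.

    rows: iterable of (pid, state, xact_start) tuples, as returned by
    SELECT pid, state, xact_start FROM pg_stat_activity.
    """
    hits = []
    for pid, state, xact_start in rows:
        if xact_start is None:  # backend has no transaction open
            continue
        age = now - xact_start
        if age > threshold:
            hits.append((pid, state, age))
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    (101, "active", now - timedelta(minutes=5)),
    (202, "idle in transaction", now - timedelta(hours=9)),  # hypothetical forgotten session
    (303, "idle", None),
    (404, "active", now - timedelta(hours=2)),
]
for pid, state, age in long_transactions(rows, now, timedelta(hours=1)):
    print(pid, state, age)
# 202 idle in transaction 9:00:00
# 404 active 2:00:00
```

Once the culprit session is ended or fixed, `restart_lsn` can move again. Dropping the slot instead discards the consumer's position, which typically means re-snapshotting to avoid missing changes.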

Snippet 3 — the slot-lag alert log line

WARN  slot=orders_slot active=true retained_wal_bytes=50465865728
      confirmed_flush_lsn=6F/A2000000 current_wal_lsn=7B/40000000
      disk_used_pct=88 -> ALERT: slot lag exceeds 40GB threshold
Quiz

This alert fires at 88% disk. Reading the fields, what is the correct interpretation and response order?
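Decoding the two LSNs by hand makes the log line concrete. A PostgreSQL LSN printed as `X/Y` is a 64-bit byte position, with `X` as the high 32 bits and `Y` as the low 32 bits. Subtracting two LSNs therefore gives bytes, the same arithmetic `pg_wal_lsn_diff` performs. This Python sketch assumes `retained_wal_bytes` is measured from `restart_lsn`, as in Snippet 2's query. The log line itself does not say.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2


def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL LSN such as '6F/A2000000' to an absolute byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


confirmed = parse_lsn("6F/A2000000")  # confirmed_flush_lsn from the alert
current = parse_lsn("7B/40000000")    # current_wal_lsn from the alert
retained = 50465865728                # retained_wal_bytes from the alert

flush_gap = current - confirmed       # WAL the consumer has not yet confirmed
print(flush_gap, flush_gap / GiB)     # 49895440384 46.46875
print(retained / GiB)                 # 47.0
print((retained - flush_gap) / MiB)   # 544.0
```

The consumer's confirmed position is about 46.5 GiB behind the server, and the slot pins exactly 47 GiB. Under the `restart_lsn` assumption, the extra 544 MiB is WAL held so decoding can restart from slightly behind `confirmed_flush_lsn`.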

Snippet 4 — preparing a table for full delete capture

-- delete events currently carry only the primary key in `before`
ALTER TABLE public.orders REPLICA IDENTITY FULL;
Quiz

You run this so DELETE events carry the full pre-delete row. What is the side effect a senior engineer flags before merging?
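A back-of-the-envelope model shows why the change is not free. PostgreSQL's `REPLICA IDENTITY DEFAULT` logs the old primary-key columns, while `FULL` logs every old column. With `FULL`, each UPDATE (not just each DELETE) therefore carries the whole old row into the WAL and into the change event. The Python sketch below is hypothetical: the workload numbers are invented, every UPDATE is assumed to change at least one column but never the key, and record headers, TOAST, and compression are ignored.

```python
def old_image_bytes(identity: str, op: str, row_bytes: int, key_bytes: int, key_changed: bool = False) -> int:
    """Approximate old-row bytes logged for one UPDATE or DELETE (rough, hypothetical model)."""
    if op not in ("UPDATE", "DELETE"):
        raise ValueError(f"no old-row image for {op}")
    if identity == "FULL":
        return row_bytes  # all columns of the old row
    if identity == "DEFAULT":
        # old primary-key columns: always for DELETE, for UPDATE only if the key changed
        return key_bytes if op == "DELETE" or key_changed else 0
    raise ValueError(f"identity {identity} not modelled")


def daily_old_image_bytes(identity: str, updates: int, deletes: int, row_bytes: int, key_bytes: int) -> int:
    """Old-row bytes per day for a workload whose UPDATEs never change the key."""
    return (updates * old_image_bytes(identity, "UPDATE", row_bytes, key_bytes)
            + deletes * old_image_bytes(identity, "DELETE", row_bytes, key_bytes))


# Hypothetical orders workload: 1M updates and 10k deletes a day, 400-byte rows, 8-byte key.
print(daily_old_image_bytes("DEFAULT", 1_000_000, 10_000, 400, 8))  # 80000
print(daily_old_image_bytes("FULL", 1_000_000, 10_000, 400, 8))     # 404000000
```

The extra volume scales with update rate times row width. That means more WAL to ship and to retain behind the slot, plus larger events downstream, and it is the cost to weigh before merging.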

Recap

Every CDC incident is read in config and catalog state. A zero heartbeat interval leaves a low-traffic slot unconfirmed while WAL from the rest of the server piles up until the disk fills. An active slot can still retain tens of GB if a long-running transaction pins restart_lsn, so check pg_stat_activity before dropping anything. The LSN gap in pg_replication_slots is the number you alert on, with max_slot_wal_keep_size as the backstop. REPLICA IDENTITY FULL buys full delete images at the cost of fatter UPDATE and DELETE records in the WAL. Read the slot’s state first, fix whatever stops it advancing, and treat the disk as the clock.
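"Treat the disk as the clock" can be made literal. This minimal Python sketch divides free space by the growth rate of retained WAL to estimate how long you have to get the slot advancing again. It assumes you know the WAL volume's size and that growth rate. Both values below are hypothetical; only the 88% figure comes from Snippet 3's alert.

```python
GiB = 1024 ** 3


def hours_until_full(disk_bytes: int, used_pct: int, wal_growth_bytes_per_hour: int) -> float:
    """Hours until the WAL volume fills, assuming retained WAL keeps growing linearly."""
    free = disk_bytes * (100 - used_pct) // 100
    if wal_growth_bytes_per_hour <= 0:
        return float("inf")  # retained WAL is not growing, so the disk is not filling from it
    return free / wal_growth_bytes_per_hour


# Hypothetical 500 GiB volume at the alert's 88%, retained WAL growing 2 GiB per hour.
print(hours_until_full(500 * GiB, 88, 2 * GiB))  # 30.0
```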

Trademarks belong to their respective owners. Editorial reference only.