Data Engineering
Materialized views: multiple-choice review
Six questions that cut across the whole unit. Each one is a decision you make when someone proposes “just materialize it” — not a definition to recite, but a tradeoff to weigh against a real workload.
Confirm you can connect read cost, refresh strategy, staleness budget, write-rate, and operational ownership — the synthesis the lesson built toward.
A plain view never goes stale, yet teams reach for a materialized view anyway. What is the single thing a materialized view buys, and what does it cost in exchange?
An analyst dashboard reads a materialized view constantly. Ops runs a nightly plain REFRESH MATERIALIZED VIEW (no CONCURRENTLY) on a 90-second aggregate. Users report the dashboard hangs every night. Why, and what is the fix?
Your team switched to REFRESH MATERIALIZED VIEW CONCURRENTLY to stop read-blocking. Refreshes now sometimes never complete and a backlog builds. What is the most likely cause?
A fact table ingests ~5,000 inserts/sec; analysts read an aggregate MV all day and tolerate ~5 minutes of staleness. Someone proposes pg_ivm incremental maintenance so the view is always fresh. Why is that the wrong call?
A materialized view showed yesterday's numbers all morning. The nightly refresh cron had failed on a lock timeout six days earlier and nothing alerted. What is the real lesson?
A ClickHouse incremental materialized view (MV-on-insert) reports wrong totals: rows that existed before the MV was created are missing, and some groups are split across rows. What explains both symptoms?
The through-line of the unit is one decision sequence: confirm the read is genuinely expensive and repeated, bound the staleness the consumer tolerates, then pick refresh to match the write-rate — scheduled REFRESH CONCURRENTLY for minute-scale staleness, incremental (pg_ivm) only for read-heavy low-write tables, streaming MVs to remove the refresh job entirely. Every strategy carries a hidden cost: CONCURRENTLY needs a unique index and waits on old transactions, incremental taxes every write, ClickHouse MVs see only the inserted block. And the failure mode that recurs everywhere is the same — an unmonitored refresh serves stale data while the system looks green, so own it and alert on it.