Data Engineering DATA · 04 · 09

Materialized views: SQL and refresh reading

Read real SQL — REFRESH CONCURRENTLY setup, pg_ivm immv creation, and a staleness-window check — and pick the behaviour or the highest-leverage fix a senior would make first.

DATA Senior ◷ 14 min

Level

FoundationsJuniorMiddleSenior

The refresh statement and the SQL around it are where materialized-view problems are actually born. Read each snippet, predict what the database will do, then choose the fix a senior reaches for first.

Goal

Practise the loop you run when reviewing an MV: read the DDL and the refresh path, predict the lock and staleness behaviour, and reach for the structural fix before the tuning knob.

Snippet 1 — the CONCURRENTLY prerequisite

CREATE MATERIALIZED VIEW daily_revenue AS
SELECT date_trunc('day', created_at) AS day,
       region,
       sum(amount) AS revenue
FROM orders
GROUP BY 1, 2;

-- nightly job:
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue;

Quiz

The nightly job fails immediately with 'cannot refresh materialized view concurrently' / a unique-index error. What is missing, and what exactly must it be?

Snippet 2 — incremental maintenance with pg_ivm

CREATE EXTENSION IF NOT EXISTS pg_ivm;

-- turn the aggregate into an incrementally-maintained view
SELECT create_immv('daily_revenue_ivm',
  $$ SELECT date_trunc('day', created_at) AS day,
            region,
            sum(amount) AS revenue
     FROM orders
     GROUP BY 1, 2 $$);

Quiz

After create_immv, a teammate notices that every INSERT into orders got measurably slower. Is this a bug, and what is happening?

Snippet 3 — detecting staleness

-- ops wants an alert if the MV has not refreshed recently.
-- they query the view's own data for the newest day:
SELECT max(day) AS newest_day FROM daily_revenue;
-- alert fires only if newest_day < current_date;

Quiz

Why is alerting on max(day) inside the view an unreliable staleness check, and what should you track instead?

Snippet 4 — the refresh that never finishes

-- a long-running analytics session is open:
BEGIN;
SELECT ... FROM orders JOIN ... ;   -- runs for 20+ minutes, still open

-- meanwhile, the scheduler fires:
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue;  -- appears to hang

Quiz

The CONCURRENTLY refresh sits and does nothing while the analytics session is open. What is the mechanism, and what is the durable fix?

Recap

The refresh path is where MV problems live: CONCURRENTLY needs a unique index on plain view columns or it errors; pg_ivm’s create_immv moves cost onto every write via triggers, so a per-write slowdown is the design, not a bug; staleness must be measured from the refresh job’s own success and timestamp, never from the view’s data content; and a CONCURRENTLY refresh stalls behind long open transactions, so bound them or it never catches up. Read the DDL and the refresh, predict the lock and staleness behaviour, then fix the structure before reaching for a knob.

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.