
ELT vs ETL: multiple-choice review

Multiple-choice synthesis across the ELT-vs-ETL unit: where the Transform runs, replayability, warehouse cost, idempotency, and the medallion contract.

Level: Senior · 13 min read


Six questions that cut across the whole unit. Each one is a decision you actually make when designing a pipeline: not a definition to recite, but a tradeoff to weigh against cost, replayability, and compliance.

Goal

Confirm you can connect where the Transform runs to its downstream consequences: replayability, the warehouse bill, schema discipline, and the idempotency that keeps a retry from doubling your data.

Quiz

What single architectural change in cloud warehouses (Snowflake, BigQuery) is the real reason the industry flipped from ETL to ELT?

Quiz

You discover a timezone bug in a transform that has shipped wrong numbers for six months. Under ELT with a medallion architecture, what is the fast, correct fix?

Quiz

A dbt model was set to full-refresh by default and scheduled hourly; it rebuilds a 2 TB fact table from scratch every run and the Snowflake bill jumped 40%. The output is correct. Where is the bug and what is the fix?

Quiz

A regulated fintech ingests payment events containing card PANs, and compliance forbids raw cardholder data from ever sitting in the analytics warehouse. Which pattern fits, and why is the modern ELT default wrong here?

Quiz

An EL tool retried a partially succeeded load, and your revenue fact table now shows inflated totals. What property was missing, and what is the durable design fix?

Quiz

Someone calls schema-on-read 'pure freedom — no schema to fight at load time.' What does the unit's framing say they are missing?

Recap

The through-line: where the Transform runs decides everything downstream. Decoupled storage and compute made landing raw data cheap, and that buys replayability through the medallion contract (immutable bronze, cleaned silver, business-ready gold). But the T now runs on metered warehouse compute, so you go incremental by default. And because loaders retry, every load must be idempotent (merge on a unique_key), or a retry doubles your data. ELT is the default; ETL still wins when a hard rule says raw PII must never touch the warehouse. So when you review a pipeline design or a dbt model, ask three questions first: where does the transform run, is it incremental, and does it have a unique_key? Those three answers predict the bill, the data quality, and the retry safety.
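The retry-safety and incremental points can be made concrete without a warehouse. Below is a minimal sketch in plain Python, assuming a list stands in for an append-only bronze table and a dict stands in for the fact table. The `order_id` column (playing the unique_key), the `loaded_at` field (a hypothetical ingestion timestamp used as a high-water mark), and all function names are illustrative assumptions, not dbt's implementation. In dbt the analogous idea is an incremental model configured with a `unique_key`.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    order_id: str   # hypothetical unique_key column
    amount: float   # revenue amount
    loaded_at: int  # hypothetical ingestion timestamp, used as a high-water mark


def append_load(table: list[Event], batch: list[Event]) -> None:
    """Naive load: every run appends, so a retried batch lands twice."""
    table.extend(batch)


def merge_load(table: dict[str, Event], batch: list[Event]) -> None:
    """Idempotent load: upsert keyed on order_id, so replaying a batch changes nothing."""
    for event in batch:
        table[event.order_id] = event


def incremental_batch(bronze: list[Event], high_water_mark: int) -> list[Event]:
    """Incremental selection: only rows newer than the last processed loaded_at."""
    return [e for e in bronze if e.loaded_at > high_water_mark]


def revenue(rows: Iterable[Event]) -> float:
    return sum(e.amount for e in rows)


# Bronze: raw events landed as-is and never rewritten.
bronze = [Event("o1", 100.0, 1), Event("o2", 250.0, 2), Event("o3", 50.0, 3)]

appended: list[Event] = []
merged: dict[str, Event] = {}

batch = incremental_batch(bronze, high_water_mark=0)
# The loader runs once, then retries the same batch after a partial failure.
for _ in range(2):
    append_load(appended, batch)
    merge_load(merged, batch)

print(revenue(appended))         # 800.0: the retry doubled the totals
print(revenue(merged.values()))  # 400.0: the retry was absorbed

# Next scheduled run: one new event arrives; only rows past the mark are processed.
bronze.append(Event("o4", 75.0, 4))
hwm = max(e.loaded_at for e in batch)
next_batch = incremental_batch(bronze, hwm)
merge_load(merged, next_batch)
print(len(next_batch), revenue(merged.values()))  # 1 475.0
```

The design choice to notice: the merge's correctness depends entirely on `order_id` being genuinely unique per business event. A key that is not unique would silently collapse distinct rows, which is why the unique_key deserves the same review attention as the incremental filter.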
