Data platform: multiple-choice review

Cross-track synthesis: pick the right store, format, and contract per workload across OLTP/OLAP, ELT, Parquet, MVs, event sourcing, search, and vectors.

DATA Senior ◷ 13 min

Six questions that cut across the whole track. Each one is a design call you make when one fact has to live in several stores at once — not a definition, but a choice of store, format, or contract under a real workload.

Goal

Confirm you can route a workload to the right store and layout, and reason about the seams between them — the synthesis the OLTP/OLAP, ELT, Parquet, MV, event-sourcing, search, and vector units all built toward.

Quiz

A single product fact must serve point lookups in the checkout path AND a full-table revenue scan for analytics. What is the senior architecture?

Quiz

Your team is choosing between ETL (transform in a separate engine before load) and ELT (load raw, transform in-warehouse with dbt). The data is messy and the business keeps changing the definition of 'active user'. Which fits, and why?

Quiz

A nightly dashboard query filters event_date = '2026-05-01' AND country = 'US' over a 2 TB Parquet/Iceberg table and still scans most of the data. What is the highest-leverage fix?
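One way to reason about this scenario is to model what the planner can skip. The sketch below is a toy in plain Python, not the Iceberg or Parquet API: `DataFile`, `files_to_scan`, the file paths, and the per-file sizes are all hypothetical, and it assumes a layout partitioned by `event_date` with rows clustered by `country` so that each file's min/max statistics cover a narrow range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    event_date: str    # partition value (assumed layout: one partition per day)
    country_min: str   # min/max column stats, as a Parquet footer would record them
    country_max: str
    size_gb: float


def files_to_scan(files, event_date, country):
    """Keep only files whose partition value and min/max stats could contain a match."""
    return [
        f for f in files
        if f.event_date == event_date
        and f.country_min <= country <= f.country_max
    ]


# Hypothetical table contents, clustered by country inside each daily partition.
files = [
    DataFile("d=2026-04-30/a.parquet", "2026-04-30", "AR", "ZA", 1.0),
    DataFile("d=2026-05-01/a.parquet", "2026-05-01", "AR", "FR", 1.0),
    DataFile("d=2026-05-01/b.parquet", "2026-05-01", "GB", "US", 1.0),
    DataFile("d=2026-05-02/a.parquet", "2026-05-02", "AR", "ZA", 1.0),
]

selected = files_to_scan(files, "2026-05-01", "US")
scanned_gb = sum(f.size_gb for f in selected)
total_gb = sum(f.size_gb for f in files)
print(f"scan {scanned_gb} of {total_gb} GB:", [f.path for f in selected])
```

If the same rows were written without partitioning or clustering, every file would carry a wide min/max range for both columns and nothing could be skipped, which is the symptom the question describes.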

Quiz

A gold materialized view serving a dashboard refreshes every 6 hours. A finance lead complains the number 'is wrong' versus a live SQL count. Both are internally correct. What did the design get wrong?
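A declared freshness contract can travel with the number itself. The sketch below is a minimal illustration, assuming the serving layer knows the last refresh time of the view; `ServedMetric`, its fields, and the sample values are hypothetical and not part of any particular warehouse or BI tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ServedMetric:
    name: str
    value: int
    as_of: datetime            # when the materialized view was last refreshed
    freshness_sla: timedelta   # maximum staleness the contract promises

    def is_within_sla(self, now):
        return now - self.as_of <= self.freshness_sla

    def label(self, now):
        """Render the value together with its age, so readers see what they compare."""
        age_min = int((now - self.as_of).total_seconds() // 60)
        status = "within SLA" if self.is_within_sla(now) else "STALE"
        return f"{self.name}: {self.value} (as of {age_min} min ago, {status})"


# Hypothetical numbers: a view refreshed 4 hours ago under a 6-hour SLA.
now = datetime(2026, 5, 1, 12, 0, tzinfo=timezone.utc)
metric = ServedMetric("orders_count", 1200, now - timedelta(hours=4), timedelta(hours=6))
print(metric.label(now))
```

With the age shown next to the value, a gap against a live count reads as expected staleness rather than as an error, and a breach of the SLA becomes something you can alert on.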

Quiz

A service writes an order to Postgres, then publishes 'OrderPlaced' to Kafka so search and analytics react. Occasionally the search index never learns about an order. What is the root cause and the fix?
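The failure mode here is a dual write: the database commit and the publish are two separate steps, and a crash between them loses the event. A transactional outbox is one common remedy. The sketch below is a minimal illustration under stated assumptions: `sqlite3` stands in for Postgres, a plain callback stands in for a Kafka producer, and the table layout plus the `place_order` and `relay_once` names are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, topic TEXT NOT NULL, "
    "payload TEXT NOT NULL, published INTEGER NOT NULL DEFAULT 0)"
)


def place_order(order_id, total_cents):
    # One transaction: the order row and its event commit or roll back together.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, total_cents) VALUES (?, ?)",
            (order_id, total_cents),
        )
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("OrderPlaced", json.dumps({"order_id": order_id, "total_cents": total_cents})),
        )


def relay_once(publish):
    """Publish pending events in id order; mark each one only after publish succeeds."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for event_id, topic, payload in rows:
        publish(topic, payload)  # if this raises, the row stays pending and is retried
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
    return len(rows)


sent = []  # stand-in for the broker
place_order(1, 4999)
relay_once(lambda topic, payload: sent.append((topic, json.loads(payload))))
print(sent)
```

Because a crash can land after `publish` but before the row is marked, the relay gives at-least-once delivery, so consumers such as the search indexer need to handle duplicates idempotently.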

Quiz

Catalog search must match misspelled product names AND the RAG assistant must answer 'a laptop good for video editing'. One team proposes using the vector index for both. What is the correct split?

Recap

The track’s through-line is one habit: route each workload to the store and layout that fits it, then design the contract at every seam. Use row-store OLTP for point writes, columnar Parquet for scans (with footer-stat pruning), ELT over retained raw data for replayable definitions, MVs for read latency with a declared freshness SLA, a transactional outbox to eliminate the dual write, inverted indexes for lexical search, and vector ANN for semantic retrieval. Each store is correct for its job; the system stays correct only when you own the schema, delivery guarantee, freshness SLA, and reconciliation between them. When you see a design question about “one fact, multiple stores,” your first move is to name each seam and answer those four contract questions before touching the stores themselves.
