Data Engineering
Data platform: free-recall review
Retrieval beats re-reading. For each prompt, reconstruct a full answer from memory before you open the model answer — the effort of pulling the whole track together is what fuses the separate stores into one mental model.
Reconstruct the track’s spine without looking back: why OLTP and OLAP split, where the transform runs, how Parquet prunes, what an MV trades, why the log is the source of truth, and how search and vectors divide relevance — and how the seams between them are designed.
- 01 Why can't one storage layout serve both the OLTP checkout path and OLAP revenue scans, and what is the standard two-store answer?
- 02 What does choosing ELT over ETL actually buy you, and what is the cost?
- 03 How does Parquet skip data, and what code mistake silently defeats it?
- 04 What does a materialized view trade, and how do you keep that tradeoff honest in production?
- 05 Why is the append-only event log the source of truth in event sourcing, and how does current state relate to it?
- 06 How do full-text search and vector search divide the relevance problem, and why do mature systems run both?
If you reconstructed each answer, you hold the track’s spine: OLTP and OLAP split because no single layout wins both access patterns; ELT buys replay by retaining the raw data; Parquet prunes via footer statistics unless a function wrapped around the column hides it from the reader; an MV trades freshness for read speed and needs a declared freshness SLA; the event log is the source of truth and current state is a fold over it; and full-text search and vector search divide relevance into lexical and semantic matching. Above all, each store is correct for its own job, but the system stays correct only when you design the contract, delivery guarantee, freshness, and reconciliation at every seam between them.
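Two of the claims in this summary are easy to check in a few lines: that current state is a fold over the event log, and that min/max statistics let a reader skip data only while the predicate is on the raw column. The sketch below is a plain-Python illustration under stated assumptions, not how any particular engine implements either idea. The cart event shapes, the `row_groups` layout, and all function names are hypothetical, and the row-group dictionaries only imitate the per-row-group min/max statistics that a Parquet footer stores.

```python
from functools import reduce

# --- Event sourcing: current state is a fold over the append-only log ---
# Hypothetical cart events, for illustration only.
events = [
    {"type": "item_added", "sku": "A", "qty": 2},
    {"type": "item_added", "sku": "B", "qty": 1},
    {"type": "item_removed", "sku": "A", "qty": 1},
]

def apply_event(state, event):
    """Return a new cart state with one event applied (the input is not mutated)."""
    cart = dict(state)
    delta = event["qty"] if event["type"] == "item_added" else -event["qty"]
    cart[event["sku"]] = cart.get(event["sku"], 0) + delta
    return cart

def current_state(log):
    """Fold the whole log from an empty state; replaying the log rebuilds the state."""
    return reduce(apply_event, log, {})

# --- Pruning: min/max statistics allow skipping whole row groups ---
# Each dict imitates one row group with footer-style stats for a single column.
row_groups = [
    {"min": 100, "max": 199, "rows": [100, 150, 199]},
    {"min": 200, "max": 299, "rows": [200, 250, 299]},
    {"min": 300, "max": 399, "rows": [300, 350, 399]},
]

def scan_with_stats(groups, lo, hi):
    """Range predicate on the raw column: skip groups whose [min, max] cannot match."""
    matches, groups_read = [], 0
    for group in groups:
        if group["max"] < lo or group["min"] > hi:
            continue  # pruned using the stats alone, rows never touched
        groups_read += 1
        matches.extend(v for v in group["rows"] if lo <= v <= hi)
    return matches, groups_read

def scan_with_function(groups, predicate):
    """Predicate on f(column): the stats describe the column, not f(column),
    so this simplified reader has to read every group."""
    matches, groups_read = [], 0
    for group in groups:
        groups_read += 1
        matches.extend(v for v in group["rows"] if predicate(v))
    return matches, groups_read
```

With this data, `current_state(events)` gives `{"A": 1, "B": 1}`. `scan_with_stats(row_groups, 200, 299)` and `scan_with_function(row_groups, lambda v: v // 100 == 2)` return the same three rows, but the first reads one row group and the second reads all three. The same answer at a higher read cost is why the mistake stays silent.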