Data Engineering
OLTP vs OLAP: free-recall review
Retrieval beats re-reading. For each prompt, say or write a full answer from memory before you open the model answer — the effort of recall is what makes the workload split stick.
Reconstruct the unit’s spine — why layout follows access pattern, the three column-store multipliers, why an index can’t save an aggregate, the replica outage, and the store-split fix — without looking back at the lesson.
- 01Why do OLTP and OLAP need opposite physical layouts, and what is each?
- 02Name the three multipliers that make a column store beat a row store on analytical queries, and roughly how big the combined gap is.
- 03A teammate asks why you can't just add an index to make the 90-day revenue aggregate fast on Postgres. Explain.
- 04Why is running the analytics dashboard on the prod read replica an outage waiting to happen, and what's the correct architecture?
- 05Why does the OLTP schema stay normalized while the OLAP target is denormalized, and why is that not a contradiction?
- 06What is HTAP, and why doesn't it eliminate the need to think about the OLTP/OLAP split?
If you could reconstruct each answer from memory, you hold the unit’s spine: access pattern dictates layout (row store for point writes, column store for scans); column stores win by pruning, 5–10× compression, and vectorization stacking into a 10×–1000× gap no index can close; OLTP normalizes while OLAP denormalizes because each matches its workload; analytics on the prod replica is an outage because one scan evicts the buffer cache, contends for I/O, and drives lag; and the fix is a separate columnar store fed by CDC/ETL, with HTAP managing — never abolishing — the contention.