Data Engineering
OLTP vs OLAP: multiple-choice review
Six questions that cut across the whole unit. Each one mirrors a decision you make in a real architecture review — not a definition to recite, but a tradeoff to weigh when the dashboard and the checkout path share a database.
Confirm you can connect physical layout, the column-store multipliers, the replica-analytics failure mode, and the store-split decision — the synthesis the overview built toward.
A query runs SUM(total) over 40M orders rows that have 50 columns each. Why does a column store crush a row store on exactly this query?
Service A runs point lookups by id at 5k req/s; Service B runs a column aggregate over the same data. Why can't one storage layout serve both well?
On 100M-row ClickBench, ClickHouse stores the data in roughly 9 GiB where Postgres needs about 100 GiB. Why does columnar compress so much better, and why does that widen the scan-throughput gap rather than just saving disk?
A growth dashboard runs a 90-day revenue aggregate against the prod Postgres read replica. Checkout p99 jumps from 40ms to 2s and replica lag climbs to 90s. What root cause does a senior name first?
Product wants the 90-day revenue dashboard. Of these four options, which is the architecture a senior ships, and why?
A team proposes an HTAP engine to serve both transactions and analytics on one system, arguing it removes the need to split stores. What is the senior framing?
The through-line of the unit is one decision: access pattern sets layout. Point lookups and small writes want a row store; full-table scans over a few columns want a column store, where pruning (read ~6% of the bytes), 5–10× compression, and vectorized batch execution stack into a 10×–1000× gap no index can close. The classic senior failure is analytics on the prod OLTP replica, where one scan evicts the hot buffer cache, contends for I/O, and drives replica lag. The fix is architectural — a separate columnar store fed by CDC or ETL — accepting a little data latency for isolated storage, caches, and failure domains. HTAP can manage the contention but never abolishes it.