awesome-everything RU
↑ Back to the climb

Data Engineering

OLTP vs OLAP: split the workload

Crux Hands-on project — reproduce the analytics-on-OLTP outage, then split the workload into a columnar store fed by CDC/ETL and prove the fix with before/after numbers.
Your altitude — climbing toward senior
ZeroJuniorMiddleSenior
You are at senior altitude — in orbit
◷ 240 min

Reading about the replica-analytics outage is not the same as causing it and then fixing it. Stand up a row-store OLTP database, drive an analytical scan into it until checkout degrades, then split the workload into a columnar store fed by CDC/ETL — and prove every step with measurements.

Goal

Turn the unit’s mental model into a reproducible engineering loop: reproduce the contention, measure it, stand up the right OLAP layout, wire a feed, and verify both that analytics got faster and that the transactional path stopped suffering.

Project
0 of 7
Objective

Reproduce the analytics-on-OLTP failure on a row store, then move analytics to a separate columnar store fed by CDC or ETL — proving the analytical query got dramatically faster AND that the transactional workload is no longer harmed, with before/after numbers under identical load.

Requirements
Acceptance criteria
  • A before/after table: analytical query time, bytes/pages scanned, OLTP p99, buffer-cache hit ratio, and replication lag — all measured under the same load, not estimated.
  • The columnar store's plan shows it reads only the referenced columns (pruning), and the analytical query is at least 10× faster than the row-store scan.
  • With analytics moved off the OLTP instance, transactional p99 and cache hit ratio return to baseline while the analytical query runs — the contention is gone.
  • A one-paragraph write-up: why the index did not help (selectivity vs scan volume), which column-store property gave most of the speedup, and the data-latency cost you accepted for isolation.
Senior stretch
  • Add a compression comparison: report on-disk size of the table in the row store vs the columnar store, and break down which columns compressed most and why (low cardinality, same type).
  • Make the CDC feed near-real-time and measure end-to-end lag from an OLTP commit to its visibility in the columnar store; chart lag under a write spike.
  • Add a guardrail that prevents analytical queries from ever hitting the OLTP instance (a separate role/endpoint, statement timeout, or a query router) and show it blocks the destructive scan.
  • Stand up an HTAP option (e.g. a column replica or a HTAP engine) for the same workload and compare it against the two-store split on analytical latency, OLTP p99 under load, data freshness, and operational cost.
Recap

This is the loop you will run in every real data-architecture decision: reproduce the contention so you understand it, prove an index cannot fix a full-table aggregate, stand up the right columnar layout (denormalized, low-cardinality, pruned), feed it with CDC or ETL, and verify with before/after numbers that analytics got faster and the transactional path stopped suffering — accepting a bounded data-latency window for isolated storage, caches, and failure domains. Doing it once on a toy makes the production version muscle memory.

Continue the climb ↑ELT vs ETL: where the Transform runs, and why the industry flipped
shortcuts expand
search
K
prev piece
k
next piece
j
cycle tier
t
this menu
?
sources3
expand
  1. 01
  2. 02
  3. 03

Trademarks belong to their respective owners. Editorial reference only.