awesome-everything RU
↑ Back to the climb

Data Engineering

OLTP vs OLAP: the workload split that defines data engineering

Crux Row-store OLTP and column-store OLAP optimize opposite shapes: tiny indexed writes vs huge scans. Running analytics on the prod replica is the classic outage — the fix is a separate columnar store fed by ETL/CDC.
Your altitude — climbing toward senior
ZeroJuniorMiddleSenior
You are at junior altitude — the surface
◷ 16 min

The growth team ships a dashboard: “revenue by country, last 90 days.” It points at the prod Postgres read replica because “it’s just a read.” At 9am the query fans out a SUM over 40M orders rows, evicts the hot B-tree pages every checkout needs out of shared_buffers, and pins I/O for 30 seconds. Checkout p99 jumps from 40ms to 2s, replica lag climbs to 90s, and on-call gets paged for “database down.” Nothing was down. One analytical scan poisoned a transactional system. Adding an index would not have helped — there was no WHERE to narrow; the query needed every row.

Two workloads, two opposite shapes

OLTP — Online Transaction Processing — is the app database: INSERT an order, UPDATE a balance, fetch one user by id. The access pattern is point lookups and small writes against a few rows, thousands of times per second, each in single-digit milliseconds. OLAP — Online Analytical Processing — is the question machine: average order value by region by month, retention cohorts, “which 1% of users drive 40% of revenue.” It scans millions of rows but touches only a handful of the table’s columns, and a query running for seconds is fine.

These are not two settings on one knob. They pull the physical design in opposite directions. OLTP optimizes for write latency and single-row retrieval; OLAP optimizes for scan throughput over a few columns across the whole table. A schema, an index, a storage format that is excellent at one is structurally bad at the other. That tension — not any specific tool — is what the rest of data engineering is built to manage.

Row-store vs column-store: it’s about what sits next to what

A row store (Postgres, MySQL, most OLTP engines) keeps a row’s columns contiguous on disk: [id, user, country, total, created_at] for row 1, then all of row 2, and so on. That is perfect for OLTP — fetching or writing one whole order touches one page. But ask for SUM(total) over 40M rows and the engine must read every page of the table, dragging id, user, country, created_at through memory just to reach total. You pay I/O for 50 columns to use 1.

A column store (ClickHouse, Redshift, BigQuery, Snowflake) flips the layout: all total values sit together, all country values together. Now SUM(total) reads only the total column — column pruning. A query touching 3 of 50 columns reads roughly 6% of the bytes a row store would. That single property is most of the gap.

DimensionRow store (OLTP)Column store (OLAP)
Physical layoutAll columns of a row togetherAll values of a column together
Best atPoint read/write of whole rowsScan + aggregate a few columns
Big aggregate costReads every byte of every rowReads only referenced columns (~6%)
Compression~1.5–3× (mixed types per page)~5–10× (same type per column)
Single-row writeCheap — one row, one placeExpensive — touches every column file

Why columnar compresses 5–10× and runs vectorized

Storing one column together means storing values of one type and often low cardinality next to each other — and that compresses far better than a row’s grab-bag of an int, a string, a timestamp, a float. A country column is a few hundred distinct strings repeated millions of times: dictionary-encode it to small ints, then run-length-encode the runs, then LZ4/ZSTD on top. Row stores typically reach 1.5–3×; column stores hit 5–10× on analytical data and 30×+ on low-cardinality columns. On the 100M-row ClickBench, ClickHouse stores the dataset in ~9 GiB where Postgres needs ~100 GiB. Less data on disk is less I/O, and I/O is the bottleneck for scans — so the compression gap widens the throughput gap to 50–100×.

The second multiplier is vectorized execution. A row engine processes a query tuple-by-tuple: per row, a function call, a type check, a branch. A column engine processes a column in batches of thousands of contiguous, same-type values, so the CPU runs tight loops over arrays, SIMD-friendly, with predictable branches and warm caches. Same arithmetic, an order of magnitude less per-value overhead. Pruning, compression, and vectorization stack: columnar beats row stores by 10×–1000× on analytical queries — and “add an index” cannot close that, because a full-table aggregate has nothing to seek.

Why this works

“Just add an index” is the instinct that fails here, and it’s worth knowing exactly why. An index turns a scan into a seek when a query selects a small fraction of rows. An analytical aggregate — SUM, AVG, COUNT(*), a 90-day rollup — needs most or all of the rows, so the planner correctly ignores the index and does a sequential scan; an index would only add write cost. Indexes fix selectivity, not scan volume. The fix for scan volume is a layout that reads fewer bytes per row: a column store.

The failure mode: analytics on the prod OLTP replica

The Hook is the senior failure mode in full. A read replica is still a row store with a buffer cache sized for transactional working sets. Point it at a long analytical scan and three things go wrong at once. First, buffer-cache eviction: the scan pulls cold pages through shared_buffers and evicts the hot OLTP pages, so transactional queries that were memory-hits start hitting disk and slow 10–100×. Second, lock and resource contention: sustained CPU, memory for hash joins, and I/O bandwidth are zero-sum — the analytical query starves the transactional threads sharing the engine. Third, replica lag: a long query holding resources delays apply of the WAL stream, so the replica falls behind the primary, and anything reading it serves stale data.

The fix is architectural, not a tuning flag: give analytics its own home. Stand up a separate OLAP store (a columnar warehouse — Redshift/BigQuery/Snowflake — or ClickHouse) and feed it from OLTP continuously via CDC (change data capture, e.g. Debezium streaming the WAL into Kafka) or batch ETL/ELT. The two systems now have independent storage, independent caches, independent failure domains. The dashboard hits the warehouse; checkout never feels it. The cost you take on is a pipeline to own and a few seconds-to-minutes of data latency — almost always the right trade.

Pick the best fit

Product wants a 'revenue by country, last 90 days' dashboard over a 40M-row orders table in prod Postgres. Pick the architecture.

Quiz

A query computes SUM(total) over 40M rows but reads only the total column. Why does a column store crush a row store here?

Quiz

An analytics dashboard pointed at the prod OLTP read replica causes checkout p99 to spike. What's the root cause a senior names first?

Order the steps

Order the steps to move heavy analytics off the prod OLTP database the right way:

  1. 1 Recognize the query is OLAP — a full-table scan/aggregate, not a selective lookup an index can fix
  2. 2 Stand up a separate OLAP store (columnar warehouse or ClickHouse) with its own storage and caches
  3. 3 Feed it from OLTP continuously via CDC (e.g. Debezium → Kafka) or batch ETL/ELT
  4. 4 Point dashboards/analysts at the OLAP store, leaving the OLTP buffer cache for transactions
  5. 5 Accept seconds-to-minutes of data latency as the trade for an isolated failure domain
Recall before you leave
  1. 01
    A teammate asks why you can't just add an index to make the 90-day revenue aggregate fast on Postgres. Explain.
  2. 02
    Why is running the analytics dashboard on the prod read replica an outage waiting to happen, and what's the correct architecture?
Recap

OLTP and OLAP are two workloads with opposite physical needs: OLTP wants tiny indexed reads and writes of whole rows, so it uses a row store where a row’s columns sit contiguous; OLAP wants to scan and aggregate a few columns across the whole table, so it uses a column store where each column sits contiguous. That layout gives the column store its three multipliers — column pruning (read ~6% of the bytes for a query touching 3 of 50 columns), 5–10× compression because same-type values pack tightly, and vectorized batch execution — which together make analytical queries 10×–1000× faster, a gap no index can close because a full-table aggregate has nothing to seek. The classic senior failure is running heavy analytics on the prod OLTP replica, where one scan evicts the hot buffer cache, contends for I/O and CPU, and drives replica lag, turning “it’s just a read” into a checkout outage. The fix is architectural: a separate columnar OLAP store fed from OLTP by CDC or ETL, accepting a little data latency in exchange for isolated storage, caches, and failure domains. HTAP systems that try to serve both on one engine exist, but no production HTAP eliminates the resource contention — it can be managed, not avoided, which is why the two-store split remains the default.

Continue the climb ↑OLTP vs OLAP: multiple-choice review
shortcuts expand
search
K
prev piece
k
next piece
j
cycle tier
t
this menu
?
sources4
expand
  1. 01
  2. 02
  3. 03
  4. 04

Trademarks belong to their respective owners. Editorial reference only.