Data Engineering DATA · 01 · 01

OLTP vs OLAP: the workload split that defines data engineering

Row-store OLTP and column-store OLAP optimize opposite shapes: tiny indexed writes vs huge scans. Running analytics on the prod replica is the classic outage — the fix is a separate columnar store fed by ETL/CDC.

DATA Junior ◷ 16 min

Level

FoundationsJuniorMiddleSenior

The growth team ships a dashboard: “revenue by country, last 90 days.” It points at the prod Postgres read replica because “it’s just a read.” At 9am the query fans out a SUM over 40M orders rows, evicts the hot B-tree pages every checkout needs out of shared_buffers, and pins I/O for 30 seconds. Checkout p99 jumps from 40ms to 2s, replica lag climbs to 90s, and on-call gets paged for “database down.” Nothing was down. One analytical scan poisoned a transactional system. Adding an index would not have helped — there was no WHERE to narrow; the query needed every row.

In ten minutes you will understand exactly why two workloads that look compatible will destroy each other when they share a database — and what the fix actually is.

Two workloads, two opposite shapes

OLTP — Online Transaction Processing — is the app database: INSERT an order, UPDATE a balance, fetch one user by id. The access pattern is point lookups and small writes against a few rows, thousands of times per second, each in single-digit milliseconds. OLAP — Online Analytical Processing — is the question machine: average order value by region by month, retention cohorts, “which 1% of users drive 40% of revenue.” It scans millions of rows but touches only a handful of the table’s columns, and a query running for seconds is fine.

These are not two settings on one knob. They pull the physical design in opposite directions. OLTP optimizes for write latency and single-row retrieval; OLAP optimizes for scan throughput over a few columns across the whole table. A schema, an index, a storage format that is excellent at one is structurally bad at the other. That tension — not any specific tool — is what the rest of data engineering is built to manage.

Row-store vs column-store: it’s about what sits next to what

Why does the physical layout matter so much? Because when you run SUM(total) over 40M rows, the difference between reading 2 GB and reading 38 GB is entirely determined by where values are stored on disk — not by any query optimizer trick or index.

A row store (Postgres, MySQL, most OLTP engines) keeps a row’s columns contiguous on disk: [id, user, country, total, created_at] for row 1, then all of row 2, and so on. That is perfect for OLTP — fetching or writing one whole order touches one page. But ask for SUM(total) over 40M rows and the engine must read every page of the table, dragging id, user, country, created_at through memory just to reach total. You pay I/O for 50 columns to use 1.

A column store (ClickHouse, Redshift, BigQuery, Snowflake) flips the layout: all total values sit together, all country values together. Now SUM(total) reads only the total column — column pruning. A query touching 3 of 50 columns reads roughly 6% of the bytes a row store would. That single property is most of the gap.

Row store: columns of a row are contiguous — point lookup of one order touches one place; a SUM(total) still reads every column of every row. Column store: each column is stored separately — SUM(total) prunes to just the total file (~6% of the bytes).

Dimension	Row store (OLTP)	Column store (OLAP)
Physical layout	All columns of a row together	All values of a column together
Best at	Point read/write of whole rows	Scan + aggregate a few columns
Big aggregate cost	Reads every byte of every row	Reads only referenced columns (~6%)
Compression	~1.5–3× (mixed types per page)	~5–10× (same type per column)
Single-row write	Cheap — one row, one place	Expensive — touches every column file

Why columnar compresses 5–10× and runs vectorized

Storing one column together means storing values of one type and often low cardinality next to each other — and that compresses far better than a row’s grab-bag of an int, a string, a timestamp, a float. A country column is a few hundred distinct strings repeated millions of times: dictionary-encode it to small ints, then run-length-encode the runs, then LZ4/ZSTD on top. Row stores typically reach 1.5–3×; column stores hit 5–10× on analytical data and 30×+ on low-cardinality columns. On the 100M-row ClickBench, ClickHouse stores the dataset in ~9 GiB where Postgres needs ~100 GiB. Less data on disk is less I/O, and I/O is the bottleneck for scans — so the compression gap widens the throughput gap to 50–100×.

The second multiplier is vectorized execution. A row engine processes a query tuple-by-tuple: per row, a function call, a type check, a branch. A column engine processes a column in batches of thousands of contiguous, same-type values, so the CPU runs tight loops over arrays, SIMD-friendly, with predictable branches and warm caches. Same arithmetic, an order of magnitude less per-value overhead. Pruning, compression, and vectorization stack: columnar beats row stores by 10×–1000× on analytical queries — and “add an index” cannot close that, because a full-table aggregate has nothing to seek.

▸Why this works

“Just add an index” is the instinct that fails here, and it’s worth knowing exactly why. An index turns a scan into a seek when a query selects a small fraction of rows. An analytical aggregate — SUM, AVG, COUNT(*), a 90-day rollup — needs most or all of the rows, so the planner correctly ignores the index and does a sequential scan; an index would only add write cost. Indexes fix selectivity, not scan volume. The fix for scan volume is a layout that reads fewer bytes per row: a column store.

The failure mode: analytics on the prod OLTP replica

The Hook is the senior failure mode in full. A read replica is still a row store with a buffer cache sized for transactional working sets. Point it at a long analytical scan and three things go wrong at once. First, buffer-cache eviction: the scan pulls cold pages through shared_buffers and evicts the hot OLTP pages, so transactional queries that were memory-hits start hitting disk and slow 10–100×. Second, lock and resource contention: sustained CPU, memory for hash joins, and I/O bandwidth are zero-sum — the analytical query starves the transactional threads sharing the engine. Third, replica lag: a long query holding resources delays apply of the WAL stream, so the replica falls behind the primary, and anything reading it serves stale data.

When you see p99 latency spike after a new dashboard goes live, ask yourself what that dashboard query is doing to the buffer cache before you reach for any other knob.

The fix is architectural, not a tuning flag: give analytics its own home. Stand up a separate OLAP store (a columnar warehouse — Redshift/BigQuery/Snowflake — or ClickHouse) and feed it from OLTP continuously via CDC (change data capture, e.g. Debezium streaming the WAL into Kafka) or batch ETL/ELT. The two systems now have independent storage, independent caches, independent failure domains. The dashboard hits the warehouse; checkout never feels it. The cost you take on is a pipeline to own and a few seconds-to-minutes of data latency — almost always the right trade.

Pick the best fit

Product wants a 'revenue by country, last 90 days' dashboard over a 40M-row orders table in prod Postgres. Pick the architecture.

Quiz

A query computes SUM(total) over 40M rows but reads only the total column. Why does a column store crush a row store here?

Quiz

An analytics dashboard pointed at the prod OLTP read replica causes checkout p99 to spike. What's the root cause a senior names first?

Order the steps

Order the steps to move heavy analytics off the prod OLTP database the right way:

1 Recognize the query is OLAP — a full-table scan/aggregate, not a selective lookup an index can fix
2 Stand up a separate OLAP store (columnar warehouse or ClickHouse) with its own storage and caches
3 Feed it from OLTP continuously via CDC (e.g. Debezium → Kafka) or batch ETL/ELT
4 Point dashboards/analysts at the OLAP store, leaving the OLTP buffer cache for transactions
5 Accept seconds-to-minutes of data latency as the trade for an isolated failure domain

The fix is architectural, not a tuning flag: two storage systems with independent caches, independent failure domains, and a CDC feed between them. The replica path is a trap even though it is a read.

Recall before you leave

01
A teammate asks why you can't just add an index to make the 90-day revenue aggregate fast on Postgres. Explain.
02
Why is running the analytics dashboard on the prod read replica an outage waiting to happen, and what's the correct architecture?

Recap

OLTP and OLAP are two workloads with opposite physical needs: OLTP wants tiny indexed reads and writes of whole rows, so it uses a row store where a row’s columns sit contiguous; OLAP wants to scan and aggregate a few columns across the whole table, so it uses a column store where each column sits contiguous. That layout gives the column store its three multipliers — column pruning (read ~6% of the bytes for a query touching 3 of 50 columns), 5–10× compression because same-type values pack tightly, and vectorized batch execution — which together make analytical queries 10×–1000× faster, a gap no index can close because a full-table aggregate has nothing to seek. The classic senior failure is running heavy analytics on the prod OLTP replica, where one scan evicts the hot buffer cache, contends for I/O and CPU, and drives replica lag, turning “it’s just a read” into a checkout outage. The fix is architectural: a separate columnar OLAP store fed from OLTP by CDC or ETL, accepting a little data latency in exchange for isolated storage, caches, and failure domains. HTAP systems that try to serve both on one engine exist, but no production HTAP eliminates the resource contention — it can be managed, not avoided, which is why the two-store split remains the default. Now when you see a dashboard pointed at a read replica, you know what to ask before the on-call page arrives.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 5 done

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.

Apply this

Put this lesson to work on a real build.

Idempotent ETL PipelinePipelines don't fail gracefully — they fail at 3 a.m., halfway through a load, and someone re-runs them. This project teaches the one property that separates a hobby script from production data engineering: a run you can repeat any number of times and still land exactly one copy of each row. You'll build batch ingestion, an idempotent load, a watermark for incremental pulls, and the data-quality gates that stop bad data before it poisons everything downstream.