Data Engineering
OLTP vs OLAP: the workload split that defines data engineering
The growth team ships a dashboard: “revenue by country, last 90 days.” It points at the prod Postgres read replica because “it’s just a read.” At 9am the query fans out a SUM over 40M orders rows, evicts the hot B-tree pages every checkout needs out of shared_buffers, and pins I/O for 30 seconds. Checkout p99 jumps from 40ms to 2s, replica lag climbs to 90s, and on-call gets paged for “database down.” Nothing was down. One analytical scan poisoned a transactional system. Adding an index would not have helped — there was no WHERE to narrow; the query needed every row.
Two workloads, two opposite shapes
OLTP — Online Transaction Processing — is the app database: INSERT an order, UPDATE a balance, fetch one user by id. The access pattern is point lookups and small writes against a few rows, thousands of times per second, each in single-digit milliseconds. OLAP — Online Analytical Processing — is the question machine: average order value by region by month, retention cohorts, “which 1% of users drive 40% of revenue.” It scans millions of rows but touches only a handful of the table’s columns, and a query running for seconds is fine.
These are not two settings on one knob. They pull the physical design in opposite directions. OLTP optimizes for write latency and single-row retrieval; OLAP optimizes for scan throughput over a few columns across the whole table. A schema, an index, a storage format that is excellent at one is structurally bad at the other. That tension — not any specific tool — is what the rest of data engineering is built to manage.
Row-store vs column-store: it’s about what sits next to what
A row store (Postgres, MySQL, most OLTP engines) keeps a row’s columns contiguous on disk: [id, user, country, total, created_at] for row 1, then all of row 2, and so on. That is perfect for OLTP — fetching or writing one whole order touches one page. But ask for SUM(total) over 40M rows and the engine must read every page of the table, dragging id, user, country, created_at through memory just to reach total. You pay I/O for 50 columns to use 1.
A column store (ClickHouse, Redshift, BigQuery, Snowflake) flips the layout: all total values sit together, all country values together. Now SUM(total) reads only the total column — column pruning. A query touching 3 of 50 columns reads roughly 6% of the bytes a row store would. That single property is most of the gap.
| Dimension | Row store (OLTP) | Column store (OLAP) |
|---|---|---|
| Physical layout | All columns of a row together | All values of a column together |
| Best at | Point read/write of whole rows | Scan + aggregate a few columns |
| Big aggregate cost | Reads every byte of every row | Reads only referenced columns (~6%) |
| Compression | ~1.5–3× (mixed types per page) | ~5–10× (same type per column) |
| Single-row write | Cheap — one row, one place | Expensive — touches every column file |
Why columnar compresses 5–10× and runs vectorized
Storing one column together means storing values of one type and often low cardinality next to each other — and that compresses far better than a row’s grab-bag of an int, a string, a timestamp, a float. A country column is a few hundred distinct strings repeated millions of times: dictionary-encode it to small ints, then run-length-encode the runs, then LZ4/ZSTD on top. Row stores typically reach 1.5–3×; column stores hit 5–10× on analytical data and 30×+ on low-cardinality columns. On the 100M-row ClickBench, ClickHouse stores the dataset in ~9 GiB where Postgres needs ~100 GiB. Less data on disk is less I/O, and I/O is the bottleneck for scans — so the compression gap widens the throughput gap to 50–100×.
The second multiplier is vectorized execution. A row engine processes a query tuple-by-tuple: per row, a function call, a type check, a branch. A column engine processes a column in batches of thousands of contiguous, same-type values, so the CPU runs tight loops over arrays, SIMD-friendly, with predictable branches and warm caches. Same arithmetic, an order of magnitude less per-value overhead. Pruning, compression, and vectorization stack: columnar beats row stores by 10×–1000× on analytical queries — and “add an index” cannot close that, because a full-table aggregate has nothing to seek.
Why this works
“Just add an index” is the instinct that fails here, and it’s worth knowing exactly why. An index turns a scan into a seek when a query selects a small fraction of rows. An analytical aggregate — SUM, AVG, COUNT(*), a 90-day rollup — needs most or all of the rows, so the planner correctly ignores the index and does a sequential scan; an index would only add write cost. Indexes fix selectivity, not scan volume. The fix for scan volume is a layout that reads fewer bytes per row: a column store.
The failure mode: analytics on the prod OLTP replica
The Hook is the senior failure mode in full. A read replica is still a row store with a buffer cache sized for transactional working sets. Point it at a long analytical scan and three things go wrong at once. First, buffer-cache eviction: the scan pulls cold pages through shared_buffers and evicts the hot OLTP pages, so transactional queries that were memory-hits start hitting disk and slow 10–100×. Second, lock and resource contention: sustained CPU, memory for hash joins, and I/O bandwidth are zero-sum — the analytical query starves the transactional threads sharing the engine. Third, replica lag: a long query holding resources delays apply of the WAL stream, so the replica falls behind the primary, and anything reading it serves stale data.
The fix is architectural, not a tuning flag: give analytics its own home. Stand up a separate OLAP store (a columnar warehouse — Redshift/BigQuery/Snowflake — or ClickHouse) and feed it from OLTP continuously via CDC (change data capture, e.g. Debezium streaming the WAL into Kafka) or batch ETL/ELT. The two systems now have independent storage, independent caches, independent failure domains. The dashboard hits the warehouse; checkout never feels it. The cost you take on is a pipeline to own and a few seconds-to-minutes of data latency — almost always the right trade.
Product wants a 'revenue by country, last 90 days' dashboard over a 40M-row orders table in prod Postgres. Pick the architecture.
A query computes SUM(total) over 40M rows but reads only the total column. Why does a column store crush a row store here?
An analytics dashboard pointed at the prod OLTP read replica causes checkout p99 to spike. What's the root cause a senior names first?
Order the steps to move heavy analytics off the prod OLTP database the right way:
- 1 Recognize the query is OLAP — a full-table scan/aggregate, not a selective lookup an index can fix
- 2 Stand up a separate OLAP store (columnar warehouse or ClickHouse) with its own storage and caches
- 3 Feed it from OLTP continuously via CDC (e.g. Debezium → Kafka) or batch ETL/ELT
- 4 Point dashboards/analysts at the OLAP store, leaving the OLTP buffer cache for transactions
- 5 Accept seconds-to-minutes of data latency as the trade for an isolated failure domain
- 01A teammate asks why you can't just add an index to make the 90-day revenue aggregate fast on Postgres. Explain.
- 02Why is running the analytics dashboard on the prod read replica an outage waiting to happen, and what's the correct architecture?
OLTP and OLAP are two workloads with opposite physical needs: OLTP wants tiny indexed reads and writes of whole rows, so it uses a row store where a row’s columns sit contiguous; OLAP wants to scan and aggregate a few columns across the whole table, so it uses a column store where each column sits contiguous. That layout gives the column store its three multipliers — column pruning (read ~6% of the bytes for a query touching 3 of 50 columns), 5–10× compression because same-type values pack tightly, and vectorized batch execution — which together make analytical queries 10×–1000× faster, a gap no index can close because a full-table aggregate has nothing to seek. The classic senior failure is running heavy analytics on the prod OLTP replica, where one scan evicts the hot buffer cache, contends for I/O and CPU, and drives replica lag, turning “it’s just a read” into a checkout outage. The fix is architectural: a separate columnar OLAP store fed from OLTP by CDC or ETL, accepting a little data latency in exchange for isolated storage, caches, and failure domains. HTAP systems that try to serve both on one engine exist, but no production HTAP eliminates the resource contention — it can be managed, not avoided, which is why the two-store split remains the default.