

Why sharding exists: the single-Postgres ceiling

Crux: Every Postgres instance has hard CPU, storage, and throughput ceilings. Sharding is the only path past them, but it multiplies every operational task by the shard count.

Otto’s Postgres is running at 95% CPU, the working set has grown past what fits in RAM, and writes are saturating the single primary. Read replicas help for reads, but the write bottleneck is the machine itself. There is no bigger instance left to buy.

The single-Postgres ceiling

A single Postgres instance has hard limits:

  • Working set vs RAM: once the hot portion of your data no longer fits in shared_buffers plus OS page cache, every query starts hitting disk. Cache-hit ratio (blks_hit / (blks_hit + blks_read)) dropping below 95–99% is the leading signal.
  • Write throughput: a single primary can sustain roughly 10–50k transactions per second depending on hardware. Above that, WAL generation dominates CPU and the primary becomes the bottleneck.
  • Storage: the practical ceiling is roughly 1–10 TB of data per instance on the largest available machines, and only part of that can stay cached in RAM.
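
The cache-hit formula from the first bullet can be sketched as a small helper. This is a minimal illustration, assuming the two counters are read from Postgres's pg_stat_database view (which exposes blks_hit and blks_read); the function names and the 0.95 default threshold are hypothetical, with the threshold taken from the low end of the 95–99% range above.

```python
def cache_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Return blks_hit / (blks_hit + blks_read).

    Returns 1.0 when no blocks were requested at all, so an idle
    database is not reported as a cache problem.
    """
    total = blks_hit + blks_read
    if total == 0:
        return 1.0
    return blks_hit / total


def working_set_fits(blks_hit: int, blks_read: int, threshold: float = 0.95) -> bool:
    """True when the hit ratio is at or above the (assumed) threshold."""
    return cache_hit_ratio(blks_hit, blks_read) >= threshold


# Example: 900 buffer hits and 100 reads that missed shared_buffers.
print(cache_hit_ratio(900, 100))
print(working_set_fits(900, 100))
```

Note that blks_read counts reads that missed shared_buffers; some of them may still have been served from the OS page cache, so this ratio is a conservative signal rather than an exact measure of disk reads.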

Vertical scaling (more RAM, faster NVMe) buys time, often years. The right order is: fix slow queries and add missing indexes first, then add read replicas and caching, then scale vertically. Sharding is the last resort, not the first.

What sharding does

Sharding takes one logical dataset (all your orders, all your users) and splits it across N physical Postgres instances. Each row lives on exactly one shard, determined by a shard key — a column or combination of columns (typically tenant_id for B2B SaaS, user_id for B2C).
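
One simple way to turn a shard key into a shard is to hash it and take the remainder modulo the shard count. The sketch below is an illustration under that assumption, not the only placement scheme; shard_for and the sample tenant IDs are hypothetical names.

```python
import hashlib


def shard_for(shard_key: str, num_shards: int) -> int:
    """Map a shard key (e.g. a tenant_id) to a shard index in [0, num_shards).

    Uses SHA-256 instead of Python's built-in hash(), because hash() of a
    string is randomized per process and would route the same key to
    different shards after a restart.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    # The first 8 bytes are plenty for an even spread across a few shards.
    return int.from_bytes(digest[:8], "big") % num_shards


# Every row with the same key lands on the same shard, every time.
for tenant_id in ["tenant-1", "tenant-2", "tenant-3"]:
    print(tenant_id, "->", shard_for(tenant_id, 4))
```

With plain modulo placement, changing the shard count remaps most keys, which is one reason the placement rule deserves as much care as the key itself.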

Single Postgres | Sharded (4 instances)
All rows on one machine | Rows split by shard key across 4 machines
One CPU/IO ceiling | ~4× the CPU/IO capacity
Simple operations (1 backup, 1 migration) | 4× operations (4 backups, migration runs 4×)
Cross-table joins: free | Cross-shard joins: expensive

The win is linear-ish scale-out: 10 shards give roughly 5–8× the throughput (overhead from coordination reduces the theoretical 10×). The cost is that every operational task — backups, migrations, upgrades, VACUUM — must run on every shard.
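
The per-shard operational cost can be sketched as a loop: whatever the task is, it runs once per shard, and each run can fail on its own. This is a minimal illustration; run_on_every_shard, the shard names, and the fake migration are all hypothetical stand-ins for real tooling.

```python
from typing import Callable, Dict, List


def run_on_every_shard(shards: List[str], task: Callable[[str], str]) -> Dict[str, str]:
    """Run one operational task (backup, migration, upgrade) on each shard.

    Failures are recorded per shard instead of aborting the loop, so the
    caller can see which shards finished and which still need attention.
    """
    results: Dict[str, str] = {}
    for shard in shards:
        try:
            results[shard] = task(shard)
        except Exception as exc:
            results[shard] = f"failed: {exc}"
    return results


def fake_migration(shard: str) -> str:
    """Stand-in for a real migration; fails on one shard for illustration."""
    if shard == "shard-2":
        raise RuntimeError("lock timeout")
    return "ok"


shards = ["shard-0", "shard-1", "shard-2", "shard-3"]
print(run_on_every_shard(shards, fake_migration))
```

The failed entry shows the extra burden: after a partial failure, the shards are briefly in different states, a situation a single instance never produces.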

The hot-shard failure mode

The most important failure mode to design against from day one is the hot shard: one shard receives far more traffic than the others because the shard key was poorly chosen, or because one tenant grew 1000× larger than the median.

Example: a SaaS shards orders by customer_id. An enterprise customer signs, and their order volume is 5× that of all other customers combined. Their shard hits 95% CPU while the other shards sit at 10%. The cluster effectively runs at single-shard capacity for the dominant customer.
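
The imbalance in this example can be put into a single number: the busiest shard's load divided by the mean load across all shards. The sketch below assumes per-shard CPU percentages are already being collected somewhere; hottest_shard and the shard names are hypothetical.

```python
from statistics import mean
from typing import Dict, Tuple


def hottest_shard(cpu_by_shard: Dict[str, float]) -> Tuple[str, float]:
    """Return (shard name, its load divided by the mean load of all shards).

    A ratio near 1.0 means load is evenly spread; a ratio approaching the
    shard count means one shard is doing nearly all the work.
    """
    if not cpu_by_shard:
        raise ValueError("need at least one shard")
    name = max(cpu_by_shard, key=lambda shard: cpu_by_shard[shard])
    average = mean(cpu_by_shard.values())
    if average == 0:
        return name, 0.0  # idle cluster, nothing is hot
    return name, cpu_by_shard[name] / average


# The numbers from the example above: one shard at 95%, three at 10%.
load = {"shard-0": 10.0, "shard-1": 95.0, "shard-2": 10.0, "shard-3": 10.0}
print(hottest_shard(load))
```

At what ratio a shard counts as hot is a judgment call that this sketch deliberately leaves to the caller.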

The fix — moving that tenant to a dedicated shard — is covered in lesson 05. The lesson here is: shard-key selection is the single most consequential design decision in a sharded system, and bad choices are lived with for years.

The library metaphor

Sharding is the difference between one library with a million books (one building, one lookup desk, find anything by walking around) and ten libraries with 100k books each (ten buildings, you need a directory of “which library has this book”). The single library is simpler. The ten-library setup needs a directory (the shard map) and a rule for which building to go to before you set off.

If the books that get checked out a million times a year all sit in one library, that building is overrun while the others stand nearly empty. That is the hot-shard failure.

Why this works

Why not just use a bigger cloud instance? Vertical scaling is almost always cheaper than sharding for the first few years. The decision point comes when you have measured that the largest available instance is still insufficient at peak: usually a working set around 10 TB, a sustained write load of 50k+ transactions per second, or multi-year projections that cross those thresholds. Sharding is a one-way door: once your schema and app logic are written for a sharded world, reversing it is a multi-month project.

Quiz

What does a shard key determine?

Quiz

A B2B SaaS shards 'orders' by customer_id. One enterprise customer generates 5× the traffic of all others combined. What is the result?

Order the steps

Order the scaling strategies from cheapest to most expensive (operational complexity), before reaching for sharding:

  1. Fix slow queries: add missing indexes, rewrite inefficient SQL
  2. Add read replicas to offload read-heavy traffic from the primary
  3. Add caching layers (Redis, in-process) for hot read paths
  4. Vertical scaling: upgrade to a larger instance (more RAM, faster NVMe)
  5. Shard: split the dataset across multiple Postgres instances by a shard key

Recall before you leave

  1. What are the three hard ceilings of a single Postgres instance that sharding addresses?
  2. In one sentence: what is the hot-shard failure mode and what causes it?
  3. Why is the operational cost of sharding described as 'N×'?

Recap

A single Postgres instance has hard ceilings on working set (RAM), write throughput, and storage that vertical scaling can only delay, not eliminate. Sharding splits the dataset across N physical Postgres instances by a shard key, giving roughly N× the capacity at the cost of N× the operational complexity — every migration, backup, and upgrade runs once per shard. The central failure mode is the hot shard: one shard saturating because the key was poorly chosen or a single tenant outgrew the rest. Senior engineers exhaust indexing, caching, read replicas, and vertical scaling before sharding — and when they do shard, they pick the key deliberately and design the hot-shard response before the first incident.

