Databases DB · 07 · 01

Why sharding exists: the single-Postgres ceiling

Every Postgres instance has hard CPU, storage, and throughput ceilings; sharding is the only path past them — but it multiplies every operational task by the shard count.

DB Junior ◷ 8 min

Level

FoundationsJuniorMiddleSenior

Already know this unit? Take a 1-minute quick check →

Otto’s Postgres is running at 95% CPU, the working set has grown past what fits in RAM, and writes are saturating the single primary. Read replicas help for reads, but the write bottleneck is the machine itself. There is no bigger instance left to buy.

The single-Postgres ceiling

By the end of this lesson you will know exactly where each ceiling sits, why vertical scaling stops helping, and what the hot-shard failure mode looks like before it hits you in production.

A single Postgres instance has hard limits:

Working set vs RAM: once the hot portion of your data no longer fits in shared_buffers plus OS page cache, every query starts hitting disk. Cache-hit ratio (blks_hit / (blks_hit + blks_read)) dropping below 95–99% is the leading signal.
Write throughput: a single primary can sustain roughly 10–50k transactions per second depending on hardware. Above that, WAL generation dominates CPU and the primary becomes the bottleneck.
Storage: practical ceiling around 1–10 TB of working set in RAM on the largest available machines.

Vertical scaling (more RAM, faster NVMe) buys time — often years. The right order is: fix slow queries first, add indexes, add read replicas, vertically scale. Sharding is the last resort, not the first.

What sharding does

Sharding takes one logical dataset (all your orders, all your users) and splits it across N physical Postgres instances. Each row lives on exactly one shard, determined by a shard key — a column or combination of columns (typically tenant_id for B2B SaaS, user_id for B2C).

Single Postgres	Sharded (4 instances)
All rows on one machine	Rows split by shard key across 4 machines
One CPU/IO ceiling	~4× the CPU/IO capacity
Simple operations (1 backup, 1 migration)	4× operations (4 backups, migration runs 4×)
Cross-table joins: free	Cross-shard joins: expensive

The win is linear-ish scale-out: 10 shards give roughly 5–8× the throughput (overhead from coordination reduces the theoretical 10×). The cost is that every operational task — backups, migrations, upgrades, VACUUM — must run on every shard.

The hot-shard failure mode

The most important failure mode to design against from day one is the hot shard: one shard receives far more traffic than the others because the shard key was poorly chosen, or because one tenant grew 1000× larger than the median.

Example: a SaaS shards orders by customer_id. An enterprise customer signs and their orders are 5× all other customers combined. Their shard hits 95% CPU while the other shards sit at 10%. The cluster effectively runs at single-shard capacity for the dominant customer.

One skewed tenant pins shard 0 at 95% while three shards idle at 10% — the cluster's usable throughput collapses to a single shard's. That imbalance, not raw capacity, is why shard-key choice dominates the design.

The fix — moving that tenant to a dedicated shard — is covered in lesson 05. The lesson here is: shard-key selection is the single most consequential design decision in a sharded system, and bad choices are lived with for years.

The library metaphor

Sharding is the difference between one library with a million books (one building, one lookup desk, find anything by walking around) and ten libraries with 100k books each (ten buildings, you need a directory of “which library has this book”). The single library is simpler. The ten-library setup needs a directory (the shard map) and a rule for which building to go to before you set off.

Books that get checked out a million times a year fill one library while the others sit empty — that is the hot-shard failure.

▸Why this works

Why not just use a bigger cloud instance? Vertical scaling is almost always cheaper than sharding for the first few years. The decision point is when you have measured that the largest available instance is still insufficient at peak — usually 10 TB working set, 50k+ QPS sustained write load, or multi-year projections that cross those thresholds. Sharding is a one-way door: once your schema and app logic are written for a sharded world, reversing it is a multi-month project.

Quiz

What does a shard key determine?

Quiz

A B2B SaaS shards 'orders' by customer_id. One enterprise customer generates 5× the traffic of all others combined. What is the result?

Order the steps

Order the scaling strategies from cheapest to most expensive (operational complexity), before reaching for sharding:

1 Fix slow queries: add missing indexes, rewrite inefficient SQL
2 Add read replicas to offload read-heavy traffic from the primary
3 Add caching layers (Redis, in-process) for hot read paths
4 Vertical scaling: upgrade to a larger instance (more RAM, faster NVMe)
5 Shard: split the dataset across multiple Postgres instances by a shard key

Each row lives on exactly one shard, chosen by its shard key — roughly N× the capacity, but every backup, migration, and upgrade now runs N times.

Recall before you leave

01
What are the three hard ceilings of a single Postgres instance that sharding addresses?
02
In one sentence: what is the hot-shard failure mode and what causes it?
03
Why is the operational cost of sharding described as 'N×'?

Recap

A single Postgres instance has hard ceilings on working set (RAM), write throughput, and storage that vertical scaling can only delay, not eliminate. Sharding splits the dataset across N physical Postgres instances by a shard key, giving roughly N× the capacity at the cost of N× the operational complexity — every migration, backup, and upgrade runs once per shard. The central failure mode is the hot shard: one shard saturating because the key was poorly chosen or a single tenant outgrew the rest. Senior engineers exhaust indexing, caching, read replicas, and vertical scaling before sharding — and when they do shard, they pick the key deliberately and design the hot-shard response before the first incident. Now when you see a Postgres instance trending toward its resource ceiling, you have the ladder: indexes → caching → replicas → vertical scale → shard — and you know which rung to climb before reaching for the last one.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 6 done

Connected lessons

builds on

What a schema migration is and why it replaces ad-hoc DDLjunior

unlocks

deepens into

appears again in166

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.