Databases DB · 07 · 02

Shard-key selection: hash, range, list, and directory strategies

A good shard key has four properties — selectivity, uniform distribution, stability, and availability at routing time. Hash, range, list, and directory strategies each trade different failure modes.

DB Middle ◷ 12 min

Level

FoundationsJuniorMiddleSenior

Two teams both shard their Postgres by customer_id. One uses hash sharding; six months later they have no hot shards. The other uses range sharding with the same key; they have a permanent hot tail on the shard holding the highest-ID customers because new signups all go there. Same column, different strategy, opposite outcome.

The four properties of a good shard key

If you get the shard key wrong, no amount of hardware or tuning can save you — the damage is baked into the schema for years. What does “right” look like?

A shard key must satisfy all four of these simultaneously:

Selectivity: 95%+ of your queries filter by it. Queries without the shard key must fan out to every shard — expensive and slow.
Uniform distribution: each shard receives roughly equal data and traffic. Uneven keys produce hot shards.
Stable: the value never changes for a given row. Changing a shard key would require moving the row to a different shard — a problem you cannot solve at update time.
Available at routing time: every API request that touches this data must carry the shard key. If requests arrive without it, you cannot route them without a cross-shard scan.

For multi-tenant B2B SaaS, tenant_id almost always satisfies all four. For B2C, user_id. For geo-partitioned systems, region. Designs that do not have a column meeting all four properties are red flags — split the data model until one emerges. Together these four properties form a filter: when you see a candidate column pass all four, you have found your key; if any one fails, keep looking.

The four strategies

Strategy	How it works	Win	Risk
Hash	shard = hash(key) mod N	Uniform on uniform keys	Hot shard if keys are skewed; changing N moves most rows
Range	Each shard owns a key range (e.g., tenant_id 1-1000)	Natural for time-series; easy to add a shard at the tail	Hot tail: newest records all on one shard
List	Explicit map: key → shard	Maximum control	Does not scale to millions of keys
Directory	External lookup table maps key → shard; move tenants by updating one row	Move individual tenants between shards; Citus tenant isolation uses this	Lookup cost on every request; the shard map is critical infrastructure

Hash sharding is the default for most OLTP systems because it distributes evenly when keys are uniformly distributed. The problem is that real-world tenants are power-law distributed: one customer is 100–1000× the median. Hash still routes that customer to a single shard — a hot shard.

Range sharding is natural for time-series data (shard by month) but creates a hot tail by default: all new writes go to the latest shard while older shards are idle. Production time-series systems typically combine range (by time bucket) with hash (within the bucket) to spread write load.

Directory-based sharding is the most flexible: a lookup table maps tenant_id → connection_string. Moving a tenant means updating one row. This is the foundation of Citus’s tenant isolation primitive and of app-level routing in multi-tenant SaaS. The cost is that the lookup table becomes critical-tier-1 infrastructure — it must be highly available, low-latency, and cached aggressively.

Consistent hashing is a hash variant used mostly in caches (Memcached, Redis Cluster) where adding or removing a shard moves only K/N keys instead of most of them. Some Postgres sharding setups use directory-based consistent hashing for online shard rebalancing.

Each strategy trades one failure mode for another. Hash buys even distribution but loses range friendliness; range buys time-series convenience but creates a hot tail; directory buys per-tenant flexibility but makes the lookup table tier-1 infrastructure. When you see a time-series write pattern with an auto-increment key, that is the “hot tail” warning sign — reach for a composite approach.

There is no free strategy: every win buys exactly one failure mode. Match the strategy to the failure you can tolerate, not the win you want.

The shard-key contract in practice

The key must be in the primary key of every sharded table. If orders is sharded by tenant_id, then (tenant_id, order_id) is the primary key — not just order_id. This guarantees uniqueness across shards and forces every query path to carry tenant_id.

▸Why this works

Why is changing the shard key so expensive? The shard key determines where each row physically lives. Changing it means moving every row to its new shard — which involves reading the old shard, writing to the new shard, deleting from the old shard, and doing this for potentially billions of rows while production traffic continues. Teams have spent months executing shard-key changes. The lesson: choose carefully from the start and validate the key against production access patterns and tenant size distribution before committing.

Quiz

A SaaS has 5000 tenants where the top 5 tenants generate 60% of traffic. What is the best sharding approach?

Quiz

A query searches users by email globally (no tenant_id available). What happens on a tenant_id-sharded cluster?

Order the steps

Order the shard-key evaluation steps before committing to a key for a multi-tenant B2B SaaS:

1 Audit query patterns: list the WHERE columns across the top-100 queries by frequency
2 Validate the key is in 95%+ of queries and available in every API request
3 Measure tenant size distribution on production data — find the top 5% by traffic and data volume
4 Confirm the key is stable (never updated) and non-null on every sharded table
5 Design outlier policy: tenants above X% cluster traffic get dedicated shards
6 Commit the key to an ADR and document the resharding cost if you later need to change it

A good key is selective, stable, and available at routing time — and uniform, so hash(key) mod N lands keys evenly. A skewed key sends one dominant tenant to a single shard: a hot shard.

Recall before you leave

01
Name the four required properties of a good shard key and explain why each matters.
02
What is the main failure mode of range sharding on a time-series key, and how do production systems mitigate it?
03
Why does the shard key need to be part of the primary key on every sharded table?

Recap

A shard key must satisfy four properties at once: selectivity (in 95%+ of queries), uniform distribution (no one shard gets overloaded), stability (never updated), and availability at routing time. Hash sharding distributes evenly on uniform keys but creates hot shards when tenants are skewed; range sharding is natural for time-series but creates a hot tail; directory-based sharding is the most flexible and is used by Citus’s tenant isolation primitive. Validate the key against real production access patterns and tenant size distribution before committing — changing it later is a months-long project of moving every row. Now when you see a new sharding proposal, run it through all four properties before signing off: a key that fails even one of them will betray you at scale.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 7 done

Connected lessons

builds on

Why sharding exists: the single-Postgres ceilingjunior

unlocks

deepens into

appears again in287

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.