Sharding: free-recall review
Retrieval beats re-reading. For each prompt, say or write a full answer from memory before you open the model answer — the effort of recall is what makes the mechanism stick when you face it in a design review.
Reconstruct the unit’s spine — the four shard-key properties, partitioning vs sharding, co-location, the hot-shard playbook, cross-shard transactions, and online resharding — without looking back at the lessons.
- 01. Name the four properties a shard key must satisfy at once, and why each matters.
- 02. What is the difference between partitioning and sharding, and how do production systems compose them?
- 03. Explain co-location in Citus and what happens to query performance when it is violated.
- 04. Why does hash sharding still produce hot shards, and what is the production playbook to detect and fix them?
- 05. When does a transaction require two-phase commit on a sharded cluster, what does 2PC cost, and what is its dangerous failure mode?
- 06. How does Citus online resharding work, why is the per-shard write pause sub-second, and why is changing the shard key still a months-long project?
If you could reconstruct each answer from memory, you hold the unit’s spine:
- A good shard key is selective, uniform, stable, and present at routing time.
- Partitioning prunes and manages retention on one machine, while sharding adds throughput across many.
- Co-location keeps the 99% tenant-scoped case single-node.
- Hot shards are an expected power-law consequence, answered by tenant isolation and a tiered policy.
- Cross-shard transactions drag in 2PC with its in-doubt risk, so design single-shard.
- Online resharding is cheap (sub-second pauses, logical replication), while a shard-key change is the expensive, lived-with-for-years decision.
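The hot-shard point in the summary can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not how Citus routes or isolates tenants: `shard_for`, `shard_loads`, and `skew` are hypothetical helpers, the workload is an assumed Zipf-like distribution (the tenant ranked k holds about 1,000,000 / k rows), and "isolation" is modelled simply as giving one tenant a dedicated extra shard.

```python
import hashlib


def shard_for(tenant_id: str, num_shards: int) -> int:
    """Map a tenant to a shard with a stable hash (hypothetical router)."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_loads(tenant_rows, num_shards, isolated=()):
    """Return rows per shard; each isolated tenant gets a dedicated extra shard."""
    loads = [0] * (num_shards + len(isolated))
    for tenant, rows in tenant_rows.items():
        if tenant in isolated:
            loads[num_shards + isolated.index(tenant)] += rows
        else:
            loads[shard_for(tenant, num_shards)] += rows
    return loads


def skew(loads):
    """Ratio of the busiest shard to the mean; 1.0 means perfectly even."""
    return max(loads) / (sum(loads) / len(loads))


# Assumed Zipf-like workload: the tenant ranked k holds about 1_000_000 / k rows.
tenants = {f"tenant-{k}": 1_000_000 // k for k in range(1, 1001)}

before = shard_loads(tenants, num_shards=8)
after = shard_loads(tenants, num_shards=8, isolated=("tenant-1",))

# Detection: hashing evens out tenant counts, not row counts, so skew stays above 1.
print("skew before isolation:", round(skew(before), 2))

# Fix: the largest tenant's former shard sheds exactly that tenant's rows.
home = shard_for("tenant-1", 8)
print("home shard rows before:", before[home])
print("home shard rows after: ", after[home])
print("dedicated shard rows:  ", after[8])
```

The hash spreads tenants evenly by count, but one tenant's rows cannot be split by a tenant-keyed hash, so the shard that receives the largest tenant stays above the mean no matter how good the hash is. In this model, isolation does not shrink that tenant; it stops the tenant's neighbours from sharing its shard, and the dedicated shard can then be placed and sized on its own.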