Databases
Sharding: multiple-choice review
Six questions that cut across the whole unit. Each mirrors a decision you make at the whiteboard or mid-incident — not a definition to recite, but a tradeoff to weigh when one Postgres is no longer enough.
Confirm you can connect shard-key choice, the partitioning-vs-sharding distinction, co-location, the hot-shard failure mode, cross-shard transactions, and online resharding — the synthesis the seven lessons built toward.
A B2B SaaS shards by tenant_id. The top 5 tenants generate 60% of traffic. Pure hash sharding is proposed because it 'distributes evenly'. What is the flaw, and the senior fix?
A team adds `PARTITION BY RANGE (created_at)` to their largest table and reports they have 'sharded the database'. A month later write throughput is unchanged. Why?
On a tenant_id-sharded Citus cluster, an engineer adds `audit_log` distributed by `id` instead of `tenant_id`, then runs `SELECT … FROM orders o JOIN audit_log a ON a.order_id = o.id WHERE o.tenant_id = 42`. What happens?
A skew alert fires: one shard is at 94% CPU, the other 31 average 18%, and pg_stat_statements attributes 62% of that shard's query time to one tenant. What is the correct immediate action?
A refund transaction updates `orders` (tenant-scoped) and a global `ledger` table sharded by `account_id`, so the two rows live on different shards. Citus runs this with two-phase commit. What is the failure mode to monitor?
To add capacity a team runs Citus 11.1+ online rebalancing. Mid-move the coordinator crashes and a teammate panics that the moving shard is corrupted. How do you read this, and what mechanism makes the answer true?
The through-line of the unit is one decision chain: pick a shard key that is selective, uniform, stable, and present at routing time; partition for pruning and retention but shard for throughput; co-locate every tenant-scoped table so 99% of joins stay single-node; expect hot shards on power-law tenants and answer them with tenant isolation, not rebalancing; keep transactions single-shard to dodge 2PC’s in-doubt risk; and lean on logical-replication-based online resharding — while remembering that a shard-key change is a months-long decision lived with for years.