awesome-everything RU
↑ Back to the climb

Databases

Sharding: provoke and tame a hot shard

Crux Hands-on project — stand up a Citus cluster, provoke a hot shard on a power-law tenant, isolate it online, and prove the fix with before/after skew and latency numbers.
Your altitude — climbing toward senior
ZeroJuniorMiddleSenior
You are at senior altitude — in orbit
◷ 240 min

Reading about hot shards is not the same as watching one shard climb to 94% CPU while the rest idle, then cutting the offending tenant onto its own shard with a sub-second pause. Build a small Citus cluster, drive it into the canonical sharding failure, and apply the unit’s playbook until the skew flattens — with evidence at every step.

Goal

Turn the unit’s mental model into a reproducible operational loop: design a co-located tenant-sharded schema, load a power-law tenant distribution, detect the hot shard from skew and per-tenant metrics, isolate the outlier online, and verify the fix with before/after numbers.

Project
0 of 7
Objective

Stand up a multi-worker Citus cluster with a co-located tenant_id-sharded schema, deliberately create a hot shard with a power-law tenant load, then detect and isolate the outlier tenant online — proving the skew and latency improvement with measurements, not estimates.

Requirements
Acceptance criteria
  • A before/after table: max/median shard-size skew ratio, hot-worker CPU %, whale P99, and small-tenant P99 — all measured under the same load, not estimated.
  • EXPLAIN output (or query plans) proving the co-located join is single-worker and the non-co-located join fans out to all workers.
  • Evidence the isolation was online: the measured write-pause is sub-second and the cluster served traffic throughout (no maintenance window).
  • A one-paragraph write-up naming why isolate_tenant_to_new_shard fixed the skew where citus_rebalance_start() would not have, and the leading indicators (skew ratio, per-tenant share) you would alert on to pre-empt the next incident.
Senior stretch
  • Add a tiered isolation policy as a script/cron: query per-tenant share from pg_stat_statements and auto-isolate any tenant crossing ~5% of cluster query time; demonstrate it firing on a synthetic growth pattern before CPU saturates.
  • Exercise online resharding: add a fourth worker and run citus_rebalance_start(); kill the coordinator mid-move and show the rebalancer resumes idempotently from pg_stat_subscription with no data loss.
  • Force a cross-shard transaction (update a co-located table plus a differently-keyed table in one BEGIN/COMMIT), observe the 2PC path, then crash the coordinator mid-prepare and demonstrate detecting and resolving the stuck entry in pg_prepared_xacts via COMMIT/ROLLBACK PREPARED.
  • Build the schema a second way with Citus 12 schema-based sharding (one schema per tenant) and compare onboarding/offboarding ergonomics and planning overhead against the row-based version.
Recap

This is the loop you will run in every real sharding incident: design for co-location so the common case stays single-node, instrument shard skew and per-tenant load as leading indicators, isolate the outlier tenant online rather than reaching for a rebalance that keeps it bundled, and verify with before/after numbers under identical load. Doing it once on a toy Citus cluster makes the production version — where the whale tenant is a paying enterprise account and the on-call clock is running — muscle memory.

Continue the climb ↑The seven acts: from CREATE TABLE to Citus
shortcuts expand
search
K
prev piece
k
next piece
j
cycle tier
t
this menu
?
sources2
expand
  1. 01
  2. 02

Trademarks belong to their respective owners. Editorial reference only.