awesome-everything RU
↑ Back to the climb

Databases

The seven acts: from CREATE TABLE to Citus

Crux A single product grows from a 1-table MVP to a 1-billion-row Citus cluster through seven moments of database failure — each one names the trigger, the symptom, and the lever that resolves it.
Your altitude — climbing toward senior
ZeroJuniorMiddleSenior
You are at junior altitude — the surface
◷ 12 min

Day 0 at a SaaS startup. The PM wants “search users by email.” One engineer, one Postgres, one table. Three years later: 1 billion rows, six Citus shards, and a runbook the size of a textbook. Between those two points are seven moments when the database broke and the team learned.

The seven pieces of this chapter map one-to-one onto seven acts of product growth. Each act has a trigger, a symptom, and a lever. Skip a lever and the next act costs more to resolve.

Act 1 — Day 0, schema design. “users(email, name, org_id).” Should email be UNIQUE? Should org_id be a foreign key? Should prefs be a side table or JSONB? Decisions: email is UNIQUE NOT NULL CITEXT. org_id is BIGINT REFERENCES orgs(id) ON DELETE CASCADE. prefs is JSONB with a GIN index when search demands it. A surrogate id BIGSERIAL insulates from email changes. Bend the rules only with measured throughput pressure, never as a default.

Act 2 — Week 1, 10K rows, the first slow query. Email-search endpoint p95 climbs from 30 ms to 800 ms. The planner does a sequential scan. CREATE INDEX users_email_idx ON users(email). The leading-column rule of B-tree means WHERE email = ? resolves with two page reads. p95 drops to 4 ms.

Act 3 — Month 1, 100K rows, the planner lies. Half the requests are fast, half take 600 ms. EXPLAIN ANALYZE shows the planner sometimes picks a seq-scan despite the index. Row estimate is off by 30×: stale statistics. ANALYZE users; rebuilds the histograms. The planner picks plans from statistics, not from data; statistics maintenance is the operational discipline.

Act 4 — Month 6, the silent bloat. A nightly report runner holds one transaction open for four hours. VACUUM cannot reclaim dead tuples below the xmin horizon. The users table swells from 200 MB to 80 GB. Fix: kill the long transaction, set idle_in_transaction_session_timeout = 60s, run pg_repack to reclaim disk. Permanent fix: replicate to a read replica for reporting.

Act 5 — Year 1, 1M users + 50 app pods, the connection storm. Each pod opens its own Postgres backend on cold-start. At pod-rollout the cluster sees 1000 concurrent backends; the kernel scheduler thrashes; 4 ms queries take 4 s. PgBouncer in transaction-mode: 100 server-side backends, 10000 client connections multiplexed. Sizing rule: pool_size = active_concurrent_transactions × safety_factor, not max_app_workers.

Act 6 — Year 2, the migration that froze prod. Multi-tenancy day. ALTER TABLE users ADD COLUMN tenant_id BIGINT NOT NULL DEFAULT 0 takes an AccessExclusiveLock and triggers a full table rewrite. Prod freezes for eight minutes. The expand-contract recipe: add nullable column, backfill in batches, add CHECK ... NOT VALID, VALIDATE CONSTRAINT, SET NOT NULL, drop the check.

Act 7 — Year 3, 1B rows, the hot shard. Citus is rolled out, shard key is tenant_id. Tenant Acme accounts for 40% of all queries. Acme’s shard saturates while the others idle. Fix: co-location of related tables on the same shard key; online resharding with citus_rebalance_table_shards.

Seven acts — trigger and lever at a glance
Act 1 — Day 0
Schema: surrogate keys, FKs, constraints
Act 2 — Week 1, 10K rows
Index on the filtered column
Act 3 — Month 1, 100K rows
ANALYZE to refresh statistics
Act 4 — Month 6, 200 MB → 80 GB
Kill long tx, pg_repack, timeout
Act 5 — Year 1, 50 pods
PgBouncer transaction-mode pool
Act 6 — Year 2, multi-tenancy
Expand-contract migration recipe
Act 7 — Year 3, 1B rows
Citus sharding with co-location

Why the order matters.

A database is like a growing city. The schema is zoning. Indexes are the street map. Execution plans are the traffic dispatcher. MVCC is multiple lanes so cars pass without colliding. The connection pool is the parking garage. Migrations are construction crews who must not close every road at once. Sharding is annexing new districts when one city block can’t hold the traffic. A skipped lesson means the city keeps growing, but the wrong layer is overloaded.

Each act unlocks the next only if you nailed the previous one. Skip Act 1 (schema) and Acts 2–7 fight a losing battle against inefficient joins. Skip Act 2 (indexing) and Act 3 planning decisions are made blind. Skip Act 3 (stats) and Act 4 vacuum is the only lever left. Skip Act 4 (bloat) and Act 5 pool threads see table scans get slower. Skip Act 5 (pooling) and Act 6 migrations are starved by connection storms. Skip Act 6 (lock safety) and Act 7 sharding is impossible. The order is a constraint imposed by physics and Postgres internals.

Why this works

The price of a late fix is exponential: Act 1 mistakes cost days to fix; Act 7 mistakes cost months of resharding. Teams that skip early acts pay catastrophic costs later. Teams that over-engineer early acts (sharding a 10 GB dataset) waste resources on problems that do not exist yet. The art is knowing your growth trajectory and timing each act to land just before the prior act’s failure modes become production pain.

Quiz

At 10K rows, the email-search endpoint is suddenly slow. Cheapest first lever?

Quiz

The disk is full but the row count has not changed. Most likely cause?

Quiz

Why is sharding a year-3 lever, not a year-1 lever?

Order the steps

Order the seven scale-tier levers from earliest (Day 0) to latest (Year 3):

  1. 1 Design the relational schema (tables, keys, constraints)
  2. 2 Add the right index for the query
  3. 3 Verify the execution plan uses the index; run ANALYZE
  4. 4 Hunt the long transaction blocking VACUUM
  5. 5 Put a connection pooler in front of Postgres
  6. 6 Migrate schema safely with expand-contract
  7. 7 Shard the largest table across nodes
Recall before you leave
  1. 01
    Name the seven levers in order and give one symptom that signals each tier.
  2. 02
    Why does the order of acts matter — why is skipping Act 2 not 'fine, I'll add indexes later'?
  3. 03
    Trace the city metaphor: match each layer (schema, index, plans, MVCC, pool, migrations, sharding) to its city element.
Recap

A product’s database passes through seven growth tiers, each with a distinct trigger and a single correct lever. Schema decisions made on Day 0 compound into every later act — a surrogate key and proper FK constraints are cheap greenfield choices that become expensive retrofits at scale. Indexes fix queries at 10K rows; missing one at 1B rows is a multi-hour concurrent build under traffic. Stale statistics cause the planner to ignore existing indexes; ANALYZE is the operational discipline. Bloat from a long-held transaction can grow a 200 MB table to 80 GB in days; the lever is killing the long transaction, not adding more disk. Connection storms at pod-rollout time require a connection pooler, not more backends. Schema migrations that take an AccessExclusiveLock queue every query behind them; expand-contract avoids the freeze. Sharding distributes load but multiplies every operational task — it is the last resort, entered deliberately after all prior levers are applied. The order is a constraint imposed by Postgres internals: skip one act and every subsequent act costs exponentially more.

Connected lessons
appears again in258
Continue the climb ↑Acts 1–3 in depth: schema, indexes, and planner statistics
shortcuts expand
search
K
prev piece
k
next piece
j
cycle tier
t
this menu
?
sources6
expand
  1. 01
  2. 02
  3. 03
  4. 04
  5. 05
  6. 06

Trademarks belong to their respective owners. Editorial reference only.