Databases
Partitioning vs sharding: same word, two different things
A team adds PARTITION BY RANGE (created_at) to their events table and calls it “sharding the database.” A month later they wonder why write throughput has not improved. Partitioning and sharding are often conflated — they solve different problems on different scales.
Partitioning: one Postgres, table split for pruning
Declarative table partitioning (introduced in PostgreSQL 10) splits a single table into named sub-tables called partitions, all living on the same Postgres instance.
-- One Postgres, one machine, many partitions
CREATE TABLE events (
id BIGINT,
tenant_id INT,
created_at TIMESTAMPTZ,
payload JSONB
) PARTITION BY RANGE (created_at);
CREATE TABLE events_2026_01 PARTITION OF events
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE events_2026_02 PARTITION OF events
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');Postgres gains two things from this:
- Partition pruning: a query with
WHERE created_at >= '2026-02-01'only scans the relevant partitions, not the whole table. - Instant retention DROP: deleting a month of data is
DROP TABLE events_2026_01— instantaneous, no long DELETE, no dead-tuple bloat, no massive VACUUM.
The critical limitation: partitioning does not increase throughput beyond one machine. All writes still go to the same Postgres primary; all reads share the same CPU and RAM.
Sharding: many Postgres instances, routed by key
Sharding distributes rows across N physically separate Postgres instances. Each instance holds a subset of the data. Writes and reads for a given shard key go to that shard’s Postgres — parallel, independent IO and CPU.
| Dimension | Partitioning | Sharding |
|---|---|---|
| Postgres instances | One | Many (N) |
| Increases write throughput? | No | Yes (~N×) |
| Primary benefit | Partition pruning, fast DROP for retention | Scale past one machine’s ceiling |
| Cross-table joins | Free (all on one instance) | Expensive if cross-shard; free if co-located |
| Operational multiplier | Low (one instance still) | N× per-shard operations |
| Available since | PostgreSQL 10 (declarative) | Via Citus extension or application-level routing |
How they compose in production
Real production systems typically use both — partition within each shard:
Shard 0: Postgres instance holding tenant_id 1–500
└── events table: PARTITION BY RANGE (created_at)
├── events_2026_01 (Jan rows for tenants 1-500)
├── events_2026_02 (Feb rows for tenants 1-500)
└── ...
Shard 1: Postgres instance holding tenant_id 501–1000
└── events table: same month partitions
├── events_2026_01
└── ...The shard key (tenant_id) routes traffic to the right Postgres instance. Within that instance, the time partition means: query WHERE tenant_id = 42 AND created_at >= '2026-02-01' prunes to just the February partition. Retention is DROP PARTITION events_2026_01 per shard.
The senior intuition: partition for what you would do on a single machine anyway (time-series retention, query pruning for large tables); shard only when single-Postgres throughput and storage limits are measured and proven binding.
Why this works
Why does Postgres support up to ~1–2k partitions in OLTP workloads but ~10k in OLAP? The partition metadata is loaded into the planner’s working memory on every query — large numbers of partitions slow plan generation. For OLTP queries that touch one or two partitions, the plan overhead from 10k partitions dominates execution time. OLAP queries that scan many partitions in parallel can amortize the planning cost. The practical limit in OLTP is around 1–2k partitions per table.
A team adds PARTITION BY RANGE (created_at) to their largest table. What problem does this solve?
In a production Citus cluster, what is the typical composition of partitioning and sharding?
- 01What are the two main benefits of declarative table partitioning in Postgres, and what is its key limitation?
- 02A B2B SaaS shards orders by tenant_id across 16 Postgres instances. Each instance also partitions orders by created_at month. What does the query WHERE tenant_id = 42 AND created_at >= '2026-02-01' do?
- 03Why is 'partition for manageability, shard for capacity' the senior heuristic?
Partitioning (PostgreSQL 10+ declarative PARTITION BY) splits one table into sub-tables on a single Postgres instance — it benefits query pruning and makes retention DROP instantaneous, but does not increase throughput past one machine. Sharding distributes rows across N separate Postgres instances by a shard key, giving roughly N× the throughput and storage capacity at the cost of N× the operational complexity. Production systems use both: shard by tenant_id for cross-machine capacity, and partition by time within each shard for pruning and retention. The senior rule: partition freely when the table is large enough to benefit; shard only after measuring and proving that single-Postgres limits are the real bottleneck.
appears again in140
- Why GraphQL gets N+1junior
- DataLoader mechanics: tick-boundary batchingmiddle
- Batch function contracts: ordering, shapes, errorsmiddle
- Federation and lookahead: batching beyond DataLoadermiddle
- Query complexity defences: depth, cost, persisted queriesmiddle
- Senior GraphQL API: scheduling contract, tenant isolation, observabilitysenior
- Why idempotency: making retries safejunior
- Server-side state machine: four states of an idempotency keymiddle
- Outbox and inbox: effectively-once across the dual-write boundarymiddle
- Concurrency and cache architecture for idempotency at scalesenior
- Observability, production failures, and global-scale designsenior
- The event loop: one thread, three queuesjunior
- Tasks, microtasks, and scheduler.yield()middle
- Microtask starvation, Long Tasks, and LoAFsenior
- Node.js event loop: phases, nextTick, and loop lagsenior
- React, Vue, and INP observability in productionsenior
- The render pipeline: six stages from bytes to pixelsjunior
- Stage costs and the renderer process modelmiddle
- Invalidation, dirty bits, and containmiddle
- Compositor layers: promotion, overlap, and GPU memorymiddle
- DevTools flame strip and the frame lifecyclemiddle
- Layout thrash: forced synchronous layoutsenior
- BeginMainFrame, compositor-driven animations, and GPU memorysenior
- Production observability: LoAF, INP, and the full attack surfacesenior
- What V8 is and why performance varies 100×junior
- V8''''s four-tier JIT pipeline and profile-guided tieringmiddle
- Hidden classes, transition trees, and memory layoutmiddle
- Inline caches, IC states, and deoptimizationmiddle
- Orinoco GC: parallel scavenger, concurrent marking, and write barriersmiddle
- TurboFan''''s speculative engine and the deopt-loop trapsenior
- V8 in production: isolates, pointer compression, and real failuressenior
- Service worker lifecycle and cache strategiesmiddle
- Service worker edge cases: version skew, durability, and navigation trapssenior
- What the reconciler does: render vs commitjunior
- The fiber object and the double-buffer treemiddle
- Render phase purity and commit phase sub-stepsmiddle
- Reconciliation: diffing heuristics and the key trapmiddle
- Priority lanes, time-slicing, and useTransitionmiddle
- Bailout, memoisation, and tearingsenior
- React Profiler, the Compiler, and production observabilitysenior
- Rendering strategies: SSG, SSR, ISR, streaming, and hydrationjunior
- SSG, SSR, ISR, streaming, and RSC — how each worksmiddle
- Hydration cost: selective, progressive, islands, resumabilitymiddle
- Hydration mismatch: causes, detection, and the determinism rulesenior
- RSC, per-route strategy, and production observabilitysenior
- Core Web Vitals: what LCP, INP, and CLS measurejunior
- CLS: why layout shifts happen and how to stop themmiddle
- Metric tradeoffs, RUM attribution, and the CI+field loopsenior
- The full picture: URL to LCP to INP as a relay racejunior
- Eight layers traced: from the service worker to the second navigationmiddle
- Five canonical breaks: where production reliably diessenior
- The three-track method: reading traces and building a monitored systemsenior
- What is a cache stampede and why it makes things worsejunior
- Lock and single-flight: bounding concurrent rebuildsmiddle
- XFetch: coordination-free probabilistic early expirationmiddle
- Stale-while-revalidate and CDN request coalescingmiddle
- Detecting stampedes and designing TTL for productionmiddle
- Metastable failure, fencing tokens, and production postmortemssenior
- Raft roles, terms, and why majority quorums prevent split brainjunior
- How Raft replicates a log entry and decides it is safe to commitmiddle
- Raft leader election: timeouts, voting rules, and the four safety propertiesmiddle
- Raft in the real world: partitions, slow disks, and client routingmiddle
- Raft extensions: pre-vote, learners, snapshots, and linearizable readssenior
- Raft in production: membership changes, Multi-Raft, and observabilitysenior
- Where data fetching happens — and why it decides LCPjunior
- Fetch waterfalls — diagnosis and the Promise.all curemiddle
- React Server Components and Suspense streamingmiddle
- Client-side cache: TanStack Query, SWR, and stale-while-revalidatemiddle
- LCP, prefetch, and race conditions in interactive fetchingmiddle
- Senior internals: RSC payload, caching layers, and production failure modessenior
- The three-way handshakejunior
- Sequence numbers and connection statemiddle
- DNS: what it does and why it existsjunior
- The resolver walk: referrals, record types, and gluemiddle
- TTL, caching, and DNS propagationmiddle
- The 1-RTT handshake: key shares and ECDHEmiddle
- Session resumption and 0-RTTmiddle
- WebSocket: the HTTP upgrade handshakejunior
- WebSocket frame format: opcodes, masking, fragmentationmiddle
- WebSocket backpressure: when clients can''''t keep upmiddle
- Reconnection: jittered backoff, thundering herd, message resumptionsenior
- WebSocket at scale: HTTP/2 multiplexing, permessage-deflate, C10Msenior
- WebSocket in production: proxies, security, and distributed architecturesenior
- What reverse proxies dojunior
- Health checks, connection draining, and slow startmiddle
- Session affinity, consistent hashing, and the right fixmiddle
- Retry storms, circuit breakers, and load sheddingsenior
- Resilient LB architecture: anycast, zone-aware routing, and observabilitysenior
- Why QUIC and not TCP+TLSjunior
- Connection IDs and network migrationmiddle
- 0-RTT resumption and packet encryptionsenior
- DDoS: what it is and why it worksjunior
- Amplification attacks and state exhaustionmiddle
- Rate limiting: algorithms and architecturemiddle
- WAFs, firewalls, mTLS, and HSTSmiddle
- DNS cache poisoning and BGP hijackingsenior
- Defense-in-depth architecture and attack economicssenior
- DNS, TCP, TLS in sequence: where the milliseconds gomiddle
- Proxy intercepts and security gates: rate limiters, WAF, mTLSmiddle
- Alternate paths: QUIC 0-RTT, WebSocket upgrade, connection migrationmiddle
- Observability: distributed traces, USE/RED, and samplingsenior
- Resilience: cascading retries, circuit breakers, and error budgetssenior
- What the three signals are: logs, metrics, and tracesjunior
- Why structured logs exist: the diary vs the spreadsheetjunior
- The production log schema: fields every line must carrymiddle
- PII redaction and log injectionsenior
- OTel Logs Data Model and audit logs as a subsystemsenior
- SLI, SLO, and the error budget: reliability by the numbersjunior
- Error budget policy, latency SLOs, and composite journeysmiddle
- Production SLO failures, self-observability, security, and the big picturesenior
- The incident loop: from pager to postmortem to preventionmiddle
- Cache lines, struct layout, and false sharingmiddle
- SIMD, SoA vs AoS, and memory bandwidthmiddle
- Cache-oblivious algorithms, PGO, and production failuressenior
- GC in production: observability, security, edge cases, and fleet governancesenior
- Batching: amortize fixed cost per operationjunior
- The batching window: size and wait timemiddle
- Batching in Kafka and Postgresmiddle
- io_uring and observability of batchingmiddle
- From Nagle to io_uring: evolution of batchingmiddle
- Backpressure, failure isolation, and batch security in productionsenior
- CI enforcement and RUM: making budgets stickmiddle
- V8 JIT pipeline, HTTP priorities, and bundle securitysenior
- The performance loop: discipline, not a projectjunior
- Classify and fix: matching bottleneck families to remediesmiddle
- Observability stack and CI gates: catching regressions before they shipmiddle
- Incident to enforcement: SLO burn to verified fix in 35 minutesmiddle
- Culture, economics, and org-scale performancesenior
- At-most-once, at-least-once, exactly-once: the three delivery contractsjunior
- The three failure legs — where duplicates and losses actually happenmiddle
- Consumer-side dedup: the cheapest path to exactly-once processingmiddle
- Kafka exactly-once semantics: idempotent producer and transactionsmiddle
- SQS visibility timeout, DLQ, and the outbox patternmiddle
- Exactly-once in production: impossibility proof, hybrid patterns, and real incidentssenior
- What OAuth is and why passwords are not the answerjunior
- Authorization code flow with PKCEmiddle
- ID token validation and JWKS cache managementmiddle
- Refresh token rotation and scope-based least privilegemiddle
- Sender-constrained tokens: DPoP and mTLSsenior
- OAuth in production: audience attacks, observability, and real failuressenior