Pooling at scale: many instances, one database, and PgBouncer
A single service with a pool of 20 is healthy. Then it scales: autoscaling runs 50 instances under load, each with its own pool of 20. That is 1,000 connections demanded from a Postgres configured with the default max_connections = 100. The database starts rejecting connections with FATAL: sorry, too many clients already, and the rejection hits every instance, so the whole fleet degrades at once. Nothing is leaking and no single pool is misconfigured. The arithmetic simply overflowed: each instance sized its pool in isolation, but they all draw from one shared, hard limit — and that is the problem distributed pooling exists to solve.
The fan-in problem
A client-side pool bounds connections from one instance. It knows nothing about the other instances pointing at the same database. The constraint that actually matters is global:
(number of instances) × (pool size per instance) ≤ max_connections − headroom
Here headroom is the slice of connections reserved for admin sessions, replication, and migration tools. Postgres defaults to max_connections = 100, and every connection is a real backend process consuming a few megabytes of memory. So you cannot simply raise max_connections to 1,000: the server would burn its resources on per-process memory and scheduling, which is the same contention from the sizing lesson, now at the scale of the database itself. In a horizontally scaled or autoscaling fleet, the per-instance pool must therefore be tiny (sometimes 2–5 connections), and even then a scale-up event can blow the budget. This is the wall that pushes you to a different architecture.
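The global constraint is simple enough to check in code. The following is a minimal sketch. It uses hypothetical helper names `fleet_demand` and `max_pool_per_instance`, and it assumes 10 reserved connections of headroom, which is an illustrative number and not a Postgres default.

```python
# Fleet-wide connection budget: a sketch of the fan-in arithmetic.
# Assumption: `reserved` is headroom for admin, replication and migration
# tools; the value 10 used below is illustrative, not a Postgres default.

def fleet_demand(instances: int, pool_size: int) -> int:
    """Total connections the fleet opens if every pool fills up."""
    return instances * pool_size


def max_pool_per_instance(max_connections: int, reserved: int, instances: int) -> int:
    """Largest per-instance pool that keeps the whole fleet within budget."""
    available = max_connections - reserved
    if available <= 0 or instances <= 0:
        return 0
    return available // instances  # round down: rounding up would overshoot


if __name__ == "__main__":
    # The scenario from the text: 50 instances x 20 against the default 100.
    print(fleet_demand(50, 20))                # 1000
    print(max_pool_per_instance(100, 10, 50))  # 1
    print(max_pool_per_instance(100, 10, 20))  # 4
    # A pool sized for 20 instances no longer fits after scaling to 50.
    print(fleet_demand(50, 4) <= 100 - 10)     # False
```

The last line is the scale-up trap in miniature. A pool of 4 is safe at 20 instances. At 50 instances the same setting demands 200 connections against a budget of 90.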
PgBouncer: a multiplexer in the middle
A connection pooler like PgBouncer sits between the applications and Postgres. The apps connect to PgBouncer, which is cheap because a PgBouncer client connection is lightweight rather than a backend process. PgBouncer in turn maintains a small pool of real connections to Postgres and lends them out as needed. It is a fan-in multiplexer: thousands of client connections funnel into a few dozen server connections, a multiplexing factor often quoted at around 25–50× or more. The fleet's 1,000 demanded connections become, say, 25 actual backends, and Postgres is comfortable again.
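The fan-in can be modelled as a toy asyncio simulation. Many cheap client sessions share a small queue of backends, and each session borrows a backend only for the length of one transaction. This is a sketch under illustrative assumptions: backends are plain integers, a transaction is a 1 ms sleep, and `Multiplexer` and `simulate` are hypothetical names. It is not how PgBouncer is implemented internally.

```python
import asyncio

# A toy fan-in multiplexer in the spirit of transaction pooling.
# Assumptions: backends are integers, a "transaction" is a short sleep,
# and the client/backend counts are illustrative only.

class Multiplexer:
    def __init__(self, backends: int) -> None:
        self.idle = asyncio.Queue()           # real backends not lent out
        for backend_id in range(backends):
            self.idle.put_nowait(backend_id)
        self.clients = 0                      # client connections open now
        self.peak_clients = 0
        self.busy = 0                         # real backends lent out now
        self.peak_busy = 0
        self.transactions = 0

    async def transaction(self) -> None:
        backend_id = await self.idle.get()    # wait if every backend is busy
        self.busy += 1
        self.peak_busy = max(self.peak_busy, self.busy)
        try:
            await asyncio.sleep(0.001)        # the transaction itself
            self.transactions += 1
        finally:
            self.busy -= 1
            self.idle.put_nowait(backend_id)  # back to the pool at commit

    async def client_session(self, transactions: int) -> None:
        self.clients += 1                     # a cheap client connection opens
        self.peak_clients = max(self.peak_clients, self.clients)
        try:
            for _ in range(transactions):
                await self.transaction()
                await asyncio.sleep(0.002)    # idle, but still connected
        finally:
            self.clients -= 1


async def simulate(clients: int, backends: int, per_client: int = 3) -> Multiplexer:
    mux = Multiplexer(backends)
    await asyncio.gather(*(mux.client_session(per_client) for _ in range(clients)))
    return mux


if __name__ == "__main__":
    mux = asyncio.run(simulate(clients=1000, backends=25))
    print(mux.peak_clients, mux.peak_busy, mux.transactions)
```

With 1,000 clients and 25 backends, all 1,000 client sessions are open at the same time, yet the number of busy backends never exceeds 25. A client that finds every backend busy waits in the queue instead of being rejected.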
Transaction pooling buys the ratio — and breaks session state
PgBouncer offers pooling modes, and the choice is the whole tradeoff:
- Session pooling. A client holds a real server connection for its entire client session. Safe and transparent — everything works — but the multiplexing ratio is poor, because a real backend is tied up for as long as a client stays connected. Little better than no pooler for connection count.
- Transaction pooling. A real server connection is assigned only for the duration of a transaction, then returned to the pool the instant the transaction commits or rolls back. This is what delivers the huge ratio, because connections are held for milliseconds, not minutes. The cost is that anything relying on session state across transactions breaks. Server-side prepared statements, SET session variables, LISTEN/NOTIFY, session-level advisory locks, and session-level temporary tables no longer work reliably, because the next transaction may land on a different backend. Drivers must run in a mode that avoids server-side prepared statements, and application code must not assume session continuity. PgBouncer 1.21 and later can track protocol-level prepared statements when max_prepared_statements is enabled, but SQL-level PREPARE still breaks.
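A toy model makes the breakage concrete. This sketch uses hypothetical `Backend` and `Pooler` classes and treats each backend's session settings as a plain dict. It runs a `SET search_path` in one transaction and reads the value back in the next. The pooler hands out idle backends in FIFO order so the effect is deterministic. PgBouncer's default actually reuses the most recently released server connection, which is one reason this class of bug can stay hidden under light load.

```python
from collections import deque
from typing import Optional

# Toy model: why session state breaks under transaction pooling.
# Assumptions: a Backend's session state is a plain dict, and the pooler
# hands out idle backends in FIFO order (real poolers may choose differently).

class Backend:
    def __init__(self, name: str) -> None:
        self.name = name
        self.settings = {}               # stands in for SET session variables


class Pooler:
    def __init__(self, mode: str, backends: int) -> None:
        if mode not in ("session", "transaction"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.idle = deque(Backend(f"backend-{i}") for i in range(backends))
        self.pinned = {}                 # client -> backend, session mode only

    def acquire(self, client: str) -> Backend:
        if self.mode == "session":
            if client not in self.pinned:
                self.pinned[client] = self.idle.popleft()
            return self.pinned[client]   # same backend for the whole session
        return self.idle.popleft()       # any idle backend, per transaction

    def release(self, client: str, backend: Backend) -> None:
        if self.mode == "transaction":
            self.idle.append(backend)    # back in the pool at commit
        # Session mode: the backend stays pinned until the client disconnects.


def set_then_show(pooler: Pooler, client: str = "app-1") -> Optional[str]:
    """Run SET in one transaction, then read the setting in the next."""
    first = pooler.acquire(client)
    first.settings["search_path"] = "tenant_42"
    pooler.release(client, first)

    second = pooler.acquire(client)
    value = second.settings.get("search_path")
    pooler.release(client, second)
    return value


if __name__ == "__main__":
    print(set_then_show(Pooler("session", backends=2)))      # tenant_42
    print(set_then_show(Pooler("transaction", backends=2)))  # None
    print(set_then_show(Pooler("transaction", backends=1)))  # tenant_42, by luck
```

The single-backend case passes only because the client happens to get the same backend back. Meanwhile the stale `tenant_42` setting is still sitting on backend-0 for whichever client borrows it next.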
Why this works
Why does transaction pooling deliver such a large multiplexing ratio while session pooling barely helps? Because the ratio is governed by how long each client occupies a real backend.

In session pooling a client owns a backend for as long as it stays connected, often minutes or hours of mostly idle keep-alive. The number of real backends therefore has to be close to the number of concurrently connected clients, which is exactly the count you were trying to shrink.

In transaction pooling the backend is occupied only while a transaction is actually executing, typically a few milliseconds. It is then handed to another client while the first one remains connected. At any instant only a small fraction of connected clients are inside an active transaction, so a few dozen backends can serve thousands of connections. The ratio is essentially the inverse of the duty cycle, which is the fraction of time a client spends transacting rather than idle. For example, a client that is inside a transaction 2% of the time needs, on average, one-fiftieth of a backend.

The same mechanism is why session state breaks. A prepared statement or a SET lives on one specific backend, but transaction pooling releases that backend at the end of every transaction, so the client's next transaction can land on any backend in the pool. You cannot have both maximal multiplexing and persistent per-session server state. They are two sides of the same coin, because multiplexing works precisely by not letting any client keep a backend.
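The duty-cycle argument is Little's law (L = λW) in another form: busy backends ≈ clients × (transaction time ÷ cycle time). Here is a sketch with hypothetical function names. It assumes every client repeats an identical cycle of one transaction followed by idle time, and it uses integer milliseconds to keep the arithmetic exact.

```python
# Duty-cycle estimate of how many backends a pooler keeps busy.
# Assumption: every client repeats the same cycle of one transaction
# (busy_ms) followed by idle time, with cycle_ms covering both.

def backends_needed(clients: int, busy_ms: int, cycle_ms: int) -> int:
    """Average number of backends busy at once, rounded up.

    By Little's law, busy backends = arrival rate x time held
    = (clients / cycle_ms) x busy_ms = clients x duty cycle.
    """
    if not 0 < busy_ms <= cycle_ms:
        raise ValueError("need 0 < busy_ms <= cycle_ms")
    return -(-clients * busy_ms // cycle_ms)  # integer ceiling division


def multiplexing_ratio(busy_ms: int, cycle_ms: int) -> float:
    """Clients served per backend: the inverse of the duty cycle."""
    return cycle_ms / busy_ms


if __name__ == "__main__":
    # Transaction pooling: 5 ms of work every 200 ms.
    print(backends_needed(1000, busy_ms=5, cycle_ms=200))    # 25
    print(multiplexing_ratio(5, 200))                        # 40.0
    # Session pooling: the backend is held for the whole cycle.
    print(backends_needed(1000, busy_ms=200, cycle_ms=200))  # 1000
```

Suppose 1,000 clients each run a 5 ms transaction every 200 ms. They keep about 25 backends busy on average, a 40× ratio. Session pooling pins a backend for the whole cycle, so it needs all 1,000. Real traffic is bursty, so treat the result as an average to add headroom to, not as a final pool size.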
Serverless makes this acute
The fan-in problem reaches its extreme with serverless functions. Each concurrent invocation can run in its own process wanting its own connection, and invocations scale to hundreds in a burst. Traditional pooling assumes a long-lived process that amortises a pool over many requests, but a function that lives for 200 ms cannot do that. The answer has the same shape as before. An external pooler (PgBouncer, or a managed equivalent such as Amazon RDS Proxy) absorbs the burst of short-lived clients into a stable set of real backends, and each function gets an aggressively small connection limit. The principle never changes: the database's connection count is a hard, shared, expensive resource, and every layer that fans into it must be bounded.
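The serverless case can be sketched the same way. A burst of short-lived invocations either opens one real backend per invocation, or borrows from a proxy that keeps a fixed set of backends open. All classes and sizes here are illustrative assumptions: `FakePostgres`, `Proxy`, a 300-invocation burst, and a 20-backend proxy. The proxy stands in for PgBouncer or a managed proxy and is not a model of either.

```python
import asyncio
from typing import Optional, Tuple

# Toy model of a serverless burst: direct connections vs. an external proxy.
# Assumptions: FakePostgres only counts backends and rejects past its limit;
# a "query" is a 1 ms sleep; each invocation needs exactly one connection.

class TooManyClients(Exception):
    pass


class FakePostgres:
    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections
        self.open = 0
        self.peak = 0

    def connect(self) -> None:
        if self.open >= self.max_connections:
            raise TooManyClients("FATAL: sorry, too many clients already")
        self.open += 1
        self.peak = max(self.peak, self.open)

    def close(self) -> None:
        self.open -= 1


class Proxy:
    """Opens a fixed set of real backends once and lends them out."""

    def __init__(self, db: FakePostgres, size: int) -> None:
        self.idle = asyncio.Queue()
        for backend_id in range(size):
            db.connect()                     # opened once, kept open
            self.idle.put_nowait(backend_id)

    async def query(self) -> None:
        backend_id = await self.idle.get()   # wait for a backend, don't fail
        try:
            await asyncio.sleep(0.001)
        finally:
            self.idle.put_nowait(backend_id)


async def direct_query(db: FakePostgres) -> None:
    db.connect()                             # each invocation opens its own backend
    try:
        await asyncio.sleep(0.001)
    finally:
        db.close()


async def burst(invocations: int, max_connections: int,
                proxy_size: Optional[int] = None) -> Tuple[int, int, int]:
    """Run a burst; return (succeeded, rejected, peak open backends)."""
    db = FakePostgres(max_connections)
    proxy = Proxy(db, proxy_size) if proxy_size else None

    async def invocation() -> bool:
        try:
            if proxy is not None:
                await proxy.query()
            else:
                await direct_query(db)
            return True
        except TooManyClients:
            return False

    results = await asyncio.gather(*(invocation() for _ in range(invocations)))
    succeeded = sum(results)
    return succeeded, invocations - succeeded, db.peak


if __name__ == "__main__":
    print(asyncio.run(burst(300, max_connections=100)))
    print(asyncio.run(burst(300, max_connections=100, proxy_size=20)))
```

Going direct, the burst hits the limit and many invocations are rejected with the same error the fleet saw. Through the proxy, every invocation succeeds and Postgres never sees more than 20 backends. The price is that invocations queue for a free backend.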
| Approach | Real Postgres backends | Client capacity | Caveat |
|---|---|---|---|
| Per-instance pools only | instances × pool size | Limited by max_connections | Overflows on scale-up |
| PgBouncer session mode | ≈ concurrent clients | Low multiplexing | Safe, transparent, weak ratio |
| PgBouncer transaction mode | tens | Thousands (25–50×+) | No prepared stmts / SET / LISTEN / advisory locks |
| Serverless + data proxy | stable small set | Burst of short-lived fns | Per-function limit must be tiny |
The solved case is one pool against one database. The hard case is fan-in, where N instances each size a pool in isolation and collide against a single max_connections. For example, 50 instances × pool 20 = 1,000 against Postgres's default of 100 is rejected fleet-wide with “FATAL: sorry, too many clients already”. Raising max_connections does not help, because every connection is a real backend process competing for memory and scheduling.

A connection pooler like PgBouncer multiplexes thousands of lightweight client connections into a few dozen real backends. The big ratio comes specifically from transaction pooling, which occupies a backend only for a transaction's milliseconds instead of a whole session. That same mechanism is its cost: because the next transaction may run on a different backend, session-spanning state breaks. This includes server-side prepared statements, SET, LISTEN/NOTIFY, and advisory locks. Session pooling preserves that state but barely multiplexes. Serverless drives fan-in to its extreme and forces an external pooler plus tiny per-function limits.

This closes the pooling unit: the database connection is a hard, shared, expensive resource that must be bounded at every layer. The next unit turns from holding a resource to surviving a failed call. It covers idempotency and retries, where safely repeating a request becomes the foundation of resilience.