Backend Architecture BE · 04 · 06

Pooling at scale: many instances, one database, and PgBouncer

One pool against one database is solved. The hard version is N application instances each with their own pool, all colliding against a single max_connections — where the math overflows and a transaction-level multiplexer becomes mandatory, with sharp tradeoffs.

BE Senior ◷ 17 min

A single service with a pool of 20 is healthy. Then it scales: autoscaling runs 50 instances under load, each with its own pool of 20. That is 1,000 connections demanded from a Postgres configured with the default max_connections = 100. The database starts rejecting connections with FATAL: sorry, too many clients already, and the rejection hits every instance, so the whole fleet degrades at once. Nothing is leaking and no single pool is misconfigured. The arithmetic simply overflowed: each instance sized its pool in isolation, but they all draw from one shared, hard limit — and that is the problem distributed pooling exists to solve.

The fan-in problem

A client-side pool bounds connections from one instance. It knows nothing about the other instances pointing at the same database. The constraint that actually matters is global:

(number of instances) × (pool size per instance) ≤ database max_connections

with headroom subtracted for admin, replication, and migration tools. With Postgres defaulting to max_connections = 100, and each connection being a real backend process consuming a few megabytes, you cannot simply raise max_connections to 1,000 — the database would spend itself on per-process memory and scheduling, the exact contention from the sizing lesson, now at the server’s own scale. So in a horizontally scaled or autoscaling fleet, the per-instance pool size must be tiny (sometimes 2–5), and even then a scale-up event can blow the budget. This is the wall that pushes you to a different architecture.
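
The budget above can be checked with a few lines of arithmetic. This is a minimal sketch, not a sizing tool: the function names and the `reserved=10` headroom figure are assumptions made for illustration, while the 50 instances, pool of 20, and default of 100 come from the scenario in this lesson.

```python
def fleet_demand(instances: int, pool_size: int) -> int:
    """Worst-case connections the fleet can open: every pool fully used."""
    return instances * pool_size


def max_pool_size(instances: int, max_connections: int, reserved: int) -> int:
    """Largest per-instance pool that keeps the whole fleet inside the budget.

    `reserved` is headroom for admin, replication, and migration tools.
    """
    usable = max_connections - reserved
    if instances <= 0 or usable <= 0:
        return 0
    return usable // instances


if __name__ == "__main__":
    # reserved=10 is an assumed headroom, not a recommendation.
    print(fleet_demand(50, 20))        # 1000
    print(max_pool_size(50, 100, 10))  # 1
    print(max_pool_size(20, 100, 10))  # 4
```

With 50 instances the safe per-instance pool is a single connection, which is why tiny pools alone stop being workable as the fleet grows.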

PgBouncer: a multiplexer in the middle

A connection pooler like PgBouncer sits between the applications and Postgres. The apps connect to PgBouncer — cheaply, because a PgBouncer client connection is lightweight, not a backend process — and PgBouncer maintains a small pool of real connections to Postgres, lending them out as needed. It is a fan-in multiplexer: thousands of client connections funnel into tens of server connections. Typical ratios are dramatic — PgBouncer can present thousands of client connections while holding only a few dozen real Postgres backends, a multiplexing factor often quoted around 25–50× or more. The fleet’s 1,000 demanded connections become, say, 25 actual backends, and Postgres is comfortable again.
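
The fan-in can be illustrated with a toy simulation. This is only a sketch of the idea, not how PgBouncer is implemented: an `asyncio.Semaphore` stands in for the fixed set of server connections, `asyncio.sleep` stands in for running a transaction, and the name `run_fleet` and its parameters are invented for the example.

```python
import asyncio


async def run_fleet(clients: int, backends: int, txn_seconds: float = 0.001) -> tuple[int, int]:
    """Toy fan-in: many clients share a fixed number of backend slots.

    Returns (completed transactions, peak backends in use at once).
    """
    slots = asyncio.Semaphore(backends)  # stands in for the real server connections
    in_use = 0
    peak = 0
    completed = 0

    async def client() -> None:
        nonlocal in_use, peak, completed
        async with slots:  # borrow a backend for one transaction
            in_use += 1
            peak = max(peak, in_use)
            await asyncio.sleep(txn_seconds)  # pretend to run the transaction
            in_use -= 1
        completed += 1

    await asyncio.gather(*(client() for _ in range(clients)))
    return completed, peak


if __name__ == "__main__":
    done, peak = asyncio.run(run_fleet(clients=1000, backends=25))
    # Every client finishes, and the peak never exceeds the 25 backend slots.
    print(done, peak)
```

Clients that cannot get a slot simply wait their turn, which is the behaviour the database never sees: from its side there are only ever the few backends the pooler holds.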

Transaction pooling buys the ratio — and breaks session state

PgBouncer offers pooling modes, and the choice is the whole tradeoff:

Session pooling. A client holds a real server connection for its entire client session. Safe and transparent — everything works — but the multiplexing ratio is poor, because a real backend is tied up for as long as a client stays connected. Little better than no pooler for connection count.
Transaction pooling. A real server connection is assigned only for the duration of a transaction, then returned to the pool the instant the transaction ends, whether by commit or rollback. This is what delivers the huge ratio, because connections are held for milliseconds, not minutes. The cost: anything that relies on session state across transactions breaks. Server-side prepared statements, SET session variables, LISTEN/NOTIFY, session-level advisory locks, and session-level temporary tables can no longer be relied on, because the next transaction may land on a different backend. Drivers must run in a mode that avoids server-side prepared statements (or the pooler must track them, as PgBouncer 1.21 and later can for protocol-level prepared statements), and application code must not assume session continuity.

Transaction pooling buys the huge ratio precisely by not letting any client keep a backend — the same mechanism that breaks session state across transactions. Session pooling preserves that state but barely reduces the connection count.
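
The loss of session state can be reproduced with a toy model. Everything here is hypothetical: `Backend` and `ToyTransactionPooler` are invented classes, a dictionary stands in for what SET would store, and the strict rotation of backends is chosen only to make the effect deterministic. A real pooler picks whichever server connection is free, so which backend the next transaction gets is not predictable.

```python
from collections import deque


class Backend:
    """Stand-in for one Postgres backend process with its own session state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.settings: dict[str, str] = {}  # what SET would store on this backend


class ToyTransactionPooler:
    """Lends a backend for one transaction, then takes it back (toy model)."""

    def __init__(self, backends: list[Backend]) -> None:
        self._idle = deque(backends)

    def run_transaction(self, work):
        backend = self._idle.popleft()  # assign a backend for this transaction
        try:
            return work(backend)
        finally:
            self._idle.append(backend)  # return it as soon as the transaction ends


pooler = ToyTransactionPooler([Backend("b1"), Backend("b2")])

# Transaction 1: the client sets a session variable. It lands on b1.
pooler.run_transaction(lambda b: b.settings.update(search_path="tenant_42"))

# Transaction 2: the same client reads it back, but is now on b2.
seen = pooler.run_transaction(lambda b: (b.name, b.settings.get("search_path")))
print(seen)  # ('b2', None)
```

The setting was not lost by the database; it is still sitting on b1. The client just is not talking to b1 any more, which is why the failure tends to look intermittent in production.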

Why this works

Why does transaction pooling deliver such a large multiplexing ratio while session pooling barely helps? Because the ratio is governed by how long each client occupies a real backend. In session pooling a client owns a backend for the whole time it stays connected, often minutes or hours of mostly idle keep-alive, so the number of real backends has to be close to the number of concurrently connected clients, which is exactly the count you were trying to shrink. In transaction pooling the backend is occupied only while a transaction is actually open, typically a few milliseconds, then handed to the next waiting client as soon as that transaction ends; since at any instant only a small fraction of connected clients are inside an active transaction, a few dozen backends can serve thousands of connections. The ratio is essentially the inverse of the duty cycle, the fraction of time a client spends inside a transaction rather than idle. That same mechanism is why session state breaks: state like a prepared statement or a SET lives on a specific backend, but transaction pooling gives your next transaction whichever backend happens to be free, which may not be the one holding that state. You cannot have both maximal multiplexing and persistent per-session server state. They are the same coin, because multiplexing works precisely by not letting any client keep a backend.
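
The duty-cycle argument can be turned into a back-of-envelope estimate. This sketch assumes a steady workload and ignores bursts and queueing, so treat the result as a floor rather than a pool size; the workload of 5 transactions per second at 5 ms each is an invented figure chosen to land on the ratio quoted in this lesson. Integer milliseconds are used to keep the arithmetic exact.

```python
def backends_needed(clients: int, txns_per_second: int, txn_ms: int) -> int:
    """Average number of backends busy at once, rounded up.

    Each client is busy for txns_per_second * txn_ms milliseconds out of
    every 1000, which is its duty cycle. No burst headroom is included.
    """
    busy_ms_per_second = min(1000, txns_per_second * txn_ms)
    return -(-(clients * busy_ms_per_second) // 1000)  # ceiling division


if __name__ == "__main__":
    # Transaction pooling: duty cycle is 25 ms per second, i.e. 2.5%.
    n = backends_needed(clients=1000, txns_per_second=5, txn_ms=5)
    print(n, 1000 / n)  # 25 40.0

    # Session pooling behaves like a duty cycle of 100%: one backend per client.
    print(backends_needed(clients=1000, txns_per_second=1, txn_ms=1000))  # 1000
```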

Serverless makes this acute

The fan-in problem reaches its extreme with serverless functions, where each concurrent invocation can be its own process wanting its own connection, and invocations scale to hundreds in a burst. Traditional pooling assumes a long-lived process that amortises a pool over many requests; a function that lives for 200 ms cannot. The answers are the same shape — an external pooler (PgBouncer, or a managed equivalent like a serverless data proxy) that absorbs the burst of short-lived clients into a stable set of real backends, plus aggressively small per-function limits. The principle never changes: the database’s connection count is a hard, shared, expensive resource, and every layer that fans into it must be bounded.
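
One common way to keep the per-function limit tiny is to hold at most one connection per function instance and reuse it while the instance stays warm. The sketch below is an assumption about how that might look, not a specific platform's API: `InstanceConnection`, `handler`, and the injected `connect` callable are all hypothetical, and `connect` is expected to open a connection to the pooler rather than to Postgres directly.

```python
from typing import Any, Callable, Optional


class InstanceConnection:
    """Holds at most one connection for the lifetime of a function instance."""

    def __init__(self, connect: Callable[[], Any]) -> None:
        self._connect = connect  # hypothetical factory, pointed at the pooler
        self._conn: Optional[Any] = None

    def get(self) -> Any:
        """Open the connection on first use, then reuse it on warm invocations."""
        if self._conn is None:
            self._conn = self._connect()
        return self._conn


def handler(event: dict, holder: InstanceConnection) -> str:
    """Hypothetical function entry point: never opens a second connection."""
    conn = holder.get()
    return f"handled {event['id']} on {conn}"


if __name__ == "__main__":
    opened = []

    def fake_connect() -> str:
        # Stand-in for a real driver call; counts how many connections open.
        opened.append(1)
        return f"conn-{len(opened)}"

    holder = InstanceConnection(fake_connect)
    print(handler({"id": 1}, holder))  # handled 1 on conn-1
    print(handler({"id": 2}, holder))  # handled 2 on conn-1
    print(len(opened))                 # 1
```

This bounds each instance at one client connection, so the burst the pooler has to absorb equals the number of concurrent instances rather than a multiple of it.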

Approach	Real Postgres backends	Client capacity	Caveat
Per-instance pools only	instances × pool size	Limited by max_connections	Overflows on scale-up
PgBouncer session mode	≈ concurrent clients	Low multiplexing	Safe, transparent, weak ratio
PgBouncer transaction mode	tens	Thousands (25–50×+)	No prepared stmts / SET / LISTEN / advisory locks
Serverless + data proxy	stable small set	Burst of short-lived fns	Per-function limit must be tiny

Quiz

A service healthy at one instance with a pool of 20 starts throwing 'FATAL: sorry, too many clients already' after autoscaling to 50 instances. No pool leaks. Why?

Why does PgBouncer transaction pooling achieve a far higher multiplexing ratio than session pooling?

What is the main correctness cost of switching to PgBouncer transaction pooling?

50 instances × pool 20 = 1,000 client connections fan into PgBouncer. With transaction pooling PgBouncer holds only ~25 real Postgres backends — a 40× multiplexing ratio — keeping the fleet well under max_connections = 100.

Key takeaway

One pool against one database is solved; the hard version is fan-in: N instances each with their own pool collide against a single max_connections, so 50 instances × pool 20 = 1,000 demanded overflows Postgres’s default 100 and the database rejects fleet-wide with “FATAL: sorry, too many clients already” — and you cannot just raise max_connections, because each connection is a real backend process. A connection pooler like PgBouncer multiplexes thousands of lightweight client connections into a few dozen real backends, often 25–50× or more. The ratio comes from transaction pooling, which holds a real backend only for a transaction’s milliseconds rather than a whole session — but that breaks anything relying on session state across transactions: server-side prepared statements, SET variables, LISTEN/NOTIFY, advisory locks. Session pooling keeps those working but barely multiplexes. Serverless makes the fan-in extreme, demanding an external pooler plus tiny per-function limits, because the database’s connection count is always a hard, shared, expensive resource.

Recall before you leave

01 Why does a horizontally scaled fleet overwhelm a database even when every individual pool is correctly sized?
02 What is PgBouncer and how do its pooling modes trade off?
03 Why are transaction pooling's high ratio and its loss of session state two sides of the same mechanism, and why is serverless the acute case?

Recap

The solved case is one pool against one database; the hard case is fan-in, where N instances each sizing a pool in isolation collide against a single max_connections: 50 instances × pool 20 = 1,000 against Postgres’s default 100, rejected fleet-wide with “FATAL: sorry, too many clients already”. Raising max_connections does not rescue you, because every connection is a real backend process competing for memory and scheduling. A connection pooler like PgBouncer multiplexes thousands of lightweight client connections into a few dozen real backends, and the big ratio comes specifically from transaction pooling, which occupies a backend only for a transaction’s milliseconds instead of a whole session. That is the same coin as its cost: because the next transaction may run on a different backend, session-spanning state breaks (server-side prepared statements, SET, LISTEN/NOTIFY, advisory locks), while session pooling preserves it but barely multiplexes. Serverless drives the fan-in to its extreme and forces an external pooler plus tiny per-function limits. This closes the pooling unit: the database connection is a hard, shared, expensive resource that must be bounded at every layer. Now when you see a fleet throwing “FATAL: sorry, too many clients already” and someone suggests raising max_connections, you know the real fix is arithmetic: shrink per-instance pools and put PgBouncer in transaction mode between them and the database. The next unit turns from holding a resource to surviving a failed call: idempotency and retries, where safely repeating a request becomes the foundation of resilience.
