Crux Read real pool config, handler code, and a Postgres log line, predict the failure, and pick the highest-leverage fix — sizing math, a leak, an idle-in-transaction hold, and a timeout config.
Your altitude — climbing toward senior
ZeroJuniorMiddleSenior
You are at senior altitude — in orbit
◷ 14 min
Pool incidents are diagnosed in config files, handler code, and the database log — not in the abstract. Read each snippet, predict the failure it produces under load, and choose the fix a senior engineer would make first.
Goal
Practise the loop every pooling incident runs: read the config and the hot path, predict where connections starve, leak, or go stale, and reach for the structural fix before reaching for a bigger number.
Snippet 1 — sizing the pool
# 4 vCPU Postgres, one SSD. Service handles ~2000 req/s.# A panicked engineer set this after a "DB is the bottleneck" alert:spring.datasource.hikari.maximum-pool-size: 200spring.datasource.hikari.minimum-idle: 200
Quiz
Completed
On this 4-vCPU, one-SSD database, what should maximum-pool-size actually be, and why is 200 harmful?
Heads-up A connection is not a unit of parallelism; a core is. Past roughly (cores x 2) + spindles, extra connections only contend, they do not execute.
Heads-up max_connections is a server-wide ceiling shared across all clients, not a per-app target. The throughput-maximising number is single digits here.
Heads-up High request rate is served by the overflow waiting cheaply in the pool's queue, not by more backends. A larger pool thrashes the database — request rate does not change the core count.
Snippet 2 — the handler
async function getUser(id) { const conn = await pool.acquire(); const rows = await conn.query("SELECT * FROM users WHERE id = $1", [id]); conn.release(); // returns connection to the pool return rows[0];}
Quiz
Completed
This handler runs fine in testing but the pool drains to empty over hours in production, fixed only by a restart. What is the defect and the fix?
Heads-up Acquire is designed for concurrent callers; the bug is that the release is skipped on the error path, leaking one connection each time the query throws.
Heads-up Column count does not drain a pool. The leak is the un-guaranteed release on the throw path; that is what ratchets the pool to empty.
Heads-up A bigger pool only delays the same exhaustion because the leak drains at its own rate. The structural fix is try/finally, not more connections.
Snippet 3 — the Postgres log line
LOG: duration: 612000.244 ms state: idle in transactionDETAIL: process 48213 has been idle in transaction for 00:10:12HINT: Sessions idle in transaction hold their connection and any locks.
Quiz
Completed
Several backends show 'idle in transaction' for minutes and the pool keeps exhausting. What is happening, and the right defence?
Heads-up The session is idle, not running a query — the connection sits inside an open transaction doing nothing on the database. An index does not help a connection that is being held across unrelated work.
Heads-up Nothing here indicates memory pressure; 'idle in transaction' means the client opened a transaction and then stopped issuing statements, pinning the backend and its locks.
Heads-up More connections just give more sessions to strand idle-in-transaction, holding more locks. Bound the open transaction's duration instead.
Snippet 4 — the timeout config
spring.datasource.hikari.connection-timeout: 30000 # 30 s, the defaultspring.datasource.hikari.max-lifetime: 28800000 # 8 h# Backing MySQL has wait_timeout = 28800 (8 h, default)
Quiz
Completed
Two problems hide in this config that will bite under a slow dependency and after idle periods. Which fix addresses both correctly?
Heads-up A 30 s acquisition timeout pins web-server threads through a slowdown until the tier starves. And max-lifetime equal to wait_timeout races the server — it must be strictly shorter so the pool always closes first.
Heads-up 50 ms fails benign micro-bursts that would clear in tens of ms. And an 8 h max-lifetime equal to wait_timeout still lets the server kill connections first, handing out stale sockets.
Heads-up The acquisition timeout is the single number that decides fail-fast vs thread-starvation under a slow dependency — it is the most outage-relevant knob here.
Recap
Every pooling incident is read in config, code, and logs: a pool sized at 200 on 4 cores is contention, not concurrency — the (cores x 2) + spindles heuristic puts it near 9; a release() outside try/finally leaks on the error path and drains the pool over hours; ‘idle in transaction’ in the log means a connection hoarded inside an open transaction across unrelated work, holding locks; and a 30 s acquisition timeout plus a max-lifetime that does not undercut the DB’s wait_timeout is a starvation-and-stale-socket trap. Diagnose from the evidence, fix structurally (right size, guaranteed return, tight transactions, bounded waits), then re-measure.