Backend Architecture BE · 05 · 05

Concurrency and cache architecture for idempotency at scale

Naive SELECT-then-INSERT is not atomic. Postgres ON CONFLICT or advisory locks prevent race conditions. At high throughput, a Redis Cluster hot-path with Postgres cold-path fallback is the production pattern.

BE Senior ◷ 16 min

Level

FoundationsJuniorMiddleSenior

Two requests with the same idempotency key arrive at the server simultaneously — a real scenario when a mobile client fires two retries before the first acknowledgment. A naive SELECT-then-INSERT lets both through. Both charge the customer.

The race condition in naive implementations

If you have already implemented idempotency keys using a simple read-then-write, you may be shipping a double-charge bug that only surfaces under concurrent load. Here is how to close the window.

A common first implementation:

-- Not atomic!
SELECT * FROM idempotency_keys WHERE key = $1;
-- Time passes here. A second request slips in.
INSERT INTO idempotency_keys (key, fingerprint, status, expires_at)
  VALUES ($1, $2, 'in_progress', NOW() + interval '24 hours');

Between the SELECT and the INSERT, a second concurrent request can execute the same SELECT (seeing no row), then also INSERT. Both succeed. Both start processing. Both charge the customer.

Fix 1: Postgres INSERT … ON CONFLICT DO NOTHING RETURNING

INSERT INTO idempotency_keys (key, fingerprint, status, expires_at)
  VALUES ($1, $2, 'in_progress', NOW() + interval '24 hours')
  ON CONFLICT (key) DO NOTHING
  RETURNING *;

If the RETURNING set is empty, this request lost the race. It then re-reads the row:

If status=in_progress → return 409 Conflict.
If status=completed → return the cached response.

The INSERT is atomic at the database level. No race.

Fix 2: Postgres advisory locks

SELECT pg_advisory_xact_lock(hashtext($1));
-- Now safe to SELECT-then-INSERT

pg_advisory_xact_lock holds a session-level exclusive lock for the transaction duration. Only one connection can hold the lock for a given hash at a time.

Tradeoff: hashtext is 64-bit — collisions are possible at ~10⁻¹⁰ probability per 10M keys per day. Acceptable for most workloads; security-critical paths use a full unique index instead. The lock must be held for the entire request including any external API calls (tens to hundreds of milliseconds) — watch pg_locks for pool exhaustion during incidents.

Approach	Atomic?	Collision risk	Notes
SELECT then INSERT	No	N/A	Unsafe — race window
INSERT ON CONFLICT DO NOTHING RETURNING	Yes	None	Recommended for most cases
pg_advisory_xact_lock	Yes	10⁻¹⁰ (hash collision)	Watch pool under slow external calls

Scaling the cache: Redis Cluster

A single Postgres primary bottlenecks at roughly 5–10k writes/sec on commodity hardware. At 50k req/sec, the idempotency write becomes the bottleneck.

Redis Cluster with SETNX:

SETNX idempotency:{key} {fingerprint}:{status} EX {ttl_seconds}

SETNX is atomic set-if-not-exists. Redis Cluster shards by key hash, spreading load across nodes. Each node handles ~50–100k SETNX/sec.

A single Postgres primary saturates near ~10k writes/sec; one Redis node handles ~100k SETNX/sec, so a 50k req/sec workload only fits once the cache shards across nodes.

Durability risk: Redis defaults to async fsync. A crash within milliseconds of a write can lose the entry. For a payment API, this means the key is gone and the next retry is treated as new — potential double charge.

Two-tier cache: the production pattern

Payment APIs that hold legal liability for double charges use a hybrid:

Hot path: Redis — SETNX on the key. If it succeeds (new key), process and write to Postgres asynchronously via the outbox. If it conflicts (existing key), read Redis for the cached response.
Cold path: Postgres — authoritative record. If Redis misses (rare, on crash), fall through to the Postgres table to recover the authoritative response.

Request → Redis SETNX
          ├─ New: process + store in Postgres async + cache in Redis
          ├─ Conflict: return cached response from Redis
          └─ Redis miss: read from Postgres → populate Redis → return

Stripe and Square use this hybrid in production. Pure Redis is acceptable for non-financial workloads (signups, analytics events) where a rare duplicate is logging noise rather than a compliance event.

▸Why this works

Why not DynamoDB for the idempotency cache? DynamoDB ConditionExpression handles atomic insert-or-noop natively and scales horizontally without sharding effort. At 50k/sec the cost is roughly $3k/month at standard pricing, and p99 latency is ~5–10ms. Good if you are already on AWS and want zero operational overhead; expensive and adds latency otherwise. The Redis + Postgres hybrid is cheaper and faster at comparable scale.

Quiz

Two requests with the same idempotency key arrive simultaneously. The naive SELECT-then-INSERT implementation runs. What is the worst-case outcome?

Quiz

At 50k requests/sec, why does Redis Cluster outperform a single Postgres primary for idempotency key lookups?

Quiz

A Redis Cluster node crashes milliseconds after a SETNX write. The key is lost. What is the consequence for a payment API using Redis as the sole idempotency store?

Redis gives throughput (~50–100k SETNX/s per node); Postgres guarantees durability. Neither alone is sufficient for financial idempotency keys.

Recall before you leave

01
Why is SELECT-then-INSERT not safe for idempotency key creation, and what is the correct Postgres fix?
02
Describe the two-tier idempotency cache architecture used by production payment APIs.
03
What is the tradeoff of using pg_advisory_xact_lock instead of ON CONFLICT for idempotency key serialization?

Recap

Concurrent idempotency key creation requires an atomic operation. Postgres INSERT ... ON CONFLICT (key) DO NOTHING RETURNING * is the standard fix — it is a single statement with no race window. A request that finds RETURNING empty lost the race and re-reads the row to return 409 or replay. At throughput above ~10k writes/sec, a Redis Cluster hot-path with SETNX handles the load; a Postgres cold-path fallback provides durability for financial workloads. In-memory-only dedup is never acceptable behind a load balancer. Now when you review an idempotency implementation that uses SELECT-then-INSERT, flag it immediately — the race window is real, and under mobile client retry bursts it fires routinely.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 5 done

Connected lessons

builds on

Server-side state machine: four states of an idempotency keymiddle

unlocks

Observability, production failures, and global-scale designsenior

deepens into

Observability, production failures, and global-scale designsenior

appears again in205

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.

Apply this

Put this lesson to work on a real build.

Mini OAuth 2.0 + PKCE loginImplement the authorization-code + PKCE flow end to end against a real provider, so you understand every redirect and token instead of trusting a library.Distributed rate limiterBuild a token-bucket limiter that holds across many app instances by keeping the counter in Redis, not in process memory.