

Cross-protocol N+1: HTTP fan-out and Redis MGET

Crux: The N+1 shape appears in HTTP microservice fan-out, Redis key lookups, and per-row gRPC unary calls. The fix family is the same: collect, batch, send once.

A REST endpoint assembles a user profile by calling 8 downstream microservices — one for preferences, one for posts, one for notifications, one for billing, and four more. Each call takes 30 ms. Total latency: 240 ms. No single call is slow. The problem is that they are all serial.

The shape appears in every protocol

The N+1 pattern is not a database-only problem. Anywhere a program makes multiple small round-trips where one larger operation would suffice, the same cost multiplier applies.

The per-round-trip overhead differs by protocol and distance: Postgres on localhost ~0.5 ms, Redis same-host ~0.1 ms, HTTP intra-DC ~2 ms, HTTP cross-region ~50 ms. But the math is the same: serial wall-clock time ≈ N calls × per-call overhead, so the round-trips dominate latency even when every individual call is fast.
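
To make the multiplier concrete, the arithmetic can be sketched as below. The overheads are the approximate figures quoted above; N = 100 is an arbitrary illustration, not a measurement, and `serialCostMs` is a name made up for this sketch.

```javascript
// Approximate per-round-trip overheads quoted in the text, in milliseconds.
const rttMs = {
  "Postgres (localhost)": 0.5,
  "Redis (same host)": 0.1,
  "HTTP (intra-DC)": 2,
  "HTTP (cross-region)": 50,
};

// Serial cost of N round-trips: every call pays the overhead in turn.
function serialCostMs(n, perCallMs) {
  return n * perCallMs;
}

const N = 100; // illustrative only
for (const [protocol, perCall] of Object.entries(rttMs)) {
  console.log(`${protocol}: ${N} serial calls ≈ ${serialCostMs(N, perCall)} ms`);
}
```

The same function reproduces the opening example: 8 calls at 30 ms each come to 240 ms.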

HTTP fan-out: call services in parallel

A profile aggregator calling 8 services serially:

// Serial — pays 8 × RTT in sequence:
const user = await userService.get(userId);
const posts = await postsService.get(userId);
const notifs = await notifService.get(userId);
// ... 5 more calls
// Total: ~240 ms if each call is 30 ms

// Parallel: wall-clock = max(latencies)
const [user, posts, notifs, ...rest] = await Promise.all([
  userService.get(userId),
  postsService.get(userId),
  notifService.get(userId),
  // ... 5 more
]);
// Total: ~35 ms (slowest call + small coordination overhead)

Promise.all (Node/JS), errgroup.Group with Wait (Go), CompletableFuture.allOf (Java), and asyncio.gather (Python) all express the same pattern: start every call, then wait for all of them.

The wall-clock changes from sum(latencies) to max(latencies).

For 8 calls at 30 ms each: 240 ms serial vs 35 ms parallel — a 7× improvement with one structural change.
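
The sum-versus-max difference can be observed with a small simulation. This is a sketch only: `fakeCall` is a hypothetical stand-in that uses a timer instead of a network call, and the service names and the 30 ms latency are made up for illustration.

```javascript
// Hypothetical stand-in for a downstream service call: resolves after `ms`.
function fakeCall(name, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(name), ms));
}

const services = ["user", "posts", "notifs", "billing"]; // made-up names
const LATENCY_MS = 30;

// Serial: each await blocks the next call, so the latencies add up.
async function serial() {
  const start = Date.now();
  const results = [];
  for (const s of services) {
    results.push(await fakeCall(s, LATENCY_MS));
  }
  return { results, elapsedMs: Date.now() - start };
}

// Parallel: every call starts before any is awaited, so elapsed time
// is roughly that of the slowest call.
async function parallel() {
  const start = Date.now();
  const results = await Promise.all(services.map((s) => fakeCall(s, LATENCY_MS)));
  return { results, elapsedMs: Date.now() - start };
}

async function main() {
  const s = await serial();
  const p = await parallel();
  console.log(`serial ≈ ${s.elapsedMs} ms, parallel ≈ ${p.elapsedMs} ms`);
}
main();
```

Note that Promise.all rejects as soon as any one call rejects. If a partial profile is acceptable, Promise.allSettled may be a better fit.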

| Protocol | N+1 pattern | Batch fix |
|---|---|---|
| SQL / ORM | Lazy load per row | JOIN / IN / preload / DataLoader |
| HTTP microservices | Serial service calls | Promise.all / errgroup / gather |
| Redis | GET in a loop | MGET / pipeline |
| gRPC | Unary call per row | Batch RPC / server streaming |
| File I/O | open/read/close per file | io_uring batched submission |
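
The "collect, batch, send once" idea behind most of these fixes can be sketched generically. The following is a minimal DataLoader-style batcher, not the DataLoader library itself. `createBatcher` and `batchFn` are hypothetical names, and `batchFn` is assumed to take an array of keys and resolve to an array of values in the same order.

```javascript
// Minimal DataLoader-style batcher: collect keys during one tick, send once.
// `batchFn` (hypothetical): (keys[]) => Promise<values[]>, same order as keys.
function createBatcher(batchFn) {
  let pending = []; // queued { key, resolve, reject } entries

  function flush() {
    const batch = pending;
    pending = [];
    Promise.resolve(batch.map((p) => p.key))
      .then(batchFn)
      .then(
        (values) => batch.forEach((p, i) => p.resolve(values[i])),
        (err) => batch.forEach((p) => p.reject(err)),
      );
  }

  return function load(key) {
    return new Promise((resolve, reject) => {
      // Schedule a single flush when the first key of a batch arrives.
      if (pending.length === 0) queueMicrotask(flush);
      pending.push({ key, resolve, reject });
    });
  };
}

// Usage: three load() calls in the same tick become one batch call.
let roundTrips = 0;
const loadUser = createBatcher(async (ids) => {
  roundTrips += 1; // stands in for one remote request
  return ids.map((id) => ({ id, name: `user-${id}` }));
});

Promise.all([1, 2, 3].map((id) => loadUser(id))).then((users) => {
  console.log(users.length, "users in", roundTrips, "round-trip(s)");
});
```

Callers keep the per-item API while the transport sees one request, which is why the same shape works for SQL, HTTP batch endpoints, and batch RPCs.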

Redis: MGET instead of GET in a loop

A cache layer that fetches 100 items one by one:

# 100 round-trips — GET in a loop:
items = keys.map { |k| redis.get(k) }

# One round-trip — MGET:
items = redis.mget(*keys)

# Or pipeline when the commands differ per item.
# Queue commands on the yielded pipeline object, not on the client:
results = redis.pipelined { |pipe| keys.each { |k| pipe.get(k) } }

Redis RTT on the same host is typically 0.1–0.5 ms. A loop of 100 GETs costs 10–50 ms. One MGET costs 0.5–2 ms. The fix is a single command.

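When the commands differ per item (for example, GET for some keys and other commands for the rest), use pipelining instead of MGET: send all commands at once, receive all responses at once. Pipelining does not help when you must inspect one reply before deciding what to fetch next, because every command is sent before any reply is read. That case needs a dependent second round-trip or a server-side script.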
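
The round-trip difference can be counted with an in-memory stand-in. This sketch is in JavaScript to match the other added examples. `FakeRedis` is not a real client library, and its `get` and `mget` methods are hypothetical stand-ins for whatever your client exposes. The one behaviour it borrows from Redis is that MGET returns results in key order, with nil (here `null`) for missing keys.

```javascript
// In-memory stand-in for a Redis client (NOT a real client library).
// Each method call counts as one network round-trip.
class FakeRedis {
  constructor(data) {
    this.data = new Map(Object.entries(data));
    this.roundTrips = 0;
  }
  async get(key) {
    this.roundTrips += 1;
    return this.data.get(key) ?? null;
  }
  async mget(keys) {
    this.roundTrips += 1; // one command, one round-trip, many keys
    return keys.map((k) => this.data.get(k) ?? null);
  }
}

// N+1 shape: one round-trip per key.
async function fetchOneByOne(redis, keys) {
  const items = [];
  for (const k of keys) items.push(await redis.get(k));
  return items;
}

// Batched shape: a single round-trip, results in key order,
// null for keys that do not exist.
async function fetchBatched(redis, keys) {
  return redis.mget(keys);
}
```

Both functions return the same array, so the batched version can usually replace the loop without touching the code that consumes the results.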

Service call dependencies — DAG dispatch

Some service calls depend on others. You cannot fully parallelize a dependency chain:

// Sequential dependency: posts need userId first
const userId = await authService.resolveToken(token);
// Then these can all run in parallel:
const [posts, notifs, billing] = await Promise.all([
  postsService.get(userId),
  notifService.get(userId),
  billingService.get(userId),
]);

The structure is a DAG (directed acyclic graph). Services with no dependencies start immediately; services that depend on earlier results wait only for their direct parents. Most fan-out APIs have shallow DAGs (1–2 dependency levels). Fully serial call chains are usually accidental and can be unwound by tracing the actual dependency relationships.
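
One way to express "wait only for direct parents" is a small DAG runner. This is a sketch under stated assumptions: `runDag` and the `{ deps, run }` task shape are hypothetical, the graph is assumed to be acyclic (cycles are not detected), and the task bodies in the usage example are stubs rather than real service clients.

```javascript
// Run a DAG of async tasks. Each task starts as soon as its direct parents finish.
// `tasks` maps a name to { deps: [names], run: (depResults) => Promise }.
// Assumes the graph is acyclic; a cycle would recurse forever and is not detected.
function runDag(tasks) {
  const started = new Map(); // name -> Promise of that task's result

  function start(name) {
    if (!started.has(name)) {
      const { deps, run } = tasks[name];
      // Starting the parents first also memoises them, so shared parents run once.
      const promise = Promise.all(deps.map((d) => start(d))).then((results) => {
        const depResults = Object.fromEntries(deps.map((d, i) => [d, results[i]]));
        return run(depResults);
      });
      started.set(name, promise);
    }
    return started.get(name);
  }

  const names = Object.keys(tasks);
  return Promise.all(names.map((n) => start(n))).then((results) =>
    Object.fromEntries(names.map((n, i) => [n, results[i]])),
  );
}

// Usage with stubbed services: userId resolves first, then the rest run in parallel.
runDag({
  userId: { deps: [], run: async () => 42 },
  posts: { deps: ["userId"], run: async ({ userId }) => `posts for ${userId}` },
  notifs: { deps: ["userId"], run: async ({ userId }) => `notifs for ${userId}` },
}).then((profile) => console.log(profile));
```

Declaring dependencies as data makes an accidental serial chain visible: a task with an empty `deps` list has no reason to wait for anything.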

Why this works

In the LinkedIn 2023 feed-aggregator incident, the service was calling 8 downstream microservices serially at ~60 ms each, and p99 was 480 ms. After parallelising with errgroup, p99 dropped to ~80 ms: the slowest single call plus coordination overhead. This is the same class of fix as adding .includes to an ORM query, applied one protocol level up.

Governance: preventing serial fan-out from returning

Several guardrails can catch the pattern, some before it ships and some soon after:

  • JavaScript/Node: lint rule rejecting await inside a loop over service calls (ESLint's no-await-in-loop flags any await in a loop body).
  • Code review checklist item: “does this function call a remote service N times in a loop? If yes, flag it.”
  • Observability: per-service dashboard panel showing “fan-out factor” (downstream calls per inbound request). Alert when it grows.
  • Load-test assertions: trace assertions in load tests can fail PRs that increase fan-out beyond a threshold.
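
The fan-out-factor and load-test ideas above can be sketched as a per-request call counter with a budget check. All names here (`countCalls`, `assertFanOut`, the `counter` object) are hypothetical, and the wrapped client is assumed to be a plain object whose methods make remote calls. A real setup would more likely derive the count from tracing spans.

```javascript
// Wrap a service client so every method call increments a per-request counter.
// `client` (hypothetical shape): any object whose methods make remote calls.
function countCalls(client, counter) {
  return new Proxy(client, {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver);
      if (typeof value !== "function") return value;
      return (...args) => {
        counter.calls += 1;
        return value.apply(target, args);
      };
    },
  });
}

// Fail loudly when one inbound request triggers more downstream calls than allowed.
function assertFanOut(counter, maxCalls) {
  if (counter.calls > maxCalls) {
    throw new Error(`fan-out ${counter.calls} exceeds budget ${maxCalls}`);
  }
}
```

In a test, create one counter per simulated inbound request, run the handler against wrapped clients, then call `assertFanOut` with the agreed budget.
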
Quiz

A REST endpoint calls 8 downstream microservices serially, each taking 30 ms. Total p99 is 240 ms. What is the most direct structural fix?

Quiz

A loop calls redis.get(key) for each of 100 items. What is the Redis-native single-trip fix?

Recall before you leave

  1. Explain why Promise.all reduces HTTP fan-out latency and describe the wall-clock change.
  2. What is the Redis equivalent of SQL eager loading, and when would you use pipelining instead?

Recap

N+1 is a protocol-agnostic pattern: serial HTTP microservice calls, Redis GET in a loop, and gRPC unary call per row all pay round-trip overhead N times. For HTTP fan-out, Promise.all and its equivalents change serial sum(latencies) into parallel max(latencies). For Redis, MGET fetches multiple keys in one command. For gRPC, server streaming or batch RPC replaces per-row unary calls. The fix family is always the same: identify serial round-trips, collect the IDs or calls, send once, distribute results. The protocol changes; the shape and fix family do not.
