
Cross-protocol N+1: HTTP fan-out and Redis MGET

The N+1 shape appears in HTTP microservice fan-out, Redis key lookups, and gRPC streaming — the fix family is the same: collect, batch, send once.



A REST endpoint assembles a user profile by calling 8 downstream microservices — one for preferences, one for posts, one for notifications, one for billing, and four more. Each call takes 30 ms. Total latency: 240 ms. No single call is slow. The problem is that they are all serial.

The shape appears in every protocol

The N+1 pattern is not a database-only problem. Anywhere a program makes multiple small round-trips where one larger operation would suffice, the same cost multiplier applies.

The per-round-trip overhead differs by protocol and distance: Postgres on localhost ~0.5 ms, Redis same-host ~0.1 ms, HTTP intra-DC ~2 ms, HTTP cross-region ~50 ms. But the math is the same: N serial calls cost N × per-call overhead, and once N is large that product dominates the wall-clock time of the request.
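To make the multiplier concrete, here is a minimal sketch of the serial cost model using the approximate figures quoted above. The function name `serialOverheadMs` and the choice of 100 calls are illustrative assumptions, not measurements.

```javascript
// Approximate per-round-trip costs quoted in the text, in milliseconds.
const rttMs = {
  "Postgres localhost": 0.5,
  "Redis same-host": 0.1,
  "HTTP intra-DC": 2,
  "HTTP cross-region": 50,
};

// Serial cost model: N round-trips paid one after another.
function serialOverheadMs(calls, perCallMs) {
  return calls * perCallMs;
}

// Print the overhead of 100 serial calls for each protocol.
for (const [name, ms] of Object.entries(rttMs)) {
  console.log(`${name}: 100 serial calls cost about ${serialOverheadMs(100, ms)} ms`);
}
```

The same function reproduces the opening example: `serialOverheadMs(8, 30)` gives the 240 ms of the serial profile endpoint.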

HTTP fan-out: call services in parallel

A profile aggregator calling 8 services serially:

// Serial — pays 8 × RTT in sequence:
const user = await userService.get(userId);
const posts = await postsService.get(userId);
const notifs = await notifService.get(userId);
// ... 5 more calls
// Total: ~240 ms if each call is 30 ms

// Parallel — wall-clock = max(latencies):
const [user, posts, notifs, ...rest] = await Promise.all([
  userService.get(userId),
  postsService.get(userId),
  notifService.get(userId),
  // ... 5 more
]);
// Total: ~35 ms (slowest call + small coordination overhead)

Promise.all (Node/JS), errgroup.Group with Wait (Go), CompletableFuture.allOf (Java), asyncio.gather (Python): the pattern is identical across runtimes.

The wall-clock changes from sum(latencies) to max(latencies). For 8 calls at 30 ms each, that is 240 ms serial versus about 35 ms parallel (the slowest call plus coordination overhead): roughly a 7× improvement from one structural change.
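The sum-versus-max difference can be observed locally without any real services. The sketch below simulates eight downstream calls with timers; `fakeCall` and the service names are hypothetical stand-ins for real HTTP clients, and the measured times will vary slightly by machine.

```javascript
// Simulated downstream call: resolves with `name` after `ms` milliseconds.
// A stand-in for a real HTTP client call (hypothetical, for illustration).
function fakeCall(name, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(name), ms));
}

// Eight hypothetical services at 30 ms each, matching the numbers in the text.
const serviceNames = Array.from({ length: 8 }, (_, i) => `service-${i + 1}`);

// Serial: each await blocks the next call from starting.
async function fetchSerial() {
  const started = Date.now();
  const results = [];
  for (const name of serviceNames) {
    results.push(await fakeCall(name, 30));
  }
  return { results, elapsedMs: Date.now() - started };
}

// Parallel: every call is in flight before the single await.
async function fetchParallel() {
  const started = Date.now();
  const results = await Promise.all(
    serviceNames.map((name) => fakeCall(name, 30))
  );
  return { results, elapsedMs: Date.now() - started };
}
```

Calling `fetchSerial()` and then `fetchParallel()` and comparing `elapsedMs` should show the serial version taking several times longer, while both return the results in the same order.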

Protocol	N+1 pattern	Batch fix
SQL / ORM	Lazy load per row	JOIN / IN / preload / DataLoader
HTTP microservices	Serial service calls	Promise.all / errgroup / gather
Redis	GET in a loop	MGET / pipeline
gRPC	Unary call per row	Batch RPC / server streaming
File I/O	open/read/close per file	io_uring batched submission
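Every row in the table follows the same three steps: collect the keys, send one request, distribute the results. A minimal protocol-neutral sketch is below; `loadByIds` and the `fetchMany` batch function are hypothetical names, and the sketch assumes the batch endpoint returns objects carrying an `id` field.

```javascript
// Collect IDs, send one batched request, then hand results back in input order.
// `fetchMany` is a hypothetical batch function: (ids) => Promise<Array<{ id, ... }>>.
async function loadByIds(ids, fetchMany) {
  const uniqueIds = [...new Set(ids)];           // collect and dedupe
  const rows = await fetchMany(uniqueIds);       // one round-trip
  const byId = new Map(rows.map((row) => [row.id, row]));
  return ids.map((id) => byId.get(id) ?? null);  // distribute; null for misses
}
```

Returning results in the caller's original order, with `null` for anything the backend did not return, keeps the batched version a drop-in replacement for a per-item loop.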

Redis: MGET instead of GET in a loop

A cache layer that fetches 100 items one by one:

# 100 round-trips — GET in a loop:
items = keys.map { |k| redis.get(k) }

# One round-trip — MGET:
items = redis.mget(*keys)

# Or pipeline: queue the commands on the block argument, send them together,
# and read all replies at once:
results = redis.pipelined { |pipe| keys.each { |k| pipe.get(k) } }

Redis RTT on the same host is typically 0.1–0.5 ms. A loop of 100 GETs costs 10–50 ms. One MGET costs 0.5–2 ms. The fix is a single command.

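Pipelining covers the cases MGET cannot: mixed commands (GET alongside HGETALL or EXISTS, for example) or commands that vary per key. It sends all commands at once and receives all responses at once, so it does not help when each result decides what to fetch next; a dependent lookup like that still needs one round-trip per step, or a server-side script.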
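MGET returns values positionally, with a nil entry for every key that does not exist, so a cache layer usually needs to pair keys with values and split hits from misses. The sketch below shows that step in JavaScript. `FakeRedis` is an in-memory stand-in written for this example, not a real client; its `mget(keys)` method is an assumption that mirrors the MGET command, and the method name or argument shape may differ in your client library.

```javascript
// Minimal in-memory stand-in for a Redis client that counts round-trips.
// Hypothetical: real clients expose MGET under their own method names.
class FakeRedis {
  constructor(data) {
    this.data = new Map(Object.entries(data));
    this.roundTrips = 0;
  }
  async get(key) {
    this.roundTrips += 1;
    return this.data.get(key) ?? null;
  }
  async mget(keys) {
    this.roundTrips += 1; // one command, one round-trip
    return keys.map((key) => this.data.get(key) ?? null);
  }
}

// Fetch many keys in one round-trip and report which ones missed.
async function getMany(client, keys) {
  const values = await client.mget(keys);
  const hits = new Map();
  const misses = [];
  keys.forEach((key, i) => {
    if (values[i] === null) misses.push(key);
    else hits.set(key, values[i]);
  });
  return { hits, misses };
}
```

The `misses` list is what a read-through cache would then load from the backing store, ideally with one batched query rather than another loop.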

Service call dependencies — DAG dispatch

Some service calls depend on others. You cannot fully parallelize a dependency chain:

// Sequential dependency: posts need userId first
const userId = await authService.resolveToken(token);
// Then these can all run in parallel:
const [posts, notifs, billing] = await Promise.all([
  postsService.get(userId),
  notifService.get(userId),
  billingService.get(userId),
]);

The structure is a DAG (directed acyclic graph). Services with no dependencies start immediately; services that depend on earlier results wait only for their direct parents. Most fan-out APIs have shallow DAGs (1–2 dependency levels). Fully serial call chains are usually accidental and can be unwound by tracing the actual dependency relationships.
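One way to express "wait only for direct parents" in JavaScript is to chain each call off the promise it depends on, rather than awaiting whole levels. The sketch below adds a hypothetical second level (comments for the fetched posts) to the example above; `loadProfile` and every method on the `services` object are assumed names, and each is expected to be an async function.

```javascript
// Each call starts as soon as its direct parents resolve.
// `services` is a hypothetical object of async functions supplied by the caller.
async function loadProfile(token, services) {
  // Level 0: no dependencies.
  const userIdP = services.resolveToken(token);

  // Level 1: each depends only on userId, so all three start together.
  const postsP = userIdP.then((userId) => services.getPosts(userId));
  const notifsP = userIdP.then((userId) => services.getNotifs(userId));
  const billingP = userIdP.then((userId) => services.getBilling(userId));

  // Level 2: depends on posts only; it does not wait for notifs or billing.
  const commentsP = postsP.then((posts) =>
    services.getComments(posts.map((post) => post.id))
  );

  // Join everything once; a failure in any branch rejects the whole load.
  const [posts, notifs, billing, comments] = await Promise.all([
    postsP, notifsP, billingP, commentsP,
  ]);
  return { posts, notifs, billing, comments };
}
```

Compared with awaiting `Promise.all` per level, this shape lets the comments request start while a slow billing call is still in flight, so total latency follows the longest dependency path rather than the sum of the slowest call at each level.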

Why this works

The LinkedIn 2023 feed-aggregator incident: the service was calling 8 downstream microservices serially, ~60 ms each. p99 was 480 ms. After parallelising with errgroup, p99 dropped to ~80 ms — the slowest single call plus coordination overhead. This is the same class of fix as adding .includes to an ORM query, applied one protocol level up.

Governance: preventing serial fan-out from returning

Static analysis can catch the pattern before it ships:

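JavaScript/Node: a lint rule rejecting await inside a loop, such as ESLint's no-await-in-loop. The rule flags every await in a loop, not only service calls, so genuinely sequential loops need an explicit exemption.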
Code review checklist item: “does this function call a remote service N times in a loop? If yes, flag it.”
Observability: per-service dashboard panel showing “fan-out factor” (downstream calls per inbound request). Alert when it grows.
Load-test assertions: trace assertions in load tests can fail PRs that increase fan-out beyond a threshold.
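A load-test assertion on fan-out can be as small as counting outbound calls per request and failing when the count exceeds a budget. The sketch below assumes trace data has already been exported as a flat list of spans; the `{ traceId, kind }` shape and the function names are hypothetical simplifications, not a specific tracing library's API.

```javascript
// Count downstream calls per inbound request from a flat list of spans.
// The span shape ({ traceId, kind }) is a hypothetical simplification.
function fanOutByTrace(spans) {
  const counts = new Map();
  for (const span of spans) {
    if (span.kind !== "client") continue; // count only outbound calls
    counts.set(span.traceId, (counts.get(span.traceId) ?? 0) + 1);
  }
  return counts;
}

// Throw if any request made more downstream calls than the budget allows.
function assertMaxFanOut(spans, maxCalls) {
  for (const [traceId, count] of fanOutByTrace(spans)) {
    if (count > maxCalls) {
      throw new Error(
        `trace ${traceId}: ${count} downstream calls exceeds limit ${maxCalls}`
      );
    }
  }
}
```

Run against traces captured during a load test, `assertMaxFanOut(spans, 8)` would fail the build as soon as a change pushes any request past eight downstream calls. It checks the count only; whether those calls ran serially or in parallel still needs a look at span timing.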

Quiz

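A REST endpoint calls 8 downstream microservices serially, each taking 30 ms. Total latency is about 240 ms. What is the most direct structural fix?

Dispatch the 8 calls in parallel with Promise.all or the runtime's equivalent. Wall-clock drops from sum(latencies), about 240 ms, to max(latencies) plus coordination overhead, about 35 ms.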

Quiz

A loop calls redis.get(key) for each of 100 items. What is the Redis-native single-trip fix?

100 GET key calls = 100 round-trips (~10–50 ms). One MGET batches every key into a single round-trip (~0.5–2 ms). Same shape as SQL IN.

Recall before you leave

01
Explain why Promise.all reduces HTTP fan-out latency and describe the wall-clock change.
02
What is the Redis equivalent of SQL eager loading, and when would you use pipelining instead?

Recap

N+1 is a protocol-agnostic pattern: serial HTTP microservice calls, Redis GET in a loop, and gRPC unary call per row all pay round-trip overhead N times. For HTTP fan-out, Promise.all and its equivalents change serial sum(latencies) into parallel max(latencies). For Redis, MGET fetches multiple keys in one command. For gRPC, server streaming or batch RPC replaces per-row unary calls. The fix family is always the same: identify serial round-trips, collect the IDs or calls, send once, distribute results. Now when you see an await inside a loop making remote calls — to any service, any cache, any database — you know what to ask: can these be batched or fired in parallel? Most of the time, they can.
