Performance

N+1: diagnose, batch, and gate

Crux: hands-on project. Diagnose and eliminate N+1 across an ORM list page, a GraphQL resolver tree, and a service fan-out, then lock it in with a CI query-count gate.
◷ 240 min

Reading about N+1 is not the same as pulling 300 queries out of one page load. Build a small service that exhibits N+1 in three forms — an ORM list page, a GraphQL resolver tree, and a service fan-out — then drive the query count down and gate it so it can never come back.

Goal

Turn the unit’s model into a reproducible loop: count round-trips per request, pick the fix family by cardinality and lookup origin, prove the count dropped from the query log, then add a CI gate and a DoS guard so the regression cannot ship.
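One way to count round-trips per request, sketched here with Python's standard sqlite3 module so it runs without any setup. The schema, the count_queries helper, and both page functions are hypothetical stand-ins, not part of the starter service; a real ORM would normally expose its own query log or hook for the same purpose.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def count_queries(conn):
    """Collect every SELECT the connection runs inside the with-block."""
    seen = []

    def on_statement(sql):
        # Ignore transaction bookkeeping; count only read round-trips.
        if sql.lstrip().upper().startswith("SELECT"):
            seen.append(sql)

    conn.set_trace_callback(on_statement)
    try:
        yield seen
    finally:
        conn.set_trace_callback(None)


def make_db(n_authors=5):
    """Build a tiny in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)"
    )
    for i in range(1, n_authors + 1):
        conn.execute("INSERT INTO authors VALUES (?, ?)", (i, f"author-{i}"))
        conn.execute(
            "INSERT INTO posts (author_id, title) VALUES (?, ?)", (i, f"post by {i}")
        )
    conn.commit()
    return conn


def list_page_n_plus_1(conn):
    """One query for the parents, then one more per parent."""
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    return [
        (name, conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (aid,)
        ).fetchall())
        for aid, name in authors
    ]


def list_page_batched(conn):
    """One query for the parents, one IN-list query for all children."""
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    ids = [aid for aid, _ in authors]
    by_author = {}
    if ids:
        marks = ",".join("?" * len(ids))
        rows = conn.execute(
            f"SELECT author_id, title FROM posts WHERE author_id IN ({marks}) ORDER BY id",
            ids,
        ).fetchall()
        for aid, title in rows:
            by_author.setdefault(aid, []).append((title,))
    return [(name, by_author.get(aid, [])) for aid, name in authors]


if __name__ == "__main__":
    conn = make_db(5)
    with count_queries(conn) as before:
        list_page_n_plus_1(conn)
    with count_queries(conn) as after:
        list_page_batched(conn)
    print(len(before), "queries before,", len(after), "after")
```

Both functions return the same page; only the number of statements recorded differs, and that number is what goes into the before/after table.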

Project
Objective

Take a small service (your own or the starter shape below) with three deliberate N+1 sites — an ORM list endpoint, a nested GraphQL query, and a serial service/cache fan-out — and bring each one's round-trip count to its structural minimum, proving every step with the query log and before/after numbers.
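For the serial service/cache fan-out, the three shapes can be sketched with asyncio. FakePriceService and its get/get_many methods are hypothetical; whether a real downstream service offers a batch endpoint is an assumption to check. Parallel dispatch keeps N round-trips but overlaps them, while a batch call collapses them into one.

```python
import asyncio


class FakePriceService:
    """Hypothetical downstream service that counts round-trips."""

    def __init__(self, latency=0.01):
        self.latency = latency
        self.calls = 0

    async def get(self, sku):
        self.calls += 1
        await asyncio.sleep(self.latency)  # simulated network time
        return {"sku": sku, "price": len(sku)}

    async def get_many(self, skus):
        self.calls += 1
        await asyncio.sleep(self.latency)
        return [{"sku": s, "price": len(s)} for s in skus]


async def serial(svc, skus):
    """N round-trips, one after another: wall-clock grows with N."""
    return [await svc.get(s) for s in skus]


async def parallel(svc, skus, limit=8):
    """N round-trips in flight together, capped so the fan-out stays bounded."""
    sem = asyncio.Semaphore(limit)

    async def one(sku):
        async with sem:
            return await svc.get(sku)

    return await asyncio.gather(*(one(s) for s in skus))


async def batched(svc, skus):
    """One round-trip for the whole list."""
    return await svc.get_many(skus)


if __name__ == "__main__":
    skus = ["a", "bb", "ccc"]
    for fn in (serial, parallel, batched):
        svc = FakePriceService()
        asyncio.run(fn(svc, skus))
        print(fn.__name__, "round-trips:", svc.calls)
```

The concurrency cap in parallel is a design choice: an unbounded gather can turn one request into a burst that exhausts the downstream connection pool.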

Requirements
Acceptance criteria
  • A before/after table per site: queries per request, p99 latency, and (for the fan-out) wall-clock time, all measured from the query log and a load test, not estimated.
  • The query log after each fix shows the round-trip count at its structural minimum (2–4 for the ORM page, one batched query per entity type for DataLoader, one batched/parallel dispatch for the fan-out).
  • The DataLoader batch function is verified to return results in input-id order with nulls for missing ids, and its cache is request-scoped (no cross-request leakage).
  • The CI gate is demonstrated failing a PR that reintroduces an N+1, then passing once the fix is restored.
  • A one-paragraph write-up naming which fix family was used at each site and why it beat the alternatives for that cardinality and lookup origin.
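The batch-function contract in the criteria above (results in input-id order, nulls for missing ids, request-scoped cache) can be checked with a sketch like the following. It is not the DataLoader library itself: batch_load, RequestLoader, and make_fake_fetch are hypothetical names, and a real DataLoader also coalesces individual load calls made within one tick, which is omitted here.

```python
def batch_load(ids, fetch_rows):
    """Return one result per input id, in input order, None when missing."""
    unique_ids = list(dict.fromkeys(ids))  # de-duplicate, keep first-seen order
    rows = fetch_rows(unique_ids)          # one round-trip for the whole batch
    by_id = {row["id"]: row for row in rows}
    return [by_id.get(i) for i in ids]


class RequestLoader:
    """Create one instance per request so cached rows never cross requests."""

    def __init__(self, fetch_rows):
        self._fetch_rows = fetch_rows
        self._cache = {}

    def load_many(self, ids):
        missing = [i for i in dict.fromkeys(ids) if i not in self._cache]
        if missing:
            loaded = batch_load(missing, self._fetch_rows)
            for i, row in zip(missing, loaded):
                self._cache[i] = row  # None is cached too: missing stays missing
        return [self._cache[i] for i in ids]


def make_fake_fetch():
    """Hypothetical data source: rows come back unordered, unknown ids are skipped."""
    table = {
        1: {"id": 1, "name": "ada"},
        2: {"id": 2, "name": "bob"},
        3: {"id": 3, "name": "cy"},
    }
    calls = []

    def fetch_rows(ids):
        calls.append(list(ids))
        return [table[i] for i in sorted(ids, reverse=True) if i in table]

    return fetch_rows, calls


if __name__ == "__main__":
    fetch, calls = make_fake_fetch()
    print(batch_load([3, 99, 1, 3], fetch))
    print("round-trips:", len(calls))
```

The fake source deliberately returns rows in a different order than requested, since a batch function that relies on database ordering is the usual way this contract breaks.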
Senior stretch
  • Add a DoS-amplification guard to the GraphQL endpoint — depth limit, query-complexity ceiling, and a per-request query budget — and show a crafted deeply-nested query is rejected before it hits the database.
  • Reproduce a connection-pool exhaustion incident: set a small pool, load-test the N+1 version until it returns 503s, then show the fixed version sustaining ~10× the throughput on the same pool, with pool-saturation metrics before/after.
  • Run EXPLAIN ANALYZE on the new IN-list query at 10, 500, and 5000 parent ids and report whether the plan flips (index → bitmap → seq scan); document any size where the fix degrades.
  • Add an APM trace (Tempo/Datadog/Honeycomb or OpenTelemetry) and capture the waterfall before and after — the tall column of short DB spans collapsing into a few — as visual evidence alongside the numbers.
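The DoS-amplification guard from the first stretch item can be sketched independently of any GraphQL library. The example assumes the query has already been parsed into a nested dict of field names (a hypothetical representation); check_query, QueryBudget, and the default limits are illustrative, and production GraphQL servers typically offer validation rules or plugins for the same checks.

```python
class QueryRejected(Exception):
    """Raised before execution, so a rejected query never reaches the database."""


def depth(selection):
    """Nesting depth of a selection; a leaf field has an empty dict as its value."""
    if not selection:
        return 0
    return 1 + max(depth(child) for child in selection.values())


def field_count(selection):
    """Crude complexity score: total number of selected fields."""
    return sum(1 + field_count(child) for child in selection.values())


def check_query(selection, max_depth=4, max_fields=50):
    """Reject a query that is too deep or too wide (limits are assumptions)."""
    d = depth(selection)
    if d > max_depth:
        raise QueryRejected(f"depth {d} exceeds limit {max_depth}")
    n = field_count(selection)
    if n > max_fields:
        raise QueryRejected(f"{n} fields exceeds limit {max_fields}")


class QueryBudget:
    """Per-request cap on database round-trips; call spend() before each query."""

    def __init__(self, limit):
        self.limit = limit
        self.spent = 0

    def spend(self, n=1):
        self.spent += n
        if self.spent > self.limit:
            raise QueryRejected(f"query budget of {self.limit} exhausted")


if __name__ == "__main__":
    ok = {"posts": {"title": {}, "author": {"name": {}}}}
    check_query(ok)  # passes: depth 3, 4 fields

    crafted = {}
    for _ in range(10):
        crafted = {"friends": crafted}  # friends { friends { ... } }
    try:
        check_query(crafted)
    except QueryRejected as exc:
        print("rejected:", exc)
```

Depth and field-count checks run on the parsed query alone, which is what lets the crafted query be refused before any resolver executes; the budget is the backstop for queries that pass validation but still fan out at runtime.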
Recap

This is the loop you will run in every real N+1 incident: instrument the query count first, match the fix family to the cardinality and lookup origin (batch the ORM page, DataLoader the resolver tree, parallelise the fan-out), prove the round-trip count dropped from the log, then gate it in CI so the regression cannot reship. Doing it once across all three protocol shapes makes the production version muscle memory — and the CI gate is what keeps the win from eroding the next quarter.
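A minimal sketch of the CI gate idea, assuming the test suite can count round-trips per request: run the handler at two page sizes and fail if the count exceeds a fixed budget. CountingDB, the two page functions, and check_query_budget are hypothetical; in a real project the counter would come from the ORM or driver and the check would live in the test runner.

```python
class CountingDB:
    """Hypothetical stand-in for a database client that counts round-trips."""

    def __init__(self, n_parents):
        self.parents = list(range(n_parents))
        self.queries = 0

    def parents_page(self):
        self.queries += 1
        return list(self.parents)

    def children_of(self, pid):
        self.queries += 1
        return [f"child-of-{pid}"]

    def children_of_many(self, pids):
        self.queries += 1
        return {p: [f"child-of-{p}"] for p in pids}


def page_n_plus_1(db):
    """Regressed handler: one query per parent row."""
    return {p: db.children_of(p) for p in db.parents_page()}


def page_batched(db):
    """Fixed handler: two queries regardless of page size."""
    return db.children_of_many(db.parents_page())


def check_query_budget(handler, budget, sizes=(5, 50)):
    """Fail if the handler exceeds the budget at any page size."""
    for n in sizes:
        db = CountingDB(n)
        handler(db)
        if db.queries > budget:
            raise AssertionError(
                f"{handler.__name__}: {db.queries} queries for {n} rows "
                f"(budget {budget})"
            )


if __name__ == "__main__":
    check_query_budget(page_batched, budget=4)  # passes
    try:
        check_query_budget(page_n_plus_1, budget=4)
    except AssertionError as exc:
        print("gate failed as intended:", exc)
```

Checking more than one page size matters: a budget that only runs against a tiny fixture can pass an N+1 handler whose count happens to fit under the ceiling.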

