
Performance

DataLoader: batching across resolver trees

Crux: DataLoader queues lookup IDs across an entire request and fires one batched query per type when the event loop yields. It is the canonical fix for GraphQL N+1 and multi-source fan-out.

A GraphQL query for me { posts { author { name } } } on 50 posts triggers 51 queries: 1 for the post list plus 50 author lookups. ORM eager loading cannot help here, because the data need is scattered across 50 independent resolver calls rather than concentrated at one query site. DataLoader is the structural fix.

Why ORM eager loading is not enough for GraphQL

ORM eager loading (includes, select_related, joinedload) works by declaring relationships at the query construction site. You write one query with relationships declared up front. In a GraphQL server, there is no single query construction site — each resolver runs independently for each parent object. There is no natural place to say “by the way, I will also need the author for all of these posts.”

When 50 post resolvers each call db.user.findUnique({ where: { id: authorId } }), the ORM does not know they are all asking for the same kind of data. It fires 50 queries.
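
To make the count concrete, here is a minimal sketch with an in-memory stand-in for the database. The `db` object, its `queryCount` counter, and the `resolveAllAuthors` helper are hypothetical, introduced only for illustration; a real ORM client would behave analogously.

```javascript
// Hypothetical in-memory stand-in for an ORM client that counts queries.
const db = {
  queryCount: 0,
  users: new Map([
    [1, { id: 1, name: 'Ada' }],
    [2, { id: 2, name: 'Lin' }],
  ]),
  user: {
    async findUnique({ where: { id } }) {
      db.queryCount += 1; // every call is its own round trip
      return db.users.get(id) ?? null;
    },
  },
};

// A naive author resolver: one lookup per parent post.
const resolveAuthor = (post) =>
  db.user.findUnique({ where: { id: post.authorId } });

// 50 posts written by only two authors still produce 50 lookups.
async function resolveAllAuthors() {
  const posts = Array.from({ length: 50 }, (_, i) => ({
    id: i + 1,
    authorId: (i % 2) + 1,
  }));
  const authors = await Promise.all(posts.map(resolveAuthor));
  return { authors, queries: db.queryCount };
}
```

Even though only two distinct users are requested, nothing in this code path can see that the calls overlap, so each one pays for its own query.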

DataLoader solves this by moving batching to the request scope, not the query scope.

How DataLoader works

Facebook open-sourced DataLoader in 2015 alongside GraphQL.js. The mechanism:

  1. A DataLoader instance is created once per request (not per call).
  2. Any code in the request calls loader.load(id), which returns a Promise and queues the id internally — it does not execute a query.
  3. When the current synchronous work finishes and the event loop reaches its next tick, the loader fires one batched query: SELECT * FROM users WHERE id IN (all queued ids).
  4. The loader distributes results to each waiting Promise, matching each result to its ID by position.

import DataLoader from 'dataloader';

// Create once per request (e.g., in GraphQL context).
// `db` is assumed to be a Prisma-style client defined elsewhere.
const userLoader = new DataLoader(async (ids) => {
  // ids is the batch: all IDs queued since the last dispatch
  const users = await db.user.findMany({
    where: { id: { in: ids } },
  });
  // The query may return rows in any order and omit missing ones,
  // so index rows by id, then return one result per key in key order.
  const usersById = new Map(users.map((u) => [u.id, u]));
  return ids.map((id) => usersById.get(id) ?? null);
});

// In a Post resolver, called once per post (50 times for 50 posts)
const resolveAuthor = (post) => {
  // Does NOT query immediately: queues the ID and returns a Promise
  return userLoader.load(post.authorId);
};
// After all 50 resolvers have queued their IDs,
// DataLoader fires one query:
// SELECT * FROM users WHERE id IN (1, 2, ..., 50)
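
The queue-then-flush mechanism can be sketched in a few lines. `TinyLoader` below is a hypothetical teaching toy, not the `dataloader` package: it schedules its flush with `queueMicrotask`, whereas the real library uses a more careful scheduling strategy and also handles per-key errors, cache options, and batch size limits.

```javascript
// Toy loader illustrating queue-then-flush. NOT the real dataloader package.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // async (keys) => values, same order as keys
    this.queue = [];        // pending { key, resolve, reject } entries
    this.cache = new Map(); // key -> Promise, lives as long as this instance
  }

  load(key) {
    // A repeated key reuses the existing Promise: no new work is queued.
    if (this.cache.has(key)) return this.cache.get(key);

    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
    });
    this.cache.set(key, promise);

    // The first key of a new batch schedules exactly one flush.
    if (this.queue.length === 1) queueMicrotask(() => this.flush());
    return promise;
  }

  async flush() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((entry) => entry.key));
      // Results are matched to waiting Promises by position.
      batch.forEach((entry, i) => entry.resolve(values[i]));
    } catch (err) {
      batch.forEach((entry) => entry.reject(err));
    }
  }
}
```

Calling `load(1)`, `load(2)`, and `load(1)` in the same synchronous stretch invokes `batchFn` once, with `[1, 2]`; the repeated key is served from the cache.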

Three properties DataLoader provides:

  1. Automatic batching: many load(id) calls in one event-loop tick become one query.
  2. Automatic caching: load(id) called twice in the same request returns the cached result without a second query.
  3. Request scope: the cache belongs to the loader instance, and a new instance is created for each request, so stale data does not leak between requests.
| Step | What happens | Queries fired |
| --- | --- | --- |
| Resolver calls load(1) | ID 1 queued; Promise returned | 0 |
| Resolver calls load(2) … load(50) | IDs 2–50 queued | 0 |
| Event loop ticks | Loader fires batch query | 1 |
| Results arrive | Each Promise resolves with its row | 0 |
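
Request scoping is a convention rather than something a cache does by itself: a loader's cache lives as long as the loader instance. The sketch below uses a hypothetical synchronous `MemoLookup` class as a stand-in for a caching loader, plus an invented in-memory `users` store, to show why a module-level instance can serve stale data while a per-request instance does not.

```javascript
// Hypothetical in-memory store and a memoizing lookup that stands in for
// a caching loader. Neither is part of any real library.
const users = new Map([[1, { id: 1, name: 'Ada' }]]);

class MemoLookup {
  constructor(fetchOne) {
    this.fetchOne = fetchOne;
    this.cache = new Map(); // lives exactly as long as this instance
  }
  load(id) {
    if (!this.cache.has(id)) this.cache.set(id, this.fetchOne(id));
    return this.cache.get(id);
  }
}

// Returns a snapshot of the row as it is right now.
const fetchUser = (id) => ({ ...users.get(id) });

// Anti-pattern: one instance shared by every request.
const sharedLookup = new MemoLookup(fetchUser);

// Preferred: a fresh context, and so a fresh cache, for each request.
const createContext = () => ({ userLookup: new MemoLookup(fetchUser) });

function simulateTwoRequests() {
  const first = createContext();
  const before = [sharedLookup.load(1).name, first.userLookup.load(1).name];

  users.set(1, { id: 1, name: 'Ada L.' }); // the row changes between requests

  const second = createContext();
  const after = [sharedLookup.load(1).name, second.userLookup.load(1).name];
  return { before, after };
}
```

In `after`, the shared instance still reports the old name while the per-request instance sees the update. With the real library, `createContext` would construct the DataLoader instances, typically inside the GraphQL server's per-request context function.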

GraphQL four-level N+1

The shape compounds with nesting depth. For a query me { teams { projects { members { name } } } }:

  • teams resolver: 1 query
  • projects resolver (per team): N queries
  • members resolver (per project): N×M queries
  • name: included in members, no extra queries

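Total without DataLoader: 1 + N + (N×M) queries, where N is the number of teams and M is the number of projects per team. With one DataLoader per level: 3 batched queries (teams, projects, members), or 4 if the initial lookup of me is counted.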
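
A quick worked count, assuming for simplicity that every team has the same number of projects. The function name and the sample values N = 10 and M = 20 are illustrative only. Three levels issue queries here (teams, projects, members), so a loader-per-level design issues three batched queries, plus one more if the lookup of `me` itself is counted separately.

```javascript
// N = number of teams, M = projects per team (assumed uniform for simplicity).
function queryCounts(N, M) {
  // teams (1) + projects, one query per team (N) + members, one per project (N * M)
  const withoutLoader = 1 + N + N * M;
  // one batched query each for teams, projects, and members
  const withLoaders = 3;
  return { withoutLoader, withLoaders };
}

const sample = queryCounts(10, 20);
console.log(sample.withoutLoader, sample.withLoaders);
// 1 + 10 + 200 = 211 queries without loaders, 3 with loaders
```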

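// Each loader batches one level and is created once per request.
// Keys are parent IDs; each batch function (batchLoadTeams etc., defined
// elsewhere) must return one array of children per key, in key order.
const teamLoader = new DataLoader((userIds) => batchLoadTeams(userIds));
const projectLoader = new DataLoader((teamIds) => batchLoadProjects(teamIds));
const memberLoader = new DataLoader((projectIds) => batchLoadMembers(projectIds));

// Each resolver just calls the loader with its parent's ID:
const resolveTeams = (user) => teamLoader.load(user.id);
const resolveProjects = (team) => projectLoader.load(team.id);
const resolveMembers = (project) => memberLoader.load(project.id);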
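
The `batchLoad*` functions are left undefined in the snippet. One plausible shape for a one-to-many batch function is sketched below, using a hypothetical in-memory `projectRows` table in place of a real query. The key point is that it returns one array per key, in the same order as the keys.

```javascript
// Hypothetical rows; a real implementation would run something like
// SELECT * FROM projects WHERE team_id IN (...teamIds).
const projectRows = [
  { id: 'p1', teamId: 't1' },
  { id: 'p2', teamId: 't2' },
  { id: 'p3', teamId: 't1' },
];

async function batchLoadProjects(teamIds) {
  // The single "query" for the whole batch.
  const wanted = new Set(teamIds);
  const rows = projectRows.filter((row) => wanted.has(row.teamId));

  // Group rows by parent key, starting every key with an empty list.
  const byTeam = new Map(teamIds.map((id) => [id, []]));
  for (const row of rows) byTeam.get(row.teamId).push(row);

  // One entry per key, in key order; teams without projects get [].
  return teamIds.map((id) => byTeam.get(id));
}
```

Returning an empty array for a team with no projects, rather than skipping it, keeps the result list the same length as the key list, which is what the loader relies on to match results to callers.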

Impact: query count can drop by 100–1000x, depending on the fan-out at each level. Because each query costs a round trip, tail latency falls with it: a p99 drop from 1.4 s to ~150 ms is representative for a four-level query.

DataLoader vs ORM eager loading: when to use each

DataLoader is more powerful and more complex than ORM eager loading. Choose based on where the data needs originate:

  • Known shape at query site → ORM eager loading. One declaration, ORM handles it.
  • Data needs scattered across many code paths → DataLoader. Batches across resolver trees or module boundaries.

The deciding question: “do I know at one place in the code what all the data needs are?” If yes, eager load. If no, DataLoader.

Why this works

The JavaScript DataLoader depends on an async, promise-based runtime because it relies on event-loop ticks to trigger batching. Synchronous codebases need explicit batch coordination: collect IDs in a first pass, query once, distribute in a second pass. Many languages now have DataLoader ports, some of which replace the implicit tick with an explicit dispatch call: java-dataloader (Java), aiodataloader (Python asyncio), DataLoader.NET (C#), graph-gophers/dataloader (Go).
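
For a synchronous codebase, the two-pass alternative described above might look like the following sketch. `fetchUsersByIds` is a hypothetical stand-in for a single `WHERE id IN (...)` query, and the sample data is invented for illustration.

```javascript
// Hypothetical data source; counts calls so the single query is visible.
const userTable = new Map([
  [1, { id: 1, name: 'Ada' }],
  [2, { id: 2, name: 'Lin' }],
]);
let queryCount = 0;

function fetchUsersByIds(ids) {
  queryCount += 1; // stands in for one WHERE id IN (...) round trip
  return ids.map((id) => userTable.get(id)).filter(Boolean);
}

function attachAuthors(posts) {
  // Pass 1: collect the distinct IDs that are needed.
  const ids = [...new Set(posts.map((post) => post.authorId))];

  // One query for the whole set, indexed by id for fast lookup.
  const usersById = new Map(
    fetchUsersByIds(ids).map((user) => [user.id, user]),
  );

  // Pass 2: distribute results back to each post (null if not found).
  return posts.map((post) => ({
    ...post,
    author: usersById.get(post.authorId) ?? null,
  }));
}
```

The trade-off is that the caller must know the full set of parents before querying, which is exactly the coordination DataLoader performs implicitly in an async runtime.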

Quiz

A GraphQL resolver calls userLoader.load(post.authorId) 50 times for 50 posts. How many database queries does DataLoader fire?

Quiz

A team is building a REST API endpoint that assembles data from three database tables by ID lookups scattered across three helper modules. Which tool is best suited?

Order the steps

Order the events in a DataLoader batch cycle, from first call to resolved promises:

  1. Resolver calls loader.load(id): Promise returned, ID queued
  2. More load() calls from other resolvers: IDs accumulate in the batch
  3. Current synchronous work completes; event loop ticks
  4. DataLoader fires: SELECT * FROM users WHERE id IN (all queued IDs)
  5. Results arrive; each Promise resolves with its corresponding row
Recall before you leave
  1. Walk through how DataLoader differs from ORM eager loading and when each is the right tool.
  2. A GraphQL query has four levels of nesting: me { teams { projects { members } } }. Explain how DataLoader reduces query count.
Recap

DataLoader batches ID lookups across an entire request into one query per type, firing when the event loop ticks. Unlike ORM eager loading, which must be declared at a single query site, DataLoader works across scattered code paths — making it the canonical fix for GraphQL resolver N+1, where each resolver independently requests related data. It provides three guarantees: automatic batching, request-scope caching, and no stale-data leakage across requests. The DataLoader pattern is async-first; synchronous codebases need explicit two-pass collect-then-query coordination instead.

Trademarks belong to their respective owners. Editorial reference only.