DataLoader: batching across resolver trees

DataLoader queues lookup IDs across an entire request and fires one batched query per type when the event loop yields — the canonical fix for GraphQL N+1 and multi-source fan-out.

A GraphQL query for me { posts { author { name } } } on 50 posts triggers 1 query for the posts plus 50 author lookups. ORM eager loading cannot help here: the data need is scattered across 50 independent resolver calls, not concentrated at one query site. DataLoader is the structural fix.

Why ORM eager loading is not enough for GraphQL

ORM eager loading (includes, select_related, joinedload) works by declaring relationships at the query construction site. You write one query with relationships declared up front. In a GraphQL server, there is no single query construction site — each resolver runs independently for each parent object. There is no natural place to say “by the way, I will also need the author for all of these posts.”

When 50 post resolvers each call db.user.findUnique({ where: { id: authorId } }), the ORM does not know they are all asking for the same kind of data. It fires 50 queries.
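The effect can be reproduced without a real database. The sketch below is only an illustration: createFakeDb is a hypothetical in-memory stand-in for an ORM client that counts calls, and resolveAuthorsNaively mimics an executor invoking one resolver per parent post.

```javascript
// Hypothetical in-memory stand-in for an ORM client; it only counts calls.
function createFakeDb() {
  const users = new Map();
  for (let id = 1; id <= 50; id++) users.set(id, { id, name: `user-${id}` });
  const db = {
    queryCount: 0,
    user: {
      // Mimics a single-row lookup: one call counts as one query.
      async findUnique({ where: { id } }) {
        db.queryCount++;
        return users.get(id) ?? null;
      },
    },
  };
  return db;
}

// One resolver call per parent post, with no knowledge of its siblings.
async function resolveAuthorsNaively(db, posts) {
  return Promise.all(
    posts.map((post) => db.user.findUnique({ where: { id: post.authorId } }))
  );
}

async function demoNaive() {
  const db = createFakeDb();
  const posts = Array.from({ length: 50 }, (_, i) => ({ id: i + 1, authorId: i + 1 }));
  const authors = await resolveAuthorsNaively(db, posts);
  return { authors: authors.length, queries: db.queryCount };
}

demoNaive().then((result) => console.log(result)); // { authors: 50, queries: 50 }
```

Each resolver sees only its own post, so nothing in this code path has the information needed to merge the 50 lookups.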

DataLoader solves this by moving batching to the request scope, not the query scope.

How DataLoader works

Facebook open-sourced DataLoader in 2015 alongside GraphQL.js. The mechanism:

1. A DataLoader instance is created once per request (not per call).
2. Any code in the request calls loader.load(id), which returns a Promise and queues the id internally. It does not execute a query.
3. When the current synchronous work finishes and the event loop reaches its next tick, the loader fires one batched query: SELECT * FROM users WHERE id IN (all queued ids).
4. The loader resolves each waiting Promise with the result at the same position as its id in the batch.

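import DataLoader from 'dataloader';

// Create once per request (e.g., in GraphQL context).
// db is assumed to be an ORM client created elsewhere.
const userLoader = new DataLoader(async (ids) => {
  // ids is the batch: all IDs queued since the last tick
  const users = await db.user.findMany({
    where: { id: { in: ids } },
  });
  // Index rows by id so reordering is a single pass, not a scan per id
  const usersById = new Map(users.map((u) => [u.id, u]));
  // Must return one entry per id, in the same order as ids
  return ids.map((id) => usersById.get(id) ?? null);
});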

// In a Post resolver — called 50 times for 50 posts
const resolveAuthor = async (post) => {
  return userLoader.load(post.authorId);
  // Does NOT query immediately — queues the ID
};
// After all 50 resolvers have queued their IDs,
// DataLoader fires one query:
// SELECT * FROM users WHERE id IN (1, 2, ..., 50)
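To make the queue-then-flush mechanism concrete, here is a minimal sketch of the idea. TinyLoader is a hypothetical class written for this illustration, not the dataloader package: it has no cache and no options, and it assumes Node.js because it uses process.nextTick to schedule the flush.

```javascript
// TinyLoader: hypothetical, stripped-down illustration of tick-based batching.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.queue = []; // pending { key, resolve, reject } entries
  }

  load(key) {
    return new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // Schedule exactly one flush when the first key of a batch is queued.
      if (this.queue.length === 1) {
        process.nextTick(() => this.flush());
      }
    });
  }

  async flush() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((entry) => entry.key));
      // Position i of the result belongs to position i of the keys.
      batch.forEach((entry, i) => entry.resolve(values[i]));
    } catch (error) {
      batch.forEach((entry) => entry.reject(error));
    }
  }
}

async function demoBatching() {
  const batches = [];
  const loader = new TinyLoader(async (ids) => {
    batches.push(ids); // record each "query" the batch function receives
    return ids.map((id) => ({ id, name: `user-${id}` }));
  });
  const ids = Array.from({ length: 50 }, (_, i) => i + 1);
  const users = await Promise.all(ids.map((id) => loader.load(id)));
  return {
    users: users.length,
    batchCount: batches.length,
    firstBatchSize: batches[0].length,
  };
}

demoBatching().then((result) => console.log(result));
// { users: 50, batchCount: 1, firstBatchSize: 50 }
```

All 50 load calls run synchronously inside ids.map, so they are queued before the scheduled flush runs, and the batch function is invoked once.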

Three properties DataLoader provides:

Automatic batching — many load(id) calls in one event-loop tick become one query.
Automatic caching — load(id) called twice in the same request returns the cached result without a second query.
Request scope — the cache is scoped to the request object, so stale data does not leak between requests.

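Together these three properties mean that any code in the request can call load(id) freely, from any module, without coordinating. The loader then issues one query per batch: all load(id) calls for a type that land in the same tick share a single query, so a flat list of 50 lookups costs one query and a nested query costs roughly one query per type per nesting level.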

Step	What happens	Queries fired
Resolver calls load(1)	ID 1 queued; Promise returned	0
Resolver calls load(2) … load(50)	IDs 2–50 queued	0
Event loop ticks	Loader fires batch query	1
Results arrive	Each Promise resolves with its row	0
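The caching and request-scope properties can be sketched the same way. CachingLoader and createRequestContext below are hypothetical names for this illustration only; the sketch assumes Node.js and stores the Promise for each key so repeated calls share it.

```javascript
// CachingLoader: hypothetical sketch of batching plus a per-instance cache.
class CachingLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.cache = new Map(); // key -> Promise of the value
    this.queue = [];
  }

  load(key) {
    const cached = this.cache.get(key);
    if (cached) return cached; // repeated keys reuse the same Promise
    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      if (this.queue.length === 1) process.nextTick(() => this.flush());
    });
    this.cache.set(key, promise);
    return promise;
  }

  async flush() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((entry) => entry.key));
      batch.forEach((entry, i) => entry.resolve(values[i]));
    } catch (error) {
      batch.forEach((entry) => entry.reject(error));
    }
  }
}

// A new loader per request keeps cached rows from leaking between requests.
function createRequestContext(batchFn) {
  return { userLoader: new CachingLoader(batchFn) };
}

async function demoCaching() {
  const batches = [];
  const batchFn = async (ids) => {
    batches.push(ids);
    return ids.map((id) => ({ id }));
  };

  const requestA = createRequestContext(batchFn);
  const [first, second] = await Promise.all([
    requestA.userLoader.load(7),
    requestA.userLoader.load(7), // duplicate key in the same tick
  ]);
  const third = await requestA.userLoader.load(7); // later in the same request

  const requestB = createRequestContext(batchFn);
  await requestB.userLoader.load(7); // separate request, separate cache

  return { sameObject: first === second && second === third, batches };
}

demoCaching().then((result) => console.log(result));
// { sameObject: true, batches: [ [ 7 ], [ 7 ] ] }
```

Request A asks for id 7 three times and the batch function sees it once; request B has its own loader, so it queries again instead of reading request A's cache.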

GraphQL four-level N+1

The shape compounds with nesting depth. For a query me { teams { projects { members { name } } } }:

teams resolver: 1 query
projects resolver (per team): N queries
members resolver (per project): N×M queries
name: included in members, no extra queries

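Total without DataLoader: 1 + N + (N×M) queries, where N is the number of teams and M the number of projects per team. With one DataLoader per type: one batched query per level, so 3 for the levels counted above, or 4 if the initial me lookup is included.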
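As a quick worked example of the unbatched formula, the helper below plugs in concrete fan-out numbers. The function name and the sample values are made up for illustration, and it assumes every team has the same number of projects.

```javascript
// N = number of teams, M = projects per team (assumed uniform for simplicity).
function naiveQueryCount(teams, projectsPerTeam) {
  const teamsQuery = 1; // one query for the user's teams
  const projectQueries = teams; // one query per team
  const memberQueries = teams * projectsPerTeam; // one query per project
  return teamsQuery + projectQueries + memberQueries;
}

console.log(naiveQueryCount(10, 20)); // 211
console.log(naiveQueryCount(50, 40)); // 2051
```

The N×M term dominates quickly, which is why the gain from batching grows with fan-out.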

// Each loader batches one type. The batchLoad* helpers are defined elsewhere:
// each takes an array of parent IDs and returns one result per ID, in order.
const teamLoader = new DataLoader(ids => batchLoadTeams(ids));
const projectLoader = new DataLoader(ids => batchLoadProjects(ids));
const memberLoader = new DataLoader(ids => batchLoadMembers(ids));

// Each resolver just calls the loader with its parent's ID:
const resolveTeams = (user) => teamLoader.load(user.id);
const resolveProjects = (team) => projectLoader.load(team.id);
const resolveMembers = (project) => memberLoader.load(project.id);
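The following sketch simulates this fan-out end to end and counts batch calls. It is a simplification under stated assumptions: TinyLoader is a hypothetical stand-in for the real library (Node.js only, no cache), the data is generated rather than queried, the initial me lookup is not modeled, and resolveTree walks the tree level by level instead of running a real GraphQL executor.

```javascript
// TinyLoader: hypothetical minimal batching loader (no cache, Node.js only).
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.queue = [];
  }
  load(key) {
    return new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      if (this.queue.length === 1) process.nextTick(() => this.flush());
    });
  }
  async flush() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((entry) => entry.key));
      batch.forEach((entry, i) => entry.resolve(values[i]));
    } catch (error) {
      batch.forEach((entry) => entry.reject(error));
    }
  }
}

// Builds a batch function that returns `count` generated children per parent ID
// and increments the shared counter once per batch (one batch = one "query").
function childrenBatchFn(counter, count, label) {
  return async (parentIds) => {
    counter.queries++;
    return parentIds.map((parentId) =>
      Array.from({ length: count }, (_, n) => ({ id: `${parentId}-${label}${n + 1}` }))
    );
  };
}

async function resolveTree(userId) {
  const counter = { queries: 0 };
  const teamLoader = new TinyLoader(childrenBatchFn(counter, 3, "t"));
  const projectLoader = new TinyLoader(childrenBatchFn(counter, 4, "p"));
  const memberLoader = new TinyLoader(childrenBatchFn(counter, 2, "m"));

  const teams = await teamLoader.load(userId);
  // All project lookups are issued in the same tick, so they share one batch.
  const projects = (
    await Promise.all(teams.map((team) => projectLoader.load(team.id)))
  ).flat();
  // Likewise for the member lookups across all 12 projects.
  const members = (
    await Promise.all(projects.map((project) => memberLoader.load(project.id)))
  ).flat();

  return {
    teams: teams.length,
    projects: projects.length,
    members: members.length,
    queries: counter.queries,
  };
}

resolveTree(1).then((result) => console.log(result));
// { teams: 3, projects: 12, members: 24, queries: 3 }
```

With 3 teams and 4 projects each, per-parent lookups would cost 1 + 3 + 12 = 16 queries; the three loaders bring that down to 3.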

Impact: query count can drop 100–1000x depending on fan-out and nesting depth. As an illustrative figure, p99 for a four-level query can fall from about 1.4 s to ~150 ms.

DataLoader vs ORM eager loading: when to use each

DataLoader is more powerful and more complex than ORM eager loading. Choose based on where the data needs originate:

Known shape at query site → ORM eager loading. One declaration, ORM handles it.
Data needs scattered across many code paths → DataLoader. Batches across resolver trees or module boundaries.

The deciding question: “do I know at one place in the code what all the data needs are?” If yes, eager load. If no, DataLoader.

Why this works

DataLoader is fundamentally tied to async / promise-based runtimes because it relies on event-loop ticks to trigger batching. Synchronous codebases need explicit batch coordination: collect IDs in a first pass, query once, distribute in a second pass. Many languages now have DataLoader ports, for example java-dataloader from the graphql-java project (Java), aiodataloader (Python asyncio), DataLoader.NET (C#), and graph-gophers/dataloader (Go).
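The explicit two-pass approach for synchronous code might look like the sketch below. fetchUsersByIds and attachAuthors are hypothetical helpers, and the stats object exists only to count how many times the data source is hit.

```javascript
// Hypothetical synchronous data-access function: one call counts as one query.
function fetchUsersByIds(ids, stats) {
  stats.queries++;
  return ids.map((id) => ({ id, name: `user-${id}` }));
}

function attachAuthors(posts, stats) {
  // Pass 1: collect the distinct IDs that will be needed.
  const authorIds = [...new Set(posts.map((post) => post.authorId))];
  // One query for the whole set, indexed by id for the second pass.
  const usersById = new Map(
    fetchUsersByIds(authorIds, stats).map((user) => [user.id, user])
  );
  // Pass 2: distribute rows back to the objects that asked for them.
  return posts.map((post) => ({
    ...post,
    author: usersById.get(post.authorId) ?? null,
  }));
}

const stats = { queries: 0 };
const posts = [
  { id: 1, authorId: 10 },
  { id: 2, authorId: 11 },
  { id: 3, authorId: 10 },
];
const withAuthors = attachAuthors(posts, stats);
console.log(stats.queries, withAuthors.map((post) => post.author.name));
// 1 [ 'user-10', 'user-11', 'user-10' ]
```

The trade-off is that the caller must know up front which objects need authors, which is exactly the coordination that DataLoader removes in async code.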

Quiz

A GraphQL resolver calls userLoader.load(post.authorId) 50 times for 50 posts. How many database queries does DataLoader fire?

Quiz

A team is building a REST API endpoint that assembles data from three database tables by ID lookups scattered across three helper modules. Which tool is best suited?

Order the steps

Order the events in a DataLoader batch cycle, from first call to resolved promises:

1 Resolver calls loader.load(id) — Promise returned, ID queued
2 More load() calls from other resolvers — IDs accumulate in the batch
3 Current synchronous work completes; event loop ticks
4 DataLoader fires: SELECT * WHERE id IN (all queued IDs)
5 Results arrive; each Promise resolves with its corresponding row

50 resolver load(id) calls only queue IDs — 0 queries. On the next event-loop tick, DataLoader fires one SELECT ... WHERE id IN (...), then distributes rows.

Recall before you leave

01 Walk through how DataLoader differs from ORM eager loading and when each is the right tool.
02 A GraphQL query has four levels of nesting: me { teams { projects { members } } }. Explain how DataLoader reduces query count.

Recap

DataLoader batches ID lookups across an entire request into one query per type, firing when the event loop ticks. Unlike ORM eager loading, which must be declared at a single query site, DataLoader works across scattered code paths — making it the canonical fix for GraphQL resolver N+1, where each resolver independently requests related data. It provides three guarantees: automatic batching, request-scope caching, and no stale-data leakage across requests. Now when you see a GraphQL resolver calling the database per parent object, you know the answer: create a DataLoader per type per request, call load(id) inside the resolver, and let the event loop do the batching for you.
