DataLoader mechanics: tick-boundary batching

Crux: DataLoader waits for the current event-loop tick to finish, then fires one batched query covering every key queued across all resolvers, turning N per-item SQL calls into one.

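After wiring DataLoader, Sven’s 51-query page (1 query for the posts plus 50 per-post author lookups) drops to 2 queries, without changing the schema or the client request. The GraphQL document and the resolver signatures stay exactly the same. Only the fetch inside each resolver changes: it calls loader.load(key) instead of querying the database directly.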

What DataLoader does

DataLoader is a small library, originally extracted from Facebook’s GraphQL implementation and maintained under the GraphQL Foundation. It provides new DataLoader(batchLoadFn, options) where batchLoadFn(keys: Array<K>): Promise<Array<V | Error>> is your batched fetcher.

Resolvers call loader.load(key) and receive a Promise. Under the hood:

  1. Each .load(key) queues the key into an internal array. No query runs yet.
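  2. When the current synchronous JavaScript turn ends, the scheduled dispatch runs: DataLoader takes all queued keys and calls batchLoadFn once with the full array. (In Node.js the default scheduler defers the dispatch until just after already-pending Promise jobs have run, so keys queued from Promise callbacks in the same tick still join the batch.)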
  3. batchLoadFn runs one WHERE id IN (...) query and returns the values in the same order as the input keys.
  4. DataLoader resolves each waiting Promise with its value.
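
The four steps above can be sketched in a few lines. This is a simplified illustration, not the library's implementation: TinyBatcher is a hypothetical name, it uses queueMicrotask where the real default scheduler differs in detail, and it omits caching and per-key errors.

```javascript
// Simplified illustration of tick-boundary batching; not DataLoader's code.
class TinyBatcher {
  constructor(batchLoadFn) {
    this.batchLoadFn = batchLoadFn;
    this.queue = []; // pending { key, resolve, reject } entries
  }

  // Step 1: queue the key and hand back a Promise. No fetch runs yet.
  load(key) {
    return new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // Schedule exactly one dispatch when the first key of a batch arrives.
      if (this.queue.length === 1) {
        queueMicrotask(() => this.dispatch());
      }
    });
  }

  // Steps 2 to 4: take every queued key, call the batch function once,
  // then settle each waiting Promise with the value at the same index.
  async dispatch() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchLoadFn(batch.map((entry) => entry.key));
      batch.forEach((entry, i) => entry.resolve(values[i]));
    } catch (err) {
      batch.forEach((entry) => entry.reject(err));
    }
  }
}

// Usage: three loads in one synchronous turn produce one batch call.
const batcher = new TinyBatcher(async (keys) => {
  console.log('batch keys:', keys); // logged once, with all three keys
  return keys.map((id) => ({ id }));
});
Promise.all([batcher.load(7), batcher.load(9), batcher.load(3)]);
```

The single-dispatch guard (queue.length === 1) is what makes the batch size depend on how many keys arrive in the turn rather than on any timer.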

The tick boundary is the natural join point: GraphQL’s execution engine finishes walking one level of the query synchronously. All 50 Post.author resolver calls queue their IDs during that level-traversal. DataLoader’s microtask fires after, when every ID has been collected.

Why a fixed time window would be worse

A 5 ms buffer would either add 5 ms of latency per request (waiting unnecessarily) or fire too early under load (capturing fewer than all IDs). The tick boundary fires at the earliest possible moment when all IDs for the current level are queued — zero unnecessary wait.

Event-loop tick (GraphQL resolves level 2):
  Post.author(post1) → loader.load(7)   // queued
  Post.author(post2) → loader.load(9)   // queued
  Post.author(post3) → loader.load(7)   // dedup: already queued

Microtask (tick ends):
  batchLoadFn([7, 9])
  → SELECT id, name FROM users WHERE id IN (7, 9)
  → resolves: 7→Bea, 9→Sven

All three Post.author Promises resolve: post1→Bea, post2→Sven, post3→Bea
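
The trace does not show how batchLoadFn lines its rows up with the keys it was given. A minimal sketch of one way to do that follows. The body of batchAuthors, the USERS array, and selectUsersByIds are assumptions made for illustration; selectUsersByIds stands in for the SELECT in the trace and, like SQL, makes no promise about row order.

```javascript
// Hypothetical in-memory stand-in for the users table in the trace.
const USERS = [
  { id: 9, name: 'Sven' },
  { id: 7, name: 'Bea' },
];

// Stand-in for: SELECT id, name FROM users WHERE id IN (...)
// Rows come back in table order, not key order, and misses are skipped.
async function selectUsersByIds(ids) {
  return USERS.filter((row) => ids.includes(row.id));
}

// Batch function: one result slot per key, in the same order as the keys.
async function batchAuthors(keys) {
  const rows = await selectUsersByIds(keys);
  const byId = new Map(rows.map((row) => [row.id, row]));
  // A missing row becomes an Error for that key only.
  return keys.map((id) => byId.get(id) ?? new Error(`No user with id ${id}`));
}
```

Indexing the rows by id before mapping over the keys keeps the output the same length as the input even when a row is missing.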

Deduplication inside one request

If loader.load(7) is called twice in the same request, DataLoader returns the same Promise both times, so the key appears only once in the batch. This is useful when the same entity is referenced from multiple paths in one document (e.g. a post’s author and a comment’s author resolving to the same user). Opt out by passing { cache: false } in options if you need fresh reads within one request (rare: read-after-write patterns); note that this also turns off the deduplication. To refresh a single entry after a mutation, call loader.clear(key) instead.
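
The "same Promise both times" behavior is plain memoization of Promises by key. The sketch below isolates that half of the mechanism, without batching. makeMemoizedLoader and fetchOne are hypothetical names used only here.

```javascript
// Simplified illustration of Promise memoization; not the library's code.
// fetchOne is a hypothetical per-key fetcher used only for this sketch.
function makeMemoizedLoader(fetchOne) {
  const cache = new Map(); // key -> Promise of the value

  return {
    // The first call starts the fetch; later calls reuse the stored Promise.
    load(key) {
      if (!cache.has(key)) {
        cache.set(key, fetchOne(key));
      }
      return cache.get(key);
    },
    // Forget one key so the next load fetches again (e.g. after a write).
    clear(key) {
      cache.delete(key);
    },
  };
}
```

Because the cache stores the Promise rather than the resolved value, a second load(7) issued before the first fetch finishes still shares that fetch.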

The per-request instance rule

A DataLoader instance is a cache. Its lifetime must be the request, not the server process. A module-scope DataLoader shared across all requests:

  • Returns stale data: Request A loads user 7, the row is updated, Request B loads user 7 from the cache and gets the pre-update row.
  • Leaks across tenants: Request A and B belong to different tenants. Tenant B sees Tenant A’s cached row for the same ID.

The discipline: instantiate DataLoaders inside the request-context factory (the function Apollo Server calls per request), attach them to context, and let GC reclaim them when the request ends.

// Apollo Server context factory — correct
context: async ({ req }) => ({
  loaders: {
    author: new DataLoader(batchAuthors),
    tags:   new DataLoader(batchTags),
  },
})

// Resolver — correct
Post: {
  author: (post, _args, ctx) => ctx.loaders.author.load(post.authorId),
}

Warning

A global DataLoader saves memory in theory. In practice it leaks data across tenants and serves stale rows. Apollo’s docs make the same point: DataLoader instances are per-request, so a data source that uses one should create a new instance with every request.
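
The stale-read failure can be reproduced without a server. In this sketch a plain Map-backed cache stands in for a DataLoader instance; db, makeUserCache, sharedLoader, createContext, and demo are hypothetical names, and the renamed row is invented for illustration.

```javascript
// Hypothetical mutable "database" with one user row.
const db = new Map([[7, { id: 7, name: 'Bea' }]]);

// Stand-in for a DataLoader instance: caches one Promise per key.
function makeUserCache() {
  const cache = new Map();
  return {
    load(id) {
      if (!cache.has(id)) {
        cache.set(id, Promise.resolve({ ...db.get(id) })); // copy row at read time
      }
      return cache.get(id);
    },
  };
}

// Wrong: one instance at module scope, shared by every request.
const sharedLoader = makeUserCache();

// Right: a fresh instance from the per-request context factory.
function createContext() {
  return { loaders: { author: makeUserCache() } };
}

async function demo() {
  // Request A reads user 7 through both loaders.
  const requestA = createContext();
  await sharedLoader.load(7);
  await requestA.loaders.author.load(7);

  // The row is updated between requests.
  db.set(7, { id: 7, name: 'Beatrice' });

  // Request B: the shared cache still holds the old row.
  const requestB = createContext();
  const shared = await sharedLoader.load(7);
  const perRequest = await requestB.loaders.author.load(7);
  return { shared: shared.name, perRequest: perRequest.name };
}
```

Calling demo() resolves to { shared: 'Bea', perRequest: 'Beatrice' }: only the per-request loader sees the update.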

Quiz

DataLoader is invoked twice for the same key in the same request. What happens?

Quiz

Why does DataLoader batch on the event-loop tick boundary instead of a fixed 5 ms window?

Complete the analogy

Fill in the blank: DataLoader gathers all .load() calls made in the same event-loop _______, then fires one batch.

Recall before you leave

  1. Why must a DataLoader instance be created per request, not once at server start?
  2. When does DataLoader fire the batch function relative to the resolver calls?

Recap

DataLoader moves the fetch out of each resolver and into a per-request batch. Every resolver calls loader.load(id) and gets a Promise. DataLoader holds all queued IDs until the event-loop tick ends, then fires one WHERE id IN (...) query for the full set. Keys are deduplicated within the batch, so loader.load(7) called from two different paths in one document fires one SQL lookup, not two. The instance must be created per request — a global instance leaks tenant data and returns stale rows.
