Performance PERF · 05 · 02

Fix families: JOIN, IN, preload, and DataLoader

The four structural fixes for N+1 — JOIN, IN-based batching, ORM eager loading, and DataLoader — and how to pick the right one by relationship cardinality and data shape.

PERF Middle ◷ 14 min

Level

FoundationsJuniorMiddleSenior

You have confirmed an N+1 in the query log: 50 customer queries for one orders page. There are four ways to fix it, and they have different tradeoffs. Picking the wrong one for your data shape can make things worse.

The four fix families

1. JOIN — one query, one trip

Fetch parent and child in a single SQL query using a JOIN:

SELECT o.*, c.name AS customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.user_id = 42
LIMIT 50;

One round-trip. The database returns everything merged into one result set.

The tradeoff: for one-to-many relationships, the result set duplicates parent columns for every child row. Fifty orders with five line items each produces 250 rows where 200 repeat the same order columns. Network bandwidth and ORM parsing cost can offset the round-trip savings. JOIN works best for one-to-one or small one-to-many (1–5 children per parent).

2. IN — two queries, no duplication

Two round-trips: fetch parents first, then fetch all children using WHERE id IN (...):

-- Query 1: get all orders
SELECT * FROM orders WHERE user_id = 42 LIMIT 50;

-- Query 2: get all customers for those orders
SELECT * FROM customers WHERE id IN (1, 2, 3, ..., 50);

No row duplication. The parent result is 50 rows; the children result is up to 50 rows — each row of data appears exactly once.

The tradeoff: the IN list has practical limits. Postgres handles 10 000+ items with growing planner cost; MySQL has a hard limit at max_allowed_packet. For most pagination (10–1000 items), IN is the right default for one-to-many.

3. Preload / eager loading — ORM directive

All major ORMs ship a directive that declares up front what relationships to load. The ORM picks JOIN or IN internally:

# Rails — .includes picks strategy
Order.where(user_id: current_user).includes(:customer).limit(50)

# Django
Order.objects.filter(user=request.user).select_related('customer')[:50]

# SQLAlchemy
session.query(Order).options(selectinload(Order.customer)).filter_by(...).limit(50)

# Prisma
prisma.order.findMany({ where: {...}, include: { customer: true } })

All of these produce 2 queries instead of 51. You declare intent; the ORM optimises tactics.

The tradeoff: if you preload too aggressively — loading relationships the request never uses — you waste I/O. Declare only what the current request needs.

4. DataLoader — request-scoped batching

DataLoader (Facebook, 2015) queues lookup IDs across the entire request and fires one batched query when the event loop yields to its next tick:

// Create once per request
const userLoader = new DataLoader(async (ids) => {
  const users = await db.user.findMany({ where: { id: { in: ids } } });
  return ids.map(id => users.find(u => u.id === id));
});

// Call from anywhere in the request — each call queues the id
const user = await userLoader.load(orderId);

DataLoader provides three properties: (1) automatic batching — many load(id) calls become one query, (2) automatic caching — repeated load(id) within the same request returns the cached result without a second query, (3) request scope — the cache is scoped to the request, so stale data does not leak.

The tradeoff: DataLoader requires an async / event-loop runtime. It adds caching complexity and a small batch-window latency (typically under 1 ms). It is the right tool when data needs are scattered across many unrelated code paths in the same request — the canonical case being GraphQL resolvers.

Fix	Queries	Best for	Tradeoff
JOIN	1	One-to-one, small one-to-many	Parent columns duplicated per child row
IN	2	One-to-many fan-out	IN list limits (~10 k on Postgres)
Preload	2–3	Known shape at query site	Over-fetches if unused
DataLoader	1 per type	Multi-source, GraphQL resolvers	Async runtime required; cache complexity

Choosing by cardinality

When you are deciding which fix to reach for, start with one question: how many children does each parent typically have? That number alone narrows the choice to one or two options.

One-to-one → JOIN. No duplication. One trip.
One-to-many (small fan-out) → preload. ORM picks JOIN internally. Simple to write.
One-to-many (heavy fan-out) → IN-based selectinload. Avoids result-set bloat.
Multi-source or GraphQL → DataLoader. Batches across unrelated code paths.

▸Why this works

Why does IN beat JOIN for one-to-many heavy fan-out? A query returning 50 orders with 20 line items each as JOIN returns 1 000 rows where 950 columns repeat the same order data. The network payload and ORM parsing cost can outweigh the round-trip savings of skipping the second query. The IN approach sends 2 queries but no duplication: 50 order rows and 1 000 line-item rows each appear exactly once.

For heavy one-to-many fan-out, IN's extra round-trip buys zero row duplication — the bandwidth and parse cost JOIN pays for 950 redundant rows is why IN wins.

Practical example: a nested eager load

# Rails nested eager load:
User.find(42).projects.includes(tasks: :comments).limit(50)
# Produces:
#   SELECT * FROM projects WHERE user_id = 42 LIMIT 50
#   SELECT * FROM tasks WHERE project_id IN (1, 2, ..., 50)
#   SELECT * FROM comments WHERE task_id IN (...)
# 3 queries instead of 1 + 50 + 250 = 301

# Django equivalent:
Project.objects.filter(user=user).prefetch_related('tasks__comments')[:50]

# SQLAlchemy:
session.query(Project).options(
    selectinload(Project.tasks).selectinload(Task.comments)
).filter_by(user_id=42).limit(50)

Quiz

A page loads 50 blog posts, each with a list of tags (average 10 tags per post). Which fix strategy avoids result-set duplication while keeping the query count low?

Quiz

A GraphQL query has 'me { posts { author { name } } }' on 50 posts and triggers 1 + 50 author lookups. What is the canonical fix?

Order the steps

Order the steps of diagnosing and fixing an N+1 the senior way:

1 Enable DB query log or APM trace, count queries per request
2 Identify the call site — often a loop over parents accessing a child relation lazily
3 Pick the fix family by cardinality: one-to-one → JOIN; one-to-many → IN/selectinload; multi-source → DataLoader
4 Apply the ORM directive or batch loader
5 Verify by re-reading the query log: count should drop to 2–5 from N+1
6 Re-measure p99 — expect 5–20x improvement for a typical N+1 fix on a list page
7 Add CI gate: per-request query count budget, fail PRs that regress

JOIN, IN, preload, and DataLoader are four shapes of one idea: fewer round-trips. Cardinality and data source pick the one.

Recall before you leave

01
Walk through the four N+1 fix families with one example of when each is the right choice.
02
Why does IN-based fetching often beat JOIN for one-to-many relationships?

Recap

The four N+1 fix families are JOIN (one trip, best for one-to-one), IN-based batching (two trips, no duplication, best for one-to-many fan-out), ORM preload (declares intent at the query site), and DataLoader (request-scoped batching across many code paths, canonical for GraphQL). Cardinality drives the choice: one-to-one favors JOIN; heavy one-to-many favors IN or selectinload; multi-source resolvers favor DataLoader. Now when you face an N+1 in code review or a production slowdown, you can name the fix before you write a single line: pick by cardinality, apply the ORM directive or DataLoader, and verify by query count — not just p99.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 5 done

Connected lessons

builds on

N+1: one logical operation, many round-tripsjunior

unlocks

deepens into

appears again in162

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.