Performance PERF · 05 · 03

Detecting N+1: query logs, APM traces, and CI gates

N+1 hides from CPU profilers because the cost is off-CPU wait time, not compute. Query logs, APM waterfalls, and per-request counters surface it — and a CI gate prevents regressions.

PERF Middle ◷ 13 min

Level

FoundationsJuniorMiddleSenior

The page is 1.2 seconds. The CPU flame graph is flat. Your colleague insists the profiler must be wrong. The profiler is not wrong — you are looking at the wrong instrument. N+1 lives in the off-CPU wait time between queries, not in any function.

Why CPU profiles miss N+1

A CPU profiler samples stack traces of running threads. The N+1 problem does not burn CPU in any single location. Each query dispatches in microseconds; the program then waits for the database (off-CPU); the response returns fast; the next query dispatches. The wait is the bottleneck, but the wait is off-CPU.

The profiler sees fast blips with nothing wide. The cumulative cost is spread across many fast call sites — not concentrated in one slow function. A CPU flame graph will look healthy while the page takes 1.2 seconds.

The right instruments

1. Database query log

The most direct instrument. Every major database can log every query with its duration and a timestamp.

# Postgres — log everything in development
log_min_duration_statement = 0

# MySQL slow query log
slow_query_log = 1
long_query_time = 0

# ORM-level logging
# Rails: config.log_level = :debug (ActiveRecord logger prints each query)
# Django: LOGGING = { 'loggers': { 'django.db.backends': { 'level': 'DEBUG' } } }
# SQLAlchemy: echo=True

The N+1 signature is unmistakable: dozens to hundreds of similar queries for one request, often the same SELECT differing only in the WHERE id = ? parameter.

2. APM tracing — the waterfall view

Application performance monitoring tools (Datadog APM, Grafana Tempo, New Relic, Honeycomb with auto-instrumentation) capture database spans inline with code execution spans.

The N+1 waterfall looks like a tall column of short parallel bars — 50 or 100 short database spans all within the same request’s trace. One glance at the trace view makes the pattern obvious in a way that logs alone do not.

Alert when per-endpoint p95 span count grows more than 30% week-over-week.

3. Per-request query counter

Custom middleware that counts queries per request and either logs or alerts on excess:

# Rails — middleware counts queries per request
ActiveSupport::Notifications.subscribe('sql.active_record') do
  RequestStore[:query_count] ||= 0
  RequestStore[:query_count] += 1
end

after_action do
  if RequestStore[:query_count] > 20
    Rails.logger.warn "QUERY_COUNT #{params[:controller]}##{params[:action]}: #{RequestStore[:query_count]}"
  end
end

Expose as a Prometheus histogram by endpoint. Alert when endpoint p99 exceeds 20 or grows more than 50% week-over-week.

Instrument	Best for	Signature
CPU profiler	Compute hotspots	Wrong tool — misses N+1
Query log	Confirm N+1, find call site	Many similar SELECTs per request
APM trace	Visual waterfall, production	Tall column of short DB spans
Query counter	Alerting, per-endpoint trends	queries_per_request p99 > 20

The CI gate — catch at PR time

The most leveraged anti-N+1 mechanism is catching regressions at PR time instead of in production. Teams running this gate report 70–90% reduction in N+1 incidents reaching production. Initial setup: 1–2 weeks. Ongoing maintenance: trivial — the gate runs in CI.

Rails: the bullet gem raises on any lazy-loaded relation accessed during request render. A controller test that triggers N+1 fails.

# spec/spec_helper.rb
Bullet.enable = true
Bullet.raise = true  # turns N+1 into a test failure

Django: assertNumQueries(N) in the test framework counts exact queries per view:

def test_orders_page(self):
    with self.assertNumQueries(2):  # fails if N+1 sneaks in
        response = self.client.get('/orders/')

SQLAlchemy / Hibernate: enable query count statistics and assert per-test:

# SQLAlchemy — assert query count via events
# Hibernate — assert via Statistics.getPrepareStatementCount()

The pattern across all of them: PR triggers tests, tests fire requests, query counter exceeds the per-endpoint threshold, test fails, PR cannot merge.

▸Why this works

Off-CPU profiling (eBPF off-CPU profiles, Java async-profiler in wall-clock mode, JFR socket-wait events) is a fourth option. It shows the application spending most wall-time waiting on the database socket, not running code. Off-CPU profiling is the definitive instrument for the class of problem where “program is slow but CPU profile is healthy” — which is the N+1 signature.

Walking a real query log

# tail -f log/development.log | grep -E "SELECT|Loading"

Started GET "/dashboard"
User Load (0.4ms)    SELECT "users".* FROM "users" WHERE "users"."id" = 42
Project Load (0.6ms) SELECT "projects".* FROM "projects" WHERE "projects"."user_id" = 42 LIMIT 50
Task Load (0.3ms)    SELECT "tasks".* FROM "tasks" WHERE "tasks"."project_id" = 1
Task Load (0.3ms)    SELECT "tasks".* FROM "tasks" WHERE "tasks"."project_id" = 2
Task Load (0.4ms)    SELECT "tasks".* FROM "tasks" WHERE "tasks"."project_id" = 3
... (47 more Task Load lines)
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."task_id" = 1
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."task_id" = 2
... (243 more Comment Load lines)

Completed 200 OK in 980ms (Views: 250.3ms | ActiveRecord: 689.4ms)

Reading this: 980 ms total, 689 ms of which is ActiveRecord across 300+ queries. Two N+1 levels:

Level 2 (projects → tasks): 50 separate Task Load queries, one per project.
Level 3 (tasks → comments): 200–300 separate Comment Load queries, one per task.

73% of the request is off-CPU database wait spread across 300+ N+1 queries — which is exactly why a CPU profiler shows nothing.

Fix: projects.includes(tasks: :comments).limit(50) in Rails (or the equivalent in your ORM). From 300 queries to 4 — one for user, one for projects, one for tasks (IN), one for comments (IN). ActiveRecord time: 689 ms → ~15 ms.

Quiz

An APM trace shows 142 database spans per request on the /dashboard endpoint, while p99 is 1.8 s. The spans are short (1–3 ms each). What is the diagnosis?

Quiz

A PR adds a new serializer that accesses order.customer.name for each order in a list. Tests pass. What CI mechanism would have caught this N+1 before it shipped?

Order the steps

Order the layers of N+1 detection, from cheapest to find to hardest to find:

1 Development query log — enable ORM logger, count lines per request
2 CI gate — bullet/assertNumQueries fails PR before it ships
3 APM trace in staging — waterfall shows DB span columns
4 Production alert — queries_per_request p99 spikes on a live endpoint

CPU profile misses N+1 (off-CPU wait). Query log, APM waterfall, and a per-request counter surface it; a CI gate stops regressions at PR time.

Recall before you leave

01
Why does a CPU profile miss N+1, and what three instruments surface it correctly?
02
Describe the CI gate pattern for preventing N+1 regressions from shipping.

Recap

N+1 is off-CPU: the program waits between queries, not inside any function. That is why CPU profiles look healthy while the page is slow. The right instruments are the query log (repeated similar SELECTs), APM traces (tall column of short DB spans), and per-request query counters (alert at p99 > 20). The most leveraged prevention is a CI gate — Rails bullet or Django assertNumQueries — that fails PRs introducing lazy-load regressions before they reach production. Now when you see a flat CPU profile on a slow endpoint, you know to reach for the query log first — and when you ship new list views, a CI gate is the one thing that keeps N+1 from coming back.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 5 done

Connected lessons

builds on

Fix families: JOIN, IN, preload, and DataLoadermiddle

unlocks

N+1 at scale: pool exhaustion, plan changes, and denormalisationsenior

deepens into

appears again in162

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.