awesome-everything RU
↑ Back to the climb

Performance

Detecting N+1: query logs, APM traces, and CI gates

Crux N+1 hides from CPU profilers because the cost is off-CPU wait time, not compute. Query logs, APM waterfalls, and per-request counters surface it — and a CI gate prevents regressions.
Your altitude — climbing toward senior
ZeroJuniorMiddleSenior
You are at middle altitude — in the sky
◷ 13 min

The page is 1.2 seconds. The CPU flame graph is flat. Your colleague insists the profiler must be wrong. The profiler is not wrong — you are looking at the wrong instrument. N+1 lives in the off-CPU wait time between queries, not in any function.

Why CPU profiles miss N+1

A CPU profiler samples stack traces of running threads. The N+1 problem does not burn CPU in any single location. Each query dispatches in microseconds; the program then waits for the database (off-CPU); the response returns fast; the next query dispatches. The wait is the bottleneck, but the wait is off-CPU.

The profiler sees fast blips with nothing wide. The cumulative cost is spread across many fast call sites — not concentrated in one slow function. A CPU flame graph will look healthy while the page takes 1.2 seconds.

The right instruments

1. Database query log

The most direct instrument. Every major database can log every query with its duration and a timestamp.

# Postgres — log everything in development
log_min_duration_statement = 0

# MySQL slow query log
slow_query_log = 1
long_query_time = 0

# ORM-level logging
# Rails: config.log_level = :debug (ActiveRecord logger prints each query)
# Django: LOGGING = { 'loggers': { 'django.db.backends': { 'level': 'DEBUG' } } }
# SQLAlchemy: echo=True

The N+1 signature is unmistakable: dozens to hundreds of similar queries for one request, often the same SELECT differing only in the WHERE id = ? parameter.

2. APM tracing — the waterfall view

Application performance monitoring tools (Datadog APM, Grafana Tempo, New Relic, Honeycomb with auto-instrumentation) capture database spans inline with code execution spans.

The N+1 waterfall looks like a tall column of short parallel bars — 50 or 100 short database spans all within the same request’s trace. One glance at the trace view makes the pattern obvious in a way that logs alone do not.

Alert when per-endpoint p95 span count grows more than 30% week-over-week.

3. Per-request query counter

Custom middleware that counts queries per request and either logs or alerts on excess:

# Rails — middleware counts queries per request
ActiveSupport::Notifications.subscribe('sql.active_record') do
  RequestStore[:query_count] ||= 0
  RequestStore[:query_count] += 1
end

after_action do
  if RequestStore[:query_count] > 20
    Rails.logger.warn "QUERY_COUNT #{params[:controller]}##{params[:action]}: #{RequestStore[:query_count]}"
  end
end

Expose as a Prometheus histogram by endpoint. Alert when endpoint p99 exceeds 20 or grows more than 50% week-over-week.

InstrumentBest forSignature
CPU profilerCompute hotspotsWrong tool — misses N+1
Query logConfirm N+1, find call siteMany similar SELECTs per request
APM traceVisual waterfall, productionTall column of short DB spans
Query counterAlerting, per-endpoint trendsqueries_per_request p99 > 20

The CI gate — catch at PR time

The most leveraged anti-N+1 mechanism is catching regressions at PR time instead of in production. Teams running this gate report 70–90% reduction in N+1 incidents reaching production. Initial setup: 1–2 weeks. Ongoing maintenance: trivial — the gate runs in CI.

Rails: the bullet gem raises on any lazy-loaded relation accessed during request render. A controller test that triggers N+1 fails.

# spec/spec_helper.rb
Bullet.enable = true
Bullet.raise = true  # turns N+1 into a test failure

Django: assertNumQueries(N) in the test framework counts exact queries per view:

def test_orders_page(self):
    with self.assertNumQueries(2):  # fails if N+1 sneaks in
        response = self.client.get('/orders/')

SQLAlchemy / Hibernate: enable query count statistics and assert per-test:

# SQLAlchemy — assert query count via events
# Hibernate — assert via Statistics.getPrepareStatementCount()

The pattern across all of them: PR triggers tests, tests fire requests, query counter exceeds the per-endpoint threshold, test fails, PR cannot merge.

Why this works

Off-CPU profiling (eBPF off-CPU profiles, Java async-profiler in wall-clock mode, JFR socket-wait events) is a fourth option. It shows the application spending most wall-time waiting on the database socket, not running code. Off-CPU profiling is the definitive instrument for the class of problem where “program is slow but CPU profile is healthy” — which is the N+1 signature.

Walking a real query log

# tail -f log/development.log | grep -E "SELECT|Loading"

Started GET "/dashboard"
User Load (0.4ms)    SELECT "users".* FROM "users" WHERE "users"."id" = 42
Project Load (0.6ms) SELECT "projects".* FROM "projects" WHERE "projects"."user_id" = 42 LIMIT 50
Task Load (0.3ms)    SELECT "tasks".* FROM "tasks" WHERE "tasks"."project_id" = 1
Task Load (0.3ms)    SELECT "tasks".* FROM "tasks" WHERE "tasks"."project_id" = 2
Task Load (0.4ms)    SELECT "tasks".* FROM "tasks" WHERE "tasks"."project_id" = 3
... (47 more Task Load lines)
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."task_id" = 1
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."task_id" = 2
... (243 more Comment Load lines)

Completed 200 OK in 980ms (Views: 250.3ms | ActiveRecord: 689.4ms)

Reading this: 980 ms total, 689 ms of which is ActiveRecord across 300+ queries. Two N+1 levels:

  • Level 2 (projects → tasks): 50 separate Task Load queries, one per project.
  • Level 3 (tasks → comments): 200–300 separate Comment Load queries, one per task.

Fix: projects.includes(tasks: :comments).limit(50) in Rails (or the equivalent in your ORM). From 300 queries to 4 — one for user, one for projects, one for tasks (IN), one for comments (IN). ActiveRecord time: 689 ms → ~15 ms.

Quiz

An APM trace shows 142 database spans per request on the /dashboard endpoint, while p99 is 1.8 s. The spans are short (1–3 ms each). What is the diagnosis?

Quiz

A PR adds a new serializer that accesses order.customer.name for each order in a list. Tests pass. What CI mechanism would have caught this N+1 before it shipped?

Order the steps

Order the layers of N+1 detection, from cheapest to find to hardest to find:

  1. 1 Development query log — enable ORM logger, count lines per request
  2. 2 CI gate — bullet/assertNumQueries fails PR before it ships
  3. 3 APM trace in staging — waterfall shows DB span columns
  4. 4 Production alert — queries_per_request p99 spikes on a live endpoint
Recall before you leave
  1. 01
    Why does a CPU profile miss N+1, and what three instruments surface it correctly?
  2. 02
    Describe the CI gate pattern for preventing N+1 regressions from shipping.
Recap

N+1 is off-CPU: the program waits between queries, not inside any function. That is why CPU profiles look healthy while the page is slow. The right instruments are the query log (repeated similar SELECTs), APM traces (tall column of short DB spans), and per-request query counters (alert at p99 > 20). The most leveraged prevention is a CI gate — Rails bullet or Django assertNumQueries — that fails PRs introducing lazy-load regressions before they reach production. Query count is the metric that matters; p99 latency is the result.

Connected lessons
appears again in159
Continue the climb ↑DataLoader: batching across resolver trees
shortcuts expand
search
K
prev piece
k
next piece
j
cycle tier
t
this menu
?
sources3
expand
  1. 01
  2. 02
  3. 03

Trademarks belong to their respective owners. Editorial reference only.