Lab vs field: why the two disagree and how to use each

Lighthouse is a lab tool — reproducible, great for debugging. CrUX is field data from real Chrome users — and only the field drives Search ranking. Knowing which to trust for which question ends the confusion.

PageSpeed Insights shows LCP 1.8 s in the Lighthouse section: green, good. The field data section shows LCP 3.9 s at p75: far past the 2.5 s “good” threshold and just short of the 4.0 s “poor” line. The team says “the lab says we are fine.” They ship nothing. Three months later, Search Console still flags LCP on those URLs. The lab score and the ranking signal are not the same measurement. Knowing which one is the verdict changes everything.

Lab data vs field data — two different questions.

Have you ever shipped a “performance fix” only to find the ranking number unchanged three months later? That is the lab-vs-field gap in action. Lab data is a synthetic measurement: a tool like Lighthouse or DevTools loads the page once, on a simulated device and a throttled network, in a controlled environment. It is fully reproducible and it can attribute problems precisely — great for debugging. One device, one network, one moment. If you change the page and rerun, the result is comparable.

Field data is real-user monitoring: the actual Core Web Vitals experienced by real visitors on their real devices and networks, aggregated. Google’s public field dataset is CrUX — the Chrome User Experience Report. It aggregates visits from opted-in Chrome users, and the field p75 from CrUX is what Search ranking uses.
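As a rough sketch of turning a field p75 into a verdict: the code below reads p75 from an object shaped like a CrUX API record and rates it against Google's published Core Web Vitals thresholds. The `CruxResponse` interface is a simplified assumption covering only the fields used here, and `sampleResponse` is hand-written illustration data, not a real query result.

```typescript
type Rating = "good" | "needs-improvement" | "poor";
type MetricName =
  | "largest_contentful_paint"
  | "interaction_to_next_paint"
  | "cumulative_layout_shift";

// Published Core Web Vitals boundaries: at or below `good` is good, above `poor` is poor.
const THRESHOLDS: Record<MetricName, { good: number; poor: number }> = {
  largest_contentful_paint: { good: 2500, poor: 4000 }, // milliseconds
  interaction_to_next_paint: { good: 200, poor: 500 }, // milliseconds
  cumulative_layout_shift: { good: 0.1, poor: 0.25 }, // unitless score
};

// Assumed, simplified slice of a CrUX API record (only what this sketch reads).
interface CruxResponse {
  record: {
    metrics: Partial<Record<MetricName, { percentiles: { p75: number | string } }>>;
  };
}

function rateP75(metric: MetricName, p75: number): Rating {
  const limits = THRESHOLDS[metric];
  if (p75 <= limits.good) return "good";
  if (p75 <= limits.poor) return "needs-improvement";
  return "poor";
}

// Returns undefined when the record has no data for the metric.
function fieldVerdict(response: CruxResponse, metric: MetricName): Rating | undefined {
  const entry = response.record.metrics[metric];
  if (entry === undefined) return undefined;
  // Number() handles p75 values that arrive as strings (for example CLS).
  return rateP75(metric, Number(entry.percentiles.p75));
}

// Hand-written illustration data using the 3.9 s LCP from the opening scenario.
const sampleResponse: CruxResponse = {
  record: {
    metrics: {
      largest_contentful_paint: { percentiles: { p75: 3900 } },
      cumulative_layout_shift: { percentiles: { p75: "0.05" } },
    },
  },
};

// 3.9 s sits in the needs-improvement band; poor starts above 4.0 s.
const lcpVerdict = fieldVerdict(sampleResponse, "largest_contentful_paint");
```

Keeping the thresholds in one table makes it easy to apply the same rating logic to p75 values from your own RUM pipeline.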

They disagree routinely because they measure the same metrics under different conditions. A lab run on one simulated device can report an LCP that the slowest quarter of real users, on older phones and congested 4G, never come close to. INP barely exists in the lab at all: a standard page-load run involves no user interaction, so there is nothing to measure, and the lab proxy is Total Blocking Time (TBT). CLS in the lab is close to real, but shifts that happen only during real sessions (after scrolling, or when late dynamic content arrives) can be missed.

Lab vs field — key differences

Lab (Lighthouse, DevTools): Synthetic, reproducible, debuggable
Field (CrUX, web-vitals RUM): Real users, real devices, real networks
What drives Search ranking: CrUX field p75
Lab INP proxy: Total Blocking Time (TBT)
Rule: Debug in lab; judge in field

Both sources measure the same metrics, but they answer different questions. Lab is reproducible, so it is for debugging and CI regression gates; the field p75 from CrUX is the only number Search ranking uses. Debug in the lab, judge in the field.

The rule: debug in the lab, judge in the field.

Use the lab to reproduce and attribute a problem (fast, deterministic, easy to bisect commits), and to gate regressions in CI (Lighthouse CI runs on every PR, fast, no real users needed). Use the field p75 as the verdict on whether a page is actually good for users and whether it will rank. A lab improvement that does not move the field p75 did not help your users. Only the field number is the truth.
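A CI regression gate can be as simple as comparing a lab report against budgets. The sketch below assumes a Lighthouse JSON result where each audit exposes a `numericValue`; the `BUDGETS` values and `sampleReport` are hypothetical numbers chosen for illustration, and `checkBudgets` is not part of Lighthouse CI itself.

```typescript
// Assumed slice of a Lighthouse JSON result: audits keyed by id.
interface LighthouseResult {
  audits: Record<string, { numericValue?: number }>;
}

interface Budget {
  auditId: string; // Lighthouse audit id
  max: number; // largest acceptable numericValue
}

// Hypothetical budgets; tune them to your own baseline.
const BUDGETS: Budget[] = [
  { auditId: "largest-contentful-paint", max: 2500 }, // ms
  { auditId: "total-blocking-time", max: 200 }, // ms, the lab proxy for INP
  { auditId: "cumulative-layout-shift", max: 0.1 },
];

// Returns one message per violated budget; an empty array means the gate passes.
function checkBudgets(report: LighthouseResult, budgets: Budget[] = BUDGETS): string[] {
  const failures: string[] = [];
  for (const budget of budgets) {
    const audit = report.audits[budget.auditId];
    const value = audit === undefined ? undefined : audit.numericValue;
    if (value === undefined) {
      failures.push(`${budget.auditId}: missing from report`);
    } else if (value > budget.max) {
      failures.push(`${budget.auditId}: ${value} exceeds budget ${budget.max}`);
    }
  }
  return failures;
}

// Illustrative report: LCP looks fine, but main-thread blocking regressed.
const sampleReport: LighthouseResult = {
  audits: {
    "largest-contentful-paint": { numericValue: 1800 },
    "total-blocking-time": { numericValue: 450 },
    "cumulative-layout-shift": { numericValue: 0.02 },
  },
};

const gateFailures = checkBudgets(sampleReport);
// gateFailures holds a single entry, for total-blocking-time
```

A gate like this catches regressions before release; whether the release actually helped users is still decided by the field p75.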

The supporting diagnostic metrics.

The three Core Web Vitals sit on top of a layer of diagnostic metrics that are not scored for ranking but are essential for understanding them:

TTFB (Time to First Byte): the server-and-network portion. It is the floor under LCP: LCP can never be faster than TTFB.
FCP (First Contentful Paint): when the first piece of content (any text or image) is painted, vs LCP’s largest. A large FCP-to-LCP gap means the page shows something quickly but the main content lags, often a sign that the LCP resource is discovered or loaded late.
TBT (Total Blocking Time): the sum of the blocking portion (everything beyond 50 ms) of every long task during load. This is the lab proxy for INP: a page-load lab run contains no real interactions, so it cannot measure true INP, and CI gates use TBT instead. A high TBT usually predicts a high INP, because both come from a busy main thread, but the correlation is not exact.
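The TBT definition above reduces to a few lines of arithmetic. This is a minimal sketch that assumes every task in the input falls inside the measured load window; the task durations are made-up values.

```typescript
// A task is "long" when it runs past 50 ms; only the excess counts as blocking.
const LONG_TASK_THRESHOLD_MS = 50;

function totalBlockingTime(taskDurationsMs: number[]): number {
  let total = 0;
  for (const duration of taskDurationsMs) {
    if (duration > LONG_TASK_THRESHOLD_MS) {
      total += duration - LONG_TASK_THRESHOLD_MS;
    }
  }
  return total;
}

// Made-up main-thread tasks: 30 ms and 50 ms contribute nothing,
// 120 ms contributes 70 ms, 250 ms contributes 200 ms.
const exampleTbt = totalBlockingTime([30, 120, 250, 50]); // 270
```

Splitting one 250 ms task into five 50 ms tasks would bring its contribution to zero, which is why breaking up long tasks lowers TBT.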

Together, TTFB, FCP, and TBT form the diagnostic layer under the three ranked vitals. Without them, a Lighthouse report is a verdict with no explanation; with them, you can see where in the chain the time went. Lighthouse never just says “LCP is bad”: it reports TTFB and FCP alongside it.

Reading a DevTools performance trace.

Open DevTools, switch to the Performance panel, and record a page load and an interaction. The trace shows several tracks:

Timings track: marks FCP, LCP, and DCL with vertical lines. Hovering the LCP marker names the exact element.
Layout Shifts track (the Experience track in older DevTools versions): each layout shift appears as a marker; click one to see which nodes shifted. The Frames track shows each rendered frame.
Main track: flame chart of main-thread work; long tasks are flagged with a red corner.
Interactions track: click/tap events, broken into input delay, processing, and presentation delay — color-coded — so you can see which part dominates for each interaction without guessing.
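The three-part split in the Interactions track can also be computed from event timestamps. The sketch below assumes an object with the same timing fields as a browser `PerformanceEventTiming` entry (`startTime`, `processingStart`, `processingEnd`, `duration`); the sample numbers are invented for illustration.

```typescript
// Timing fields modeled on PerformanceEventTiming (all in milliseconds).
interface InteractionTiming {
  startTime: number; // when the user input occurred
  processingStart: number; // when event handlers began running
  processingEnd: number; // when event handlers finished
  duration: number; // input to next paint
}

type InteractionPhase = "inputDelay" | "processing" | "presentationDelay";

function splitInteraction(entry: InteractionTiming): Record<InteractionPhase, number> {
  return {
    inputDelay: entry.processingStart - entry.startTime,
    processing: entry.processingEnd - entry.processingStart,
    presentationDelay: entry.startTime + entry.duration - entry.processingEnd,
  };
}

// Names the phase that contributes the most time to the interaction.
function dominantPhase(phases: Record<InteractionPhase, number>): InteractionPhase {
  const order: InteractionPhase[] = ["inputDelay", "processing", "presentationDelay"];
  let worst: InteractionPhase = "inputDelay";
  for (const phase of order) {
    if (phases[phase] > phases[worst]) worst = phase;
  }
  return worst;
}

// Invented example: handlers are quick, but the main thread was busy when the tap arrived.
const slowTap: InteractionTiming = {
  startTime: 1000,
  processingStart: 1180,
  processingEnd: 1230,
  duration: 280,
};

const tapPhases = splitInteraction(slowTap); // 180 ms, 50 ms, 50 ms
const tapBottleneck = dominantPhase(tapPhases); // "inputDelay"
```

A dominant input delay points at long tasks already running when the user acted; dominant processing points at the handlers themselves; dominant presentation delay points at rendering work after the handlers finish.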

The INP workflow: interact during the recording, find the interaction in the Interactions track, read the three-part split. The LCP workflow: find the LCP marker in the Timings track, hover to identify the element, then look at the network waterfall above it to see when the resource was requested vs when it downloaded vs when the element painted.
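The LCP workflow amounts to splitting the LCP time into four phases: TTFB, resource load delay, resource load duration, and element render delay. The sketch below assumes you have read the four timestamps off the trace by hand; the field names and sample values are illustrative, not a DevTools export format.

```typescript
// Timestamps in milliseconds from navigation start, read from a trace.
interface LcpTimestamps {
  ttfb: number; // first byte of the document arrived
  resourceRequestStart: number; // LCP resource was requested
  resourceResponseEnd: number; // LCP resource finished downloading
  lcpTime: number; // LCP element painted
}

type LcpPhase = "ttfb" | "loadDelay" | "loadDuration" | "renderDelay";

function lcpPhases(t: LcpTimestamps): Record<LcpPhase, number> {
  return {
    ttfb: t.ttfb,
    loadDelay: t.resourceRequestStart - t.ttfb, // waiting to be discovered
    loadDuration: t.resourceResponseEnd - t.resourceRequestStart, // downloading
    renderDelay: t.lcpTime - t.resourceResponseEnd, // downloaded but not yet painted
  };
}

function slowestPhase(phases: Record<LcpPhase, number>): LcpPhase {
  const order: LcpPhase[] = ["ttfb", "loadDelay", "loadDuration", "renderDelay"];
  let worst: LcpPhase = "ttfb";
  for (const phase of order) {
    if (phases[phase] > phases[worst]) worst = phase;
  }
  return worst;
}

// Illustrative trace: the hero image is requested 1.5 s after the first byte.
const exampleTrace: LcpTimestamps = {
  ttfb: 600,
  resourceRequestStart: 2100,
  resourceResponseEnd: 2600,
  lcpTime: 2800,
};

const examplePhases = lcpPhases(exampleTrace); // 600, 1500, 500, 200
const exampleBottleneck = slowestPhase(examplePhases); // "loadDelay"
```

The four phases always add up to the LCP time, so shrinking the largest one is the most direct way to move the metric.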

Why this works

Why is p75 used rather than the median? The median describes only the typical visit: half of all visits are slower than it, and it is easily anchored by fast ones such as developers on fast Wi-Fi and cached repeat visits. The 75th percentile focuses attention on the slow quarter of real users: those on older phones, mobile networks, or less powerful CPUs. A change that improves the median but not p75 helped the already-fast users and did nothing for the users who needed it most. Targeting p75 aims the optimization at the slow tail of the distribution, not its center.
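A small worked example makes the median-versus-p75 point concrete. The sketch uses the nearest-rank percentile method, which is an assumption made for simplicity and not necessarily how CrUX aggregates; the LCP samples are invented.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Invented LCP samples in milliseconds, one per visit.
const lcpSamplesMs = [1200, 1400, 1500, 1600, 1800, 1900, 2100, 3900, 4200, 5000];

const medianBefore = percentile(lcpSamplesMs, 50); // 1800
const p75Before = percentile(lcpSamplesMs, 75); // 3900

// A hypothetical "fix" that saves 300 ms, but only for visits that were already fast.
const afterFix = lcpSamplesMs.map((ms) => (ms < 2000 ? ms - 300 : ms));

const medianAfter = percentile(afterFix, 50); // 1500
const p75After = percentile(afterFix, 75); // 3900
```

The median improves by 300 ms while p75 does not move at all, so this change would look like a win on a median dashboard and leave the field verdict unchanged.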

Trace it

PageSpeed Insights shows LCP 1.8 s in the Lighthouse (lab) section, rated good, but the field data section shows LCP 3.9 s at p75, well outside the good range. The team says “the lab says we are fine”. Who is right?

The field data is the real verdict: the lab ran on one simulated device and connection, while real users at p75 are on slower phones and networks, and a quarter of real visits see 3.9 s or worse

The lab is right because it is reproducible

They are both wrong; only TTFB matters

Average the two: 2.85 s

Complete the analogy

Google publishes a public dataset of real-user Core Web Vitals, aggregated from actual Chrome visits, and Search ranking is based on it — not on any lab test. What is that field dataset called?

Compute it

A 'good' INP score requires interaction latency at or below how many milliseconds, at the 75th percentile?

Recall before you leave

01. Explain the lab-vs-field distinction and give the rule for when to use each.
02. Why is Total Blocking Time used as the INP proxy in CI, rather than measuring INP directly?
03. In a DevTools performance trace, what does the Interactions track show for a slow interaction, and how do you use it to diagnose INP?

Recap

Lab data — from Lighthouse or DevTools — is synthetic: one device, one network, reproducible, great for debugging and CI regression gates. Field data — CrUX — is from real Chrome users at p75, and it is what Search ranking uses. They disagree routinely because real user diversity (slower phones, congested networks) is not captured in a single simulated run. INP barely exists in the lab because lab tools do not simulate real interactions; the lab proxy is Total Blocking Time. The rule is: debug in the lab, judge in the field. TTFB, FCP, and TBT are the supporting diagnostic metrics under the three Core Web Vitals — knowing them is what lets you read a Lighthouse report and trace where in the chain time was lost. A lab improvement that does not move the field p75 did not help your users; only the field number is the real verdict. Now when you see a green Lighthouse score alongside a red Search Console report, you will know exactly why they disagree — and which one to act on.
