Networking & Protocols NET · 10 · 07

Deployment tradeoffs and CPU cost

QUIC's user-space packet processing costs 15–30% more CPU per byte than kernel TCP, drops up to 45% goodput on fast links, and requires HTTP/2 fallback for the 3–5% of networks that block UDP.

NET Senior ◷ 14 min

Your CDN rolls out HTTP/3. CPU utilization on edge servers spikes 25%, goodput on 1 Gbps paths drops by nearly half, and 4% of clients fall back silently to HTTP/2 with no error visible anywhere. QUIC is genuinely better for latency — and genuinely expensive to operate.

CPU cost: why user-space transport is expensive

QUIC runs in user space — every packet goes through application code, not the kernel’s NIC-to-socket fast path. The cost components are:

Per-packet AES-GCM encryption: on the order of 1 cycle per byte with AES-NI (well above 10 without hardware acceleration), plus per-packet setup and header protection, all paid in user space. Kernel TCP can push TLS record encryption into the kernel or the NIC (kTLS, NIC TLS offload). QUIC generally cannot, because mainstream NICs do not speak QUIC yet.
Variable-length integer decoding: packet numbers, stream IDs, and frame lengths use QUIC's VarInt encoding. Each decode branches on a 2-bit length prefix in user space.
Payload framing: STREAM, CRYPTO, and ACK frames are serialized/deserialized per packet in user space. Retransmitted frames must be re-serialized with new packet numbers.
Syscall overhead: without batching, each UDP sendmsg() is a syscall. At 1 Gbps with 1500-byte packets that's ~83 k syscalls/s to fill the link.
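To make the VarInt cost concrete, here is a minimal sketch of the decoder described in RFC 9000, Section 16: the top two bits of the first byte select a length of 1, 2, 4, or 8 bytes, and the remaining bits hold the value in network byte order. The function name and return shape are illustrative and not taken from any particular QUIC library.

```python
def decode_varint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode one QUIC variable-length integer.

    Returns (value, bytes_consumed). Illustrative helper, not a library API.
    """
    first = buf[offset]
    length = 1 << (first >> 6)          # prefix 00/01/10/11 -> 1/2/4/8 bytes
    if offset + length > len(buf):
        raise ValueError("buffer too short for encoded length")
    value = first & 0x3F                # drop the 2-bit length prefix
    for byte in buf[offset + 1 : offset + length]:
        value = (value << 8) | byte     # remaining bytes, big-endian
    return value, length


# Sample encodings taken from RFC 9000.
print(decode_varint(bytes.fromhex("25")))        # (37, 1)
print(decode_varint(bytes.fromhex("7bbd")))      # (15293, 2)
print(decode_varint(bytes.fromhex("9d7f3e7d")))  # (494878333, 4)
```

The branch on the prefix is cheap in isolation. The cost comes from running it several times per packet, for every packet, on the application's CPU budget.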

Measured impact: On a 1 Gbps LAN link at full rate, QUIC saturates a CPU core before the link fills. Goodput drops up to ~45% vs HTTP/2 on TCP. On slow or lossy paths (typical mobile) the network, not the CPU, is the bottleneck, so the overhead is not measurable in practice.

Together these four cost components explain why QUIC is fast in latency but expensive in throughput: each one is work that kernel TCP can offload to the NIC or handle in a single system call, but QUIC must do in user space on every packet.

QUIC CPU cost: path matters

Fast LAN (1 Gbps): CPU bottleneck first. Goodput −45% vs TCP. HoL benefit: negligible. Use TCP.

WAN / API (100 ms RTT): Network bottleneck. CPU overhead: invisible. Handshake save: +100 ms. Use QUIC.

Mobile 4G (lossy): Network bottleneck. CPU overhead: invisible. HoL save: 575× at 0.5% loss. Use QUIC.

Mitigations: UDP GSO, NIC offload, core affinity

UDP Generic Segmentation Offload (UDP GSO): Instead of one sendmsg() per 1500-byte packet, hand the kernel up to 64 KB of QUIC payload in a single syscall with a GSO segment-size hint. The kernel segments it into individual UDP datagrams before NIC DMA. Result: ~3–4 syscalls per 64 KB in practice instead of ~43. Cloudflare reports ~20% CPU gain from GSO alone.
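The arithmetic behind those numbers, and the socket option that turns GSO on, can be sketched as follows. This is a hedged illustration: `UDP_SEGMENT = 103` is the Linux value from `<linux/udp.h>` and is written as a literal here because it is not assumed to be exported by Python's `socket` module. The helper names are hypothetical, and a production QUIC stack would normally set the segment size per send via ancillary data rather than per socket.

```python
import socket

UDP_SEGMENT = 103  # Linux-only option from <linux/udp.h>; literal is an assumption of this sketch


def try_enable_udp_gso(sock: socket.socket, segment_size: int) -> bool:
    """Ask the kernel to split large sends into segment_size-byte datagrams.

    Returns False where the option is unsupported (non-Linux or older kernels).
    """
    try:
        sock.setsockopt(socket.IPPROTO_UDP, UDP_SEGMENT, segment_size)
        return True
    except OSError:
        return False


def packets_per_second(link_bps: int, datagram_bytes: int) -> int:
    """Datagrams (and, unbatched, sendmsg() calls) needed per second to fill a link."""
    return link_bps // (datagram_bytes * 8)


def full_size_datagrams(batch_bytes: int, datagram_bytes: int) -> int:
    """Full-size datagrams in one batch, i.e. unbatched syscalls per batch."""
    return batch_bytes // datagram_bytes


print(packets_per_second(1_000_000_000, 1500))  # 83333 -> the ~83 k syscalls/s figure
print(full_size_datagrams(64 * 1024, 1500))     # 43 -> syscalls per 64 KB without GSO
```

A caller would create a UDP socket, call `try_enable_udp_gso(sock, 1200)` with its chosen QUIC datagram size, and fall back to per-datagram sends when it returns False.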

NIC QUIC offload (Intel E810 and newer): Parse QUIC long/short headers in silicon and steer packets to the correct QUIC connection (by connection ID) without involving user-space demux. Stream IDs sit inside the encrypted payload, so the NIC cannot route by stream. Reduces per-packet interrupt overhead. Still experimental as of 2026 but available in cloud-optimized NICs.

Core affinity: Keep each QUIC worker, and the connections it owns, on the same physical core. QUIC state (connection table, cc windows, key material) then stays warm in that core's caches. After a cross-core migration the new core has to refetch that state, adding ~50 ns per packet.
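A minimal sketch of pinning, assuming a Linux host: `os.sched_setaffinity` is only available on platforms that support it, so the helper below (a hypothetical name) returns None elsewhere instead of failing. Real deployments usually pin one worker per core and combine this with NIC queue steering, which is outside this sketch.

```python
import os


def pin_current_process(core_id: int):
    """Pin the calling process (on Linux, strictly the calling thread) to one core.

    Returns the resulting affinity set, or None where the API is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. macOS or Windows
    allowed = os.sched_getaffinity(0)
    if core_id not in allowed:
        raise ValueError(f"core {core_id} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {core_id})
    return os.sched_getaffinity(0)
```

Calling `pin_current_process(2)` at worker start-up keeps the scheduler from migrating that worker, at the price of losing the scheduler's freedom to balance load across cores.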

With GSO + affinity, per-byte CPU overhead vs kernel TCP drops from ~30% to 15–20%.

The CPU tax is not fixed: batching syscalls with UDP GSO recovers ~20%, and adding core pinning for cache locality takes the overhead from ~30% down to 15–20%.

Quiz

Why does QUIC's CPU overhead hurt throughput on a 1 Gbps LAN but not on a 10 Mbps mobile link?

UDP blocking and mandatory HTTP/2 fallback

~3–5% of networks block UDP outright — corporate proxies, certain ISP gateways, some LTE contexts. When QUIC is silently dropped (no ICMP error, just lost packets) the client’s only signal is a timeout — typically 1–3 seconds before fallback to TCP.

Browser racing: Modern browsers start both QUIC (UDP 443) and TCP (443) simultaneously. Whichever handshake completes first wins. This reduces the UX penalty from UDP blocking to nearly zero: TCP wins and QUIC quietly loses the race. The downside: wasted effort on every connection if QUIC is consistently blocked on a path.
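The racing logic can be sketched with asyncio. Everything here is simulated: `simulated_handshake` stands in for a real QUIC or TCP+TLS handshake, the delays are made-up parameters, and a blocked UDP path is modelled as a handshake that never completes. It illustrates the first-to-finish pattern and is not how any specific browser implements it.

```python
import asyncio


async def simulated_handshake(protocol: str, delay_s: float, blocked: bool = False) -> str:
    """Stand-in for a real handshake; a blocked path never answers."""
    if blocked:
        await asyncio.Event().wait()  # silently dropped: no error, no reply
    await asyncio.sleep(delay_s)
    return protocol


async def race_handshakes(quic_delay_s: float, tcp_delay_s: float,
                          quic_blocked: bool = False, timeout_s: float = 3.0) -> str:
    """Start both handshakes, return the protocol that finishes first."""
    tasks = {
        asyncio.create_task(simulated_handshake("h3", quic_delay_s, quic_blocked)),
        asyncio.create_task(simulated_handshake("h2", tcp_delay_s)),
    }
    done, pending = await asyncio.wait(
        tasks, timeout=timeout_s, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # the loser's work is discarded: the "wasted effort"
    await asyncio.gather(*pending, return_exceptions=True)
    if not done:
        raise TimeoutError("neither handshake completed")
    return next(iter(done)).result()


print(asyncio.run(race_handshakes(0.01, 0.05)))                     # h3
print(asyncio.run(race_handshakes(0.01, 0.05, quic_blocked=True)))  # h2
```

With racing, the blocked case costs only the TCP handshake time, not a multi-second QUIC timeout.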

Alt-Svc discovery: HTTP/3 is advertised via Alt-Svc: h3=":443"; ma=3600 in the HTTP/2 response. The browser caches this and attempts QUIC on the next connection. A client with no cached advertisement therefore starts on TCP, discovers the header, and upgrades on subsequent connections.
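A client-side cache needs to pull the protocol, authority, and lifetime out of that header. The parser below is a simplified sketch: it splits on commas and semicolons and does not handle those characters inside quoted strings, which a complete RFC 7838 parser would. The function name and dictionary keys are illustrative, and the 24-hour default for a missing `ma` follows RFC 7838.

```python
def parse_alt_svc(header_value: str) -> list[dict]:
    """Parse an Alt-Svc header value into a list of advertised alternatives."""
    if header_value.strip() == "clear":
        return []  # "clear" invalidates all cached alternatives
    services = []
    for entry in header_value.split(","):
        parts = [p.strip() for p in entry.split(";") if p.strip()]
        if not parts or "=" not in parts[0]:
            continue  # skip malformed entries in this sketch
        protocol, authority = parts[0].split("=", 1)
        params = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
        services.append({
            "protocol": protocol.strip(),
            "authority": authority.strip().strip('"'),
            "max_age_s": int(params.get("ma", 86400)),  # default freshness: 24 h
        })
    return services


print(parse_alt_svc('h3=":443"; ma=3600'))
# [{'protocol': 'h3', 'authority': ':443', 'max_age_s': 3600}]
```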

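Fallback requirement: RFC 9000 (QUIC transport) does not itself require a TCP fallback. The guidance is in RFC 9114 (HTTP/3): when a QUIC connection cannot be established, for example because UDP is blocked, clients SHOULD attempt a TCP-based version of HTTP. Operationally the fallback is mandatory: a deployment with no HTTP/2 (or HTTP/1.1) endpoint on TCP will break on blocked networks.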

Quiz

A user on a corporate network reports that loading your site is 2 seconds slower than usual. QUIC is enabled. What is the most likely cause and how do you verify?

Deployment reality 2026

~21% of web traffic runs HTTP/3. ~35–40% of major sites advertise it via Alt-Svc. Adoption is bimodal:

Mobile browsers default-enable HTTP/3 — latency and HoL elimination matter on cellular.
Desktop/LAN still races QUIC vs TCP; TCP often wins because round-trips are short and HoL blocking is rare on fast paths.

Major CDNs (Cloudflare, Google, Akamai, Fastly) enable HTTP/3 by default. Browsers (Chrome, Safari, Firefox) support it. The adoption curve will steepen as UDP blocking becomes rarer, hardware offload reduces CPU cost, and fallback racing becomes universal.

Debug this

QUIC packet trace — diagnose encryption-level and loss issues

$ quictrace capture.pcapng | head -20
timestamp=0.000 dcid=12345678 type=Initial pkt_num=0 frames=[Crypto[0..120], Padding]
timestamp=0.045 dcid=12345678 type=Initial pkt_num=1 frames=[Crypto[120..240], Padding] # Retransmit (no ACK in time)
timestamp=0.051 scid=87654321 dcid=12345678 type=Initial pkt_num=0 frames=[Crypto[0..200], Ack[0], Padding]
timestamp=0.052 dcid=87654321 type=Handshake pkt_num=0 frames=[Crypto[200..350]]
timestamp=0.100 scid=87654321 dcid=12345678 type=Handshake pkt_num=0 frames=[Crypto[350..400], Finished]
timestamp=0.101 dcid=87654321 type=1RTT pkt_num=0 frames=[Stream(0, fin, 4096 bytes)]
timestamp=0.151 scid=87654321 dcid=12345678 type=1RTT pkt_num=0 frames=[Stream(0, fin, bytes 0..4095), Ack[0]]

The client sees one Initial retransmit before the server's Initial arrives. The Handshake then flows normally. What does this indicate?

Observability gaps

QUIC’s encryption prevents packet inspection — tcpdump shows only opaque blobs. Traditional network monitoring (per-flow HTTP request counts, slow client detection, misbehavior at flow level) breaks.

Adapting the stack:

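Applications export QUIC traces as JSON in the qlog format (an IETF QUIC working group specification; RFC 9312 is a separate document on QUIC manageability for operators): connection lifecycle, packet numbers, CC events.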
Browsers report PerformanceResourceTiming.nextHopProtocol = "h3" for HTTP/3 connections.
Cloud providers (AWS, GCP) are adding QUIC-aware flow metrics.
eBPF probes on userspace QUIC sockets can reconstruct packet timing without decryption.

The trade-off is intrinsic: encryption buys privacy and security at the cost of operational opacity.

Pick the best fit

A CDN must choose between deploying QUIC for a latency-sensitive API (small request/response, intercontinental) vs. a high-throughput static asset service (1 Gbps, LAN clients).

Design challenge

Design the deployment strategy for rolling out HTTP/3 and QUIC to a global CDN serving both mobile (90% traffic) and desktop (10% traffic), with current HTTP/2 TCP infrastructure.

Existing HTTP/2 deployment is stable and well-tuned; no breaking changes to TCP paths.
Mobile clients are diverse (iOS 14+, Android 5+, various browsers); some networks are QUIC-blocked.
CPU budget for QUIC: no more than 20% overhead vs. current HTTP/2.
Observability: measure QUIC adoption rate, fallback rate, and latency improvement per client device.

Reference answer

Phase 1 (Month 1–3): Enable QUIC on edge servers in non-blocking networks (APAC, US, EU datacenters). Advertise HTTP/3 via Alt-Svc to mobile browsers only (Alt-Svc: h3=":443"; ma=3600). Measure: Alt-Svc discovery rate, QUIC connection rate, fallback rate. Expect 10–20% adoption on first browser restart.

Phase 2 (Month 4–6): Deploy UDP GSO on all edge servers; drop CPU overhead to 15%. Expand QUIC advertisement to all clients but race QUIC and TCP in the browser (both initiate, first-to-connect wins). Measure: QUIC win rate, latency improvement (expect 30–55% on mobile due to HoL fix; 5–10% on desktop due to handshake save).

Phase 3 (Month 7–12): Monitor fallback rate from QUIC-blocked networks (expect 3–5%). For blocked paths add explicit HTTP/2 fallback after a 2-second QUIC timeout (shorter than browser default 1-minute timeout). Redirect blocked clients to an HTTP/2-only Alt-Svc to prevent repeated failures.

Phase 4 (Month 13+): QUIC becomes the default; TCP remains for fallback and legacy clients. Steady-state: ~60–70% QUIC, ~30–40% TCP.

Observability: track (1) Alt-Svc adoption, (2) QUIC-to-TCP fallback rate (alert if > 8%), (3) p50/p99 latency by protocol by client device, (4) CPU cost per protocol.
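The fallback-rate alert in the observability item can be sketched as below. The record shape, a pair of (attempted HTTP/3, negotiated protocol), is a hypothetical log format chosen for illustration. The 8% threshold comes from the reference answer above.

```python
def fallback_rate(connections) -> float:
    """Share of connections that attempted HTTP/3 but ended up on another protocol.

    `connections` is an iterable of (attempted_h3: bool, negotiated: str) pairs,
    an assumed record format for this sketch.
    """
    attempted = 0
    fell_back = 0
    for attempted_h3, negotiated in connections:
        if not attempted_h3:
            continue  # never tried QUIC, so it cannot count as a fallback
        attempted += 1
        if negotiated != "h3":
            fell_back += 1
    return fell_back / attempted if attempted else 0.0


def should_alert(rate: float, threshold: float = 0.08) -> bool:
    """Alert when the fallback rate exceeds the threshold (8% by default)."""
    return rate > threshold


sample = [(True, "h3")] * 96 + [(True, "h2")] * 4 + [(False, "h2")] * 50
rate = fallback_rate(sample)
print(rate, should_alert(rate))  # 0.04 False
```

Excluding connections that never attempted QUIC keeps legacy clients from diluting the signal. Segmenting the same calculation by network or ASN helps separate genuine UDP blocking from a deployment bug.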

Should cover

Mobile benefits from QUIC's latency; desktop from TCP's throughput + familiarity. Different tiers get different protocols.
Alt-Svc discovery requires a prior HTTP/2 request; race both protocols in parallel for new clients to avoid fallback latency.
QUIC-blocked networks exist; explicit fallback after a short timeout (2–3s) avoids hanging the user.
UDP GSO is critical for CPU cost control. Without it, QUIC is too expensive for high-throughput CDNs.
Measure fallback rate continuously. If > 5%, investigate whether it is genuine UDP blocking or a deployment bug.
Connection semantics differ: HTTP/2 over TCP has persistent TCP state; HTTP/3 over QUIC moves state to QUIC. Ensure your load balancer and observability stack understand both.

QUIC deployment reality 2026

CPU overhead vs kernel TCP: 15–30% per byte
Goodput loss on 1 Gbps fast links: up to ~45% vs HTTP/2
UDP GSO CPU gain (Cloudflare): ~20% per connection
Networks blocking UDP: ~3–5%
Web traffic running HTTP/3 (2026): ~21%
Major sites advertising HTTP/3: ~35–40%
QUIC timeout before TCP fallback (without racing): 1–3 s

Why this works

Why not add QUIC to the kernel? Linux has experimental in-kernel QUIC patches, but the community is divided. Kernel TCP benefits from NIC offload (kTLS, GRO, RSS) built over decades. Replicating this for QUIC would take years and couples QUIC’s evolution to the kernel release cycle — the opposite of what RFC 9000 intended. User-space QUIC can ship new CC algorithms weekly; kernel QUIC cannot. The CPU cost is the price of agility.

Because QUIC lives in user space, each packet pays AES-GCM, VarInt decode, framing, and a syscall on the CPU — work that kernel TCP hands to the NIC (kTLS, GSO, RSS). This is the 15–30% per-byte overhead, and on fast links the CPU saturates before the link fills.

Recall before you leave

01 Why does QUIC's CPU overhead hurt 1 Gbps LAN throughput but not 10 Mbps mobile throughput?
02 What is UDP GSO and why does Cloudflare report ~20% CPU gain from it?
03 A CDN sees 4% of QUIC connections fail silently with no error. What is the likely cause and fix?

Recap

QUIC's user-space architecture delivers latency and HoL wins at a real CPU cost: 15–30% more per byte than kernel TCP, and up to ~45% goodput loss on fast 1 Gbps links where the CPU, not the network, is the bottleneck. UDP GSO batches syscalls and recovers ~20% CPU; NIC offload and core affinity push it further. About 3–5% of networks block UDP silently, which makes browser-side TCP racing and an HTTP/2 fallback path essential. As of 2026, ~21% of web traffic runs HTTP/3 with bimodal adoption: mobile benefits clearly, while on desktop QUIC races TCP and often loses. QUIC encryption breaks traditional packet inspection; qlog (an IETF QUIC working group format), browser timing APIs, and eBPF probes are the replacement observability stack. The right deployment strategy: QUIC for latency-sensitive WAN and mobile paths, TCP for high-throughput LAN and static assets.

Now when you see a QUIC deployment spike edge-server CPU by 25%, you will know where to look first: whether UDP GSO is enabled, whether core affinity is set, and whether the traffic profile is latency-bound mobile or throughput-bound LAN, because the fix is different in each case.
