
Networking & Protocols

Deployment tradeoffs and CPU cost

Crux: QUIC's user-space packet processing costs 15–30% more CPU per byte than kernel TCP, cuts goodput by up to ~45% on fast links, and needs HTTP/2-over-TCP fallback for the 3–5% of networks that block UDP.

Your CDN rolls out HTTP/3. CPU utilization on edge servers spikes 25%, goodput on 1 Gbps paths drops by nearly half, and 4% of clients fall back silently to HTTP/2 with no error visible anywhere. QUIC is genuinely better for latency — and genuinely expensive to operate.

CPU cost: why user-space transport is expensive

QUIC runs in user space — every packet goes through application code, not the kernel’s NIC-to-socket fast path. The cost components are:

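  1. Per-packet encryption and header protection: every packet is sealed individually with AES-GCM, and its header is masked in a separate step. For TLS over TCP, record encryption can be moved into the kernel (kTLS) and, on supporting NICs, offloaded to hardware. QUIC generally cannot use such offload because most NICs don't speak QUIC yet.
  2. Variable-length integer decoding: stream IDs, offsets, and frame lengths use QUIC's variable-length integer encoding (RFC 9000, Section 16), so each field decode is a data-dependent branch in user space. Packet numbers use a separate truncated 1–4 byte encoding that must be expanded after header protection is removed.
  3. Payload framing: STREAM, CRYPTO, and ACK frames are serialized and deserialized per packet in user space. QUIC never retransmits a packet as-is; lost frames are re-serialized into new packets with new packet numbers.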
  4. Syscall overhead — without batching, each UDP sendmsg() is a syscall. At 1 Gbps with 1500-byte packets that’s ~83 k syscalls/s per core.
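To make item 2 concrete, here is a minimal Python sketch of QUIC's variable-length integer codec as defined in RFC 9000, Section 16. The two high bits of the first byte select a 1-, 2-, 4-, or 8-byte encoding, so every decode starts with a data-dependent length check. This is an unoptimized illustration; production stacks implement the same logic in C or Rust.

```python
from typing import Tuple


def decode_varint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one QUIC variable-length integer (RFC 9000, Section 16).

    Returns (value, bytes_consumed). The top two bits of the first byte
    select the encoded length: 00 -> 1, 01 -> 2, 10 -> 4, 11 -> 8 bytes.
    """
    if offset >= len(buf):
        raise ValueError("buffer too short for a varint")
    first = buf[offset]
    length = 1 << (first >> 6)            # 1, 2, 4 or 8 bytes
    if offset + length > len(buf):
        raise ValueError("truncated varint")
    value = first & 0x3F                  # drop the 2-bit length prefix
    for byte in buf[offset + 1:offset + length]:
        value = (value << 8) | byte
    return value, length


def encode_varint(value: int) -> bytes:
    """Encode value using the shortest QUIC varint form."""
    if not 0 <= value < (1 << 62):
        raise ValueError("value out of varint range")
    for length, prefix in ((1, 0x00), (2, 0x40), (4, 0x80), (8, 0xC0)):
        if value < (1 << (8 * length - 2)):
            raw = value.to_bytes(length, "big")
            return bytes([raw[0] | prefix]) + raw[1:]
    raise AssertionError("unreachable")
```

For example, decode_varint(bytes.fromhex("7bbd")) returns (15293, 2), one of the worked examples in RFC 9000, Appendix A.1.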

Measured impact: on a 1 Gbps LAN link at full rate, QUIC saturates a CPU core before the link fills, and goodput drops by up to ~45% vs HTTP/2 over TCP. On slow or lossy paths (typical mobile), the network, not the CPU, is the bottleneck, so the overhead does not show up in throughput.

QUIC CPU cost: path matters
  • Fast LAN (1 Gbps): CPU is the bottleneck first; goodput up to ~45% lower than TCP; HoL-blocking benefit negligible. Use TCP.
  • WAN / API (100 ms RTT): network is the bottleneck; CPU overhead invisible; handshake saves ~100 ms (one round trip). Use QUIC.
  • Mobile 4G (lossy): network is the bottleneck; CPU overhead invisible; HoL save: 575× at 0.5% loss. Use QUIC.

Mitigations: UDP GSO, NIC offload, core affinity

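UDP Generic Segmentation Offload (UDP GSO): instead of one sendmsg() per 1500-byte packet, the sender hands the kernel a single buffer of up to ~64 KB holding consecutive, equal-sized QUIC packets, together with the segment size (on Linux, via the UDP_SEGMENT socket option or control message). The kernel, or a NIC with UDP segmentation offload, splits it into individual UDP datagrams before transmission. Result: one syscall per ~64 KB instead of about 43 (64 KB / 1500 B ≈ 43). Cloudflare reports ~20% CPU gain from GSO alone.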
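The arithmetic behind these GSO numbers can be sketched in Python. packets_per_second reproduces the ~83k packets/s needed to fill 1 Gbps with 1500-byte packets, and the hypothetical gso_batches helper groups consecutive packets into sends under the Linux GSO rule that all segments in a batch share one size except a shorter final one. The ~64 KB and 64-segment limits are assumed defaults rather than values read from a kernel, and the actual send (passing the segment size with UDP_SEGMENT) is omitted.

```python
from typing import List


def packets_per_second(link_bps: float, packet_bytes: int) -> float:
    """Full-size packets per second needed to saturate a link."""
    return link_bps / (packet_bytes * 8)


def gso_batches(packet_sizes: List[int], max_batch_bytes: int = 65_535,
                max_segments: int = 64) -> List[List[int]]:
    """Group consecutive QUIC packets into UDP GSO sends (one sendmsg() each).

    Every segment in a batch must match the first one's size, except the
    last, which may be shorter; a short packet therefore ends its batch.
    The byte and segment limits are assumed defaults, not kernel constants.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    closed = False  # True once a short trailing segment has been added
    for size in packet_sizes:
        can_join = (
            bool(current)
            and not closed
            and size <= current[0]
            and current_bytes + size <= max_batch_bytes
            and len(current) < max_segments
        )
        if can_join:
            current.append(size)
            current_bytes += size
            closed = size < current[0]
        else:
            if current:
                batches.append(current)
            current, current_bytes, closed = [size], size, False
    if current:
        batches.append(current)
    return batches
```

For 100 packets of 1350 bytes, gso_batches returns batches of 48, 48, and 4 packets: three sendmsg() calls instead of 100.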

NIC QUIC offload (Intel E810 and newer): parse QUIC long and short headers in hardware and steer packets to the right receive queue by connection ID, reducing per-packet interrupt overhead and user-space demultiplexing work. Routing by stream is not possible in hardware, because stream frames sit inside the encrypted payload. Still experimental as of 2026 but available in cloud-optimized NICs.

Core affinity: pin each QUIC worker to one physical core so its hot state (connection table, congestion-control windows, key material) stays warm in that core's private L1/L2 caches. Migrating to another core forces those cache lines to be refetched, adding ~50 ns per packet.
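A minimal sketch of pinning on Linux using Python's os.sched_setaffinity. pin_to_core is a hypothetical helper; with pid 0 the call applies to the calling thread, so in a single-threaded worker it pins the whole process, and threads started afterwards inherit the mask. A production setup would run one such worker per core and may also steer the matching NIC queue's interrupts to the same core, which is outside this sketch.

```python
import os
from typing import Optional


def pin_to_core(core: Optional[int] = None) -> int:
    """Restrict the calling thread (whole process if single-threaded) to one core.

    Linux only. If no core is given, the lowest-numbered allowed core is used.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity control is not available on this platform")
    allowed = sorted(os.sched_getaffinity(0))   # 0 = the calling thread
    if core is None:
        core = allowed[0]
    elif core not in allowed:
        raise ValueError(f"core {core} is not in the allowed set {allowed}")
    os.sched_setaffinity(0, {core})
    return core
```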

With GSO and core affinity, the per-byte CPU overhead vs kernel TCP falls from about 30% to 15–20%.

Quiz

Why does QUIC's CPU overhead hurt throughput on a 1 Gbps LAN but not on a 10 Mbps mobile link?

UDP blocking and mandatory HTTP/2 fallback

~3–5% of networks block UDP outright: corporate proxies, certain ISP gateways, some LTE contexts. When QUIC is silently dropped (no ICMP error, just lost packets), the client's only signal is a timeout; a client that tries QUIC first typically waits 1–3 seconds before falling back to TCP.

Browser racing: modern browsers start QUIC (UDP 443) and TCP (TCP 443) handshakes in parallel, and whichever completes first wins. This keeps the UX penalty from UDP blocking close to zero: TCP wins and QUIC quietly loses the race. The downside is wasted work on every connection when QUIC is consistently blocked on a path.
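The racing logic can be sketched with asyncio. simulated_handshake is a hypothetical stand-in that only sleeps (or never finishes, modelling silently dropped UDP); a real client would run the actual QUIC and TLS-over-TCP handshakes. race returns the first attempt that succeeds and cancels the rest; real browsers layer further policy on top that this sketch does not model.

```python
import asyncio
from typing import Awaitable, Iterable, Optional


async def simulated_handshake(name: str, delay: Optional[float]) -> str:
    """Hypothetical stand-in for a real handshake.

    delay=None models a path where packets are silently dropped: the
    handshake never completes and never reports an error.
    """
    if delay is None:
        await asyncio.Event().wait()  # blocks forever
    else:
        await asyncio.sleep(delay)
    return name


async def race(attempts: Iterable[Awaitable[str]], timeout: float = 3.0) -> str:
    """Run all handshakes in parallel and return the first that succeeds."""
    tasks = {asyncio.ensure_future(a) for a in attempts}
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    try:
        while tasks:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            done, tasks = await asyncio.wait(
                tasks, timeout=remaining, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                if task.exception() is None:   # skip handshakes that failed
                    return task.result()
        raise TimeoutError("no handshake succeeded")
    finally:
        for task in tasks:                     # cancel the losers
            task.cancel()
```

With QUIC blackholed, asyncio.run(race([simulated_handshake("h3", None), simulated_handshake("h2", 0.05)])) returns "h2" after about 50 ms instead of waiting for a timeout.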

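Alt-Svc discovery: HTTP/3 is advertised with the Alt-Svc response header, e.g. Alt-Svc: h3=":443"; ma=3600 on an HTTP/2 response, meaning HTTP/3 is available on UDP port 443 of the same host for the next 3600 seconds. The browser caches this and attempts QUIC on the next connection. A first-time connection therefore starts on TCP, discovers the header, and upgrades on subsequent connections, unless the client learned about h3 in advance (for example from a DNS HTTPS record).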
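A simplified Python parser for Alt-Svc values such as h3=":443"; ma=3600 (the header is defined in RFC 7838). It is a sketch under stated assumptions: no commas or semicolons inside quoted strings, and every parameter except ma is ignored. Per RFC 7838, ma defaults to 24 hours when omitted, and the value clear invalidates cached entries. A production client should use a full header parser.

```python
from typing import List, NamedTuple


class AltSvc(NamedTuple):
    protocol: str    # ALPN protocol ID, e.g. "h3"
    authority: str   # "host:port"; an empty host means the same host
    max_age: int     # seconds the entry may be cached


def parse_alt_svc(header: str) -> List[AltSvc]:
    """Simplified Alt-Svc parser (RFC 7838); see assumptions above."""
    if header.strip() == "clear":
        return []                     # "clear" drops all cached alternatives
    entries: List[AltSvc] = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        protocol, _, authority = parts[0].partition("=")
        max_age = 86_400              # RFC 7838 default: 24 hours
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip() == "ma":
                max_age = int(val.strip().strip('"'))
        entries.append(AltSvc(protocol.strip(), authority.strip().strip('"'), max_age))
    return entries
```

Given the h3 entry, a client would remember ":443" for 3600 seconds and try QUIC there on its next connection.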

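Fallback requirement: QUIC's applicability statement (RFC 9308) explains why applications need a fallback to TCP when UDP is blocked. For HTTP/3 that means keeping HTTP/2 (or HTTP/1.1) over TCP available; a deployment without it will break on blocked networks.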

Quiz

A user on a corporate network reports that loading your site is 2 seconds slower than usual. QUIC is enabled. What is the most likely cause and how do you verify?

Deployment reality 2026

~21% of web traffic runs HTTP/3. ~35–40% of major sites advertise it via Alt-Svc. Adoption is bimodal:

  • Mobile browsers default-enable HTTP/3 — latency and HoL elimination matter on cellular.
  • Desktop/LAN still races QUIC vs TCP; TCP often wins because round-trips are short and HoL blocking is rare on fast paths.

Major CDNs (Cloudflare, Google, Akamai, Fastly) enable HTTP/3 by default. Browsers (Chrome, Safari, Firefox) support it. The adoption curve is expected to steepen as UDP blocking becomes rarer, hardware offload reduces CPU cost, and fallback racing becomes universal.

Debug this

QUIC packet trace — diagnose encryption-level and loss issues

log
$ quictrace capture.pcapng | head -20
timestamp=0.000 dcid=12345678 type=Initial pkt_num=0 frames=[Crypto[0..120], Padding]
timestamp=0.045 dcid=12345678 type=Initial pkt_num=1 frames=[Crypto[0..120], Padding] # Retransmit (no ACK in time)
timestamp=0.051 scid=87654321 dcid=12345678 type=Initial pkt_num=0 frames=[Crypto[0..200], Ack[0], Padding]
timestamp=0.052 dcid=87654321 type=Handshake pkt_num=0 frames=[Crypto[200..350]]
timestamp=0.100 scid=87654321 dcid=12345678 type=Handshake pkt_num=0 frames=[Crypto[350..400], Finished]
timestamp=0.101 dcid=87654321 type=1RTT pkt_num=0 frames=[Stream(0, fin, 4096 bytes)]
timestamp=0.151 scid=87654321 dcid=12345678 type=1RTT pkt_num=0 frames=[Stream(0, fin, [all bytes 0..4095], Ack[0])]

The client sends one Initial retransmission before the server's Initial arrives, and the Handshake then proceeds normally. What does this indicate?
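One way to judge an Initial retransmission is to compare its timing with the probe timeout (PTO) from RFC 9002, Section 6.2.1. Before any RTT sample exists, RFC 9002 starts from a recommended initial RTT of 333 ms, and the Initial and Handshake packet number spaces use a max_ack_delay of 0. The helper below is a sketch of that formula, not any particular stack's timer code.

```python
from typing import Optional

K_INITIAL_RTT = 0.333   # seconds; RFC 9002 recommended initial RTT
K_GRANULARITY = 0.001   # seconds; RFC 9002 timer granularity


def probe_timeout(smoothed_rtt: Optional[float] = None,
                  rttvar: Optional[float] = None,
                  max_ack_delay: float = 0.0,
                  pto_count: int = 0) -> float:
    """Probe timeout in seconds, following RFC 9002, Section 6.2.1.

    With no RTT sample yet, smoothed_rtt = kInitialRtt and
    rttvar = smoothed_rtt / 2. max_ack_delay is 0 for the Initial and
    Handshake packet number spaces. Each consecutive PTO doubles the period.
    """
    if smoothed_rtt is None:
        smoothed_rtt = K_INITIAL_RTT
    if rttvar is None:
        rttvar = smoothed_rtt / 2
    pto = smoothed_rtt + max(4 * rttvar, K_GRANULARITY) + max_ack_delay
    return pto * (2 ** pto_count)
```

probe_timeout() gives about 0.999 s, so a retransmission after 45 ms suggests the client started from a much smaller RTT estimate (RFC 9002 allows a resumed connection to reuse a previous RTT) or uses a non-default timer. Whether the server's ACK covers the original packet then tells you if it was really lost.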

Observability gaps

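QUIC's encryption prevents deep packet inspection: tcpdump still shows UDP headers and, on long-header packets, the version and connection IDs, but payloads and packet numbers are opaque. Traditional network monitoring (per-flow HTTP request counts, slow-client detection, flow-level misbehavior detection) breaks. RFC 9312 describes what remains observable on the wire.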
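Encryption does not hide everything. RFC 8999 defines version-independent invariants that every QUIC version keeps in the clear: the header form bit and, on long headers, the version and both connection IDs. The hypothetical helper below extracts only those fields, which is roughly the view a passive monitor gets. Short headers do not carry the connection ID length, so the caller must supply it (an assumption of this sketch).

```python
from typing import NamedTuple, Optional


class VisibleHeader(NamedTuple):
    long_header: bool
    version: Optional[int]   # only present in long headers
    dcid: bytes
    scid: Optional[bytes]    # only present in long headers


def parse_visible_header(datagram: bytes, short_dcid_len: int = 8) -> VisibleHeader:
    """Extract the unencrypted invariant fields of a QUIC packet (RFC 8999).

    Packet numbers are header-protected and are not recoverable here.
    """
    if not datagram:
        raise ValueError("empty datagram")
    if (datagram[0] & 0x80) == 0:             # header form bit 0: short header
        if len(datagram) < 1 + short_dcid_len:
            raise ValueError("truncated short header")
        return VisibleHeader(False, None, bytes(datagram[1:1 + short_dcid_len]), None)
    try:
        version = int.from_bytes(datagram[1:5], "big")
        pos = 5
        dcid_len = datagram[pos]
        dcid = datagram[pos + 1:pos + 1 + dcid_len]
        pos += 1 + dcid_len
        scid_len = datagram[pos]
        scid = datagram[pos + 1:pos + 1 + scid_len]
    except IndexError:
        raise ValueError("truncated long header") from None
    if len(dcid) != dcid_len or len(scid) != scid_len:
        raise ValueError("truncated connection ID")
    return VisibleHeader(True, version, bytes(dcid), bytes(scid))
```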

Adapting the stack:

  • Applications export QUIC traces as qlog, a JSON-based logging format developed in the IETF QUIC working group: connection lifecycle, packet numbers, CC events.
  • Browsers report PerformanceResourceTiming.nextHopProtocol = "h3" for HTTP/3 connections.
  • Cloud providers (AWS, GCP) are adding QUIC-aware flow metrics.
  • eBPF probes on userspace QUIC sockets can reconstruct packet timing without decryption.

The trade-off is intrinsic: encryption buys privacy and security at the cost of operational opacity.

Pick the best fit

A CDN must choose between deploying QUIC for a latency-sensitive API (small request/response, intercontinental) vs. a high-throughput static asset service (1 Gbps, LAN clients).

Design challenge

Design the deployment strategy for rolling out HTTP/3 and QUIC to a global CDN serving both mobile (90% traffic) and desktop (10% traffic), with current HTTP/2 TCP infrastructure.

  • Existing HTTP/2 deployment is stable and well-tuned; no breaking changes to TCP paths.
  • Mobile clients are diverse (iOS 14+, Android 5+, various browsers); some networks are QUIC-blocked.
  • CPU budget for QUIC: no more than 20% overhead vs. current HTTP/2.
  • Observability: measure QUIC adoption rate, fallback rate, and latency improvement per client device.
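For the observability requirement, one possible starting point is aggregating client-side beacons that report PerformanceResourceTiming.nextHopProtocol. The record format and the quic_metrics helper below are hypothetical; fallback rate here means the share of clients already holding an h3 Alt-Svc entry whose request still ended up on TCP. TTFB comparisons between protocols are confounded by network and device differences, so treat them as a signal to investigate rather than a measured improvement.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List

TCP_PROTOCOLS = {"h2", "http/1.1"}


def quic_metrics(records: List[dict]) -> Dict[str, dict]:
    """Per-device HTTP/3 adoption, fallback rate and median TTFB.

    Hypothetical record keys:
      device      - client class, e.g. "ios", "android", "desktop"
      h3_eligible - True if the client already held an h3 Alt-Svc entry
      next_hop    - value of PerformanceResourceTiming.nextHopProtocol
      ttfb_ms     - time to first byte reported by the client
    Records whose protocol is neither "h3" nor a TCP protocol are skipped.
    """
    by_device: Dict[str, List[dict]] = defaultdict(list)
    for r in records:
        if r["next_hop"] == "h3" or r["next_hop"] in TCP_PROTOCOLS:
            by_device[r["device"]].append(r)

    result: Dict[str, dict] = {}
    for device, rs in by_device.items():
        h3 = [r["ttfb_ms"] for r in rs if r["next_hop"] == "h3"]
        tcp = [r["ttfb_ms"] for r in rs if r["next_hop"] in TCP_PROTOCOLS]
        eligible = [r for r in rs if r["h3_eligible"]]
        fell_back = [r for r in eligible if r["next_hop"] in TCP_PROTOCOLS]
        result[device] = {
            "adoption": len(h3) / len(rs),
            "fallback_rate": len(fell_back) / len(eligible) if eligible else 0.0,
            "median_ttfb_h3": median(h3) if h3 else None,
            "median_ttfb_tcp": median(tcp) if tcp else None,
        }
    return result
```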
QUIC deployment reality 2026
  • CPU overhead vs kernel TCP: 15–30% per byte
  • Goodput loss on 1 Gbps fast links: up to ~45% vs HTTP/2
  • UDP GSO CPU gain (Cloudflare): ~20%
  • Networks blocking UDP: ~3–5%
  • Web traffic running HTTP/3 (2026): ~21%
  • Major sites advertising HTTP/3: ~35–40%
  • Fallback delay when UDP is silently dropped and QUIC is tried first (no racing): 1–3 s
Why this works

Why not add QUIC to the kernel? Linux has experimental in-kernel QUIC patches, but the community is divided. Kernel TCP benefits from offloads and fast paths (kTLS, GRO, RSS) built over decades. Replicating this for QUIC would take years and would couple QUIC's evolution to the kernel release cycle, the opposite of the user-space deployability QUIC was designed for. User-space QUIC can ship new CC algorithms weekly; kernel QUIC cannot. The CPU cost is the price of agility.

Recall before you leave
  1. Why does QUIC's CPU overhead hurt 1 Gbps LAN throughput but not 10 Mbps mobile throughput?
  2. What is UDP GSO and why does Cloudflare report ~20% CPU gain from it?
  3. A CDN sees 4% of QUIC connections fail silently with no error. What is the likely cause and fix?
Recap

QUIC's user-space architecture delivers latency and HoL wins at a real CPU cost: 15–30% more per byte than kernel TCP, and up to ~45% goodput loss on fast 1 Gbps links where the CPU, not the network, is the bottleneck. UDP GSO batches syscalls and recovers ~20% CPU; NIC offload and core affinity push it further. About 3–5% of networks block UDP silently, so browsers race QUIC against TCP and deployments must keep HTTP/2 fallback. As of 2026, ~21% of web traffic runs HTTP/3 with bimodal adoption: mobile benefits clearly, while desktop races TCP and often loses to it. QUIC encryption breaks traditional packet inspection; qlog traces, browser timing APIs, and eBPF probes form the replacement observability stack. The right deployment strategy: QUIC for latency-sensitive WAN and mobile paths, TCP for high-throughput LAN and static assets.
