Load balancing L4 vs L7: what each layer can and can't see

Crux: An L4 balancer routes by IP and port without reading the payload: fast, but blind to URLs. An L7 balancer terminates the connection and reads HTTP, enabling path routing and retries at a CPU cost. Pick the wrong layer and path routing is simply impossible.

A team splits a monolith: /api/* should go to the new service, everything else to the old one. They already run an AWS NLB for “performance”, so they add a rule to route by path — and discover there is no such rule. The NLB cannot see the path. It forwards raw TCP; the URL only exists once HTTP is parsed, and an L4 balancer never parses HTTP. The migration stalls until someone swaps in an ALB. The lesson cost a day: the layer you pick decides which routing is even possible.

L4 forwards bytes; L7 reads requests

A Layer-4 balancer operates at the transport layer. It sees the TCP/UDP connection (source IP, destination IP, ports) and forwards packets to a backend without ever decoding what is inside. It is essentially a fast, connection-aware NAT. Because it never parses the payload, it is protocol-agnostic (it balances any TCP or UDP service, not just HTTP) and extremely cheap per packet. AWS describes its NLB as handling millions of requests per second at very low latency, and it gives you a static IP per Availability Zone.

A Layer-7 balancer operates at the application layer. It terminates the client connection, reads the full HTTP request (method, host, path, headers, cookies), and then opens its own connection to a chosen backend. That terminate-and-re-originate step is the whole source of its power and its cost. Once the balancer holds the parsed request, it can route on /api/* vs /, on the Host header, on a cookie, or on a custom header for canaries. The price: it must buffer and parse every request, and if TLS is in play it decrypts and, when the backend hop is encrypted, re-encrypts. That is real CPU per request, and it only speaks the protocols it understands (HTTP/1.1, HTTP/2, and gRPC, which runs over HTTP/2).

The decision is not “which is better” but “what do I need to see.” If you only need to spread raw TCP across identical backends, L4 is faster and simpler. The moment routing depends on anything inside the request, you need L7 — and no amount of L4 tuning will conjure a URL it never decoded.
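To make the visibility gap concrete, here is a minimal Python sketch, not any real balancer's implementation. The names `Connection`, `HttpRequest`, `l4_pick`, `l7_pick`, the pool names, and the `x-canary` header are all hypothetical. The point is only what each function is given to work with: the L4 picker receives addresses and ports, so there is no path for it to inspect.

```python
import zlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Connection:
    """Everything an L4 balancer can see: addresses and ports."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


@dataclass
class HttpRequest:
    """What an L7 balancer holds after terminating and parsing."""
    method: str
    host: str
    path: str
    headers: dict = field(default_factory=dict)


def l4_pick(conn: Connection, backends: list) -> str:
    # Only the connection tuple is available, so the choice can
    # depend on nothing else. Here: a hash of the client address.
    key = f"{conn.src_ip}:{conn.src_port}".encode()
    return backends[zlib.crc32(key) % len(backends)]


def l7_pick(req: HttpRequest) -> str:
    # The parsed request is available, so header and path rules work.
    if req.headers.get("x-canary") == "1":  # hypothetical canary header
        return "canary-pool"
    if req.path.startswith("/api/"):
        return "new-service"
    return "old-monolith"
```

Calling `l7_pick(HttpRequest("GET", "example.com", "/api/users"))` selects the new service, while `l4_pick` has no argument that could ever carry `/api/users`.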

| Capability | L4 (transport) | L7 (application) |
| --- | --- | --- |
| Routes by | IP + port | path, host, header, cookie |
| Reads payload? | No (forwards bytes) | Yes (parses HTTP) |
| Protocols | any TCP/UDP | HTTP/1.1, HTTP/2, gRPC |
| TLS termination | passthrough (usually) | terminates + can re-encrypt |
| HTTP-aware retries | No | Yes (idempotent requests) |
| Per-request cost | very low | parse + buffer + crypto |
| AWS example | NLB | ALB |

Where TLS terminates changes everything

TLS termination is the cleanest lens on the L4/L7 split. An L7 balancer ends the encrypted session at the edge: it holds the certificate, decrypts the request, reads the plaintext HTTP, makes its routing decision, and forwards — either as plaintext inside the trusted network or re-encrypted to the backend. That decryption is exactly what lets it route on path or header. It also centralises certificate management and offloads crypto from your app servers.

An L4 balancer usually does not terminate TLS. Instead it does TLS passthrough: the encrypted bytes flow straight to the backend, which holds the cert and terminates. That keeps the balancer blind by design, which is great for end-to-end encryption and for non-HTTP protocols, and useless if you wanted path routing. Some L4 balancers can decrypt (AWS NLB offers TLS listeners), but decrypting is not parsing: they still route by IP and port and never read the HTTP inside. This is the concrete reason “just use the fast one” backfires: the fast one does not read what you need to route on.

Why this works

“L4 is always faster” is half-true and misleading. Per packet, yes — no parsing, no crypto. But an L7 balancer that terminates TLS once at the edge can be cheaper overall than passing encryption through to every backend, because each app server no longer pays the handshake and decryption cost. The right question is where the work is best done, not which layer wins a microbenchmark.

Algorithms, health checks, and pulling dead backends

Both layers still have to choose a backend. Round-robin spreads connections evenly and is fine when requests cost roughly the same. Least-connections sends the next request to the backend with the fewest active connections — far better when request durations vary widely, because one slow endpoint won’t keep getting piled on. Hashing (by client IP, or consistent hashing) pins a given key to a given backend, which is how an L4 balancer fakes stickiness without cookies.
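The three strategies above can be sketched in a few lines of Python. This is an illustrative toy, not how any particular balancer implements them. The class and function names are hypothetical, and `hash_pick` uses plain modulo hashing, which remaps most keys when the pool size changes (a consistent-hash ring avoids that and is not shown here).

```python
import hashlib
import itertools


class RoundRobin:
    """Hand out backends in a fixed rotation."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Send the next request to the backend with the fewest active ones."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self):
        # Ties go to the backend listed first.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call when the request on this backend has finished.
        self.active[backend] -= 1


def hash_pick(client_ip, backends):
    """Pin a client IP to one backend (simple modulo hashing)."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]
```

With `LeastConnections`, a backend stuck on a slow request keeps a higher active count and is skipped until it calls `release`, which is the behaviour that makes it better than round-robin under uneven request durations.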

Health checks are what keep the pool honest. Active checks probe each backend on an interval (say every few seconds) and pull it from rotation after N consecutive failures; passive checks mark a backend down after it returns errors to real traffic. Open-source nginx leans on passive checks; ALB/NLB and nginx Plus do active probing. The failure to internalise: a backend that crashes is only removed after the health check notices — so a too-slow interval means a window where live traffic keeps hitting a dead box and 502s leak to users.
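A rough Python sketch of the active-check rule described above, assuming a hypothetical `HealthTracker` that is fed one probe result per interval. Two simplifications are assumptions of this sketch: a single successful probe restores the backend (real balancers often require several), and the detection estimate ignores probe timeouts.

```python
class HealthTracker:
    """Track one backend; pull it after N consecutive failed probes."""

    def __init__(self, unhealthy_threshold=3):
        self.unhealthy_threshold = unhealthy_threshold
        self.consecutive_failures = 0
        self.in_rotation = True

    def record(self, probe_ok):
        """Record one probe result and return whether the backend is in rotation."""
        if probe_ok:
            # Simplification: one success is enough to restore the backend.
            self.consecutive_failures = 0
            self.in_rotation = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.unhealthy_threshold:
                self.in_rotation = False
        return self.in_rotation


def worst_case_detection_seconds(interval_s, unhealthy_threshold):
    """Approximate time a dead backend can keep receiving traffic.

    Worst case: the backend dies right after a passing probe, so the
    N-th failed probe lands about N intervals later.
    """
    return interval_s * unhealthy_threshold
```

For example, a 10-second interval with a threshold of 3 gives `worst_case_detection_seconds(10, 3) == 30`, roughly half a minute in which live traffic can still reach a dead box.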

Pick the best fit

You must route /api/* to a new service and / to the old one, terminate TLS at the edge, and run canary by a request header. Pick the balancer.

Sticky sessions and connection draining: two ways deploys break

Sticky sessions (session affinity) pin a client to one backend — via an L4 IP hash or an L7 cookie — so in-memory session state stays put. It works, and it quietly hurts: load stops distributing evenly (one whale client hammers one box), and that backend can’t be drained cleanly because it owns sessions nobody else has. The senior reflex is to make backends stateless (sessions in Redis or a JWT) so any backend can serve any request; reach for stickiness only when you genuinely can’t externalise state.

The deploy-killer is missing connection draining. When you remove a backend during a rollout, in-flight requests are still being served on it. If the balancer rips it out immediately, those requests die — users see resets mid-checkout. Connection draining (AWS calls it deregistration delay) tells the balancer to stop sending new connections to the backend but let existing ones finish for a grace window. On AWS the default is 300 seconds, tunable from 0 to 3600. Set it shorter than your longest request and you’ll still cut live requests; forget to enable it and every deploy is a small outage. Pair it with health checks: drain, wait, then terminate.
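The draining behaviour can be modelled as a small state machine. The Python sketch below is a simplified illustration under stated assumptions, not AWS's implementation. `DrainingBackend` and its methods are hypothetical names, time is passed in explicitly as seconds, and the default of 300 mirrors the deregistration delay mentioned above.

```python
class DrainingBackend:
    """Model of one backend going through connection draining."""

    def __init__(self, name, drain_timeout_s=300):
        self.name = name
        self.drain_timeout_s = drain_timeout_s
        self.in_flight = 0
        self.draining_since = None  # None means still in rotation

    def accepts_new(self):
        return self.draining_since is None

    def start_request(self):
        if not self.accepts_new():
            raise RuntimeError(f"{self.name} is draining; route elsewhere")
        self.in_flight += 1

    def finish_request(self):
        self.in_flight -= 1

    def begin_drain(self, now_s):
        # Stop new connections; existing requests keep running.
        self.draining_since = now_s

    def safe_to_terminate(self, now_s):
        if self.draining_since is None:
            return False
        elapsed = now_s - self.draining_since
        # Done when requests have finished or the grace window is over.
        return self.in_flight == 0 or elapsed >= self.drain_timeout_s
```

If `drain_timeout_s` is shorter than the longest request, `safe_to_terminate` turns true while `in_flight` is still above zero, which is exactly the case where live requests get cut.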

Quiz

Your L4 (NLB) load balancer needs to send /admin/* to a separate backend pool. How do you configure the path rule?

Quiz

During a rolling deploy, users report connection resets mid-request whenever an old instance is removed. What's the fix?

Order the steps

Order the steps to gracefully remove a backend during a deploy:

  1. Mark the backend for deregistration so the balancer stops sending it new connections
  2. Let connection draining run: existing in-flight requests keep being served
  3. Wait out the drain window (e.g. AWS deregistration delay, default 300s)
  4. Once connections finish or the window elapses, terminate the instance
  5. Health checks confirm the remaining pool is serving traffic
Recall before you leave
  1. A colleague wants path-based routing but the team already runs an L4 (NLB) balancer 'because it's faster'. Explain precisely why path routing is impossible there and what changes if you move to L7.
  2. Walk through what connection draining does during a deploy and what breaks without it. Include the AWS default.
Recap

The layer you pick decides what the balancer can route on, because it decides what it can see. An L4 balancer works at the transport layer — IP, port, raw TCP/UDP — and forwards bytes without parsing, making it fast, protocol-agnostic, and capable of TLS passthrough, but structurally blind to URLs, headers, and cookies. An L7 balancer terminates the connection, parses HTTP, and routes on path, host, header, or cookie, which is what enables path-based routing, edge TLS termination, header canaries, and HTTP-aware retries — paid for in per-request CPU. On top of the layer choice sit the operational details that decide reliability: round-robin, least-connections, and hashing balance differently under uneven load; active and passive health checks pull dead backends out, but only as fast as their interval; sticky sessions keep state in place at the cost of even distribution and clean draining; and connection draining (AWS deregistration delay, default 300 seconds) lets in-flight requests finish before a backend is removed. The two production failures fall straight out of this: needing path routing on an L4 balancer, which is impossible, and removing a backend with no draining, which kills live requests on every deploy.


Trademarks belong to their respective owners. Editorial reference only.