Load balancing levels: deploy with zero dropped requests

Hands-on project — stand up an L7 balancer in front of two backends, prove path routing, then engineer a zero-error rolling deploy with health checks and connection draining.

Level: Senior · Duration: 220 min

Reading that connection draining prevents resets is not the same as watching a load test stay at zero errors while you roll a backend out from under it. Stand up a real L7 balancer, prove path routing works where L4 cannot, then engineer a rolling deploy that drops not a single in-flight request — and prove it with numbers.

Goal

Turn the unit’s mental model into a working rig: route by path at L7, choose an algorithm that survives uneven load, configure health checks that remove a dead backend within a detection time you can state and measure, and add connection draining so a rolling deploy completes with zero dropped requests under sustained traffic.
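Before running the real load test, a toy simulation can show why the algorithm choice matters under uneven load. The sketch below is an illustration under stated assumptions, not a model of any particular balancer: each backend is a single-worker FIFO queue, arrivals come at a fixed rate, and the workload numbers are made up. The names `simulate`, `percentile`, and `Backend` are hypothetical.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Backend:
    free_at: float = 0.0                      # when the single worker goes idle
    finish_times: list = field(default_factory=list)
    served: int = 0

    def in_flight(self, now):
        # Requests assigned here that have not finished by `now`.
        return sum(1 for t in self.finish_times if t > now)


def simulate(service_times, gap, n_backends, policy):
    """Return (latencies, requests served per backend) for one policy."""
    backends = [Backend() for _ in range(n_backends)]
    latencies = []
    for i, service in enumerate(service_times):
        now = i * gap                          # fixed-rate arrivals (assumption)
        if policy == "round_robin":
            chosen = backends[i % n_backends]
        elif policy == "least_conn":
            # Ties go to the first backend in the list.
            chosen = min(backends, key=lambda b: b.in_flight(now))
        else:
            raise ValueError(f"unknown policy: {policy}")
        start = max(now, chosen.free_at)       # wait behind queued work
        finish = start + service
        chosen.free_at = finish
        chosen.finish_times.append(finish)
        chosen.served += 1
        latencies.append(finish - now)
    return latencies, [b.served for b in backends]


def percentile(values, p):
    """Nearest-rank percentile, p in 1..100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    rng = random.Random(42)
    # Skewed workload: roughly 1 in 10 requests is 20x slower (made-up numbers).
    work = [200 if rng.random() < 0.1 else 10 for _ in range(1000)]
    for policy in ("round_robin", "least_conn"):
        lat, dist = simulate(work, gap=20, n_backends=2, policy=policy)
        print(policy, "p99 =", percentile(lat, 99), "ms; distribution =", dist)
```

A case small enough to check by hand: with service times `[100, 10, 10, 10]` ms arriving 10 ms apart, round-robin queues the third request behind the slow one (latencies `[100, 10, 90, 10]`), while least-connections steers it to the idle backend (latencies `[100, 10, 10, 10]`). The simulation only suggests what to look for; the acceptance criteria still call for numbers measured on the real rig.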

Project

Objective

Front two distinct backends with an L7 balancer (nginx, Envoy, HAProxy, or a cloud ALB), prove path-based routing, then drive sustained load while you remove and replace a backend — and show the deploy completes with zero 5xx and zero connection resets.
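The routing decision itself is small. As a minimal sketch, assuming hypothetical pool names (`api-pool`, `web-pool`) and a longest-prefix rule similar in spirit to prefix matching in common L7 proxies, it could look like this. The point is the input: the function needs the HTTP request path, which only exists once the balancer has parsed the request. An L4 balancer decides from addresses and ports alone and never sees it.

```python
# Hypothetical route table: path prefix -> backend pool name.
ROUTES = {
    "/api/": "api-pool",
    "/": "web-pool",
}


def pick_pool(path, routes=ROUTES):
    """Choose a pool by longest matching path prefix (an L7 decision)."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        raise LookupError(f"no route for path: {path}")
    return routes[max(matches, key=len)]


if __name__ == "__main__":
    for path in ("/api/users", "/", "/index.html"):
        print(path, "->", pick_pool(path))
```

Note that with the prefix written as `/api/`, a request for `/api` (no trailing slash) falls through to `web-pool` in this sketch. Whether that is what you want is worth deciding explicitly in the real balancer config and checking in the curl transcript.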

Requirements

Acceptance criteria

A curl transcript proving /api/* and / land on different backends through one L7 balancer, plus a one-line explanation of why an L4 balancer cannot reproduce this.
A health-check measurement: probe config plus the observed detection-and-removal time and failed-request count when a backend is killed mid-traffic.
A round-robin vs least-connections comparison table under the skewed-latency load: p99 and per-backend distribution for each, with a sentence on why least-connections wins here.
Two load-test runs side by side under identical sustained load: a rolling deploy WITHOUT draining (non-zero resets/5xx) and one WITH draining (zero resets/5xx), both measured, not estimated.
A short write-up: which layer you chose and why, the drain window you picked and how you derived it from your slowest request, and the deploy sequence (deregister, drain, wait, terminate, confirm).
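One way to reason about the drain window in the write-up is sketched below. It assumes the window is the slowest observed request duration times a safety factor; the factor of 1.5 is an arbitrary illustrative default, not a recommendation from this unit, and the function names are hypothetical. The second function shows why a window of zero produces resets: any request still running at termination is cut off.

```python
import math

# The deploy sequence named in the write-up criterion, in order.
DEPLOY_SEQUENCE = ("deregister", "drain", "wait", "terminate", "confirm")


def drain_window_s(slowest_request_s, safety_factor=1.5):
    """Drain window in whole seconds, derived from the slowest request.

    safety_factor is an assumption; pick yours from measured latencies.
    """
    return math.ceil(slowest_request_s * safety_factor)


def resets_on_terminate(remaining_s, window_s):
    """Count in-flight requests cut off at termination.

    remaining_s: seconds each in-flight request still needs at the moment
    the backend is deregistered. The backend is terminated window_s later.
    """
    return sum(1 for remaining in remaining_s if remaining > window_s)


if __name__ == "__main__":
    in_flight = [0.2, 3.0, 7.5]             # made-up remaining durations
    window = drain_window_s(8.0)            # slowest request: 8 s (assumed)
    print("sequence:", " -> ".join(DEPLOY_SEQUENCE))
    print("no draining:", resets_on_terminate(in_flight, 0), "resets")
    print(f"{window}s window:", resets_on_terminate(in_flight, window), "resets")
```

The model ignores new requests arriving after deregistration, which is the behavior draining is meant to guarantee. If your load test still shows resets with a generous window, that assumption is the first thing to check.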

Senior stretch

Add a header-based canary: route requests carrying X-Canary: true to a third backend while everyone else stays on the stable pool, and show the split in the access logs.
Introduce sticky sessions (cookie or IP-hash), reproduce the uneven-load and dirty-drain problems they cause, then externalise session state (Redis or JWT), remove stickiness, and show distribution and draining both improve.
Compare TLS termination at the edge vs TLS passthrough to the backends: measure the CPU cost per backend in each mode and explain when end-to-end encryption is worth the lost path routing.
Write a one-page on-call runbook: how to read the four signals (per-backend distribution, health-check status, drain state, 5xx rate), the safe deploy sequence, and the first three things to check when a deploy starts dropping requests.
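For the header-based canary in the first stretch item, the decision logic can be sketched as follows. The pool names are hypothetical, and treating the header value case-insensitively is a choice made for this sketch. HTTP header names are case-insensitive, so the lookup normalizes them before comparing.

```python
def pick_canary_pool(headers):
    """Route to the canary pool only when X-Canary is exactly 'true'."""
    # Header names are case-insensitive in HTTP, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    if normalized.get("x-canary", "").strip().lower() == "true":
        return "canary-pool"                # hypothetical pool name
    return "stable-pool"                    # hypothetical pool name


if __name__ == "__main__":
    print(pick_canary_pool({"X-Canary": "true"}))
    print(pick_canary_pool({"Host": "example.test"}))
```

Anything other than an explicit `true` stays on the stable pool, so a malformed or missing header never widens the canary by accident.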

Recap

This is the rig behind every safe production rollout: an L7 balancer routing on what it can see, an algorithm matched to your load profile, health checks that pull dead backends as fast as their probe interval and failure threshold allow, and connection draining that lets in-flight requests finish before a backend retires. The two load-test runs, resets without draining and zero errors with it, are the whole point of the unit made measurable. Build it once on a toy rig and the zero-downtime deploy becomes a checklist you trust instead of a hope.
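To sanity-check the health-check measurement, it helps to know the range the detection time should fall in. The sketch below assumes an active checker that probes on a fixed schedule, marks a backend down after `fall` consecutive failed probes, and uses a probe timeout shorter than the interval. The parameter names are illustrative (HAProxy uses a similarly named `fall` setting), and the failure estimate assumes even distribution with no retries.

```python
def detection_window_s(interval_s, timeout_s, fall):
    """Rough best and worst case seconds from backend death to removal.

    Best case: the backend dies just before a probe and every probe fails
    immediately (for example, connection refused).
    Worst case: it dies just after a successful probe and every failing
    probe hangs until the timeout.
    """
    best = (fall - 1) * interval_s
    worst = fall * interval_s + timeout_s
    return best, worst


def expected_failures(rps, n_backends, window_s):
    """Requests sent to the dead backend before removal (no retries)."""
    return rps / n_backends * window_s


if __name__ == "__main__":
    best, worst = detection_window_s(interval_s=2, timeout_s=1, fall=3)
    print(f"removal after {best}s to {worst}s")
    print("failed requests, worst case:", expected_failures(100, 2, worst))
```

If the time you observe on the rig falls well outside this range, either the probe configuration is not what you think it is or the balancer is also reacting to failed live traffic, which some balancers do in addition to active probes.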
