
Networking & Protocols

Resilient LB architecture: anycast, zone-aware routing, and observability

Crux: A single LB is a SPOF. Anycast + BGP ECMP eliminates it; zone-aware routing cuts cross-zone egress cost; TLS terminates at the edge; RED metrics and circuit-breaker state are the minimum observability for safe operation.

Your load balancer is healthy. Then it is not. Every client connection drops at once because the single LB machine crashed. Your load balancer — the component that was supposed to make your backend resilient — is your single point of failure.

The LB single point of failure

If your load balancer is a single machine, its failure takes down all traffic. Even with a hot standby (active-passive failover), the failover takes 10–30 seconds — long enough for users to notice.

Solution: anycast + BGP ECMP

Multiple LB machines all advertise the same anycast VIP (virtual IP) via BGP. The network’s Equal-Cost Multipath (ECMP) routing distributes client traffic across all LBs:

  • Client connects to IP 203.0.113.10 (the VIP).
  • BGP sees multiple equal-cost routes to 203.0.113.10 — one via LB_A, one via LB_B, one via LB_C.
  • ECMP hashes the 5-tuple (src IP, src port, dst IP, dst port, protocol) to pick a path.
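  • If LB_A crashes, its BGP session drops and the router removes its route (a graceful shutdown withdraws the route explicitly). ECMP then re-hashes the affected flows to LB_B or LB_C. Convergence is under 1 second when failure detection is fast (for example with BFD); with default BGP hold timers alone it can take tens of seconds or more.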
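
The path selection in the steps above can be sketched as follows. This is a minimal model, not how any particular router implements ECMP: the function name `ecmp_pick`, the SHA-256 hash, and the modulo step are all assumptions made for illustration.

```python
import hashlib


def ecmp_pick(five_tuple, next_hops):
    """Pick a next hop by hashing the flow's 5-tuple (hypothetical router model)."""
    if not next_hops:
        raise ValueError("no route to the VIP")
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return next_hops[digest % len(next_hops)]


# (src IP, src port, dst IP, dst port, protocol)
flow = ("198.51.100.7", 51514, "203.0.113.10", 443, "tcp")
routes = ["LB_A", "LB_B", "LB_C"]

# While the route set is unchanged, the same flow always takes the same path.
first = ecmp_pick(flow, routes)

# The chosen LB fails and its route disappears: the flow is re-hashed
# onto one of the surviving LBs.
remaining = [lb for lb in routes if lb != first]
after = ecmp_pick(flow, remaining)
print(first, "->", after)
```

Note that plain modulo hashing also moves some flows that were on healthy LBs when the route set changes, which is one reason the LB tier behind ECMP should not depend on local per-flow state.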

The stateless LB requirement. When ECMP re-routes a flow to a different LB, that LB has no memory of the flow. If the previous LB held connection state locally (TLS session, HTTP/2 stream state), the client must reconnect. A stateless L4 LB avoids this by deriving its forwarding decision from the packet alone. This is why Maglev (Google’s distributed LB) uses consistent hashing of the 5-tuple: every Maglev machine maps the same flow to the same backend, so a flow that lands on a different LB machine after a re-hash still reaches the backend that holds its connection, even as machines come and go.
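
A simplified sketch of a Maglev-style lookup table illustrates the idea. It follows the table-population scheme commonly described for Maglev (per-backend offset and skip over a prime-sized table), but the hash function, the table size of 251, and the names `build_table` and `pick_backend` are assumptions for illustration, not Google's implementation.

```python
import hashlib


def _h(value: str, seed: str) -> int:
    """Deterministic 64-bit hash; SHA-256 is an arbitrary choice here."""
    return int.from_bytes(hashlib.sha256((seed + value).encode()).digest()[:8], "big")


def build_table(backends, size=251):
    """Build a lookup table of `size` slots (size must be prime)."""
    names = sorted(backends)  # same table regardless of input order
    if not names or len(names) >= size:
        raise ValueError("need between 1 and size-1 backends")
    offsets = [_h(n, "offset") % size for n in names]
    skips = [_h(n, "skip") % (size - 1) + 1 for n in names]
    next_pref = [0] * len(names)
    table = [None] * size
    filled = 0
    while True:
        # Backends take turns claiming their next preferred free slot.
        for i, name in enumerate(names):
            slot = (offsets[i] + next_pref[i] * skips[i]) % size
            while table[slot] is not None:
                next_pref[i] += 1
                slot = (offsets[i] + next_pref[i] * skips[i]) % size
            table[slot] = name
            next_pref[i] += 1
            filled += 1
            if filled == size:
                return table


def pick_backend(table, five_tuple):
    key = "|".join(str(field) for field in five_tuple)
    return table[_h(key, "flow") % len(table)]


backends = ["envoy-1", "envoy-2", "envoy-3"]  # hypothetical backend names
table_on_lb_a = build_table(backends)
table_on_lb_b = build_table(list(reversed(backends)))
flow = ("198.51.100.7", 51514, "203.0.113.10", 443, "tcp")

# Two LB machines with no shared state agree on the backend for this flow.
print(pick_backend(table_on_lb_a, flow) == pick_backend(table_on_lb_b, flow))

# Fraction of surviving-backend slots that change owner when one backend leaves.
smaller = build_table(["envoy-1", "envoy-2"])
kept = [i for i, b in enumerate(table_on_lb_a) if b != "envoy-3"]
moved = sum(1 for i in kept if smaller[i] != table_on_lb_a[i]) / len(kept)
print(f"disrupted share of surviving slots: {moved:.2%}")
```

Because each machine derives the table only from the backend list, no per-flow state has to be replicated between LBs for them to make the same choice.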

Zone-aware routing

The problem: A client in zone A (us-east-1a) routing to a backend in zone B (us-east-1b) incurs:

  • Cross-zone egress cost: $0.01–0.02/GB in most clouds.
  • Extra latency: 1–5 ms intra-region RTT.

Zone-aware routing: Prefer backends in the same zone as the LB. Fall back to other zones only when all same-zone backends are unhealthy or circuit-breaker limits are hit.
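
The preference rule can be sketched in a few lines. The `Backend` type, its field names, and the uniform random choice within a zone are assumptions for illustration; production LBs typically weight the choice by capacity and load.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    zone: str
    healthy: bool = True
    breaker_open: bool = False  # circuit breaker tripped for this backend


def eligible(backend: Backend) -> bool:
    return backend.healthy and not backend.breaker_open


def pick_zone_aware(backends, local_zone, rng=random):
    """Prefer eligible backends in the LB's own zone; otherwise use any zone."""
    local = [b for b in backends if b.zone == local_zone and eligible(b)]
    if local:
        return rng.choice(local)
    remote = [b for b in backends if eligible(b)]
    if not remote:
        raise RuntimeError("no eligible backends in any zone")
    return rng.choice(remote)


pool = [
    Backend("B1", "zone-a", healthy=False),  # one local backend is down
    Backend("B2", "zone-a"),
    Backend("B3", "zone-a"),
    Backend("B4", "zone-b"),
]
# Traffic stays in zone-a because B2 and B3 are still eligible.
print(pick_zone_aware(pool, "zone-a").zone)
```

Losing a single local backend does not trigger cross-zone traffic in this sketch; the fallback only applies once no local backend is eligible.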

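Implementations and defaults vary by product, so verify them against current documentation. AWS: ALB has cross-zone load balancing on by default (it can be turned off per target group), while NLB has it off by default, which keeps traffic in the zone that received it. Envoy: zone-aware routing (zone_aware_lb_config) prefers the local zone; locality_weighted_lb_config instead splits traffic by explicit per-locality weights. GCP: behavior depends on the load balancer type and its configured locality policy.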

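Zone failure isolation. When zone A has a partial failure, zone-aware routing limits the blast radius: healthy zone-A backends keep serving zone-A traffic, and only the share that cannot be served locally spills to zones B and C. Zones B and C keep their own traffic local, so the extra load they absorb is bounded by zone A's share (at most about 1.5× normal load each with three equal zones, if zone A fails completely) rather than arriving as an unpredictable surge.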

TLS termination at the LB

The LB terminates TLS: decrypts the client’s TLS session, sees plaintext, and (optionally) re-encrypts on the connection to the backend or sends plaintext over the internal network.

Benefits:

  • Backends do not need to manage certificates — one cert at the LB edge.
  • TLS handshake cost (~20–50 ms per new connection) borne once at the LB, not by every backend.
  • The LB can accept TLS 1.2 from old clients and use TLS 1.3 on the separate backend-facing connection.

TLS 1.3 0-RTT resumption at the LB. If the client has a pre-shared key (PSK) from a prior session, it can send its first request as early data in the same flight as the ClientHello, with zero extra round-trips. Behind anycast, the resumption attempt may land on a different LB instance, so either all instances share the session-ticket encryption keys (or a distributed session cache), or the attempt falls back to a full handshake. Early data can be replayed by an attacker, so accept it only for idempotent requests.

Cost: ~20–50 ms per new connection, 50–2 000 ms under load spikes. TLS session reuse amortizes this over many requests.
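
A quick worked calculation shows how reuse amortizes the handshake. The function name and the example request counts are assumptions; the 50 ms figure is the upper end of the range quoted above.

```python
def amortized_handshake_ms(handshake_ms: float, requests_per_connection: int) -> float:
    """Average handshake cost attributed to each request on a reused connection."""
    if requests_per_connection < 1:
        raise ValueError("a connection carries at least one request")
    return handshake_ms / requests_per_connection


# One request per connection pays the full handshake every time.
print(amortized_handshake_ms(50, 1))    # 50.0
# Keep-alive with 100 requests per connection spreads the same cost thinly.
print(amortized_handshake_ms(50, 100))  # 0.5
```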

Resilient LB architecture numbers

  • Anycast ECMP failover time: <1 s (BGP withdrawal)
  • Cross-zone egress cost: $0.01–0.02/GB
  • TLS termination cost (new connection): 20–50 ms
  • TLS termination cost under load spike: 50–2 000 ms
  • DNS TTL for geo-LB: 60–300 s
  • L4 edge + L7 behind (Google's pattern): Maglev + Envoy

DNS load balancing vs LB routing

DNS round-robin: Return multiple A records for one hostname. Clients pick one. Simple, but:

  • DNS TTL is 60–300 seconds — backend changes are not reflected for up to 5 minutes.
  • Clients cache DNS results and defeat rebalancing.
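  • No health awareness: plain round-robin DNS keeps handing out a dead backend until someone removes the record, and resolvers keep serving the cached answer until the TTL expires.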

Correct pattern: DNS points to one anycast VIP per region. DNS provides geographic routing (return the nearest regional VIP); the LB cluster behind that VIP provides per-request balancing within the region.

Observability: minimum viable metrics

Alert-worthy metrics for a load balancer cluster:

  1. Request rate per backend (RED method: Rate, Errors, Duration).
  2. p50/p95/p99 latency per backend. p99 exposes the tail: the slowest 1% of requests.
  3. Error rate per backend — alert if > 0.01%.
  4. Active connection count per backend.
  5. Health-check success/failure rate — alert on flapping.
  6. Circuit-breaker opens/closes — one open per week is fine; 10/hour signals a problem.
  7. Retry rate — alert if > 0.1% of request rate (early storm warning).
  8. Load imbalance — std dev of request counts across backends; high imbalance signals algorithm or affinity issues.
  9. Drain time on shutdowns — long drain time (approaching timeout) signals long-running requests.
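
Three of the thresholds in this list can be turned into a simple alert check. This is a sketch, not a monitoring-system API: the function name `evaluate`, the alert labels, and the imbalance limit (a coefficient of variation of 0.25) are assumptions; the 0.01% error and 0.1% retry thresholds come from the list above.

```python
from statistics import mean, pstdev

ERROR_RATE_LIMIT = 0.0001  # 0.01% of requests
RETRY_RATE_LIMIT = 0.001   # 0.1% of requests


def evaluate(requests, errors, retries, per_backend_requests, imbalance_limit=0.25):
    """Return the names of alerts that fire for one evaluation window."""
    alerts = []
    if requests == 0:
        return alerts
    if errors / requests > ERROR_RATE_LIMIT:
        alerts.append("error_rate")
    if retries / requests > RETRY_RATE_LIMIT:
        alerts.append("retry_rate")
    if per_backend_requests:
        avg = mean(per_backend_requests)
        # Std dev relative to the mean, so the limit is independent of traffic volume.
        if avg > 0 and pstdev(per_backend_requests) / avg > imbalance_limit:
            alerts.append("load_imbalance")
    return alerts


quiet = evaluate(100_000, 5, 50, [33_000, 33_500, 33_500])
storm = evaluate(100_000, 20, 500, [90_000, 5_000, 5_000])
print(quiet)  # []
print(storm)  # ['error_rate', 'retry_rate', 'load_imbalance']
```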

SLOs:

  • p99 latency < 100 ms for API endpoints.
  • Error rate < 0.01%.
  • Circuit-breaker open time < 1 minute/week.
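
The three SLOs can be checked against one window of data as sketched below. The function names and the report keys are assumptions; the percentile uses Python's `statistics.quantiles`, which is one of several valid p99 estimators.

```python
from statistics import quantiles


def p99(latencies_ms):
    """99th percentile: the last of the 99 cut points that split data into 100 groups."""
    return quantiles(latencies_ms, n=100)[98]


def slo_report(latencies_ms, requests, errors, breaker_open_seconds_this_week):
    """Map each SLO from the list above to True (met) or False (violated)."""
    return {
        "p99_latency": p99(latencies_ms) < 100.0,             # p99 < 100 ms
        "error_rate": requests > 0 and errors / requests < 0.0001,  # < 0.01%
        "breaker_open_time": breaker_open_seconds_this_week < 60,   # < 1 min/week
    }


healthy = slo_report([20.0] * 1000, 1_000_000, 50, 30)
# 1% of requests at 500 ms is enough to break the p99 target.
slow_tail = slo_report([20.0] * 990 + [500.0] * 10, 1_000_000, 50, 30)
print(healthy)
print(slow_tail)
```
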
Trace it

Trace zone-aware LB failover and anycast resilience.

  1. You have LBs in zones A, B, C. All advertise the same anycast VIP via BGP ECMP. A client in zone A initiates a request. Which LB handles it?
  2. LB_A in zone A has backends in zones A, B, and C. Should it prefer zone-A backends for new requests?
  3. LB_A crashes. BGP withdraws its route. ECMP re-hashes in-flight connections to LB_B or LB_C. What is the impact on existing TCP connections?
  4. Backend B1 in zone A dies. Zone-A backends are now [B2, B3]. Should LB_A immediately fail over all new traffic to zone B?

Pick the best fit

A platform team is building a multi-region load balancer for a globally distributed SaaS service. Pick the topology.

Why this works

Google’s Maglev and the two-tier LB pattern. Google uses Maglev as a software L4 LB at the network edge. Maglev uses a consistent hash of the 5-tuple to route flows to backend L7 proxies (Envoy in this pattern). This two-tier design separates concerns: Maglev absorbs packet-rate traffic cheaply and provides LB-level fault tolerance via anycast + consistent hashing. Envoy behind it does content routing, TLS, gRPC transcoding, and per-request observability. AWS offers a similar layering: a Network Load Balancer (L4, one static IP per Availability Zone) can front an Application Load Balancer (L7, HTTP routing). NLB itself does not expose an anycast VIP; on AWS, anycast entry points come from Global Accelerator.

Recall before you leave

  1. How does anycast + BGP ECMP eliminate the LB single point of failure, and what happens to in-flight connections when one LB crashes?
  2. Why does zone-aware routing matter economically, and when should it fail over to another zone?
  3. What is the minimum set of metrics needed to detect a retry storm before it causes an outage?

Recap

A single load balancer is a single point of failure. Anycast + BGP ECMP advertises the same VIP from multiple LBs; ECMP hashes flows across them and BGP withdraws a dead LB’s route in <1 second. Zone-aware routing keeps traffic in the same availability zone to avoid $0.01–0.02/GB egress costs and intra-region RTT overhead — only crossing zones when all same-zone backends are unhealthy. TLS terminates at the LB edge: one certificate, 20–50 ms handshake cost borne once rather than on every backend. The minimum observability set — request rate, p99 latency, error rate, retry rate, circuit-breaker opens — catches a retry storm at the 0.1% retry rate threshold before it escalates to cascade failure.


Trademarks belong to their respective owners. Editorial reference only.