awesome-everything RU
↑ Back to the climb

Networking & Protocols

Balancing algorithms: round-robin to power-of-two-choices

Crux Round-robin ignores real load; least-connections causes herding; power-of-two-choices is O(1) and ~12% imbalance — the modern default and why.
Your altitude — climbing toward senior
ZeroJuniorMiddleSenior
You are at middle altitude — in the sky
◷ 12 min

You have 4 backend servers. Requests should be distributed evenly. Round-robin does exactly that — until one server slows down and still gets its turn. Every choice of algorithm has a cost. The wrong one turns a slow backend into an avalanche.

Round-robin: simple but load-blind

The LB maintains a rotating pointer over the backend list: B1, B2, B3, B4, B1, B2, …. Each request advances the pointer one step.

Properties:

  • O(1) per routing decision.
  • Distributes request count evenly, not actual load.
  • A 1 ms API call and a 100 ms file upload both count as “one request.”

When it fails: If one backend is slower (GC pause, heavier queries), it accumulates open connections faster than round-robin can detect. The LB keeps sending it its fair share of requests; the slow backend falls further behind.

Weighted round-robin assigns each backend a weight proportional to capacity. B1 (weight 2) receives twice as many requests as B2 (weight 1). Weights must be set manually and do not adapt to real-time load changes.

Least-connections: adapts but herds

Route each new request to the backend with the fewest active connections at that instant. This directly accounts for slow backends: a backend processing long requests accumulates more connections and receives fewer new ones.

The herding problem. Suppose B1 gets overloaded, accumulates 1 000 connections, and the LB stops routing to it. B1’s GC pause ends. Its counter drops to 10. At that exact instant, every client perceiving the updated counter simultaneously routes to B1 — a new spike. The synchronized burst can overload B1 again within milliseconds. The herd has stampeded.

Least-connections also requires an O(log N) min-heap scan per request for precise tracking, or O(1) approximate tracking with a single “last minimum” pointer.

Power-of-two-choices: the modern default

Algorithm (per request):

  1. Pick two backends uniformly at random.
  2. Route to the one with fewer active connections.

That’s it. Two comparisons, zero global scan.

Why it beats both alternatives:

  • Pure random → ~100% load imbalance (some backends get 2× the average).
  • Least-connections → O(N) scan + synchronized herding.
  • Power-of-two-choices → ~12% imbalance (proven by Adler 1996), O(1) per request, random-noise-desynchronized herding.

The randomness prevents all clients from noticing a backend’s recovery at the same instant. The comparison prevents the worst-case assignment. Together: near-optimal at constant cost.

Nginx, Envoy, HAProxy, and AWS ALB all default to this algorithm or a variant of it.

Balancing algorithm comparison
Round-robin decision cost
O(1)
Least-connections decision cost
O(log N)
Power-of-two-choices decision cost
O(1)
Pure-random load imbalance
~100%
Power-of-two-choices imbalance
~12%
Least-connections imbalance
~0% (but herds)

Tracing power-of-two-choices over 1 000 requests

1/3
1. Request 1: pick B2 (8 conns) and B4 (12 conns) at random → route to B2 (fewer). 2. B2 now has 9 conns, B4 has 12. 3. Request 2: pick B1 (11 conns) and B3 (11 conns) → tied, pick B1 (tiebreaker: lower ID). 4. Request 3: pick B2 (9 conns) and B4 (12 conns) → route to B2. 5. After 1 000 requests: all four backends cluster around 6–7 active connections each, ~12% imbalance. 6. Compare: pure random would leave one backend at ~14 conns while another sits at ~3.
Quiz

Why does power-of-two-choices prevent the 'herding' problem that pure least-connections exhibits?

Order the steps

Order the backends by connection count and trace the power-of-two-choices winner:

  1. 1 Pick backend A (2 connections) and backend D (10 connections) at random.
  2. 2 Since A has fewer connections, route the request to A.
  3. 3 A now has 3 connections, D still has 10.
  4. 4 Next request: pick backend B (5 connections) and backend C (6 connections).
  5. 5 Route to B (fewer).
  6. 6 After many requests, all backends cluster around the same load (≈6–7 connections each), ≈12% imbalance.
Edge cases

Weighted power-of-two-choices. When backends have different capacity (one has 4 cores, another has 16), pure power-of-two-choices distributes connections equally rather than proportionally to capacity. Weight the random selection probability by backend capacity: a 4× heavier backend is 4× more likely to be one of the two candidates. This preserves the O(1) property while adapting to heterogeneous fleets.

Quiz

Round-robin distributes requests evenly by count. Why does this fail when request cost varies widely?

Recall before you leave
  1. 01
    What is the herding problem with least-connections, and why does power-of-two-choices avoid it?
  2. 02
    What load imbalance does power-of-two-choices achieve, and why is that better than pure random?
  3. 03
    When would you prefer weighted round-robin over power-of-two-choices?
Recap

Three families of balancing algorithms each make a different tradeoff. Round-robin is O(1) but distributes request count, not actual load — a slow backend drowns in its fair share. Least-connections routes to the backend with the fewest open connections and adapts to real load, but the global minimum update creates a synchronized herding burst when a backend recovers. Power-of-two-choices samples exactly two backends at random and routes to the less loaded one: O(1), ~12% imbalance (vs ~100% for pure random), and randomness desynchronizes the herd. Nginx, Envoy, and AWS ALB default to power-of-two-choices or a variant for exactly these properties.

Connected lessons
appears again in162
Continue the climb ↑L4 vs L7 load balancing and client-IP preservation
shortcuts expand
search
K
prev piece
k
next piece
j
cycle tier
t
this menu
?
sources3
expand
  1. 01
  2. 02
  3. 03

Trademarks belong to their respective owners. Editorial reference only.