Networking & Protocols NET · 09 · 02

Balancing algorithms: round-robin to power-of-two-choices

Round-robin ignores real load; least-connections causes herding; power-of-two-choices is O(1) and ~12% imbalance — the modern default and why.

NET Middle ◷ 12 min

Level

FoundationsJuniorMiddleSenior

You have 4 backend servers. Requests should be distributed evenly. Round-robin does exactly that — until one server slows down and still gets its turn. Every choice of algorithm has a cost. The wrong one turns a slow backend into an avalanche.

The LB maintains a rotating pointer over the backend list: B1, B2, B3, B4, B1, B2, …. Each request advances the pointer one step. Why does this matter to you? Because the algorithm you ship determines which failures you will debug at 2 a.m. — blind request-count distribution or synchronized herding.

Properties:

O(1) per routing decision.
Distributes request count evenly, not actual load.
A 1 ms API call and a 100 ms file upload both count as “one request.”

When it fails: If one backend is slower (GC pause, heavier queries), it accumulates open connections faster than round-robin can detect. The LB keeps sending it its fair share of requests; the slow backend falls further behind.

Weighted round-robin assigns each backend a weight proportional to capacity. B1 (weight 2) receives twice as many requests as B2 (weight 1). Weights must be set manually and do not adapt to real-time load changes.

Least-connections: adapts but herds

Route each new request to the backend with the fewest active connections at that instant. This directly accounts for slow backends: a backend processing long requests accumulates more connections and receives fewer new ones.

The herding problem. Suppose B1 gets overloaded, accumulates 1 000 connections, and the LB stops routing to it. B1’s GC pause ends. Its counter drops to 10. At that exact instant, every client perceiving the updated counter simultaneously routes to B1 — a new spike. The synchronized burst can overload B1 again within milliseconds. The herd has stampeded.

Least-connections also requires an O(log N) min-heap scan per request for precise tracking, or O(1) approximate tracking with a single “last minimum” pointer.

Power-of-two-choices: the modern default

Algorithm (per request):

Pick two backends uniformly at random.
Route to the one with fewer active connections.

That’s it. Two comparisons, zero global scan.

Why it beats both alternatives:

Pure random → ~100% load imbalance (some backends get 2× the average).
Least-connections → O(N) scan + synchronized herding.
Power-of-two-choices → ~12% imbalance (proven by Adler 1996), O(1) per request, random-noise-desynchronized herding.

The randomness prevents all clients from noticing a backend’s recovery at the same instant. The comparison prevents the worst-case assignment. Together: near-optimal at constant cost.

Power-of-two-choices collapses pure random's ~100% imbalance to ~12% — nearly matching least-connections' even spread, but at O(1) and without the synchronized herd.

Nginx, Envoy, HAProxy, and AWS ALB all default to this algorithm or a variant of it.

Balancing algorithm comparison

Round-robin decision cost: O(1)
Least-connections decision cost: O(log N)
Power-of-two-choices decision cost: O(1)
Pure-random load imbalance: ~100%
Power-of-two-choices imbalance: ~12%
Least-connections imbalance: ~0% (but herds)

Tracing power-of-two-choices over 1 000 requests

1/3

1. Request 1: pick B2 (8 conns) and B4 (12 conns) at random → route to B2 (fewer). 2. B2 now has 9 conns, B4 has 12. 3. Request 2: pick B1 (11 conns) and B3 (11 conns) → tied, pick B1 (tiebreaker: lower ID). 4. Request 3: pick B2 (9 conns) and B4 (12 conns) → route to B2. 5. After 1 000 requests: all four backends cluster around 6–7 active connections each, ~12% imbalance. 6. Compare: pure random would leave one backend at ~14 conns while another sits at ~3.

Quiz

Why does power-of-two-choices prevent the 'herding' problem that pure least-connections exhibits?

Order the steps

Order the backends by connection count and trace the power-of-two-choices winner:

1 Pick backend A (2 connections) and backend D (10 connections) at random.
2 Since A has fewer connections, route the request to A.
3 A now has 3 connections, D still has 10.
4 Next request: pick backend B (5 connections) and backend C (6 connections).
5 Route to B (fewer).
6 After many requests, all backends cluster around the same load (≈6–7 connections each), ≈12% imbalance.

▸Edge cases

Weighted power-of-two-choices. When backends have different capacity (one has 4 cores, another has 16), pure power-of-two-choices distributes connections equally rather than proportionally to capacity. Weight the random selection probability by backend capacity: a 4× heavier backend is 4× more likely to be one of the two candidates. This preserves the O(1) property while adapting to heterogeneous fleets.

Quiz

Round-robin distributes requests evenly by count. Why does this fail when request cost varies widely?

Two random samples (B2 and B4), one comparison: route to B2 because it has fewer active connections. No global scan of all four backends, and the randomness means different requests sample different pairs — so a recovering backend is not stampeded all at once.

Recall before you leave

01
What is the herding problem with least-connections, and why does power-of-two-choices avoid it?
02
What load imbalance does power-of-two-choices achieve, and why is that better than pure random?
03
When would you prefer weighted round-robin over power-of-two-choices?

Recap

Three families of balancing algorithms each make a different tradeoff. Round-robin is O(1) but distributes request count, not actual load — a slow backend drowns in its fair share. Least-connections routes to the backend with the fewest open connections and adapts to real load, but the global minimum update creates a synchronized herding burst when a backend recovers. Power-of-two-choices samples exactly two backends at random and routes to the less loaded one: O(1), ~12% imbalance (vs ~100% for pure random), and randomness desynchronizes the herd. Nginx, Envoy, and AWS ALB default to power-of-two-choices or a variant for exactly these properties. Now when you see a slow backend dragging down a service, the first question to ask is: which algorithm is the LB using — and does it account for connection duration, or just count?

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 7 done

Connected lessons

deepens into

appears again in165

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.