
Session affinity, consistent hashing, and the right fix

Sticky sessions pin each client to one backend, which causes 1.5–2.5× worse load imbalance and loses sessions when that backend fails. Consistent hashing maps keys to backends with minimal remapping on membership changes. The right fix for session state is to externalize it to Redis.


A user logs in. Their session lives on backend B2. The next request routes to B3. Session not found. They are logged out mid-flow. This is what happens when your application stores state on a single backend and you rely on session affinity to hide it — until affinity breaks.

Why sticky sessions exist

Some applications store session state on the backend — in memory, or in a local file. If the next request routes to a different backend, the state is missing: the user is logged out, the shopping cart is empty, the upload context is gone.

Sticky sessions (session affinity) make the load balancer (LB) route all requests from one client to the same backend. The client’s requests are “pinned” to one server.

Cookie-based affinity

The LB sets a cookie on the first response:

Set-Cookie: LB_ROUTE=backend_2; Path=/

On every subsequent request, the browser sends this cookie. The LB reads it and routes to backend_2.

Advantages:

Survives client-IP changes (mobile handover, NAT failover).
Allows graceful failover of routing: if backend_2 is unhealthy, the LB can pick a new backend and update the cookie. Any state held only in backend_2's memory is still lost.

Disadvantage: Cookie-based affinity requires HTTP. Non-HTTP protocols cannot use it.
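The cookie logic described above can be sketched in a few lines. This is a minimal illustration, not any particular load balancer's implementation: the `route` function, its arguments, and the random choice for new pins are assumptions made for the example; only the `LB_ROUTE` cookie name comes from the text.

```python
import random

COOKIE_NAME = "LB_ROUTE"  # cookie name taken from the example above


def route(cookies, healthy_backends, rng=random):
    """Return (backend, set_cookie) for one request (hypothetical helper).

    cookies: dict of cookie name -> value sent by the browser.
    healthy_backends: set of backend names currently passing health checks.
    set_cookie is None when the existing pin is still valid.
    """
    if not healthy_backends:
        raise RuntimeError("no healthy backends")
    pinned = cookies.get(COOKIE_NAME)
    if pinned in healthy_backends:
        return pinned, None  # keep the existing pin
    # First visit, or the pinned backend is unhealthy: choose a new backend
    # and tell the browser to remember it.
    chosen = rng.choice(sorted(healthy_backends))
    return chosen, f"{COOKIE_NAME}={chosen}; Path=/"


healthy = {"backend_1", "backend_2", "backend_3"}
print(route({"LB_ROUTE": "backend_2"}, healthy))  # ('backend_2', None)
print(route({"LB_ROUTE": "backend_2"}, healthy - {"backend_2"}))  # re-pinned elsewhere
```

Note that the second call only repairs routing. Whatever backend_2 held in memory for that client does not move with the cookie.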

IP-hash affinity

Route by Hash(client_IP) % num_backends. No cookie needed — works for any protocol.

Disadvantages:

Breaks when the client IP changes (mobile handover, VPN reconnect).
Clusters badly: 50 clients behind one corporate NAT share one public IP, so all 50 hash to the same backend, which carries their combined load while other backends may sit idle.
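Both disadvantages are easy to reproduce. The sketch below assumes SHA-256 as the hash (Python's built-in `hash()` is salted per process, so it would not be stable across LB instances); the function name and the IP addresses, taken from the documentation ranges, are illustrative.

```python
import hashlib


def ip_hash_backend(client_ip, backends):
    """Pick a backend as Hash(client_IP) % num_backends."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


backends = ["B1", "B2", "B3", "B4"]

# 50 office users share one public address after NAT.
nat_clients = ["203.0.113.7"] * 50
nat_targets = {ip_hash_backend(ip, backends) for ip in nat_clients}

# The same user before and after a mobile handover (two different IPs).
before = ip_hash_backend("198.51.100.23", backends)
after = ip_hash_backend("192.0.2.45", backends)

print(len(nat_targets))  # 1: every NAT'd client lands on the same backend
print(before, after)     # if these differ, the session is stranded on the old backend
```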

Session affinity costs and options

Load imbalance with sticky sessions: 1.5–2.5× worse
Session loss when the pinned backend crashes: 100% for that session
Cookie-based affinity survives an IP change: yes
IP-hash affinity survives an IP change: no
IP-hash with 50 clients behind one NAT: all 50 hit one backend
Right fix, session with a TTL in Redis: any backend can resume

The right fix: externalized session state

Store session data in a distributed cache (Redis, Memcached, DynamoDB) keyed by session ID. Every backend reads from and writes to this store.

Now the LB can route requests freely using power-of-two-choices. No affinity needed. No pinning. Session data survives backend failures because it is in Redis, not in memory on one backend.

Externalizing session state to Redis turns three liabilities of affinity — imbalance, the session SPOF, and constrained routing — into non-issues.
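A minimal sketch of the idea follows. To stay runnable without a server, it uses an in-memory `SessionStore` class as a stand-in for Redis; that class, the `Backend` class, and the 1800-second TTL are assumptions for illustration. In a real deployment the `put` and `get` calls would go over the network to the shared store.

```python
import time
import uuid


class SessionStore:
    """In-memory stand-in for a shared store such as Redis (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}  # session_id -> (expires_at, payload)
        self._clock = clock

    def put(self, session_id, payload, ttl_seconds):
        self._data[session_id] = (self._clock() + ttl_seconds, dict(payload))

    def get(self, session_id):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, payload = entry
        if self._clock() >= expires_at:
            del self._data[session_id]  # expired: behave as if never stored
            return None
        return dict(payload)


class Backend:
    """A stateless app server: all session state lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user, ttl_seconds=1800):
        session_id = uuid.uuid4().hex
        self.store.put(session_id, {"user": user, "cart": []}, ttl_seconds)
        return session_id

    def handle(self, session_id):
        session = self.store.get(session_id)
        if session is None:
            return f"{self.name}: please log in"
        return f"{self.name}: hello {session['user']}"


store = SessionStore()
b2, b3 = Backend("B2", store), Backend("B3", store)
sid = b2.login("alice")
print(b3.handle(sid))  # B3: hello alice
```

The login happened on B2, yet B3 serves the next request, which is exactly what lets the LB stop caring where a client was routed before.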

This is the pattern used at Netflix, Airbnb, Shopify, and any service that scales horizontally.

Consistent hashing: for cache locality, not session state

Session affinity is about user state. Consistent hashing is about cache locality: routing a cache key to the same backend so you get a cache hit instead of a miss. If you have ever wondered why a distributed cache seems to re-warm itself every time you add a node, consistent hashing — and whether you are using it — is the answer.

Problem with round-robin for caching:

Request user:42 → B1 (cached there).
Next request user:42 → B2 (cache miss, fetches from DB).
No locality, no benefit from caching.

Consistent hashing maps each key to a point on a hash ring. Backends are also mapped to points (with many virtual nodes per backend to spread them evenly, ~150–300). A key routes to the nearest backend clockwise on the ring.

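Key property: when a backend joins or leaves, only about 1/N of the keys remap, where N is the number of backends. All other keys stay on the same backend, which minimizes cache disruption on topology changes. By contrast, with plain Hash(key) % N, changing N remaps most keys.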

Lookup cost: O(log(N·V)) for N backends with V virtual nodes each, using binary search over the sorted virtual node positions (a sorted array or a balanced tree).
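The ring and its remapping property can be sketched as follows. This is a simplified illustration, assuming MD5 as the ring hash and 200 virtual nodes per backend; the `HashRing` class and the `backend#i` naming of virtual nodes are choices made for the example. It compares how many keys move when a fifth backend joins, against plain modulo hashing.

```python
import bisect
import hashlib


def h64(value):
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, backends, vnodes=200):
        # One sorted list of (position, backend) pairs, vnodes entries per backend.
        self._ring = sorted(
            (h64(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    def lookup(self, key):
        # First virtual node at or after the key's position, wrapping at the end.
        idx = bisect.bisect_left(self._positions, h64(key))
        return self._ring[idx % len(self._ring)][1]


keys = [f"user:{i}" for i in range(10_000)]
old = ["B1", "B2", "B3", "B4"]
new = old + ["B5"]

ring_old, ring_new = HashRing(old), HashRing(new)
ring_moved = [k for k in keys if ring_old.lookup(k) != ring_new.lookup(k)]

# Baseline: Hash(key) % N before and after the membership change.
mod_moved = [k for k in keys if old[h64(k) % len(old)] != new[h64(k) % len(new)]]

print(f"ring:   {len(ring_moved) / len(keys):.0%} of keys moved")
print(f"modulo: {len(mod_moved) / len(keys):.0%} of keys moved")
```

With five backends after the change, the ring figure should land near one fifth and the modulo figure near four fifths. Every key that moves on the ring moves to the new backend B5, so the caches on B1 to B4 keep everything else they had.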

Use consistent hashing for:

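Distributed caches (for example, Memcached shards with client-side key hashing). Note that Redis Cluster itself assigns keys to 16384 fixed hash slots rather than to a hash ring.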
Database sharding (route by shard key).
Sticky request routing for cache locality (media encoding jobs, analytics aggregations).

Do not use consistent hashing for general request balancing — power-of-two-choices adapts to real-time load, consistent hashing does not.

Rendezvous hashing (highest-random-weight)

Alternative to ring-based consistent hashing:

For each backend, compute Hash(key, backend_id).
Route to the backend with the highest hash value.

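No ring and no virtual nodes, so it is simpler to implement. Lookup is O(N), but for small N (<100 backends) the difference from the ring's logarithmic lookup is negligible. When a backend leaves, only the keys it owned move, each to its next-highest-scoring backend. Hash distribution is often more uniform than ring-based consistent hashing in practice. Some CDNs and, reportedly, Facebook’s TAO use rendezvous hashing for sharding.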
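The two steps above fit in one function. The sketch below assumes SHA-256 over the string `backend_id|key` as Hash(key, backend_id); that encoding and the function names are illustrative choices. It also checks that removing a backend reassigns only the keys that backend owned.

```python
import hashlib


def score(key, backend_id):
    """Stable pseudo-random weight for a (key, backend) pair."""
    digest = hashlib.sha256(f"{backend_id}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def rendezvous_pick(key, backends):
    """Return the backend with the highest score for this key: O(N) per lookup."""
    return max(backends, key=lambda b: score(key, b))


backends = ["B1", "B2", "B3", "B4"]
keys = [f"user:{i}" for i in range(5_000)]
before = {k: rendezvous_pick(k, backends) for k in keys}

survivors = [b for b in backends if b != "B3"]  # B3 leaves the pool
after = {k: rendezvous_pick(k, survivors) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(all(before[k] == "B3" for k in moved))  # True: only B3's keys were reassigned
```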

Why this works

Why virtual nodes matter. Without virtual nodes each backend occupies one arc on the ring. With 4 backends (A, B, C, D) placed at 0, 90, 180, 270 degrees, the arcs are perfectly even — but that is the ideal case. In practice, hash(backend_id) clusters backends unevenly. Virtual nodes map each backend to 150–300 positions on the ring, spreading it into 150–300 small arcs. This averages out the distribution so no single backend claims a disproportionately large arc.
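The effect of virtual nodes can be measured directly. The sketch below is an illustration under assumed parameters (MD5 as the ring hash, 20,000 synthetic keys, virtual nodes named `backend#i`); it prints each backend's share of keys with 1 and with 200 virtual nodes per backend.

```python
import bisect
import hashlib
from collections import Counter


def h64(value):
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def build_ring(backends, vnodes):
    """Sorted (position, backend) pairs, vnodes positions per backend."""
    return sorted((h64(f"{b}#{i}"), b) for b in backends for i in range(vnodes))


def owner(ring, positions, key):
    """Backend owning the first virtual node clockwise from the key."""
    idx = bisect.bisect_left(positions, h64(key))
    return ring[idx % len(ring)][1]


def load_shares(backends, vnodes, keys):
    """Fraction of keys that each backend receives."""
    ring = build_ring(backends, vnodes)
    positions = [pos for pos, _ in ring]
    counts = Counter(owner(ring, positions, k) for k in keys)
    return {b: counts[b] / len(keys) for b in backends}


backends = ["A", "B", "C", "D"]
keys = [f"key:{i}" for i in range(20_000)]

for vnodes in (1, 200):
    shares = load_shares(backends, vnodes, keys)
    print(vnodes, {b: round(s, 3) for b, s in shares.items()})
```

With 200 virtual nodes the shares should sit close to the ideal 0.25 each. With a single position per backend they depend entirely on where four hash values happen to fall, and are typically far less even.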

Order the steps

Order the steps of session affinity failover (cookie-based) showing why it fails without Redis:

1 Client sends a request with cookie LB_ROUTE=backend_2.
2 LB reads the cookie and routes to backend_2.
3 Backend_2 crashes or becomes unhealthy.
4 LB stops routing new requests to backend_2 but cannot migrate the existing session.
5 Client's session is lost because only backend_2 held it in memory.
6 Client retries; LB picks a new backend, but the session data is gone.
7 Right fix: store session in Redis, accessible from any backend, so failover is transparent.
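The sequence above can be replayed as a small simulation. This is a toy model, not a real LB: the `Backend` class and `run_failover` function are hypothetical, and a plain dict shared by all backends stands in for Redis.

```python
class Backend:
    def __init__(self, name, shared_store=None):
        self.name = name
        self.healthy = True
        # With no shared store, sessions live only in this process's memory.
        self.sessions = shared_store if shared_store is not None else {}

    def handle(self, session_id):
        return self.sessions.get(session_id, "session not found")


def run_failover(shared_store=None):
    names = ("backend_1", "backend_2", "backend_3")
    pool = {n: Backend(n, shared_store) for n in names}
    pool["backend_2"].sessions["sid-1"] = "alice's cart"    # steps 1-2: pinned to backend_2
    pool["backend_2"].healthy = False                       # step 3: backend_2 crashes
    fallback = next(b for b in pool.values() if b.healthy)  # steps 4 and 6: LB re-routes
    return fallback.handle("sid-1")                         # step 5 or step 7


print(run_failover())                 # session not found
print(run_failover(shared_store={}))  # alice's cart
```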

Quiz

What is the main drawback of IP-hash session affinity compared to cookie-based affinity?

Quiz

When a backend is removed from a consistent-hash ring, what fraction of keys must remap to a new backend?

The cookie pins the client to one backend. That gives cache and session locality, but if B2 crashes the session held only in B2's memory is gone — which is why the durable fix is to externalize session state to Redis so any backend can resume.

Recall before you leave

01 Why is session affinity considered an anti-pattern, and what is the correct fix?
02 What problem does consistent hashing solve that round-robin cannot?
03 What are virtual nodes in consistent hashing and why are they necessary?

Recap

Session affinity routes each client to one backend using a cookie (LB_ROUTE=backend_2) or IP hash. Cookie-based affinity survives IP changes; IP-hash breaks on IP changes and clusters badly behind NAT. Both cause 1.5–2.5× worse load imbalance and lose sessions on backend failure — they are workarounds, not solutions. The correct architecture: externalize session state to Redis so any backend can resume any session and the LB can route freely. Consistent hashing is a different tool for cache locality: it maps a key to the same backend with ~1/N remapping on membership change, using virtual nodes (150–300 per backend) to even out the ring distribution. Use consistent hashing for caches and sharding; use power-of-two-choices for live request balancing. Now when you inherit a service that logs out users on deploy, the first question to ask is: where does the session live — and is Redis already in the stack?
