Backend Architecture BE · 07 · 03

The deregistration race: stop routing before you stop accepting

The subtle bug is a race: SIGTERM and endpoint removal fire in parallel, but propagation delay means the load balancer keeps routing to a pod that has already started shutting down. The fix is to fail readiness first and wait out the delay before closing the listener.

BE Middle ◷ 16 min

You did everything right: a real SIGTERM handler, the app is PID 1, the handler stops accepting new connections the instant the signal lands and drains the in-flight ones. Deploy, and you still see a burst of connection-refused errors — but only for a second or two, only at the start of each rollout. The cause is a race that almost nobody guesses on the first try. When the pod starts terminating, two things happen in parallel: the orchestrator sends SIGTERM to your process, and it tells the rest of the cluster to stop routing traffic to you. The second part is not instant — it propagates through the control plane, kube-proxy on every node, and the load balancer, taking anywhere from under a second on a small cluster to tens of seconds on a big one. Your process, obeying SIGTERM, closes its listener immediately. So for the propagation window there is a load balancer still confidently sending requests to a pod that has already shut its door — and every one of those gets refused. You stopped accepting before the world stopped routing.

Two clocks, started together, ending apart

The mental model that breaks here is “the load balancer knows the moment I get SIGTERM.” It does not. Endpoint removal and signal delivery are independent, concurrent actions kicked off by the same event:

The signal path is fast and local: the kubelet on your node sends SIGTERM to your container in milliseconds.
The routing path is slow and distributed: the API server marks the endpoint not-ready, that change propagates to every node’s kube-proxy (or to an external load balancer, or to an ingress controller that may be polling on an interval), and only then do the routing rules actually stop forwarding to you.

Because the routing path is eventually consistent, there is a window — measured from “SIGTERM arrives” to “the last router stops sending you traffic” — during which the cluster still believes you are a valid backend. Reported numbers: under a second on small clusters, but 10–30 seconds on large clusters or with polling ingress controllers. If your handler closes the listening socket the instant SIGTERM lands, every request that arrives in that window hits a closed port and the client gets a connection refused — the inverse of the lesson-one problem. There you exited too early and severed in-flight work; here you stop accepting too early and reject work the LB is still sending.

The signal clock finishes in milliseconds while the routing clock runs up to 10–30s — that gap is the race window where a closed listener refuses traffic the LB is still sending.
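
A rough way to size the damage is to add up, per router, how long it keeps forwarding after the listener has closed. The sketch below does that arithmetic. The function name, the per-router delays, and the request rate are illustrative assumptions, not measurements from a real cluster.

```python
def refused_requests(router_delays_s, listener_close_s, rps_per_router):
    """Estimate requests refused during the deregistration race.

    router_delays_s:  seconds after termination starts at which each router
                      stops forwarding to the pod (hypothetical values).
    listener_close_s: seconds after termination starts at which the pod
                      closes its listening socket.
    rps_per_router:   steady request rate each router sends to the pod.
    """
    refused = 0.0
    for delay in router_delays_s:
        # Traffic this router sends after the listener closed is refused.
        exposed_s = max(0.0, delay - listener_close_s)
        refused += exposed_s * rps_per_router
    return refused


if __name__ == "__main__":
    delays = [0.5, 2.0, 12.0]  # assumed: two fast routers, one slow poller
    # Handler closes the listener the instant SIGTERM lands.
    print(refused_requests(delays, listener_close_s=0.0, rps_per_router=50))
    # Listener stays open for 15 s, longer than the slowest router needs.
    print(refused_requests(delays, listener_close_s=15.0, rps_per_router=50))
```

With these made-up inputs the first call reports 725 refused requests and the second reports none. The slowest router dominates the total, which is why the wait has to be sized to the worst propagation delay you actually observe, not the typical one.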

The fix: change the order, don’t just add a handler

The cure is to make routing stop before you stop accepting: the listener must stay open until the routing clock has finished. There are two complementary levers:

Fail the readiness probe first. Readiness is the signal the orchestrator uses to decide whether to route to you. The moment you begin shutting down, make your readiness endpoint report not-ready. That starts the deregistration clock through the normal mechanism, but it does not finish it, because propagation still takes time.
Add a preStop sleep to cover propagation. Because the preStop hook runs before SIGTERM and the orchestrator blocks on it, a preStop that simply sleeps for a few seconds (commonly 5–15s, sized to your cluster’s real propagation delay) holds the process open and accepting while the not-ready status spreads. Only after the sleep does SIGTERM arrive and your handler close the listener — by which point the routers have caught up and are no longer sending you new traffic.

The principle: keep serving until you are confident nothing is still being routed to you, then stop accepting, then drain. A handler alone is not enough; the ordering relative to deregistration is the whole point.
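
The ordering can be sketched in a few lines. This is a minimal in-process model, not a real server: the class name `ShutdownSequence`, its fields, and the injectable `sleep` are hypothetical, and in a Kubernetes deployment the wait would normally live in a preStop hook rather than inside the process. It only shows which step has to come before which.

```python
import time


class ShutdownSequence:
    """Toy model of a pod's shutdown order (hypothetical, not a real server)."""

    def __init__(self, propagation_wait_s, sleep=time.sleep):
        self.propagation_wait_s = propagation_wait_s
        self._sleep = sleep      # injectable so the wait can be faked
        self.ready = True        # what the readiness probe would report
        self.accepting = True    # whether the listener is still open
        self.steps = []          # record of the order things happened in

    def run(self):
        # 1. Fail readiness: this starts deregistration but does not finish it.
        self.ready = False
        self.steps.append("fail_readiness")

        # 2. Keep accepting while the not-ready status propagates.
        self._sleep(self.propagation_wait_s)
        self.steps.append("wait_for_propagation")

        # 3. Only now stop accepting new connections.
        self.accepting = False
        self.steps.append("close_listener")

        # 4. Draining in-flight requests would follow here.
        self.steps.append("drain_in_flight")
        return self.steps


if __name__ == "__main__":
    seq = ShutdownSequence(propagation_wait_s=0.1)
    print(seq.run())
```

Injecting `sleep` keeps the sketch checkable: a fake sleep can confirm that readiness is already failing and the listener is still open for the whole wait.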

Don’t break readiness the wrong way

A common own-goal: people point the liveness probe and the readiness probe at the same endpoint, or they let the SIGTERM handler immediately return errors from the health endpoint. If the liveness probe starts failing during shutdown, the orchestrator may decide the container is broken and kill or restart it, cutting your drain short. Keep liveness passing (the process is alive, and draining is healthy behavior) and fail only readiness (do not send me new traffic). The two probes answer different questions: liveness asks “should I restart you?” and readiness asks “should I route to you?” During shutdown the honest answers are “no, leave me running” and “no, send me no new traffic.”
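
A minimal sketch of keeping the two probes separate, using Python's standard `http.server`. The paths `/healthz` and `/readyz`, the `shutting_down` flag, and the helper names are assumptions for illustration, not something the lesson prescribes. The point is only that the two endpoints read different state.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set once shutdown begins (for example, by the shutdown sequence).
shutting_down = threading.Event()


def probe_status(path, is_shutting_down):
    """Return the HTTP status a probe path should report."""
    if path == "/healthz":
        # Liveness: the process is alive, and draining is healthy behavior.
        return 200
    if path == "/readyz":
        # Readiness: ask for no new traffic as soon as shutdown begins.
        return 503 if is_shutting_down else 200
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = probe_status(self.path, shutting_down.is_set())
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep probe traffic out of the logs in this sketch


def make_probe_server(port=0):
    """Build (but do not start) a probe server; port 0 picks a free port.

    Bound to loopback for the sketch; a real pod would bind an address
    the kubelet can reach.
    """
    return ThreadingHTTPServer(("127.0.0.1", port), ProbeHandler)


if __name__ == "__main__":
    # During shutdown: liveness still passes, readiness fails.
    print(probe_status("/healthz", True), probe_status("/readyz", True))
```

Because liveness never consults the shutdown flag, a draining process cannot accidentally talk the orchestrator into restarting it.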

Why this works

Why can’t the platform just make endpoint removal synchronous with the signal, so there is no window to engineer around? Because routing in a distributed cluster is not a single switch you can flip atomically — it is replicated state spread across many independent components, and keeping replicated state perfectly consistent on every change is exactly the expensive coordination that distributed systems avoid for throughput. When a pod goes not-ready, that fact has to reach the API server, get written to the endpoints object, be observed by every node’s kube-proxy (which then rewrites local iptables or IPVS rules), and separately reach any external load balancer or ingress controller, some of which discover changes by polling on their own schedule rather than being pushed. Each of those hops is independently fast but collectively asynchronous, and there is no global clock that says “everyone has updated, now release the signal.” Making it synchronous would mean blocking every termination on the slowest router in the fleet acknowledging the change — coupling pod shutdown to cluster-wide consensus, which would make deploys crawl and would itself fail whenever any router was slow or unreachable. So the platform chooses eventual consistency and hands you the tools to bridge the gap: readiness to start the clock and preStop to wait it out. The deep lesson is the same one the circuit-breaker unit kept hitting — in a distributed system you cannot assume two events triggered together are observed together, and any correctness that depends on their ordering has to be enforced deliberately, not assumed.

Moment	Your process	The load balancer	Result
No guard, SIGTERM lands	Closes listener instantly	Still routing (not yet propagated)	Connection refused for the window
Fail readiness first	Keeps accepting	Starts marking you not-ready	Clock started, not yet finished
preStop sleeps 5–15s	Keeps accepting	Propagation completes	Routers stop sending new traffic
SIGTERM after sleep	Now closes listener	No longer routing to you	No refused connections
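
One sizing check follows from this timeline. Assuming Kubernetes semantics, where time spent in the preStop hook counts against the pod's `terminationGracePeriodSeconds` (30 seconds by default), whatever the sleep consumes is no longer available for draining. The helper below is a hypothetical name for that subtraction.

```python
def drain_budget_s(grace_period_s, prestop_sleep_s):
    """Seconds left for draining after the preStop sleep has run."""
    budget = grace_period_s - prestop_sleep_s
    if budget <= 0:
        raise ValueError(
            "preStop sleep uses the whole grace period; nothing is left to drain"
        )
    return budget


if __name__ == "__main__":
    # Default 30 s grace period, 10 s preStop sleep: 20 s remain for draining.
    print(drain_budget_s(grace_period_s=30, prestop_sleep_s=10))
```

If the remaining budget is shorter than your slowest in-flight request, raise the grace period rather than shortening the sleep below the real propagation delay.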

Quiz

A service with a correct SIGTERM handler that closes its listener immediately still sees a brief burst of connection-refused errors at the start of each deploy. Why?

Quiz

During shutdown, why should you fail the readiness probe but keep the liveness probe passing?

Without the preStop sleep, the LB routes requests to a pod that has already closed its listener — connection refused for the propagation window.

Key takeaway

The subtle shutdown bug is a race between two clocks started by the same event but ending apart. Signal delivery is fast and local — the kubelet sends SIGTERM in milliseconds — while endpoint deregistration is slow and distributed: the not-ready status must propagate through the API server, every node’s kube-proxy, and any external or polling load balancer, an eventually-consistent path taking under a second on small clusters but 10–30s on large ones. A handler that closes the listener the instant SIGTERM lands therefore refuses every request the load balancer is still routing during that window — connection-refused errors, the inverse of exiting too early. The fix is ordering, not just a handler: fail the readiness probe first to start the deregistration clock through the normal mechanism, and add a preStop sleep (commonly 5–15s, sized to real propagation delay) that holds the process open and accepting until routing has caught up, after which SIGTERM closes the listener safely. Keep the liveness probe passing throughout — readiness controls routing, liveness controls restarts, and a failing liveness probe can make the orchestrator kill the container and cut the drain short.

Recall before you leave

1. What is the deregistration race and why does it cause connection-refused errors?
2. How do you fix the race, and why fail readiness but not liveness?

Recap

Even a perfect SIGTERM handler refuses connections if it closes the listener too soon, because termination starts two clocks at once that finish apart: signal delivery is a fast local path of milliseconds, while endpoint deregistration is a slow distributed path through the API server, every node’s kube-proxy, and any external or polling load balancer. That path is eventually consistent, taking under a second on small clusters but 10–30s on large ones. During that window the load balancer still routes to a pod whose door is already shut, producing connection-refused errors, the inverse of lesson one’s sever-too-early. The fix is ordering: fail the readiness probe first to start the deregistration clock, and add a preStop sleep of roughly 5–15s, sized to real propagation, that keeps the process accepting until routing has caught up. Only then does SIGTERM close the listener. Keep liveness passing so the orchestrator does not mistake a healthy drain for a broken container and restart it mid-shutdown. With routing safely drained and the listener finally closed, the next lesson handles what comes after the door shuts: draining the in-flight requests and closing every resource in the right order. Now when you see connection-refused errors in the first seconds of a deploy, rather than mid-drain, you’ll know the race, and you’ll reach for a preStop sleep before anything else.

Connected lessons

builds on

Signals and the grace period: SIGTERM, SIGKILL, and PID 1 (middle)

unlocks

Draining and shutdown order: reverse the dependency graph (middle)

deepens into

Draining and shutdown order: reverse the dependency graph (middle)
