
Kubernetes objects: the reconciliation loop behind every Pod, Service, and rollout

Kubernetes is declarative — you submit desired state and controllers reconcile actual toward it forever. The object hierarchy (Pod → ReplicaSet → Deployment) plus Services and probes is why a rollout heals itself, or 500s when you forget a readiness probe.


A team ships a routine version bump. The rollout looks green in kubectl get pods — new pods are Running. But the on-call dashboard lights up: error rate spikes to 4% for ninety seconds on every deploy, then settles. It happens every single rollout. The cause is one missing field: the Deployment has no readiness probe. The instant a pod’s container process starts, Kubernetes marks it Ready and the Service begins routing live traffic to it — while the app is still loading config, warming a connection pool, and JIT-compiling. Requests land on a process that isn’t actually serving yet, and they 500.

Declarative, not imperative: the reconciliation loop

The single idea that makes Kubernetes coherent is this: you do not tell the cluster what to do, you tell it what you want to be true. You submit a desired state — “I want 5 replicas of this image” — and the cluster’s controllers run a loop forever, comparing actual state against desired state and taking corrective action to close the gap. This is the reconciliation loop, and it is why Kubernetes is self-healing. A node dies and three pods vanish? The ReplicaSet controller observes actual=2, desired=5, and creates three more. You never wrote “restart on failure” — you declared an invariant, and a controller enforces it.
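The compare-then-act pass described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: `reconcile` and the `web-N` pod names are hypothetical, and the real ReplicaSet controller watches the API server rather than a local list.

```python
import itertools

_ids = itertools.count(1)  # hypothetical source of unique pod names

def reconcile(desired, pods):
    """Return the pod list after one reconciliation pass (toy model)."""
    pods = list(pods)
    while len(pods) < desired:   # too few: create the missing pods
        pods.append(f"web-{next(_ids)}")
    while len(pods) > desired:   # too many: delete the surplus
        pods.pop()
    return pods

pods = reconcile(5, [])          # initial rollout: 0 -> 5
pods = pods[:2]                  # a node dies and three pods vanish
healed = reconcile(5, pods)      # controller sees actual=2, desired=5
created = len(healed) - len(pods)
print(created)                   # 3
```

Nothing in the sketch says "restart on failure"; the three replacements fall out of comparing the count against the declared invariant.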

This is the deepest reason you never kubectl run a bare Pod in production. A bare Pod behaves like an imperative command: nothing owns it, nothing reconciles it. The kubelet will restart its containers in place if they crash, but if the node fails or the Pod is evicted or deleted, it is gone permanently, with no controller to recreate it elsewhere. The same loop that heals your Deployment ignores the orphan Pod because no controller has it as a desired-state target.

Why this works

The loop must be idempotent: running it twice with the same desired state must not double anything. That is why controllers compare-then-act instead of just acting. A controller that “creates a pod when it sees a Deployment” would create infinite pods on every loop tick; a controller that “ensures pod count equals replicas” converges and then does nothing. Idempotence is what lets the loop run continuously without thrashing.
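To make the contrast concrete, here is a small sketch of the two controller styles run over ten loop ticks with an unchanged desired state. Both function names are hypothetical and pods are reduced to a bare count.

```python
def naive_tick(pods):
    # "Create a pod when you see a Deployment": acts without comparing.
    return pods + 1

def converging_tick(pods, replicas):
    # "Ensure pod count equals replicas": compares, then acts only on the gap.
    return pods + (replicas - pods)

naive = converging = 0
for _ in range(10):              # ten loop ticks, same desired state
    naive = naive_tick(naive)
    converging = converging_tick(converging, replicas=3)

print(naive, converging)         # 10 3
```

The naive version grows without bound, while the converging version reaches 3 on the first tick and every later tick is a no-op.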

The workload hierarchy: Pod → ReplicaSet → Deployment

Why three separate objects instead of one? Because each failure scenario requires a different level of ownership — and understanding which layer owns which job tells you exactly where to look when a rollout goes wrong.

Three nested objects, each adding one capability:

Pod: the smallest deployable unit. One or more containers that share a network namespace (same IP, talk over localhost) and can share volumes. Pods are ephemeral: a restarted container keeps its Pod’s IP, but a replacement Pod gets a new one, and Pods are designed to be thrown away and replaced, never repaired.
ReplicaSet: keeps exactly N identical pods alive. Its whole job is the count: it watches its pods and reconciles toward replicas: N. You almost never create one directly.
Deployment: manages ReplicaSets to enable rollouts and rollback. When you change the image, the Deployment creates a new ReplicaSet and scales it up while scaling the old one down, governed by maxSurge and maxUnavailable (both default 25%). The old ReplicaSet sticks around at scale 0 (up to revisionHistoryLimit old revisions, 10 by default), which is exactly how kubectl rollout undo works: it scales the previous ReplicaSet back up.

Together these three layers mean you never have to write “restart on failure” anywhere — you declare a count, and the hierarchy enforces it. Skip any layer and you lose the guarantee: a bare Pod has no ReplicaSet parent, so no one enforces the count, and a ReplicaSet with no Deployment has no rollback mechanism.
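As a worked example of the maxSurge and maxUnavailable bounds, the sketch below resolves the 25% defaults to absolute pod counts. The helper name `rollout_bounds` is hypothetical; the rounding follows the documented rule that a maxSurge percentage rounds up and a maxUnavailable percentage rounds down.

```python
import math

def rollout_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Resolve maxSurge / maxUnavailable to absolute pod counts."""
    def resolve(value, round_up):
        if isinstance(value, str) and value.endswith("%"):
            exact = replicas * int(value[:-1]) / 100
            return math.ceil(exact) if round_up else math.floor(exact)
        return int(value)

    surge = resolve(max_surge, round_up=True)                # percentages round up
    unavailable = resolve(max_unavailable, round_up=False)   # percentages round down
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }

print(rollout_bounds(5))
# {'max_total_pods': 7, 'min_available_pods': 4}
```

With 5 replicas, 25% is 1.25 pods, so a rollout may run up to 7 pods at once and must keep at least 4 available.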

Object	Owns	The one job it adds
`Pod`	Containers	Run containers together (shared net + volumes); ephemeral
`ReplicaSet`	Pods	Keep exactly N copies alive (reconcile the count)
`Deployment`	ReplicaSets	Rollouts + rollback (swap ReplicaSets gradually)
`Service`	(selects Pods)	Stable IP + DNS over a changing set of pod IPs

Services and the label-selector glue

Pods are ephemeral and their IPs change constantly, so nothing should ever talk to a Pod IP directly. A Service gives you a stable virtual IP and a DNS name (my-svc.my-namespace.svc.cluster.local) that stays fixed while the pods behind it churn. The magic that wires a Service to its pods is labels and selectors: the Service declares selector: { app: web }, and Kubernetes maintains an Endpoints (or EndpointSlice) object listing the IPs of every Pod whose labels match. kube-proxy then load-balances traffic across those endpoints. Labels are the universal glue across Kubernetes — Deployments use the same mechanism to know which pods they own.
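The selector match itself is simple equality on label pairs. Here is a minimal sketch, assuming pods are plain dictionaries with hypothetical IPs; the real Endpoints and EndpointSlice objects are maintained by a controller, and this toy version also filters on readiness, which is what gates traffic.

```python
def matches(selector, labels):
    """True when every key/value pair in the selector appears in the labels."""
    return all(labels.get(k) == v for k, v in selector.items())

def endpoints(selector, pods):
    """IPs of Ready pods whose labels match (toy Endpoints list)."""
    return [p["ip"] for p in pods if p["ready"] and matches(selector, p["labels"])]

pods = [
    {"ip": "10.0.1.4", "ready": True,  "labels": {"app": "web", "tier": "frontend"}},
    {"ip": "10.0.2.7", "ready": False, "labels": {"app": "web"}},      # not Ready yet
    {"ip": "10.0.3.9", "ready": True,  "labels": {"app": "worker"}},   # label mismatch
]
print(endpoints({"app": "web"}, pods))   # ['10.0.1.4']
```

Extra labels on a pod do not prevent a match; only the keys named in the selector are compared.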

The three core Service types are a layered escalation of exposure, with Ingress sitting in front of them as an HTTP router:

ClusterIP (default) — a virtual IP reachable only inside the cluster. Internal service-to-service traffic.
NodePort — opens a static port (default range 30000–32767) on every node’s IP, forwarding to the ClusterIP. Crude external access; rarely the right answer for production HTTP.
LoadBalancer — provisions a cloud load balancer pointing at the NodePort. The traffic chain is: external client → cloud LB → node:NodePort → ClusterIP → Pod. One LB per Service, which gets expensive fast.
Ingress — not a Service type but an L7 HTTP router in front of Services. One LoadBalancer feeds an Ingress controller (NGINX, Traefik), which routes by host and path to many ClusterIP Services. This is how you expose dozens of apps behind a single external IP and TLS cert.

ConfigMaps, Secrets, and the production failure

Config and credentials live in their own objects so you don’t bake them into the image: ConfigMap for non-sensitive config, Secret for credentials (base64-encoded, and not encrypted at rest unless you enable encryption). Both can be exposed to a container as env vars or as mounted files. One sharp edge: changing a ConfigMap does not trigger a rollout. Env vars keep their old values until the pod is recreated, and although mounted files are eventually refreshed, most apps read them only at startup. So teams hash the config into a pod template annotation to force a new ReplicaSet on change.
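The config-hash trick can be sketched as follows. The annotation key `checksum/config` and the config keys are assumptions for illustration, not built-in Kubernetes fields; any key works as long as its value changes when the config does.

```python
import hashlib
import json

def config_checksum(data):
    """Stable SHA-256 of ConfigMap data (key order does not matter)."""
    canonical = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def pod_annotations(config_data):
    # "checksum/config" is an assumed annotation key, not a built-in field.
    return {"checksum/config": config_checksum(config_data)}

before = pod_annotations({"LOG_LEVEL": "info", "POOL_SIZE": "10"})
after = pod_annotations({"LOG_LEVEL": "debug", "POOL_SIZE": "10"})
print(before != after)   # True
```

Because the annotation lives in the pod template, a changed hash changes the template, and the Deployment responds with an ordinary rolling update.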

Now the failure from the Hook, mechanically. A Service routes to a pod the moment that pod is Ready. Without a readiness probe, “ready” means only “the container process started,” not “the app can serve requests.” So during every rollout, the Service adds the new pod to its endpoints while the app is still warming up, and a slice of traffic 500s until it finishes.

The fix is a readiness probe (an HTTP GET /healthz, a TCP check, or an exec) that returns success only when the app is genuinely serving. A failing readiness probe pulls the pod out of the Service endpoints: no traffic until it passes. This is distinct from a liveness probe, which restarts a container that has wedged. The classic outage: using a liveness probe where you needed readiness, so a slow-starting app gets killed and restarted in a loop instead of just being held out of rotation.

Probe defaults are aggressive (periodSeconds: 10, timeoutSeconds: 1, failureThreshold: 3), and a slow /healthz under load will flap. For slow boots, use a startupProbe to hold the liveness and readiness probes off until the app has started.
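One way the three probes might be combined is sketched below as a Python dictionary that serializes to a JSON container spec (kubectl accepts JSON manifests as well as YAML). The container name, image, port 8080, and all timing values are assumptions chosen for illustration; only the field names come from the Kubernetes API.

```python
import json

def http_probe(path, port, **timing):
    """Build an httpGet probe; timing keys use the Kubernetes field names."""
    return {"httpGet": {"path": path, "port": port}, **timing}

container = {
    "name": "api",                       # hypothetical container name
    "image": "example/api:1.2.3",        # hypothetical image
    "ports": [{"containerPort": 8080}],  # assumed port
    # Holds the other probes off for up to 30 * 2 = 60s while the app boots.
    "startupProbe": http_probe("/healthz", 8080, periodSeconds=2, failureThreshold=30),
    # Gates Service traffic: the pod is an endpoint only while this passes.
    "readinessProbe": http_probe("/healthz", 8080, periodSeconds=5, timeoutSeconds=2),
    # Restarts a wedged container; kept lenient so slow responses do not kill it.
    "livenessProbe": http_probe("/healthz", 8080, periodSeconds=10, failureThreshold=6),
}

startup = container["startupProbe"]
startup_budget_s = startup["periodSeconds"] * startup["failureThreshold"]
print(startup_budget_s)                  # 60
print(json.dumps(container, indent=2))   # goes under spec.template.spec.containers
```

The startup budget is the product of the period and the failure threshold, so it should comfortably exceed the slowest boot you expect.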

Pick the best fit

A stateless HTTP API takes ~20s to warm (config load + connection pool + cache prime) before it can serve. You need zero-error rollouts. What do you configure?

Quiz

A node reboots and takes three of your pods with it. Your Deployment requested 5 replicas. What happens, and why?

Quiz

How does a Service know which pods to send traffic to?

Order the steps

Order what happens during a Deployment image change (a rolling update):

1 You apply the new image; the Deployment records a new desired state
2 The Deployment creates a new ReplicaSet for the new image
3 New RS scales up and old RS scales down, bounded by maxSurge / maxUnavailable
4 Each new pod passes its readiness probe before the Service routes traffic to it
5 Old ReplicaSet reaches scale 0 but is kept, so rollout undo can scale it back up
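The sequence above can be approximated with a toy simulation. It is a sketch, not the Deployment controller's actual algorithm: `rolling_update` is a hypothetical helper, the bounds are passed as absolute pod counts, and every new pod is assumed to pass its readiness probe immediately.

```python
def rolling_update(replicas, max_surge, max_unavailable):
    """Yield (old_pods, new_ready_pods) after each step of a toy rollout."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("rollout cannot make progress with both bounds at 0")
    old, new = replicas, 0
    yield old, new
    while old > 0 or new < replicas:
        # Scale up the new ReplicaSet without exceeding replicas + max_surge in total.
        new = min(replicas, replicas + max_surge - old)
        # Toy assumption: every new pod passes its readiness probe right away.
        # Scale down the old ReplicaSet while keeping enough pods available.
        min_available = replicas - max_unavailable
        old = min(old, max(0, min_available - new))
        yield old, new

print(list(rolling_update(5, max_surge=2, max_unavailable=1)))
# [(5, 0), (2, 2), (0, 5)]
```

In a real rollout the readiness wait is what paces each step, which is why a missing probe lets the swap race ahead of the app.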

Deployment owns ReplicaSets: rollout + rollback

ReplicaSet owns Pods: keeps exactly N alive

Pod owns containers: ephemeral, new IP each time it is replaced

Containers share network + volumes

Each layer adds one job and reconciles the layer below: the Deployment swaps ReplicaSets to roll out, the ReplicaSet keeps the Pod count at N, the Pod runs the containers. A bare, unowned Pod has no controller above it, so nothing recreates it when its node dies.

Recall before you leave

01 Explain why a Deployment self-heals after a node failure but a bare Pod does not.
02 Walk through exactly how a missing readiness probe causes 500s during a rollout, and how the probe fixes it.

Recap

Kubernetes is declarative: you submit a desired state and controllers run a reconciliation loop forever, comparing actual against desired and acting to close the gap. That is what makes the cluster self-healing and why you never run a bare, unowned Pod in production. The workload hierarchy stacks one capability per layer: a Pod runs containers that share network and volumes but is ephemeral; a ReplicaSet keeps exactly N pods alive by reconciling the count; a Deployment manages ReplicaSets to roll out and roll back, swapping them gradually under maxSurge and maxUnavailable. Services give a stable virtual IP and DNS over the churning pod IPs, wired by label selectors into an Endpoints list, escalating from ClusterIP to NodePort to LoadBalancer, with Ingress doing L7 routing in front. ConfigMaps and Secrets externalize config and credentials. And the production lesson that ties it together: a Service routes to a pod the instant it’s Ready, so without a readiness probe every rollout sends traffic to apps that have started but can’t yet serve, and they 500 until they warm up. Now when you see a rollout spike error rate for ninety seconds on every deploy, your first question is: does this Deployment have a readiness probe, and does that probe pass only once the app can actually serve?

