Deployment & Infra DEP · 03 · 10

K8s objects: ship a self-healing, zero-downtime service

Hands-on project — deploy a slow-warming HTTP service to Kubernetes with correct probes, externalized config, and resource bounds, then prove zero-error rollouts and self-healing with evidence.

DEP Senior ◷ 210 min

Reading about reconciliation and readiness probes is not the same as watching a rollout stay green while a deliberately slow app warms up behind it. Build a real service, ship it to a local cluster with the right objects, and prove, with captured output, that it self-heals after pod loss and rolls out without dropping a single request.

Goal

Turn the unit’s mental model into a working manifest set: a Deployment with correct probes, a Service wired by label selector, externalized config via ConfigMap/Secret, and right-sized resource requests and limits — then demonstrate self-healing and zero-error rollouts under load with evidence at each step.
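
As a rough illustration of that manifest set, the sketch below pairs a ClusterIP Service with a Deployment. Every name in it is a hypothetical placeholder rather than part of the lesson: the `slow-warm` label, the `slow-warm:1.0.0` image, port 8080, the `/healthz` and `/ready` paths, and the `slow-warm-config` / `slow-warm-secret` objects. The probe timings and resource numbers are starting guesses to tune against your own service, not recommended values.

```yaml
# Sketch only. All names, paths, ports, and numbers are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: slow-warm
spec:
  type: ClusterIP
  selector:
    app: slow-warm            # selects pods by label, never by IP
  ports:
    - name: http
      port: 80
      targetPort: http        # resolves to the named container port below
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: slow-warm
spec:
  replicas: 3
  selector:
    matchLabels:
      app: slow-warm
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0       # never remove a ready pod before its replacement is ready
      maxSurge: 1
  template:
    metadata:
      labels:
        app: slow-warm        # must match both the Deployment and Service selectors
    spec:
      containers:
        - name: app
          image: slow-warm:1.0.0      # hypothetical locally built image
          ports:
            - name: http
              containerPort: 8080
          envFrom:                    # both objects must exist before pods can start
            - configMapRef:
                name: slow-warm-config
            - secretRef:
                name: slow-warm-secret
          startupProbe:               # holds off the other probes during boot
            httpGet:
              path: /healthz
              port: http
            periodSeconds: 2
            failureThreshold: 30      # 30 x 2s = up to 60s allowed for a 20-30s boot
          readinessProbe:             # gates membership in the Service endpoints
            httpGet:
              path: /ready
              port: http
            periodSeconds: 2
          livenessProbe:              # restarts the container if it stops responding
            httpGet:
              path: /healthz
              port: http
            periodSeconds: 10
            failureThreshold: 3
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
```

Because the requests in this sketch are lower than the limits, its pods should land in the Burstable QoS class; setting requests equal to limits for both CPU and memory would make them Guaranteed.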

Project

Objective

Deploy a deliberately slow-warming HTTP service (~20-30s boot) to a local Kubernetes cluster (kind, minikube, or k3d) using only declarative manifests, and prove it self-heals after pod loss and rolls out a new image with zero failed requests under sustained load.
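
One possible way to generate the sustained load declaratively is an in-cluster Job, sketched below. It assumes a Service named `slow-warm` listening on port 80 in the same namespace; the Job name `loadgen`, the request count, and the pacing are likewise hypothetical. It is a single sequential client, so treat it as a minimal evidence generator rather than a benchmark.

```yaml
# Sketch only. Assumes a Service called "slow-warm" on port 80 in this namespace.
apiVersion: batch/v1
kind: Job
metadata:
  name: loadgen
spec:
  backoffLimit: 0               # do not retry; one run gives one set of numbers
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: loadgen
          image: busybox:1.36
          command: ["/bin/sh", "-c"]
          args:
            - |
              ok=0
              fail=0
              i=0
              # 2000 requests at roughly 20 per second: a bit under two minutes.
              while [ "$i" -lt 2000 ]; do
                # busybox wget exits non-zero on connection errors,
                # timeouts, and HTTP error statuses such as 503.
                if wget -q -T 2 -O /dev/null http://slow-warm/; then
                  ok=$((ok + 1))
                else
                  fail=$((fail + 1))
                fi
                i=$((i + 1))
                sleep 0.05
              done
              echo "ok=$ok fail=$fail"
```

Apply the Job, start the rollout by changing the image tag in the Deployment manifest and re-applying it while the Job is still running, then capture the totals with `kubectl logs job/loadgen`.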

Requirements

Acceptance criteria

All resources are created from declarative YAML applied with kubectl apply -f — no kubectl run, no imperative edits. The manifests are committed.
A captured rollout under load showing 0 failed requests, plus the same load test rerun with the readiness probe removed showing a non-zero failure count (5xx responses or connection errors), proving that the probe is what gates traffic.
kubectl get endpoints output showing the Service tracks exactly the matching pods, and a demonstration that scaling the Deployment updates the endpoint list automatically.
Evidence of self-healing: a deleted/evicted pod recreated by the ReplicaSet back to the desired replica count, with the controller event or pod-age delta captured.
A short write-up: which QoS class the pod lands in and why, what happens at the CPU limit vs the memory limit, and how the ConfigMap-hash annotation forces a roll on config change.
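
The config-hash technique named in the write-up item can be sketched as follows. The object names, the keys (`WARMUP_SECONDS`, `LOG_LEVEL`, `API_TOKEN`), and the `checksum/config` annotation key are all assumptions; the key follows a common convention but Kubernetes attaches no meaning to it. The idea is that any change to the pod template, including an annotation value, produces a new ReplicaSet and therefore a rolling update.

```yaml
# Sketch only. Names, keys, and values are hypothetical placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: slow-warm-config
data:
  WARMUP_SECONDS: "25"
  LOG_LEVEL: "info"
---
apiVersion: v1
kind: Secret
metadata:
  name: slow-warm-secret
type: Opaque
stringData:
  API_TOKEN: "replace-me"       # placeholder; never commit a real secret value
---
# EXCERPT, not a complete object: merge these fields into your full
# Deployment manifest (selector and containers are omitted here).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: slow-warm
spec:
  template:
    metadata:
      annotations:
        # Recompute whenever the ConfigMap changes, for example from the
        # output of `sha256sum configmap.yaml`, and paste the new value here.
        checksum/config: "<sha256 of the ConfigMap manifest>"
```

Environment variables injected from a ConfigMap are read only when a container starts, which is why a changed ConfigMap alone does not restart anything and the annotation bump is needed.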

Senior stretch

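Add a PodDisruptionBudget and a multi-node kind cluster, then drain a node and show that the evictions respect minAvailable while the Service keeps serving. (A PodDisruptionBudget constrains voluntary evictions such as a drain; the pace of a rollout is governed by the Deployment's maxUnavailable and maxSurge.)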
Add an Ingress in front of the ClusterIP Service (ingress-nginx) and route by host/path, so one external entry point fronts the workload instead of a per-Service LoadBalancer.
Trigger and document a CrashLoopBackOff and, via a too-low memory limit, an OOMKill on purpose; read the termination reason and exit code from kubectl describe pod, and fix each by tuning the right knob.
Add a second Deployment whose pods accidentally share the Service's label selector, observe traffic mixing in the endpoints, then fix it with a more specific selector — making the ownership-vs-selection distinction concrete.
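
For the first two stretch items, the objects might look roughly like the sketch below. It assumes pods labeled `app: slow-warm`, three replicas, a Service named `slow-warm` on port 80, and an ingress-nginx controller already installed with the `nginx` IngressClass; the host name is a hypothetical placeholder.

```yaml
# Sketch only. Labels, names, host, and replica assumptions are hypothetical.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: slow-warm
spec:
  minAvailable: 2               # with 3 replicas, a drain may evict one pod at a time
  selector:
    matchLabels:
      app: slow-warm
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: slow-warm
spec:
  ingressClassName: nginx       # assumes ingress-nginx registered this class
  rules:
    - host: slow-warm.example.test    # hypothetical host; map it to the cluster locally
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: slow-warm
                port:
                  number: 80
```

With the budget in place, `kubectl drain` should wait for a replacement pod to become ready elsewhere before evicting the next one, which is the behavior to capture as evidence.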

Recap

This is the loop you run for every real service: declare the objects in YAML, let controllers reconcile and self-heal, gate traffic with a readiness probe (and cover slow boots with a startup probe), wire the Service by label with no hardcoded IPs, externalize config and force rolls by hashing it into a pod-template annotation, and bound the pod with requests and limits, knowing that hitting the CPU limit throttles the container while exceeding the memory limit gets it OOM-killed. Doing it once on a toy slow-warming service makes the production manifest review reflexive: you will spot the missing readiness probe and the absent memory limit on sight.
