Compose vs Kubernetes: choosing the right orchestration weight
A team of three ships an API and reaches for a managed Kubernetes cluster on day one, “to be ready to scale.” Six months later they have one node, eleven services nobody can keep straight, and a YAML directory bigger than the app. A bad node drain takes the whole product down for forty minutes because nobody understood pod disruption budgets. They never had a scaling problem — they imported a future one, and paid the on-call tax with no traffic to justify it.
Two tools, two jobs
Docker Compose and Kubernetes both “run containers,” and that shared verb hides how different they are. Compose is a single-host process manager: one docker-compose.yml declares your services, networks, and volumes, and docker compose up starts them all on this machine. It is fast to learn, fast to start, and easy to debug, because the whole stack is one file and docker compose logs. Its overhead is tiny, on the order of 50MB, because there is no cluster to run.
Kubernetes is a different animal: a distributed control plane that runs across machines. You don’t start containers; you declare a desired state (3 replicas of this image, behind this service, with these health checks) and the control plane works continuously to make reality match. That continuous matching is the whole product, and it costs you a real control plane: roughly 2GB of resident overhead before your app runs a single container.
The reconciliation loop is what you’re paying for
The single mechanism that separates Kubernetes from Compose is the reconciliation loop. Controllers watch the cluster, compare actual state against the desired state stored in etcd, and close the gap: continuously, event-driven, forever. A ReplicaSet controller’s only job is “keep N pods running”: kill a pod and it notices and replaces it; lose a whole node and its pods are marked lost, the controller creates replacements, and the scheduler places them on healthy nodes. That is self-healing, and Compose simply does not have it. Compose’s restart: always restarts a container on the same host; if the host dies, everything on it dies with it.
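As a rough illustration of that loop, here is a minimal in-memory sketch. It is a toy model, not the Kubernetes API: the Cluster class, the pod and node names, and the least-loaded placement rule are all assumptions made for the example, and a real controller reacts to watch events rather than being called by hand.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Cluster:
    """Toy stand-in for observed cluster state (hypothetical, not a real API)."""
    healthy_nodes: set[str]
    pods: dict[str, str] = field(default_factory=dict)  # pod name -> node name
    _ids: count = field(default_factory=count, repr=False)

    def start_pod(self, node: str) -> None:
        self.pods[f"pod-{next(self._ids)}"] = node

    def fail_node(self, node: str) -> None:
        # Losing a node takes every pod on it down with it.
        self.healthy_nodes.discard(node)
        self.pods = {p: n for p, n in self.pods.items() if n != node}


def least_loaded(cluster: Cluster) -> str:
    """Pick the healthy node running the fewest pods (ties broken by name)."""
    load = Counter(cluster.pods.values())
    return min(sorted(cluster.healthy_nodes), key=lambda n: load[n])


def reconcile(cluster: Cluster, desired_replicas: int) -> int:
    """One pass of the loop: compare actual with desired and close the gap.

    Returns the gap that was closed: positive when pods were started,
    negative when surplus pods were removed, 0 when nothing changed.
    """
    if not cluster.healthy_nodes:
        return 0  # nowhere to place pods; try again on the next pass
    diff = desired_replicas - len(cluster.pods)
    for _ in range(diff):  # no-op when diff <= 0
        cluster.start_pod(least_loaded(cluster))
    for name in sorted(cluster.pods)[: max(0, -diff)]:
        del cluster.pods[name]  # scale down surplus pods
    return diff


if __name__ == "__main__":
    cluster = Cluster(healthy_nodes={"node-a", "node-b"})
    print(reconcile(cluster, 3), cluster.pods)  # initial convergence
    cluster.fail_node("node-a")                 # simulate a lost node
    print(reconcile(cluster, 3), cluster.pods)  # replacements land on node-b
```

The point of the sketch is that nothing in reconcile knows why the pod count dropped. It only sees a gap between desired and actual state, which is why the same code path handles a crashed pod and a lost node.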
The same loop powers rolling updates with health gating. A Kubernetes Deployment brings up new pods, waits for their readiness probe to pass, shifts traffic to them, and only then tears down the old ones. A release that fails its readiness probe never receives traffic, and the rollout stalls with the old pods still serving. You get a zero-downtime rollout, and rollback is one command (kubectl rollout undo), though a plain Deployment does not roll back on its own; automatic rollback needs extra tooling on top. Compose recreates a service’s container in place; for the window between stop and healthy-start, that service is down. There is no health gate on traffic and no rollback mechanism.
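The gating logic can be sketched in a few lines. This is a simplified model under stated assumptions: it replaces pods one at a time, the is_ready callback is a hypothetical stand-in for a readiness probe, and the pod naming is invented for the example. Real Deployments control the pace with maxSurge and maxUnavailable, which this toy does not model, and it does not model rollback either: on a failed probe it simply halts.

```python
from typing import Callable


def rolling_update(
    old_pods: list[str],
    new_version: str,
    is_ready: Callable[[str], bool],
) -> tuple[list[str], bool]:
    """Replace pods one at a time, gated on a readiness check.

    Returns (pods serving traffic, whether the rollout completed).
    """
    serving = list(old_pods)
    for i, old in enumerate(old_pods):
        candidate = f"{new_version}-{i}"
        if not is_ready(candidate):
            # The new pod never joins the serving set; the rollout halts
            # and the remaining old pods keep handling traffic.
            return serving, False
        serving.append(candidate)  # traffic shifts only after the probe passes
        serving.remove(old)        # only then is one old pod retired
    return serving, True


if __name__ == "__main__":
    old = ["v1-0", "v1-1", "v1-2"]
    print(rolling_update(old, "v2", lambda pod: True))   # healthy release
    print(rolling_update(old, "v2", lambda pod: False))  # broken release
```

Serving capacity never drops below the original pod count in this model, which is the property an in-place restart cannot offer.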
| Capability | Docker Compose | Kubernetes |
|---|---|---|
| Scheduling scope | Single host only | Many nodes, the scheduler places pods |
| Node failure | Everything on that host dies | Self-heals — pods rescheduled elsewhere |
| Deploys | Restart in place; brief downtime | Rolling update gated on readiness probe |
| Autoscaling | None (manual only: a bigger box or same-host replicas) | Horizontal autoscaling on metrics |
| Overhead | ~50MB, no control plane | ~2GB control plane before your app |
| Learning curve | Minutes; one YAML file | Weeks; pods, services, CNI, RBAC, YAML sprawl |
The complexity tax, and who actually pays it
Everything Kubernetes gives you arrives bundled with a tax: a control plane to keep alive, a cluster network (CNI) to understand, RBAC, ingress, secrets management, and the famous YAML sprawl. Surveys consistently put operational complexity as the top Kubernetes pain point — around 70% of users name it — and clusters routinely sit at 30–50% utilization, so a large share of what you provision is waste. For a small team the dollar figure is brutal: a credible production setup runs into the low five figures per month, and the real cost is the engineer-hours that go to keeping the cluster healthy instead of shipping features.
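To make the utilization point concrete, the arithmetic looks like this. The 10,000 monthly bill is a hypothetical figure chosen for illustration, and the sketch assumes cost scales linearly with provisioned capacity, which real cloud pricing only approximates.

```python
def idle_spend(monthly_bill: int, utilization_pct: int) -> int:
    """Share of the bill paying for unused capacity, in whole currency units.

    Assumes cost is proportional to provisioned capacity (a simplification).
    """
    if not 0 <= utilization_pct <= 100:
        raise ValueError("utilization_pct must be between 0 and 100")
    return monthly_bill * (100 - utilization_pct) // 100


if __name__ == "__main__":
    bill = 10_000  # hypothetical monthly bill
    for pct in (30, 50):
        print(f"{pct}% utilized -> {idle_spend(bill, pct)} of {bill} is idle")
```

Under those assumptions, a cluster at 30–50% utilization spends between half and 70% of its bill on capacity nothing is using.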
The senior tradeoff is therefore not “which is better” but capability vs complexity tax, weighed against your actual scale. Kubernetes’ capabilities only pay off when you genuinely need multi-node scheduling, zero-downtime rollout under real traffic, or autoscaling. Below that, you are paying for self-healing across machines you don’t have and for a scheduler placing pods on a single node. The classic failure, the team of three from the opening, is buying the tax before the scale exists to justify it.
Why this works
“We’ll need it to scale eventually” is the line that buys Kubernetes too early. The honest counter: a single beefy host with Compose can serve a surprising amount of traffic, and migrating to Kubernetes later is a known, bounded project. Importing its complexity now is a permanent tax with an uncertain payoff date. Start at the weight your traffic earns; upgrade when a real ceiling — not a hypothetical one — is in sight.
Where the line actually is — and the middle ground
There are honest signals that you’ve outgrown Compose. The first is the single-host ceiling: Compose scales vertically (a bigger box) or by same-host replicas, and a single machine has a hard limit — when you need to spread load across machines, Compose can’t. The second is availability: when a single host going down is no longer acceptable, you need scheduling that survives a lost node. The third is operational needs Compose lacks: zero-downtime rolling deploys gated on health, and horizontal autoscaling on load. Hit one of those for real and the tax starts to pay for itself.
Crucially, it’s not a binary. Between “one Compose host” and “self-managed Kubernetes” sit options that give you multi-node resilience without the full operational weight: Docker Swarm (multi-host Compose-like syntax, far simpler than k8s), HashiCorp Nomad, and managed PaaS / serverless containers like Cloud Run, AWS App Runner, ECS, or Azure Container Apps that run your containers and handle scaling without you owning a control plane. A managed Kubernetes service (EKS/GKE/AKS) removes the control-plane operations but not the conceptual complexity — you still own pods, networking, and the YAML.
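The signals and the middle ground described above can be written down as a small decision function. This is one possible encoding, not a rule: the Workload fields and the returned labels are names invented for the sketch, and each boolean hides a judgment call that a real team has to make for itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical answers to the sizing questions for one system."""
    fits_on_one_host: bool                     # with room to grow
    host_outage_acceptable: bool               # can you tolerate losing the host?
    needs_gated_rollouts_or_autoscaling: bool  # under real traffic, not hypothetically
    middle_ground_fits: bool                   # Swarm, Nomad, or a managed PaaS suffices


def pick_orchestrator(w: Workload) -> str:
    """Return the lightest option that meets the stated needs."""
    outgrown_compose = (
        not w.fits_on_one_host
        or not w.host_outage_acceptable
        or w.needs_gated_rollouts_or_autoscaling
    )
    if not outgrown_compose:
        return "compose"
    if w.middle_ground_fits:
        return "middle ground (Swarm, Nomad, managed PaaS)"
    return "kubernetes"


if __name__ == "__main__":
    small_api = Workload(
        fits_on_one_host=True,
        host_outage_acceptable=True,
        needs_gated_rollouts_or_autoscaling=False,
        middle_ground_fits=False,
    )
    print(pick_orchestrator(small_api))
```

Kubernetes is the last branch on purpose: it is reached only after both the single-host option and the lighter multi-node options have been ruled out.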
A 4-person startup runs an API + Postgres + Redis + a worker, all on one rented server, modest traffic. Pick the orchestration weight.
Which capability is the real reason to move from Compose to Kubernetes — the thing Compose fundamentally cannot do?
What does the Kubernetes reconciliation loop give you that Compose's restart: always does not?
Order the questions a senior asks before reaching for Kubernetes:
1. Does the workload fit on one host with room to grow? If yes, Compose is enough.
2. Is a single host going down unacceptable, or do I need to spread load across machines?
3. Do I need zero-downtime rolling deploys gated on health, or horizontal autoscaling?
4. If yes to those: can a middle ground (Swarm, Nomad, managed PaaS) meet the need with less tax?
5. Only if the scale and ops needs are real and the middle ground doesn't fit → Kubernetes.
- Explain to a teammate why a 3-person team adopting self-managed Kubernetes on day one is usually a mistake, and what to do instead.
- What concrete signals tell you you've genuinely outgrown Docker Compose?
Compose and Kubernetes both run containers, but they answer different questions. Compose is a single-host process manager — one YAML, minutes to learn, ~50MB overhead, perfect for local dev and small single-node deploys, but with no multi-node scheduling, no self-healing across machines, no health-gated rollouts, and no autoscaling. Kubernetes is a distributed control plane whose reconciliation loop continuously drives reality toward a declared desired state, giving self-healing, scheduling across nodes, zero-downtime rolling updates, and horizontal autoscaling — at the cost of a real complexity tax: a control plane, CNI, RBAC, YAML sprawl, on-call burden, and five-figure monthly bills. The senior decision is capability vs complexity weighed against actual scale. The classic failure is buying the tax too early; the opposite failure is ignoring Compose’s single-host ceiling too long. Between them sit Swarm, Nomad, and managed PaaS. Pick the weight your traffic earns, and migrate when a real ceiling — not a hypothetical — comes into view.