Deployment & Infra
Infrastructure as Code: build a drift-safe stack
Reading about locked state and silent reverts is not the same as living through them. Stand up a small but real IaC stack with proper remote state, then deliberately walk it into the two incidents from the lesson, a concurrent-apply clash and an out-of-band drift, and recover from each the way a senior engineer would: with evidence at every step.
Turn the unit’s mental model into muscle memory: configure a versioned, locked remote backend, prove the lock actually blocks a concurrent apply, induce real drift and resolve it deliberately instead of letting apply silently revert it, and keep secrets out of state.
Build a small Terraform/OpenTofu (or Pulumi) stack with proper remote, versioned, locked state, then deliberately reproduce and recover from a concurrency clash and a drift event — proving each outcome with command output, not assertion.
- Command output (not prose) showing: a clean apply, a second apply that reports no changes, the 'Error acquiring the state lock' failure under concurrency, and a plan -refresh-only run that reports the induced drift. If you chose Pulumi, capture the equivalent output from its own tooling.
- A short write-up of the drift resolution: which manual change you codified into config, which you let apply revert, and the reasoning for each — explicitly naming the silent-revert risk you avoided.
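- Evidence the backend is versioned and locked (backend config plus the object-version listing or the lock entry), and that the state file contains no plaintext secret. In Terraform, marking a value as sensitive only redacts it from CLI output and the value is still written to state, so inspect the state itself rather than the plan output.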
- A one-paragraph reflection connecting the exercise back to the unit: how the state file acted as both source of truth and hazard in your runs.
- Add a scheduled drift-detection job (CI cron running plan -refresh-only) that opens an alert or PR when reality diverges from the declaration, so drift is reviewed before any apply silently resolves it.
- Extract the stack into a reusable module with input variables and instantiate it twice (e.g. staging and prod) from the same source, proving reproducibility across environments.
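- Wire a CI pipeline with a concurrency group and a -lock-timeout so overlapping runs wait instead of failing, then simulate a crashed run that leaves a stale lock and document the safe recovery: confirm no run still holds the lock, force-unlock using the lock ID shown in the error message, then re-plan before applying anything.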
- Convert one mutable resource to an immutable-replacement pattern (new image / create_before_destroy) and show how it shrinks the drift surface compared with in-place mutation.
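The scheduled drift-detection job described in the list above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the documented plan -detailed-exitcode convention (0 = no changes, 1 = error, 2 = changes present) also applies when combined with -refresh-only in your tool version, which is worth confirming against the drift you induce in the exercise. The names PlanResult, classify_plan and run_drift_check are hypothetical helpers invented for this illustration.

```python
import subprocess
from enum import Enum


class PlanResult(Enum):
    IN_SYNC = "in_sync"  # exit 0: nothing to report
    ERROR = "error"      # exit 1: the plan itself failed
    DRIFT = "drift"      # exit 2: differences were detected
    LOCKED = "locked"    # failed because another run holds the state lock


# Text from the lock failure named in the exercise.
LOCK_MARKER = "Error acquiring the state lock"


def classify_plan(exit_code: int, stderr: str) -> PlanResult:
    """Map a `plan -detailed-exitcode` result to a reviewable outcome."""
    if exit_code == 0:
        return PlanResult.IN_SYNC
    if exit_code == 2:
        return PlanResult.DRIFT
    if LOCK_MARKER in stderr:
        return PlanResult.LOCKED
    return PlanResult.ERROR


def run_drift_check(workdir: str, binary: str = "terraform") -> PlanResult:
    """Run a read-only drift check; `binary` may be "tofu" for OpenTofu (assumption)."""
    cmd = [
        binary, "plan",
        "-refresh-only",       # compare state with reality, propose no config changes
        "-detailed-exitcode",  # assumed to return 2 when drift is detected
        "-input=false",        # never block a CI job on a prompt
        "-no-color",
        "-lock-timeout=60s",   # wait briefly for a concurrent run instead of failing
    ]
    proc = subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)
    return classify_plan(proc.returncode, proc.stderr)
```

In a CI cron job, a DRIFT result would be the trigger for opening the alert or PR, while LOCKED and ERROR are better retried or escalated than treated as "no drift". Keeping classify_plan separate from the subprocess call means the decision logic can be checked without a real backend.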
This is the loop you will run on every real IaC stack. Put state in a versioned, locked remote backend before anything else. Prove the lock by trying to break it. Treat drift as a question you answer with plan -refresh-only and a deliberate keep-or-revert decision rather than a blind apply. Keep secrets out of state entirely. Doing it once on a toy stack, including breaking it on purpose and recovering with evidence, makes the production version muscle memory instead of a 2am surprise.
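One way to back the "no plaintext secret in state" claim with evidence is to scan a pulled copy of the state. The sketch below is illustrative only: it walks any parsed JSON document, flags string values that contain a secret you know you used in the exercise, and flags non-empty strings stored under suspicious-looking key names. The key list, the helper names leaves and find_plaintext_secrets, and the sample values are all assumptions made for this example, and a clean result from a heuristic like this is supporting evidence rather than proof.

```python
import json
from typing import Any, Iterator

# Illustrative, not exhaustive: key names that often hold credentials.
SUSPECT_KEYS = ("password", "secret", "token", "private_key")


def leaves(node: Any, path: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (path, value) for every scalar leaf in a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from leaves(value, f"{path}.{key}" if path else str(key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from leaves(value, f"{path}[{index}]")
    else:
        yield path, node


def find_plaintext_secrets(state: dict, known_secrets: list[str]) -> list[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []
    for path, value in leaves(state):
        if not isinstance(value, str) or not value:
            continue  # only non-empty strings can leak a secret here
        leaf_name = path.rsplit(".", 1)[-1].lower()
        if any(secret and secret in value for secret in known_secrets):
            findings.append(f"{path}: contains a known secret value")
        elif any(word in leaf_name for word in SUSPECT_KEYS):
            findings.append(f"{path}: suspicious key holds a non-empty string")
    return findings


if __name__ == "__main__":
    # Assumes the state was exported first, e.g. `terraform state pull > state.json`.
    with open("state.json", encoding="utf-8") as handle:
        for finding in find_plaintext_secrets(json.load(handle), known_secrets=[]):
            print(finding)
```

Passing the actual test secret in known_secrets is the stronger check, since it does not depend on guessing key names. Delete the exported state.json afterwards, because a local copy of state is itself a place where secrets can leak.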