Networking & Protocols
DNS: run a signed zone through a migration and a rollover
Reading about KSK rollovers and TTL migrations is not the same as running one without taking the zone down. Stand up your own signed zone behind a real recursive resolver, instrument the full resolution path, then drive it through the two changes that page on-call most often — an IP migration and a DNSSEC rollover — with dig evidence at every step.
Turn the unit’s model into a reproducible operations loop: build the resolver walk end to end, validate DNSSEC, encrypt the transport, migrate a record with zero stale-window surprises, and recover a deliberately broken chain — proving each outcome with dig, not assertion.
Run a small DNSSEC-signed zone (a domain you control, or a name such as lab.example under a fully local root built with a tool like dnslib or CoreDNS, since .example is reserved and cannot be delegated publicly) behind your own recursive resolver. Instrument the resolution path, then execute a zero-downtime IP migration and a KSK rollover, breaking and recovering the chain, with dig evidence for every claim.
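One way to turn "dig evidence" into something you can check mechanically is to read the status and flags out of dig's header lines. The sketch below assumes dig's usual `;; ->>HEADER<<-` and `;; flags:` line layout; the function names and the three sample strings are hypothetical, hand-written for illustration, and are not captured output from a real zone.

```python
import re

STATUS_RE = re.compile(r"status:\s*(\w+)")
FLAGS_RE = re.compile(r";; flags:([^;]*);")


def parse_dig_header(output: str) -> tuple[str, set[str]]:
    """Return (status, flags) parsed from dig's header and flags lines."""
    status = STATUS_RE.search(output)
    flags = FLAGS_RE.search(output)
    if status is None or flags is None:
        raise ValueError("no dig header found in output")
    return status.group(1), set(flags.group(1).split())


def diagnose(validating: str, checking_disabled: str) -> str:
    """Classify a pair of dig outputs for the same name.

    validating: output of a normal query (e.g. dig +dnssec).
    checking_disabled: output of the same query with +cd.
    """
    v_status, v_flags = parse_dig_header(validating)
    c_status, _ = parse_dig_header(checking_disabled)
    if v_status == "NOERROR" and "ad" in v_flags:
        return "validated"
    if v_status == "SERVFAIL" and c_status == "NOERROR":
        # Data is reachable but fails validation: check DS against the KSK.
        return "validation-failure"
    if v_status == "SERVFAIL":
        return "resolution-failure"
    return "unvalidated"


# Hand-written samples in dig's header format (illustrative only).
SAMPLE_VALIDATED = (
    ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1\n"
    ";; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1\n"
)
SAMPLE_BROKEN = (
    ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 2\n"
    ";; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1\n"
)
SAMPLE_CD = (
    ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3\n"
    ";; flags: qr rd ra cd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1\n"
)
```

Calling `diagnose(SAMPLE_BROKEN, SAMPLE_CD)` classifies the break-state pair as a validation failure rather than an unreachable zone, which is the distinction the +cd query exists to make.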
- A captured +trace annotated with which hops are referrals vs the authoritative answer, and a note on whether glue was required (in-bailiwick) or not (out-of-bailiwick) for your nameservers.
- dig +dnssec output before the break showing AD set, and the break-state pair (SERVFAIL under +dnssec, NOERROR + correct A under +cd) with a one-line diagnosis naming the DS-vs-KSK mismatch.
- A timed migration log: dig results at intervals proving that the original TTL had fully expired from caches before the change was made, and that the new IP appears no later than one lowered-TTL interval after the change.
- A packet capture or resolver log showing the query travelling over DoH/DoT (encrypted) and not over plaintext UDP/53, plus a one-paragraph note on what DNSSEC adds that encryption does not, and vice versa.
- Add negative-caching evidence: query a non-existent subdomain twice and show the second NXDOMAIN served from cache (no upstream query) for the negative TTL, which is the smaller of the SOA MINIMUM field and the TTL of the SOA record itself; then show the resolver briefly caching SERVFAIL during an induced upstream failure.
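- Switch NSEC to NSEC3 and demonstrate that zone-walking enumeration (e.g. with an NSEC-walking tool) succeeds under NSEC and no longer yields names directly under NSEC3, noting that the hashed owner names can still be attacked offline by dictionary guessing.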
- Add EDNS Client Subnet (ECS) to upstream queries and show, via a geo-aware test authoritative, that the returned edge IP changes with the advertised subnet; then write up the privacy cost and why Cloudflare's public resolver does not send ECS.
- Write a one-page KSK-rollover runbook: the double-signature overlap window (publish new KSK alongside old for 2x max-TTL), the DS update step, the wait, the retirement, and automated DS-vs-DNSKEY consistency monitoring so the break in this project can never reach production.
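The DS-vs-DNSKEY consistency check named in the runbook item can be sketched offline. The example below is a minimal sketch, assuming SHA-256 DS records (digest type 2) and a DNSKEY algorithm other than 1; it computes the key tag and the DS digest from a DNSKEY's fields, then compares them with the DS published at the parent. All function names are hypothetical, and fetching the records (for example with dig) is left out.

```python
import base64
import hashlib
import struct


def name_to_wire(name: str) -> bytes:
    """Canonical wire form of a domain name: lowercase, length-prefixed labels."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"


def dnskey_rdata(flags: int, protocol: int, algorithm: int, pubkey_b64: str) -> bytes:
    """DNSKEY RDATA: 16-bit flags, 8-bit protocol, 8-bit algorithm, public key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + base64.b64decode(pubkey_b64)


def key_tag(rdata: bytes) -> int:
    """Key tag as a 16-bit checksum over the RDATA (not valid for algorithm 1)."""
    acc = 0
    for i, byte in enumerate(rdata):
        # Even-indexed bytes are the high octet of each 16-bit word.
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def ds_sha256(owner: str, rdata: bytes) -> str:
    """DS digest (type 2): SHA-256 over owner name wire form plus DNSKEY RDATA."""
    return hashlib.sha256(name_to_wire(owner) + rdata).hexdigest().upper()


def ds_matches(owner: str, ds_tag: int, ds_digest_hex: str, rdata: bytes) -> bool:
    """True if the parent's DS (tag and digest) corresponds to this DNSKEY."""
    published = ds_digest_hex.replace(" ", "").upper()
    return ds_tag == key_tag(rdata) and published == ds_sha256(owner, rdata)
```

A monitor built on this would alert when no DNSKEY with the SEP flag (257) in the child zone matches any DS at the parent, which is exactly the mismatch the break exercise produces.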
This is the loop you will run for any real DNS change: build the resolution path so you can see referrals and glue; sign and validate so the AD bit means something; encrypt the transport without mistaking it for integrity; migrate by draining the old TTL first so the stale window is bounded; and rehearse the KSK-rollover failure (+dnssec SERVFAIL, +cd success, DS-vs-KSK mismatch) so the recovery is muscle memory. Doing it once on a lab zone makes the production version, where a broken chain fails resolution for the roughly 30-40% of users behind validating resolvers, routine instead of a 3 a.m. discovery.
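The "drain the old TTL first" step is simple arithmetic, and it helps to write it down before touching the zone. The sketch below is a minimal model with hypothetical names. It assumes resolvers honor the published TTLs (some clamp very low values upward), that the IP change happens at the earliest safe moment, and that timestamps are seconds on a single clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationPlan:
    lower_ttl_at: int        # when the lowered TTL was published
    earliest_change_at: int  # when the old, long TTL has drained from caches
    converged_by: int        # latest time a cache may still hold the old IP


def plan_migration(lower_ttl_at: int, old_ttl: int, low_ttl: int) -> MigrationPlan:
    """Compute the safe change time and the bound on the stale window."""
    if low_ttl <= 0 or old_ttl < low_ttl:
        raise ValueError("expected 0 < low_ttl <= old_ttl")
    # A cache filled just before the TTL was lowered keeps the record for old_ttl.
    change_at = lower_ttl_at + old_ttl
    # After the change, any cached old answer carries at most low_ttl.
    return MigrationPlan(lower_ttl_at, change_at, change_at + low_ttl)


def stale_observations(log, plan: MigrationPlan, old_ip: str):
    """Return (timestamp, ip) entries that show the old IP after convergence."""
    return [(t, ip) for t, ip in log if t > plan.converged_by and ip == old_ip]
```

With an original TTL of 3600 s lowered to 60 s at time 0, `plan_migration(0, 3600, 60)` puts the earliest safe change at 3600 and convergence at 3660; any entry in the timed dig log that `stale_observations` returns is evidence that the drain was incomplete or that a resolver is not honoring the TTL.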