DNS: dig and zone-file reading

Read real dig output, a zone file, and a resolver config, predict the DNS behaviour, and pick the diagnosis or fix a senior engineer would reach for first.

DNS incidents are diagnosed in dig output and zone files, not in slides. Read each artefact, predict what it tells you, and choose the call a senior engineer would make first.

Goal

Practise the loop you run in every DNS incident: read the trace or zone file, locate the broken link, and reach for the diagnosis or the highest-leverage fix before guessing.

Snippet 1 — a +trace referral

$ dig +trace shop.example.com A

;; received referral from the .com TLD:
example.com.   172800  IN  NS  ns1.example.net.
example.com.   172800  IN  NS  ns2.example.net.
;; (Additional section empty)

;; query to ns1.example.net times out
;; query to ns2.example.net times out
;; SERVFAIL

Quiz

The .com referral lists the nameservers but the lookup SERVFAILs. The NS names are ns1/ns2.example.net. What is the most useful first read?
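Whether the empty Additional section matters depends on whether the NS names sit inside the delegated zone. The check can be sketched as below, as a minimal illustration rather than a resolver's actual logic; the helper name `is_in_bailiwick` is hypothetical.

```python
def is_in_bailiwick(zone: str, ns_name: str) -> bool:
    """Return True if ns_name is at or below zone, i.e. the parent must supply glue."""
    # Compare whole labels, case-insensitively, ignoring the trailing root dot.
    zone_labels = [label for label in zone.lower().split(".") if label]
    ns_labels = [label for label in ns_name.lower().split(".") if label]
    if len(ns_labels) < len(zone_labels):
        return False
    # The NS name is in-bailiwick when its rightmost labels equal the zone's labels.
    return ns_labels[len(ns_labels) - len(zone_labels):] == zone_labels


if __name__ == "__main__":
    for ns in ("ns1.example.net.", "ns2.example.net.", "ns1.example.com."):
        print(ns, "glue required:", is_in_bailiwick("example.com.", ns))
```

For the referral in Snippet 1 both NS names are out-of-bailiwick, so missing glue is expected and not the fault. The walk has to resolve ns1/ns2.example.net separately, which points attention at the example.net side and at whether those servers are reachable.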

Snippet 2 — a zone file edit

$ORIGIN example.com.
@   IN  SOA  ns1.example.com. admin.example.com. (
            2026051300  ; serial  (was 2026051905)
            7200        ; refresh
            3600        ; retry
            1209600     ; expire
            300 )       ; minimum (negative-cache TTL)
@       IN  NS    ns1.example.com.
@       IN  CNAME shop.cdnprovider.net.
www     IN  CNAME shop.cdnprovider.net.

Quiz

This zone edit has TWO defects that will bite in production. What are they?
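Both checks are mechanical enough to lint before a zone is loaded. The sketch below is a minimal illustration under stated assumptions: the serial comparison follows RFC 1982 serial number arithmetic over 32 bits, and the function names and the `(owner, type)` record format are hypothetical, not part of any real zone-checking tool.

```python
def serial_advanced(old: int, new: int) -> bool:
    """True if `new` counts as newer than `old` under 32-bit serial arithmetic."""
    diff = (new - old) % 2**32
    # A zero difference means unchanged; a difference of 2**31 or more means "older".
    return 0 < diff < 2**31


def cname_conflicts(records):
    """Return owner names that hold a CNAME together with any other record type."""
    types_by_owner = {}
    for owner, rtype in records:
        types_by_owner.setdefault(owner, set()).add(rtype.upper())
    return sorted(
        owner
        for owner, types in types_by_owner.items()
        if "CNAME" in types and len(types) > 1
    )


if __name__ == "__main__":
    # Values taken from Snippet 2; "@" stands for the zone apex.
    print("serial advanced:", serial_advanced(2026051905, 2026051300))
    zone = [("@", "SOA"), ("@", "NS"), ("@", "CNAME"), ("www", "CNAME")]
    print("CNAME conflicts at:", cname_conflicts(zone))
```

A serial that has not advanced means secondaries comparing serials will not pick up the edit, and a CNAME at the apex conflicts with the SOA and NS records that must live there. The `www` CNAME is fine because it is the only record at that name.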

Snippet 3 — a DNSSEC diagnosis

$ dig @1.1.1.1 +dnssec api.bank.example A
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 41832
;; flags: qr rd ra; QUERY: 1, ANSWER: 0

$ dig @1.1.1.1 +cd api.bank.example A     # +cd = checking disabled
;; ->>HEADER<<- status: NOERROR; flags: qr rd ra cd; ANSWER: 1
api.bank.example.   60   IN   A   203.0.113.42

Quiz

The validating resolver returns SERVFAIL, but the same query with +cd returns a clean A record. What does this prove, and what is the customer impact?
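The comparison between the two queries can be written as a small decision table. This is a heuristic sketch, not an exhaustive classifier; the function name `classify` and its return labels are hypothetical, and the inputs are the `status:` values read from the dig headers.

```python
def classify(validating_status: str, cd_status: str) -> str:
    """Compare the status of a normal query with the same query sent with +cd."""
    normal = validating_status.upper()
    unchecked = cd_status.upper()
    if normal == "SERVFAIL" and unchecked == "NOERROR":
        # Data is reachable; only validation rejects it.
        return "dnssec-validation-failure"
    if normal == "SERVFAIL" and unchecked == "SERVFAIL":
        # Fails even without validation, so look at reachability or delegation.
        return "not-dnssec-specific"
    if normal == unchecked:
        return "no-validation-difference"
    return "inconclusive"


if __name__ == "__main__":
    # Statuses from Snippet 3.
    print(classify("SERVFAIL", "NOERROR"))
```

The first branch matches Snippet 3: the authoritative servers answer, so the record exists, but clients behind validating resolvers see a failure while clients behind non-validating resolvers still get the address.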

Snippet 4 — a resolver config and a latency reading

# /etc/unbound/unbound.conf (excerpt)
server:
    serve-expired: yes
    serve-expired-ttl: 86400

$ dig +short @1.1.1.1 id.server TXT CH      # run from the resolver host
"AMS"
$ dig @127.0.0.1 example.com A | grep 'Query time'
;; Query time: 70 msec      # same on every repeat; never drops to cache-hit latency

Quiz

serve-expired is on, the resolver's nearest Cloudflare PoP is AMS, yet example.com costs 70 ms on every query and never drops to sub-ms. Which two readings are correct?
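The "never drops to sub-ms" observation can be turned into a simple check over repeated measurements. The sketch below assumes the query times were collected by running the same dig several times within the record's TTL; the function name `looks_cached` and the 5 ms threshold are illustrative assumptions, not values from any tool.

```python
def looks_cached(query_times_ms, fast_ms=5):
    """True if any repeat query is fast enough to have been a local cache hit."""
    if len(query_times_ms) < 2:
        raise ValueError("need the first query plus at least one repeat")
    # The first query may legitimately be a cache miss, so only repeats count.
    repeats = query_times_ms[1:]
    return any(ms <= fast_ms for ms in repeats)


if __name__ == "__main__":
    print("snippet 4 pattern:", looks_cached([70, 70, 70, 70]))
    print("healthy cache:    ", looks_cached([70, 0, 0, 1]))
```

Query time is used here rather than a decreasing TTL because a TTL that counts down can also come from an upstream cache, which would not show that the local resolver is caching.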

Recap

Every DNS incident is read in artefacts. A +trace referral tells you whether glue is even required (in-bailiwick vs out-of-bailiwick nameservers) and where the walk stalls. A zone file exposes decremented serials and forbidden apex CNAMEs at a glance. A SERVFAIL that clears under +cd is a DNSSEC validation failure, commonly a DS-vs-KSK mismatch after a rollover or an expired signature. A constant query time means the answer is not being served from cache, while serve-expired only helps during outages. Diagnose from the artefact, name the broken link, then fix the one mechanism it points to.
