TTL, caching, and DNS propagation

TTL is permission, not a command — understanding its operational impact on freshness, load, and the myth of DNS propagation.

You updated a DNS record and your co-worker can see the change. You cannot. An hour later you finally see it. Your manager calls this “DNS propagation” and says to wait 24–48 hours. That framing is technically wrong and operationally expensive. Understanding what TTL actually is — permission, not a command — changes how you manage DNS records in production.

What TTL means

Every DNS record carries a TTL (Time To Live), a number in seconds. When a resolver caches an answer, it counts down from TTL to zero. At zero the cached record expires, and the next lookup for that name triggers a fresh query. The authoritative server does not push updates; caches expire and pull the new value on their own schedule.
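
A minimal sketch of that countdown, assuming a hypothetical in-memory cache. The class name TtlCache and the injected clock are illustrative only and do not mirror any real resolver's API.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Toy resolver cache: each entry is kept until its own TTL runs out."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # name -> (value, absolute expiry time on the injected clock)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, value: str, ttl: int) -> None:
        """Store an answer; the countdown starts now."""
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        """Return the cached value, or None if the caller must re-query."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # TTL reached zero: drop the entry so the next lookup re-queries.
            del self._entries[name]
            return None
        return value

    def remaining_ttl(self, name: str) -> int:
        """Whole seconds left before the entry expires (0 if absent or expired)."""
        entry = self._entries.get(name)
        if entry is None:
            return 0
        return max(0, int(entry[1] - self._clock()))
```

Nothing in the sketch lets the authoritative side reach into the cache: the only way an entry changes is by expiring and being fetched again.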

This has two operational consequences:

You cannot force a cached value to disappear. Once a TTL is published and cached, you must wait for that TTL to count down everywhere.
“DNS propagation” is a misleading term. Nothing propagates. All that happens is that distributed caches expire one by one. There is no wave; there is no timeline. A cache that was populated 1 minute before your change will hold the old value for (TTL - 1 minute) longer.
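
The arithmetic in the second point can be written down directly. This is a small sketch; the function name and parameters are made up for illustration.

```python
def stale_seconds_after_change(ttl: int, cached_seconds_before_change: int) -> int:
    """Seconds a single cache keeps serving the old value after the record changes.

    ttl: TTL (seconds) the cache received with the old answer.
    cached_seconds_before_change: how long before the change the cache was populated.
    """
    if ttl < 0 or cached_seconds_before_change < 0:
        raise ValueError("ttl and cache age must be non-negative")
    # An entry that already expired before the change is never stale afterwards.
    return max(0, ttl - cached_seconds_before_change)


# A cache populated 1 minute before the change, with a 1-hour TTL:
print(stale_seconds_after_change(3600, 60))  # 3540
```

The worst case is a cache populated the instant before the change: it stays stale for the full TTL. Every other cache clears sooner, each on its own clock.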

High TTL vs low TTL

TTL	Benefit	Cost
High (86400 s = 1 day)	Few queries; fast cache hits; authoritative server handles less load	Stale data lingers up to 1 day after a change
Low (60 s)	Stale data clears in 1 minute after a change	More queries; higher load on authoritative; cache cannot serve stale during outage

Cache hit rate under steady load (10 queries/hour)

TTL = 60 s: ~17% cache hits
TTL = 300 s: ~50% cache hits
TTL = 3600 s: ~90% cache hits
TTL = 86400 s (1 day): ~99% cache hits

Cache hits climb steeply at first, then flatten — past 3600 s a longer TTL barely adds hit rate while multiplying how long stale data lingers.
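
One way to see why the curve flattens is a simple model, offered here as an assumption rather than as the method behind the percentages above. If queries for a name reach a resolver as a Poisson process with rate λ, each miss opens a TTL-long window in which about λ × TTL further queries are answered from cache, so the hit rate is λ × TTL / (1 + λ × TTL). The values it produces are in the same range as the listed percentages but do not match them exactly.

```python
def poisson_hit_rate(queries_per_hour: float, ttl_seconds: float) -> float:
    """Expected cache hit rate for one name under Poisson query arrivals.

    Assumption: the TTL timer starts on a miss and is not reset by hits.
    """
    if queries_per_hour < 0 or ttl_seconds < 0:
        raise ValueError("rate and TTL must be non-negative")
    # Average number of queries served from cache during one TTL window.
    expected_hits_per_miss = queries_per_hour / 3600.0 * ttl_seconds
    return expected_hits_per_miss / (1.0 + expected_hits_per_miss)


for ttl in (60, 300, 3600, 86400):
    print(f"TTL {ttl:>5} s: {poisson_hit_rate(10, ttl):.1%}")
```

Under this model, going from 3600 s to 86400 s raises the hit rate by less than ten percentage points while making the worst-case stale window 24 times longer.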

Planned migration SOP

How do you change a DNS record without leaving users stranded for hours? The standard operational pattern for a planned DNS change:

One week before: Lower TTL to 60 seconds. Wait for the old high TTL to expire everywhere (max wait = old TTL).
Change day: Update the record. If the change goes badly, a rollback takes effect quickly, because the maximum cache age is now 60 s.
After stabilisation: Raise TTL back to 3600 s or higher.

Skipping step 1 means old caches can serve stale data for up to the old TTL (e.g., 24 hours) after your change.
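
The cost of skipping or rushing step 1 can be estimated with a short sketch. It assumes every cache honours the published TTL; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def worst_case_stale_seconds(
    old_ttl: int,
    low_ttl: int,
    change_at: datetime,
    ttl_lowered_at: Optional[datetime] = None,
) -> int:
    """Longest time any TTL-honouring cache can serve the old value after the change."""
    if ttl_lowered_at is None:
        # Step 1 skipped: a cache filled just before the change keeps it for old_ttl.
        return old_ttl
    if change_at < ttl_lowered_at:
        raise ValueError("the record change must come after the TTL was lowered")
    # Caches filled just before the TTL was lowered still hold the old TTL until here.
    old_ttl_gone_at = ttl_lowered_at + timedelta(seconds=old_ttl)
    leftover = int((old_ttl_gone_at - change_at).total_seconds())
    # Once the old TTL has drained everywhere, only the low TTL remains.
    return max(low_ttl, leftover)


lowered = datetime(2026, 5, 6, 12, 0, 0)
print(worst_case_stale_seconds(86400, 60, lowered + timedelta(days=7), lowered))   # 60
print(worst_case_stale_seconds(86400, 60, lowered + timedelta(hours=1), lowered))  # 82800
print(worst_case_stale_seconds(86400, 60, lowered))                                # 86400
```

The second call shows why lowering the TTL an hour before the change buys almost nothing when the old TTL was a day: the wait in step 1 has to cover the whole old TTL.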

Negative caching (RFC 2308)

DNS caches do not only store positive answers. They also cache negative responses:

NXDOMAIN (name does not exist): cached for min(SOA.MINIMUM, SOA.TTL), that is, the smaller of the SOA MINIMUM field and the SOA record’s own TTL; typically 1–3 hours.
NODATA (name exists but no record of that type): same cache duration.
SERVFAIL: cached briefly per RFC 9520, for at least 1 second and never longer than 5 minutes. Short enough to not amplify an outage, long enough to prevent a tight retry loop.

Negative caching is essential for performance: without it, every query for a non-existent subdomain would hit the authoritative server every time, and typo storms or misconfigured clients could overwhelm it. SERVFAIL caching plays the same role during an outage: it stops resolvers from retrying in a tight loop against servers that are already failing.
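
To check the negative TTL of a zone you control, you can read it off the SOA record. The sketch below assumes a single-line SOA in the layout that tools such as dig print; the sample record and helper names are invented for illustration.

```python
from typing import NamedTuple


class Soa(NamedTuple):
    record_ttl: int  # TTL of the SOA record itself
    serial: int
    refresh: int
    retry: int
    expire: int
    minimum: int     # SOA MINIMUM field


def parse_soa_line(line: str) -> Soa:
    """Parse 'owner ttl class SOA mname rname serial refresh retry expire minimum'."""
    fields = line.split()
    if len(fields) != 11 or fields[3].upper() != "SOA":
        raise ValueError("expected a single-line SOA record with 11 fields")
    record_ttl = int(fields[1])
    serial, refresh, retry, expire, minimum = (int(f) for f in fields[6:11])
    return Soa(record_ttl, serial, refresh, retry, expire, minimum)


def negative_cache_ttl(soa: Soa) -> int:
    """RFC 2308: negative answers are cached for min(SOA MINIMUM, SOA record TTL)."""
    return min(soa.minimum, soa.record_ttl)


# Hypothetical record, not taken from a real zone.
sample = "example.com. 3600 IN SOA ns1.example.com. hostmaster.example.com. 2026051301 7200 900 1209600 300"
print(negative_cache_ttl(parse_soa_line(sample)))  # 300
```

A practical consequence: if you create a name shortly after someone queried it, resolvers that cached the NXDOMAIN keep answering "does not exist" for up to this many seconds.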

Quiz

What does TTL actually tell downstream resolvers?

Quiz

You change an A record on your authoritative server. A resolver cached the old value 10 minutes ago with TTL=3600. How long before that resolver serves the new value?

SOA record and zone authority

Every zone has exactly one SOA (Start of Authority) record at the apex. Its fields govern replication and negative caching:

SERIAL: incremented on every zone change. Secondaries compare their serial to the primary’s; if lower, they pull an update.
REFRESH: how often secondaries poll without a NOTIFY (typically 1–24 hours).
RETRY: poll interval when REFRESH fails.
EXPIRE: how long a secondary serves stale data when the primary is unreachable (often 1 week).
MINIMUM: negative-cache TTL (RFC 2308). The actual negative TTL is min(SOA.MINIMUM, SOA.TTL).

Common ops mistake: decrementing SERIAL. The convention is YYYYMMDDNN format (2026051301 = 2026-05-13, change #01). Manual edits that decrement SERIAL break replication silently — secondaries skip the “older” zone.
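
A sketch of two guards against that mistake: generating the next YYYYMMDDNN serial so that it never goes backwards, and comparing serials the way RFC 1982 serial arithmetic defines it for 32-bit values. The function names are illustrative and not part of any DNS server's tooling.

```python
from datetime import date

SERIAL_MODULUS = 2 ** 32  # SOA SERIAL is an unsigned 32-bit number


def next_serial(current: int, today: date) -> int:
    """Return the next YYYYMMDDNN serial, always greater than `current`."""
    first_of_day = int(today.strftime("%Y%m%d")) * 100 + 1  # NN = 01
    # If today's serials are already in use (or the current serial is ahead
    # of today's date), just add one instead of going backwards.
    return max(first_of_day, current + 1)


def is_newer(candidate: int, current: int) -> bool:
    """RFC 1982 comparison: True if `candidate` counts as newer than `current`."""
    diff = (candidate - current) % SERIAL_MODULUS
    return 0 < diff < 2 ** 31


print(next_serial(2026051301, date(2026, 5, 13)))  # 2026051302
print(is_newer(2026051300, 2026051301))            # False
```

Running a check like is_newer in a pre-deploy step turns a silent replication failure into a visible error before the zone is published.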

Order the steps

Order the recommended steps for a planned DNS migration (changing an A record IP):

1 Identify current TTL (e.g. 86400 s)
2 Lower TTL to 60 s; wait for old TTL to expire everywhere
3 Update the A record to the new IP
4 Monitor for errors; verify new IP is resolving correctly
5 Raise TTL back to 3600 s or higher

Why this works

Serve-stale (RFC 8767). When an upstream authoritative server is unreachable but a cache entry exists past its TTL, a resolver may serve the stale answer for up to 1–3 days (configurable) while attempting a refresh in the background. This dramatically improves availability during authoritative outages at the cost of serving stale data. Unbound supports this with serve-expired: yes. The trade-off: if a zone was intentionally removed rather than just unavailable, serving stale answers hides the deletion from users longer than the TTL would suggest.
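
The decision a resolver makes in that situation can be sketched as a small function. This is a simplification under stated assumptions: the enum, the function name, and the 1-day default for the maximum stale age are illustrative, and real resolvers apply additional timers.

```python
from enum import Enum


class Answer(Enum):
    FRESH = "serve the cached answer"
    REFRESH = "re-query and serve the new answer"
    STALE = "serve the expired answer while retrying in the background"
    FAIL = "return a failure to the client"


def choose_answer(
    seconds_since_expiry: int,
    authoritative_reachable: bool,
    max_stale_seconds: int = 86400,  # assumed default: low end of the 1-3 day range
) -> Answer:
    """Pick a response for a name that has an entry in the cache."""
    if seconds_since_expiry <= 0:
        return Answer.FRESH    # TTL has not run out yet
    if authoritative_reachable:
        return Answer.REFRESH  # normal expiry path: pull the current value
    if seconds_since_expiry <= max_stale_seconds:
        return Answer.STALE    # outage: availability beats freshness
    return Answer.FAIL         # too old to be worth serving


print(choose_answer(600, authoritative_reachable=False).name)  # STALE
```

The stale path only applies when the refresh cannot complete, which is why a reachable authoritative server always wins over an expired entry.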

Browser DNS cache

Browsers maintain their own DNS cache, separate from the OS stub resolver and the upstream recursive resolver. Chrome caches DNS entries for approximately 1 minute regardless of the record’s actual TTL — intentionally short enough to forget potentially malicious answers, long enough to avoid re-resolving every link on a page. Firefox follows a similar policy.

Clearing the OS-level DNS cache (sudo resolvectl flush-caches on Linux systems running systemd-resolved, or sudo systemd-resolve --flush-caches on older releases; sudo killall -HUP mDNSResponder on macOS) does not clear the browser’s cache. To force a full re-resolution: restart the browser or visit chrome://net-internals/#dns and clear the host cache.

Browsers also pre-warm DNS via <link rel="dns-prefetch" href="//cdn.example.com"> — resolving names before the user clicks a link so the lookup latency is hidden.

TTL is permission, not a push: the authoritative server never notifies caches. Each cache simply counts its own copy down to zero and re-queries — which is why 'propagation' is just independent caches expiring one by one.

Recall before you leave

01 Operational impact of high TTL vs low TTL: name one benefit and one cost of each.
02 You observe dig @1.1.1.1 example.com A returning 80 ms on every query. What does this suggest?
03 What is negative caching and why is it important?

Recap

TTL is a maximum hold time for downstream caches, not a validity guarantee from the authoritative server. DNS has no push mechanism: “propagation” is just distributed caches expiring one by one. To manage a planned change safely, lower the TTL well before the change so the worst-case staleness window is short. Negative responses are also cached: NXDOMAIN and NODATA for min(SOA.MINIMUM, SOA.TTL), and SERVFAIL briefly under RFC 9520. The SOA record controls zone replication: SERIAL increments signal secondaries to pull updates, and decrementing SERIAL silently breaks replication. Browsers maintain their own DNS cache independent of the OS resolver; clearing the OS cache does not affect the browser’s cache. When you next face “users still see the old IP after my change”, reach for TTL arithmetic first: calculate how much of the old TTL was left when users cached the answer, not how long ago you made the change.


