Service worker edge cases: version skew, durability, and navigation traps

Version skew from assets without content hashes, the kill-and-restart durability trap, and why a broken navigation-intercepting service worker is a stop-the-deploy incident.

WEB Senior ◷ 16 min

You ship a service worker that serves the app shell cache-first. A week later you push a bug fix. Some users never get it — they keep hitting the broken cached shell on every reload, and there is no recovery button for an ordinary user.

The update-and-version-skew problem

A service worker’s precached assets belong to the worker version that cached them. Around a deploy of version N+1, an open page can be controlled by version N’s worker while the server already returns N+1’s HTML, or the reverse. If the worker serves version N’s app.js cache-first while the HTML expects version N+1’s app.js, the page fails at runtime with mismatched modules.

The robust pattern:

Content-hash every asset filename (app.4f3a1c.js). Old and new assets coexist in the cache with no collision.
Version-tag cache names (cache-v3, cache-v4). Pre-cache each deploy’s full asset set under the version tag.
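In activate, delete only stale caches: those whose version tag is not the current one. By default, activate fires only once no open page is still controlled by the previous worker, so no page loses assets mid-session. If you force early activation with skipWaiting() and clients.claim(), pages loaded under the old version can still request old assets, so let cache misses fall through to the network.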
Serve navigation requests network-first (or a dedicated app-shell route), so users always land on HTML consistent with the active worker.
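
The cache-versioning steps above can be sketched roughly as follows. This is a minimal illustration, not a complete worker: the `cache-` prefix, the `v4` tag, and the asset list are assumed values that a build step would normally inject, and `staleCacheNames` is a hypothetical helper name.

```javascript
// Hypothetical values: a build step would normally inject the version tag
// and the list of content-hashed asset URLs.
const CACHE_PREFIX = 'cache-';
const CURRENT_CACHE = CACHE_PREFIX + 'v4';
const PRECACHE_URLS = ['/app.4f3a1c.js', '/styles.9b2e07.css'];

// Pure helper: names of this app's caches that do not match the current version.
function staleCacheNames(allNames, currentName, prefix) {
  return allNames.filter((name) => name.startsWith(prefix) && name !== currentName);
}

// Register handlers only when actually running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' && self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('install', (event) => {
    // Pre-cache this deploy's full asset set under its own version tag.
    event.waitUntil(
      caches.open(CURRENT_CACHE).then((cache) => cache.addAll(PRECACHE_URLS))
    );
  });

  self.addEventListener('activate', (event) => {
    // Delete only caches from other versions; unrelated caches are left alone.
    event.waitUntil(
      caches.keys().then((names) =>
        Promise.all(
          staleCacheNames(names, CURRENT_CACHE, CACHE_PREFIX).map((name) => caches.delete(name))
        )
      )
    );
  });
}
```

Filtering by prefix keeps the cleanup from touching caches that another part of the app owns, such as a separately managed font or image cache.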

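The same class of bug applies to sw.js itself. Older browsers honored HTTP caching headers on the worker file, capped at 24 hours; current browsers bypass the HTTP cache for the top-level worker script by default, but scripts loaded through importScripts() can still come from it. Serving sw.js with Cache-Control: no-cache remains cheap insurance: the browser revalidates the file on every update check, which it runs on navigation.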
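
On the client side, register() accepts an updateViaCache option that controls whether the HTTP cache is consulted when the browser checks for a new worker. The sketch below passes 'none', which covers both the worker script and anything it imports. The registerWorker wrapper and the /sw.js path are illustrative assumptions; the server header is still worth setting.

```javascript
// Registers the worker and asks the browser to bypass the HTTP cache for
// the worker script and anything it pulls in via importScripts().
// `container` is normally navigator.serviceWorker; it is a parameter here
// only so the function can be exercised outside a browser.
function registerWorker(container, scriptUrl) {
  return container.register(scriptUrl, { updateViaCache: 'none' });
}

// Run the registration only where service workers are available.
if (typeof navigator !== 'undefined' && navigator.serviceWorker) {
  registerWorker(navigator.serviceWorker, '/sw.js').catch((err) => {
    console.error('Service worker registration failed:', err);
  });
}
```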

Service workers are not durable

The browser kills an idle service worker aggressively, typically within seconds to tens of seconds of its last event finishing (the exact timeout is browser-specific), and restarts it on the next event. Any state held in a module-level variable is gone on restart. This is a frequent bug source:

A counter tracking requests in flight.
A cache of pending promises.
A WebSocket connection held in a global.

All evaporate. Durable state must live in IndexedDB or the Cache API.
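
As one possible shape for the request counter, the sketch below keeps the value in the Cache API instead of a module-level variable, so it survives a worker restart. The key URL, the cache name, and the function names are all hypothetical. The read-modify-write is not atomic, so concurrent events can race; an IndexedDB transaction is the better fit when exact counts matter.

```javascript
// Hypothetical synthetic key and cache name used only for worker state.
const STATE_URL = '/__sw-state/request-count';
const STATE_CACHE = 'sw-state';

// Reads the persisted counter; 0 if nothing has been stored yet.
async function readCount(cache) {
  const stored = await cache.match(STATE_URL);
  return stored ? (await stored.json()).count : 0;
}

// Increments the persisted counter and returns the new value.
// Note: not atomic across concurrent events.
async function incrementCount(cache) {
  const count = (await readCount(cache)) + 1;
  await cache.put(STATE_URL, new Response(JSON.stringify({ count })));
  return count;
}

// Register the handler only when actually running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' && self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', (event) => {
    // No respondWith(): the request proceeds normally, we only record it.
    event.waitUntil(caches.open(STATE_CACHE).then(incrementCount));
  });
}
```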

Long-running work inside an event handler must be wrapped in event.waitUntil(promise) — that tells the browser “do not kill me until this promise settles.” Forgetting waitUntil means the browser may terminate the worker mid-operation, and background sync, push handling, and cache population silently fail to complete.
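
A small sketch of the pattern, assuming a fetch handler that answers from the network and stores a copy for later. The function name, the openCache and fetchFn parameters, and the runtime-v1 cache name are illustrative, not part of any library.

```javascript
// Responds from the network and stores a copy in the cache.
// `openCache` and `fetchFn` are parameters so the logic can run outside a
// browser; in a worker they wrap caches.open() and the global fetch.
function respondAndCache(event, openCache, fetchFn) {
  const network = fetchFn(event.request);

  // Clone in a reaction registered before respondWith(), so the copy is
  // taken before the page starts consuming the body.
  const copy = network.then((response) => (response.ok ? response.clone() : null));
  const stored = Promise.all([copy, openCache()]).then(([response, cache]) =>
    response ? cache.put(event.request, response) : undefined
  );

  event.respondWith(network);
  // Without this call the worker may be terminated once the response is
  // delivered, leaving the cache write unfinished.
  event.waitUntil(stored);
}

// Register the handler only when actually running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' && self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', (event) => {
    if (event.request.method !== 'GET') return;
    respondAndCache(event, () => caches.open('runtime-v1'), (req) => fetch(req));
  });
}
```

Both respondWith() and waitUntil() have to be called synchronously inside the handler, which is why the function builds its promises first and hands them over instead of awaiting them.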

Service worker durability facts

Idle worker kill time: Seconds to tens of seconds after last event (browser-specific)
sw.js HTTP cache cap: 24 hours (current browsers bypass the HTTP cache for the top-level script by default)
Recommended sw.js cache header: Cache-Control: no-cache
Durable state options: Cache API or IndexedDB only
Forgetting waitUntil → silent failure of: push, sync, cache population

Navigation interception and the app-shell danger

Of all the failure modes covered here, this is the one that can lock users out of your app with no escape route. The most powerful, and most dangerous, service worker pattern is intercepting navigation requests: the fetch handler catches the request for the HTML document itself and returns a cached app shell. This gives instant loads, but it creates a class of bug that cannot occur without a service worker.

The trap: If you ship a bug in the app shell and cache it cache-first, every repeat visit serves the broken shell from cache, bypassing the network where the fix lives. The user cannot escape with an ordinary reload.

The defence is layered:

Navigation requests should be network-first with a short timeout (~3 s, fall back to cache). This ensures a fix reaches users on their first successful load.
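Keep a kill switch: a versioned endpoint the worker checks while handling events, for example in the background of each navigation. (activate fires only once per worker version, so a check there alone cannot rescue a worker that is already active.) On signal, call self.registration.unregister() and delete caches. This lets you remotely detach a broken service worker from all clients.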
Never cache navigation cache-only. Always have a network path.
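
The network-first rule with a timeout might look like the sketch below. It races the network against a timer rather than aborting the request, and falls back to a precached shell only when the network fails or is too slow. SHELL_URL, the cache-v4 name, and the networkFirst helper are assumptions for illustration; the 3 s value follows the list above.

```javascript
const SHELL_URL = '/index.html'; // hypothetical precached app shell
const NAV_TIMEOUT_MS = 3000;

// Tries the network first; falls back to the cached shell on failure or timeout.
// `fetchFn` and `cache` are parameters so the logic can run outside a browser.
async function networkFirst(request, { fetchFn, cache, timeoutMs }) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('navigation timed out')), timeoutMs);
  });
  try {
    // The network always gets the first chance, so a server-side fix reaches the user.
    return await Promise.race([fetchFn(request), timeout]);
  } catch (err) {
    // Offline, failed, or too slow: serve the cached shell if there is one.
    const cached = await cache.match(SHELL_URL);
    if (cached) return cached;
    throw err;
  } finally {
    clearTimeout(timer);
  }
}

// Register the handler only when actually running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' && self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', (event) => {
    if (event.request.mode !== 'navigate') return;
    event.respondWith(
      caches.open('cache-v4').then((cache) =>
        networkFirst(event.request, {
          fetchFn: (req) => fetch(req),
          cache,
          timeoutMs: NAV_TIMEOUT_MS,
        })
      )
    );
  });
}
```

This version returns whatever the server sends, including error pages. Whether a 5xx response should also trigger the cached fallback is a product decision the sketch leaves open.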

Network-first is the safeguard: the fix on the server always reaches users on their first successful load, and the cached app shell is only the offline fallback — never the default. Cache-first navigation is the trap that locks users onto a broken shell.

A broken service worker shipped widely is a stop-the-deploy incident because ordinary users have no recovery button: they will not open DevTools or clear site data. Your only recourse is the kill switch or a fixed sw.js that the browser picks up on its next update check.

Quiz

A service worker holds an in-flight request map in a module-level `const cache = new Map()`. After a few seconds of inactivity, entries vanish. Why?

Quiz

You deploy a service worker update and some users report a broken page: scripts fail to load with module-mismatch errors. What is the most likely cause and the robust fix?

Quiz

A user is stuck on a broken cached app shell and a normal reload does not fix it. What is the recovery mechanism you should have built in advance?

Why this works

Why is a broken service worker so hard to recover from? When a service worker intercepts navigation, it sits between the browser and the server for the HTML document itself: the page cannot load without the service worker responding first. A broken origin or CDN fails visibly, and a server-side fix takes effect on the next request. A broken service worker instead responds successfully with a broken cached response, and the browser has no way to distinguish a correct cached response from a buggy one. This is why the kill switch must be proactive: a URL the worker fetches routinely, for example alongside each navigation, whose response tells the worker whether to unregister itself. If you wait for users to report breakage, you have already shipped.
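
A kill switch along these lines could be sketched as follows. The endpoint path, its {"kill": true} response shape, and the checkKillSwitch name are assumptions made for the example. The check fails open: if the endpoint is unreachable or malformed, the worker stays in place.

```javascript
// Hypothetical endpoint that returns JSON such as {"kill": false}.
const KILL_SWITCH_URL = '/sw-kill-switch.json';

// Returns true if the switch was set and the worker tore itself down.
// Dependencies are parameters so the logic can run outside a browser.
async function checkKillSwitch({ fetchFn, registration, cacheStorage }) {
  let body;
  try {
    const response = await fetchFn(KILL_SWITCH_URL, { cache: 'no-store' });
    if (!response.ok) return false;
    body = await response.json();
  } catch {
    return false; // offline or malformed response: leave the worker in place
  }
  if (!body || body.kill !== true) return false;

  // Drop every cache for this origin, then detach the worker.
  const names = await cacheStorage.keys();
  await Promise.all(names.map((name) => cacheStorage.delete(name)));
  await registration.unregister();
  return true;
}

// Register the handler only when actually running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' && self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', (event) => {
    if (event.request.mode !== 'navigate') return;
    // Background check only; other handlers decide how the navigation is served.
    event.waitUntil(
      checkKillSwitch({
        fetchFn: (url, init) => fetch(url, init),
        registration: self.registration,
        cacheStorage: caches,
      })
    );
  });
}
```

Pages that are already open generally stay controlled by the worker until they reload, so the switch takes full effect on the next navigation rather than instantly.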

Recall before you leave

01. Why does a service worker's module-level state disappear between requests?
02. What is the version-skew failure mode in service workers and how do you prevent it?
03. Why is a broken navigation-intercepting service worker a stop-the-deploy incident, and what is the architectural defence?

Recap

Service workers have three major edge-case failure modes. Version skew: serving cached assets from the wrong version — prevented with content-hashed filenames and version-tagged caches. Durability trap: module-level state evaporates between events because the browser kills idle workers; use event.waitUntil for long operations and IndexedDB/Cache API for state. Navigation interception: caching the HTML document itself means a broken shell traps users permanently — always use network-first for navigation and build a kill-switch endpoint. All three failures become hard-to-reverse production incidents if deployed without the safeguards. Now when you review a service worker PR, you know the three questions to ask before approving: are assets content-hashed, is navigation network-first, and does a kill switch exist?

Connected lessons

Builds on: Service worker lifecycle and cache strategies (Middle)

Unlocks: Five canonical breaks: where production reliably dies (Senior)

Apply this

Put this lesson to work on a real build.

Offline PWA sync: an offline-first notes PWA with a local write queue (IndexedDB) that syncs on reconnect with last-writer-wins conflict resolution, a service worker for asset caching, and background sync for missed flushes.
