Networking & Protocols
Edge workers and edge-side composition
A product page needs a globally cached page body (10-minute TTL), real-time personalised pricing (never cached, fetched by the edge per user), and a shared comment count (60-second TTL). If every request fetches all three from origin, p95 latency is 300 ms. If the edge composes the page from three fragments, each with its own cache policy, p95 drops to 30 ms. That is the promise of edge-side composition.
Edge workers: the compute model
Every major CDN now runs user code at the edge — not just static files:
- Cloudflare Workers: V8 isolates. Cold start 2–5 ms p99. 128 MiB memory limit. 50 ms CPU-time budget per request (extendable with Workers Unbound). Thousands of customer functions share one V8 process, each in its own isolate, so per-function startup overhead is negligible.
- Fastly Compute@Edge: WebAssembly. Cold start similar to Workers.
- AWS Lambda@Edge: full Node.js/Python runtime. Cold start 400–600 ms at some regional POPs — a 100× penalty over Workers for latency-sensitive paths.
- Vercel Fluid Compute (2026): V8-based, persistent warm instances, concurrent request handling within a single isolate. 99.37% zero-cold-start ratio. 1.2×–5× faster than Workers for heavy SSR (template rendering, large page assembly). 50 ms wall-time budget.
- Cloudflare Workers cold start (p99): 2–5 ms
- Lambda@Edge cold start (some POPs): 400–600 ms
- Vercel Fluid Compute zero-cold-start ratio: 99.37% of requests
- Workers CPU-time budget: 50 ms per request
- Workers memory limit: 128 MiB
- Workers isolate model: thousands of customers per V8 process
Use cases for edge workers
Workers sit in the request path and can do anything a proxy can do with code:
- Auth at edge: validate JWT or session token at the edge POP without a round-trip to origin. Reject invalid requests before they ever hit your servers.
- A/B test routing: read a user cookie or experiment ID and rewrite the request URL to /variant-a/ or /variant-b/ at the edge.
- Geo-redirect: Workers have access to the user’s country code (from Cloudflare’s CF-IPCountry header), so they can redirect / to /en/ or /de/ without origin involvement.
- Request/response mutation: add security headers (Strict-Transport-Security, X-Content-Type-Options) to every response without modifying origin.
- Dynamic routing: fan out to multiple microservices, merge responses, and return combined JSON, all within the edge POP rather than a central region.
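The routing use cases above can be sketched as a pure decision function, which keeps the logic testable outside any particular edge runtime. This is a minimal sketch, not a Workers API: the EdgeRequest shape, the decide and withSecurityHeaders names, the "experiment" cookie name, and the header values are all assumptions made for illustration.

```typescript
// Hypothetical request shape: a real worker would build this from the
// incoming request (path, parsed cookies, country header).
interface EdgeRequest {
  path: string;
  cookies: Record<string, string>;
  country: string | null; // e.g. the value of the CF-IPCountry header
}

type Decision =
  | { kind: "redirect"; location: string }
  | { kind: "rewrite"; path: string };

// Security headers named in the text. The values are common defaults,
// assumed here for illustration.
const SECURITY_HEADERS: Record<string, string> = {
  "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
  "X-Content-Type-Options": "nosniff",
};

function decide(req: EdgeRequest): Decision {
  // Geo-redirect: send the bare root to a language-specific home page.
  if (req.path === "/") {
    const lang = req.country === "DE" ? "de" : "en";
    return { kind: "redirect", location: `/${lang}/` };
  }
  // A/B routing: the "experiment" cookie (assumed name) selects the variant;
  // anything other than "b" falls back to variant A.
  const variant = req.cookies["experiment"] === "b" ? "variant-b" : "variant-a";
  return { kind: "rewrite", path: `/${variant}${req.path}` };
}

function withSecurityHeaders(
  originHeaders: Record<string, string>,
): Record<string, string> {
  // Origin headers are preserved; security headers are layered on top.
  return { ...originHeaders, ...SECURITY_HEADERS };
}
```

In a real worker, the handler would translate a redirect decision into a 3xx response and a rewrite decision into a subrequest for the rewritten path, then pass the response headers through withSecurityHeaders.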
The 50 ms CPU-time budget
Workers cap CPU time at 50 ms per request. The cap rules out large-model ML inference, resizing large images, and heavy database queries. It leaves room for JWT validation (~1 ms), URL rewriting (~0.1 ms), KV lookup (~3 ms), and simple HTML mutation (~5 ms). Design workers to be thin routing/auth/mutation layers, not compute-heavy backends.
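As a rough illustration of the budgeting arithmetic, the sketch below sums the approximate per-operation costs quoted above and compares the total with the 50 ms cap. The COST_MS table, BUDGET_MS constant, and estimate function are hypothetical names, and the costs are the text's ballpark figures rather than measurements.

```typescript
// Approximate per-operation costs in milliseconds, taken from the text.
const COST_MS = {
  jwtValidation: 1,
  urlRewrite: 0.1,
  kvLookup: 3,
  htmlMutation: 5,
} as const;

type Operation = keyof typeof COST_MS;

const BUDGET_MS = 50;

interface Estimate {
  totalMs: number;
  headroomMs: number;
  fits: boolean;
}

// Sum the estimated cost of a handler's steps and report remaining headroom.
function estimate(ops: Operation[]): Estimate {
  let totalMs = 0;
  for (const op of ops) {
    totalMs += COST_MS[op];
  }
  return { totalMs, headroomMs: BUDGET_MS - totalMs, fits: totalMs <= BUDGET_MS };
}
```

A handler that does all four operations once comes to roughly 9 ms, which leaves most of the budget as headroom. That margin is the reason to keep workers thin.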
Why this works
Why V8 isolates instead of containers. A traditional serverless function (Lambda) requires a separate process or container per function, and spinning one up takes 400–600 ms. V8 isolates are memory-efficient sandboxes within one V8 process: each isolate has its own heap and shares no mutable state with its neighbours, but all of them share the process’s startup cost. In Cloudflare’s model, thousands of customer functions share one running V8 process. Cold-starting a new isolate costs 2–5 ms, not 400 ms. The isolation guarantee is enforced in software by the V8 sandbox, not by OS process boundaries, which is considered acceptable for CDN workloads running untrusted customer code.
TLS resumption at the edge
A full TLS 1.3 handshake costs 1 RTT (~10 ms to a nearby edge POP). Session resumption via TLS 1.3 PSK (pre-shared keys) skips certificate exchange and verification, and with 0-RTT early data the client can send its request in the first flight. Modern CDNs replicate session ticket keys across POPs: if a user connects to POP A, then reconnects via POP B (mobile network switch), POP B can accept the ticket issued by POP A, so no full handshake is needed. This matters most on mobile (frequent handoffs between cell towers).
Edge-side composition (ESI and HTMLRewriter)
ESI (Edge Side Includes) was submitted to the W3C as a Note in 2001 and is still in production at Akamai. HTML responses contain <esi:include src="/fragment/nav" /> placeholders, which the edge replaces with cached fragment responses before sending the page to the browser. Each fragment has its own cache key and TTL: the shared nav caches for 6 hours; a breaking-news banner caches for 30 seconds.
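The substitution step can be illustrated with a small sketch. It is not a conforming ESI processor: it handles only self-closing esi:include tags with a double-quoted src attribute, and the resolveEsi name and the fragments map (standing in for the edge cache) are assumptions for illustration.

```typescript
// Replace <esi:include src="..." /> placeholders with fragment bodies.
// `fragments` stands in for the edge cache, keyed by fragment URL.
function resolveEsi(html: string, fragments: Map<string, string>): string {
  const includeTag = /<esi:include\s+src="([^"]*)"\s*\/>/g;
  return html.replace(includeTag, (_tag: string, src: string) => {
    // Unknown fragments become an empty string rather than leaking the tag.
    return fragments.get(src) ?? "";
  });
}

// Usage: the nav fragment is spliced into the page in place of the tag.
const page = resolveEsi(
  '<body><esi:include src="/fragment/nav" /><main>Article</main></body>',
  new Map([["/fragment/nav", "<nav>Home</nav>"]]),
);
```

A production edge would look each src up in its cache under that fragment's own key and TTL, and fetch from origin only on a miss.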
Modern replacement: Cloudflare Workers + HTMLRewriter API. The worker fetches the HTML response and uses HTMLRewriter’s streaming HTML parser to inject fragments into specific DOM positions — faster and more flexible than ESI.
The unifying idea: assemble the page at the edge from independently-cached pieces. A product page:
- Page chrome (header, footer, nav): max-age=21600 (6 hours)
- Product description: max-age=600 (10 minutes)
- Price: fetched fresh from a regional pricing service on every request (~30 ms)
- Personalised recommendation widget: fetched from KV store keyed by user-id (~3 ms)
Result: the full page is assembled at the edge in ~35 ms, no origin round-trip for 80% of the bytes.
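The per-fragment TTLs above can be sketched with a small cache and a composition function. This is a simplified model under stated assumptions: FragmentCache, Loaders, and composePage are hypothetical names, the loaders are synchronous stand-ins for what would be asynchronous fetches at a real edge, the recommendation widget is left out for brevity, and the clock is injected so that expiry can be simulated.

```typescript
// A tiny TTL cache with an injectable clock (milliseconds).
class FragmentCache {
  private entries = new Map<string, { body: string; expiresAt: number }>();
  private now: () => number;

  constructor(now: () => number) {
    this.now = now;
  }

  // Return the cached body while fresh; otherwise call `load` and cache it.
  get(key: string, ttlSeconds: number, load: () => string): string {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.body;
    }
    const body = load();
    this.entries.set(key, { body, expiresAt: this.now() + ttlSeconds * 1000 });
    return body;
  }
}

// Hypothetical fragment sources; at a real edge these would be fetches.
interface Loaders {
  chrome: () => string;
  description: () => string;
  price: () => string;
}

function composePage(cache: FragmentCache, loaders: Loaders): string {
  const chrome = cache.get("chrome", 21600, loaders.chrome); // 6 hours
  const description = cache.get("description", 600, loaders.description); // 10 minutes
  const price = loaders.price(); // never cached: fetched on every request
  return `${chrome}${description}${price}`;
}
```

With this shape, repeated requests inside the TTL window hit only the price loader. The chrome and description loaders run again only after their own TTLs lapse.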
An e-commerce checkout page has mandatory edge-computed tax and shipping rates. The edge worker fetches from a regional microservice on cache miss. Optimise for cold-load scenarios.
Edge worker performance degradation alert
$ curl -w "@format.txt" https://api.example.com/checkout
time_starttransfer: 450ms
time_connect: 12ms
time_tls: 2ms
time_firstbyte: 428ms
Edge diagnostics via worker:
Server-Timing: edge-worker;dur=185, regional-service;dur=240, kv-lookup;dur=3
Worker logs (2026-05-15):
handler_start=0ms
kv_fetch=3ms
regional_service_start=5ms
regional_service_timeout=185ms
handler_end=188ms
total_wall_time=188ms (budget: 50ms × 4 extensions used)
Edge worker is using 4× the wall-time budget and users see 450 ms responses. What is the bottleneck?
An e-commerce site needs to serve product pages globally with millisecond-precision price updates. Pick the cache strategy.
1. Why does Cloudflare Workers achieve a 2–5 ms cold start while Lambda@Edge takes 400–600 ms?
2. Describe edge-side composition for a news page that has shared chrome, a breaking-news banner (must be fresh within 30 s), and an article body (5 min staleness OK).
3. What is the 50 ms CPU-time budget in Cloudflare Workers, and what does it exclude?
Edge workers run custom code at CDN POPs using V8 isolates (Cloudflare Workers: 2–5 ms p99 cold start) or WebAssembly. They enable auth validation, A/B routing, geo-redirect, request/response mutation, and real-time fragment fetching without origin round-trips. The 50 ms CPU-time budget restricts compute-heavy tasks; those belong in regional Functions. Edge-side composition (ESI or Workers + HTMLRewriter) assembles responses from independently cached fragments: page chrome is cached for hours, while personalised or real-time data is fetched fresh per request. This combines the cache efficiency of static assets with the freshness of dynamic data. Lambda@Edge’s 400–600 ms cold start disqualifies it for latency-sensitive paths; Vercel Fluid Compute achieves a 99.37% zero-cold-start ratio by keeping isolates warm. Replicating TLS session ticket keys across POPs amortises reconnection costs on mobile.