Networking & Protocols NET · 11 · 06

Defense-in-depth architecture and attack economics

No single defense stops all DDoS vectors — anycast edge absorption, rate limiting, WAF, mTLS, and adaptive load shedding form the layers; attack economics favor the attacker only if you defend alone.

NET Senior ◷ 16 min

You deployed a CDN, rate limiting, and a WAF. Then the attacker switches to cache-miss HTTP floods targeting your most expensive database query, spreading across 10,000 IPs each under your per-IP limit. No single defense catches it. You need to understand how the layers interact and when to escalate to humans.

Defense-in-depth architecture: the full stack. No single defense stops all attacks, so the layers stack, each covering what the one above misses:

1. Anycast edge scrubbing: every CDN PoP is an active scrubbing center. Attacks are ingested at the nearest PoP rather than concentrated on the origin. Combined capacity across Cloudflare's 330+ PoP locations is in the hundreds of Tbps by its published figures. A 10 Gbps attack becomes a rounding error spread across many nodes.
2. Stateless L3/L4 rate limits: per-ASN and per-prefix rate limits drop obvious amplification sources and SYN flood sources before TCP state is created.
3. WAF at the edge: detects application-layer patterns (SQLi, XSS, bot fingerprints). For a public API, run it at Paranoia Level 2 (PL2, balanced false positives and coverage), not PL4 (paranoid).
4. Token bucket per IP + per user: stops obvious botnets and abuse by authenticated users.
5. Adaptive concurrency limiting at origin: when in-flight requests exceed the capacity threshold, reject new requests with a fast 503. The service remains stable; users get errors instead of timeouts.
6. Observability and human escalation: when automated defenses fail, on-call engineers add custom rules.

Together these six layers mean the attacker must defeat each one in sequence. Without layer 5 (adaptive concurrency), a distributed attack that slips through every earlier filter still brings the origin down via queue exhaustion, which is exactly how the cache-miss HTTP flood in the opening scenario works.

Layer	What it stops	What it misses
Anycast edge	Volumetric floods (Gbps-scale)	Intelligent low-rate attacks per IP
Stateless L3/L4 rate limits	Amplification, SYN floods	HTTP-level attacks on valid ports
WAF (PL2)	Known attack signatures, bot patterns	Zero-days, obfuscated payloads, business-logic abuse
Rate limit (per IP/user)	Obvious botnets, auth abuse	Distributed botnets with residential proxies
Adaptive concurrency	Distributed overload, cache-miss floods	Attacks below the overload threshold
mTLS	Lateral movement inside the network	External-facing attack vectors

mTLS in service meshes with SPIFFE. Istio and Linkerd deploy a sidecar proxy in every pod. In Istio, the control plane (Istiod) runs a SPIFFE-compatible certificate authority. At startup, each sidecar receives a short-lived certificate (1–24 hours). Certificates rotate via an SDS (Secret Discovery Service) push, so no sidecar restart is needed. Every service-to-service call involves (1) an mTLS handshake (20–50 ms overhead per new connection on older hardware), (2) an encrypted payload, and (3) certificate verification on both sides. This prevents lateral movement if the pod network is compromised. The cost: certificate rotation adds operational complexity and monitoring burden (an expired certificate is an infrastructure incident, not a bug).
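The "both sides verify certificates" step usually ends with an identity check on the peer's SPIFFE ID. The following is a minimal sketch of that check, assuming the Istio-style ID layout `spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>`. The names `WorkloadIdentity`, `parse_spiffe_id`, and `is_allowed` are hypothetical, and in a real mesh the sidecar performs this check rather than application code.

```python
from typing import NamedTuple, Optional, Set
from urllib.parse import urlparse


class WorkloadIdentity(NamedTuple):
    trust_domain: str
    namespace: str
    service_account: str


def parse_spiffe_id(uri: str) -> Optional[WorkloadIdentity]:
    """Parse an Istio-style SPIFFE ID; return None if the shape is wrong."""
    parsed = urlparse(uri)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        return None
    # Expected path: /ns/<namespace>/sa/<service-account>
    parts = parsed.path.split("/")
    if len(parts) != 5 or parts[0] != "" or parts[1] != "ns" or parts[3] != "sa":
        return None
    if not parts[2] or not parts[4]:
        return None
    return WorkloadIdentity(parsed.netloc, parts[2], parts[4])


def is_allowed(peer_uri: str, allowed: Set[WorkloadIdentity]) -> bool:
    """Allow the call only if the peer's identity is explicitly listed."""
    identity = parse_spiffe_id(peer_uri)
    return identity is not None and identity in allowed


# Hypothetical policy: only the checkout service account may call this service.
ALLOWED = {WorkloadIdentity("cluster.local", "payments", "checkout")}
```

An allow-list keyed on the full identity (trust domain, namespace, service account) is what makes a compromised pod in another namespace useless for lateral movement: it can complete a TLS handshake, but its identity is not on the list.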

Protocol/state-exhaustion in depth. SYN floods: each SYN allocates a half-open connection slot in the server's backlog. When the backlog overflows, the server drops new SYN packets from legitimate clients. SYN cookies encode the connection state as a cryptographic cookie in the server's initial sequence number (ISN), so no memory is allocated; a legitimate client replies with an ACK whose acknowledgment number decodes back to a valid cookie. ACK floods: mitigated with RST rate limiting (a cap on RSTs per second) and firewall suppression of unmatched ACKs. Blind TCP RST injection (off-path attacker): RFC 5961 challenge ACKs force the attacker to guess the exact sequence number rather than any value inside the receive window.
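The SYN cookie idea can be sketched in a few lines. This is a simplified illustration, not the layout any particular kernel uses: it assumes a 32-bit cookie made of a 5-bit time counter, a 3-bit MSS index, and a 24-bit truncated HMAC over the connection 4-tuple. The names `make_cookie`, `check_ack`, and `MSS_TABLE` (and the MSS values in it) are hypothetical.

```python
import hashlib
import hmac
from typing import Optional

# Illustrative MSS values; the 3-bit index selects one of eight entries.
MSS_TABLE = [536, 1024, 1220, 1300, 1360, 1400, 1440, 1460]


def _mac24(secret: bytes, src: str, sport: int, dst: str, dport: int,
           counter: int, mss_idx: int) -> int:
    """24-bit truncated HMAC binding the cookie to the 4-tuple and time slot."""
    msg = f"{src}|{sport}|{dst}|{dport}|{counter}|{mss_idx}".encode()
    return int.from_bytes(hmac.new(secret, msg, hashlib.sha256).digest()[:3], "big")


def make_cookie(secret: bytes, src: str, sport: int, dst: str, dport: int,
                counter: int, mss_idx: int) -> int:
    """Build the ISN sent in the SYN-ACK. No per-connection state is stored."""
    mss_idx &= 0x07
    mac = _mac24(secret, src, sport, dst, dport, counter, mss_idx)
    return ((counter & 0x1F) << 27) | (mss_idx << 24) | mac


def check_ack(secret: bytes, ack_number: int, src: str, sport: int, dst: str,
              dport: int, now_counter: int, max_age: int = 1) -> Optional[int]:
    """Validate the client's ACK; return the negotiated MSS or None."""
    cookie = (ack_number - 1) & 0xFFFFFFFF  # the client acknowledges ISN + 1
    mss_idx = (cookie >> 24) & 0x07
    for age in range(max_age + 1):  # accept the current and recent time slots
        expected = make_cookie(secret, src, sport, dst, dport,
                               now_counter - age, mss_idx)
        if expected == cookie:
            return MSS_TABLE[mss_idx]
    return None
```

The server only allocates connection state after `check_ack` succeeds, so a flood of spoofed SYNs costs it a hash computation per packet instead of a backlog slot.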

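Rate limiting internals: distributed systems complexity. A token bucket refills as T = min(C, T + R * delta_time), where T is the current token count, C the bucket capacity, and R the refill rate. Distributed with Redis backing: each request atomically runs INCR key, with EXPIRE key window bounding the counter's lifetime (strictly a fixed-window counter rather than a token bucket). At 100k req/sec, the Redis round trip adds 0.5–1 ms to every request, which by Little's law means 50–100 Redis calls in flight at any moment. Mitigation: a local per-server counter plus periodic Redis sync (accepts slight inaccuracy, cuts decisions to microseconds). HyperLogLog for approximate limits: it estimates distinct counts (for example unique source IPs) at ~1.6 kB per sketch with ~2% error, which suits ASN-level or IP-range limits.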
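The refill formula T = min(C, T + R * delta_time) maps directly onto a small in-process token bucket. The sketch below is a single-process illustration only (no Redis, no locking). The class name `TokenBucket` and its parameters are hypothetical, and the clock is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """In-memory token bucket: T = min(C, T + R * delta_time)."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0):
        self.capacity = capacity        # C: maximum burst size in tokens
        self.refill_rate = refill_rate  # R: tokens added per second
        self.tokens = capacity          # T: start with a full bucket
        self.last = now                 # timestamp of the last refill

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then spend `cost` tokens if available."""
        delta = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + self.refill_rate * delta)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage: a burst of 2 with a sustained rate of 1 request per second.
bucket = TokenBucket(capacity=2, refill_rate=1)
decisions = [bucket.allow(now=t) for t in (0.0, 0.0, 0.0, 1.0, 1.5, 2.0)]
```

A per-server bucket like this is the "local counter" half of the mitigation described above: decisions stay in memory, and a background sync to a shared store can reconcile counts at the cost of some inaccuracy.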

Adaptive concurrency: load shedding formula. Track in-flight requests Q. Set a max_queue threshold. For each new request, survival probability = max(0, 1 - Q / max_queue). Accept the request with that probability. This creates graceful degradation: when Q is 50% of max_queue, 50% of new requests are rejected; when Q reaches max_queue, all new requests are rejected. Users see fast 503s instead of queue timeouts (which can be 30+ seconds). Circuit breakers reject requests to backends whose recent failure rate is above a threshold, preventing cascading failure across the entire service graph.
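The survival-probability rule is short enough to write out. This is a minimal sketch of the formula max(0, 1 - Q / max_queue). The function names `accept_probability` and `should_accept` are hypothetical, and the random source is injectable so the decision can be tested deterministically.

```python
import random
from typing import Callable


def accept_probability(in_flight: int, max_queue: int) -> float:
    """Survival probability for a new request: max(0, 1 - Q / max_queue)."""
    if max_queue <= 0:
        raise ValueError("max_queue must be positive")
    return max(0.0, 1.0 - in_flight / max_queue)


def should_accept(in_flight: int, max_queue: int,
                  rng: Callable[[], float] = random.random) -> bool:
    """Accept with the survival probability; otherwise return a fast 503."""
    # rng() is in [0, 1), so probability 0 never accepts and 1 always accepts.
    return rng() < accept_probability(in_flight, max_queue)
```

Because the rejection rate rises smoothly with Q, the origin sheds a little load early instead of accepting everything until the queue is full and then failing all at once.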

Trace it

Trace a sophisticated attack using amplification + application-layer tactics.

Step 1: attacker starts a 10 Gbps memcached amplification attack. What happens without a CDN?

Step 2: you migrate to a CDN (Cloudflare). The 10 Gbps is absorbed at the edge. What is the attack cost to the attacker?

Step 3: attacker switches to HTTP floods — legitimate-looking GET requests, 100,000 req/sec from a botnet. What does rate limiting do?

Step 4: attacker adds ?x=random to every request (bypassing cache), targeting expensive database endpoints. Cache-hit rate drops from 95% to 5%. What is the new problem?

Step 5: attacker adapts the random param with realistic User-Agents and spreads further. Service is still slow. What is the operational response?

Debug this

WAF anomaly scoring during an attack

2026-05-15 14:23:00 | requests=45000/sec | score-p50=0.5 | score-p95=3.2 | score-p99=8.1
2026-05-15 14:24:00 | requests=120000/sec | score-p50=6.1 | score-p95=14.2 | score-p99=28.5
2026-05-15 14:25:00 | requests=450000/sec | score-p50=12.3 | score-p95=19.8 | score-p99=32.1
2026-05-15 14:26:00 | blocked=380000 | allowed=70000 | score-threshold=5

The WAF is scoring traffic and blocking at threshold=5. What is happening in this timeline, and what should the operator do?

Attack economics. Attacker cost: ~$50–500/month for a botnet service capable of 10 Gbps sustained. Origin defense cost: ~$500/month CDN bill for 10 Gbps sustained traffic. If the attacker can generate 100 Gbps, an origin defending alone cannot match it. But a CDN with global PoPs ingests hundreds of Tbps and spreads the cost across millions of customers, so the per-customer cost is tiny. The economics favor the attacker only if you defend alone. Sharing infrastructure (a CDN) is the answer: you cannot out-scale a botnet, but you can make the attack economically unattractive by making it fail.

The attacker's cost is the same either way. Only sharing a CDN inverts the asymmetry: the attack fails, and you never have to outspend the attacker.

Observability at attack time. Key signals: (1) request rate per second — 10x normal is suspicious; (2) geographic distribution of sources — all from one ASN is suspicious; (3) anomaly score p99 — normal users score <1, attack traffic scores >10; (4) cache-hit rate — attack traffic targeting unique parameters shows sudden drop. Alert thresholds: request rate 10x baseline, anomaly score p99 spike, source-IP entropy drop (100 IPs instead of 100,000), or cache-hit rate drop below 80% during a traffic spike. Human escalation if attack persists >5 minutes or exceeds 50 Gbps.
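The alert and escalation thresholds above can be expressed as a small rule check. This is a sketch under stated assumptions, not a monitoring product's API: the 10x rate, 80% cache-hit floor, 5-minute and 50 Gbps values come from the text, while the p99 score limit, the definition of a "traffic spike", and the use of a distinct-IP ratio as a stand-in for source-IP entropy are assumptions. All names (`Snapshot`, `triggered_alerts`, `needs_human`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List

RATE_MULTIPLIER = 10.0      # from the text: request rate 10x baseline
CACHE_HIT_FLOOR = 0.80      # from the text: cache-hit rate below 80% in a spike
ESCALATE_AFTER_S = 300      # from the text: attack persists > 5 minutes
ESCALATE_ABOVE_GBPS = 50.0  # from the text: attack exceeds 50 Gbps
P99_SCORE_LIMIT = 10.0      # assumption: the text says attack traffic scores > 10
SPIKE_MULTIPLIER = 2.0      # assumption: what counts as a "traffic spike"
IP_DROP_RATIO = 0.01        # assumption: distinct IPs under 1% of baseline


@dataclass
class Snapshot:
    rps: float
    baseline_rps: float
    score_p99: float
    cache_hit_rate: float        # fraction in [0, 1]
    distinct_ips: int
    baseline_distinct_ips: int


def triggered_alerts(s: Snapshot) -> List[str]:
    """Return the names of the alert rules that fire for this snapshot."""
    alerts = []
    if s.rps >= RATE_MULTIPLIER * s.baseline_rps:
        alerts.append("request_rate")
    if s.score_p99 > P99_SCORE_LIMIT:
        alerts.append("anomaly_score_p99")
    # Distinct-IP ratio is a crude proxy for an entropy drop.
    if s.distinct_ips < IP_DROP_RATIO * s.baseline_distinct_ips:
        alerts.append("source_ip_concentration")
    in_spike = s.rps >= SPIKE_MULTIPLIER * s.baseline_rps
    if in_spike and s.cache_hit_rate < CACHE_HIT_FLOOR:
        alerts.append("cache_hit_rate")
    return alerts


def needs_human(duration_s: float, attack_gbps: float) -> bool:
    """Escalate to on-call when the attack is long-lived or very large."""
    return duration_s > ESCALATE_AFTER_S or attack_gbps > ESCALATE_ABOVE_GBPS
```

Keeping the thresholds as named constants makes it obvious which values are policy from the runbook and which are tuning guesses to revisit after the first real incident.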

Pick the best fit

An e-commerce service is under sustained application-layer attack. You must pick a defense strategy.

Design challenge

Design the DDoS defense architecture for a 100 Gbps-capable video CDN serving global users. The CDN operates 50 PoPs in 30 countries.

Absorb 100+ Gbps attacks at the edge without exceeding 10% of any PoP's capacity.
Defend against volumetric (L3/L4), protocol (SYN floods), and application-layer (HTTP floods) attacks.
Maintain p50 latency < 50 ms and p99 < 200 ms for legitimate users during attack.
Detect and block new attack patterns within 60 seconds.

Why this works

Why is adaptive concurrency limiting the preferred answer for a Black Friday e-commerce attack, not WAF PL4? WAF PL4 makes preemptive decisions at the request level based on content patterns. If those patterns are wrong (5% false positive), you block paying customers. Adaptive concurrency limiting makes reactive decisions at the system level based on actual load. It never blocks a request preemptively — it only rejects when the system is already overloaded (which is bad regardless of attack). The tradeoff: adaptive limiting accepts that some attack requests go through until the system hits capacity, then rejects everything equally. That is acceptable when the alternative is blocking 5% of legitimate Black Friday customers.

Quiz

Your WAF is at Paranoia Level 2 and attacks get through (only 70% blocked). You raise it to PL4. Legitimate customers now complain (5% false positives). What is a better approach?

Anycast edge scrubbing absorbs volumetric floods

Stateless L3/L4 filters drop amplification, SYN floods

WAF at PL2 blocks known L7 patterns

Rate limit per IP/user stops obvious botnets

Adaptive concurrency sheds load on overload

Origin survives the attack

No single layer stops every vector, so they stack: the anycast edge absorbs gigabit floods, L3/L4 filters drop amplification and SYN floods, the WAF blocks known L7 patterns, rate limits cut obvious botnets, and adaptive concurrency sheds load when the origin is overloaded. Each layer covers the gaps of the one above.

Recall before you leave

01
Why is adaptive concurrency limiting preferable to WAF PL4 for a high-traffic production service under attack?
02
During a Rapid Reset attack (CVE-2023-44487), why does the attack bypass HTTP/1.1-only rate limiting?
03
What metrics should an on-call engineer monitor during a DDoS attack and what thresholds signal escalation?

Recap

Defense-in-depth against DDoS requires stacking multiple layers because no single defense stops all vectors. Anycast edge absorption distributes volumetric attacks across 330+ global PoPs; stateless L3/L4 filters drop amplification and SYN floods before they consume connection state; a WAF at PL2 detects known application-layer patterns with tolerable false positives; adaptive concurrency limiting at the origin rejects requests only on overload instead of blocking IPs preemptively. Attack economics favor defenders only when they use shared CDN infrastructure: a botnet rented for $50–500/month is defeated by a CDN that amortizes defense capacity across millions of customers. When automated defenses fail, observability (request rate, anomaly scores, cache-hit rate, source-IP entropy) gives on-call engineers the signal they need to add custom rules, with human escalation once an attack persists beyond 5 minutes or exceeds 50 Gbps. Now when you review your service's DDoS posture, you can map each attack vector to the layer that stops it and immediately spot the gap when a layer is missing.
