Networking & Protocols NET · 12 · 04

Proxy intercepts and security gates: rate limiters, WAF, mTLS

Before a request reaches origin processing, it passes through reverse proxy health-routing, token-bucket rate limiters, WAF signature matching, and optional mTLS service-mesh auth, each adding latency and operational complexity.

NET · Middle · 13 min


A botnet sends 10,000 requests per second from 10,000 different IPs. Your per-IP rate limiter sees 1 request per second from each IP, well under the limit. Your WAF sees legitimate-looking browser fingerprints. The origin server is about to be overwhelmed. The security gates are running, but each runs alone instead of as one layer of a defence in depth. This lesson examines each gate, its cost, and why none of them works without the others.

Gate 1: Reverse proxy health routing

Before Bea’s SYN packet reaches origin, it may arrive at Patty — a reverse proxy or CDN edge. Patty makes a routing decision:

Cache hit path. Patty checks her cache: “Do I have the response for this URL with matching Vary headers?” If yes, she responds immediately without forwarding to origin. TCP + TLS were negotiated against Patty’s edge server (~20 ms RTT); origin never sees the request.

Cache miss path. Patty consults the origin health status maintained by out-of-band health checks, which run every few seconds as a periodic GET /health or TCP probe to each origin server; she does not probe origin on the request path. If the chosen origin is healthy, Patty forwards the request. If it is unhealthy (failed check, process down, 5xx rate too high), Patty routes to a backup server in the pool. This happens mid-flight: because the client's TCP connection terminates at Patty, a request arriving during a failover may be forwarded to a different origin server than the previous request from the same client.
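A minimal Python sketch of the cache-miss routing decision, assuming an out-of-band checker that records probe results and a proxy that round-robins across the servers currently marked healthy. The `Backend` and `Pool` names, the three-consecutive-failures rule, and round-robin selection are illustrative assumptions, not the behaviour of any specific proxy.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Backend:
    name: str
    consecutive_failures: int = 0

    @property
    def healthy(self) -> bool:
        # Assumption: a server is marked unhealthy after 3 failed probes in a row.
        return self.consecutive_failures < 3


class Pool:
    def __init__(self, backends):
        self.backends = backends
        self._rr = count()  # round-robin counter shared across requests

    def record_probe(self, name: str, ok: bool) -> None:
        """Called by the out-of-band health checker every few seconds."""
        for b in self.backends:
            if b.name == name:
                b.consecutive_failures = 0 if ok else b.consecutive_failures + 1

    def pick(self) -> Backend:
        """Per-request decision: read the cached health flags, never probe inline."""
        healthy = [b for b in self.backends if b.healthy]
        if not healthy:
            # A real proxy would typically answer 502/503 here.
            raise RuntimeError("no healthy origin in pool")
        return healthy[next(self._rr) % len(healthy)]
```

Because routing reads cached health rather than probing inline, a failover can change the chosen origin between two consecutive requests from the same client.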

Connection draining. When a server is being removed from the pool (deploy, scale-down), Patty stops sending new connections to it but lets existing connections finish their response. The draining window (typically 30–60 s) ensures in-flight requests complete cleanly before the server is taken offline.
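Connection draining can be modelled as a per-server state. In this sketch the hypothetical `DrainingBackend` class counts in-flight requests, refuses new ones once draining starts, and reports the server as safe to remove when that count reaches zero or the drain window expires. The clock is injectable so the logic can be tested without sleeping.

```python
import time


class DrainingBackend:
    """One origin server as seen by the proxy, with a draining state."""

    def __init__(self, name: str, drain_timeout: float = 30.0, clock=time.monotonic):
        self.name = name
        self.drain_timeout = drain_timeout  # e.g. 30–60 s
        self.clock = clock
        self.in_flight = 0
        self.drain_started = None  # None means "in rotation"

    def accepts_new(self) -> bool:
        """Draining servers get no new requests; the proxy routes elsewhere."""
        return self.drain_started is None

    def request_started(self) -> None:
        if not self.accepts_new():
            raise RuntimeError(f"{self.name} is draining")
        self.in_flight += 1

    def request_finished(self) -> None:
        self.in_flight -= 1

    def begin_drain(self) -> None:
        """Called on deploy or scale-down: stop new work, keep existing work."""
        self.drain_started = self.clock()

    def safe_to_remove(self) -> bool:
        """True once in-flight work is done or the drain window has expired."""
        if self.drain_started is None:
            return False
        expired = self.clock() - self.drain_started >= self.drain_timeout
        return self.in_flight == 0 or expired
```

The timeout is the safety valve: a stuck long-poll request cannot block a deploy forever.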

Gate	Added latency	What it blocks	What it misses
Proxy cache check	<1 ms	Cacheable traffic (served from the edge)	Non-cacheable, mutation endpoints
Per-IP rate limiter	<1 ms	Single-IP floods, naive scrapers	Distributed botnets (1 req/s per IP)
WAF signature match	1–5 ms	SQLi, XSS, CVE signatures	Novel attacks, encrypted payloads
mTLS client cert verify	20–40 ms (new connections only)	Unauthenticated services in a service mesh	Compromised but valid cert holder

Gate 2: Rate limiter (token bucket)

When you design an API, the question is not whether to rate-limit but which algorithm to use and what its blind spots are. The rate limiter enforces quotas: “This IP may send N requests per second.” A common implementation is the token bucket algorithm. Each IP has a bucket with capacity C. Every second, T tokens are added (up to C). Each request consumes one token. When the bucket is empty, new requests are dropped or rejected with 429 Too Many Requests. The sustained rate is therefore T requests per second, with bursts of up to C.
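The token bucket fits in a few lines. This sketch assumes one bucket per client key and refills continuously (T tokens per second, pro rata) instead of adding T tokens once per second; both give the same long-run rate. `TokenBucket` and its parameter names are illustrative.

```python
import time


class TokenBucket:
    """Per-client token bucket: capacity C, refill rate T tokens per second."""

    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill pro rata for the time elapsed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller responds 429 Too Many Requests
```

With capacity 5 and refill rate 2, a client can burst 5 requests, then gets 429s until tokens trickle back at 2 per second.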

Token bucket vs leaky bucket. Token bucket allows bursts up to capacity C. Leaky bucket drains at a constant rate, preventing bursts. Most API rate limiters use token bucket because it allows bursty-but-honest clients (a browser loading 30 subresources simultaneously) without penalising them.
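To make the contrast concrete, here is a sketch of the leaky bucket used as a queue (the traffic-shaping variant): requests enter a bounded queue and leave at a constant rate, so a burst is smoothed instead of served at once. `leaky_bucket_schedule` is a hypothetical helper written for illustration.

```python
def leaky_bucket_schedule(arrivals, rate, queue_size):
    """Leaky bucket as a queue: release one request every 1/rate seconds.

    `arrivals` must be sorted ascending. Returns each request's release time,
    or None if the queue was full and the request was dropped.
    """
    interval = 1.0 / rate
    next_free = 0.0  # earliest time the bucket can release the next request
    releases = []    # release times of accepted requests
    result = []
    for t in arrivals:
        waiting = sum(1 for r in releases if r > t)  # accepted, not yet released
        if waiting >= queue_size:
            result.append(None)  # overflow: drop the request
            continue
        release = max(t, next_free)
        next_free = release + interval
        releases.append(release)
        result.append(release)
    return result
```

Thirty simultaneous subresource requests at rate 10 with a queue of 30 leave over roughly three seconds (the last at about t = 2.9 s), whereas a token bucket with capacity 30 would admit all of them immediately. With a queue of 10, most of the burst is dropped.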

The distributed botnet problem. A botnet with 10,000 IPs, each sending 0.9 req/s, sends 9,000 req/s in total. A per-IP limit of 2 req/s lets every one of them through. Defence: add a global concurrent-connection limit or adaptive concurrency limiting at origin. When in-flight requests exceed the server’s capacity, reject new ones with a fast 503, regardless of the per-IP quota.
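A global concurrency limit sits in front of the handler and counts requests in flight, not requests per IP, so it cannot be evaded by spreading load across addresses. The sketch below is a fixed-limit version with hypothetical names; an adaptive limiter would additionally raise or lower `limit` from observed latency, which is omitted here.

```python
import threading


class ConcurrencyLimiter:
    """Caps requests in flight across all clients, regardless of source IP."""

    def __init__(self, limit: int):
        self.limit = limit
        self.in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self._lock:
            if self.in_flight >= self.limit:
                return False
            self.in_flight += 1
            return True

    def release(self) -> None:
        with self._lock:
            self.in_flight -= 1


def handle(limiter: ConcurrencyLimiter, work) -> int:
    """Run `work` if capacity allows; otherwise fail fast with 503."""
    if not limiter.try_acquire():
        return 503  # cheap rejection: no queueing, no origin work
    try:
        work()
        return 200
    finally:
        limiter.release()
```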

Jitter on limit resets. If every client’s quota resets on the same boundary (say, each whole second), all the clients that were throttled retry at that instant, and a thundering herd hits the origin at t = 1 s. Solution: randomise the token-refill offset per client (add ±10% jitter to the refill window). Resets then stagger across clients, smoothing origin load.
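One way to implement staggered resets, assuming discrete refill windows: give each client a stable phase offset within ±10% of the window, derived from its ID, and schedule its refills on that shifted grid. Seeding `random.Random` with the client ID keeps the offset identical across calls; the helper names are hypothetical.

```python
import math
import random


def refill_offset(client_id: str, window: float = 1.0, jitter: float = 0.10) -> float:
    """Stable per-client phase offset in [-jitter * window, +jitter * window]."""
    rng = random.Random(client_id)  # seeded by client ID: same offset every call
    return rng.uniform(-jitter, jitter) * window


def next_refill(client_id: str, now: float, window: float = 1.0, jitter: float = 0.10) -> float:
    """Next refill boundary after `now` on this client's shifted schedule."""
    offset = refill_offset(client_id, window, jitter)
    k = math.floor((now - offset) / window) + 1
    return offset + k * window
```

Without the offset, every client throttled at t = 0.5 s would get tokens back at exactly t = 1.0 s; with it, their resets spread across roughly 0.9 to 1.1 s.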

Gate 3: WAF — Web Application Firewall

The WAF inspects application-layer content. Two modes:

Signature-based (rule mode). The OWASP ModSecurity Core Rule Set (CRS) defines patterns for SQL injection (union select), XSS (<script>), path traversal (../), and CVE-specific payloads. Each request is matched against hundreds of rules. Running at PL1–PL2 (Paranoia Levels 1–2) balances false positives against coverage. PL4, the most paranoid level, blocks legitimate traffic that happens to match its aggressive patterns.
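A toy illustration of signature matching and paranoia levels. These four regex rules are invented for this example and are far cruder than the real CRS; the point is the mechanism: decode the request, apply only rules at or below the configured level, and notice how a higher level starts catching benign input.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical toy rules (NOT the OWASP CRS): (rule_id, paranoia_level, pattern)
RULES = [
    ("sqli-union", 1, re.compile(r"union\s+(all\s+)?select", re.IGNORECASE)),
    ("xss-script", 1, re.compile(r"<\s*script", re.IGNORECASE)),
    ("path-traversal", 1, re.compile(r"\.\./")),
    ("sqli-quote", 2, re.compile(r"'")),  # aggressive: also catches O'Brien
]


def match(raw: str, paranoia_level: int = 1) -> list[str]:
    """Return IDs of rules at or below the configured level that match."""
    text = unquote_plus(raw)  # normalise URL encoding before matching
    return [rule_id for rule_id, level, pattern in RULES
            if level <= paranoia_level and pattern.search(text)]
```

At level 1, `name=O'Brien` passes; at level 2 the aggressive quote rule flags it, which is the false-positive trade-off the paranoia levels control.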

Anomaly-based. The WAF baselines the normal request shape (rate, user-agent, header structure, payload entropy) and scores deviations from it. An IP sending 100 req/min with a headless Chrome fingerprint that suddenly sends 3,000 req/min with identical headers is flagged as a bot. Because there is no fixed rule list to study, anomaly detection cannot be bypassed simply by knowing the rules.
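Anomaly scoring can be sketched with a single feature, request rate, scored as a z-score against the client's own baseline. Real systems combine many features; the 3-sigma threshold and the sample baseline are assumptions for illustration.

```python
from statistics import mean, stdev


def anomaly_score(baseline: list[float], observed: float) -> float:
    """How many standard deviations `observed` sits above the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # needs at least two baseline samples
    if sigma == 0:
        return 0.0 if observed <= mu else float("inf")
    return (observed - mu) / sigma


def is_anomalous(baseline: list[float], observed: float, threshold: float = 3.0) -> bool:
    return anomaly_score(baseline, observed) > threshold
```

A client whose baseline hovers around 100 req/min scores far above the threshold at 3,000 req/min, while 105 req/min stays within normal variation.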

WAF cost. 1–5 ms per request for a rule-based WAF running at the edge. Justification: eliminates broad classes of application attacks that would otherwise require expensive application-layer code paths or DB query defenses.

Gate 4: mTLS — mutual TLS for service-to-service auth

Ordinary TLS authenticates only the server (Bea trusts Sven via Cara’s certificate). mTLS authenticates both parties. During the TLS handshake, Sven sends a CertificateRequest message. Bea replies with her client certificate and a CertificateVerify signature proving she holds the matching private key. Sven verifies the certificate chain against a trusted CA.
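For a service that terminates TLS itself rather than through a sidecar, Python's standard `ssl` module expresses the server side of this exchange with `verify_mode = ssl.CERT_REQUIRED`: the server then requests a client certificate and fails the handshake if none is presented or it does not verify. The file paths are placeholders, and the helpers below are hypothetical; they skip loading when paths are omitted so the configuration can be inspected without real certificates.

```python
import ssl


def make_mtls_server_context(cert_file=None, key_file=None, client_ca_file=None):
    """Server-side context that demands and verifies a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Server mode + CERT_REQUIRED: send CertificateRequest, reject missing/invalid client certs.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # server identity
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)  # CA that signs client certs
    return ctx


def make_mtls_client_context(cert_file=None, key_file=None, server_ca_file=None):
    """Client-side context that verifies the server and presents a client cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # verifies server cert + hostname
    if server_ca_file:
        ctx.load_verify_locations(cafile=server_ca_file)
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # client identity
    return ctx
```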

Where mTLS matters. Internal microservices in a zero-trust network. If you have a payment service and a user service on the same Kubernetes cluster, mTLS ensures the user service cannot impersonate the payment service — even if an attacker gains network access inside the cluster.

SPIFFE. SPIFFE (the workload identity specification) and SPIRE (its reference implementation) automate certificate issuance for workloads: each service gets a short-lived X.509 certificate called an SVID (SPIFFE Verifiable Identity Document) from the SPIRE server. Certificates rotate automatically (e.g., hourly). In a service mesh (Istio, Linkerd), mTLS is enforced by a sidecar proxy, so application code does not handle TLS directly.
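mTLS proves who is calling; the service still has to decide whether that caller is allowed. With SPIFFE, the identity is a `spiffe://` URI in the certificate's subject alternative name. This sketch reads it from a dict shaped like the one `ssl.SSLSocket.getpeercert()` returns for a verified peer and checks it against an allowlist; the trust domain `example.org` and the allowlisted path are hypothetical.

```python
def spiffe_ids(peer_cert: dict) -> list[str]:
    """Extract spiffe:// URI SANs from a getpeercert()-style dict."""
    return [value for kind, value in peer_cert.get("subjectAltName", ())
            if kind == "URI" and value.startswith("spiffe://")]


# Hypothetical policy: workload identities allowed to call the payment service.
ALLOWED_CALLERS = {"spiffe://example.org/ns/prod/sa/checkout"}


def authorize(peer_cert: dict, allowed=ALLOWED_CALLERS) -> bool:
    ids = spiffe_ids(peer_cert)
    # An X.509 SVID carries exactly one SPIFFE ID; reject anything else.
    return len(ids) == 1 and ids[0] in allowed
```

This is what stops the user service from acting as the checkout service: its certificate is valid, but its SPIFFE ID is not on the list.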

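mTLS cost. The client certificate travels in the existing handshake flights (CertificateRequest in the server’s flight, Certificate and CertificateVerify in the client’s), so in TLS 1.2 and 1.3 mTLS adds no extra round trip; the cost is the extra certificate bytes, the client’s signature, and the server’s chain validation plus any revocation lookup. Budget 20–40 ms of added first-connection latency. Reused connections perform no handshake at all, and session resumption skips the certificate exchange, so on warm connections the cost is close to zero. Operational cost: certificate distribution, rotation, and revocation must be automated, because doing this manually for hundreds of services is impractical.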

The first three gates cost under 5 ms each; mTLS on a new connection costs roughly ten times more, so connection reuse and session resumption matter: they spread that one-time handshake cost across many requests.
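The payoff of connection reuse is simple division: one handshake's cost is shared by every request on the connection. This sketch assumes a 30 ms handshake, the middle of the 20–40 ms range, and ignores resumption costs; the helper name is hypothetical.

```python
def amortized_handshake_ms(handshake_ms: float, requests_per_connection: int) -> float:
    """Average handshake cost charged to each request on one connection."""
    if requests_per_connection < 1:
        raise ValueError("a connection carries at least one request")
    return handshake_ms / requests_per_connection


# Hypothetical 30 ms mTLS handshake, reused for increasing numbers of requests.
for n in (1, 10, 100, 1000):
    print(f"{n:>5} requests/connection -> {amortized_handshake_ms(30.0, n):.2f} ms per request")
```

At 100 requests per connection, the handshake adds 0.30 ms per request, already below the WAF's per-request cost.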

Edge cases

mTLS does not protect against a compromised-but-valid certificate holder. If an attacker steals a valid client certificate together with its private key, mTLS trusts them. Defence: short certificate lifetimes (1 hour via SPIFFE) plus OCSP stapling for revocation. When certs live only an hour, a stolen cert stops working within that hour, so the attacker’s window to use it shrinks dramatically.
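The benefit of short lifetimes can be quantified as a worst-case exposure window: a thief can use a stolen certificate and key until it expires or is revoked, whichever comes first. This sketch assumes every verifier enforces revocation promptly, which is the optimistic case; `exposure_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import Optional


def exposure_window(issued_at: datetime, lifetime: timedelta, stolen_at: datetime,
                    revoked_at: Optional[datetime] = None) -> timedelta:
    """Worst-case time a stolen certificate + private key stays usable.

    Assumes every verifier honours revocation once `revoked_at` is reached.
    """
    end = issued_at + lifetime  # the cert expires here regardless
    if revoked_at is not None:
        end = min(end, revoked_at)  # revocation can cut the window short
    return max(end - stolen_at, timedelta(0))  # never negative
```

Stolen at issuance, a 1-hour SPIFFE certificate is usable for at most an hour; a 1-year certificate for up to a year unless revocation catches it.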

Trace it


Trace a request from a normal user and a botnet IP through all four gates.


Normal user: TCP SYN arrives at Patty's edge. First check?


Cache miss. Rate limiter check for normal user (2 req/s per-IP limit)?


WAF check for normal user?


Botnet IP: 10,000 IPs each sending 1 req/s. Per-IP rate limiter?


Global concurrency limit at origin. In-flight requests exceed threshold. What happens?


mTLS for an internal API call (payment service → user service). What extra step?

Quiz

A botnet sends 1 request/second from 10,000 different IPs. Your per-IP rate limit is 5 req/s. How do you defend without blocking legitimate users?

Quiz

Why do mTLS certificates need to be short-lived (e.g., 1 hour) in a zero-trust service mesh?

Each gate adds under 5 ms (mTLS 20–40 ms on first connection) but removes a whole attack class. A cache hit short-circuits the remaining gates and origin entirely. No gate stops every attack alone: a distributed botnet slips through per-IP rate limiting, so a global concurrency limit at origin is still needed.

Recall before you leave

01
What is connection draining, and why is it needed during a rolling deploy?
02
Why does adding jitter to rate-limit token-refill timing reduce origin load spikes?
03
What does SPIFFE add over manually managing mTLS certificates?

Recap

Before a request reaches origin, it passes through four gates: CDN cache lookup (a hit is answered at the edge, so origin never sees the request), a per-IP rate limiter using a token bucket (stops single-IP floods), WAF signature matching (blocks known attack patterns), and optional mTLS client certificate verification (prevents unauthenticated service-to-service calls in a zero-trust mesh). Each gate adds under 5 ms except mTLS (20–40 ms on first connection). Distributed botnets defeat per-IP rate limits by spreading load; the defence is a global adaptive concurrency limiter at origin that rejects excess in-flight requests with a fast 503 regardless of per-IP quota. SPIFFE automates mTLS certificate issuance and rotation, making short-lived hourly certs operationally practical for hundreds of microservices. Now when you design a new API surface or review a security incident, ask which of the four gates was missing or misconfigured: that is usually where the exploit entered.


Connected lessons

builds on

The twelve layers: one URL, seven actors (junior)

unlocks

Resilience: cascading retries, circuit breakers, and error budgets (senior)

deepens into

Resilience: cascading retries, circuit breakers, and error budgets (senior)

