
Networking & Protocols

SYN cookies, TFO, and TIME-WAIT at scale

Crux: How SYN cookies spend CPU to save memory under flood, what TCP Fast Open saves and why deployment stalled, and how to fix TIME-WAIT exhaustion on a busy load balancer.

A busy load balancer starts returning EADDRNOTAVAIL on outbound connections to a backend. ss -tan shows 30,000 TIME-WAIT sockets. Simultaneously, the public listener is being probed with a SYN flood. Both problems have solutions in Linux sysctls — but the wrong sysctl makes things worse.

SYN cookies internals

Under a SYN flood, the server’s half-open queue overflows. Instead of allocating a request_sock for every SYN, the kernel takes the cookie path and derives the SYN-ACK sequence number in net/ipv4/syncookies.c:cookie_v4_init_sequence:

ISN_in_SYN-ACK = MAC over (saddr, sport, daddr, dport, isn, secret)

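In the classic layout, the top 5 bits are a slowly advancing time counter (it bounds the replay window), the next 3 bits are an index into a table of 8 MSS values, and the low 24 bits are a keyed hash over the fields above. Linux's own layout differs in detail (a wider counter and a smaller MSS table), but the idea is the same. The server keeps no state for the connection. When the ACK comes back, cookie_v4_check recomputes the hash and validates it. If valid, the kernel allocates a full socket. If invalid, the ACK is rejected and no socket is created.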
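The encode-then-forget cycle can be sketched in a few lines. This is a minimal illustration that assumes the classic 5-bit counter / 3-bit MSS index / 24-bit hash layout, a hypothetical 8-entry MSS table, and HMAC-SHA256 as a stand-in keyed hash; it is not the kernel's code, and Linux's real layout and hash function differ.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical 8-entry MSS table (3-bit index); the values are illustrative only.
MSS_TABLE = [216, 536, 768, 996, 1300, 1440, 1452, 1460]
COUNTER_PERIOD_S = 64  # assumed tick length of the time counter


def _mac24(secret: bytes, saddr: str, sport: int, daddr: str, dport: int,
           client_isn: int, counter: int, mss_idx: int) -> int:
    """24-bit keyed hash over the 4-tuple, client ISN, counter and MSS index."""
    msg = f"{saddr}|{sport}|{daddr}|{dport}|{client_isn}|{counter}|{mss_idx}".encode()
    return int.from_bytes(hmac.new(secret, msg, hashlib.sha256).digest()[:3], "big")


def mss_index(client_mss: int) -> int:
    """Index of the largest table entry not exceeding the client's MSS."""
    idx = 0
    for i, value in enumerate(MSS_TABLE):
        if value <= client_mss:
            idx = i
    return idx


def make_cookie(secret: bytes, saddr: str, sport: int, daddr: str, dport: int,
                client_isn: int, client_mss: int, now_s: float) -> int:
    """Build the 32-bit value the server would send as its SYN-ACK sequence number."""
    counter = int(now_s // COUNTER_PERIOD_S)
    idx = mss_index(client_mss)
    mac = _mac24(secret, saddr, sport, daddr, dport, client_isn, counter, idx)
    return ((counter & 0x1F) << 27) | (idx << 24) | mac


def check_cookie(secret: bytes, saddr: str, sport: int, daddr: str, dport: int,
                 client_isn: int, cookie: int, now_s: float,
                 max_age_periods: int = 2) -> Optional[int]:
    """Return the recovered MSS if the cookie is fresh and authentic, else None."""
    top = (cookie >> 27) & 0x1F
    idx = (cookie >> 24) & 0x7
    now_counter = int(now_s // COUNTER_PERIOD_S)
    for age in range(max_age_periods):
        counter = now_counter - age
        if counter < 0:
            break
        if (counter & 0x1F) != top:
            continue
        if _mac24(secret, saddr, sport, daddr, dport, client_isn, counter, idx) == (cookie & 0xFFFFFF):
            return MSS_TABLE[idx]
    return None
```

On the returning ACK, a server using this scheme would take the cookie from the acknowledgment number minus one and the client ISN from the sequence number minus one. Note that the only connection parameter recovered is the MSS, rounded down to a table entry.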

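The trade-off is CPU for memory: every SYN and every returning ACK costs a hash computation, which a large server absorbs easily but a small one may not. The second cost: only MSS survives in the cookie itself. Window scaling and SACK offered in the original SYN are lost for cookie-validated connections unless the client also sent the Timestamps option, in which case Linux packs them into the low bits of the timestamp it echoes. On a long-RTT high-bandwidth path during a flood, throughput for legitimate clients without timestamps degrades, because the window is capped at 64 KiB without scaling.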
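The size of that throughput hit follows from the window-over-RTT bound. The sketch below is plain arithmetic under the assumption that the receive window is the only limit (no loss, no congestion control effects); the function names are illustrative.

```python
def max_window_bytes(wscale: int = 0) -> int:
    """Largest advertisable receive window for a given window-scale shift."""
    return 65535 << wscale


def window_limited_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput when at most one window is in flight per RTT."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    rtt = 0.280  # seconds, the London to Sydney figure used in this lesson
    for shift in (0, 7):
        bps = window_limited_throughput_bps(max_window_bytes(shift), rtt)
        print(f"wscale={shift}: {bps / 1e6:.2f} Mbit/s")
```

Without scaling, a 280 ms path tops out below 2 Mbit/s per connection regardless of link capacity; with a scale shift of 7 the same bound is over a hundred times higher.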

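Linux falls back to SYN cookies when a listening socket's SYN queue overflows. Depending on kernel version, the queue depth is bounded by the listen() backlog, net.core.somaxconn, and net.ipv4.tcp_max_syn_backlog (whose default scales with memory). Enable with net.ipv4.tcp_syncookies=1 (default since kernel 2.4). Value 2 forces unconditional cookie generation for testing.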
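To see whether the kernel has actually fallen back to cookies, the TcpExt counters in /proc/net/netstat are a reasonable place to look. The parser below is a small sketch that assumes the usual two-line "names, then values" layout of that file; the helper name is hypothetical.

```python
from typing import Dict


def parse_netstat(text: str) -> Dict[str, int]:
    """Parse /proc/net/netstat-style text into {'TcpExtSyncookiesSent': n, ...}."""
    lines = [line for line in text.splitlines() if line.strip()]
    counters: Dict[str, int] = {}
    # The file alternates a header line of names with a line of values.
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, numbers = values.split(":", 1)
        if prefix != value_prefix:
            continue  # unexpected layout, skip this pair
        for name, number in zip(names.split(), numbers.split()):
            counters[prefix + name] = int(number)
    return counters


if __name__ == "__main__":
    with open("/proc/net/netstat") as fh:  # Linux only
        stats = parse_netstat(fh.read())
    for key in ("TcpExtSyncookiesSent", "TcpExtSyncookiesRecv", "TcpExtSyncookiesFailed"):
        print(key, stats.get(key))
```

A SyncookiesSent counter that keeps climbing under normal load suggests the backlog is too small rather than that an attack is under way.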

SYN cookie trade-offs
  Cookie protects against: SYN flood / half-open queue overflow
  What survives in cookie: MSS index only (3 bits in the classic layout)
  What is lost: Window scale and SACK, unless the client sent Timestamps
  Window without scaling: Max 64 KiB per connection
  Linux default: tcp_syncookies=1 (conditional)
  Force on for testing: tcp_syncookies=2
Debug this

tcpdump trace of a SYN flood with SYN cookies

$ sudo tcpdump -i eth0 -nn 'tcp port 443'
14:23:45.123456 IP 203.0.113.50.34521 > 198.51.100.10.443: Flags [S], seq 1000000, win 29200, length 0
14:23:45.123489 IP 198.51.100.10.443 > 203.0.113.50.34521: Flags [S.], seq 3851629874, ack 1000001, win 28960, length 0
14:23:45.124000 IP 203.0.113.50.34522 > 198.51.100.10.443: Flags [S], seq 2000000, win 29200, length 0
14:23:45.124050 IP 198.51.100.10.443 > 203.0.113.50.34522: Flags [S.], seq 4025814937, ack 2000001, win 28960, length 0
14:23:45.125000 IP 203.0.113.50.34521 > 198.51.100.10.443: Flags [.], seq 1000001, ack 3851629875, win 29200, length 0
14:23:45.125010 IP 203.0.113.50.34522 > 198.51.100.10.443: Flags [.], seq 2000001, ack 4025814938, win 29200, length 0
14:23:45.200000 IP 203.0.113.100.55000 > 198.51.100.10.443: Flags [S], seq 5000000, win 29200, length 0
14:23:45.200050 IP 198.51.100.10.443 > 203.0.113.100.55000: Flags [S.], seq 2941837465, ack 5000001, win 28960, length 0
(no ACK from 203.0.113.100; SYN flood from different spoofed sources)

Two ACKs arrive (legitimate clients complete handshake), but the third source never ACKs. Is the server at risk?
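One way to reason about the question is to pair each SYN-ACK with the ACK that completes it and list what is left over. The sketch below assumes tcpdump's default one-line text format exactly as shown in the trace above; the function name is hypothetical.

```python
import re
from typing import Dict, List, Tuple

LINE_RE = re.compile(
    r"^\S+ IP (?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]+)\]"
    r"(?:, seq (?P<seq>\d+))?(?:, ack (?P<ack>\d+))?"
)


def unanswered_synacks(trace: str) -> List[str]:
    """Return client endpoints that received a SYN-ACK but never ACKed it."""
    # Key: (client endpoint, server endpoint); value: the ACK number we expect back.
    pending: Dict[Tuple[str, str], int] = {}
    for line in trace.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # prompts, comments, and other non-packet lines
        src, dst, flags = match["src"], match["dst"], match["flags"]
        if flags == "S." and match["seq"] is not None:
            pending[(dst, src)] = int(match["seq"]) + 1
        elif flags == "." and match["ack"] is not None:
            if pending.get((src, dst)) == int(match["ack"]):
                del pending[(src, dst)]
    return [client for client, _server in pending]
```

Run on the trace above, this leaves only 203.0.113.100.55000. With cookies active, that leftover costs the server one hash and one SYN-ACK packet but no queue slot, which is the point of the mechanism.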

TCP Fast Open (TFO, RFC 7413)

The standard handshake costs 1 RTT before application data. TFO eliminates that for subsequent connections:

  1. First connection: the client sends an empty Fast Open Cookie option in its SYN to request a cookie; the server returns one in the SYN-ACK, and the client caches it per server IP address.
  2. Subsequent connections: the client puts the cookie in the SYN along with application data (as much as fits in one MSS); the server validates the cookie and can pass the data to the application before the handshake completes, saving 1 RTT.
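The server side of those two steps comes down to minting and re-checking a value bound to the client's IP address. The sketch below assumes an 8-byte cookie and uses HMAC-SHA256 as a stand-in for the block cipher suggested in RFC 7413; the function names are hypothetical and this is not how any particular kernel implements it.

```python
import hashlib
import hmac
import ipaddress


def tfo_cookie(server_key: bytes, client_ip: str) -> bytes:
    """Mint an 8-byte cookie bound to the client's source IP (stateless on the server)."""
    packed = ipaddress.ip_address(client_ip).packed
    return hmac.new(server_key, packed, hashlib.sha256).digest()[:8]


def accept_syn_data(server_key: bytes, client_ip: str, presented: bytes) -> bool:
    """True if data in the SYN may be passed to the application right away."""
    return hmac.compare_digest(tfo_cookie(server_key, client_ip), presented)
```

Because the cookie is a function of the source address the server sees, a client whose traffic later appears from a different address presents a cookie that no longer validates, and the connection falls back to a regular handshake.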

On 280 ms RTT (London → Sydney), TFO saves 280 ms per warm connection.

Deployment reality: middlebox problems killed broad server-side TFO adoption. Some middleboxes (firewalls, DPI appliances, some NAT gateways) strip the TFO option from the SYN or drop SYNs that carry data, so the cookie never reaches the server. Because the cookie is bound to the client's source IP as the server sees it, it also stops validating when a NAT gateway changes the client’s external IP. Fallback to a regular handshake is mandatory, and when the SYN itself is dropped the client pays a retransmission timeout on top. Server deployment remains limited; most 0-RTT interest has shifted to QUIC, which runs over UDP and bypasses protocol ossification in network infrastructure. Linux gained client support in kernel 3.6 (2012) and server support in 3.7, both controlled by the net.ipv4.tcp_fastopen bitmask (1=client, 2=server, 3=both).
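As a small aid for reading that sysctl, the bitmask can be decoded as below. This sketch looks only at the two low bits named above and ignores any higher flag bits a kernel may define; the helper name is hypothetical.

```python
from typing import Dict

TFO_CLIENT = 0x1  # send data in the opening SYN
TFO_SERVER = 0x2  # accept data in an incoming SYN


def decode_tcp_fastopen(value: int) -> Dict[str, bool]:
    """Interpret the client and server bits of net.ipv4.tcp_fastopen."""
    return {
        "client": bool(value & TFO_CLIENT),
        "server": bool(value & TFO_SERVER),
    }


if __name__ == "__main__":
    with open("/proc/sys/net/ipv4/tcp_fastopen") as fh:  # Linux only
        print(decode_tcp_fastopen(int(fh.read().strip())))
```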

Trace it

A busy load balancer returns EADDRNOTAVAIL on outbound connections to one upstream origin. Trace the diagnosis and fix.

  Step 1: what does EADDRNOTAVAIL on connect() mean?
  Step 2: where are the ports going?
  Step 3: why not lower MSL to exit TIME-WAIT faster?
  Step 4: enable net.ipv4.tcp_tw_reuse=1. Why is that safe?
  Step 5: the structural fix?
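For the diagnosis steps, two quick calculations help: which peer is holding the TIME-WAIT sockets, and how many new connections per second one source address can sustain toward a single destination. The sketch below assumes the column layout of `ss -tan` output (state first, peer address fifth), the common default port range 32768-60999, and Linux's fixed 60-second TIME-WAIT; the function names are hypothetical.

```python
from collections import Counter


def time_wait_by_peer(ss_output: str) -> Counter:
    """Count TIME-WAIT sockets per peer address:port from `ss -tan` text."""
    counts: Counter = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Columns: State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
        if len(fields) >= 5 and fields[0] == "TIME-WAIT":
            counts[fields[4]] += 1
    return counts


def max_sustained_connect_rate(port_low: int = 32768, port_high: int = 60999,
                               time_wait_s: float = 60.0) -> float:
    """New connections per second to one (dst IP, dst port) before ports run out."""
    return (port_high - port_low + 1) / time_wait_s


if __name__ == "__main__":
    print(f"{max_sustained_connect_rate():.0f} conn/s per source IP and destination")
```

With the defaults that is roughly 470 short-lived connections per second to one upstream, which a busy load balancer can exceed easily. Widening the port range raises the ceiling only linearly, which is why connection reuse is the structural answer.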
Quiz

A senior engineer claims SYN cookies have no performance cost. Where is the bug in this claim?

Quiz

What does tcp_tw_recycle do, and why was it removed from the Linux kernel in 4.12?

Why this works

Why TFO interest shifted to QUIC. TFO requires both the client library and the server application to opt in, middleboxes to not strip the option, and NAT gateways to not alter addresses mid-flight. QUIC sidesteps all of this by running over UDP — middleboxes that don’t understand it just forward it, and the protocol can evolve without kernel changes. QUIC’s 0-RTT resumption achieves the same latency saving as TFO but with far better deployment success. This is a pattern repeated in networking: when the existing protocol layer is too ossified to evolve, move the innovation to a layer above.

Recall before you leave
  1. Explain the SYN cookie mechanism: what is encoded in the cookie, what is lost, and when does the kernel activate cookies?
  2. What is TIME-WAIT exhaustion, what causes it, and what are the three production fixes?
  3. Why has TCP Fast Open seen limited server-side deployment despite being available since Linux 3.6?
Recap

SYN cookies defend busy servers against SYN floods by computing a keyed hash over the connection 4-tuple and encoding it in the SYN-ACK sequence number, so no memory is allocated until a valid ACK arrives. The cost: the cookie itself carries only the MSS, so Window Scaling and SACK are lost for clients that did not send Timestamps, hurting throughput for legitimate long-RTT clients during a flood. Enable with tcp_syncookies=1 (default) and ensure the backlog is large enough that cookies are not needed under steady-state load. TIME-WAIT exhaustion, seen as EADDRNOTAVAIL on connect(), is fixed by tcp_tw_reuse=1 (safe timestamp-based reuse per RFC 6191), widening ip_local_port_range, and using connection pools. Never use tcp_tw_recycle (removed in Linux 4.12, broke NAT). TCP Fast Open saves 1 RTT for warm connections but requires non-stripping middleboxes; QUIC has superseded most TFO interest.
