Networking & Protocols
SYN cookies, TFO, and TIME-WAIT at scale
A busy load balancer starts returning EADDRNOTAVAIL on outbound connections to a backend. ss -tan shows 30,000 TIME-WAIT sockets. Simultaneously, the public listener is being probed with a SYN flood. Both problems have solutions in Linux sysctls — but the wrong sysctl makes things worse.
SYN cookies internals
Under a SYN flood, the server’s half-open queue overflows. Instead of allocating a request_sock for every SYN, the kernel takes the cookie path in net/ipv4/syncookies.c (cookie_v4_init_sequence):
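ISN_in_SYN-ACK = MAC over (saddr, sport, daddr, dport, isn, secret)
In the classic SYN cookie layout, the top 5 bits are a slowly advancing time counter (the replay window), the next 3 bits encode the MSS index (8 options), and the low 24 bits are a keyed hash over the 4-tuple, the counter, and a server secret. Linux differs in detail: it keeps the counter in the top 8 bits, folds the MSS index into the low 24 bits together with the hash, and its IPv4 MSS table has 4 entries. The connection is forgotten immediately. When the ACK comes back, cookie_v4_check recomputes and validates. If valid, the kernel allocates a full socket. If invalid, the ACK is silently dropped.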
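The encode-and-validate round trip can be sketched in a few lines. This is a toy model of the classic 5/3/24-bit layout (time counter, MSS index, keyed hash), not the kernel's code: the MSS table values, the 64-second period, the use of truncated HMAC-SHA256, and all function names are assumptions made for illustration. Linux uses a different bit layout and a different hash.

```python
import hashlib
import hmac

# Hypothetical 8-entry MSS table; a 3-bit index selects one entry.
MSS_TABLE = [216, 536, 1024, 1220, 1360, 1440, 1452, 1460]
PERIOD_S = 64  # assumed counter period for this sketch


def keyed_hash24(secret, four_tuple, counter, idx):
    """24-bit keyed hash over the 4-tuple, time counter, and MSS index."""
    msg = repr((four_tuple, counter, idx)).encode()
    return int.from_bytes(hmac.new(secret, msg, hashlib.sha256).digest()[:3], "big")


def mss_index(client_mss):
    """Index of the largest table entry not exceeding the client's MSS."""
    idx = 0
    for i, mss in enumerate(MSS_TABLE):
        if mss <= client_mss:
            idx = i
    return idx


def make_cookie(secret, four_tuple, client_isn, client_mss, now_s):
    """Build the 32-bit value the server would send as its SYN-ACK sequence number."""
    counter = int(now_s) // PERIOD_S
    idx = mss_index(client_mss)
    tag = keyed_hash24(secret, four_tuple, counter, idx)
    body = ((counter % 32) << 27) | (idx << 24) | tag
    # Adding the client's ISN ties the cookie to this particular SYN.
    return (body + client_isn) & 0xFFFFFFFF


def check_cookie(secret, four_tuple, client_isn, cookie, now_s, max_age=2):
    """Return the encoded MSS if the cookie is valid and recent, else None.

    On a real ACK, cookie = ack - 1 and client_isn = seq - 1.
    """
    body = (cookie - client_isn) & 0xFFFFFFFF
    t5, idx, tag = body >> 27, (body >> 24) & 0x7, body & 0xFFFFFF
    newest = int(now_s) // PERIOD_S
    # Accept the current counter value and a few older ones (the replay window).
    for counter in range(newest, max(newest - max_age, 0) - 1, -1):
        if counter % 32 == t5 and keyed_hash24(secret, four_tuple, counter, idx) == tag:
            return MSS_TABLE[idx]
    return None


flow = ("203.0.113.50", 34521, "198.51.100.10", 443)
secret = b"demo-secret-not-for-production"
cookie = make_cookie(secret, flow, client_isn=1000000, client_mss=1460, now_s=1000)
print(check_cookie(secret, flow, 1000000, cookie, now_s=1030))
```

Note that the server stores nothing between make_cookie and check_cookie; everything needed to validate the ACK is recomputed from the packet and the secret.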
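The trade-off is CPU for memory: every SYN and every returning ACK costs a hash computation, but a spoofed SYN no longer pins queue memory. That is an easy trade on a large server and a tighter one on a small, CPU-constrained machine. The less obvious cost: only the MSS survives in the cookie. Window scaling, SACK, and Timestamps offered in the original SYN are silently dropped for cookie-validated connections. (Linux softens this when the client’s SYN carried the Timestamps option, by encoding window scale and SACK in the timestamp value it returns; clients without timestamps get the MSS only.) On a long-RTT, high-bandwidth path during a flood, legitimate clients’ throughput degrades, because without scaling the window is capped at 64 KiB.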
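To put a number on the 64 KiB cap: a single TCP flow can have at most one window of data in flight per round trip, so the ceiling is window / RTT. The sketch below is plain arithmetic; the function name and the sample RTTs are illustrative assumptions.

```python
def max_throughput_bps(window_bytes, rtt_s):
    """Upper bound on one flow's throughput: one full window per round trip."""
    return window_bytes * 8 / rtt_s


UNSCALED_WINDOW = 65_535  # largest value of the 16-bit TCP window field

for rtt_ms in (10, 80, 280):
    mbps = max_throughput_bps(UNSCALED_WINDOW, rtt_ms / 1000) / 1e6
    print(f"RTT {rtt_ms:>3} ms: at most {mbps:.2f} Mbit/s")
```

At a 280 ms RTT the ceiling is under 2 Mbit/s regardless of link capacity, which is why losing window scaling hurts long-haul clients most.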
With net.ipv4.tcp_syncookies=1 (default since kernel 2.4), Linux falls back to SYN cookies only when the half-open queue exceeds tcp_max_syn_backlog (the default is system-dependent, typically 4096). Value 2 forces unconditional cookie generation, which is useful for testing.
- Cookie protects against: SYN flood / half-open queue overflow
- What survives in cookie: MSS index only (3 bits for 8 options in the classic layout)
- What is silently dropped: Window scale, SACK, Timestamps
- Window without scaling: Max 64 KiB per connection
- Linux default: tcp_syncookies=1 (conditional)
- Force on for testing: tcp_syncookies=2
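One way to see whether the conditional mode has actually kicked in is to watch the SYN cookie counters that Linux exposes in the TcpExt rows of /proc/net/netstat (SyncookiesSent, SyncookiesRecv, SyncookiesFailed). The parser below is a minimal sketch; the function name is hypothetical and the sample numbers are made up for illustration.

```python
def syncookie_counters(netstat_text):
    """Extract the SYN cookie counters from /proc/net/netstat-style text."""
    wanted = ("SyncookiesSent", "SyncookiesRecv", "SyncookiesFailed")
    lines = netstat_text.splitlines()
    # Each protocol block is a header line followed by a value line,
    # both starting with the same prefix (here "TcpExt:").
    for header, values in zip(lines, lines[1:]):
        if header.startswith("TcpExt:") and values.startswith("TcpExt:"):
            names = header.split()[1:]
            numbers = [int(v) for v in values.split()[1:]]
            table = dict(zip(names, numbers))
            return {name: table[name] for name in wanted if name in table}
    return {}


# Made-up sample; on a Linux host, pass open("/proc/net/netstat").read() instead.
SAMPLE = """TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed EmbryonicRsts
TcpExt: 18234 412 97 3
IpExt: InNoRoutes InTruncatedPkts
IpExt: 0 0
"""
print(syncookie_counters(SAMPLE))
```

A SyncookiesSent value that stays at zero under normal load suggests the backlog is large enough that cookies are only a flood-time fallback.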
tcpdump trace of a SYN flood with SYN cookies
$ sudo tcpdump -i eth0 -nn 'tcp port 443'
14:23:45.123456 IP 203.0.113.50.34521 > 198.51.100.10.443: Flags [S], seq 1000000, win 29200, length 0
14:23:45.123489 IP 198.51.100.10.443 > 203.0.113.50.34521: Flags [S.], seq 3851629874, ack 1000001, win 28960, length 0
14:23:45.124000 IP 203.0.113.50.34522 > 198.51.100.10.443: Flags [S], seq 2000000, win 29200, length 0
14:23:45.124050 IP 198.51.100.10.443 > 203.0.113.50.34522: Flags [S.], seq 4025814937, ack 2000001, win 28960, length 0
14:23:45.125000 IP 203.0.113.50.34521 > 198.51.100.10.443: Flags [.], seq 1000001, ack 3851629875, win 29200, length 0
14:23:45.125010 IP 203.0.113.50.34522 > 198.51.100.10.443: Flags [.], seq 2000001, ack 4025814938, win 29200, length 0
14:23:45.200000 IP 203.0.113.100.55000 > 198.51.100.10.443: Flags [S], seq 5000000, win 29200, length 0
14:23:45.200050 IP 198.51.100.10.443 > 203.0.113.100.55000: Flags [S.], seq 2941837465, ack 5000001, win 28960, length 0
(no ACK from 203.0.113.100; SYN flood from different spoofed sources)
Two ACKs arrive (legitimate clients complete the handshake), but the third source never ACKs. Is the server at risk?
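The trace can be checked mechanically. The sketch below pairs each SYN-ACK with a later ACK whose acknowledgment number is the SYN-ACK sequence plus one, and reports which clients never answered. The regular expression and function name are assumptions written for this tcpdump output style, not a general-purpose parser. With cookies active, the unanswered SYN-ACK leaves no state behind on the server, so the cost of the spoofed SYN is the hash and the outbound packet rather than queue memory.

```python
import re

LINE_RE = re.compile(
    r"IP (?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]+)\], "
    r"(?:seq (?P<seq>\d+), )?(?:ack (?P<ack>\d+), )?"
)

TRACE = """\
14:23:45.123456 IP 203.0.113.50.34521 > 198.51.100.10.443: Flags [S], seq 1000000, win 29200, length 0
14:23:45.123489 IP 198.51.100.10.443 > 203.0.113.50.34521: Flags [S.], seq 3851629874, ack 1000001, win 28960, length 0
14:23:45.124000 IP 203.0.113.50.34522 > 198.51.100.10.443: Flags [S], seq 2000000, win 29200, length 0
14:23:45.124050 IP 198.51.100.10.443 > 203.0.113.50.34522: Flags [S.], seq 4025814937, ack 2000001, win 28960, length 0
14:23:45.125000 IP 203.0.113.50.34521 > 198.51.100.10.443: Flags [.], seq 1000001, ack 3851629875, win 29200, length 0
14:23:45.125010 IP 203.0.113.50.34522 > 198.51.100.10.443: Flags [.], seq 2000001, ack 4025814938, win 29200, length 0
14:23:45.200000 IP 203.0.113.100.55000 > 198.51.100.10.443: Flags [S], seq 5000000, win 29200, length 0
14:23:45.200050 IP 198.51.100.10.443 > 203.0.113.100.55000: Flags [S.], seq 2941837465, ack 5000001, win 28960, length 0
"""


def classify_handshakes(trace):
    """Return (clients that completed the handshake, clients that never ACKed)."""
    pending = {}    # (client, server) -> sequence number of the SYN-ACK sent
    completed = []
    for line in trace.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        src, dst, flags = m["src"], m["dst"], m["flags"]
        if flags == "S.":
            # Server -> client SYN-ACK: remember the cookie-bearing sequence number.
            pending[(dst, src)] = int(m["seq"])
        elif flags == "." and m["ack"] is not None:
            expected = pending.get((src, dst))
            if expected is not None and int(m["ack"]) == expected + 1:
                completed.append(src)
                del pending[(src, dst)]
    unanswered = sorted(client for client, _server in pending)
    return completed, unanswered


done, silent = classify_handshakes(TRACE)
print("completed:", done)
print("never ACKed:", silent)
```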
TCP Fast Open (TFO, RFC 7413)
The standard handshake costs 1 RTT before application data. TFO eliminates that for subsequent connections:
- First connection: server sends a Fast Open Cookie option in SYN-ACK; client caches per-(server-IP, server-port).
- Subsequent connections: client puts the cookie in SYN along with up to MSS bytes of application data; server validates and processes the data while completing the handshake — saving 1 RTT.
On 280 ms RTT (London → Sydney), TFO saves 280 ms per warm connection.
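The saving is simple round-trip accounting, sketched below under the assumptions that server processing time is zero, there is no TLS, and the request fits in the SYN. The function name is hypothetical.

```python
def time_to_first_response_ms(rtt_ms, tfo_warm=False):
    """Delay from connect() to the first response byte, in milliseconds.

    A regular handshake spends one RTT before the request can be sent;
    a warm TFO connection carries the request in the SYN itself.
    """
    handshake_rtts = 0 if tfo_warm else 1
    request_response_rtts = 1
    return (handshake_rtts + request_response_rtts) * rtt_ms


cold = time_to_first_response_ms(280)
warm = time_to_first_response_ms(280, tfo_warm=True)
print(f"cold: {cold} ms, warm TFO: {warm} ms, saved: {cold - warm} ms")
```

The absolute saving scales with RTT, so TFO matters most on exactly the long-haul paths where middlebox interference is hardest to rule out.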
Deployment reality: middlebox problems killed broad server-side TFO adoption. Middleboxes (firewalls, DPI appliances, some NAT gateways) strip the TFO option from the SYN, breaking cookie delivery. Some NAT gateways invalidate cookies when the client’s external IP changes. Fallback to a regular handshake is mandatory but adds latency. Server deployment is limited; most 0-RTT interest has shifted to QUIC, which runs over UDP and bypasses protocol ossification in network infrastructure. Linux has supported the client side since kernel 3.6 and the server side since 3.7 (both 2012), controlled by the net.ipv4.tcp_fastopen bitmask (1=client, 2=server, 3=both).
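The bitmask reads naturally as two flags. The helper below is a small sketch that decodes only the two low bits described here; the real sysctl defines further bits that this ignores, and the function and constant names are hypothetical.

```python
TFO_CLIENT = 0x1  # send data in the SYN on outgoing connections
TFO_SERVER = 0x2  # accept data in the SYN on listening sockets


def describe_tcp_fastopen(value):
    """Decode the client/server bits of a net.ipv4.tcp_fastopen value."""
    roles = []
    if value & TFO_CLIENT:
        roles.append("client")
    if value & TFO_SERVER:
        roles.append("server")
    return roles or ["disabled"]


# On a Linux host the current value could be read with:
#   int(open("/proc/sys/net/ipv4/tcp_fastopen").read())
for value in (0, 1, 2, 3):
    print(value, describe_tcp_fastopen(value))
```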
A busy load balancer returns EADDRNOTAVAIL on outbound connections to one upstream origin. Trace the diagnosis and fix.
A senior engineer claims SYN cookies have no performance cost. Where is the bug in this claim?
What does tcp_tw_recycle do, and why was it removed from the Linux kernel in 4.12?
Why this works
Why TFO interest shifted to QUIC. TFO requires both the client library and the server application to opt in, middleboxes to not strip the option, and NAT gateways to not alter addresses mid-flight. QUIC sidesteps all of this by running over UDP — middleboxes that don’t understand it just forward it, and the protocol can evolve without kernel changes. QUIC’s 0-RTT resumption achieves the same latency saving as TFO but with far better deployment success. This is a pattern repeated in networking: when the existing protocol layer is too ossified to evolve, move the innovation to a layer above.
- 01 Explain the SYN cookie mechanism: what is encoded in the cookie, what is lost, and when does the kernel activate cookies?
- 02 What is TIME-WAIT exhaustion, what causes it, and what are the three production fixes?
- 03 Why has TCP Fast Open seen limited server-side deployment despite being available since Linux 3.6?
SYN cookies defend busy servers against SYN floods by computing a cryptographic token from the connection 4-tuple and encoding it in the SYN-ACK sequence number, so no memory is allocated until a valid ACK arrives. The cost: only MSS survives in the cookie. SACK, Window Scaling, and Timestamps are dropped, hurting throughput for legitimate long-RTT clients during a flood. Enable with tcp_syncookies=1 (default) and ensure the backlog is large enough that cookies are not needed on steady-state load.
TIME-WAIT exhaustion, which surfaces as EADDRNOTAVAIL on connect(), is fixed by tcp_tw_reuse=1 (safe timestamp-based reuse per RFC 6191), widening ip_local_port_range, and using connection pools. Never use tcp_tw_recycle (removed in Linux 4.12, broke NAT).
TCP Fast Open saves 1 RTT for warm connections but requires non-stripping middleboxes; QUIC has superseded most TFO interest.
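The TIME-WAIT ceiling can be estimated with back-of-the-envelope arithmetic. The sketch below assumes the common Linux defaults of an ephemeral range of 32768 to 60999 and a 60-second TIME-WAIT, neither of which is stated above, and it applies to one source IP connecting to one destination IP and port. The function names are hypothetical.

```python
def ephemeral_ports(low=32768, high=60999):
    """Number of source ports available for outbound connections."""
    return high - low + 1


def max_sustained_connect_rate(ports, time_wait_s=60):
    """New connections per second before every port is stuck in TIME-WAIT.

    Assumes each closed connection holds its port for time_wait_s seconds
    and that no port reuse (tcp_tw_reuse) or pooling is in effect.
    """
    return ports / time_wait_s


default_ports = ephemeral_ports()
widened_ports = ephemeral_ports(1024, 65535)
print(f"default range: {default_ports} ports, "
      f"{max_sustained_connect_rate(default_ports):.0f} conn/s")
print(f"widened range: {widened_ports} ports, "
      f"{max_sustained_connect_rate(widened_ports):.0f} conn/s")
```

Under these assumptions the default range holds about 28,000 ports, the same order as the 30,000 TIME-WAIT sockets in the opening scenario. Widening the range roughly doubles the headroom, while connection pooling removes the churn that creates TIME-WAIT sockets in the first place.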