Crux: Read real tcpdump traces, ss output, and a connection-handling snippet, then predict the TCP behaviour and pick the highest-leverage fix.
◷ 14 min
Packet captures and socket dumps are where TCP problems are actually diagnosed. Read each trace, work out what the connection is doing, then choose the fix a senior engineer would reach for first.
Goal
Practise the loop you run in every TCP incident: read the trace or the ss line, reconstruct the sequence-number and state machinery, and name the one change that resolves it before touching a sysctl.
The fourth packet carries 14 bytes of data starting at seq 1000001. Why does it start there and not at 1000000, and what does ack 4200001 confirm?
Heads-up There is no reserved data byte. The offset exists because the SYN itself consumes exactly one sequence number, so application data begins at ISN+1.
Heads-up The ACK field is the next expected sequence number, not a byte count. It is server-ISN+1 because the server's SYN consumed one number; no data has flowed yet.
Heads-up The third packet is the bare ACK; the fourth has the PSH flag and a 14-byte payload — it is the first application write, not a retransmit.
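The sequence-number arithmetic behind these answers can be checked in a few lines. This is a minimal sketch, assuming the client ISN is 1000000 and the server ISN is 4200000 (inferred from the seq and ack values in the question); the helper names are illustrative only.

```python
SEQ_SPACE = 2**32  # TCP sequence numbers are 32-bit and wrap around


def first_data_seq(isn: int) -> int:
    """The SYN consumes one sequence number, so data starts at ISN+1."""
    return (isn + 1) % SEQ_SPACE


def ack_after_syn(peer_isn: int) -> int:
    """The ACK field is the next sequence number expected from the peer."""
    return (peer_isn + 1) % SEQ_SPACE


def next_seq(seq: int, payload_len: int) -> int:
    """Sequence number that follows a segment carrying payload_len bytes."""
    return (seq + payload_len) % SEQ_SPACE


client_isn = 1_000_000  # assumption: inferred from "seq 1000001" in the question
server_isn = 4_200_000  # assumption: inferred from "ack 4200001" in the question

data_seq = first_data_seq(client_isn)
server_ack = ack_after_syn(server_isn)
after_payload = next_seq(data_seq, 14)

print(data_seq)       # 1000001
print(server_ack)     # 4200001
print(after_payload)  # 1000015
```

The last value is what the server should acknowledge once it has received the 14-byte payload.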
SyncookiesSent is 41,827 but SyncookiesRecv is only 312. The server is not running out of memory. What is happening and why is the server safe?
Heads-up That is exactly what cookies prevent — no request_sock is allocated per SYN. The high Sent with low Recv is the signature of an absorbed flood, not a leak.
Heads-up Low Recv is expected: a spoofed-source flood never completes the handshake, so few cookies come back. The 312 are legitimate clients; validation is working.
Heads-up Disabling cookies would let the half-open (SYN) queue overflow and refuse legitimate clients. Cookies are the defence; the cost is CPU for computing the cookie hash, not memory.
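Why a cookie needs no per-SYN memory can be illustrated with a toy encoder and validator. This is a sketch of the general idea only, not the kernel's actual cookie format: real implementations also encode an MSS index and use their own hash and time counter. `SECRET`, `make_cookie`, and `validate_ack` are hypothetical names.

```python
import hashlib
import struct

# Assumption: in practice this would be a random per-boot secret.
SECRET = b"hypothetical-server-secret"


def make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, counter):
    """Derive the server ISN from the connection identity; nothing is stored."""
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}|{client_isn}|{counter}".encode()
    digest = hashlib.blake2b(msg, key=SECRET, digest_size=4).digest()
    return struct.unpack("!I", digest)[0]


def validate_ack(src_ip, src_port, dst_ip, dst_port, client_isn, ack, counter):
    """Recompute the cookie from the final ACK and compare; accept a slightly old counter."""
    for c in (counter, counter - 1):
        expected = (make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, c) + 1) % 2**32
        if ack == expected:
            return True
    return False


# A legitimate client echoes cookie+1 in its ACK; a spoofed source never replies.
cookie = make_cookie("198.51.100.7", 51000, "10.0.0.5", 443, 1_000_000, counter=7)
good_ack = (cookie + 1) % 2**32
print(validate_ack("198.51.100.7", 51000, "10.0.0.5", 443, 1_000_000, good_ack, counter=7))
```

Because the server can recompute the cookie from the returning ACK alone, a flood of SYNs that never complete costs hash computations but no queue entries, which matches a high SyncookiesSent with a low SyncookiesRecv.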
Trace 3 — ss during an incident
$ ss -tan state established | wc -l
12384
$ ss -tan state close-wait | wc -l
9821
$ ss -s
TCP: 23552 (estab 12384, closed 8920, orphaned 2, timewait 1247)
$ ps -p 1234 -o pid,stat,rss,cmd
 PID STAT     RSS CMD
1234 Ssl  8392000 /usr/bin/app-server   # RSS climbing every minute
9.8k sockets sit in CLOSE-WAIT and RSS keeps climbing. What is the bug, and where do you look in the code?
Heads-up 1247 TIME-WAIT is normal and self-draining; the leak is the 9.8k CLOSE-WAIT, which only the application calling close() can clear. MSL is irrelevant here.
Heads-up The kernel cannot reclaim a CLOSE-WAIT socket until the application calls close(). This is an application bug, not a kernel-performance issue.
Heads-up Port range governs outbound connect() and TIME-WAIT exhaustion, not inbound CLOSE-WAIT accumulation. The fix is closing accepted sockets, not more ports.
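The shape of the fix is a handler that closes the accepted socket on every exit path. The following is a minimal sketch, not the application's real code: `handle` is a hypothetical echo handler, and the demo uses a local `socket.socketpair()` so it runs without a network (a socketpair has no CLOSE-WAIT state, but the read-EOF-then-close logic is the same).

```python
import socket


def handle(conn: socket.socket) -> int:
    """Echo bytes until the peer closes, then close our side. Returns bytes echoed."""
    total = 0
    with conn:  # guarantees close() on every exit path, including exceptions
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                # b"" means the peer sent FIN. On TCP the socket is now in
                # CLOSE-WAIT and stays there until we call close().
                break
            conn.sendall(chunk)
            total += len(chunk)
    return total


# Demo: the client writes, half-closes, and the handler must clean up.
client, server_side = socket.socketpair()
client.sendall(b"hello")
client.shutdown(socket.SHUT_WR)  # equivalent of the peer sending FIN

handled = handle(server_side)
echoed = client.recv(4096)
closed_fd = server_side.fileno()  # -1 once the socket has been closed
client.close()

print(handled, echoed, closed_fd)
```

When auditing the real code, the places to look are the paths that leave the read loop without reaching `close()`: early returns, swallowed exceptions, and sockets parked in a collection after `recv()` returned empty.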
Trace 4 — ss -tin on a slow transfer
$ ss -tin state established dst 203.0.113.20
ESTAB 0 0 10.0.0.5:44120 203.0.113.20:443
    cubic wscale:7,7 rtt:148.2/9.1 mss:1448 cwnd:11 ssthresh:8 bytes_sent:9M bytes_acked:9M retrans:0/214 reordering:12 rate:780Kbps
148 ms RTT, cwnd stuck at 11 MSS with ssthresh 8, 214 cumulative retransmits, and CUBIC — yet throughput is only 780 Kbps on a fast link. What is the diagnosis and the most promising change?
Heads-up wscale:7 already allows a multi-megabyte window; the limiter is that CUBIC keeps cutting cwnd to ~11 on loss. The change is the congestion-control algorithm, not the scale factor.
Heads-up 1448 is the normal Ethernet MSS and is not the bottleneck. The retransmit-driven cwnd collapse under CUBIC is what caps throughput; jumbo frames will not survive the path anyway.
Heads-up ss reports it as ESTAB and actively sending. The throughput limit is CUBIC's loss response on a 148 ms path, addressed by moving to BBR.
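The numbers in the ss line are self-consistent: a sender can have at most cwnd segments in flight per round trip, so cwnd × MSS / RTT bounds the throughput. A small sketch of that arithmetic follows; the 100 Mbit/s link rate is a hypothetical figure chosen only to show how far cwnd 11 is from filling a fast path.

```python
import math


def window_limited_bps(cwnd_segments: int, mss_bytes: int, rtt_ms: float) -> float:
    """Upper bound on throughput when at most cwnd segments are sent per RTT."""
    return cwnd_segments * mss_bytes * 8 / (rtt_ms / 1000.0)


def segments_to_fill(link_bps: float, mss_bytes: int, rtt_ms: float) -> int:
    """cwnd (in segments) needed to cover the bandwidth-delay product."""
    bdp_bytes = link_bps * (rtt_ms / 1000.0) / 8
    return math.ceil(bdp_bytes / mss_bytes)


# Values taken from the ss -tin line: cwnd:11, mss:1448, rtt:148.2
ceiling_bps = window_limited_bps(11, 1448, 148.2)
print(round(ceiling_bps / 1000))  # roughly 860 kbit/s, close to the reported 780 Kbps

# Assumption: a 100 Mbit/s path, used purely for illustration.
needed = segments_to_fill(100e6, 1448, 148.2)
print(needed)  # 1280 segments, versus the 11 the sender is allowed
```

The reported rate sitting just under the cwnd-derived ceiling is the evidence that the congestion window, not the receive window or the MSS, is the limiter.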
Recap
Every TCP incident is read in traces. The sequence-number math falls out of one rule: the SYN (and FIN) each consume one number, so the first data byte is ISN+1. A high SyncookiesSent with low Recv is an absorbed flood, not a leak. CLOSE-WAIT pileup with climbing RSS is always a missing application close(). And cwnd pinned low with retransmits under CUBIC on a long-RTT lossy path is the signature that says: move this socket to BBR. Diagnose from the wire and the socket table, then make the one change the evidence points to.
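Moving a single socket to BBR, rather than changing the system default, can be attempted with the Linux `TCP_CONGESTION` socket option. This is a hedged sketch: `try_set_congestion_control` is a hypothetical helper, the option exists only on Linux, and the call fails if the `bbr` module is not loaded or not permitted for the process, so the function reports success instead of assuming it.

```python
import socket


def try_set_congestion_control(sock: socket.socket, name: str = "bbr") -> bool:
    """Try to select a congestion-control algorithm for one socket.

    Returns True only if the kernel confirms the algorithm is now in use.
    """
    opt = getattr(socket, "TCP_CONGESTION", None)  # constant is Linux-only
    if opt is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, name.encode())
        # Read the setting back; the kernel returns a NUL-padded name.
        current = sock.getsockopt(socket.IPPROTO_TCP, opt, 16)
    except OSError:
        return False  # algorithm unavailable or not allowed
    return current.rstrip(b"\x00").decode() == name


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
switched = try_set_congestion_control(sock, "bbr")
sock.close()
print(switched)
```

After switching, the same `ss -tin` line is the way to confirm the result: the algorithm name should read `bbr`, and cwnd and the delivery rate should be re-checked against the evidence that motivated the change.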