
Networking & Protocols

TCP handshake: trace and code reading

Crux: Read real tcpdump traces, ss output, and a connection-handling snippet, then predict the TCP behaviour and pick the highest-leverage fix.

Packet captures and socket dumps are where TCP problems are actually diagnosed. Read each trace, work out what the connection is doing, then choose the fix a senior engineer would reach for first.

Goal

Practise the loop you run in every TCP incident: read the trace or the ss line, reconstruct the sequence numbers and connection state, and name the one change that resolves the problem before touching a sysctl.

Trace 1 — the handshake on the wire

14:02:11.001 IP 10.0.0.5.51514 > 10.0.0.9.6379: Flags [S],  seq 1000000, win 64240, length 0
14:02:11.001 IP 10.0.0.9.6379  > 10.0.0.5.51514: Flags [S.], seq 4200000, ack 1000001, win 65160, length 0
14:02:11.002 IP 10.0.0.5.51514 > 10.0.0.9.6379: Flags [.],  seq 1000001, ack 4200001, win 64240, length 0
14:02:11.002 IP 10.0.0.5.51514 > 10.0.0.9.6379: Flags [P.], seq 1000001, ack 4200001, win 64240, length 14
Quiz

The fourth packet carries 14 bytes of data starting at seq 1000001. Why does it start there and not at 1000000, and what does ack 4200001 confirm?
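To check the arithmetic, here is a small Python sketch that replays the four segments of Trace 1 and tracks the sequence number each side should use next. The `Segment` type and the `check` helper are hypothetical, written for this walkthrough. The only TCP rule encoded is that payload bytes, SYN and FIN consume sequence space while a bare ACK consumes none.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    src: str            # "client" or "server"
    flags: str          # tcpdump notation: S = SYN, F = FIN, P = PSH, . = ACK
    seq: int            # absolute sequence number, as printed in Trace 1
    ack: Optional[int]  # None when the ACK flag is not set
    length: int         # payload bytes

def seq_space(seg: Segment) -> int:
    """Sequence numbers a segment consumes: payload bytes, plus one each for SYN and FIN."""
    return seg.length + ("S" in seg.flags) + ("F" in seg.flags)

def check(trace):
    """Replay segments in order; return each side's next seq and any mismatches."""
    next_seq = {}   # side -> sequence number its next segment should carry
    problems = []
    for i, seg in enumerate(trace):
        peer = "server" if seg.src == "client" else "client"
        if seg.src in next_seq and seg.seq != next_seq[seg.src]:
            problems.append(f"#{i}: seq {seg.seq}, expected {next_seq[seg.src]}")
        # A cumulative ACK names the next sequence number expected from the peer.
        if seg.ack is not None and peer in next_seq and seg.ack != next_seq[peer]:
            problems.append(f"#{i}: ack {seg.ack}, expected {next_seq[peer]}")
        next_seq[seg.src] = seg.seq + seq_space(seg)
    return next_seq, problems

trace1 = [
    Segment("client", "S",  1_000_000, None,      0),
    Segment("server", "S.", 4_200_000, 1_000_001, 0),
    Segment("client", ".",  1_000_001, 4_200_001, 0),
    Segment("client", "P.", 1_000_001, 4_200_001, 14),
]

next_seq, problems = check(trace1)
print(next_seq, problems)  # {'client': 1000015, 'server': 4200001} []
```

Changing the data segment's seq to 1_000_000 makes `check` report a mismatch at segment #3, which is exactly the off-by-one the question is probing: the SYN already used 1000000.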

Trace 2 — SYN flood under cookies

$ sudo tcpdump -i eth0 'tcp[tcpflags] & tcp-syn != 0' -tnn
IP 203.0.113.7.40001 > 198.51.100.10.443: Flags [S],  seq 100, win 29200, length 0
IP 198.51.100.10.443 > 203.0.113.7.40001: Flags [S.], seq 3851629874, ack 101, win 28960, length 0
IP 203.0.113.9.55000 > 198.51.100.10.443: Flags [S],  seq 200, win 29200, length 0
IP 198.51.100.10.443 > 203.0.113.9.55000: Flags [S.], seq 4025814937, ack 201, win 28960, length 0
(thousands more SYNs from random sources; almost none send the final ACK)
$ nstat -az | grep -i syncookies
TcpExtSyncookiesSent            41827
TcpExtSyncookiesRecv              312
Quiz

SyncookiesSent is 41,827 but SyncookiesRecv is only 312. The server is not running out of memory. What is happening and why is the server safe?
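The server survives because a SYN cookie moves the half-open connection's state into the ISN it sends back, so each SYN-ACK costs no memory; only a returning ACK that carries a valid cookie becomes a real connection. Below is a toy Python version of that idea. It is loosely modelled on the classic SYN cookie design, not on the Linux kernel's exact encoding: the HMAC-SHA256 keyed hash, the 5-bit minute counter, the 2-bit MSS table and the helper names are all assumptions for illustration.

```python
import hashlib
import hmac
import os
import time

# Toy SYN cookie (NOT the kernel's bit layout or hash). All constants are assumptions.
SECRET = os.urandom(16)               # per-boot secret known only to the server
MSS_TABLE = [536, 1300, 1440, 1460]   # 2-bit index stands in for the client's MSS
MAC_BITS = 25
MAC_MASK = (1 << MAC_BITS) - 1

def _slot(now):
    """Coarse 5-bit time counter (one tick per minute) so old cookies expire."""
    return int(now // 60) & 0x1F

def _mac(four_tuple, client_isn, slot):
    msg = f"{four_tuple}|{client_isn}|{slot}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") & MAC_MASK

def make_cookie(four_tuple, client_isn, client_mss, now=None):
    """Server ISN for the SYN-ACK. Nothing is stored: all state lives in these 32 bits."""
    now = time.time() if now is None else now
    idx = max((i for i, m in enumerate(MSS_TABLE) if m <= client_mss), default=0)
    slot = _slot(now)
    return (slot << 27) | (idx << MAC_BITS) | _mac(four_tuple, client_isn, slot)

def check_cookie(four_tuple, ack_seq, ack_ack, now=None, max_age=2):
    """Validate the client's final ACK. Returns the MSS to use, or None to drop it."""
    now = time.time() if now is None else now
    cookie = (ack_ack - 1) & 0xFFFFFFFF      # the ACK acknowledges cookie + 1
    client_isn = (ack_seq - 1) & 0xFFFFFFFF  # the client's SYN consumed one number
    slot = cookie >> 27
    if (_slot(now) - slot) % 32 > max_age:
        return None                          # too old
    if _mac(four_tuple, client_isn, slot) != cookie & MAC_MASK:
        return None                          # forged, or meant for another connection
    return MSS_TABLE[(cookie >> MAC_BITS) & 0x3]
```

In Trace 2 terms, every SYN-ACK with a seq such as 3851629874 is a `make_cookie` result that the server immediately forgets. The flood's spoofed sources never send the final ACK, so nothing is ever validated or allocated for them; only the returning ACKs that pass the check become connections, which is the gap between SyncookiesSent and SyncookiesRecv.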

Trace 3 — ss during an incident

$ ss -tan state established | wc -l
12384
$ ss -tan state close-wait | wc -l
9821
$ ss -s
TCP:   23552 (estab 12384, closed 8920, orphaned 2, timewait 1247)
$ ps -p 1234 -o pid,stat,rss,cmd
  PID STAT   RSS CMD
 1234 Ssl 8392000 /usr/bin/app-server   # RSS climbing every minute
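The same state counts can be read without ss: the kernel exposes the socket table as text in /proc/net/tcp and /proc/net/tcp6, with each socket's state as a hex code in the `st` column. A minimal Python sketch follows; the helper names are hypothetical, and on a host with tens of thousands of sockets ss (which queries the kernel over netlink) remains the better tool, so treat this as a fallback for minimal containers. Note also that an `ss ... | wc -l` pipeline counts ss's header line too, overstating by one unless `-H` (`--no-header`) suppresses it.

```python
from collections import Counter

# Hex codes in the `st` column of /proc/net/tcp{,6}: the kernel's TCP state enum.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN-SENT",   "03": "SYN-RECV",
    "04": "FIN-WAIT-1",  "05": "FIN-WAIT-2", "06": "TIME-WAIT",
    "07": "CLOSE",       "08": "CLOSE-WAIT", "09": "LAST-ACK",
    "0A": "LISTEN",      "0B": "CLOSING",
}

def count_states(proc_text: str) -> Counter:
    """Tally sockets by state from the text of /proc/net/tcp (header row skipped)."""
    counts = Counter()
    for line in proc_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) > 3:                  # fields: sl, local, remote, st, ...
            st = fields[3].upper()
            counts[TCP_STATES.get(st, st)] += 1
    return counts

def socket_states(paths=("/proc/net/tcp", "/proc/net/tcp6")) -> Counter:
    """Combined IPv4 + IPv6 counts for the current network namespace."""
    total = Counter()
    for path in paths:
        try:
            with open(path) as f:
                total.update(count_states(f.read()))
        except OSError:                      # file absent (non-Linux, IPv6 disabled)
            pass
    return total
```

`socket_states()["CLOSE-WAIT"]` sampled once a minute alongside RSS is enough to show whether the two climb together.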
Quiz

9.8k sockets sit in CLOSE-WAIT and RSS keeps climbing. What is the bug, and where do you look in the code?
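CLOSE-WAIT means the peer has sent its FIN and the kernel is waiting for the local application to call close(). `ss -tanp state close-wait` names the owning process; in that process's code, look at every path that leaves a connection handler: EOF, timeouts, parse errors, exceptions, and any hand-off of the socket to a map, pool or callback. A minimal sketch of the bug and the fix as a Python echo handler (both function names are hypothetical):

```python
import socket

def handle_leaky(conn: socket.socket) -> None:
    """Echo handler with the CLOSE-WAIT bug: the EOF path returns without close()."""
    while True:
        data = conn.recv(4096)
        if not data:   # peer sent FIN; on a TCP socket our side is now in CLOSE-WAIT
            return     # BUG: the fd, its buffers and the CLOSE-WAIT entry stay alive
        conn.sendall(data)

def handle_fixed(conn: socket.socket) -> None:
    """Same handler; `with` closes the socket on every exit path, including exceptions."""
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                return  # close() runs on exit and sends our FIN (CLOSE-WAIT -> LAST-ACK)
            conn.sendall(data)
```

In CPython an unreferenced socket object is eventually closed by the garbage collector, so in practice this leak shows up when something still holds the socket, such as a connection registry or a pending callback. To try the handlers without a network, `socket.socketpair()` gives a connected pair: call `shutdown(socket.SHUT_WR)` on one end, run a handler on the other, and `fileno()` returns -1 afterwards only for `handle_fixed`.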

Trace 4 — ss -tin on a slow transfer

$ ss -tin state established dst 203.0.113.20
ESTAB 0 0 10.0.0.5:44120 203.0.113.20:443
   cubic wscale:7,7 rtt:148.2/9.1 mss:1448 cwnd:11 ssthresh:8
   bytes_sent:9M bytes_acked:9M retrans:0/214 reordering:12 rate:780Kbps
Quiz

148 ms RTT, cwnd stuck at 11 MSS with ssthresh 8, 214 cumulative retransmits, and CUBIC — yet throughput is only 780 Kbps on a fast link. What is the diagnosis and the most promising change?
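A useful first step is to check whether the connection is window-limited: a single TCP flow can deliver at most cwnd × MSS bytes per round trip. The Python sketch below plugs in the values from the ss line. The 100 Mbps figure used for the bandwidth-delay product is an assumption, since the trace only says "a fast link".

```python
def window_limited_bps(cwnd_segments: int, mss_bytes: int, rtt_s: float) -> float:
    """Ceiling for one connection: at most cwnd * MSS bytes delivered per round trip."""
    return cwnd_segments * mss_bytes * 8 / rtt_s

def cwnd_to_fill(link_bps: float, mss_bytes: int, rtt_s: float) -> float:
    """Segments that must be in flight to fill a link: bandwidth-delay product / MSS."""
    return link_bps * rtt_s / 8 / mss_bytes

rtt, mss = 0.1482, 1448                      # rtt:148.2 and mss:1448 from ss -tin
ceiling = window_limited_bps(11, mss, rtt)   # cwnd:11
needed = cwnd_to_fill(100e6, mss, rtt)       # ASSUMPTION: a 100 Mbps path
print(f"ceiling {ceiling / 1e3:.0f} Kbps, cwnd needed for 100 Mbps ~ {needed:.0f} segments")
# ceiling 860 Kbps, cwnd needed for 100 Mbps ~ 1279 segments
```

The observed 780 Kbps sits just under the roughly 860 Kbps ceiling, so the link is not the bottleneck; the window is. Throughput will not improve until cwnd grows by two orders of magnitude, and on this path CUBIC keeps cutting it back in response to loss.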

Recap

Every TCP incident is read in traces. The sequence-number math falls out of one rule: SYN and FIN each consume one sequence number, so the first data byte is ISN+1. A high SyncookiesSent with a low SyncookiesRecv is an absorbed flood, not a leak. A CLOSE-WAIT pileup with climbing RSS means the application never reaches close() on sockets the peer has already closed. And cwnd pinned low with retransmits under CUBIC on a long-RTT, lossy path is the signature that says: move this socket to BBR. Diagnose from the wire and the socket table, then make the one change the evidence points to.
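Moving one socket to BBR does not require changing the system-wide `net.ipv4.tcp_congestion_control` sysctl: Linux provides the per-socket `TCP_CONGESTION` option, which Python exposes as `socket.TCP_CONGESTION` on Linux. A minimal sketch, assuming the `tcp_bbr` module is available; if it is not loaded, or unprivileged processes are not permitted to select it (`net.ipv4.tcp_allowed_congestion_control`), `setsockopt` raises `OSError` and the socket keeps its current algorithm. The helper name is hypothetical.

```python
import socket
from typing import Optional

def use_congestion_control(sock: socket.socket, name: str = "bbr") -> Optional[str]:
    """Hypothetical helper: request `name` for this one socket, report what is in effect.

    Returns None where the TCP_CONGESTION option is unavailable (e.g. non-Linux).
    """
    opt = getattr(socket, "TCP_CONGESTION", None)
    if opt is None:
        return None
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, name.encode())
    except OSError:
        pass  # algorithm not available or not permitted: keep the current one
    try:
        raw = sock.getsockopt(socket.IPPROTO_TCP, opt, 16)  # kernel names fit in 16 bytes
    except OSError:
        return None
    return raw.split(b"\0", 1)[0].decode()
```

Call it on the client socket before `connect()`, or on the accepted socket on the server side, and log the returned name: a result of `"cubic"` instead of `"bbr"` tells you the module or the allow-list needs attention before the change can take effect.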
