WebSocket frame format: opcodes, masking, fragmentation

Crux: the 2-byte header that starts every WebSocket frame. What FIN, opcode, MASK, and the three-tier length encoding mean, and why client frames must be masked.

After the WebSocket handshake the HTTP parser is gone. What flows on the wire is a compact binary format that carries every message — text, binary, keepalive pings, and graceful closes — in as few as 2 bytes of overhead. Understanding that format is what separates “it sometimes works” from “I know exactly what broke.”

The frame header anatomy

A WebSocket frame starts with 2 mandatory bytes, followed by optional length extension and masking key fields, and then the payload:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+-------------------------------+
|     Masking-key (if MASK set, 4 bytes)                        |
+---------------------------------------------------------------+
|                    Payload data                               |
+---------------------------------------------------------------+

Byte 1 breakdown:

  • FIN (bit 7, the most significant bit): 1 means this is the last (or only) fragment of a message.
  • RSV1-3 (bits 6-4): reserved for extensions (e.g., permessage-deflate sets RSV1=1).
  • Opcode (bits 3-0): what kind of data the frame carries:

    Opcode  Meaning
    0x0     Continuation frame
    0x1     Text data (UTF-8)
    0x2     Binary data
    0x8     Close
    0x9     Ping
    0xA     Pong

Byte 2 breakdown:

  • MASK (bit 7): 1 means the payload is XOR-masked (client→server always; server→client never).
  • Payload length (bits 6-0):
    • 0–125: the actual length.
    • 126: the next 2 bytes hold the real length as a big-endian uint16.
    • 127: the next 8 bytes hold the real length as a big-endian uint64 (the most significant bit must be 0).
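
The two byte breakdowns above can be sketched as a small decoder. This is a minimal illustration, not a production parser: `FrameHeader` and `parse_header` are hypothetical names introduced here, and the function assumes the whole header is already in the buffer.

```python
import struct
from typing import NamedTuple


class FrameHeader(NamedTuple):
    fin: bool
    rsv: int          # RSV1..RSV3 packed into three bits (RSV1 is the high bit)
    opcode: int
    masked: bool
    payload_len: int
    mask_key: bytes   # empty when MASK=0
    header_len: int   # offset in the buffer where the payload starts


def parse_header(buf: bytes) -> FrameHeader:
    """Decode one frame header from the start of buf (illustrative helper)."""
    if len(buf) < 2:
        raise ValueError("a frame header needs at least 2 bytes")
    byte1, byte2 = buf[0], buf[1]
    length = byte2 & 0x7F
    pos = 2

    # 126 and 127 are escape values: the real length follows in network byte order.
    ext_fmt = {126: "!H", 127: "!Q"}.get(length)
    if ext_fmt is not None:
        size = struct.calcsize(ext_fmt)
        if len(buf) < pos + size:
            raise ValueError("truncated extended payload length")
        (length,) = struct.unpack_from(ext_fmt, buf, pos)
        pos += size

    masked = bool(byte2 & 0x80)
    mask_key = b""
    if masked:
        if len(buf) < pos + 4:
            raise ValueError("truncated masking key")
        mask_key = buf[pos:pos + 4]
        pos += 4

    return FrameHeader(
        fin=bool(byte1 & 0x80),
        rsv=(byte1 >> 4) & 0x07,
        opcode=byte1 & 0x0F,
        masked=masked,
        payload_len=length,
        mask_key=mask_key,
        header_len=pos,
    )
```

For the unmasked text frame `81 02 4F 4B`, `parse_header` reports FIN set, opcode 0x1, a 2-byte payload, and a payload offset of 2.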

Frame overhead totals:

  • Small server→client frame: 2 bytes header only.
  • Small client→server frame: 2 bytes header + 4 bytes masking key = 6 bytes.
WebSocket frame overhead at a glance

    Minimum frame header (no mask, payload ≤125 bytes)    2 bytes
    Client→server overhead (mask required)                6 bytes
    Max payload in 7-bit length field                     125 bytes
    Length extension for 126–65535 byte payloads          +2 bytes (uint16)
    Length extension for larger payloads                  +8 bytes (uint64)
    Control frames (ping/pong/close) max payload          125 bytes
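
The overhead figures follow directly from the three-tier length encoding, which a short sketch makes concrete. `encode_length` and `header_size` are hypothetical helper names used only for illustration.

```python
import struct


def encode_length(payload_len: int, masked: bool) -> bytes:
    """Return byte 2 of the header plus any extended-length bytes."""
    if payload_len < 0:
        raise ValueError("payload length cannot be negative")
    mask_bit = 0x80 if masked else 0x00
    if payload_len <= 125:
        return bytes([mask_bit | payload_len])
    if payload_len <= 0xFFFF:
        return bytes([mask_bit | 126]) + struct.pack("!H", payload_len)
    if payload_len < 1 << 63:
        return bytes([mask_bit | 127]) + struct.pack("!Q", payload_len)
    raise ValueError("payload too large for a single frame")


def header_size(payload_len: int, masked: bool) -> int:
    """Total header bytes: byte 1, the length field, and the optional key."""
    return 1 + len(encode_length(payload_len, masked)) + (4 if masked else 0)


for n in (125, 126, 65535, 65536):
    print(n, header_size(n, masked=False), header_size(n, masked=True))
```

The loop prints the header size at each tier boundary, so the jump from 2 to 4 to 10 bytes (plus 4 when masked) is visible.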

Why client frames must be masked

Masking is not encryption — it is a cache-poisoning defense. Here is the attack it prevents:

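Malicious JavaScript served from site-a.com opens a WebSocket to a server the attacker controls, and the connection passes through an intermediary proxy that does not understand WebSocket. The script then sends payload bytes that spell out a well-formed HTTP request for a popular resource, and the attacker's server answers with bytes that spell out an HTTP response. A naive proxy parses the exchange as ordinary HTTP, caches the forged response, and serves it to other clients: its cache is poisoned.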

With masking, the client XORs every payload byte with a 4-byte random key sent in the frame header:

masked_byte[i] = payload[i] XOR mask_key[i % 4]

The receiver XORs with the same key to recover the original payload. The key travels in plain text, but that does not matter: the browser, not the script, picks it, and it is fresh and unpredictable for every frame. The script therefore cannot choose a payload whose masked form on the wire spells out HTTP, so the proxy never sees attacker-chosen bytes and the attack becomes infeasible.
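
The XOR rule can be sketched in a few lines. `apply_mask` is a hypothetical helper name. Because XOR is its own inverse, the same function masks and unmasks.

```python
import secrets


def apply_mask(data: bytes, key: bytes) -> bytes:
    """XOR every byte with key[i % 4]; calling it twice restores the input."""
    if len(key) != 4:
        raise ValueError("the masking key must be exactly 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(data))


# A payload that would look like HTTP if it reached the wire unmasked.
payload = b"GET /popular.js HTTP/1.1\r\n"
key = secrets.token_bytes(4)          # fresh, unpredictable key per frame
on_the_wire = apply_mask(payload, key)

print(on_the_wire)                    # differs on every run
print(apply_mask(on_the_wire, key))   # the receiver recovers the payload
```

The script controls `payload` but never learns `key` before the frame is sent, which is why it cannot steer what `on_the_wire` looks like.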

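Server frames are not masked because masking them would add nothing. The defense depends on a trusted browser choosing the key on behalf of an untrusted script; a server is controlled by its operator, who can already write arbitrary bytes to the socket, so a masking rule could not constrain a malicious server. The RFC therefore requires servers to send unmasked frames, and a client must close the connection if it receives a masked one.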

Fragmentation and continuation frames

A large message can be split across multiple frames. Rules:

  1. First fragment: real opcode (0x1 or 0x2), FIN=0.
  2. Middle fragments: opcode 0x0 (continuation), FIN=0.
  3. Last fragment: opcode 0x0, FIN=1.

The receiver reassembles in order. Control frames (ping, pong, close) cannot be fragmented and are limited to 125 bytes; they can arrive interleaved between data fragments.
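
The fragmentation rules above can be sketched as a splitter and a matching reassembler. This is a simplified model under stated assumptions: frames are represented as `(fin, opcode, payload)` tuples rather than wire bytes, and `fragment` and `reassemble` are hypothetical names. Interleaved control frames are simply skipped here; a real endpoint would act on them.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Fragment = Tuple[bool, int, bytes]  # (fin, opcode, payload)
CONTINUATION = 0x0


def fragment(opcode: int, payload: bytes, chunk_size: int) -> Iterator[Fragment]:
    """Split one message: real opcode first, 0x0 afterwards, FIN on the last."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)] or [b""]
    last = len(chunks) - 1
    for i, chunk in enumerate(chunks):
        yield (i == last, opcode if i == 0 else CONTINUATION, chunk)


def reassemble(frames: Iterable[Fragment]) -> Tuple[int, bytes]:
    """Rebuild the first complete message from a sequence of frames."""
    message_opcode: Optional[int] = None
    parts: List[bytes] = []
    for fin, opcode, chunk in frames:
        if opcode >= 0x8:
            # Control frames may arrive between fragments but never fragmented.
            if not fin:
                raise ValueError("control frames cannot be fragmented")
            continue
        if message_opcode is None:
            if opcode == CONTINUATION:
                raise ValueError("continuation frame without a first fragment")
            message_opcode = opcode
        elif opcode != CONTINUATION:
            raise ValueError("new data frame before the message finished")
        parts.append(chunk)
        if fin:
            return message_opcode, b"".join(parts)
    raise ValueError("stream ended in the middle of a message")
```

Splitting `b"hello world"` with `chunk_size=4` yields three frames, and `reassemble` returns the original text even when a ping is inserted between them.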

Control frames: ping, pong, close

Ping (0x9): a keepalive probe. The receiver must reply with a pong carrying the same payload. Proxies often have idle timeouts (60 seconds is common); sending a ping every 25–30 seconds resets the proxy’s timer and keeps the connection alive.

Pong (0xA): the mandatory reply to a ping. Can also be sent unsolicited as a unilateral heartbeat.
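
As a rough sketch of the ping/pong rule, the helpers below build unmasked control frames as a server would send them. `control_frame` and `pong_for` are hypothetical names; a client would also have to mask these frames.

```python
PING, PONG = 0x9, 0xA


def control_frame(opcode: int, payload: bytes = b"") -> bytes:
    """Build an unmasked control frame: FIN is always 1, payload at most 125 bytes."""
    if not 0x8 <= opcode <= 0xF:
        raise ValueError("not a control opcode")
    if len(payload) > 125:
        raise ValueError("control frame payloads are limited to 125 bytes")
    return bytes([0x80 | opcode, len(payload)]) + payload


def pong_for(ping_payload: bytes) -> bytes:
    """A pong echoes the payload of the ping it answers."""
    return control_frame(PONG, ping_payload)


print(control_frame(PING, b"hi").hex())
print(pong_for(b"hi").hex())
```

Because the length always fits in 7 bits, a control frame never needs the extended-length fields.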

Close (0x8): initiates the closing handshake. The body contains an optional 2-byte status code followed by UTF-8 reason text. Standard codes:

    Code  Meaning
    1000  Normal closure
    1001  Going away (server shutdown, tab closed)
    1006  Abnormal closure (no close frame; generated by the implementation)
    1008  Policy violation
    1011  Unexpected condition
    1013  Try again later

After sending a close frame, each side must wait for the peer’s close frame before closing the TCP connection.
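
The close-frame body is small enough to sketch in full. `encode_close` and `decode_close` are hypothetical helper names. One detail not in the table above: code 1005 is the value RFC 6455 reserves for reporting locally that a close frame carried no status code, and like 1006 it is never sent on the wire.

```python
import struct
from typing import Optional, Tuple

LOCAL_ONLY_CODES = {1005, 1006}  # reported by the implementation, never transmitted


def encode_close(code: Optional[int] = None, reason: str = "") -> bytes:
    """Build the body of a close frame: optional uint16 code, then UTF-8 reason."""
    if code is None:
        if reason:
            raise ValueError("a reason requires a status code")
        return b""
    if code in LOCAL_ONLY_CODES:
        raise ValueError(f"{code} must not be sent in a close frame")
    body = struct.pack("!H", code) + reason.encode("utf-8")
    if len(body) > 125:
        raise ValueError("close frame body is limited to 125 bytes")
    return body


def decode_close(body: bytes) -> Tuple[int, str]:
    """Parse a close frame body into (code, reason)."""
    if not body:
        return 1005, ""              # no status code was present
    if len(body) == 1:
        raise ValueError("a status code needs 2 bytes")
    (code,) = struct.unpack("!H", body[:2])
    return code, body[2:].decode("utf-8")
```

Since the code takes 2 of the 125 available bytes, the reason text has at most 123 bytes of UTF-8 to work with.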

Why this works

Why RSV bits matter for extensions. The permessage-deflate extension (RFC 7692) negotiated during the handshake uses RSV1=1 to signal that the payload is DEFLATE-compressed. A server that did not negotiate the extension and sees RSV1=1 must close the connection with code 1002 (protocol error). This strict checking ensures extensions cannot silently corrupt frames.
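
The strict check described above might look like the sketch below. `check_rsv` is a hypothetical name, and the function assumes permessage-deflate is the only extension that could have been negotiated. It ignores finer rules, such as which frames of a compressed message may carry RSV1.

```python
from typing import Optional

PROTOCOL_ERROR = 1002


def check_rsv(byte1: int, deflate_negotiated: bool) -> Optional[int]:
    """Return the close code to fail with, or None if the RSV bits are acceptable."""
    rsv1 = bool(byte1 & 0x40)
    rsv2 = bool(byte1 & 0x20)
    rsv3 = bool(byte1 & 0x10)
    if rsv2 or rsv3:
        # No negotiated extension in this sketch defines RSV2 or RSV3.
        return PROTOCOL_ERROR
    if rsv1 and not deflate_negotiated:
        return PROTOCOL_ERROR
    return None


print(check_rsv(0x81, deflate_negotiated=False))  # plain text frame
print(check_rsv(0xC1, deflate_negotiated=False))  # RSV1 set without the extension
print(check_rsv(0xC1, deflate_negotiated=True))   # RSV1 set, extension negotiated
```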

Parsing a small WebSocket frame

    # Server sends "OK" to client
    # Opcode 0x1 = text, FIN=1, payload = "OK" (2 bytes), no mask

    Frame bytes (hex): 81 02 4F 4B

    Byte 1: 0x81 = 10000001
      FIN=1          complete message, no fragments
      RSV=000        no extensions active
      Opcode=0001    text data (UTF-8)

    Byte 2: 0x02 = 00000010
      MASK=0             server never masks (correct)
      Payload length=2

    Bytes 3-4: 0x4F 0x4B
      "O" (0x4F), "K" (0x4B) = "OK"

    Wire: 2 bytes header + 2 bytes payload = 4 bytes total
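
Going the other way, the same four bytes can be produced by a small encoder, which also shows what the client-to-server version of the frame looks like. `build_frame` is a hypothetical helper. The fixed key in the usage lines is only there to make the output reproducible; a real client must pick a fresh random key for every frame.

```python
import struct
from typing import Optional


def build_frame(opcode: int, payload: bytes,
                mask_key: Optional[bytes] = None, fin: bool = True) -> bytes:
    """Encode one frame; pass a 4-byte mask_key to produce a client frame."""
    byte1 = (0x80 if fin else 0x00) | (opcode & 0x0F)
    mask_bit = 0x80 if mask_key is not None else 0x00
    n = len(payload)
    if n <= 125:
        header = bytes([byte1, mask_bit | n])
    elif n <= 0xFFFF:
        header = bytes([byte1, mask_bit | 126]) + struct.pack("!H", n)
    else:
        header = bytes([byte1, mask_bit | 127]) + struct.pack("!Q", n)
    if mask_key is None:
        return header + payload
    if len(mask_key) != 4:
        raise ValueError("the masking key must be exactly 4 bytes")
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return header + mask_key + masked


server_frame = build_frame(0x1, b"OK")
client_frame = build_frame(0x1, b"OK", mask_key=bytes.fromhex("37fa213d"))
print(server_frame.hex(" "))   # 81 02 4f 4b
print(client_frame.hex(" "))   # 81 82 37 fa 21 3d 78 b1
```

The client frame is 8 bytes: the 6 bytes of header and key from the overhead table, plus the 2 masked payload bytes.
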
Quiz

Why does the client's Sec-WebSocket-Key get transformed into Sec-WebSocket-Accept by adding a fixed GUID, hashing it, and base64-encoding it?

Quiz

Why must client-to-server WebSocket frames be masked, but server-to-client frames must NOT be?

Order the steps

Order the steps of a WebSocket close handshake:

  1. One side sends a close frame with status code 1000
  2. The other side receives it and replies with a close frame
  3. The sender of the second close frame closes the TCP connection
  4. Both sides are now in the closed state
Recall before you leave
  1. Explain why masking defends against cache-poisoning even though the mask key is sent in plain text inside the frame.
  2. A chat server is broadcasting a message to 10,000 connected clients. The broadcast completes in 100 ms. Network RTT is only 5 ms. Where does the other 95 ms come from?
  3. What is the FIN bit for in a WebSocket frame, and how does it interact with the opcode?
Recap

Every WebSocket message rides in one or more frames. The 2-byte header encodes the FIN bit (last fragment flag), opcode (text, binary, ping, pong, close), MASK flag, and payload length. Client-to-server frames must XOR their payload with a random 4-byte masking key to prevent cache-poisoning attacks in which malicious JavaScript crafts bytes resembling HTTP traffic; server-to-client frames are never masked because the defense relies on a trusted browser choosing the key, and such a rule cannot constrain a server that already controls every byte it sends. Large messages may be fragmented: the first frame carries the real opcode, later frames use opcode 0x0 (continuation), and FIN=0 on all but the last. Control frames (ping, pong, close) carry at most 125 bytes and cannot be fragmented. Close frames carry an optional 2-byte status code; 1000 is normal, and 1006 is reported locally when no close frame was received.


Trademarks belong to their respective owners. Editorial reference only.