
gRPC and Protobuf: the binary contract and what it costs you

gRPC trades JSON's human-readable payloads for a schema-bound binary wire format that is 4-10x faster and far smaller, but the field numbers become a contract you can never break, and the browser can't speak it without a proxy.



A new microservice ships. Within an hour, the orders service starts reading wrong values: priority shows up as discount_pct, and refunds fire on healthy orders. Nobody changed the data — someone renumbered a protobuf field from 4 to 3 to “tidy up the schema”. On the wire there are no field names, only numbers, so every consumer still running the old schema decoded field 3 into the wrong slot. The proto compiled clean. The tests passed. Production silently corrupted state for forty minutes.

The wire format: numbers, not names

In ten minutes you’ll understand why a “clean” schema refactor corrupts production data — and how to evolve a contract that dozens of services depend on without a coordinated redeploy.

JSON ships the schema with every message. {"userId": 42, "active": true} repeats the keys userId and active as UTF-8 text in every single payload, on every request. Protobuf does the opposite: it serializes only field number, wire type, and value. Assuming userId is field 1 and active is field 2, that same message becomes exactly four bytes, 08 2A 10 01: 08 encodes “field 1, varint”, 2A is 42, 10 encodes “field 2, varint”, and 01 is true. The field names exist only in the .proto file at compile time; they never travel.
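To make the byte layout concrete, here is a minimal hand-rolled encoder for the example message. It assumes a hypothetical proto3 schema with userId = 1 and active = 2; real code would use classes generated by protoc, and this sketch only handles non-negative varints.

```python
import json

VARINT = 0  # wire type 0, used by e.g. int32, int64, uint64, bool, enum


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on all but the last."""
    if n < 0:
        raise ValueError("this sketch only encodes non-negative values")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    # A tag is (field_number << 3) | wire_type, itself written as a varint.
    return encode_varint((field_number << 3) | wire_type)


def encode_user(user_id: int, active: bool) -> bytes:
    """Hypothetical schema: userId = 1, active = 2 (proto3 scalars)."""
    out = b""
    if user_id:  # proto3 omits scalar fields that hold their default value
        out += encode_tag(1, VARINT) + encode_varint(user_id)
    if active:
        out += encode_tag(2, VARINT) + encode_varint(1)
    return out


wire = encode_user(42, True)
json_bytes = json.dumps({"userId": 42, "active": True}).encode("utf-8")
print(wire.hex(" "), f"({len(wire)} bytes) vs {len(json_bytes)} bytes of JSON")
```

Running it prints 08 2a 10 01 (4 bytes) vs 30 bytes of JSON. Because proto3 skips defaults, encode_user(0, False) produces zero bytes.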

That single design choice produces the numbers seniors actually care about. For numeric-heavy data, protobuf runs about 4-5x smaller than JSON, thanks to varint encoding (integers below 128 cost a single byte) and to dropping the repeated key names. Serialization is roughly 3-6x faster and parsing 5-10x faster, because there is no text tokenizing, no key string allocation, no reflection: the decoder walks a known layout. In Go benchmarks that shows up as ~5-7x faster round-trips with ~3x fewer allocations. The catch is the inverse of JSON’s superpower: you cannot meaningfully decode a protobuf message without its schema. A raw decoder such as protoc --decode_raw recovers field numbers and wire-level values, but never the names, and a varint 1 could be true, the integer 1, or an enum value.
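The flip side can be sketched the same way: a decoder with no schema can only walk tags and values. This is a minimal illustration in the spirit of protoc --decode_raw, not its implementation, and it handles only the varint (0) and length-delimited (2) wire types.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_raw(buf: bytes) -> list[tuple[int, int, object]]:
    """Walk a message with no schema: only numbers, wire types and raw values."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint: int, bool, or enum? The bytes don't say.
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: string, bytes, or nested message?
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((field_number, wire_type, value))
    return fields


print(decode_raw(bytes.fromhex("082a1001")))
```

For the example bytes it prints [(1, 0, 42), (2, 0, 1)]: no "userId", no "active", and nothing that says the second 1 means true.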

Why this works

The compactness is data-shape dependent, and seniors get burned by assuming it’s universal. For string-heavy payloads protobuf is only ~4% smaller than JSON — strings are stored as raw UTF-8 in both, so the only saving is dropping the repeated key names. The big wins (4-5x) come from numbers, booleans, enums, and repeated fields. If your payload is mostly free text, the size argument for protobuf nearly vanishes; you adopt it for the schema and the speed, not the bytes.
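A rough way to see the shape effect is to compute both sizes for two made-up records. The sketch assumes a flat schema (fields numbered 1, 2, ... in dict order, ints and bools as varints, strings as length-delimited UTF-8) and compact JSON with no spaces; the records are illustrative, not a benchmark.

```python
import json


def varint_len(n: int) -> int:
    """Bytes needed for a non-negative base-128 varint."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def proto_size(record: dict) -> int:
    """Encoded size under the assumed schema: fields numbered in dict order."""
    size = 0
    for number, value in enumerate(record.values(), start=1):
        if isinstance(value, str):  # wire type 2: tag, length, raw UTF-8
            data = value.encode("utf-8")
            size += varint_len((number << 3) | 2) + varint_len(len(data)) + len(data)
        else:  # non-negative int or bool, wire type 0: tag, varint
            size += varint_len(number << 3) + varint_len(int(value))
    return size


numeric = {"id": 7, "qty": 3, "price_cents": 1999, "in_stock": True}
texty = {"title": "Release notes", "body": "x" * 500}

for name, record in [("numeric", numeric), ("string-heavy", texty)]:
    json_len = len(json.dumps(record, separators=(",", ":")).encode("utf-8"))
    proto_len = proto_size(record)
    print(f"{name}: JSON {json_len} B, protobuf {proto_len} B, {json_len / proto_len:.1f}x")
```

With these records the numeric one comes out at 51 vs 9 bytes (about 5.7x), while the string-heavy one is 535 vs 518 bytes (about 1.0x, roughly 3% smaller): only the key names were saved.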

Field numbers are the contract, forever

Because the wire carries field 3 and never "discount_pct", the field number — not the name — is the identity of the data. This flips the usual intuition completely:

Renaming a field is free. discount_pct → discountPercent changes nothing on the wire; old and new binaries interoperate perfectly. The name is a local source-code label.
Renumbering a field is catastrophic. Move priority from 4 to 3, as in the opening incident, and every peer still on the old schema decodes those values as whatever it calls field 3 (discount_pct), while its own priority silently falls back to the default. The compiler cannot catch it because each side compiles a self-consistent schema.
Reusing a deleted number is the same disaster, delayed. Delete field 3, then later add a new unrelated field and assign it 3 — now old clients sending the original field 3 poison the new one.
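The opening incident can be replayed in a few lines. The two schemas below are hypothetical number-to-name maps for Order (varint fields only, no id), and the decoder applies proto3 defaults to anything it does not see. Each side is internally consistent, which is exactly why nothing fails at compile time.

```python
# Hypothetical number -> name maps for Order (varint fields only).
OLD_SCHEMA = {2: "amount_cents", 3: "discount_pct", 4: "priority"}
NEW_SCHEMA = {2: "amount_cents", 3: "priority"}  # priority "tidied" from 4 to 3


def varint(n: int) -> bytes:
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode(msg: dict, schema: dict) -> bytes:
    number_of = {name: num for num, name in schema.items()}
    out = bytearray()
    for name, value in msg.items():
        out += varint(number_of[name] << 3) + varint(value)  # wire type 0
    return bytes(out)


def decode(buf: bytes, schema: dict) -> dict:
    msg = {name: 0 for name in schema.values()}  # proto3 default for every field
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        value, pos = read_varint(buf, pos)  # every field here is a varint
        name = schema.get(tag >> 3)
        if name is not None:  # unknown numbers are skipped
            msg[name] = value
    return msg


wire = encode({"amount_cents": 5000, "priority": 2}, NEW_SCHEMA)
print(decode(wire, OLD_SCHEMA))
```

The new reader gets priority=2 back, but the old reader prints {'amount_cents': 5000, 'discount_pct': 2, 'priority': 0}: a discount nobody granted and a priority nobody cleared.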

The discipline that prevents all of this is add-only evolution plus reserved. You never change or reuse a number; you only append new fields with new numbers, and when you remove a field you reserve its number (and name) so nobody can ever reclaim it:

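syntax = "proto3";

enum Priority {
  PRIORITY_UNSPECIFIED = 0;    // proto3 enums must start at 0
  PRIORITY_HIGH = 1;           // illustrative value
}

message Order {
  reserved 3;                  // discount_pct, removed 2026-Q1; never reuse
  reserved "discount_pct";
  string id = 1;
  int64 amount_cents = 2;
  Priority priority = 4;       // keeps its original number; never slide it into the gap at 3
}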

Done this way, the schema is both backward compatible (new code reads messages written by old code: missing new fields just take their proto3 defaults — 0, "", empty) and forward compatible (old code reads messages from new code: unknown field numbers are skipped and preserved, not errored). That two-way compatibility is the entire reason large polyglot systems can deploy producers and consumers independently.
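Both directions fall out of two decoder rules: start every known field at its default, and keep unknown numbers as raw bytes instead of failing. The sketch assumes varint-only fields and a hypothetical appended field gift_wrap = 5; real protobuf runtimes apply the same rules to every wire type.

```python
OLD = {2: "amount_cents", 4: "priority"}
NEW = {2: "amount_cents", 4: "priority", 5: "gift_wrap"}  # hypothetical new field


def varint(n: int) -> bytes:
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode(buf: bytes, schema: dict) -> tuple[dict, bytes]:
    """Return (known fields with proto3 defaults, raw bytes of unknown fields)."""
    known = {name: 0 for name in schema.values()}
    unknown = bytearray()
    pos = 0
    while pos < len(buf):
        start = pos
        tag, pos = read_varint(buf, pos)
        value, pos = read_varint(buf, pos)  # sketch: varint fields only
        name = schema.get(tag >> 3)
        if name is None:
            unknown += buf[start:pos]  # skip it, but keep the bytes verbatim
        else:
            known[name] = value
    return known, bytes(unknown)


def encode(known: dict, schema: dict, unknown: bytes = b"") -> bytes:
    number_of = {name: num for num, name in schema.items()}
    out = bytearray()
    for name, value in known.items():
        if value:  # proto3 omits fields at their default
            out += varint(number_of[name] << 3) + varint(value)
    return bytes(out) + unknown  # re-emit preserved unknown fields


# Forward: an old reader gets a message from a new writer, then relays it.
new_wire = encode({"amount_cents": 5000, "priority": 2, "gift_wrap": 1}, NEW)
old_view, extra = decode(new_wire, OLD)
relayed = encode(old_view, OLD, extra)

# Backward: a new reader gets a message from an old writer.
new_view, _ = decode(encode({"amount_cents": 5000, "priority": 2}, OLD), NEW)
print(old_view, decode(relayed, NEW)[0], new_view)
```

The old service never learns what field 5 means, yet gift_wrap survives the round trip through it; the new reader simply sees gift_wrap = 0 when an old writer sent the message.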

Schema change	Safe?	Why (wire-level reason)
Add a new field with a new number	Yes	Old readers skip the unknown number; new readers see proto3 defaults when the writer is old
Rename a field (same number)	Yes	Names never travel; only the number is on the wire
Remove a field and reserve its number	Yes	`reserved` blocks future reuse, so the number can’t be poisoned
Change a field’s number	No	Peers decode the value into the wrong field: silent corruption
Reuse a deleted number for a new field	No	Stragglers on the old schema write the old meaning into it
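The table reduces to a few mechanical checks, which is why teams enforce it in CI rather than in review. Below is a toy version over hypothetical name-to-number maps; real tools such as buf breaking compare full descriptors and cover far more cases (types, enums, packages). A pure rename with the same number passes, as it should.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    fields: dict[str, int]  # field name -> field number
    reserved: set[int] = field(default_factory=set)  # reserved numbers


def breaking_changes(old: Schema, new: Schema) -> list[str]:
    """Flag the 'No' rows of the table; renames with the same number pass."""
    problems = []
    new_numbers = set(new.fields.values())
    for name, num in new.fields.items():
        if name in old.fields and old.fields[name] != num:
            problems.append(f"{name} renumbered {old.fields[name]} -> {num}")
        if num in old.reserved:
            problems.append(f"{name} reuses reserved number {num}")
    for name, num in old.fields.items():
        if num not in new_numbers and num not in new.reserved:
            problems.append(f"number {num} (was {name}) dropped without being reserved")
    return problems


old = Schema({"id": 1, "amount_cents": 2, "discount_pct": 3, "priority": 4})
tidied = Schema({"id": 1, "amount_cents": 2, "priority": 3})  # the incident
safe = Schema({"id": 1, "amount_cents": 2, "priority": 4}, reserved={3})
print(breaking_changes(old, tidied))
print(breaking_changes(old, safe))
```

The "tidy up" schema trips two checks, while the reserved version passes cleanly.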

One connection, four call shapes

Ask yourself: how many concurrent gRPC calls does your service mesh handle before the connection count becomes its own problem? That’s exactly where HTTP/2 multiplexing earns its keep.

gRPC runs over HTTP/2, and that buys multiplexing: many concurrent RPCs interleave as independent streams over a single TCP connection, with no head-of-line blocking at the HTTP layer (TCP-level head-of-line blocking on packet loss still applies), so 50 in-flight calls need one connection, not 50. When a call is cancelled or exceeds its deadline, an RST_STREAM frame kills just that stream and leaves the others untouched. This is why gRPC throughput in a service mesh holds up under concurrency where HTTP/1.1’s one-request-at-a-time-per-connection model collapses.
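A toy model of what multiplexing means on the receiving side: frames from several calls interleave on one connection and are reassembled by stream ID, and a reset affects only its own stream. The (stream_id, kind, payload) tuples are a simplification; real HTTP/2 frames carry a 9-byte header with length, type, flags, and a 31-bit stream identifier.

```python
from collections import defaultdict


def demultiplex(frames):
    """Reassemble interleaved DATA frames per stream; RST_STREAM ends one stream."""
    streams = defaultdict(bytearray)
    reset = set()
    for stream_id, kind, payload in frames:
        if kind == "RST_STREAM":
            reset.add(stream_id)
            streams.pop(stream_id, None)  # only this call is torn down
        # Frames still in flight on an already-reset stream are ignored.
        elif kind == "DATA" and stream_id not in reset:
            streams[stream_id] += payload
    return {sid: bytes(buf) for sid, buf in streams.items()}


# Three concurrent calls on one connection; client-initiated streams use odd IDs.
frames = [
    (1, "DATA", b"quote-"), (3, "DATA", b"prices-"), (5, "DATA", b"trade-"),
    (3, "RST_STREAM", b""),  # call 3 cancelled or out of deadline
    (1, "DATA", b"A"), (5, "DATA", b"B"), (3, "DATA", b"late"),
]
print(demultiplex(frames))
```

Calls 1 and 5 complete as b'quote-A' and b'trade-B'; call 3 disappears without disturbing them.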

On top of that single connection, the schema’s service block defines four call shapes, and picking the wrong one is a common design smell:

Unary: one request, one response. The REST-shaped case, and the one most RPCs use.
Server streaming: one request, a stream of responses (a price feed, a log tail, server-pushed progress).
Client streaming: a stream of requests, one response (uploading chunks, batching telemetry).
Bidirectional streaming: both sides stream independently over the same call (chat, live collaboration, a long-lived control channel).

Together these four shapes mean you match the communication pattern to the data flow — not the other way around. When you see a unary call inside a tight loop fetching a live feed, that’s the signal to reach for server streaming instead.

service PriceService {
  rpc GetQuote(QuoteRequest) returns (Quote);                  // unary
  rpc WatchPrices(WatchRequest) returns (stream Quote);        // server streaming
  rpc UploadTrades(stream Trade) returns (UploadSummary);      // client streaming
  rpc Trade(stream Order) returns (stream Fill);               // bidirectional
}
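Stripped of networking, the four shapes differ only in whether each side is a single value or a stream. The plain-Python handlers below are hypothetical stand-ins for PriceService, with floats and ints in place of the Quote and Trade messages and no gRPC library involved. grpcio's Python servicers follow a similar pattern: a request iterator for client-side streams, a generator for server-side streams, plus a context argument.

```python
from typing import Iterable, Iterator


def get_quote(symbol: str) -> float:  # unary: one value in, one value out
    return 101.5  # hypothetical fixed price


def watch_prices(symbol: str) -> Iterator[float]:  # server streaming
    for tick in (101.5, 101.7, 101.6):  # hypothetical ticks
        yield tick


def upload_trades(trades: Iterable[int]) -> int:  # client streaming
    # Consume the whole incoming stream, then answer once with a summary.
    return sum(1 for _ in trades)


def trade(orders: Iterable[int]) -> Iterator[str]:  # bidirectional streaming
    # Respond while requests are still arriving, not after the stream ends.
    for quantity in orders:
        yield f"filled {quantity}"


print(get_quote("ACME"), list(watch_prices("ACME")))
print(upload_trades(iter([5, 10, 15])), list(trade(iter([1, 2]))))
```

The bidirectional sketch answers each order as it arrives; in a real bidi call the two directions are independent, so the server may also send fills unprompted.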

Two operational features ride along and are easy to skip until they bite. Deadlines: the client sets a time budget that propagates down the call chain; when it expires, the RPC is cancelled on both ends, so a slow downstream can’t pin server threads forever. The classic confusing failure is a call that succeeded on the server (“I sent all responses”) but failed on the client with DEADLINE_EXCEEDED (“they arrived too late”): the work happened, yet the caller saw an error. Cancellation: either side can abort mid-flight, and propagated cancellation stops downstream work instead of letting orphaned requests grind on. Because gRPC sets no deadline by default, skipping them is one of the most common ways a gRPC mesh cascades into resource exhaustion under partial failure.
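Propagation means each hop forwards the remaining budget rather than the original timeout. The header encoding follows gRPC's grpc-timeout format (a positive integer of at most 8 digits plus a unit, such as m for milliseconds or S for seconds); the call chain, the hop costs, and the integer-millisecond clock are hypothetical simplifications.

```python
class DeadlineExceeded(Exception):
    """Stand-in for gRPC's DEADLINE_EXCEEDED status."""


def remaining_ms(deadline_ms: int, now_ms: int) -> int:
    left = deadline_ms - now_ms
    if left <= 0:
        raise DeadlineExceeded(f"deadline passed {-left} ms ago")
    return left


def grpc_timeout(ms: int) -> str:
    """Encode a positive budget as a grpc-timeout value: at most 8 digits plus a unit."""
    if ms < 10**8:
        return f"{ms}m"  # m = milliseconds
    return f"{ms // 1000}S"  # S = seconds; rounding down never grants extra time


def call_chain(hop_costs_ms: list[int], budget_ms: int) -> list[str]:
    """Hypothetical chain: each hop forwards what is left, then spends its own time."""
    now, deadline = 0, budget_ms  # one absolute deadline, fixed by the first caller
    sent = []
    for cost in hop_costs_ms:
        sent.append(grpc_timeout(remaining_ms(deadline, now)))
        now += cost
    return sent


print(call_chain([120, 150, 50], budget_ms=300))
```

With hop costs of 120 ms and 150 ms against a 300 ms budget, the hops receive 300m, 180m, then 30m. If the second hop takes 200 ms instead, the third call is never made and the caller gets the deadline error.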

Where gRPC wins, and where it hurts

gRPC is the right default for internal, service-to-service traffic: a typed contract shared across polyglot services, the 4-10x speed and size wins on hot paths, native streaming, and deadline propagation across the call graph. In a busy mesh those savings are not academic — they cut tail latency and CPU on the serialization path that REST burns on JSON.

The pain is real and lands in two places. First, the browser cannot speak gRPC. Browser APIs expose neither HTTP/2 framing nor trailers, so you need grpc-web plus a proxy (Envoy, or the framework’s own) to translate between the browser and the gRPC backend. That is extra infrastructure, and grpc-web still can’t do client or bidirectional streaming. (Connect-RPC sidesteps some of this by serving unary calls as plain HTTP.)

Second, debuggability collapses. A binary payload is unreadable in the Chrome Network tab: you see “bytes transferred”, not fields. curl is useless without the schema and a decoder, and in grpc-web even the trailers (including grpc-status) travel inside the response body bytes, where DevTools has no hook. Logs, request replay, and ad-hoc inspection all need extra tooling. That opacity is the tax you pay for compactness: the same property that makes the wire small makes it inscrutable to humans.
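Even with the schema in hand, a captured body needs unwrapping before a protobuf decoder can touch it. Each gRPC message is length-prefixed: a 1-byte flag (0x01 means compressed), a 4-byte big-endian length, then the payload. In grpc-web, a flag with the high bit (0x80) set marks the trailers frame, whose payload is HTTP/1-style header text. This sketch assumes an uncompressed capture.

```python
import struct


def split_frames(body: bytes):
    """Yield ("message", payload) or ("trailers", dict) for each length-prefixed frame."""
    pos = 0
    while pos < len(body):
        flag, length = struct.unpack_from(">BI", body, pos)  # 1-byte flag, 4-byte length
        payload = body[pos + 5 : pos + 5 + length]
        pos += 5 + length
        if flag & 0x80:  # grpc-web trailers frame: "key: value\r\n" lines
            trailers = {}
            for line in payload.decode("ascii").split("\r\n"):
                if line:
                    key, _, value = line.partition(":")
                    trailers[key.strip().lower()] = value.strip()
            yield "trailers", trailers
        else:  # flag & 0x01 would mean compressed; not handled here
            yield "message", payload  # still raw protobuf; needs the schema


trailer = b"grpc-status: 0\r\n"
body = (
    struct.pack(">BI", 0x00, 4) + bytes.fromhex("082a1001")
    + struct.pack(">BI", 0x80, len(trailer)) + trailer
)
print(list(split_frames(body)))
```

What comes out is one opaque message payload plus the call status; turning that payload into named fields still requires the .proto.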

The whole trade in one frame: JSON's readability and direct browser reach versus Protobuf's 4-5x smaller, 5-10x faster schema-bound binary.

Pick the best fit

A public-facing REST endpoint hit directly from a React app is slow under load. A teammate proposes switching it to gRPC. Pick the call.

Quiz

A field discount_pct = 3 is removed from a proto. What's the senior move to keep the schema safe?

Quiz

Why can't you paste a captured gRPC request into a tool and read it like a JSON body?

Order the steps

Order the safe steps to evolve a proto message in production:

1 Never change or reuse an existing field number — the number is the wire identity
2 To remove a field, delete it and add reserved for its number and name
3 To add data, append a new field with a fresh, never-used number
4 Rely on proto3 defaults so old readers tolerate missing new fields (backward compat)
5 Rely on unknown-field skipping so old readers tolerate new fields (forward compat)

One .proto generates both stub and skeleton; the call travels as schema-bound binary frames over a multiplexed HTTP/2 connection — names never go on the wire.

Recall before you leave

01
Explain why renaming a protobuf field is safe but renumbering it is a production-grade disaster.
02
When would you choose gRPC over REST/JSON, and what concrete costs does that decision drag in?

Recap

gRPC’s defining choice is the protobuf wire format: it serializes field number, wire type, and value, never the field names — those live only in the .proto. That makes payloads 4-5x smaller and 4-10x faster to (de)serialize than JSON for numeric data, but it also means the field number, not the name, is the data’s identity. So renaming is free, renumbering or reusing a deleted number silently corrupts every peer on a different schema, and the only safe evolution is append-only with reserved markers for removals — which is what gives protobuf its two-way backward/forward compatibility. Over HTTP/2, gRPC multiplexes many RPCs on one connection and offers four call shapes (unary, server, client, bidirectional streaming) plus deadlines and cancellation that propagate down the call graph. The price is that browsers need grpc-web and a proxy, and the binary payload is opaque to humans and standard tooling. The senior pattern: gRPC for internal service-to-service hops, REST/JSON or Connect-RPC at the browser edge. Now when you see a proto PR that renumbers a field “to tidy up”, you know exactly what to do: reject it and ask for reserved instead.

