gRPC and Protobuf: the binary contract and what it costs you
A new microservice ships. Within an hour, the orders service starts reading wrong values: priority shows up as discount_pct, and refunds fire on healthy orders. Nobody changed the data — someone renumbered a protobuf field from 4 to 3 to “tidy up the schema”. On the wire there are no field names, only numbers, so every consumer still running the old schema decoded field 3 into the wrong slot. The proto compiled clean. The tests passed. Production silently corrupted state for forty minutes.
The wire format: numbers, not names
JSON ships the schema with every message. {"userId": 42, "active": true} repeats the keys userId and active in every single payload, as UTF-8 text, on every request. Protobuf does the opposite: it serializes only field number, wire type, and value. That same message becomes roughly 08 2A 10 01 — a handful of bytes where 08 encodes “field 1, varint”, 2A is 42, and so on. The field names exist only in the .proto file at compile time; they never travel.
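To make the byte layout concrete, here is a minimal hand-rolled sketch of the encoding described above. It assumes a schema where `user_id` is field 1 and `active` is field 2 (the field numbers are an illustration, not taken from a real service), and it covers only the varint wire type. The helper names are hypothetical; a real project would use generated protobuf code instead.

```python
import json

WIRE_VARINT = 0  # wire type used for ints, bools, and enums


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_varint_field(field_number, value):
    """Tag = (field_number << 3) | wire_type, followed by the value."""
    tag = (field_number << 3) | WIRE_VARINT
    return encode_varint(tag) + encode_varint(value)


# Assumed schema for illustration: user_id = 1, active = 2
wire = encode_varint_field(1, 42) + encode_varint_field(2, 1)
as_json = json.dumps({"userId": 42, "active": True}, separators=(",", ":")).encode()

print(wire.hex(" "))            # 08 2a 10 01
print(len(wire), len(as_json))  # 4 27
```

Note that neither `userId` nor `active` appears anywhere in `wire`; only the numbers 1 and 2 do, packed into the tag bytes.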
That single design choice produces the numbers seniors actually care about. For numeric-heavy data, protobuf runs about 4-5x smaller than JSON thanks to varint encoding (small integers cost one byte, not their decimal-string length). Serialization is roughly 3-6x faster and parsing 5-10x faster, because there is no text tokenizing, no key string allocation, no reflection — the decoder walks a known layout. In Go benchmarks that shows up as ~5-7x faster round-trips with ~3x fewer allocations. The catch is the inverse of JSON’s superpower: you cannot decode a protobuf message without its schema. The bytes are meaningless on their own.
Why this works
The compactness is data-shape dependent, and seniors get burned by assuming it’s universal. For string-heavy payloads protobuf is only ~4% smaller than JSON — strings are stored as raw UTF-8 in both, so the only saving is dropping the repeated key names. The big wins (4-5x) come from numbers, booleans, enums, and repeated fields. If your payload is mostly free text, the size argument for protobuf nearly vanishes; you adopt it for the schema and the speed, not the bytes.
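The data-shape dependence can be checked with a rough sketch like the one below. It hand-encodes two toy payloads (one numeric-heavy, one string-heavy) and compares them with compact JSON. The payloads, field numbers, and helper names are all made up for illustration, and the ratios it prints are properties of these toy inputs, not a benchmark.

```python
import json


def encode_varint(value):
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def varint_field(number, value):
    # wire type 0: varint
    return encode_varint((number << 3) | 0) + encode_varint(value)


def string_field(number, text):
    # wire type 2: length-delimited, payload is raw UTF-8
    data = text.encode("utf-8")
    return encode_varint((number << 3) | 2) + encode_varint(len(data)) + data


def json_size(obj):
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


# Numeric-heavy toy payload (hypothetical fields 1..4)
numeric = {"user_id": 42, "quantity": 3, "priority": 2, "active": True}
numeric_wire = (varint_field(1, 42) + varint_field(2, 3)
                + varint_field(3, 2) + varint_field(4, 1))

# String-heavy toy payload (hypothetical fields 1..2)
body = "The shipment was delayed at the port and will arrive next week. " * 4
texty = {"title": "Quarterly report", "body": body}
texty_wire = string_field(1, texty["title"]) + string_field(2, body)

numeric_ratio = json_size(numeric) / len(numeric_wire)
string_ratio = json_size(texty) / len(texty_wire)
print(f"numeric-heavy: JSON is {numeric_ratio:.1f}x the protobuf size")
print(f"string-heavy:  JSON is {string_ratio:.2f}x the protobuf size")
```

The string payload barely shrinks because the text bytes are identical in both formats; only the keys and punctuation disappear.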
Field numbers are the contract, forever
Because the wire carries field 3 and never "discount_pct", the field number — not the name — is the identity of the data. This flips the usual intuition completely:
- Renaming a field is free. discount_pct → discountPercent changes nothing on the wire; old and new binaries interoperate perfectly. The name is a local source-code label.
- Renumbering a field is catastrophic. Change 3 to 4 and every peer still on the old schema decodes your new field 4 into whatever it calls field 4, while the field-3 data it still sends lands wherever your schema now expects 3. This is the bug from the opening incident. The compiler cannot catch it because each side compiles a self-consistent schema.
- Reusing a deleted number is the same disaster, delayed. Delete field 3, then later add a new unrelated field and assign it 3; now old clients sending the original field 3 poison the new one.
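The renumbering failure above can be reproduced in a few lines. This is a sketch under stated assumptions: the two schemas are modeled as plain dictionaries from field number to name, only varint fields are handled, and the "new" schema is a guess at what the tidy-up in the opening incident looked like (priority moved from 4 to 3).

```python
def decode_varint(buf, pos):
    """Return (value, next_position) for the varint starting at pos."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_varint_fields(buf):
    """Schema-free decode: all we can recover is field number -> value."""
    fields, pos = {}, 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        fields[number], pos = decode_varint(buf, pos)
    return fields


def decode_with(schema, buf):
    """Attach names from a local schema; numbers not in it are dropped here."""
    return {schema[n]: v for n, v in decode_varint_fields(buf).items() if n in schema}


OLD_SCHEMA = {3: "discount_pct", 4: "priority"}  # what stragglers still run
NEW_SCHEMA = {3: "priority"}                     # assumed "tidied" schema

# New producer writes priority = 2 under field number 3: tag (3 << 3) | 0 = 0x18
payload = bytes([0x18, 0x02])

print(decode_with(NEW_SCHEMA, payload))  # {'priority': 2}
print(decode_with(OLD_SCHEMA, payload))  # {'discount_pct': 2}
```

Both decodes succeed without any error, which is exactly why the corruption is silent.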
The discipline that prevents all of this is add-only evolution plus reserved. You never change or reuse a number; you only append new fields with new numbers, and when you remove a field you reserve its number (and name) so nobody can ever reclaim it:
message Order {
  reserved 3;               // discount_pct, removed 2026-Q1, never reuse
  reserved "discount_pct";
  string id = 1;
  int64 amount_cents = 2;
  Priority priority = 4;    // safe: appended with a fresh number
}

Done this way, the schema is both backward compatible (new code reads messages written by old code; missing new fields just take their proto3 defaults of 0, "", or empty) and forward compatible (old code reads messages from new code; unknown field numbers are skipped and preserved, not errored). That two-way compatibility is the entire reason large polyglot systems can deploy producers and consumers independently.
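Both compatibility directions can be sketched with a toy reader. The assumptions here are loud ones: schemas are modeled as dictionaries of field number to `(name, default)`, only varint fields are supported, and the `V1`/`V2` schemas are hypothetical stand-ins for an old and a new version of `Order`. Real protobuf runtimes do the equivalent work inside generated code.

```python
def decode_varint(buf, pos):
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read(buf, schema):
    """Decode a varint-only message.

    schema maps field number -> (name, default).
    Returns (message, unknown_fields).
    """
    message = {name: default for name, default in schema.values()}
    unknown = []
    pos = 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = decode_varint(buf, pos)
        if number in schema:
            message[schema[number][0]] = value
        else:
            unknown.append((number, value))  # kept, not treated as an error
    return message, unknown


V1 = {2: ("amount_cents", 0)}                      # old code
V2 = {2: ("amount_cents", 0), 4: ("priority", 0)}  # new code, field 4 appended

old_message = bytes([0x10, 0x64])              # field 2 = 100
new_message = bytes([0x10, 0x64, 0x20, 0x02])  # field 2 = 100, field 4 = 2

# Backward compatible: new code reads old data, missing field takes its default
print(read(old_message, V2))  # ({'amount_cents': 100, 'priority': 0}, [])

# Forward compatible: old code reads new data, unknown field 4 is set aside
print(read(new_message, V1))  # ({'amount_cents': 100}, [(4, 2)])
```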
| Schema change | Safe? | Why (wire-level reason) |
|---|---|---|
| Add a new field with a new number | Yes | Old peers skip the unknown number; new fields default for old writers |
| Rename a field (same number) | Yes | Names never travel; only the number is on the wire |
| Remove a field and reserve its number | Yes | Reserving blocks future reuse, so the number can’t be poisoned |
| Change a field’s number | No | Peers decode the value into the wrong field — silent corruption |
| Reuse a deleted number for a new field | No | Stragglers on the old schema write the old meaning into it |
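The rules in the table are mechanical enough to lint. The function below is a hypothetical sketch of such a check, not an existing tool: it takes the old and new schemas as dictionaries of field number to name, plus the set of numbers the new schema reserves. One stated limitation: it compares names only, so it cannot tell a harmless rename from a number being reused for unrelated data with a new name; catching that would also require comparing field types.

```python
def check_evolution(old_fields, new_fields, reserved):
    """Return a list of human-readable problems (empty list means safe).

    old_fields / new_fields: dict of field number -> field name
    reserved: set of field numbers reserved in the new schema
    """
    problems = []
    old_number_by_name = {name: number for number, name in old_fields.items()}

    for number, name in new_fields.items():
        if number in reserved:
            problems.append(f"{name!r} uses reserved number {number}")
        previous = old_number_by_name.get(name)
        if previous is not None and previous != number:
            problems.append(f"{name!r} renumbered {previous} -> {number}")

    for number, name in old_fields.items():
        if number not in new_fields and number not in reserved:
            problems.append(f"number {number} ({name!r}) removed without reserved")

    return problems


old = {1: "id", 2: "amount_cents", 3: "discount_pct", 4: "priority"}

# Safe: field 3 removed and reserved, field 5 appended
safe_new = {1: "id", 2: "amount_cents", 4: "priority", 5: "currency"}
print(check_evolution(old, safe_new, reserved={3}))  # []

# Unsafe: priority moved from 4 to 3, nothing reserved
renumbered = {1: "id", 2: "amount_cents", 3: "priority"}
for problem in check_evolution(old, renumbered, reserved=set()):
    print(problem)
```

A check like this belongs in CI, comparing the proposed schema against the one currently deployed.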
One connection, four call shapes
gRPC runs over HTTP/2, and that buys multiplexing: many concurrent RPCs interleave as independent streams over a single TCP connection, with no head-of-line blocking at the HTTP layer (TCP-level packet loss can still stall every stream). Fifty in-flight calls need one connection, not 50. A cancelled or deadline-expired call sends an RST_STREAM that kills just that stream and leaves the others untouched. This is why gRPC throughput in a service mesh holds up under concurrency where HTTP/1.1, which allows only one in-flight request per connection, needs a large connection pool or collapses.
On top of that single connection, the schema’s service block defines four call shapes, and picking the wrong one is a common design smell:
- Unary — one request, one response. The REST-shaped 95% case.
- Server streaming — one request, a stream of responses (a price feed, a log tail, server-pushed progress).
- Client streaming — a stream of requests, one response (uploading chunks, batching telemetry).
- Bidirectional streaming — both sides stream independently over the same call (chat, live collaboration, a long-lived control channel).
service PriceService {
  rpc GetQuote(QuoteRequest) returns (Quote);              // unary
  rpc WatchPrices(WatchRequest) returns (stream Quote);    // server streaming
  rpc UploadTrades(stream Trade) returns (UploadSummary);  // client streaming
  rpc Trade(stream Order) returns (stream Fill);           // bidirectional
}

Two operational features ride along and are easy to skip until they bite. Deadlines: the client sets a timeout that propagates down the call chain; when it expires the RPC is auto-cancelled on both ends, so a slow downstream can’t pin server threads forever. The classic confusing failure is a deadline-exceeded call that succeeded on the server (“I sent all responses”) but failed on the client (“they arrived too late”): the work happened, but the caller saw an error. Cancellation: either side can abort mid-flight, and propagated cancellation stops downstream work instead of letting orphaned requests grind on. Skipping deadlines is one of the most common reasons a gRPC mesh cascades into resource exhaustion under partial failure.
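Deadline propagation is easier to reason about as a single absolute expiry time handed down the call chain. The sketch below models that idea in plain Python with no gRPC library involved: the `Deadline` class, the three service functions, and the fake clock are all hypothetical, and each hop simply refuses to start work once the shared budget is gone.

```python
import time


class DeadlineExceeded(Exception):
    pass


class FakeClock:
    """Deterministic clock so the example does not depend on real sleeping."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class Deadline:
    """One absolute expiry time, shared by every hop in the chain."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + timeout_s

    def remaining(self):
        return self._expires_at - self._clock()

    def check(self, where):
        if self.remaining() <= 0:
            raise DeadlineExceeded(where)


def pricing(deadline, clock):
    deadline.check("pricing")
    clock.advance(0.2)  # simulated work
    return 1999


def inventory(deadline, clock):
    deadline.check("inventory")
    clock.advance(0.4)               # slow hop eats most of the budget
    return pricing(deadline, clock)  # same deadline passed downstream


def orders(deadline, clock):
    deadline.check("orders")
    clock.advance(0.1)
    return inventory(deadline, clock)


clock = FakeClock()
print(orders(Deadline(1.0, clock), clock))  # 1999: finishes within budget

clock = FakeClock()
try:
    orders(Deadline(0.3, clock), clock)
except DeadlineExceeded as exc:
    print("gave up before starting:", exc)  # gave up before starting: pricing
```

In the second run, `pricing` never does its work: the budget was already spent upstream, so the chain stops instead of grinding on for a caller that has given up.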
Where gRPC wins, and where it hurts
gRPC is the right default for internal, service-to-service traffic: a typed contract shared across polyglot services, the size and speed wins on hot paths (roughly 4-5x smaller and 3-10x faster on numeric-heavy data), native streaming, and deadline propagation across the call graph. In a busy mesh those savings are not academic: they cut tail latency and CPU on the serialization path that REST burns on JSON.
The pain is real and lands in two places. First, the browser cannot speak gRPC. Browsers can’t control HTTP/2 framing or trailers, so you need grpc-web plus a proxy (Envoy, or the framework’s own) to translate between the browser and the gRPC backend — extra infrastructure, and grpc-web still can’t do client or bidirectional streaming. (Connect-RPC sidesteps some of this by serving unary calls as plain HTTP.) Second, debuggability collapses. A binary payload is unreadable in the Chrome Network tab — you see “bytes transferred”, not fields. curl is useless without the schema and a decoder; trailers hide inside the response body bytes where DevTools has no hook. Logs, request replay, and ad-hoc inspection all need extra tooling. That opacity is the tax you pay for the compactness — the same property that makes the wire small makes it inscrutable to humans.
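To see what "opaque" means in practice, the sketch below takes a captured gRPC message body and decodes as much as is possible without the schema. It relies on gRPC's length-prefixed framing (a 1-byte compressed flag and a 4-byte big-endian length before each message); the captured bytes themselves are a made-up example, and only varint fields are handled.

```python
import struct


def split_grpc_frames(body):
    """Split a gRPC body into (compressed, message_bytes) tuples."""
    frames, pos = [], 0
    while pos < len(body):
        compressed, length = struct.unpack_from(">BI", body, pos)
        pos += 5  # 1-byte flag + 4-byte big-endian length
        frames.append((bool(compressed), body[pos:pos + length]))
        pos += length
    return frames


def decode_varint(buf, pos):
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def schema_free_dump(message):
    """Best effort without a .proto: a list of (field_number, value) pairs."""
    pairs, pos = [], 0
    while pos < len(message):
        tag, pos = decode_varint(message, pos)
        if tag & 0x07 != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = decode_varint(message, pos)
        pairs.append((tag >> 3, value))
    return pairs


# Hypothetical capture: one uncompressed 4-byte message
captured = bytes([0x00, 0x00, 0x00, 0x00, 0x04, 0x08, 0x2A, 0x10, 0x01])

for compressed, message in split_grpc_frames(captured):
    print(compressed, schema_free_dump(message))  # False [(1, 42), (2, 1)]
```

The dump recovers "field 1 is 42, field 2 is 1" and nothing more. Whether field 1 is a user ID, a quantity, or a price in cents is knowledge that lives only in the .proto file.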
A public-facing REST endpoint hit directly from a React app is slow under load. A teammate proposes switching it to gRPC. Pick the call.
A field discount_pct = 3 is removed from a proto. What's the senior move to keep the schema safe?
Why can't you paste a captured gRPC request into a tool and read it like a JSON body?
Order the safe steps to evolve a proto message in production:
1. Never change or reuse an existing field number: the number is the wire identity
2. To remove a field, delete it and add reserved for its number and name
3. To add data, append a new field with a fresh, never-used number
4. Rely on proto3 defaults so old readers tolerate missing new fields (backward compat)
5. Rely on unknown-field skipping so old readers tolerate new fields (forward compat)
1. Explain why renaming a protobuf field is safe but renumbering it is a production-grade disaster.
2. When would you choose gRPC over REST/JSON, and what concrete costs does that decision drag in?
gRPC’s defining choice is the protobuf wire format: it serializes field number, wire type, and value, never the field names, which live only in the .proto. That makes payloads 4-5x smaller and roughly 3-10x faster to (de)serialize than JSON for numeric data, but it also means the field number, not the name, is the data’s identity. So renaming is free, renumbering or reusing a deleted number silently corrupts every peer on a different schema, and the only safe evolution is append-only with reserved markers for removals, which is what gives protobuf its two-way backward/forward compatibility. Over HTTP/2, gRPC multiplexes many RPCs on one connection and offers four call shapes (unary, server, client, bidirectional streaming) plus deadlines and cancellation that propagate down the call graph. The price is that browsers need grpc-web and a proxy, and the binary payload is opaque to humans and standard tooling. The senior pattern: gRPC for internal service-to-service hops, REST/JSON or Connect-RPC at the browser edge.