
Observability

Why structured logs exist: the diary vs the spreadsheet

Crux: Free-text logs look readable but become un-queryable at scale. Structured logs are a spreadsheet: every line is a record with addressable fields.

The pager fires at 02:00. You have 10 million log lines and five minutes. If logs are free-text, you write a regex and hope. If logs are structured, you write one query and know.

Diary versus spreadsheet

Free-text logs are a diary. Structured logs are a spreadsheet. Both record what happened, but only one lets you ask “show me every payment failure last week grouped by upstream provider” without reading every entry by hand.

The diary is fine when you have ten entries. At ten million entries it is unusable: fields are inconsistent, formats drift between services, and every new question requires writing a new regex.

The spreadsheet has rules: every row has the same columns, every column has a type, every value goes in the right cell.

| Format | Example | Query for “status 503 from payment” |
| --- | --- | --- |
| Free-text | [ERROR] checkout: gateway timeout (payment, status 503) | Regex, substring match, misses variants |
| Structured JSON | {"level":"error","upstream":"payment","status":503} | Single indexed-field query, milliseconds |
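The difference in the last column can be sketched in a few lines of Python. This is an illustration only: the three free-text wordings are hypothetical variants of the same failure, and the in-memory list stands in for a real log backend.

```python
import json
import re

# Hypothetical free-text lines: the same failure, worded three different ways.
free_text = [
    "[ERROR] checkout: gateway timeout (payment, status 503)",
    "[ERROR] checkout: payment upstream returned 503",
    "[error] checkout: HTTP 503 from payment",
]

# A regex written against the first wording only.
pattern = re.compile(r"\(payment, status 503\)")
regex_hits = [line for line in free_text if pattern.search(line)]

# The same three events as structured records, one JSON object per line.
structured = [
    '{"level":"error","upstream":"payment","status":503}',
    '{"level":"error","upstream":"payment","status":503}',
    '{"level":"error","upstream":"payment","status":503}',
]
records = [json.loads(line) for line in structured]

# The field filter does not depend on how anyone phrased the message.
field_hits = [
    r for r in records
    if r["upstream"] == "payment" and r["status"] == 503
]

print(len(regex_hits), "of", len(free_text), "found by regex")
print(len(field_hits), "of", len(records), "found by field filter")
```

The regex finds one of the three events because the other two drifted in wording. The field filter finds all three.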

Why the cost compounds

The cost of a log line is paid in three places: at write time (CPU + RAM for serialization), in transit (network egress), and at the backend (ingest GB + indexed-event count + retention bytes). Structured logs are not free — JSON serialization costs CPU — but they pay back at query time.

A free-text line “user 42 failed checkout because gateway returned 503” requires regex and substring search to filter. The same data as JSON {"level":"error","user_id":42,"route":"/checkout","upstream":"payment-gateway","upstream_status":503,"trace_id":"abc..."} is a single indexed-field query with millisecond response time over weeks of data.
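A minimal sketch of emitting that kind of line with only the Python standard library. The helper name `format_event` is hypothetical, and `trace_id` is left out because the example value above is elided; a real service would pass it in like any other field.

```python
import json
import sys
from datetime import datetime, timezone


def format_event(level, message, **fields):
    """Serialize one log event as a single-line JSON string."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
        **fields,  # event-specific fields: route, upstream, status, ...
    }
    # Compact separators keep the line short; json.dumps escapes any
    # newlines inside string values, so the result stays on one line.
    return json.dumps(record, separators=(",", ":"))


line = format_event(
    "error",
    "checkout failed",
    user_id=42,
    route="/checkout",
    upstream="payment-gateway",
    upstream_status=503,
)
sys.stdout.write(line + "\n")
```

Note that `user_id` and `upstream_status` stay numbers rather than strings, so a backend can filter and aggregate on them by type.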

The discipline: pick a schema, populate it consistently, and treat the log line as an API your future on-call self will read at 03:00 with no context.
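One way to hold a schema to that standard is to check it in tests or CI. The sketch below assumes a small required-field set taken from this lesson (timestamp, level, service.name, trace_id, message); the function name `schema_violations` and the sample values are hypothetical.

```python
# Required fields and their expected Python types (an assumed minimal schema).
REQUIRED_FIELDS = {
    "timestamp": str,
    "level": str,
    "service.name": str,
    "trace_id": str,
    "message": str,
}


def schema_violations(record):
    """Return a list of human-readable problems; empty means the record conforms."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            actual = type(record[name]).__name__
            problems.append(f"wrong type for {name}: {actual}")
    return problems


good = {
    "timestamp": "2026-01-01T02:00:00+00:00",
    "level": "error",
    "service.name": "checkout",
    "trace_id": "trace-0001",  # hypothetical placeholder value
    "message": "checkout failed",
}
bad = {"timestamp": "2026-01-01T02:00:00+00:00", "level": "error", "message": 42}

print(schema_violations(good))
print(schema_violations(bad))
```

Running a check like this against sample output from each service catches schema drift before it reaches the on-call engineer.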

The triage scenario

Bea · Browser gets paged: error rate up on checkout. She queries the log backend with service:checkout AND level:error AND @timestamp:[now-15m TO now] — gets 240 matching events, each a JSON record. She facets by the upstream field: 220 of 240 errors come from payment-gateway. She facets by upstream_status: all 220 are HTTP 503. Diagnosis in 30 seconds.

Sven · Origin server pulls one trace_id from a failing log line and opens the trace to see the gateway call timing out. If logs were free-text, Bea would have written a regex, missed cases with different wording, and lost ten minutes.
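Faceting is just counting by field value. The sketch below reproduces the numbers from the scenario with synthetic data: the 20 non-gateway errors, their `inventory` upstream and their status 500 are invented for illustration, and a real backend would do this over an index rather than a Python list.

```python
from collections import Counter

# Synthetic stand-in for the 240 events the query returned.
events = (
    [{"service": "checkout", "level": "error",
      "upstream": "payment-gateway", "upstream_status": 503}] * 220
    + [{"service": "checkout", "level": "error",
        "upstream": "inventory", "upstream_status": 500}] * 20  # hypothetical
)


def facet(records, field):
    """Count how many records carry each distinct value of `field`."""
    return Counter(r.get(field) for r in records)


by_upstream = facet(events, "upstream")
print(by_upstream.most_common(1))  # [('payment-gateway', 220)]

# Narrow to the dominant upstream, then facet again by status.
gateway_errors = [e for e in events if e["upstream"] == "payment-gateway"]
by_status = facet(gateway_errors, "upstream_status")
print(dict(by_status))  # {503: 220}
```

Each facet is one pass over addressable fields, which is why the diagnosis takes seconds instead of a regex-writing session.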

Why this works

JSON is the de-facto encoding for structured logs in 2026 — not XML, not protobuf, not a custom binary. JSON is the lowest-common-denominator that humans can still read in a terminal, that every logging backend can parse out of the box, and that every language’s standard library serializes cheaply. JSON Lines (one JSON object per line, newline-delimited) is the canonical format: append-friendly, streamable, parseable line-by-line, easy for grep + jq when you are SSH’d into a box.
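The line-by-line property is what makes JSON Lines easy to process as a stream. A minimal sketch, assuming the log is available as a text stream; `iter_records` is a hypothetical helper, and `io.StringIO` stands in for an open log file.

```python
import io
import json


def iter_records(stream):
    """Yield one dict per valid JSON Lines row, skipping blank or malformed rows."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # One corrupt line does not invalidate the rest of the file.
            continue
        if isinstance(record, dict):
            yield record


# Stand-in for a log file: two valid records and one stray free-text line.
sample = io.StringIO(
    '{"level":"info","message":"started"}\n'
    '{"level":"error","upstream":"payment","status":503}\n'
    "not json at all\n"
)

errors = [r for r in iter_records(sample) if r.get("level") == "error"]
print(errors)
```

Because each line parses on its own, the reader never has to hold the whole file in memory, and a truncated final line after a crash costs one record rather than the file.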

Quiz

Which log line is structured?

Order the steps

Order the fields you should put on every production log line, in roughly priority order:

  1. timestamp (ISO-8601 UTC, the most basic)
  2. level (DEBUG / INFO / WARN / ERROR)
  3. service.name (which service emitted this)
  4. trace_id and span_id (the join key to traces)
  5. message (human-readable summary)
  6. event-specific fields (route, status_code, user_segment, ...)
  7. resource attributes (host, region, version; typically set once)

Complete the analogy

Fill in the blank: a structured log is a _______, where every row has the same columns, every column has a type, and every value goes in the right cell.

Recall before you leave
  1. In two sentences, why is JSON the de-facto encoding for structured logs in 2026?
  2. What is the cost of being unstructured, and when does it show up?
  3. Why does the structured log treat trace_id as a required field, not an optional one?

Recap

Structured logs are JSON events with a stable schema: every line is a record with the same shape — timestamp, level, service.name, trace_id, message, and event-specific fields. Free-text logs record the same data as sentences, which is readable at small scale and un-queryable at million-line scale because every query needs a regex. The key payback is at query time: indexed JSON fields answer “show me all 503s from payment-gateway in the last 15 minutes” in one query with millisecond latency. The cost is at write time (CPU for serialization) and at the backend (indexed events are billed per million). The discipline is treating the log schema as an API contract — consistent across services, populated on every line.
