
Observability

The production log schema: fields every line must carry

Crux: what a production-grade structured log line contains in 2026 (the OTel Logs Data Model fields and resource attributes), and why the schema is the API your on-call tooling expects.

Two services on the same team name the same field differently: one calls it “status_code”, the other “http_status”. Every dashboard that joins them breaks. The fix is not a regex; it is a shared schema enforced before the code ships.

What every production log line carries

A production-grade structured log line in 2026 has at minimum these fields, in roughly this priority order:

  1. timestamp — ISO-8601 UTC with sub-second precision: 2026-04-12T14:02:11.482Z. The primary sort key for every backend.
  2. level — string enum: TRACE / DEBUG / INFO / WARN / ERROR / FATAL. Controls what gets stored and who gets paged.
  3. service.name — the OTel-canonical key identifying the emitter. This is the join key back to metrics and traces.
  4. trace_id and span_id — when the log is emitted inside a traced request, these are the join keys to the tracing backend. Without them the log-to-trace pivot does not work.
  5. message — a short human-readable summary. The only field humans skim; the rest is for machines.
  6. Event-specific fields — a flat key-value map using OTel Semantic Conventions: http.route, http.response.status_code, db.system, error.type, etc.
  7. Resource attributes — set once at service start: host.name, cloud.region, service.version, deployment.environment. These describe the emitter, not the event.

| Field group      | Example fields                                     | Set by              | Frequency              |
|------------------|----------------------------------------------------|---------------------|------------------------|
| Core             | timestamp, level, message                          | Logger SDK          | Every line             |
| Service identity | service.name, service.version                      | Logger SDK + config | Every line             |
| Trace context    | trace_id, span_id, trace_flags                     | Active span mixin   | Inside traced requests |
| Event-specific   | http.route, http.response.status_code, error.type  | Application code    | Per event type         |
| Resource         | host.name, cloud.region, deployment.environment    | Platform / config   | Once at startup        |
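
To make the field groups concrete, here is a minimal sketch of a function that assembles one JSON log line. The function name `make_log_line`, the `RESOURCE` constant, and every value in it are assumptions made for illustration; a real service would normally get the trace context from its tracing library rather than pass it in by hand.

```python
import json
from datetime import datetime, timezone

# Resource attributes describe the emitter and are set once at startup.
# All values here are made up for illustration.
RESOURCE = {
    "service.name": "checkout",
    "service.version": "1.4.2",
    "host.name": "node-17",
    "cloud.region": "eu-west-1",
    "deployment.environment": "production",
}


def make_log_line(level, message, attributes, trace_id=None, span_id=None, now=None):
    """Return one JSON log line carrying the field groups described above."""
    now = now or datetime.now(timezone.utc)
    # ISO-8601 UTC with millisecond precision and a trailing Z.
    stamp = now.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    record = {
        "timestamp": stamp.replace("+00:00", "Z"),
        "level": level,
        "message": message,
    }
    # Trace context is only present inside a traced request.
    if trace_id and span_id:
        record["trace_id"] = trace_id
        record["span_id"] = span_id
    # Event-specific fields first, then resource attributes, so that an
    # event field can never overwrite the emitter's identity.
    record.update(attributes)
    record.update(RESOURCE)
    return json.dumps(record)


print(make_log_line(
    "ERROR",
    "payment provider returned 502",
    {"http.route": "/checkout", "http.response.status_code": 502, "error.type": "timeout"},
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    span_id="00f067aa0ba902b7",
))
```

Keeping the record flat, with dotted keys rather than nested objects, is one possible encoding; it keeps every field addressable by the same name in a query.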

The OTel Logs Data Model

The OpenTelemetry Logs specification (stable API late 2023, SDKs late 2024) formalises this split. A log record contains:

  • Timestamp — when the event actually occurred (set by the application).
  • ObservedTimestamp — when the collector saw the record. Under clock skew or backpressure these two diverge; senior teams alert on the divergence as a pipeline-health metric.
  • SeverityNumber — a numeric ladder (TRACE=1-4, DEBUG=5-8, INFO=9-12, WARN=13-16, ERROR=17-20, FATAL=21-24) so backends can compare severity across libraries that use different text labels.
  • Body — the human-readable message.
  • Attributes — the flat key-value map of event-specific data.
  • Resource — set once per emitter (service.name, host.name, cloud.region).
  • TraceId / SpanId / TraceFlags — W3C traceparent context inherited from the active span at emit time.
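
The SeverityNumber ladder can be sketched as a small lookup. The band boundaries come from the list above; the `ALIASES` table and both function names are assumptions for illustration, standing in for whatever labels the libraries in your stack actually emit.

```python
# Lowest SeverityNumber of each band, taken from the ranges listed above.
SEVERITY_BASE = {"TRACE": 1, "DEBUG": 5, "INFO": 9, "WARN": 13, "ERROR": 17, "FATAL": 21}

# Hypothetical aliases: text labels that different libraries might use
# for the same band.
ALIASES = {"WARNING": "WARN", "CRITICAL": "FATAL", "ERR": "ERROR"}


def severity_number(label: str) -> int:
    """Map a library-specific text label to the base number of its band."""
    name = label.strip().upper()
    name = ALIASES.get(name, name)
    return SEVERITY_BASE[name]


def severity_band(number: int) -> str:
    """Map any SeverityNumber in 1..24 back to its band name."""
    if not 1 <= number <= 24:
        raise ValueError(f"SeverityNumber out of range: {number}")
    # Each band spans four consecutive numbers; dict order is TRACE..FATAL.
    return list(SEVERITY_BASE)[(number - 1) // 4]


# Two libraries with different labels now compare on one scale.
print(severity_number("warning") == severity_number("WARN"))
print(severity_number("critical") > severity_number("error"))
print(severity_band(18))
```

With the numeric value stored on every record, a backend filter such as "everything at or above 17" can apply to all services regardless of the text label each library writes.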

Adopting this shape — even on a non-OTel backend — buys forward compatibility: you can swap backends without rewriting instrumentation, and every dashboard and on-call run-book can speak the same field names.
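
The divergence between Timestamp and ObservedTimestamp can be turned into a pipeline-health check. The sketch below assumes records are dictionaries holding two timezone-aware `datetime` values under the hypothetical keys `timestamp` and `observed_timestamp`; the 60-second default follows the p99 threshold this lesson's recap suggests, and the nearest-rank percentile is just one simple way to compute p99.

```python
from datetime import datetime, timedelta, timezone


def p99(values):
    """Nearest-rank 99th percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    # Integer form of ceil(0.99 * n), which avoids float rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def pipeline_lag_alert(records, threshold_s=60.0):
    """Return True when p99 of (observed - event) time exceeds the threshold."""
    lags = [
        (r["observed_timestamp"] - r["timestamp"]).total_seconds()
        for r in records
    ]
    return bool(lags) and p99(lags) > threshold_s


# Made-up sample: 98 records arrive after 1 second, 2 after 5 minutes.
event_time = datetime(2026, 4, 12, 14, 0, 0, tzinfo=timezone.utc)
sample = [
    {"timestamp": event_time, "observed_timestamp": event_time + timedelta(seconds=1)}
    for _ in range(98)
] + [
    {"timestamp": event_time, "observed_timestamp": event_time + timedelta(minutes=5)}
    for _ in range(2)
]
print(pipeline_lag_alert(sample))
```

In practice this calculation would more likely live in the metrics backend as a query over a lag histogram; the function only shows the arithmetic.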

OTel Semantic Conventions: one field name across all services

The Semantic Conventions define the canonical names for common fields: http.route (not route, not path, not url), http.response.status_code (not status, not http_status), db.system (not db_type), error.type. Using these names means:

  • A dashboard querying http.response.status_code works uniformly across Node and Go services.
  • A backend alert rule referencing error.type:timeout works without per-service configuration.
  • A new service that onboards the schema inherits every existing query.

The cost of not following conventions shows up in incident response: when the checkout service calls the field status and the payment service calls it http_status, the join query fails and someone has to know the mapping under pressure at 03:00.
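
A small sketch shows how the failure looks from the query side. The records and the `count_server_errors` helper are invented for illustration; the point is that a filter on one field name silently skips any service that named the field differently.

```python
def count_server_errors(records, field="http.response.status_code"):
    """Count records whose status field is an integer of 500 or above."""
    return sum(
        1 for r in records
        if isinstance(r.get(field), int) and r[field] >= 500
    )


# Both services follow the convention: one query covers them all.
consistent = [
    {"service.name": "checkout", "http.response.status_code": 502},
    {"service.name": "payment", "http.response.status_code": 503},
    {"service.name": "payment", "http.response.status_code": 200},
]

# The payment service drifted to its own field name.
drifted = [
    {"service.name": "checkout", "http.response.status_code": 502},
    {"service.name": "payment", "http_status": 503},
    {"service.name": "payment", "http_status": 200},
]

print(count_server_errors(consistent))  # 2
print(count_server_errors(drifted))     # 1: the payment 503 is missed, with no error raised
```

Nothing fails loudly in the drifted case, which is why the gap tends to surface during an incident rather than in review.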

Why this works

The schema is not bureaucracy — it is load-bearing. Every query, every dashboard, every alert rule, every run-book step that references a field name assumes that field name is stable and consistent across services. Schema drift breaks them silently: the query returns fewer results, the alert fires on only some services, the run-book step requires manual translation. Centralising the schema in a per-org wrapper logger (one module each service imports) makes drift a build-time error instead of a 03:00 discovery.
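
One way to picture such a wrapper is the sketch below. The class name `OrgLogger`, the `SchemaError` exception, and the contents of `ALLOWED_ATTRIBUTES` are all assumptions, not an existing library. In Python the rejection happens when the call runs, so it would surface in tests or CI rather than at compile time; a typed language could push the same check into the build.

```python
import json
import sys
from datetime import datetime, timezone

LEVELS = ("TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL")

# Hypothetical org-wide allowlist of event-specific field names.
ALLOWED_ATTRIBUTES = frozenset({
    "http.route",
    "http.response.status_code",
    "db.system",
    "error.type",
})


class SchemaError(ValueError):
    """Raised when a log call does not match the shared schema."""


class OrgLogger:
    def __init__(self, service_name, service_version, stream=None):
        # Service identity is fixed once, when the logger is created.
        self._identity = {
            "service.name": service_name,
            "service.version": service_version,
        }
        self._stream = sys.stdout if stream is None else stream

    def log(self, level, message, attributes=None):
        """Validate, write one JSON line to the stream, and return it."""
        attributes = attributes or {}
        if level not in LEVELS:
            raise SchemaError(f"unknown level: {level!r}")
        unknown = sorted(set(attributes) - ALLOWED_ATTRIBUTES)
        if unknown:
            raise SchemaError(f"fields not in the shared schema: {unknown}")
        stamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
        record = {
            "timestamp": stamp.replace("+00:00", "Z"),
            "level": level,
            "message": message,
            **attributes,
            **self._identity,
        }
        line = json.dumps(record)
        self._stream.write(line + "\n")
        return line


logger = OrgLogger("checkout", "1.4.2")
logger.log("INFO", "order placed", {"http.route": "/orders"})
try:
    logger.log("ERROR", "upstream failed", {"http_status": 502})
except SchemaError as exc:
    print(f"rejected: {exc}")
```

Because every service imports the same module, adding or renaming a field becomes a change to one allowlist that is reviewed once, instead of a convention each team remembers separately.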

Log schema numbers

  • OTel Logs API stability (GA): late 2023
  • OTel Logs SDK stability (most languages): late 2024
  • Typical structured log line size: ~0.5–2 KB
  • SeverityNumber range (TRACE to FATAL): 1–24
  • OTLP/Logs over gRPC, compressed: ~30–50% smaller than JSON

Quiz

A log line carries Timestamp = 14:00:00 and ObservedTimestamp = 14:05:00. What does this gap indicate?

Quiz

Why does the OTel Logs Data Model define a numeric SeverityNumber (1-24) in addition to the text SeverityText?

Order the steps

Order the OTel Logs Data Model fields from most fundamental to most event-specific:

  1. Timestamp — when the event occurred
  2. SeverityNumber — numeric severity (1-24)
  3. Resource — per-emitter attributes (service.name, host.name)
  4. TraceId / SpanId — W3C trace context
  5. Body — human-readable message
  6. Attributes — event-specific flat key-value (http.route, error.type, ...)

Recall before you leave

  1. What are the two timestamps in the OTel Logs Data Model, and why do both matter?
  2. Why should you use OTel Semantic Convention field names (http.route, http.response.status_code) instead of your own names?
  3. Which fields should be set once at startup versus emitted on every log line?

Recap

A production log schema in 2026 follows the OTel Logs Data Model: Timestamp (event time), ObservedTimestamp (collector time), SeverityNumber (1-24) plus SeverityText, Body, Resource (per-emitter: service.name, host.name, cloud.region), Attributes (event-specific: http.route, error.type), and TraceId/SpanId (W3C trace context). Resource attributes are set once at startup; trace context is pulled from the active span at emit time; event-specific attributes are written in application code. Using OTel Semantic Convention field names uniformly across services is what makes cross-service queries, dashboards, and run-books work without per-service translation. The gap between Timestamp and ObservedTimestamp is a pipeline health signal — alert on p99 over 60 seconds.
