Observability OBS · 02 · 07

OTel Logs Data Model and audit logs as a subsystem

The OTel Logs Data Model is the log schema that vendors are converging on in 2026. Audit logs share the JSON format with operational logs but differ on retention, immutability, and access; mixing the two is a compliance failure.

OBS Senior ◷ 14 min

A team passes a SOC 2 audit. The next year they switch log backends. They rewrite the log schema, redeploy the collector, and break the audit-log queries that the compliance team depends on. The log schema was vendor-proprietary — not OTel. The audit log separation was never formalized. Two months of remediation follows.

The OTel Logs Data Model

The OpenTelemetry Logs specification (API stable: late 2023; SDKs stable in most languages: late 2024) defines a log record as a typed structure with the following fields:

Timestamp: when the event occurred (set by the application).
ObservedTimestamp: when the collector observed the record. Under clock skew or collector backpressure these two diverge — senior teams alert on (ObservedTimestamp - Timestamp) p99 > 60s as a pipeline-health metric.
SeverityNumber: a numeric ladder (TRACE=1-4, DEBUG=5-8, INFO=9-12, WARN=13-16, ERROR=17-20, FATAL=21-24) that lets backends compare severity across libraries that spell the same level differently (WARN vs Warning vs warn).
SeverityText: the human-readable label (e.g., "INFO", "ERROR").
Body: the log message itself, usually a human-readable string; the data model also allows structured values here.
Resource: per-emitter attributes set once at service start — service.name, host.name, cloud.region, service.version, deployment.environment. These are the OTel Resource concept, not per-event fields.
Attributes: the flat key-value map of event-specific data, using OTel Semantic Conventions where they apply — http.route, http.response.status_code, db.system, error.type, exception.type.
TraceId, SpanId, TraceFlags: inherited from the active span at emit time (as covered in lesson 06).
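The SeverityNumber ladder from the field list above can be sketched as a small lookup. This is a minimal illustration, not an SDK API: the function names and the `ALIASES` table are assumptions made for the example, and each label is mapped to the lowest number of its range.

```python
# Lowest SeverityNumber of each range in the OTel ladder (each range spans 4 values).
SEVERITY_BASE = {"TRACE": 1, "DEBUG": 5, "INFO": 9, "WARN": 13, "ERROR": 17, "FATAL": 21}

# Hypothetical aliases for labels some loggers use; extend for your own libraries.
ALIASES = {"WARNING": "WARN", "CRITICAL": "FATAL", "ERR": "ERROR"}

RANGE_NAMES = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


def severity_number(label: str) -> int:
    """Map a library-specific level label to the base SeverityNumber of its range."""
    key = label.strip().upper()
    key = ALIASES.get(key, key)
    return SEVERITY_BASE[key]  # raises KeyError for unknown labels


def severity_range(number: int) -> str:
    """Return the range name that a SeverityNumber in 1-24 falls into."""
    if not 1 <= number <= 24:
        raise ValueError(f"SeverityNumber out of range: {number}")
    return RANGE_NAMES[(number - 1) // 4]
```

With this mapping, "Debug", "debug", and "DEBUG" all compare equal, and a backend can filter on `severity_number >= 17` to select errors regardless of which library emitted them.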

The wire format is OTLP/Logs: protobuf over gRPC or HTTP/1.1. Compressed OTLP is 30-50% smaller than the equivalent JSON, which matters at fleet scale.

Field group	Fields	Set by
Time	Timestamp, ObservedTimestamp	Application / Collector
Severity	SeverityNumber, SeverityText	Application
Content	Body, Attributes	Application (event-specific)
Resource	service.name, host.name, cloud.region, …	SDK at startup (once per emitter)
Trace context	TraceId, SpanId, TraceFlags	OTel SDK (from active span)

A LogRecord is not flat: the fields nest by ownership. The Resource wraps everything (set once per emitter), the InstrumentationScope identifies the emitting logger, and the per-event fields sit inside. The same TraceId/SpanId on the record correlate it with the trace; a category=audit attribute is what later routes a record into the high-integrity audit stream.

Figure: nesting of a log record, outermost first.

Resource: service.name, host.name, cloud.region (set once per emitter)
  InstrumentationScope: logger name + version
    Timestamp / ObservedTimestamp: set by the application / set by the collector
    SeverityNumber + SeverityText: the 1-24 ladder
    Body: human-readable message
    Attributes: http.route, db.system, category=audit
    TraceId / SpanId / TraceFlags: correlates the record with its trace
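The nesting can be made concrete by building one record in the shape of the OTLP JSON encoding (resourceLogs, then scopeLogs, then logRecords). This is a sketch under assumptions: the service name, scope name, route, and ID values are invented for illustration, and only string attributes are shown.

```python
import json


def kv(key: str, value: str) -> dict:
    """Encode one string attribute in the OTLP/JSON key-value shape."""
    return {"key": key, "value": {"stringValue": value}}


def build_resource_logs() -> dict:
    """Return one log record wrapped in its scope and resource (example values)."""
    record = {
        "timeUnixNano": "1735689600000000000",          # set by the application
        "observedTimeUnixNano": "1735689600250000000",  # set when the record is observed
        "severityNumber": 9,
        "severityText": "INFO",
        "body": {"stringValue": "role granted"},
        "attributes": [kv("category", "audit"), kv("http.route", "/admin/roles")],
        "traceId": "5b8efff798038103d269b633813fc60c",  # 16 bytes as hex
        "spanId": "eee19b7ec3c1b174",                    # 8 bytes as hex
        "flags": 1,
    }
    return {
        "resourceLogs": [{
            # Resource: attached once per emitter, not repeated per event.
            "resource": {"attributes": [kv("service.name", "billing-api"),
                                        kv("cloud.region", "eu-west-1")]},
            "scopeLogs": [{
                # InstrumentationScope: the logger that produced the records.
                "scope": {"name": "billing.audit", "version": "1.0.0"},
                "logRecords": [record],
            }],
        }]
    }


if __name__ == "__main__":
    print(json.dumps(build_resource_logs(), indent=2))
```

Because the resource attributes sit at the outer level, a batch of many records from one service carries `service.name` once rather than on every record.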

Why vendor portability matters

Adopting the OTel Logs Data Model now means backend swaps are a collector configuration change, not an instrumentation rewrite. A service that emits OTel-shaped logs today can point the OTel Collector at Honeycomb, ClickHouse, Loki, Datadog (via the OTLP receiver), or Splunk (via the OTel ingest path) by changing one exporter block. No application code changes.
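As a rough illustration of "changing one exporter block", the sketch below models a collector configuration as a Python dict that mirrors the layout of the collector's YAML (receivers, exporters, service.pipelines). The endpoints are hypothetical, and `swap_exporter` is a helper invented for this example rather than part of any collector tooling.

```python
import copy

# Mirrors the collector YAML layout; endpoint values are placeholders.
BASE_CONFIG = {
    "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
    "exporters": {"otlp": {"endpoint": "backend-a.example.internal:4317"}},
    "service": {
        "pipelines": {"logs": {"receivers": ["otlp"], "exporters": ["otlp"]}}
    },
}


def swap_exporter(config: dict, name: str, settings: dict) -> dict:
    """Return a copy of config whose logs pipeline sends to a single new exporter."""
    new = copy.deepcopy(config)
    new["exporters"] = {name: settings}
    new["service"]["pipelines"]["logs"]["exporters"] = [name]
    return new


# Migrating to a second backend touches only the exporter side of the config.
migrated = swap_exporter(
    BASE_CONFIG, "otlphttp", {"endpoint": "https://backend-b.example.internal:4318"}
)
```

The receivers section, and therefore everything the applications talk to, is identical before and after the swap.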

The reverse — emitting to a vendor-proprietary schema — locks every log call site to that vendor’s field names and ingest format. Migration cost is effectively re-instrumenting every service.

The combination of cheap-to-adopt (every modern logger SDK is OTel-compatible or has a thin adapter) and high-cost-to-defer is the definition of a no-regret architecture choice.

Why this works

The two-timestamp design (Timestamp vs ObservedTimestamp) is worth understanding. Timestamp reflects when the application decided the event occurred; it can be set to a past time for late-arriving events. ObservedTimestamp is set by the collector the moment it receives the record. When the path into the collector backs up (for example, under collector backpressure), records arrive late and ObservedTimestamp drifts away from Timestamp. A skewed clock on either side produces the same symptom. Monitoring the p99 of (ObservedTimestamp - Timestamp) therefore detects both collector backpressure and clock skew. By the rule of thumb used here, a p99 that stays below 60 seconds indicates a healthy pipeline.
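The pipeline-health check described above might look like the following. It is a minimal sketch: the record keys `time_unix_nano` and `observed_unix_nano` are assumed names for the two timestamps, and the percentile uses the nearest-rank method, which is one of several reasonable definitions.

```python
def p99(values: list[float]) -> float:
    """Nearest-rank 99th percentile of a non-empty list."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer arithmetic
    return ordered[rank - 1]


def lag_seconds(records: list[dict]) -> list[float]:
    """ObservedTimestamp minus Timestamp for each record, in seconds."""
    return [
        (r["observed_unix_nano"] - r["time_unix_nano"]) / 1e9
        for r in records
    ]


def pipeline_lag_alert(records: list[dict], threshold_s: float = 60.0) -> bool:
    """True when the p99 lag exceeds the threshold (60 s by default)."""
    return p99(lag_seconds(records)) > threshold_s
```

A single slow record in a window of 100 does not trip the alert; two do, because the 99th-ranked value then exceeds the threshold.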

Audit logs as a first-class subsystem

Operational logs answer “why is the service slow?” Audit logs answer “who did what, when, to which resource?”

They share the JSON schema and trace_id. But they differ on three axes that make mixing them a compliance failure:

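Retention: operational logs are kept 7-90 days (hot/warm/cold tiers). Audit logs are kept 1-7 years, depending on the framework in scope: PCI DSS requires at least 1 year of audit log history with the most recent 3 months immediately available, SOC 2 audits commonly expect about 1 year, HIPAA documentation retention is 6 years, and 7 years is a common choice where financial-records rules apply. A single index with a 30-day retention policy will silently delete audit events that compliance mandates be kept for years.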

Immutability: operational logs can be deleted (retention policies, right-to-erasure, cost control). Audit logs must be append-only — hash-chained or cryptographically signed — so that tampering is detectable. Many compliance frameworks explicitly require that audit log integrity be verifiable.

Access: operational logs are readable by the development team for incident response. Audit logs are readable by a narrow role (platform security team + external auditors) with an audit-of-audits: access to the audit log index is itself logged.

Together these three axes mean that audit and operational logs cannot safely coexist in one index: the retention mismatch silently destroys compliance evidence, the immutability mismatch makes tampering undetectable, and the access mismatch exposes audit history to everyone with log search access. Without all three separated, you pass audits until you don’t.
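The immutability axis can be illustrated with a hash chain, one of the two tamper-evidence options named above. The sketch below uses only the standard library; the entry layout (`event`, `prev_hash`, `hash`) and the all-zero genesis value are assumptions chosen for the example, not a standard format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry's prev_hash


def entry_hash(prev_hash: str, event: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain: list[dict], event: dict) -> None:
    """Append an event, linking it to the hash of the previous entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev_hash": prev, "hash": entry_hash(prev, event)})


def verify(chain: list[dict]) -> bool:
    """Recompute every link; any edited or removed interior entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

A chain like this detects edits and interior deletions but not truncation of the newest entries, so the latest hash would also need to be recorded somewhere the log writer cannot modify.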

Same JSON format and trace_id, but the three divergent axes — retention, immutability, access — are exactly why audit and operational logs need separate pipelines.

The architectural pattern: emit audit events through the same logger SDK used for operational events, but with a category=audit attribute. The collector routes category=audit records through a separate pipeline to a dedicated backend index with the stricter retention, immutability, and access properties. All other records go to the operational pipeline. The routing is transparent to the application: it only sets the attribute.
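The routing decision itself is small. The sketch below models it in plain Python to show the logic only; in a real deployment this rule lives in the collector's configuration, and the record shape (a dict with an `attributes` map) is an assumption for the example.

```python
def route(record: dict) -> str:
    """Return the pipeline name for a record based on its category attribute."""
    attributes = record.get("attributes", {})
    return "audit" if attributes.get("category") == "audit" else "operational"


def dispatch(records: list[dict]) -> dict[str, list[dict]]:
    """Split a batch into the audit and operational pipelines."""
    pipelines: dict[str, list[dict]] = {"audit": [], "operational": []}
    for record in records:
        pipelines[route(record)].append(record)
    return pipelines
```

Note the default: a record with a missing or misspelled category falls through to the operational pipeline, which is why an end-to-end check on the audit path is worth having.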

What belongs in audit logs: authentication events (login, logout, failed login), authorization changes (role grants, permission changes), data access on regulated records (read of a customer’s health or financial data), configuration changes (secrets rotated, access control lists modified). The schema is narrower than operational logs: typically who, what, when, resource, outcome, trace_id.
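The narrow audit schema can be pinned down as a type so that every call site supplies the same six fields. This is a sketch under assumptions: the class name, the helper, and the choice of ISO 8601 strings for `when` are illustrative, and the attribute keys simply reuse the field names listed above.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    who: str       # acting principal, e.g. a user or service account ID
    what: str      # action taken, e.g. "role.grant"
    when: str      # ISO 8601 timestamp in UTC
    resource: str  # the object acted upon
    outcome: str   # e.g. "success" or "denied"
    trace_id: str  # links the event to its trace

    def to_attributes(self) -> dict:
        """Flatten to log attributes and add the routing marker."""
        attrs = asdict(self)
        attrs["category"] = "audit"
        return attrs


def new_audit_event(who: str, what: str, resource: str, outcome: str,
                    trace_id: str, now: Optional[datetime] = None) -> AuditEvent:
    """Build an event, stamping the current UTC time unless one is supplied."""
    moment = now if now is not None else datetime.now(timezone.utc)
    return AuditEvent(who, what, moment.isoformat(), resource, outcome, trace_id)
```

Setting the category inside `to_attributes` means a call site cannot forget the marker that the collector routes on.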

Quiz

Which specification defines the SeverityNumber enumeration (1-24) that maps log levels across libraries and backends to a comparable numeric severity?

Quiz

A team stores operational logs and audit logs in the same index with a 30-day retention policy. What is the compliance failure?

Order the steps

Order the steps to implement a compliant audit-log subsystem alongside operational logging:

1 Define the audit event types: auth, role change, data access, config change
2 Add category='audit' attribute to all audit event log calls in the application code
3 Configure a collector routing rule: category=audit goes to the audit pipeline, everything else to the operational pipeline
4 Create a dedicated audit backend index with append-only storage, 1-year minimum retention
5 Restrict audit index access to the platform security team and auditors; configure audit-of-audits (access to the audit index is itself logged)
6 Add a CI check that fires a test auth event and verifies it reaches the audit index, not the operational index
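Step 6 can be sketched as a check that emits a uniquely marked synthetic auth event and then looks for it in both indexes. Everything here is a stand-in: `InMemoryIndex` replaces a real backend query API, `pipeline` replaces the collector, and the `test.marker` attribute name is an assumption for the example.

```python
import uuid
from typing import Callable


class InMemoryIndex:
    """Hypothetical stand-in for a backend index; a real check would query the backend."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def write(self, record: dict) -> None:
        self.records.append(record)

    def contains(self, marker: str) -> bool:
        return any(
            r.get("attributes", {}).get("test.marker") == marker for r in self.records
        )


def pipeline(record: dict, audit: InMemoryIndex, operational: InMemoryIndex) -> None:
    """Stand-in for the collector routing rule from step 3."""
    is_audit = record.get("attributes", {}).get("category") == "audit"
    (audit if is_audit else operational).write(record)


def check_audit_routing(emit: Callable[[dict], None],
                        audit: InMemoryIndex, operational: InMemoryIndex) -> bool:
    """Emit a marked auth event; pass only if it lands in audit and nowhere else."""
    marker = uuid.uuid4().hex
    emit({
        "body": "ci synthetic login",
        "attributes": {"category": "audit", "event": "auth.login", "test.marker": marker},
    })
    return audit.contains(marker) and not operational.contains(marker)
```

Asserting the negative case (the event is absent from the operational index) matters as much as the positive one, since a pipeline that fans out to both indexes would otherwise pass.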

Recall before you leave

01. Why is the OTel Logs Data Model the right choice even for a team not yet using any OTel-aware backend?
02. What is the significance of the two-timestamp design (Timestamp vs ObservedTimestamp) in the OTel Logs Data Model?
03. Name the three axes on which audit logs must differ from operational logs, and why each matters for compliance.

Recap

The OTel Logs Data Model (API stable late 2023, SDK stable late 2024) defines the cross-vendor log record schema: Timestamp (application), ObservedTimestamp (collector), SeverityNumber 1-24 for cross-library severity comparison, SeverityText, Body, Resource attributes set once per emitter, event Attributes using OTel Semantic Conventions, and TraceId/SpanId/TraceFlags from the active span. The two-timestamp design detects pipeline backpressure and clock skew: alert when (ObservedTimestamp - Timestamp) p99 exceeds 60 seconds. Adopting this schema today makes backend migration a collector config change, not a re-instrumentation. Audit logs share the JSON schema but require separation: dedicate a pipeline and backend index with 1-7 years of retention (the exact period depends on the framework in scope), append-only immutability, and narrow access with audit-of-audits. Route via a category=audit attribute at the collector; the application just sets the field. Mixing audit and operational logs in one index silently violates retention, immutability, and access requirements. When you see a proposed architecture that routes everything to one log index, you now know the three questions to raise before it ships rather than after the next audit review.

Connected lessons

Builds on: Trace context propagation in logs (senior)

Unlocks: What is OpenTelemetry: API, SDK, Collector, OTLP (junior)
