
Observability

OTel Logs Data Model and audit logs as a subsystem

Crux: The OTel Logs Data Model is the point where cross-vendor log schemas are converging in 2026. Audit logs share the JSON format but differ on retention, immutability, and access; mixing them with operational logs is a compliance failure.

A team passes a SOC 2 audit. The next year they switch log backends. They rewrite the log schema, redeploy the collector, and break the audit-log queries that the compliance team depends on. The log schema was vendor-proprietary, not OTel, and the audit-log separation was never formalized. Two months of remediation follow.

The OTel Logs Data Model

The OpenTelemetry Logs specification (API stable: late 2023; SDKs stable in most languages: late 2024) defines a log record as a typed structure with the following fields:

  • Timestamp: when the event occurred (set by the application).
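  • ObservedTimestamp: when the collection system (typically the collector) observed the record. For records created directly by an OTel SDK it usually equals Timestamp; for logs collected from external sources it comes from the collector's clock. Under clock skew or collector backpressure the two diverge, so senior teams alert on (ObservedTimestamp - Timestamp) p99 > 60s as a pipeline-health metric.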
  • SeverityNumber: a numeric ladder (TRACE=1-4, DEBUG=5-8, INFO=9-12, WARN=13-16, ERROR=17-20, FATAL=21-24) that lets backends compare severity across libraries that use different text labels (DEBUG vs Debug vs debug vs trace).
  • SeverityText: the human-readable label (e.g., "INFO", "ERROR").
  • Body: the human-readable log message.
  • Resource: per-emitter attributes set once at service start — service.name, host.name, cloud.region, service.version, deployment.environment. These are the OTel Resource concept, not per-event fields.
  • Attributes: the flat key-value map of event-specific data, using OTel Semantic Conventions where they apply — http.route, http.response.status_code, db.system, error.type, exception.type.
  • TraceId, SpanId, TraceFlags: inherited from the active span at emit time (as covered in lesson 06).
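
The field list above can be sketched as a plain data structure. This is an illustrative container only, not the OTel SDK's own LogRecord class; the names `LogRecord` and `severity_range_name`, the nanosecond integer timestamps, and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# SeverityNumber ranges as listed above: (low, high, range name).
_SEVERITY_RANGES = [
    (1, 4, "TRACE"),
    (5, 8, "DEBUG"),
    (9, 12, "INFO"),
    (13, 16, "WARN"),
    (17, 20, "ERROR"),
    (21, 24, "FATAL"),
]


def severity_range_name(severity_number: int) -> str:
    """Return the range name (TRACE..FATAL) for a SeverityNumber in 1-24."""
    for low, high, name in _SEVERITY_RANGES:
        if low <= severity_number <= high:
            return name
    raise ValueError(f"SeverityNumber out of range 1-24: {severity_number}")


@dataclass
class LogRecord:
    """Hypothetical container mirroring the fields in the list above."""
    timestamp_ns: int                 # Timestamp: when the event occurred
    observed_timestamp_ns: int        # ObservedTimestamp: when it was observed
    severity_number: int              # 1-24, comparable across libraries
    severity_text: str                # the emitting library's own label
    body: str
    resource: Dict[str, Any] = field(default_factory=dict)    # set once per emitter
    attributes: Dict[str, Any] = field(default_factory=dict)  # event-specific
    trace_id: Optional[str] = None
    span_id: Optional[str] = None
    trace_flags: int = 0


record = LogRecord(
    timestamp_ns=1_700_000_000_000_000_000,
    observed_timestamp_ns=1_700_000_000_250_000_000,
    severity_number=17,
    severity_text="Error",  # casing differs between libraries
    body="payment authorization failed",
    resource={"service.name": "checkout", "deployment.environment": "prod"},
    attributes={"http.route": "/pay", "http.response.status_code": 502},
)
```

Comparing on `severity_number` rather than `severity_text` is what lets a backend treat "Error", "ERROR", and "error" as the same level.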

The wire format is OTLP/Logs: protobuf over gRPC or HTTP (OTLP/HTTP also permits a JSON encoding). Compressed OTLP is 30-50% smaller than the equivalent JSON, which matters at fleet scale.

Field group   | Fields                                   | Set by
Time          | Timestamp, ObservedTimestamp             | Application / Collector
Severity      | SeverityNumber, SeverityText             | Application
Content       | Body, Attributes                         | Application (event-specific)
Resource      | service.name, host.name, cloud.region, … | SDK at startup (once per emitter)
Trace context | TraceId, SpanId, TraceFlags              | OTel SDK (from active span)

Why vendor portability matters

Adopting the OTel Logs Data Model now means backend swaps are a collector configuration change, not an instrumentation rewrite. A service that emits OTel-shaped logs today can point the OTel Collector at Honeycomb, ClickHouse, Loki, Datadog (via the OTLP receiver), or Splunk (via the OTel ingest path) by changing one exporter block. No application code changes.

The reverse — emitting to a vendor-proprietary schema — locks every log call site to that vendor’s field names and ingest format. Migration cost is effectively re-instrumenting every service.

The combination of cheap-to-adopt (every modern logger SDK is OTel-compatible or has a thin adapter) and high-cost-to-defer is the definition of a no-regret architecture choice.

Why this works

The two-timestamp design (Timestamp vs ObservedTimestamp) is worth understanding. Timestamp reflects when the application decided the event occurred, so it can be set to a past time for late-arriving events. ObservedTimestamp is set by the collection system when it first sees the record; for logs the collector ingests from external sources, that is the collector's own clock. When records queue up before the collector observes them, or when the two clocks disagree, ObservedTimestamp drifts away from Timestamp. Monitoring the p99 of (ObservedTimestamp - Timestamp) therefore detects both backpressure and clock skew. A p99 that stays below 60 seconds indicates a healthy pipeline.
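
As a rough sketch of that health check, the p99 lag can be computed from (Timestamp, ObservedTimestamp) pairs. The function names, the nearest-rank percentile method, and the nanosecond integer inputs are assumptions for illustration; a real deployment would more likely compute this in the metrics backend.

```python
def p99(values):
    """Nearest-rank 99th percentile of a non-empty list."""
    ordered = sorted(values)
    # Integer ceil(0.99 * n), avoiding floating-point rounding surprises.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def pipeline_lag_seconds(records):
    """Lag per record: ObservedTimestamp minus Timestamp, in seconds.

    `records` is an iterable of (timestamp_ns, observed_timestamp_ns) pairs.
    """
    return [(observed - ts) / 1e9 for ts, observed in records]


def pipeline_is_healthy(records, threshold_s=60.0):
    """True when the p99 lag does not exceed the threshold (60 s by default)."""
    return p99(pipeline_lag_seconds(records)) <= threshold_s


SECOND = 1_000_000_000
base = 1_700_000_000 * SECOND

# 99 records observed 1 s after the event, one straggler at 120 s.
mostly_fast = [(base, base + 1 * SECOND)] * 99 + [(base, base + 120 * SECOND)]
# Two stragglers out of 100 is enough to push the p99 over the threshold.
backed_up = [(base, base + 1 * SECOND)] * 98 + [(base, base + 120 * SECOND)] * 2
```

With these inputs `pipeline_is_healthy(mostly_fast)` holds and `pipeline_is_healthy(backed_up)` does not, which shows why a p99 is used instead of a maximum: a single late record does not page anyone, a sustained tail does.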

Audit logs as a first-class subsystem

Operational logs answer “why is the service slow?” Audit logs answer “who did what, when, to which resource?”

They share the JSON schema and trace_id. But they differ on three axes that make mixing them a compliance failure:

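Retention: operational logs are kept 7-90 days (hot/warm/cold tiers). Audit logs are kept 1-7 years depending on the framework (SOC 2 audits typically expect about 1 year of evidence; PCI DSS requires at least 1 year of audit-log history; HIPAA requires 6 years for compliance documentation; some financial and contractual regimes ask for 7). A single index with a 30-day retention policy will silently delete audit events that compliance mandates be kept for years.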

Immutability: operational logs can be deleted (retention policies, right-to-erasure, cost control). Audit logs must be append-only — hash-chained or cryptographically signed — so that tampering is detectable. Many compliance frameworks explicitly require that audit log integrity be verifiable.

Access: operational logs are readable by the development team for incident response. Audit logs are readable by a narrow role (platform security team + external auditors) with an audit-of-audits: access to the audit log index is itself logged.
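
The immutability axis above mentions hash chaining. A minimal sketch of the idea, assuming SHA-256 over a canonical JSON encoding and an in-memory list as the store (both are choices made for this example, not something a compliance framework prescribes):

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(chain: list, event: dict) -> None:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    )


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_event(chain, {"who": "alice", "what": "login", "outcome": "success"})
append_event(chain, {"who": "bob", "what": "role.grant", "outcome": "denied"})
append_event(chain, {"who": "alice", "what": "logout", "outcome": "success"})
```

A chain like this makes in-place edits detectable, but it does not by itself detect entries dropped from the tail; that usually requires anchoring the latest hash somewhere the log writer cannot modify.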

The architectural pattern: emit audit events through the same logger SDK used for operational events, but with a category: "audit" attribute. The collector routes category=audit lines through a separate pipeline to a dedicated backend index with the stricter retention, immutability, and access properties. All other lines go to the operational pipeline. The routing is transparent to the application — it just sets the field.
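
The routing decision described above is small enough to sketch directly. In practice it lives in the collector's configuration, not in application code; the `route` and `dispatch` functions and the in-memory sinks below are hypothetical stand-ins that only model the behavior.

```python
def route(attributes: dict) -> str:
    """Pick a pipeline name from a record's attributes."""
    return "audit" if attributes.get("category") == "audit" else "operational"


def dispatch(records, sinks):
    """Send each record to exactly one sink based on its category attribute."""
    for rec in records:
        sinks[route(rec.get("attributes", {}))].append(rec)


sinks = {"audit": [], "operational": []}
dispatch(
    [
        {"body": "user logged in", "attributes": {"category": "audit"}},
        {"body": "cache miss", "attributes": {"http.route": "/cart"}},
        {"body": "slow query", "attributes": {"category": "perf"}},
    ],
    sinks,
)
```

Anything without `category` set to exactly "audit" falls through to the operational pipeline, so a typo in the attribute value silently misroutes an audit event. That failure mode is what an end-to-end check on the audit index is meant to catch.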

What belongs in audit logs: authentication events (login, logout, failed login), authorization changes (role grants, permission changes), data access on regulated records (read of a customer’s health or financial data), configuration changes (secrets rotated, access control lists modified). The schema is narrower than operational logs: typically who, what, when, resource, outcome, trace_id.
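
One way to keep that narrow schema honest is to build audit events through a single helper that refuses incomplete records. The helper name `make_audit_event`, the ISO 8601 timestamp format, and the sample identifiers are assumptions for this sketch.

```python
from datetime import datetime, timezone

REQUIRED_FIELDS = ("who", "what", "when", "resource", "outcome", "trace_id")


def make_audit_event(who, what, resource, outcome, trace_id, when=None):
    """Build an audit event dict; raise if any required field is empty."""
    moment = when if when is not None else datetime.now(timezone.utc)
    event = {
        "category": "audit",  # the attribute the collector routes on
        "who": who,
        "what": what,
        "when": moment.isoformat(),
        "resource": resource,
        "outcome": outcome,
        "trace_id": trace_id,
    }
    missing = [name for name in REQUIRED_FIELDS if not event.get(name)]
    if missing:
        raise ValueError(f"audit event missing fields: {missing}")
    return event


event = make_audit_event(
    who="user:alice",
    what="role.grant",
    resource="project:billing",
    outcome="success",
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    when=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
```

Centralizing construction also gives one place to set `category`, so individual call sites cannot forget it.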

Quiz

Which specification defines the SeverityNumber enumeration (1-24) that maps log levels across libraries and backends to a comparable numeric severity?

Quiz

A team stores operational logs and audit logs in the same index with a 30-day retention policy. What is the compliance failure?

Order the steps

Order the steps to implement a compliant audit-log subsystem alongside operational logging:

  1. Define the audit event types: auth, role change, data access, config change
  2. Add category='audit' attribute to all audit event log calls in the application code
  3. Configure a collector routing rule: category=audit goes to the audit pipeline, everything else to the operational pipeline
  4. Create a dedicated audit backend index with append-only storage, 1-year minimum retention
  5. Restrict audit index access to the platform security team and auditors; configure audit-of-audits (access to the audit index is itself logged)
  6. Add a CI check that fires a test auth event and verifies it reaches the audit index, not the operational index

Recall before you leave

  1. Why is the OTel Logs Data Model the right choice even for a team not yet using any OTel-aware backend?
  2. What is the significance of the two-timestamp design (Timestamp vs ObservedTimestamp) in the OTel Logs Data Model?
  3. Name the three axes on which audit logs must differ from operational logs, and why each matters for compliance.

Recap

The OTel Logs Data Model (API stable late 2023, SDK stable late 2024) defines the cross-vendor log record schema: Timestamp (application), ObservedTimestamp (collector), SeverityNumber 1-24 for cross-library severity comparison, SeverityText, Body, Resource attributes set once per emitter, event Attributes using OTel Semantic Conventions, and TraceId/SpanId/TraceFlags from the active span. The two-timestamp design detects pipeline backpressure and clock skew: alert when (ObservedTimestamp - Timestamp) p99 exceeds 60 seconds. Adopting this schema today makes backend migration a collector config change, not a re-instrumentation. Audit logs share the JSON schema but require separation: dedicate a pipeline and backend index with 1-7 years retention (SOC 2/HIPAA/PCI), append-only immutability, and narrow access with audit-of-audits. Route via a category=audit attribute at the collector — the application just sets the field. Mixing audit and operational logs in one index silently violates retention, immutability, and access requirements.


Trademarks belong to their respective owners. Editorial reference only.