Data Engineering
Event sourcing: build an auditable ledger
Reading about event sourcing is not the same as living with an append-only log. Build a small ledger where the log is the only source of truth and every read is derived from it. Then prove, each with evidence, the properties that make the pattern worth its costs: temporal query, projection rebuild, snapshot speedup, schema evolution, and GDPR erasure.
Turn the unit’s mental model into a working system: an append-only event store with optimistic concurrency, a disposable CQRS read model, replay-based temporal queries, snapshots that bound load cost, an upcaster for an evolved event, and crypto-shredding for erasure — demonstrated, not asserted.
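As a starting point, the append-only store with optimistic concurrency might look like the minimal in-memory sketch below. The names `EventStore`, `VersionConflict`, `Event`, and `balance`, and the event types `Deposited` and `Withdrawn`, are assumptions made for illustration and are not prescribed by this project. The sketch is single-threaded; a persistent store would need the version check and the write to be atomic, for example through a uniqueness constraint on (stream, version).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


class VersionConflict(Exception):
    """Raised when expected_version does not match the stream's current version."""


@dataclass(frozen=True)
class Event:
    stream_id: str
    version: int            # 1-based position within its stream
    type: str
    data: Dict[str, Any]


@dataclass
class EventStore:
    """Hypothetical in-memory store: events are only ever appended."""
    _streams: Dict[str, List[Event]] = field(default_factory=dict)

    def append(self, stream_id: str, expected_version: int,
               type: str, data: Dict[str, Any]) -> Event:
        stream = self._streams.setdefault(stream_id, [])
        # Optimistic concurrency: the writer states the version it last saw.
        if len(stream) != expected_version:
            raise VersionConflict(
                f"{stream_id}: expected {expected_version}, found {len(stream)}")
        event = Event(stream_id, expected_version + 1, type, dict(data))
        stream.append(event)
        return event

    def read(self, stream_id: str) -> List[Event]:
        # Return a copy so callers cannot modify the stored history.
        return list(self._streams.get(stream_id, []))


def balance(events: List[Event]) -> int:
    """Fold a stream into its current balance (the aggregate's state)."""
    total = 0
    for e in events:
        if e.type == "Deposited":
            total += e.data["amount"]
        elif e.type == "Withdrawn":
            total -= e.data["amount"]
    return total


store = EventStore()
store.append("acct-1", 0, "Deposited", {"amount": 100})
store.append("acct-1", 1, "Withdrawn", {"amount": 30})

# A second writer that also loaded the stream at version 1 must be rejected.
try:
    store.append("acct-1", 1, "Withdrawn", {"amount": 50})
    conflict = False
except VersionConflict:
    conflict = True

current = balance(store.read("acct-1"))
```

Here the stale append raises `VersionConflict` and the folded balance reflects only the two accepted events, which is the behaviour the concurrency test is meant to pin down.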
Build a small event-sourced account-ledger service (any language) where the append-only event log is the sole source of truth, all reads are derived projections, and you can demonstrate audit, temporal query, rebuild, snapshot, versioning, and GDPR erasure on it.
- A concurrency test: two appends race at the same expected version; exactly one succeeds and the other fails with a version-conflict error, never a lost update or a silently merged write.
- An idempotency test: deliver the same event to the projection twice and show the read model is identical to single delivery (checkpoint blocks the redelivery).
- A rebuild proof: drop the balances read model, replay the log, and show that the rebuilt balances are byte-for-byte identical to the originals; also show the second projection produced solely from replayed history.
- A temporal-query result matching a hand-computed as-of balance, and a snapshot benchmark: load time for a long stream with snapshots vs full replay, with the numbers reported.
- An upcaster test: a stored pre-change event loads and folds correctly through the current code path, and the persisted event on disk is shown to be unchanged.
- A crypto-shredding proof: before the forget operation, the name decrypts; after the key is destroyed, replay still rebuilds all balances while the name is permanently unrecoverable. Note in writing the legal caveat that regulators may still treat undeletable encrypted PII as personal data.
- Add read-your-own-writes for the post-command screen: return state from the command result so the UI does not display the lagging projection, and demonstrate the eventual-consistency lag is invisible to the user.
- Make the projection a separate process consuming the stream asynchronously, measure the read-model lag under load, and show the read model converges after the write burst stops.
- Add a second event store backend (e.g. swap a SQL table for a Kafka topic) and document what you must change — infinite retention, no compaction, and how you reproduce expected-version concurrency that Kafka lacks natively.
- Add an audit endpoint that answers 'who changed the plan on date X and what was the prior value' purely from the log, demonstrating the audit trail and temporal query the customer-dispute scenario from the lesson needs.
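The idempotency, rebuild, and temporal-query items in the list above can be sketched together. This is a standalone illustration under assumptions: the `Event` fields, `BalancesProjection`, `rebuild`, and `balance_as_of` are hypothetical names, amounts are signed integers, log positions are global and increasing, and timestamps are fixed-width ISO-8601 UTC strings so that string comparison matches time order.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    position: int   # global log position, strictly increasing (assumed)
    account: str
    amount: int     # signed cents: deposit > 0, withdrawal < 0
    at: str         # fixed-width ISO-8601 UTC; sorts lexicographically


@dataclass
class BalancesProjection:
    """Disposable read model: balances plus the last applied log position."""
    balances: Dict[str, int] = field(default_factory=dict)
    checkpoint: int = 0

    def handle(self, e: Event) -> None:
        if e.position <= self.checkpoint:
            return  # redelivery of an already-applied event: ignore it
        self.balances[e.account] = self.balances.get(e.account, 0) + e.amount
        self.checkpoint = e.position


def rebuild(log: List[Event]) -> BalancesProjection:
    """Drop-and-replay: start from an empty projection and feed it the log."""
    projection = BalancesProjection()
    for e in log:
        projection.handle(e)
    return projection


def balance_as_of(log: List[Event], account: str, as_of: str) -> int:
    """Temporal query: fold only the events recorded at or before as_of."""
    return sum(e.amount for e in log if e.account == account and e.at <= as_of)


log = [
    Event(1, "acct-1", 10_000, "2024-01-05T09:00:00Z"),
    Event(2, "acct-2", 5_000, "2024-01-06T12:00:00Z"),
    Event(3, "acct-1", -2_500, "2024-02-01T08:30:00Z"),
]

live = BalancesProjection()
for e in log:
    live.handle(e)
live.handle(log[2])  # duplicate delivery of the last event

rebuilt = rebuild(log)
january_close = balance_as_of(log, "acct-1", "2024-01-31T23:59:59Z")
```

In a persistent read model, the checkpoint and the balance update would need to commit in the same transaction; otherwise a crash between the two writes can apply an event twice or skip it.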
This is the loop every event-sourced system runs: append immutable events with optimistic concurrency as the only source of truth, fold them to load aggregates and validate commands, derive disposable idempotent projections for reads, and answer temporal queries by folding up to a timestamp. Then defend the costs — snapshots to bound replay (checksummed against drift), upcasters to evolve schemas without rewriting history, and crypto-shredding to meet GDPR against an immutable log. Build it once on a toy ledger and the production version — audit disputes, new read models months later, schema evolution under load — becomes a known shape instead of a leap of faith.
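Two of the cost defences named above, upcasting and crypto-shredding, fit in a short stdlib-only sketch. Everything here is an assumption for illustration: the `schema` field, the `USD` default, the `KeyVault` class, and the event shapes are hypothetical. A one-time pad stands in for a real cipher only to avoid third-party dependencies; a production system would more likely use an authenticated cipher such as AES-GCM with per-subject keys held in a key-management service.

```python
import copy
import json
import secrets
from typing import Dict, List, Optional


def upcast(event: dict) -> dict:
    """Return a current-schema copy of the event; never mutate the stored one."""
    if event.get("schema", 1) == 1:
        upgraded = copy.deepcopy(event)
        upgraded["schema"] = 2
        upgraded["data"]["currency"] = "USD"  # assumed default for legacy events
        return upgraded
    return event


stored_v1 = {"type": "Deposited", "schema": 1, "data": {"amount": 100}}
before = json.dumps(stored_v1, sort_keys=True)
loaded = upcast(stored_v1)                    # what the current code path sees
after = json.dumps(stored_v1, sort_keys=True)  # persisted form, for comparison


class KeyVault:
    """Per-subject keys kept outside the event log so they can be destroyed."""

    def __init__(self) -> None:
        self._pads: Dict[str, bytes] = {}

    def encrypt(self, subject: str, plaintext: str) -> bytes:
        data = plaintext.encode("utf-8")
        pad = secrets.token_bytes(len(data))  # used once, for this field only
        self._pads[subject] = pad
        return bytes(a ^ b for a, b in zip(data, pad))

    def decrypt(self, subject: str, ciphertext: bytes) -> Optional[str]:
        pad = self._pads.get(subject)
        if pad is None:
            return None  # key destroyed: the plaintext cannot be recovered
        return bytes(a ^ b for a, b in zip(ciphertext, pad)).decode("utf-8")

    def forget(self, subject: str) -> None:
        self._pads.pop(subject, None)


vault = KeyVault()
log: List[dict] = [
    {"type": "AccountOpened", "account": "acct-1",
     "name_enc": vault.encrypt("acct-1", "Ada Lovelace")},
    {"type": "Deposited", "account": "acct-1", "amount": 100},
]


def balance(events: List[dict]) -> int:
    # The fold never needs the encrypted name, so it survives key destruction.
    return sum(e["amount"] for e in events if e["type"] == "Deposited")


name_before = vault.decrypt("acct-1", log[0]["name_enc"])
vault.forget("acct-1")
name_after = vault.decrypt("acct-1", log[0]["name_enc"])
balance_after = balance(log)
```

The log itself is never edited in either case: the upcaster transforms on read, and erasure works by destroying a key held outside the log. Note that the pad-based stand-in reveals the plaintext length, which is one more reason to treat it as a teaching device only.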