Data Engineering
Event sourcing: multiple-choice review
Six questions that cut across the whole unit. Each mirrors a decision you make when designing or operating an event-sourced system: not a definition to recite, but a tradeoff to weigh when the log is the truth and everything else is derived.
Confirm you can connect the append-only log, projections, replay, snapshots, versioning, and GDPR erasure: the synthesis the lesson built toward, at the depth where the decisions actually bite.
A teammate argues 'we already publish domain events to Kafka after every DB write, so we are event-sourced.' Why is that probably false?
Why are projections (read models) safe to treat as disposable, and what does that buy you operationally?
A user clicks Save, you append the event, then redirect to a list rendered from the projection — and their change is not there. What is the root cause and the right fix?
Loading one aggregate folds 4 million events and is too slow. You add snapshots. What is the one failure mode a senior engineer guards against?
OrderPlaced gains a required currency field. Millions of historical OrderPlaced events lack it. What is the standard approach?
A user exercises GDPR erasure; their PII is embedded across hundreds of immutable events. What does a senior engineer actually ship?
The through-line of the unit is one inversion and its consequences: the append-only log is the source of truth, current state is a fold over that log, and everything else is derived. Projections are disposable read models rebuilt by replay, and because they are updated after the append, reads are eventually consistent. Snapshots bound replay cost but drift silently if unguarded. Versioning is permanent: old event shapes live in the log forever, so you upcast on read instead of rewriting history. And GDPR erasure against an immutable log is met with crypto-shredding, with a real legal caveat. Every answer here resolves back to the same rule: never mutate the log; derive, replay, and transform around it.
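As a rough illustration of that rule, the sketch below folds a small log into current state, upcasts an old OrderPlaced event on read, and ignores a snapshot written by an older version of the fold. Everything in it is an assumption made for illustration, not something the unit prescribes: the dict-based event shape, the names upcast, apply_event, load and FOLD_VERSION, the "USD" default for the missing currency, and the choice of a fold-version stamp as the snapshot guard.

```python
from typing import Any, Dict, List, Optional

# Hypothetical event shape: {"type": ..., "version": ..., "data": {...}}.
Event = Dict[str, Any]

FOLD_VERSION = 2          # assumption: bumped whenever apply_event() changes
DEFAULT_CURRENCY = "USD"  # assumption: the default chosen for old events


def upcast(event: Event) -> Event:
    """Return a current-shape copy of an event; the stored event is not modified."""
    if event["type"] == "OrderPlaced" and event.get("version", 1) == 1:
        data = {**event["data"], "currency": DEFAULT_CURRENCY}
        return {**event, "version": 2, "data": data}
    return event


def apply_event(state: Dict[str, Any], event: Event) -> Dict[str, Any]:
    """One fold step: previous state plus one event gives the next state."""
    event = upcast(event)
    data = event["data"]
    if event["type"] == "OrderPlaced":
        return {
            "order_id": data["order_id"],
            "total": data["total"],
            "currency": data["currency"],
            "status": "placed",
        }
    if event["type"] == "OrderCancelled":
        return {**state, "status": "cancelled"}
    return state  # event types this fold does not know are skipped


def load(log: List[Event], snapshot: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Rebuild state from the log, resuming from a snapshot only if it is still valid."""
    state: Dict[str, Any] = {}
    start = 0
    # Guard: a snapshot produced by a different fold version is ignored,
    # so the state is rebuilt by a full replay instead.
    if snapshot is not None and snapshot["fold_version"] == FOLD_VERSION:
        state, start = snapshot["state"], snapshot["next_index"]
    for event in log[start:]:
        state = apply_event(state, event)
    return state


# A two-event log whose first event predates the currency field.
log = [
    {"type": "OrderPlaced", "version": 1, "data": {"order_id": "o-1", "total": 40}},
    {"type": "OrderCancelled", "version": 1, "data": {"order_id": "o-1"}},
]

# A snapshot taken by an older fold; it lacks the currency field.
stale_snapshot = {
    "fold_version": 1,
    "next_index": 1,
    "state": {"order_id": "o-1", "total": 40, "status": "placed"},
}

full_replay = load(log)
from_stale = load(log, stale_snapshot)
```

Both calls return the same state here, because the stale snapshot is discarded and the log is replayed from the start. The stored version 1 event is left untouched; only the in-memory copy gains a currency.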