Data Engineering
Event sourcing: free-recall review
Retrieval beats re-reading. For each prompt, say or write a full answer from memory before opening the model answer — the effort of recall is what fixes the pattern in your head, especially the parts that bite in production.
Reconstruct the unit’s spine — log as truth, state as a fold, CQRS projections, eventual consistency, snapshots, and permanent versioning — without looking back at the lesson.
- 01 What is the single inversion that defines event sourcing, and why is it the source of every benefit and every cost?
- 02 A colleague says 'we publish events to Kafka, so we are event-sourced.' Why may that be false, and what would actually make it event sourcing?
- 03 What is CQRS in this context, why are projections disposable, and what two properties make them safe in production?
- 04 How do you handle read-model lag in the UI without leaking the architecture to the user?
- 05 Why does replay cost grow unbounded, how do snapshots fix it, and what is the snapshot footgun?
- 06 Why is event schema versioning permanent, how does upcasting handle it, and how does GDPR erasure interact with an immutable log?
If you can reconstruct each answer from memory, you hold the unit’s spine: one inversion (the append-only log is the truth, state is a fold) buys audit, temporal queries, and replay, and bills you in versioning, GDPR, snapshots, and eventual consistency. Event sourcing is not a Kafka topic emitted after a DB write; CQRS projections are disposable, idempotent, and eventually consistent; lag is a per-screen UX problem; snapshots bound replay but drift silently; versioning is permanent so you upcast; and GDPR is met with crypto-shredding under a legal caveat. The one rule under all of it: never mutate the log.
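To check your recall against something concrete, here is a minimal sketch of the spine in Python. Everything in it is an illustrative assumption rather than the lesson's own code: the `Event` shape, the `Deposited`/`Withdrawn` event types, the v1-to-v2 schema change (`amount` in whole units becoming `amount_cents`), and the `BalanceProjection` read model. It shows state as a fold over the log, upcasting at read time, a snapshot that bounds replay, and a projection that tolerates redelivery.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; the log is an append-only list of these."""
    seq: int       # position in the log
    type: str
    version: int   # schema version the event was written with
    data: dict


def upcast(e: Event) -> Event:
    """Lift old schema versions to the current one at read time.

    The stored event is never rewritten; only the in-memory copy changes.
    Assumed change: v1 stored whole units as "amount", v2 stores "amount_cents".
    """
    if e.type == "Deposited" and e.version == 1:
        return Event(e.seq, e.type, 2, {"amount_cents": e.data["amount"] * 100})
    return e


def apply(balance: int, e: Event) -> int:
    """One step of the fold: previous state plus one event gives the next state."""
    e = upcast(e)
    if e.type == "Deposited":
        return balance + e.data["amount_cents"]
    if e.type == "Withdrawn":
        return balance - e.data["amount_cents"]
    return balance  # unknown event types are ignored in this sketch


def fold(events, initial: int = 0) -> int:
    """State is a left fold of apply over the events."""
    return reduce(apply, events, initial)


@dataclass(frozen=True)
class Snapshot:
    seq: int       # last event already folded into the snapshot
    balance: int


def load(log, snapshot=None) -> int:
    """Replay only the tail after the snapshot, or the whole log if there is none."""
    if snapshot is None:
        return fold(log)
    tail = [e for e in log if e.seq > snapshot.seq]
    return fold(tail, snapshot.balance)


class BalanceProjection:
    """Disposable read model; a last_seq checkpoint makes handling idempotent."""

    def __init__(self):
        self.balance = 0
        self.last_seq = 0

    def handle(self, e: Event) -> None:
        if e.seq <= self.last_seq:
            return  # already applied: a redelivered event is a no-op
        self.balance = apply(self.balance, e)
        self.last_seq = e.seq


log = [
    Event(1, "Deposited", 1, {"amount": 5}),          # old schema, upcast on read
    Event(2, "Deposited", 2, {"amount_cents": 250}),
    Event(3, "Withdrawn", 2, {"amount_cents": 100}),
]

full = fold(log)
snap = Snapshot(seq=2, balance=fold(log[:2]))
from_snapshot = load(log, snap)

proj = BalanceProjection()
for e in log + [log[2]]:  # deliver the last event twice
    proj.handle(e)
```

In this sketch `full`, `from_snapshot`, and `proj.balance` all agree. One plausible way a snapshot goes stale is that `apply` changes after the snapshot was taken; comparing `load(log, snap)` against `fold(log)` is a cheap way to detect that, and discarding the snapshot or the projection is always safe because both can be rebuilt from the log.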