Data Engineering

Parquet: free-recall review

Crux: Free-recall prompts across the Parquet unit: columnar layout, footer pushdown, encoding vs compression, row-group sizing, schema evolution, and table formats.
◷ 14 min

Retrieval beats re-reading. For each prompt, say or write a full answer from memory before you open the model answer — the effort of recall is what makes the layout decisions stick.

Goal

Reconstruct the unit’s core mechanisms — columnar layout, footer-driven pushdown, the encoding/compression split, row-group sizing, schema evolution, and what table formats add — without looking back at the lesson.

Recall before you leave
  1. Explain end to end why a filtered, projected query on Parquet reads far less than the same query on CSV.
  2. Describe the physical nesting inside a Parquet file, from the file down to the encoded values.
  3. How do encoding and compression differ in Parquet, and why keep them mentally separate?
  4. What is the small-files problem, why does it cripple query planning, and how do table formats help?
  5. How do you choose a row-group size, and what goes wrong at each extreme?
  6. Why is schema evolution a trap with raw Parquet, and how do table formats make it safe?
Recap

If you could reconstruct each answer from memory, you hold the unit’s spine. Parquet is columnar and self-describing, so pruning and pushdown read only what a query needs, but only when data is clustered by the filter columns. A file nests from file to row group to column chunk to page, and each page is encoded (a structural, type-aware layer) and then compressed (a byte codec): two separate wins with separate failure modes. Row-group size is a real knob with bad extremes both ways, and the small-files problem is fixed by compaction. Because raw Parquet has no transactions or stable schema identity, table formats wrap it with a manifest for ACID, safe schema evolution, time travel, and file-level pruning.
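
The clustering caveat in the recap can be made concrete with a toy model. The sketch below is not Parquet itself: it assumes a single integer column, keeps only a min/max pair per row group the way footer statistics do, and uses hypothetical helper names (`build_row_groups`, `scan_equal`) to count how many row groups an equality filter actually has to read.

```python
import random


def build_row_groups(values, rows_per_group):
    """Split a column into row groups and record min/max per group,
    standing in for the statistics a Parquet footer would hold."""
    groups = []
    for start in range(0, len(values), rows_per_group):
        chunk = values[start:start + rows_per_group]
        groups.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return groups


def scan_equal(groups, target):
    """Return (matching row count, number of row groups read)."""
    matches = 0
    groups_read = 0
    for group in groups:
        # The stats prove no row can match, so skip without reading.
        if target < group["min"] or target > group["max"]:
            continue
        groups_read += 1
        matches += sum(1 for value in group["rows"] if value == target)
    return matches, groups_read


random.seed(0)

# 100 distinct keys, 100 rows each, laid out in sorted (clustered) order.
sorted_values = [i // 100 for i in range(10_000)]

# Same rows, but scattered so every row group spans nearly the full key range.
shuffled_values = sorted_values[:]
random.shuffle(shuffled_values)

clustered = build_row_groups(sorted_values, rows_per_group=1_000)
scattered = build_row_groups(shuffled_values, rows_per_group=1_000)

clustered_result = scan_equal(clustered, target=42)
scattered_result = scan_equal(scattered, target=42)

print("clustered (matches, groups read):", clustered_result)
print("scattered (matches, groups read):", scattered_result)
```

On the sorted column the filter reads one row group out of ten. On the shuffled column the min/max ranges overlap the target almost everywhere, so the same statistics prune little or nothing even though the matching rows are identical.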


Trademarks belong to their respective owners. Editorial reference only.