Algorithms from zero
Sorting and search: build a query engine
Reading about sorting and binary search is not the same as wiring them into a system that has to answer many queries fast. Build a small read-heavy query engine that sorts its data once, then serves membership, range, median, and threshold queries — and prove each algorithm is correct and fast with your own tests and timings.
Turn the unit’s mental model into a working engine: sort once as preprocessing, implement merge sort and quicksort from scratch, answer queries with binary search and its variants, find the median with quickselect, and back every claim about correctness and complexity with evidence.
Build a query engine over a dataset of records (each with a numeric key and a payload) that sorts the data once and then answers a stream of queries — exists, range, median, and 'smallest key whose running total exceeds T' — using the unit's algorithms, with tests and timings proving each one.
- All tests pass, including the edge cases (empty, single, all-equal, sorted, reverse-sorted), each checked against a brute-force reference.
- A demonstrated stability difference: a printed before/after showing merge sort preserves equal-key order and your quicksort does not.
- A timing table showing sort time scaling roughly as n log n and each query as O(log n) or O(n) (median) — measured, not estimated — across 10^3, 10^5, 10^6.
- A short write-up: which sort you would ship for this engine and why (stability vs in-place memory vs worst-case), and why binary search and quickselect beat re-scanning or re-sorting per query.
- Add the quicksort O(n²) defence experiment: feed already-sorted input to a fixed-last-pivot quicksort and to your randomized one, and chart the wall-clock blow-up to show the worst case is real.
- Support updates: when a record is inserted, keep the array sorted with an insertion-sort splice (O(n) shift) and discuss when re-sorting from scratch becomes cheaper than incremental insertion.
- Generalise the threshold query into a reusable binary-search-on-answer helper that takes any monotonic predicate, and reuse it for a second problem (e.g. minimum capacity to fit all records into D buckets).
- Add a guard against degenerate input on quickselect (median-of-medians or randomized pivot) and show the worst-case input no longer slows it down.
This is the loop behind every read-heavy system: pay the O(n log n) sort once, then answer queries cheaply — O(log n) with binary search and its lower/upper-bound variants, O(n) for the median with quickselect, and O(log answer-range) for monotonic threshold queries. Choosing the sort (stable merge vs in-place quicksort), defending quicksort’s pivot, and proving each algorithm with tests and timings is exactly the engineering you will repeat on real data.