Algorithms from zero ALG · 03 · 10

Sorting and search: build a query engine

Hands-on project — build a small query engine that sorts once and answers membership, range, median, and threshold queries, then prove every algorithm with tests and timings.

ALG Middle ◷ 210 min

Level

FoundationsJuniorMiddleSenior

Reading about sorting and binary search is not the same as wiring them into a system that has to answer many queries fast. Build a small read-heavy query engine that sorts its data once, then serves membership, range, median, and threshold queries — and prove each algorithm is correct and fast with your own tests and timings.

Goal

Turn the unit’s mental model into a working engine: sort once as preprocessing, implement merge sort and quicksort from scratch, answer queries with binary search and its variants, find the median with quickselect, and back every claim about correctness and complexity with evidence.

Project

0 of 7

Objective

Build a query engine over a dataset of records (each with a numeric key and a payload) that sorts the data once and then answers a stream of queries — exists, range, median, and 'smallest key whose running total exceeds T' — using the unit's algorithms, with tests and timings proving each one.

Requirements

Acceptance criteria

All tests pass, including the edge cases (empty, single, all-equal, sorted, reverse-sorted), each checked against a brute-force reference.
A demonstrated stability difference: a printed before/after showing merge sort preserves equal-key order and your quicksort does not.
A timing table showing sort time scaling roughly as n log n and each query as O(log n) or O(n) (median) — measured, not estimated — across 10^3, 10^5, 10^6.
A short write-up: which sort you would ship for this engine and why (stability vs in-place memory vs worst-case), and why binary search and quickselect beat re-scanning or re-sorting per query.

Senior stretch

Add the quicksort O(n²) defence experiment: feed already-sorted input to a fixed-last-pivot quicksort and to your randomized one, and chart the wall-clock blow-up to show the worst case is real.
Support updates: when a record is inserted, keep the array sorted with an insertion-sort splice (O(n) shift) and discuss when re-sorting from scratch becomes cheaper than incremental insertion.
Generalise the threshold query into a reusable binary-search-on-answer helper that takes any monotonic predicate, and reuse it for a second problem (e.g. minimum capacity to fit all records into D buckets).
Add a guard against degenerate input on quickselect (median-of-medians or randomized pivot) and show the worst-case input no longer slows it down.

Recap

This is the loop behind every read-heavy system: pay the O(n log n) sort once, then answer queries cheaply — O(log n) with binary search and its lower/upper-bound variants, O(n) for the median with quickselect, and O(log answer-range) for monotonic threshold queries. Choosing the sort (stable merge vs in-place quicksort), defending quicksort’s pivot, and proving each algorithm with tests and timings is exactly the engineering you will repeat on real data.

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.