Algorithms from zero
Hashing: build a resizing hash map
Reading about chaining, load factor, and amortized resize is not the same as making them work. Build a hash map from the array up — no built-in Map under the hood — then drive it into its worst case and watch the theory show up in the numbers.
Turn the unit’s mental model into working code and measured evidence: implement set/get/delete over an array of chains, resize on a load-factor threshold, and prove with experiments that lookups are O(1) average, that resize is O(1) amortized, and that a bad hash collapses it to O(n).
Implement a hash map from scratch over a raw array using separate chaining and load-factor-triggered resizing, then design experiments that demonstrate its average-case, amortized, and worst-case behaviour with real measurements — not just claims.
- A correctness test suite: set/get/has/delete round-trips, overwriting an existing key returns the new value (not a duplicate), deleting a key removes only that entry from its chain, and all keys survive a resize.
- A results table for Experiment A showing average comparisons-per-lookup roughly flat across N from 1k to 100k, with the measured load factor next to it.
- A results table or plot for Experiment B showing the per-insert cost spiking at resize points while the cumulative average per insert stays bounded by a small constant.
- A results table for Experiment C with the bad hash showing comparisons-per-lookup rising roughly linearly with N, plus a one-paragraph explanation tying it back to chain length = load factor and the O(1 + alpha) formula.
- Add automatic shrink-on-delete: when the load factor drops below 0.25, halve the array and rehash. Show it keeps memory bounded under a delete-heavy workload without breaking lookups.
- Add a randomized seed to the hash function (mix a per-instance random value into h) and demonstrate that two instances place the same keys in different buckets — the defense against an adversary crafting colliding inputs.
- Implement an open-addressing variant (linear probing) with the same interface, then compare its lookup comparisons and memory against the chained version at load factors 0.5, 0.75, and 0.9.
- Use your HashMap (not a built-in) to solve two-sum and group-anagrams, and confirm both run in one O(n) pass over the input by reading your instrumentation.
This is the loop behind every hash map you will ever use: an array plus a hash function gives O(1) average access, chaining absorbs the inevitable collisions, the load factor governs chain length, and resizing keeps it bounded at O(1) amortized. Building it once and measuring its average, amortized, and worst-case behaviour turns ‘hash maps are O(1)’ from a slogan into something you can prove — and know exactly when it stops being true.