Algorithms from zero
Hash maps and hash sets
In the time-and-space lesson, you learned that checking “is value X in the array?” with a nested loop costs O(n²) time. Then you learned that a Set—an unordered collection of unique values—can do it in O(n) time and O(n) space, a time–space tradeoff. But there is something faster: a hash set. It answers “is X present?” in O(1) average time, without any loop. The secret is a hash function that acts like a key to a filing cabinet: it turns the value into a drawer number, and you look there instantly.
After this lesson you can explain how a hash function maps a key to an array index, understand why hash maps and sets give O(1) average lookup, build a simple hash set or hash map using a JavaScript Set or Map, solve “does the array contain a duplicate?” using a hash set in O(n) time, and recognize the tradeoff: O(1) average time costs memory and does not preserve order.
The problem we are solving:
Imagine a phonebook. You have 1 million names and phone numbers. If you store them in an array and search for “Alice”, you scan from the start: Alice? No. Bob? No. Charlie? No. … — O(n) time.
But a phonebook is organized alphabetically. You jump to the “A” section, then scan just a few names. Still linear, but faster in practice.
Can we do better? Yes, if we can jump straight to the answer without scanning at all. That is what hashing does.
The idea of hashing:
A hash function is a function that takes a key (a name, a number, a word) and returns an index — a number between 0 and n-1, where n is the size of the array. The idea is clever: instead of scanning an array, use the hash function to calculate where the data should be, then look there instantly.
function hash(key, arraySize) {
// Turn the key into a number between 0 and arraySize - 1
return someFormula(key) % arraySize;
}
const index = hash("Alice", 1000); // Returns, say, 342
// Look at position 342 — Alice's phone number is there (ideally)In a good hash function, different keys map to different indices. “Alice” always maps to 342. “Bob” might map to 157. “Charlie” to 899. The array has a bucket at each index, and the key–value pair sits in that bucket.
Why O(1)? Calculate the index (one formula) and look at that index (one operation). Done. No loop.
Hash set vs. hash map:
A hash set (or unordered set) stores only values — it answers “is X in the set?”. It uses a hash function to map each value to an array index, then stores the value (or a marker) at that index.
1
const mySet = new Set();
2
mySet.add(10);
3
mySet.add(20);
4
mySet.add(30);
5
6
console.log(mySet.has(20)); // true, found instantly via hash
7
console.log(mySet.has(99)); // false, looked and not there
- L1 Create an empty set
- L2 Hash 10, store at resulting index
- L6 Hash 20, check index, found in O(1)
- L7 Hash 99, check index, not found, still O(1)
A hash map (or dictionary) stores key–value pairs. For each key, it calculates an index using a hash function, then stores both the key and the value at that index. Lookups by key are O(1) on average.
1
const myMap = new Map();
2
myMap.set("Alice", "555-1234");
3
myMap.set("Bob", "555-5678");
4
5
console.log(myMap.get("Alice")); // "555-1234", found in O(1)
6
console.log(myMap.get("Charlie")); // undefined, not there, still O(1)
- L1 Create an empty map
- L2 Hash 'Alice', store key–value pair at resulting index
- L5 Hash 'Alice', jump to index, retrieve value
Core operations on a hash set or map:
- Add/set by key — Hash the key, jump to the index, place the value. O(1) average.
- Lookup/contains — Hash the key, jump to the index, check if it is there. O(1) average.
- Delete by key — Hash the key, jump to the index, remove it. O(1) average.
The word “average” is important: in a badly designed hash function or under heavy collisions (see the edge case), performance can degrade. But with a good hash function and reasonable load factor (number of items / array size), O(1) average is reliable.
The tradeoff:
- Memory cost: A hash set or map uses extra memory—an array to hold the buckets, usually with spare capacity to keep collisions rare. You spend memory to save time.
- No order: Iteration over a hash set or map does not follow any predictable order (except that JavaScript Maps iterate in insertion order, a concession to developer expectations).
- Keys must be hashable: In some languages, only certain types can be keys (immutable types in Python, for example). JavaScript is more forgiving: you can hash any value.
Building a set — “does the array contain a duplicate?”
Recall from time-and-space: the nested-loop version is O(n²).
function hasDuplicateNested(arr) {
for (let i = 0; i < arr.length; i++) {
for (let j = i + 1; j < arr.length; j++) {
if (arr[i] === arr[j]) return true;
}
}
return false;
}With a hash set, we do one pass and O(1) lookups:
function hasDuplicateSet(arr) {
const seen = new Set();
for (let num of arr) {
if (seen.has(num)) return true; // O(1) lookup
seen.add(num); // O(1) add
}
return false;
}Time: O(n) (one loop, each operation O(1)). Space: O(n) (store up to n values).
Building a map — “store and retrieve phone numbers by name”
function createPhonebook() {
const book = new Map();
// Add entries
book.set("Alice", "555-1234");
book.set("Bob", "555-5678");
book.set("Charlie", "555-9999");
// Retrieve
console.log(book.get("Alice")); // "555-1234", O(1) lookup
console.log(book.get("David")); // undefined, still O(1)
// Check if exists
console.log(book.has("Bob")); // true
console.log(book.has("Eve")); // false
// Delete
book.delete("Bob");
// Iterate (insertion order in JS)
for (const [name, phone] of book) {
console.log(`${name}: ${phone}`);
}
return book;
}Try the comparison:
Let us trace the hash set approach on [1, 2, 3, 4, 5, 3]:
1
function hasDuplicateSet(arr) {
2
const seen = new Set();
3
for (let num of arr) {
4
if (seen.has(num)) return true;
5
seen.add(num);
6
}
7
return false;
8
}
Notice: we found the duplicate on the 6th element without comparing it to all 5 previous elements. The hash set jumps straight to the bucket for 3 and finds it already there.
Hash set and hash map operations:
| Operation | Time (average) | Space |
|---|---|---|
| Add / Set | O(1) | — |
| Lookup / Has | O(1) | — |
| Delete | O(1) | — |
| Iterate all | O(n) | O(1) aux |
| Contains duplicate (array) | O(n) | O(n) aux |
Why is “average” important?
A hash function should spread keys uniformly across the array. If it does, most operations are O(1). But if many keys hash to the same index (a collision), the buckets get crowded, and you may need to scan within a bucket. This is rare with a good hash function and proper load factor management, but it is possible. Worst case: all keys hash to one bucket, time becomes O(n).
In practice, with JavaScript’s built-in Set and Map, the implementation is optimized, and O(1) average is reliable.
Why does a hash set answer 'is X present?' in O(1) average time?
You have an array [5, 2, 9, 5, 3]. Which data structure detects the duplicate (5) most efficiently?
In a hash map with entries 'Alice' → '555-1234' and 'Bob' → '555-5678', what does map.get('Alice') do?
What is the main tradeoff of using a hash set instead of storing values in an array?
If you check for duplicates in an array of 1000 elements using a hash set, how many times does the hash function need to be called (approximately)?
Edge cases
Collisions: If two different keys hash to the same index, they collide. Resolving collisions (storing multiple key–value pairs at one index, via chaining or open addressing) is the subject of the next lesson in this unit. For now, know that a good hash function and proper load factor (typically keeping utilization below 75%) minimize collisions and maintain O(1) average time.
lesson.inset.misconception
“Hash sets are always faster than arrays.” Not quite. Hashing has O(1) average lookup, but the constant factor is larger than array indexing (hashing is a bit of work). For very small sets or when iteration order matters, an array or linked list might be simpler. Hash sets shine when you have large collections and frequent lookups.
Why does hashing solve the 'find a duplicate in O(n) time' problem, while a nested loop takes O(n²)?
A hash set and hash map are data structures that trade memory for speed.
Key facts:
- Hash function: Maps a key to an array index instantly. Core idea:
index = hash(key) % arraySize. - O(1) average lookup: Calculate the index, jump there, done. No scanning required (average case).
- Hash set: Stores unique values. Answers “is X present?” in O(1) average.
- Hash map: Stores key–value pairs. Looks up value by key in O(1) average.
- Time–space tradeoff: Spend O(n) memory to save O(n) time (compared to nested-loop O(n²)).
- JavaScript tools: Use the built-in
Setfor unique values,Mapfor key–value pairs. - Collisions: Different keys hashing to the same index degrade performance. A good hash function and load factor keep this rare.
Next, you will learn how collisions happen and how to resolve them, and you will build hash functions that distribute keys fairly.