Algorithms from zero ALG · 05 · 01

Hash maps and hash sets

A hash set answers ''''is X present?'''' in O(1) time. A hash map stores key–value pairs and looks up by key in O(1). Both trade memory for speed using a hash function to jump straight to the answer.

ALG ◷ 20 min

In the time-and-space lesson, you learned that checking “is value X in the array?” with a nested loop costs O(n²) time. Then you learned that a Set—an unordered collection of unique values—can do it in O(n) time and O(n) space, a time–space tradeoff. But there is something faster: a hash set. It answers “is X present?” in O(1) average time, without any loop. The secret is a hash function that acts like a key to a filing cabinet: it turns the value into a drawer number, and you look there instantly.

Goal

After this lesson you can explain how a hash function maps a key to an array index, understand why hash maps and sets give O(1) average lookup, build a simple hash set or hash map using a JavaScript Set or Map, solve “does the array contain a duplicate?” using a hash set in O(n) time, and recognize the tradeoff: O(1) average time costs memory and does not preserve order.

The idea

The problem we are solving:

Imagine a phonebook. You have 1 million names and phone numbers. If you store them in an array and search for “Alice”, you scan from the start: Alice? No. Bob? No. Charlie? No. … — O(n) time.

But a phonebook is organized alphabetically. You jump to the “A” section, then scan just a few names. Still linear, but faster in practice.

Can we do better? Yes, if we can jump straight to the answer without scanning at all. That is what hashing does.

The idea of hashing:

A hash function is a function that takes a key (a name, a number, a word) and returns an index — a number between 0 and n-1, where n is the size of the array. The idea is clever: instead of scanning an array, use the hash function to calculate where the data should be, then look there instantly.

function hash(key, arraySize) {
  // Turn the key into a number between 0 and arraySize - 1
  return someFormula(key) % arraySize;
}

const index = hash("Alice", 1000);  // Returns, say, 342
// Look at position 342 — Alice's phone number is there (ideally)

In a good hash function, different keys map to different indices. “Alice” always maps to 342. “Bob” might map to 157. “Charlie” to 899. The array has a bucket at each index, and the key–value pair sits in that bucket.

Why O(1)? Calculate the index (one formula) and look at that index (one operation). Done. No loop.

Hash set vs. hash map:

A hash set (or unordered set) stores only values — it answers “is X in the set?”. It uses a hash function to map each value to an array index, then stores the value (or a marker) at that index.

1 const mySet = new Set();

2 mySet.add(10);

3 mySet.add(20);

4 mySet.add(30);

6 console.log(mySet.has(20)); // true, found instantly via hash

7 console.log(mySet.has(99)); // false, looked and not there

L1 Create an empty set
L2 Hash 10, store at resulting index
L6 Hash 20, check index, found in O(1)
L7 Hash 99, check index, not found, still O(1)

A hash map (or dictionary) stores key–value pairs. For each key, it calculates an index using a hash function, then stores both the key and the value at that index. Lookups by key are O(1) on average.

1 const myMap = new Map();

2 myMap.set("Alice", "555-1234");

3 myMap.set("Bob", "555-5678");

5 console.log(myMap.get("Alice")); // "555-1234", found in O(1)

6 console.log(myMap.get("Charlie")); // undefined, not there, still O(1)

L1 Create an empty map
L2 Hash 'Alice', store key–value pair at resulting index
L5 Hash 'Alice', jump to index, retrieve value

Core operations on a hash set or map:

Add/set by key — Hash the key, jump to the index, place the value. O(1) average.
Lookup/contains — Hash the key, jump to the index, check if it is there. O(1) average.
Delete by key — Hash the key, jump to the index, remove it. O(1) average.

Together these three operations all share the same shape: compute one index, go there directly. Without hashing, each would require scanning up to n elements; with hashing, none of them do.

The word “average” is important: in a badly designed hash function or under heavy collisions (see the edge case), performance can degrade. But with a good hash function and reasonable load factor (number of items / array size), O(1) average is reliable.

The tradeoff:

Memory cost: A hash set or map uses extra memory—an array to hold the buckets, usually with spare capacity to keep collisions rare. You spend memory to save time.
No order: Iteration over a hash set or map does not follow any predictable order (except that JavaScript Maps iterate in insertion order, a concession to developer expectations).
Keys must be hashable: In some languages, only certain types can be keys (immutable types in Python, for example). JavaScript is more forgiving: you can hash any value.

A hash map maps a key to an array index. hash(key) % size computes the bucket index; the key–value pair lives in that one slot of the backing array. Lookup re-hashes the same key, gets the same index, and jumps straight to the bucket — one multiply-and-mod, no scanning, so O(1) average access.

The code

Building a set — “does the array contain a duplicate?”

Recall from time-and-space: the nested-loop version is O(n²).

function hasDuplicateNested(arr) {
  for (let i = 0; i < arr.length; i++) {
    for (let j = i + 1; j < arr.length; j++) {
      if (arr[i] === arr[j]) return true;
    }
  }
  return false;
}

With a hash set, we do one pass and O(1) lookups:

function hasDuplicateSet(arr) {
  const seen = new Set();
  for (let num of arr) {
    if (seen.has(num)) return true;  // O(1) lookup
    seen.add(num);                   // O(1) add
  }
  return false;
}

Time: O(n) (one loop, each operation O(1)). Space: O(n) (store up to n values).

Building a map — “store and retrieve phone numbers by name”

function createPhonebook() {
  const book = new Map();

  // Add entries
  book.set("Alice", "555-1234");
  book.set("Bob", "555-5678");
  book.set("Charlie", "555-9999");

  // Retrieve
  console.log(book.get("Alice"));    // "555-1234", O(1) lookup
  console.log(book.get("David"));    // undefined, still O(1)

  // Check if exists
  console.log(book.has("Bob"));      // true
  console.log(book.has("Eve"));      // false

  // Delete
  book.delete("Bob");

  // Iterate (insertion order in JS)
  for (const [name, phone] of book) {
    console.log(`${name}: ${phone}`);
  }

  return book;
}

Try the comparison:

Output

Step-by-step trace

Let us trace the hash set approach on [1, 2, 3, 4, 5, 3]:

1 function hasDuplicateSet(arr) {

2 const seen = new Set();

3 for (let num of arr) {

4 if (seen.has(num)) return true;

5 seen.add(num);

6 }

7 return false;

8 }

Notice: we found the duplicate on the 6th element without comparing it to all 5 previous elements. The hash set jumps straight to the bucket for 3 and finds it already there.

Complexity

Hash set and hash map operations:

Operation	Time (average)	Space
Add / Set	O(1)	—
Lookup / Has	O(1)	—
Delete	O(1)	—
Iterate all	O(n)	O(1) aux
Contains duplicate (array)	O(n)	O(n) aux

Why is “average” important?

A hash function should spread keys uniformly across the array. If it does, most operations are O(1). But if many keys hash to the same index (a collision), the buckets get crowded, and you may need to scan within a bucket. This is rare with a good hash function and proper load factor management, but it is possible. Worst case: all keys hash to one bucket, time becomes O(n).

In practice, with JavaScript’s built-in Set and Map, the implementation is optimized, and O(1) average is reliable.

Practice 0 / 5

Why does a hash set answer 'is X present?' in O(1) average time?

You have an array [5, 2, 9, 5, 3]. Which data structure detects the duplicate (5) most efficiently?

In a hash map with entries 'Alice' → '555-1234' and 'Bob' → '555-5678', what does map.get('Alice') do?

What is the main tradeoff of using a hash set instead of storing values in an array?

If you check for duplicates in an array of 1000 elements using a hash set, how many times does the hash function need to be called (approximately)?

▸Edge cases

Collisions: If two different keys hash to the same index, they collide. Resolving collisions (storing multiple key–value pairs at one index, via chaining or open addressing) is the subject of the next lesson in this unit. For now, know that a good hash function and proper load factor (typically keeping utilization below 75%) minimize collisions and maintain O(1) average time.

▸lesson.inset.misconception

“Hash sets are always faster than arrays.” Not quite. Hashing has O(1) average lookup, but the constant factor is larger than array indexing (hashing is a bit of work). For very small sets or when iteration order matters, an array or linked list might be simpler. Hash sets shine when you have large collections and frequent lookups.

Check yourself

Quiz

Why does hashing solve the 'find a duplicate in O(n) time' problem, while a nested loop takes O(n²)?

Recap

A hash set and hash map are data structures that trade memory for speed.

Key facts:

Hash function: Maps a key to an array index instantly. Core idea: index = hash(key) % arraySize.
O(1) average lookup: Calculate the index, jump there, done. No scanning required (average case).
Hash set: Stores unique values. Answers “is X present?” in O(1) average.
Hash map: Stores key–value pairs. Looks up value by key in O(1) average.
Time–space tradeoff: Spend O(n) memory to save O(n) time (compared to nested-loop O(n²)).
JavaScript tools: Use the built-in Set for unique values, Map for key–value pairs.
Collisions: Different keys hashing to the same index degrade performance. A good hash function and load factor keep this rare.

Now when you see a problem asking “does X exist?” or “how many times did Y appear?”, your first instinct should be to reach for a hash set or map — not a nested loop.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 5 done

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.