Algorithms from zero ALG · 09 · 08

Union-Find (Disjoint Set Union): path compression and union by rank

DSU keeps a partition via a parent[] forest. find(x) returns a set''''s root; union(a,b) merges roots. Union by rank keeps trees shallow; path compression flattens them. Together: amortized O(alpha(n)), near-constant. Powers dynamic connectivity, cycle detection, Kruskal''''s MST

ALG ◷ 28 min

In the connected-components lesson you counted the pieces of a graph by looping over vertices and launching BFS/DFS from each unvisited one. That works perfectly—but only if you already have the whole graph. What if edges arrive one at a time, and after each one you must answer “are these two vertices in the same piece yet?” Recomputing all components with a fresh BFS after every single edge is wasteful. Union-Find—also called the Disjoint Set Union (DSU)—is the data structure built for exactly this. It maintains a partition of elements into disjoint sets and answers “same set?” and “merge these two sets” in near-constant time. It is the engine behind dynamic connectivity, undirected cycle detection, and Kruskal’s minimum spanning tree.

Goal

After this lesson you can implement DSU as a parent[] forest where each element starts as its own root. You can write find(x) (return the representative/root of x’s set) and union(a, b) (merge two sets), and explain connected(a, b) as find(a) === find(b). You understand the two optimizations: union by rank/size (attach the shallower tree under the deeper root) and path compression (during find, repoint nodes toward the root). You know that without them find is O(n) on a degenerate chain, and that with both together each operation is amortized O(alpha(n))—where alpha is the inverse Ackermann function, below 5 for any input you will ever see—so effectively near-constant, but not literally O(1) per call. You can name the applications: dynamic connectivity, cycle detection in an undirected graph, Kruskal’s MST, and counting components incrementally. You can contrast DSU (incremental, edges arrive over time) with BFS/DFS components (static, recompute the whole thing).

The idea

What problem does Union-Find solve?

You have n elements numbered 0..n-1. They are grouped into disjoint sets (no element is in two sets). Two operations must be fast:

find(x) — return a canonical representative (the “root”) of the set that contains x. Two elements are in the same set exactly when they share the same root.
union(a, b) — merge the set containing a with the set containing b into one set.

A derived query, connected(a, b), is just find(a) === find(b).

The parent[] forest

Why a forest of trees instead of a simple label array? Because merging two label arrays naively costs O(n)—you have to relabel every element in one group. Trees let you merge in O(1) by just linking one root under the other. Each set is stored as a tree. Every element has a parent. The root of a tree is the element whose parent is itself—that root is the set’s representative. Initially every element is its own parent, so you have n singleton trees:

parent: index  0   1   2   3   4   5
        value  0   1   2   3   4   5     (each points to itself = its own root)

Forest:  (0)  (1)  (2)  (3)  (4)  (5)    six separate sets

find(x) walks up parent pointers until it reaches the root. union(a, b) finds both roots and points one root at the other.

Naive union and why trees can degenerate

If you always attach root a under root b blindly, you can build a long chain:

union(1,0), union(2,1), union(3,2), union(4,3):

        0          find(4) must walk: 4 -> 3 -> 2 -> 1 -> 0
        ^          That is 4 hops for n = 5 elements.
        1          A chain of n nodes makes find O(n) — as slow as a linked list.
        ^
        2
        ^
        3
        ^
        4

The whole point of DSU is to keep these trees shallow. Two independent tricks do that.

Optimization 1: union by rank (or size)

Keep an extra rank[] array (an upper bound on a tree’s height). When merging two roots, attach the one with the smaller rank under the one with the larger rank. Equal ranks: pick either as the new root and bump its rank by 1. This keeps trees logarithmically shallow on its own—a tree of rank r has at least 2^r elements, so height never exceeds log2(n).

Union by size (track element counts, attach the smaller tree under the larger) gives the same bound and is interchangeable. We use rank here.

Two sets, roots A (rank 1) and B (rank 0):

   A (rank 1)        B (rank 0)         union(A, B): rank[A] > rank[B]
   |                                    -> attach B under A. Rank[A] stays 1.
   x

   A (rank 1)
   | \
   x  B            shallow, not a chain

Optimization 2: path compression

During find(x), after you locate the root, repoint the nodes you walked over so they point closer to (ideally directly at) the root. Next time, those nodes reach the root in one hop. A simple, effective variant is path halving: make every node on the path point to its grandparent as you climb.

Before find(4):  0           After find(4) with compression:
                 ^
                 1                0
                 ^             / / | \
                 2            1 2  3  4     all now point near the root
                 ^
                 3            Subsequent find(4) is nearly O(1).
                 ^
                 4

Worked example: build, union, query

Start with 6 singletons. Apply union(0,1), union(2,3), union(1,3):

Initial:  (0) (1) (2) (3) (4) (5)     6 sets

union(0,1): roots 0 and 1, equal rank.
            parent[1] = 0, rank[0] = 1.        -> {0,1} root 0

union(2,3): roots 2 and 3, equal rank.
            parent[3] = 2, rank[2] = 1.        -> {2,3} root 2

union(1,3): find(1) = 0 (rank 1), find(3) = 2 (rank 1), equal.
            parent[2] = 0, rank[0] = 2.        -> {0,1,2,3} root 0

Forest now:
        0  (rank 2)
      / | \
     1  2  ...        and 3 hangs under 2
        |
        3

connected(1, 3) = (find(1) == find(3)) = (0 == 0) = true
connected(1, 4) = (0 == 4) = false
Number of sets: started at 6, did 3 successful unions, 6 - 3 = 3 sets: {0,1,2,3}, {4}, {5}.

Each successful union drops the set count by one. Maintaining a count field gives you the number of connected components for free, updated incrementally as edges arrive.

DSU vs. BFS/DFS components

Both answer “which piece is this in?”, but they fit different shapes of problem:

BFS/DFS components (previous lesson): static. You have all edges, you traverse once, you label every vertex. Re-adding an edge means recomputing.
Union-Find: incremental / dynamic. Edges arrive over time; after each union you can immediately query connectivity. No re-traversal. This is why DSU is the natural fit for Kruskal’s MST (process edges in weight order) and online connectivity.

Each set is a tree rooted at its representative. union links one root under the other; find walks parent pointers to the root; path compression re-points the walked nodes straight to the root, flattening the tree so operations stay amortized near-O(1).

The code

DSU with path compression and union by rank

1 class DSU {

2 constructor(n) {

3 // Each element starts as its own parent (a singleton set)

4 this.parent = Array.from({ length: n }, (_, i) => i);

5 this.rank = new Array(n).fill(0); // upper bound on tree height

6 this.count = n; // number of disjoint sets

7 }

9 find(x) {

10 // Walk to the root; path halving compresses as we climb

11 while (this.parent[x] !== x) {

12 this.parent[x] = this.parent[this.parent[x]]; // point x at its grandparent

13 x = this.parent[x];

14 }

15 return x;

16 }

18 union(a, b) {

19 let ra = this.find(a);

20 let rb = this.find(b);

21 if (ra === rb) return false; // already one set: nothing merges

22 // Union by rank: attach the smaller-rank tree under the larger

23 if (this.rank[ra] < this.rank[rb]) { const t = ra; ra = rb; rb = t; }

24 this.parent[rb] = ra;

25 if (this.rank[ra] === this.rank[rb]) this.rank[ra]++;

26 this.count--; // two sets became one

27 return true;

28 }

30 connected(a, b) {

31 return this.find(a) === this.find(b);

32 }

33 }

L4 parent[i] = i: every element is initially its own root.
L5 rank approximates tree height; only roots' ranks matter.
L6 Track the live set count so component counting is O(1).
L11 Loop until x is a root (parent points to itself).
L12 Path halving: repoint x to its grandparent, flattening the tree.
L19 Find both roots first; if equal, the sets are already merged.
L22 Union by rank: ensure ra is the deeper (larger-rank) root.
L23 Hang the shallower root rb under the deeper root ra.
L24 Equal ranks: the merged tree got one level taller — bump rank.
L25 A successful union reduces the number of disjoint sets by one.

Key point: both optimizations are needed for the alpha(n) bound

Union by rank alone caps height at O(log n). Path compression alone also helps a lot. Used together, the amortized cost per operation collapses to O(alpha(n))—the inverse Ackermann function, which is below 5 for any n that fits in the universe. Drop both and find is O(n) on a degenerate chain.

Running DSU: build sets, union, query, count

class DSU {
constructor(n) {
  this.parent = Array.from({ length: n }, (_, i) => i);
  this.rank = new Array(n).fill(0);
  this.count = n;
}
find(x) {
  while (this.parent[x] !== x) {
    this.parent[x] = this.parent[this.parent[x]];
    x = this.parent[x];
  }
  return x;
}
union(a, b) {
  let ra = this.find(a);
  let rb = this.find(b);
  if (ra === rb) return false;
  if (this.rank[ra] < this.rank[rb]) { const t = ra; ra = rb; rb = t; }
  this.parent[rb] = ra;
  if (this.rank[ra] === this.rank[rb]) this.rank[ra]++;
  this.count--;
  return true;
}
connected(a, b) {
  return this.find(a) === this.find(b);
}
}

const dsu = new DSU(7);
dsu.union(0, 1);
dsu.union(1, 2);   // {0,1,2}
dsu.union(3, 4);   // {3,4}
dsu.union(5, 6);   // {5,6}

console.log("find(2) =", dsu.find(2));
console.log("connected(0, 2) =", dsu.connected(0, 2));
console.log("connected(2, 4) =", dsu.connected(2, 4));
console.log("union(4, 6) merged? " + dsu.union(4, 6)); // {3,4,5,6}
console.log("connected(3, 5) =", dsu.connected(3, 5));
console.log("union(3, 6) merged? " + dsu.union(3, 6)); // same set already
console.log("components = " + dsu.count);

Output

Step-by-step trace

1 dsu = new DSU(6); // parent[i] = i, rank = 0, count = 6

2 dsu.union(0, 1);

3 dsu.union(2, 3);

4 dsu.union(1, 3);

5 dsu.connected(1, 3);

Complexity

Without optimizations: find is O(n) worst case

A plain parent[] forest with naive union can build a chain of n elements (each new root attached under the previous one). Then find on the deepest element walks n - 1 parent pointers: O(n) per operation. A sequence of m operations costs up to O(m * n)—no better than a linked list.

Naive chain (n = 5):  0 <- 1 <- 2 <- 3 <- 4
find(4) walks 4 -> 3 -> 2 -> 1 -> 0  =  4 hops  =  n - 1  =  O(n)

Union by rank alone: O(log n)

Attaching the smaller-rank tree under the larger guarantees a tree of rank r holds at least 2^r elements, so height never exceeds log2(n). Each find is then O(log n).

Both together: amortized O(alpha(n)) — near-constant

Combine union by rank with path compression and the amortized cost per operation drops to O(alpha(n)), where alpha is the inverse Ackermann function. The Ackermann function grows so explosively that its inverse grows almost imperceptibly: alpha(n) is at most 4 (commonly stated as < 5) for every n up to astronomically large values—far beyond the number of atoms in the universe. So in practice each find/union behaves like a near-constant-time operation.

Be precise about the claim. This is amortized O(alpha(n)) averaged over a sequence of operations, not a guaranteed O(1) for every single call, and it is provably not true constant time (Tarjan proved an alpha(n) lower bound for this style of structure). It is also better than O(log n)—do not state O(log n) as the optimized bound.

n              alpha(n)
2              1
4              2
16             3
65536          4
2^65536        4   (still 4 — astronomically large input)

Space: O(n)

Two arrays of length n (parent and rank), plus an optional count: O(n) total. find and union use O(1) extra space (the iterative find with path halving needs no recursion stack).

Practice 0 / 6

What does find(x) return in a Union-Find structure?

What is the purpose of union by rank (or size)?

What does path compression do during find(x)?

With BOTH union by rank and path compression, what is the amortized time per operation?

Why is Union-Find a better fit than re-running BFS/DFS for dynamic connectivity (edges arriving over time)?

You start with 8 disjoint singletons and perform 3 successful unions (each merging two different sets). How many disjoint sets remain?

▸Why this works

Why find(a) === find(b) is the connectivity test. Every element in a set reaches the same root by walking parent pointers, and roots are unique per set. So two elements are connected if and only if their finds land on the same root. This is also how undirected cycle detection works: process each edge (u, v); if find(u) === find(v) the edge closes a cycle (both endpoints already in one set); otherwise union them. Kruskal’s MST uses the exact same check to reject edges that would form a cycle.

▸Common mistake

Comparing raw parents instead of roots. A common bug is to test parent[a] === parent[b] or to union by writing parent[a] = b (an element, not its root). Both are wrong: connectivity is about roots, not immediate parents. Always call find on both elements first, then link the two roots. Linking non-roots corrupts the forest and breaks future queries.

▸Edge cases

Union on elements already in the same set. If find(a) === find(b), the elements are already merged. A correct union must detect this and do nothing (here it returns false and leaves count unchanged). Forgetting this check decrements the set count wrongly and, in cycle detection, you would miss that the edge actually closed a cycle. In this lesson the early if (ra === rb) return false; handles it.

▸More practice

Union by size as a drop-in alternative. Instead of rank[], keep size[] (number of elements under each root, initialized to 1). On union, attach the smaller-size root under the larger and add the sizes. The complexity bound is identical to union by rank, and size is often handier because you can query “how big is x’s component?” in O(1) by reading size[find(x)]. Pick whichever the problem needs; never skip both, or trees can degenerate to O(n).

Check yourself

Quiz

Explain what Union-Find (DSU) is, how find and union work on the parent[] forest, what union by rank and path compression do, the correct optimized complexity, and when to use DSU over BFS/DFS.

Recap

Union-Find (Disjoint Set Union) maintains a partition of elements as a parent[] forest: each set is a tree, and the root is its representative.

Operations: find(x) walks parent pointers to the root; union(a, b) links one root under the other; connected(a, b) is find(a) === find(b). Track a count to read off the number of connected components incrementally.

Two optimizations: union by rank/size attaches the shallower tree under the deeper root; path compression flattens the path during find. Together they give amortized O(alpha(n))—the inverse Ackermann function, below 5 for any practical n, so near-constant (but not true O(1), and better than O(log n)). Without them, find is O(n) on a degenerate chain. Space: O(n).

When to reach for it: dynamic connectivity, undirected cycle detection, Kruskal’s MST, and incremental component counting—anywhere edges arrive over time and you would otherwise recompute BFS/DFS components from scratch. Now when you see an edge stream and a “same group?” query, you will know to build a DSU with both optimizations—and never re-run a full BFS just to answer one connectivity question.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 5 done

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.