Data Structures for Disjoint Sets | Union Find Algorithm

A Disjoint-Set data structure is a way to keep track of sets that do not overlap with each other. Each set in the collection has a representative (set-number/id), which is a member of the set. It doesn’t matter which member is selected as the representative, as long as it is consistent ie. if we ask for the representative of a dynamic set twice without modifying the set between the requests, we get the same answer both times.

This data structure is useful in situations where we need to group items together based on certain criteria, and then perform operations on these groups. For example, in a network, we might want to group nodes that are connected to each other, and then perform operations on these groups, such as finding the shortest path between two nodes.
Overall, the disjoint-set data structure is a powerful tool for managing collections of sets that do not overlap with each other, and can be used in a wide variety of applications.

Operations on Disjoint data structure

1- Make-Set (Initialise)
Creates different sets for all the members each.

Pseudocode

Make-Set (int N) int *id = new int[N]; for (int i=0; i<N; i++) {  id[i] = i; } return id

2- Union (p, q)
Unites the dynamic sets that contain p and q, say Sp and Sq into a new set that is the union of these two sets. Aforementioned two sets are assumed to be disjoint.

Pseudocode

Union (int p, int q) int i = getRoot(p) int j = getRoot(q) id[i] = j

3- Connected (p, q)
Returns a boolean value (true or false) on the basis that if p and q are in same set or different set.

Pseudocode

Connected (int p, int q) if (getRoot(p) == getRoot(q))  return true else  return false

Note:
Array id[] used above is an array that maintains the id (representative) of each disjoint set. In simple words, it is an int array such that id[i] returns the current set-number (or id) of member i.
Method getRoot() used above gets set-number (or id) of a member i.

Now, we’ll learn about certain Union-Find Algorithms with each having its own pros and cons. These algorithms basically differ in the way we unite two members p and q and the way we check if p and q are in the same set.

Quick Union Implementation
//
//  main.cpp
//  Quick Union
//
//  Created by Himanshu on 20/04/23.
//

#include <iostream>
using namespace std;

class QuickUnion {

private:

int *id;

int getRoot (int i) {

while (i != id[i]) {
i = id[i];
}

return i;
}

public:

QuickUnion (int N) {
id = new int[N];

for (int i=0; i<N; i++) {
id[i] = i;
}
}

bool connected (int p, int q) {
return (getRoot(p) == getRoot(q));
}

void makeUnion (int p, int q) {
int i = getRoot(p);
int j = getRoot(q);

id[i] = j;
}
};

int main() {
QuickUnion quickUnion(4);

cout<<"Are 0th and 1st elements connected?"<<endl;
if (quickUnion.connected(0, 1)) {
cout<<"yes"<<endl;
} else {
cout<<"no"<<endl;
}

quickUnion.makeUnion(0, 1);

cout<<"Are 0th and 1st elements connected?"<<endl;
if (quickUnion.connected(0, 1)) {
cout<<"yes"<<endl;
} else {
cout<<"no"<<endl;
}

return 0;
}


Output

Are 0th and 1st elements connected?noAre 0th and 1st elements connected?yes

This algorithm is called Quick Union, because its makeUnion method performs well on average cases. However makeUnion’s worst-case is still O(N).

Quick Find Implementation
//
//  main.cpp
//  Quick Find
//
//  Created by Himanshu on 20/04/23.
//

#include <iostream>
using namespace std;

class QuickFind {

private:

int *id;
int n;

public:

QuickFind (int N) {
n = N;
id = new int[N];

for (int i=0; i<N; i++) {
id[i] = i;
}
}

bool connected (int p, int q) {
return (id[p] == id[q]);
}

void makeUnion (int p, int q) {
int pid = id[p];
int qid = id[q];

for (int i=0; i<n; i++) {
if (id[i] == pid) {
id[i] = qid;
}
}
}
};

int main() {
QuickFind quickFind(4);

cout<<"Are 0th and 1st elements connected?"<<endl;
if (quickFind.connected(0, 1)) {
cout<<"yes"<<endl;
} else {
cout<<"no"<<endl;
}

quickFind.makeUnion(0, 1);

cout<<"Are 0th and 1st elements connected?"<<endl;
if (quickFind.connected(0, 1)) {
cout<<"yes"<<endl;
} else {
cout<<"no"<<endl;
}

return 0;
}


Output

Are 0th and 1st elements connected?noAre 0th and 1st elements connected?yes

This algorithm is called Quick Find, because its connected (find) method performs well with O(1) time complexity.

Weighted Quick Union Implementation
//
//  main.cpp
//  Weighted Quick Union
//
//  Created by Himanshu on 20/04/23.
//

#include <iostream>
using namespace std;

class WeightedQuickUnion {

private:

int *id, *setSize;

int getRoot (int i) {

while (i != id[i]) {
i = id[i];
}

return i;
}

public:

WeightedQuickUnion (int N) {

id = new int[N];
setSize = new int[N];

for (int i=0; i<N; i++) {
id[i] = i;
setSize[i] = 1;
}
}

bool connected (int p, int q) {
return (getRoot(p) == getRoot(q));
}

void makeUnion (int p, int q) {
int i = getRoot(p);
int j = getRoot(q);

if (i == j) {
return;
}

if (setSize[i] < setSize[j]) {
id[i] = j;
setSize[j] += setSize[i];
} else {
id[j] = i;
setSize[i] += setSize[j];
}
}
};

int main() {
WeightedQuickUnion wtQuickUnion(4);

cout<<"Are 0th and 1st elements connected?"<<endl;
if (wtQuickUnion.connected(0, 1)) {
cout<<"yes"<<endl;
} else {
cout<<"no"<<endl;
}

wtQuickUnion.makeUnion(0, 1);

cout<<"Are 0th and 1st elements connected?"<<endl;
if (wtQuickUnion.connected(0, 1)) {
cout<<"yes"<<endl;
} else {
cout<<"no"<<endl;
}

return 0;
}


Output

Are 0th and 1st elements connected?noAre 0th and 1st elements connected?yes

Note: setSize[] array is used to keep the track of size of all the trees (or disjoint sets). So that we could attach new member to a larger disjoint-set or tree, leading to more wider and small-depth trees.

Weighted Quick Union algorithm performs well for both connected (find) and union operations. Both the operations have approximately O(lgN) time complexity.
The Weighted Quick Union algorithm is more efficient than the normal Quick Union algorithm because it reduces the height of the tree by always attaching the smaller tree to the root of the larger tree. This means that the resulting tree will be more balanced (or wide) and have smaller depths (or height) [as shown in the image above], making the find operation more efficient. In the normal Quick Union algorithm, the trees can become very tall, leading to inefficient find operations. This happens when one side of the tree is much deeper than the other, and finding the root involves traversing many nodes along the way.

Path Compression

Weighted Quick Union algorithm could be optimised further by using a more efficient getRoot() method.
Path compression is a technique to make future find operations faster by making each node in the path point to the root directly.

Path Compression getRoot() Implementation

    int getRoot (int i) {

while (i != id[i]) {
id[i] = id[id[i]];
i = id[i];
}

return i;
}


In this implementation, when the getRoot() function is called, it traverses the path from the given node i to its root, and while doing so, it sets the parent of each node in the path to the root node. This way, the path gets compressed, and the next time we call the getRoot() function on any node in the path, the traversal time will be reduced significantly.

Reference: Coursera