Kruskal's algorithm, minimum spanning trees

Kruskal's algorithm receives a graph G=(V, E) from which it creates a minimum spanning tree G'=(V, E').

What is a minimum spanning tree?

A spanning tree is a subgraph of an undirected and connected graph G. The following conditions must hold for the subgraph:

All nodes contained in the graph G are also contained in the subgraph.
The subgraph is connected and free of cycles.
The set of edges of the subgraph is a subset of the edge set of G.

A spanning tree G' is a minimum spanning tree of G, if there is no spanning tree of G whose sum of all edge weights is smaller than the sum of all edge weights of G'.

Example to spanning trees:

As an example, let the graph G look like this:

Graph to be used as an example - example graph

The following two graphs are both minimum spanning trees of G because they are both spanning trees of G and there is no spanning tree of G whose sum of edge weights is less than 14.

2 minimum spanning trees, both with minimum sum of edge weights

The following graph is a spanning tree of G because the graph is free of cycles and connected and because all nodes from G are also contained in this graph. But the sum of edge weights is 16 and there are 2 spanning trees where the sum of edge weights is 14. Thus, the following graph is a spanning tree of G, but not a minimal one:

spanning tree, but not minimal - sum of edge weights not minimal

The following graph contains a cycle and is therefore not a spanning tree.

the graph contains one cycle and is therefore not a minimum spanning tree

The following graph is not connected and is therefore not a spanning tree.

the graph is not connected and therefore is not a minimum spanning tree

How Kruskal's algorithm works

Pseudocode

Function Kruskal(V, E)
    E' := {}
    Q := copy of E
    Sort Q ascending by edge weight
 
    While Q is not empty
        e := shortest edge in Q
        Q.remove(e)
        If the graph (V, E' ∪ {e}) does not form a cycle
            E' := E' ∪ {e}

Input:

Kruskal's algorithm obtains an undirected and connected graph G=(V, E), where V is the set of nodes and E is the set of edges.

Important values and sets:

E': the set of edges of the evolving minimum spanning tree
Q: the set of edges for which it has not yet been decided whether they should be inserted into E' or not
e: the edge that is being decided whether to insert into E'.

Goal:

After the execution of Kruskal's algorithm (V, E') forms a minimum spanning tree of G.

Initialization:

At the beginning there are no edges in the set of edges E' yet. All edges from E will be inserted into Q and Q is sorted in ascending order according to the edge weights.

Loop:

In each iteration, the edge with the smallest edge weight is removed from Q. If this can be inserted into E' without causing a cycle in the evolving minimum spanning tree, the edge is inserted into E'.

Example of the Kruskal algorithm:

As an example, a minimum spanning tree of the graph from the previous chapter is to be created using the Kruskal algorithm.

Example graph to which Kruskal is to be applied

The edge set of the new graph E' is still empty at the beginning. All edges of the example graph are inserted into Q and Q is sorted according to the edge weights in ascending order.

in the initialization phase the edges are sorted and E' is empty

In the first iteration, the edge {E, C} is removed from Q. Since no cycle forms when inserted into the new graph, it is inserted into E'.

in iteration 1 of Kruskal the edge with smallest weight is inserted in E'.

Exactly the same is done in the second iteration. The edge {D, C}, i.e. the edge with the smallest edge weight, is removed from Q. And since this can also be inserted into E' without forming a cycle, it is inserted in E'.

the second edge is removed from Q and inserted into E'.

Also in the next 3 iterations no cycle is formed and thus the edges {B, D}, {A, C} and {D, F} are inserted in E'.

the third edge is added to the spanning tree

Kruskal algorithm for determination of MST

In the sixth iteration, a cycle would occur if {F, E} would be inserted into E'. So it is not inserted in E' after it was removed from Q.

Edge would create a cycle and thus is not inserted into MST

Also, in the seventh iteration, a cycle would occur if {B, A} would insert into E'. So also {B, A} will not be inserted into E'.

Edge is not inserted because otherwise a cycle is generated

Thus, the developed minimum spanning tree looks as follows:

the minimum spanning tree generated with the Kruskal algorithm

Check for cycles

In principle, the algorithm works very simply. So far, however, we have ignored how the check for cycles works. A so-called "union-find structure" is used for this.

Assume that a minimum spanning tree is to be determined for the following graph:

Example graph to explain the UnionFind algorithm

After 9 iterations of the Kruskal algorithm, the graph looks like this, where the gray edges are the edges for which it has not yet been decided whether to insert them into E' or not:

Intermediate state from the execution of Kruskal's algorithm

Now suppose that exactly those nodes belong to a common group with a common color that are directly or indirectly connected to each other by edges in E'.

Now, how do we decide for an edge whether it should be inserted into E'? If the two nodes belonging to the edge have the same color, it means that there is already a path between the two nodes in the graph (V, E'). If the edge were inserted, the two nodes would be connected by both the already existing path and the new edge, thus creating a cycle.
If the two nodes have different colors, it means that there is no path between them yet and thus inserting the edge in E' would not result in a cycle because the new edge would be the only connection between the two nodes.
For the algorithm, this means that the edge is inserted exactly when both nodes are of a different color. In order for the algorithm to work reliably in future iterations, the two groups of the two nodes connected by the new edge must be combined into one group.

In practice, of course, no colors are assigned to the groups. Instead, there is a node for each group that represents this group. The merging of 2 groups would however be associated with high effort for very large groups, if for one of the two groups a new representative would be assigned to each node individually. Therefore, a tree-like structure is introduced, where the node representing the group is the root. For each node v∈V a variable predecessor[v] is introduced. If v is a root, predecessor[v] is assigned the symbolic value null. Otherwise predecessor[v] is a node on the path between v and the root, where predecessor[v] may also be the root but not v itself. This ensures that for each node the root can be reached via the predecessor nodes and that the root can recognize itself as a root.

In union-find structures there are 2 important methods: find and union.
find(v) returns the root of the tree in which v is located.
union(u, v) merges the tree of u and the tree of v into one tree. To do this, the predecessor reference from one root is set to the root of the other node.

The best way to show this is with an example. The input graph is again the graph from above:

In the following figures, an arrow from the node u to the node v meansthat predecessor[u] == v holds. The reference from the root to null is not shown.
At the beginning of Kruskal's algorithm E' is still empty and therefore each node is in its own group with itself as root.

all nodes are in their own group after initialization

In the first iteration it is checked whether the edge {K, I} should be inserted into E'. Da finden(K) und finden(I) unterschiedliche Knoten zurück liefern, wird zum einen die KanteSince find(K) and find(I) return different nodes, on the one hand the edge {K, I} is inserted into E' and on the other hand either the predecessor of K is set to I or vice versa. In this case, K becomes the predecessor of I.

If the edge {F, G} is to be inserted next, are again find(F) and find(G) compared with each other and since find(F) != find(G) holds, the edge is inserted in E' and the two trees are united in the union-find structure.

In the third iteration, the edge {I, L} is inserted into E', because find(I) = K != L = find(L) holds. It makes sense to keep the trees as flat as possible. Because the tree with the root L has a lower height than the tree with the root K, L is appended to K.

And this is how it continues. After 9 iterations, the evolving spanning tree looks like this:

Intermediate result from the execution of Kruskal's algorithm

And the Union Find structure:

the UnionFind data structure after 9 iterations

Next, we check whether the edge {G, H} should be inserted into E'. Since find(G) = G == G = find(H) holds, the edge is not inserted in E' and the union-find structure is not changed either. The same happens when trying to insert {B, D} into E'.

But with the edge {L, H} result in different roots again and therefore the edge can be inserted again into E' and the trees in the Union-Find structure are merged again.

Also the edge {C, J} can be inserted into E' and the Union-Find structure is adjusted again.

final step of the Union Find data structure

D and F have the same root element and the same applies to J and K. Therefore, {D, F} and {J, K} are not included in the minimum spanning tree.

Pseudocode Kruskal with Union-Find

Now the pseudo code from above is to be extended by code for the union-find algorithm:

Function find(v)
    If predecessor[v] == null
        return v
    Else
        return find(predecessor[v])

Wenn es sich um einen Wurzelknoten handelt, wird der Knoten selber zurück gegeben. Otherwise find is called with the predecessor node.

Function union(u, v)
    root_u := find[u]
    root_v := find[v]
 
    If rank[root_u] > rank[root_v]
        predecessor[root_v] := root_u
    Else if rank[root_u] < rank[root_v]
        predecessor[root_u] := root_v
    Else
        predecessor[root_v] := root_u
        rank[root_u]++

For each node v∈V a variable rank[v] was introducedwhich indicates how high the tree is. To avoid traversing too many nodes when calling find, the lower tree should always be inserted into the higher one. The height of the tree to whose roots the other tree is attached changes only if both trees are of the same height.

Function Kruskal(V, E)
    E' := {}
    Q := copy of E
    Sort Q ascending by edge weight
    For each node v ∈ V
        predecessor[v] := null
        rank[v] := 1
 
    As long as Q is not empty
        {u, v} := shortest edge in Q
        Q.remove({u, v})
        If find(u) != find(v)
            E' := E' ∪ {{u, v}}
            union(u, v)

Compress Union-Find

Even if you always hang the smaller tree at the root from the larger tree to keep the trees flatter, the trees can get quite tall on very large graphs. When calling find, the entire chain must then be gone through each time. Of course, this requires a lot of effort. This effort could be reduced significantly if the predecessor reference would point directly to the root. As already described above, however, each node of one tree should not be assigned the root of the other tree individually as a predecessor, each time 2 trees are united. A good compromise is that whenever you call the find method where the method has not been applied to the root element, the predecessor is set to the root node, which is returned anyway.

As an example, find(H) should be called with the following tree:

last step of the Union Find data structure

When calling find(H) a total of 4 nodes are traversed. If every traversed node (except the root) is appended directly to the root, the tree afterwards looks like this:

Union Find - Calling find() with compression

If find(H) were called again, only 2 nodes would have to be traversed.

The find method can be modified as follows for this purpose:

Function find(v)
    If predecessor[v] == null
        return v
    Else
        predecessor[v] := find(predecessor[v])
        return predecessor[v]

Kruskal's Algorithm

node names with:

type of edges:

What is a minimum spanning tree?

Example to spanning trees:

How Kruskal's algorithm works

Pseudocode

Input:

Important values and sets:

Goal:

Initialization:

Loop:

Example of the Kruskal algorithm:

Check for cycles

Pseudocode Kruskal with Union-Find

Compress Union-Find

Implementation

Kruskal in Java:

GraphAlgorithms.java

TestClass.java

UndirectedGraph.java

Node.java

UndirectedEdge.java

EdgeComparator.java

good explanatory videos on Youtube