Minimum Spanning Trees
Explores algorithms for finding minimum spanning trees (MSTs) in graphs, including Kruskal's algorithm and Prim's algorithm. Discusses their correctness and performance.
Kruskal's Algorithm for Minimum Spanning Trees
Introduction
Kruskal's algorithm is a greedy algorithm used to find the Minimum Spanning Tree (MST) for a weighted, undirected graph. An MST is a subset of the edges that connects all the vertices together, without any cycles, and with the minimum possible total edge weight. In other words, it finds the cheapest way to connect all the vertices.
Explanation of Kruskal's Algorithm
The core idea behind Kruskal's algorithm is to iteratively add the cheapest edge to the MST, as long as adding the edge doesn't create a cycle. This is achieved by maintaining a disjoint-set data structure (also known as a union-find data structure) to keep track of connected components.
Detailed Explanation of Kruskal's Algorithm
Here's a step-by-step breakdown of Kruskal's Algorithm:
- Sort the edges by weight: Sort all the edges of the graph in non-decreasing order of their weights. This is a crucial step as it ensures that we are always considering the cheapest available edge.
- Initialize a Disjoint Set Union (DSU) data structure: Create a DSU data structure for each vertex in the graph. Initially, each vertex is in its own separate set (i.e., each vertex is its own connected component). This data structure will be used to efficiently determine if adding an edge will create a cycle.
- Iterate through the sorted edges: For each edge
(u, v)
in the sorted list:- Find the sets of
u
andv
: Use thefind()
operation of the DSU to determine the representative (root) vertex of the sets containingu
andv
. - Check for cycles: If
find(u)
is not equal tofind(v)
, it means thatu
andv
belong to different connected components. Adding the edge(u, v)
will not create a cycle.- Add the edge to the MST: Add the edge
(u, v)
to the MST. - Union the sets of
u
andv
: Use theunion()
operation of the DSU to merge the sets containingu
andv
into a single set. This effectively merges the two connected components into one.
- Add the edge to the MST: Add the edge
- If
find(u)
is equal tofind(v)
: This means thatu
andv
already belong to the same connected component. Adding the edge(u, v)
would create a cycle. Therefore, the edge is skipped.
- Find the sets of
- The MST is complete: After iterating through all the edges, the edges that were added to the MST form the minimum spanning tree of the graph.
Implementation Details using Disjoint Set Union (DSU)
The efficiency of Kruskal's algorithm hinges on the efficient implementation of the Disjoint Set Union (DSU) data structure. DSU provides two crucial operations:
find(x)
: Determines the representative (root) of the set that elementx
belongs to. This operation is often implemented using path compression to improve efficiency. Path compression makes futurefind()
operations faster by flattening the tree structure of the set.union(x, y)
: Merges the sets containing elementsx
andy
into a single set. This operation is often implemented using union by rank (or union by size) to maintain a balanced tree structure and prevent the trees from becoming too deep.
DSU Implementation (Python):
class DSU:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n # Used for Union by Rank
def find(self, x):
if self.parent[x] != x:
# Path compression: Make x's parent directly the root
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, x, y):
root_x = self.find(x)
root_y = self.find(y)
if root_x != root_y:
# Union by Rank
if self.rank[root_x] < self.rank[root_y]:
self.parent[root_x] = root_y
elif self.rank[root_x] > self.rank[root_y]:
self.parent[root_y] = root_x
else:
self.parent[root_y] = root_x
self.rank[root_x] += 1
# Example usage within Kruskal's Algorithm:
def kruskal(graph):
"""
Finds the Minimum Spanning Tree of a graph using Kruskal's algorithm.
Args:
graph: A list of tuples, where each tuple represents an edge in the format (weight, u, v).
u and v are vertex indices (0-based).
Returns:
A list of tuples representing the edges in the MST, or None if the graph is empty.
"""
if not graph:
return None
n = max(max(u, v) for weight, u, v in graph) + 1 # Number of vertices
graph.sort() # Sort edges by weight
dsu = DSU(n)
mst = []
for weight, u, v in graph:
if dsu.find(u) != dsu.find(v):
mst.append((weight, u, v))
dsu.union(u, v)
return mst
# Example Graph (represented as a list of edges)
graph = [
(4, 0, 1), # Edge between vertex 0 and vertex 1 with weight 4
(8, 0, 7),
(11, 1, 7),
(8, 1, 2),
(7, 7, 6),
(1, 7, 8),
(6, 2, 8),
(2, 8, 6),
(7, 2, 3),
(4, 6, 5),
(9, 2, 5),
(10, 3, 5),
(5, 3, 4)
]
mst = kruskal(graph)
if mst:
print("Minimum Spanning Tree:")
total_weight = 0
for weight, u, v in mst:
print(f"Edge: ({u}, {v}) Weight: {weight}")
total_weight += weight
print(f"Total Weight: {total_weight}")
else:
print("Graph is empty.")
Correctness Proof
We prove the correctness of Kruskal's algorithm using contradiction. Assume that Kruskal's algorithm does *not* produce the MST. Let T be the MST produced by Kruskal's and T' be the actual MST. There must be at least one edge in T that is not in T'. Let 'e' be the first edge that Kruskal's algorithm adds to T that is not in T'.
Since T' is a spanning tree, adding 'e' to T' must create a cycle. Let 'C' be this cycle in T' + {e}. Because T is a tree (no cycles), there must be at least one edge 'e'' in C that is not in T.
When Kruskal's algorithm considered adding 'e', 'e'' was *not* added. This is only possible if adding 'e'' would have created a cycle in the edges of T already selected. This implies that the endpoints of 'e'' were already in the same connected component at that point in the algorithm.
But since the endpoints of 'e'' are now in the same cycle 'C' in T' + {e}, they must have been connected by a path in T'. This path, combined with 'e'', forms a cycle. However, T' is an MST and therefore cannot contain cycles. This seems contradictory, but we need to factor the weighing of 'e' and 'e''
Since Kruskal chose 'e' over 'e'', it must be that weight(e) <= weight(e'). Now consider forming T'' by removing 'e'' from T' + {e}. This results in a new spanning tree T'' with weight weight(T') - weight(e') + weight(e). Because weight(e) <= weight(e'), weight(T'') <= weight(T'). Given that T' is a MST then weight(T'') = weight(T')
We can repeat this process until T and T' are the same. This is because we know that the edges of T and T' are of the same weights, thus we just have to rearrange them.
Therefore, by contradiction, Kruskal's algorithm must produce a minimum spanning tree.