Minimum Spanning Trees

Explores algorithms for finding minimum spanning trees (MSTs) in graphs, including Kruskal's algorithm and Prim's algorithm. Discusses their correctness and performance.


Kruskal's Algorithm for Minimum Spanning Trees

Introduction

Kruskal's algorithm is a greedy algorithm used to find the Minimum Spanning Tree (MST) for a weighted, undirected graph. An MST is a subset of the edges that connects all the vertices together, without any cycles, and with the minimum possible total edge weight. In other words, it finds the cheapest way to connect all the vertices.

Explanation of Kruskal's Algorithm

The core idea behind Kruskal's algorithm is to iteratively add the cheapest edge to the MST, as long as adding the edge doesn't create a cycle. This is achieved by maintaining a disjoint-set data structure (also known as a union-find data structure) to keep track of connected components.

Detailed Explanation of Kruskal's Algorithm

Here's a step-by-step breakdown of Kruskal's Algorithm:

  1. Sort the edges by weight: Sort all the edges of the graph in non-decreasing order of their weights. This is a crucial step as it ensures that we are always considering the cheapest available edge.
  2. Initialize a Disjoint Set Union (DSU) data structure: Create a DSU data structure for each vertex in the graph. Initially, each vertex is in its own separate set (i.e., each vertex is its own connected component). This data structure will be used to efficiently determine if adding an edge will create a cycle.
  3. Iterate through the sorted edges: For each edge (u, v) in the sorted list:
    • Find the sets of u and v: Use the find() operation of the DSU to determine the representative (root) vertex of the sets containing u and v.
    • Check for cycles: If find(u) is not equal to find(v), it means that u and v belong to different connected components. Adding the edge (u, v) will not create a cycle.
      • Add the edge to the MST: Add the edge (u, v) to the MST.
      • Union the sets of u and v: Use the union() operation of the DSU to merge the sets containing u and v into a single set. This effectively merges the two connected components into one.
    • If find(u) is equal to find(v): This means that u and v already belong to the same connected component. Adding the edge (u, v) would create a cycle. Therefore, the edge is skipped.
  4. The MST is complete: After iterating through all the edges, the edges that were added to the MST form the minimum spanning tree of the graph.

Implementation Details using Disjoint Set Union (DSU)

The efficiency of Kruskal's algorithm hinges on the efficient implementation of the Disjoint Set Union (DSU) data structure. DSU provides two crucial operations:

  • find(x): Determines the representative (root) of the set that element x belongs to. This operation is often implemented using path compression to improve efficiency. Path compression makes future find() operations faster by flattening the tree structure of the set.
  • union(x, y): Merges the sets containing elements x and y into a single set. This operation is often implemented using union by rank (or union by size) to maintain a balanced tree structure and prevent the trees from becoming too deep.

DSU Implementation (Python):

 class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n  # Used for Union by Rank

    def find(self, x):
        if self.parent[x] != x:
            # Path compression:  Make x's parent directly the root
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        root_x = self.find(x)
        root_y = self.find(y)

        if root_x != root_y:
            # Union by Rank
            if self.rank[root_x] < self.rank[root_y]:
                self.parent[root_x] = root_y
            elif self.rank[root_x] > self.rank[root_y]:
                self.parent[root_y] = root_x
            else:
                self.parent[root_y] = root_x
                self.rank[root_x] += 1


# Example usage within Kruskal's Algorithm:

def kruskal(graph):
    """
    Finds the Minimum Spanning Tree of a graph using Kruskal's algorithm.

    Args:
        graph: A list of tuples, where each tuple represents an edge in the format (weight, u, v).
               u and v are vertex indices (0-based).

    Returns:
        A list of tuples representing the edges in the MST, or None if the graph is empty.
    """

    if not graph:
        return None

    n = max(max(u, v) for weight, u, v in graph) + 1  # Number of vertices

    graph.sort()  # Sort edges by weight

    dsu = DSU(n)
    mst = []

    for weight, u, v in graph:
        if dsu.find(u) != dsu.find(v):
            mst.append((weight, u, v))
            dsu.union(u, v)

    return mst


# Example Graph (represented as a list of edges)
graph = [
    (4, 0, 1),  # Edge between vertex 0 and vertex 1 with weight 4
    (8, 0, 7),
    (11, 1, 7),
    (8, 1, 2),
    (7, 7, 6),
    (1, 7, 8),
    (6, 2, 8),
    (2, 8, 6),
    (7, 2, 3),
    (4, 6, 5),
    (9, 2, 5),
    (10, 3, 5),
    (5, 3, 4)
]

mst = kruskal(graph)

if mst:
    print("Minimum Spanning Tree:")
    total_weight = 0
    for weight, u, v in mst:
        print(f"Edge: ({u}, {v}) Weight: {weight}")
        total_weight += weight
    print(f"Total Weight: {total_weight}")
else:
    print("Graph is empty.") 

Correctness Proof

We prove the correctness of Kruskal's algorithm using contradiction. Assume that Kruskal's algorithm does *not* produce the MST. Let T be the MST produced by Kruskal's and T' be the actual MST. There must be at least one edge in T that is not in T'. Let 'e' be the first edge that Kruskal's algorithm adds to T that is not in T'.

Since T' is a spanning tree, adding 'e' to T' must create a cycle. Let 'C' be this cycle in T' + {e}. Because T is a tree (no cycles), there must be at least one edge 'e'' in C that is not in T.

When Kruskal's algorithm considered adding 'e', 'e'' was *not* added. This is only possible if adding 'e'' would have created a cycle in the edges of T already selected. This implies that the endpoints of 'e'' were already in the same connected component at that point in the algorithm.

But since the endpoints of 'e'' are now in the same cycle 'C' in T' + {e}, they must have been connected by a path in T'. This path, combined with 'e'', forms a cycle. However, T' is an MST and therefore cannot contain cycles. This seems contradictory, but we need to factor the weighing of 'e' and 'e''

Since Kruskal chose 'e' over 'e'', it must be that weight(e) <= weight(e'). Now consider forming T'' by removing 'e'' from T' + {e}. This results in a new spanning tree T'' with weight weight(T') - weight(e') + weight(e). Because weight(e) <= weight(e'), weight(T'') <= weight(T'). Given that T' is a MST then weight(T'') = weight(T')

We can repeat this process until T and T' are the same. This is because we know that the edges of T and T' are of the same weights, thus we just have to rearrange them.

Therefore, by contradiction, Kruskal's algorithm must produce a minimum spanning tree.