Greedy Algorithms

Introduction to greedy algorithms. Discusses the characteristics of problems suitable for greedy solutions. Covers examples like Dijkstra's algorithm, Kruskal's algorithm, and Huffman coding.


Kruskal's Algorithm: Finding the Minimum Spanning Tree

Introduction

In graph theory, a minimum spanning tree (MST) of a connected, weighted graph is a subset of the edges that connects all the vertices together, without any cycles and with the minimum possible total edge weight. Kruskal's algorithm is a greedy algorithm that finds a minimum spanning tree for a weighted, undirected graph. This document provides a comprehensive study of Kruskal's algorithm, including its principles, implementation using disjoint sets (union-find), and analysis of its time complexity.

Explanation of Kruskal's Algorithm

Kruskal's algorithm works by iteratively adding the smallest weight edges to the MST, as long as adding the edge does not create a cycle. Here's a step-by-step breakdown:

  1. Sort the edges: Sort all the edges in the graph in non-decreasing order of their weight.
  2. Initialize MST: Create an empty set to store the edges of the MST.
  3. Iterate through sorted edges: For each edge in the sorted list, check if adding it to the MST would create a cycle.
  4. Cycle Detection: To detect cycles, we use the Disjoint Set data structure (also known as Union-Find). The Disjoint Set keeps track of the connected components in the graph. If the two vertices connected by the current edge belong to the same set, then adding the edge would create a cycle.
  5. Union Operation: If adding the edge does not create a cycle (i.e., the two vertices connected by the edge belong to different sets), add the edge to the MST and merge the two sets containing the vertices using the Union operation of the Disjoint Set data structure.
  6. Termination: Repeat steps 3-5 until all vertices are connected, or until we have added |V| - 1 edges to the MST, where |V| is the number of vertices in the graph.

In essence, Kruskal's algorithm greedily selects the smallest edge that doesn't create a cycle, guaranteeing that the resulting tree has the minimum possible weight.

Implementation using Disjoint Sets (Union-Find)

The efficiency of Kruskal's algorithm heavily relies on the Disjoint Set data structure for cycle detection. The Disjoint Set provides two main operations:

  • Find(x): Determines the set (or connected component) to which element x belongs. It returns the representative of the set.
  • Union(x, y): Merges the sets containing elements x and y into a single set.

A common and efficient implementation of the Disjoint Set uses two techniques:

  • Path Compression: During the Find operation, each node on the path from the input node to the root is directly linked to the root. This flattens the tree and makes subsequent Find operations faster.
  • Union by Rank: Each set maintains a rank, which is an upper bound on the height of the tree. When merging two sets, the set with the smaller rank is attached to the set with the larger rank. If the ranks are equal, one tree is chosen arbitrarily to become the parent, and its rank is incremented.

Pseudocode

 // Disjoint Set Data Structure
            class DisjointSet {
                parent: array of int // Initially, parent[i] = i
                rank: array of int   // Initially, rank[i] = 0

                Find(x): int {
                    if parent[x] != x:
                        parent[x] = Find(parent[x]) // Path Compression
                    return parent[x]
                }

                Union(x, y): void {
                    rootX = Find(x)
                    rootY = Find(y)

                    if rootX != rootY:
                        if rank[rootX] < rank[rootY]:
                            parent[rootX] = rootY
                        else if rank[rootX] > rank[rootY]:
                            parent[rootY] = rootX
                        else:
                            parent[rootY] = rootX
                            rank[rootX] += 1
                    }
                }
            }

            // Kruskal's Algorithm
            Kruskal(graph): set of edges {
                MST = empty set
                edges = graph.edges sorted by weight (non-decreasing)
                ds = new DisjointSet(graph.vertices)

                for each edge (u, v, weight) in edges:
                    if ds.Find(u) != ds.Find(v):
                        MST.add(edge)
                        ds.Union(u, v)

                return MST
            } 

Efficiency Analysis

Time Complexity

The time complexity of Kruskal's algorithm is primarily determined by the sorting step and the Disjoint Set operations.

  • Sorting Edges: Sorting the edges takes O(E log E) time, where E is the number of edges. Since in a connected graph E >= V-1, and in the worst case E can be V2, O(E log E) is equivalent to O(E log V).
  • Disjoint Set Operations: With path compression and union by rank, the Find and Union operations take almost constant time, more precisely, O(α(V)) amortized time per operation, where α(V) is the inverse Ackermann function, which grows extremely slowly and can be considered practically constant for any reasonable input size. We perform these operations at most 2E times (E iterations, each potentially calling Find twice and Union once). Therefore, the total time complexity of the Disjoint Set operations is O(E α(V)).

Therefore, the overall time complexity of Kruskal's algorithm is O(E log E) + O(E α(V)). Since O(α(V)) is practically constant, the dominating term is O(E log E), which is often written as O(E log V).

Space Complexity

The space complexity is determined by:

  • Storing the edges: O(E)
  • Disjoint Set data structure: O(V) for storing the parent and rank arrays.

Therefore, the overall space complexity is O(E + V). In many cases, E will be larger than V, making the space complexity effectively O(E).

Conclusion

Kruskal's algorithm is an efficient and widely used algorithm for finding the Minimum Spanning Tree of a graph. Its greedy approach, coupled with the efficient Disjoint Set data structure, allows it to solve the MST problem effectively. The use of Disjoint Sets makes Kruskal's Algorithm particularly well suited for sparse graphs (graphs with relatively few edges compared to the number of vertices). While other algorithms exist, such as Prim's algorithm, Kruskal's algorithm offers a distinct advantage in certain scenarios, particularly where edges are readily available and sorted.