Sorting Algorithms

A comprehensive review of common sorting algorithms, including Bubble Sort, Insertion Sort, Selection Sort, Merge Sort, Quick Sort, Heap Sort, and Radix Sort. Focuses on their performance characteristics and suitability for different data sets.


Heap Sort

Introduction to Heap Sort

Heap sort is a comparison-based sorting algorithm that leverages the properties of a binary heap data structure to efficiently sort an array. It is known for its guaranteed O(n log n) time complexity and its in-place sorting capability (with a slight caveat). This document provides a comprehensive explanation of Heap Sort, covering the heap data structure, heapify operation, step-by-step example, time and space complexity analysis, and its advantages and disadvantages.

Heap Data Structure

A heap is a specialized tree-based data structure that satisfies the heap property. It can be implemented using an array. There are two main types of heaps:

  • Max Heap: For every node i, the value of the node is greater than or equal to the value of its children (heap[i] >= heap[left(i)] and heap[i] >= heap[right(i)]). The root node contains the largest element.
  • Min Heap: For every node i, the value of the node is less than or equal to the value of its children (heap[i] <= heap[left(i)] and heap[i] <= heap[right(i)]). The root node contains the smallest element.

For Heap Sort, we typically use a Max Heap.

Given an index i of a node in the array representation of a heap:

  • Left child: left(i) = 2*i + 1
  • Right child: right(i) = 2*i + 2
  • Parent: parent(i) = floor((i-1)/2)

Heapify Operation

The heapify operation (also known as max_heapify in the context of a Max Heap) is a crucial step in heap sort. It ensures that a subtree rooted at a given node satisfies the heap property. In essence, it moves a node down the tree until it is in the correct position where it is greater than or equal to its children.

Here's how the heapify operation works:

  1. Take a node at index i in the array.
  2. Compare the node's value with the values of its left and right children.
  3. If either child is larger than the node, find the largest of the node and its children.
  4. If the largest element is not the node, swap the node with the largest child.
  5. Recursively call heapify on the affected child's subtree to ensure it also satisfies the heap property.

Here's a pseudocode representation of the heapify operation:

 function heapify(array, n, i):
            largest = i  // Initialize largest as root
            left = 2*i + 1
            right = 2*i + 2

            // If left child is larger than root
            if left < n and array[left] > array[largest]:
                largest = left

            // If right child is larger than largest so far
            if right < n and array[right] > array[largest]:
                largest = right

            // If largest is not root
            if largest != i:
                swap(array[i], array[largest])

                // Recursively heapify the affected sub-tree
                heapify(array, n, largest) 

Heap Sort Algorithm

The Heap Sort algorithm consists of two main phases:

  1. Build Heap: Convert the input array into a Max Heap. This is done by applying the heapify operation to all non-leaf nodes in reverse level order, starting from the last non-leaf node. The last non-leaf node is at index (n/2) - 1, where n is the size of the array.
  2. Sort: Repeatedly extract the maximum element (root) from the heap, place it at the end of the sorted portion of the array, and then re-heapify the remaining heap.

Here's the pseudocode for the complete Heap Sort algorithm:

 function heapSort(array, n):
            // Build Max Heap
            for i from n/2 - 1 down to 0:
                heapify(array, n, i)

            // Extract elements one by one from heap
            for i from n-1 down to 0:
                // Swap current root with end element
                swap(array[0], array[i])

                // heapify the reduced heap
                heapify(array, i, 0) // i is now the new size of the heap 

Step-by-Step Example

Let's sort the array [4, 10, 3, 5, 1] using Heap Sort.

  1. Initial Array: [4, 10, 3, 5, 1]
  2. Build Max Heap:
    • Start from the last non-leaf node (index 1, value 10): heapify(arr, 5, 1). 10 is already larger than its children (5 and 1), so no change.
    • Move to the next non-leaf node (index 0, value 4): heapify(arr, 5, 0). 4 is smaller than 10. Swap 4 and 10: [10, 4, 3, 5, 1]. Now heapify(arr, 5, 1). 4 is smaller than 5. Swap 4 and 5: [10, 5, 3, 4, 1]. The Max Heap is built.

    Resulting Max Heap (as array): [10, 5, 3, 4, 1]

  3. Sorting:
    • Swap 10 (root) with 1 (last element): [1, 5, 3, 4, 10]. The sorted portion is now [10].
    • heapify(arr, 4, 0). 1 is smaller than 5. Swap 1 and 5: [5, 1, 3, 4, 10]. Now heapify(arr, 4, 1). 1 is smaller than 4. Swap 1 and 4: [5, 4, 3, 1, 10].
    • Swap 5 (root) with 1 (last element of unsorted part): [1, 4, 3, 5, 10]. The sorted portion is now [5, 10].
    • heapify(arr, 3, 0). 1 is smaller than 4. Swap 1 and 4: [4, 1, 3, 5, 10]. Now heapify(arr, 3, 1). No change.
    • Swap 4 (root) with 3 (last element of unsorted part): [3, 1, 4, 5, 10]. The sorted portion is now [4, 5, 10].
    • heapify(arr, 2, 0). 3 is larger than 1, so no change.
    • Swap 3 (root) with 1 (last element of unsorted part): [1, 3, 4, 5, 10]. The sorted portion is now [3, 4, 5, 10].
    • heapify(arr, 1, 0). The last unsorted element is already in the correct place.
    • Swap 1 (root) with 1 (last element of unsorted part): [1, 3, 4, 5, 10]. The sorted portion is now [1, 3, 4, 5, 10].
  4. Sorted Array: [1, 3, 4, 5, 10]

Here's an illustration with a different array [12, 11, 13, 5, 6, 7]

Initial array: [12, 11, 13, 5, 6, 7]

The `heapify` process will arrange the array as: [13, 11, 12, 5, 6, 7]

After several iterations of swapping and heapifying, the result will be:

Sorted array: [5, 6, 7, 11, 12, 13]

Time and Space Complexity Analysis

  • Time Complexity:
    • Best Case: O(n log n)
    • Average Case: O(n log n)
    • Worst Case: O(n log n)
    • Building the Heap: O(n)
    • Heapifying n elements: O(log n) for each element.

    Heap Sort has a guaranteed O(n log n) time complexity in all cases, making it a reliable sorting algorithm.

  • Space Complexity:
    • Worst Case: O(1). Heap Sort is generally considered an in-place sorting algorithm because it only requires a constant amount of extra memory. While the heapify function is recursive, the recursion depth is logarithmic, contributing only O(log n) to the call stack, which is often not counted in the overall space complexity analysis as auxiliary space when the recursion depth is logarithmic. Some implementations, however, may use O(n) auxiliary space, especially if they involve creating a separate copy of the array.

Advantages of Heap Sort

  • Guaranteed O(n log n) Time Complexity: Consistent performance regardless of the input data.
  • In-place Sorting: Requires minimal extra memory (typically O(1)).
  • Efficient: Generally faster than selection sort and bubble sort, and competitive with merge sort and quicksort in certain scenarios.

Disadvantages of Heap Sort

  • Not Stable: Does not preserve the relative order of equal elements. Stability can be achieved with additional space and complexity.
  • Less Cache-Friendly: The jumps through the heap can be less cache-friendly than other algorithms like merge sort, which access memory sequentially. This can impact performance in practice.
  • More Complex than Simpler Algorithms: The implementation is more involved compared to simpler algorithms like insertion sort or bubble sort, which can be advantageous for small datasets.