Collections Framework

Introduction to the Java Collections Framework, covering ArrayList, LinkedList, HashMap, HashSet, and other essential data structures.


HashSet in Java

What is a HashSet?

A HashSet is a class in the Java Collections Framework that implements the Set interface. It represents a collection of elements that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element.

HashSet is backed by a HashMap instance. Essentially, the keys of the HashMap are the elements of the HashSet, and the values are dummy objects (usually just PRESENT, a private static final Object). This design leverages the HashMap's efficient lookup capabilities.

Key Characteristics of HashSet:

  • Unordered:HashSet does not guarantee any particular order of elements. The order can change over time as elements are added or removed. If you need a specific order, consider using LinkedHashSet or TreeSet.
  • No Duplicates: It ensures that there are no duplicate elements. If you try to add an element that already exists, it will be ignored (the add() method will return false).
  • Allows One Null Element: You can add one null element to a HashSet.
  • Not Synchronized: HashSet is not synchronized, meaning it's not thread-safe. If multiple threads access a HashSet concurrently, and at least one of the threads modifies it, it must be synchronized externally. Use Collections.synchronizedSet(new HashSet(...)) to create a synchronized HashSet.
  • Fast Performance: Provides excellent performance for adding, removing, and checking for the existence of elements, typically with O(1) average time complexity.

Creating a HashSet

You can create a HashSet using several constructors:

  • HashSet(): Creates an empty HashSet with the default initial capacity (16) and load factor (0.75).
  • HashSet(int initialCapacity): Creates an empty HashSet with the specified initial capacity and the default load factor (0.75).
  • HashSet(int initialCapacity, float loadFactor): Creates an empty HashSet with the specified initial capacity and load factor.
  • HashSet(Collection<? extends E> c): Creates a HashSet containing the elements of the specified collection.
import java.util.HashSet;
import java.util.Arrays;
import java.util.List;

public class HashSetExample {
    public static void main(String[] args) {
    // Creating an empty HashSet with default capacity and load factor
    HashSet<String> set1 = new HashSet<>();

    // Creating a HashSet with specified initial capacity
    HashSet<Integer> set2 = new HashSet<>(32);

    // Creating a HashSet with specified initial capacity and load factor
    HashSet<Double> set3 = new HashSet<>(64, 0.8f);

    // Creating a HashSet from a collection (e.g., a List)
    List<String> list = Arrays.asList("apple", "banana", "cherry");
    HashSet<String> set4 = new HashSet<>(list);

    System.out.println("Set 1: " + set1);
    System.out.println("Set 2: " + set2);
    System.out.println("Set 3: " + set3);
    System.out.println("Set 4: " + set4);
}
}

Adding Elements

You can add elements to a HashSet using the add() method. The add() method returns true if the element was successfully added (i.e., it wasn't already present), and false otherwise.

import java.util.HashSet;

public class HashSetAddExample {
    public static void main(String[] args) {
    HashSet<String> fruits = new HashSet<>();

    fruits.add("apple");
    fruits.add("banana");
    fruits.add("cherry");

    System.out.println("HashSet: " + fruits); // Output: HashSet: [banana, apple, cherry] (order is not guaranteed)

    boolean added = fruits.add("apple"); // Trying to add a duplicate
    System.out.println("Was 'apple' added again? " + added); // Output: Was 'apple' added again? false
    System.out.println("HashSet after attempting to add a duplicate: " + fruits); // Output: HashSet after attempting to add a duplicate: [banana, apple, cherry]

    fruits.add(null);
    System.out.println("HashSet with null: " + fruits);  // Output: HashSet with null: [null, banana, apple, cherry]

    fruits.add(null); //Trying to add another null
    System.out.println("HashSet with null: " + fruits);  // Output: HashSet with null: [null, banana, apple, cherry]
}
}

Removing Elements

You can remove elements from a HashSet using the remove() method. The remove() method returns true if the element was successfully removed, and false otherwise (if the element was not present in the set).

import java.util.HashSet;

public class HashSetRemoveExample {
    public static void main(String[] args) {
    HashSet<String> colors = new HashSet<>();
    colors.add("red");
    colors.add("green");
    colors.add("blue");

    System.out.println("HashSet before removal: " + colors); // Output: HashSet before removal: [blue, red, green]

    boolean removed = colors.remove("red");
    System.out.println("Was 'red' removed? " + removed); // Output: Was 'red' removed? true
    System.out.println("HashSet after removing 'red': " + colors); // Output: HashSet after removing 'red': [blue, green]

    removed = colors.remove("purple"); // Trying to remove an element that doesn't exist
    System.out.println("Was 'purple' removed? " + removed); // Output: Was 'purple' removed? false
    System.out.println("HashSet after attempting to remove 'purple': " + colors); // Output: HashSet after attempting to remove 'purple': [blue, green]

    colors.remove(null); // Trying to remove null that doesn't exist.  If it exists, it will be removed.
    System.out.println("HashSet after attempting to remove null: " + colors); // Output: HashSet after attempting to remove null: [blue, green]

    colors.add(null);
    System.out.println("HashSet before removing null: " + colors); //Output: [null, blue, green]
    colors.remove(null);
    System.out.println("HashSet after removing null: " + colors); //Output: [blue, green]
}
}

You can also remove all elements from the set using clear() method

import java.util.HashSet;

public class HashSetClearExample {
    public static void main(String[] args) {
    HashSet<String> names = new HashSet<>();
    names.add("Alice");
    names.add("Bob");
    names.add("Charlie");

    System.out.println("HashSet before clear: " + names); // Output: HashSet before clear: [Bob, Alice, Charlie]

    names.clear();
    System.out.println("HashSet after clear: " + names); // Output: HashSet after clear: []
}
}

Checking for Existence

You can check if an element exists in a HashSet using the contains() method. This method returns true if the element is present in the set, and false otherwise.

import java.util.HashSet;

public class HashSetContainsExample {
    public static void main(String[] args) {
    HashSet<String> animals = new HashSet<>();
    animals.add("dog");
    animals.add("cat");
    animals.add("bird");

    System.out.println("HashSet: " + animals); // Output: HashSet: [dog, cat, bird]

    boolean containsDog = animals.contains("dog");
    System.out.println("Contains 'dog'? " + containsDog); // Output: Contains 'dog'? true

    boolean containsElephant = animals.contains("elephant");
    System.out.println("Contains 'elephant'? " + containsElephant); // Output: Contains 'elephant'? false
}
}

Iterating over a HashSet

You can iterate over the elements of a HashSet using several techniques:

  • Iterator: Using an Iterator is the most general way to iterate over any collection.
  • Enhanced For Loop (for-each loop): This is a more concise and readable way to iterate.
  • Stream API (Java 8+): Using streams allows you to perform functional-style operations on the elements.
import java.util.HashSet;
import java.util.Iterator;

public class HashSetIterationExample {
    public static void main(String[] args) {
    HashSet<String> cities = new HashSet<>();
    cities.add("London");
    cities.add("Paris");
    cities.add("New York");
    cities.add("Tokyo");

    System.out.println("HashSet: " + cities); // Output: HashSet: [New York, Paris, London, Tokyo] (order not guaranteed)

    // 1. Using Iterator
    System.out.println("Iterating using Iterator:");
    Iterator<String> iterator = cities.iterator();
    while (iterator.hasNext()) {
        String city = iterator.next();
        System.out.println(city);
    }

    // 2. Using Enhanced For Loop (for-each loop)
    System.out.println("\nIterating using Enhanced For Loop:");
    for (String city : cities) {
        System.out.println(city);
    }

    // 3. Using Stream API (Java 8+)
    System.out.println("\nIterating using Stream API:");
    cities.stream().forEach(System.out::println);
}
}

Importance of hashCode() and equals() Methods

The correct functioning of a HashSet heavily relies on the proper implementation of the hashCode() and equals() methods of the objects stored in the set. Here's why:

  • hashCode():
    • When you add an element to a HashSet, the hashCode() method of the element is used to determine the bucket (index) in the underlying HashMap where the element should be stored.
    • A good hashCode() implementation should distribute objects evenly across the buckets to avoid collisions. A collision occurs when multiple objects have the same hash code, leading to multiple elements being stored in the same bucket.
    • If two objects are equal according to the equals() method, they must have the same hashCode() value. This is a critical contract!
  • equals():
    • When you try to add an element to a HashSet, and the hashCode() of the element matches the hashCode() of an existing element in the same bucket, the equals() method is used to determine if the two elements are actually equal.
    • If equals() returns true, the new element is considered a duplicate and is not added to the set. If equals() returns false, the new element is added to the same bucket (creating a collision, which can impact performance).
    • The equals() method defines what it means for two objects of a class to be considered equivalent.

Consequences of Incorrect Implementations:

  • If two equal objects have different hash codes, the HashSet might allow duplicates, violating the fundamental property of a set.
  • If the hashCode() method is poorly implemented and generates the same hash code for many different objects, the HashSet's performance will degrade significantly (approaching O(n) for operations like add() and contains()).
  • If equals() is implemented incorrectly, the HashSet will not accurately determine equality between objects, potentially leading to unexpected behavior.

Example: Overriding hashCode() and equals()

import java.util.HashSet;
import java.util.Objects;

class Point {
    private int x;
    private int y;

    public Point(int x, int y) {
    this.x = x;
    this.y = y;
}

public int getX() {
    return x;
}

public int getY() {
    return y;
}

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    Point point = (Point) o;
    return x == point.x && y == point.y;
}

@Override
public int hashCode() {
    return Objects.hash(x, y);
}

@Override
public String toString() {
    return "(" + x + ", " + y + ")";
}
}

public class HashCodeEqualsExample {
public static void main(String[] args) {
    HashSet<Point> points = new HashSet<>();
    points.add(new Point(1, 2));
    points.add(new Point(3, 4));
    points.add(new Point(1, 2)); // Duplicate

    System.out.println("HashSet of Points: " + points); // Output: HashSet of Points: [(1, 2), (3, 4)] (no duplicates)
}
}

In this example, the Point class overrides both equals() and hashCode(). Two Point objects are considered equal if their x and y coordinates are the same. The hashCode() method is implemented using Objects.hash(), which generates a hash code based on the x and y coordinates. This ensures that equal Point objects have the same hash code.

Time Complexity of Operations

The time complexity of operations on a HashSet is generally very efficient, provided that the hashCode() method is well-distributed and collisions are minimized.

  • add(element): Average case: O(1). Worst case: O(n) (rare, occurs with many hash collisions).
  • remove(element): Average case: O(1). Worst case: O(n) (rare, occurs with many hash collisions).
  • contains(element): Average case: O(1). Worst case: O(n) (rare, occurs with many hash collisions).
  • size(): O(1)
  • isEmpty(): O(1)
  • clear(): O(1)
  • Iteration (using Iterator or for-each loop): O(n), where n is the number of elements in the set.

The "average case" complexity assumes that the hashCode() method is well-distributed, resulting in a small number of collisions. The "worst case" complexity occurs when many elements have the same hash code, forcing the HashSet to search through a long chain of elements to find or add the correct one.