Java Core: Stream API - Collectors

Collectors describe how the elements of a stream are reduced to a single result, such as a collection, a map, or a summary value. A collector is passed to the terminal operation Stream.collect(), which makes it the usual tool for aggregation that goes beyond simple filtering and mapping.

1. What are Collectors?

  • Purpose: Collectors provide a way to accumulate the results of a stream pipeline into a meaningful result. This could be a List, Set, Map, a single value, or even a custom data structure.
  • java.util.stream.Collectors Class: The Collectors class provides static factory methods for common collector operations.
  • Collector Interface: Every collector is an instance of Collector<T, A, R>, where T is the element type, A is the mutable accumulation type, and R is the result type. Implementing this interface directly allows highly customized collection logic, but the factory methods in Collectors are usually sufficient.

2. Key Methods of the Collector Interface

The Collector interface defines several methods that describe the accumulation process:

  • supplier(): Returns a Supplier<A> that creates a new, empty result container (e.g., a new ArrayList or HashSet). This is the starting point of the accumulation.
  • accumulator(): Returns a BiConsumer<A, T> that folds one stream element into the container created by supplier(). This is where the core collection logic lives.
  • combiner(): Returns a BinaryOperator<A> that merges two partial containers into one. Parallel streams rely on it to combine the results of separately processed segments.
  • finisher(): Returns a Function<A, R> that converts the accumulated container into the final result.
  • characteristics(): Returns a Set of Collector.Characteristics that describe the collector's behavior. The stream implementation can use them for optimization (e.g., IDENTITY_FINISH lets it skip the finisher).
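To make the roles of these methods concrete, the sketch below drives a collector by hand. It is an illustration only, not how the JDK implements collect(): the class CollectorStepsDemo and its two helper methods are hypothetical names introduced for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical demo class: walks through a collector's functions manually.
class CollectorStepsDemo {

    // Roughly what a sequential collect() does conceptually.
    static <T, A, R> R collectSequentially(List<T> items, Collector<? super T, A, R> collector) {
        A container = collector.supplier().get();              // 1. create the container
        BiConsumer<A, ? super T> accumulator = collector.accumulator();
        for (T item : items) {
            accumulator.accept(container, item);               // 2. fold in each element
        }
        return collector.finisher().apply(container);          // 3. produce the final result
    }

    // Simulates a two-way split: two partial containers merged by combiner().
    static <T, A, R> R collectInTwoHalves(List<T> items, Collector<? super T, A, R> collector) {
        int mid = items.size() / 2;
        A left = collector.supplier().get();
        A right = collector.supplier().get();
        items.subList(0, mid).forEach(item -> collector.accumulator().accept(left, item));
        items.subList(mid, items.size()).forEach(item -> collector.accumulator().accept(right, item));
        A merged = collector.combiner().apply(left, right);    // merge the partial results
        return collector.finisher().apply(merged);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c", "d");
        System.out.println(collectSequentially(words, Collectors.joining("-"))); // a-b-c-d
        System.out.println(collectInTwoHalves(words, Collectors.joining("-")));  // a-b-c-d
    }
}
```

Both paths give the same result, which is the contract the combiner has to uphold for parallel streams.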

3. Common Collector Factory Methods in Collectors

The Collectors class provides convenient factory methods for common collection scenarios:

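  • toList(): Collects elements into a List. The concrete type and mutability of the returned List are not guaranteed; use toCollection if you need a specific implementation.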

    List<String> names = stream.collect(Collectors.toList());
    
  • toSet(): Collects elements into a Set.

    Set<String> uniqueNames = stream.collect(Collectors.toSet());
    
  • toMap(keyMapper, valueMapper): Collects elements into a Map. Requires functions to map elements to keys and values.

    Map<String, Integer> nameToAge = stream.collect(
        Collectors.toMap(person -> person.getName(), person -> person.getAge())
    );
    
    • Handling Key Collisions: If duplicate keys are encountered, toMap throws an IllegalStateException by default. You can provide a merge function to resolve collisions:
      Map<String, Integer> nameToAge = stream.collect(
          Collectors.toMap(person -> person.getName(), person -> person.getAge(),
                           (oldValue, newValue) -> oldValue + newValue) // Merge function
      );
      
  • toCollection(supplier): Collects elements into a collection created by the provided Supplier.

    LinkedList<String> names = stream.collect(Collectors.toCollection(LinkedList::new));
    
  • counting(): Counts the number of elements in the stream. Returns a Long.

    long count = stream.collect(Collectors.counting());
    
  • summingInt(toIntFunction) / summingLong(toLongFunction) / summingDouble(toDoubleFunction): Calculates the sum of elements. Requires a function to map elements to int, long, or double.

    int totalAge = stream.collect(Collectors.summingInt(person -> person.getAge()));
    
  • averagingInt(toIntFunction) / averagingLong(toLongFunction) / averagingDouble(toDoubleFunction): Calculates the arithmetic mean of the mapped values as a Double. The result is 0 for an empty stream.

    double averageAge = stream.collect(Collectors.averagingInt(person -> person.getAge()));
    
  • minBy(comparator) / maxBy(comparator): Finds the minimum or maximum element based on the provided Comparator. Returns an Optional.

    Optional<Person> youngestPerson = stream.collect(Collectors.minBy(Comparator.comparingInt(Person::getAge)));
    
  • groupingBy(classifier): Groups elements based on a classifier function. Returns a Map where the key is the result of the classifier and the value is a List of elements belonging to that group.

    Map<String, List<Person>> peopleByCity = stream.collect(
        Collectors.groupingBy(person -> person.getCity())
    );
    
  • partitioningBy(predicate): Partitions elements into two groups based on a predicate. Returns a Map with two keys: true (for elements that satisfy the predicate) and false (for elements that don't).

    Map<Boolean, List<Person>> adultsAndMinors = stream.collect(
        Collectors.partitioningBy(person -> person.getAge() >= 18)
    );
    
  • collectingAndThen(collector, finisher): Applies a collector and then a finishing function to the result.

    String namesConcatenated = stream.collect(
        Collectors.collectingAndThen(Collectors.toList(), list -> String.join(", ", list))
    );
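
The snippets above share a variable named stream for brevity. A stream can be consumed only once, so each pipeline needs its own. The following sketch runs several of the factory methods end to end. The Person class, the sample data, and the method names are assumptions made for this example only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class exercising several Collectors factory methods.
class CollectorFactoryDemo {

    // Hypothetical Person type, assumed only for this example.
    static final class Person {
        private final String name;
        private final int age;
        private final String city;

        Person(String name, int age, String city) {
            this.name = name;
            this.age = age;
            this.city = city;
        }

        String getName() { return name; }
        int getAge() { return age; }
        String getCity() { return city; }
    }

    static List<Person> samplePeople() {
        return Arrays.asList(
                new Person("Ann", 34, "Oslo"),
                new Person("Bob", 12, "Oslo"),
                new Person("Ann", 8, "Lima"));
    }

    // toMap with a merge function: ages of people sharing a name are summed.
    static Map<String, Integer> ageByName(List<Person> people) {
        return people.stream()
                .collect(Collectors.toMap(Person::getName, Person::getAge, Integer::sum));
    }

    // groupingBy: one list of people per city.
    static Map<String, List<Person>> byCity(List<Person> people) {
        return people.stream().collect(Collectors.groupingBy(Person::getCity));
    }

    // partitioningBy: the map always has both a true and a false entry.
    static Map<Boolean, List<Person>> adultsAndMinors(List<Person> people) {
        return people.stream().collect(Collectors.partitioningBy(p -> p.getAge() >= 18));
    }

    // collectingAndThen: collect names to a list, then join them.
    static String joinedNames(List<Person> people) {
        return people.stream()
                .map(Person::getName)
                .collect(Collectors.collectingAndThen(
                        Collectors.toList(), names -> String.join(", ", names)));
    }

    public static void main(String[] args) {
        List<Person> people = samplePeople();
        System.out.println(ageByName(people).get("Ann"));          // 42
        System.out.println(byCity(people).get("Oslo").size());     // 2
        System.out.println(adultsAndMinors(people).get(true).size()); // 1
        System.out.println(joinedNames(people));                   // Ann, Bob, Ann
    }
}
```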
    

4. Custom Collectors

You can create custom collectors by implementing the Collector interface. This is useful when you need very specific accumulation logic.

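import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class CustomCollectorExample {

    // Collector<T, A, R>: element type, accumulation type, result type
    public static Collector<String, List<String>, List<String>> toUpperCaseCollector() {
        return new Collector<String, List<String>, List<String>>() {
            @Override
            public Supplier<List<String>> supplier() {
                return ArrayList::new; // New empty container for each accumulation
            }

            @Override
            public BiConsumer<List<String>, String> accumulator() {
                return (list, str) -> list.add(str.toUpperCase());
            }

            @Override
            public BinaryOperator<List<String>> combiner() {
                // Merges two partial lists (used by parallel streams)
                return (list1, list2) -> { list1.addAll(list2); return list1; };
            }

            @Override
            public Function<List<String>, List<String>> finisher() {
                return Function.identity(); // No finishing needed
            }

            @Override
            public Set<Characteristics> characteristics() {
                // No hints declared, so the finisher is always invoked
                return Collections.emptySet();
            }
        };
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "banana", "cherry");
        List<String> upperCaseWords = words.stream()
                .collect(toUpperCaseCollector());
        System.out.println(upperCaseWords); // Output: [APPLE, BANANA, CHERRY]
    }
}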
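
For a collector this simple, a full interface implementation may be more than is needed. The sketch below builds an equivalent collector with the static factory Collector.of. The class and method names are hypothetical. When no finisher is passed, Collector.of adds the IDENTITY_FINISH characteristic itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

// Hypothetical demo class: the same upper-casing collector via Collector.of.
class CollectorOfDemo {

    static Collector<String, List<String>, List<String>> toUpperCaseList() {
        return Collector.of(
                ArrayList::new,                                   // supplier
                (list, str) -> list.add(str.toUpperCase()),       // accumulator
                (left, right) -> { left.addAll(right); return left; }); // combiner
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "banana", "cherry");
        System.out.println(words.stream().collect(toUpperCaseList()));
        // [APPLE, BANANA, CHERRY]
        System.out.println(words.parallelStream().collect(toUpperCaseList()));
        // [APPLE, BANANA, CHERRY]
    }
}
```

The parallel run keeps the encounter order because the collector does not declare UNORDERED.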

5. Characteristics

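Collector.Characteristics is an enum of hints that the stream pipeline can use for optimization. It defines three values:

  • IDENTITY_FINISH: The finisher is the identity function, so the accumulation container can be returned directly as the result.
  • CONCURRENT: The result container supports the accumulator being called concurrently from multiple threads on the same container (e.g., a ConcurrentHashMap).
  • UNORDERED: The collector does not commit to preserving the encounter order of the elements.

There is no DISTINCT characteristic for collectors; DISTINCT is a Spliterator characteristic. To remove duplicates, use Stream.distinct() or collect with toSet().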
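
Characteristics can be inspected at runtime. The sketch below prints the values of the Collector.Characteristics enum (CONCURRENT, UNORDERED, IDENTITY_FINISH) and the characteristics reported by two standard collectors. The class and helper method names are hypothetical, and the exact sets printed beyond what the Javadoc documents should be treated as implementation details.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical demo class: inspects the characteristics of standard collectors.
class CharacteristicsDemo {

    static Set<Collector.Characteristics> characteristicsOf(Collector<?, ?, ?> collector) {
        return collector.characteristics();
    }

    public static void main(String[] args) {
        // The enum constants defined by Collector.Characteristics
        System.out.println(Arrays.toString(Collector.Characteristics.values()));

        // toSet() is documented as an unordered collector
        System.out.println(characteristicsOf(Collectors.toSet()));

        // groupingByConcurrent is documented as concurrent and unordered
        System.out.println(characteristicsOf(
                Collectors.<String, Integer>groupingByConcurrent(String::length)));
    }
}
```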

6. Best Practices

  • Choose the right collector: Select the collector that best suits your needs. Avoid creating custom collectors if a standard collector can achieve the same result.
  • Consider performance: For large or parallel streams, the cost of the accumulator and combiner matters. When writing a custom collector, declare only the characteristics it actually satisfies (e.g., IDENTITY_FINISH when the finisher is the identity) so the pipeline can optimize safely.
  • Handle key collisions in toMap: Provide a merge function whenever keys may repeat; without one, a duplicate key causes toMap to throw an IllegalStateException.
  • Use Optional for minBy and maxBy: Remember that minBy and maxBy return Optional objects, so handle the case where the stream is empty.
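
The last two practices can be sketched as follows. The class, method names, and sample words are assumptions made for illustration. The example shows an empty-stream fallback for maxBy, the exception thrown by two-argument toMap on a duplicate key, and a merge function that keeps the first value.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class for the toMap and Optional practices.
class CollectorPitfallsDemo {

    // maxBy on a possibly empty list: the Optional makes the caller choose a fallback.
    static String longestOrDefault(List<String> words, String fallback) {
        Optional<String> longest = words.stream()
                .collect(Collectors.maxBy(Comparator.comparingInt(String::length)));
        return longest.orElse(fallback);
    }

    // Two-argument toMap: returns false if a duplicate key made it throw.
    static boolean mapsWithoutCollision(List<String> words) {
        try {
            words.stream().collect(Collectors.toMap(String::length, Function.identity()));
            return true;
        } catch (IllegalStateException duplicateKey) {
            return false;
        }
    }

    // Three-argument toMap: keep the first word seen for each length.
    static Map<Integer, String> firstWordByLength(List<String> words) {
        return words.stream()
                .collect(Collectors.toMap(
                        String::length, Function.identity(), (first, second) -> first));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("kiwi", "pear", "fig");
        System.out.println(longestOrDefault(Collections.<String>emptyList(), "none")); // none
        System.out.println(mapsWithoutCollision(words));     // false ("kiwi" and "pear" share length 4)
        System.out.println(firstWordByLength(words).get(4)); // kiwi
    }
}
```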