Java Core: Stream API - Collectors
Collectors are a powerful tool in the Java Stream API, used to reduce a stream of elements to a single result. They define how the elements of a stream are accumulated into a collection or other data structure. They are essential for performing complex operations on streams beyond simple filtering and mapping.
1. What are Collectors?
- Purpose: Collectors provide a way to accumulate the results of a stream pipeline into a meaningful result. This could be a List, Set, Map, a single value, or even a custom data structure.
- java.util.stream.Collectors class: The Collectors class provides static factory methods for common collector operations.
- Collector interface: Collectors are instances of the Collector interface. Implementing this interface directly allows for highly customized collection logic, but using the factory methods in Collectors is usually sufficient.
2. Key Methods of the Collector Interface
The Collector interface defines several methods that describe the accumulation process:
- supplier(): Returns a Supplier that provides a new, empty container for accumulating the results (e.g., a new ArrayList or HashSet). This is the starting point for the accumulation.
- accumulator(): Returns a BiConsumer that takes the container from supplier() and the current stream element, and folds the element into the container. This is where the core logic of collecting the elements happens.
- combiner(): Returns a BinaryOperator that merges two partial containers into one. A parallel stream accumulates into several containers, so this step is needed to produce a correct combined result.
- finisher(): Returns a Function that takes the container and returns the final collected object.
- characteristics(): Returns a Set of Collector.Characteristics that describe the collector's behavior. The stream pipeline can use these characteristics for optimization (e.g., IDENTITY_FINISH when the finisher is the identity function and can be skipped).
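To make the roles of the four functions concrete, here is a minimal sketch that builds a collector with the Collector.of factory method instead of implementing the interface by hand. The class name CollectorOfSketch and the method bracketed() are hypothetical names chosen for this illustration; the collector joins strings with ", " and wraps the result in brackets.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.Collector;

class CollectorOfSketch {

    // Hypothetical collector: joins strings with ", " and wraps them in brackets.
    static Collector<String, StringJoiner, String> bracketed() {
        return Collector.of(
                () -> new StringJoiner(", ", "[", "]"), // supplier: fresh mutable container
                StringJoiner::add,                      // accumulator: fold one element in
                StringJoiner::merge,                    // combiner: merge two partial containers
                StringJoiner::toString);                // finisher: container -> final result
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "banana", "cherry");
        // Sequential: only supplier, accumulator, and finisher are exercised.
        System.out.println(words.stream().collect(bracketed()));         // [apple, banana, cherry]
        // Parallel: the combiner may also run to merge partial results.
        System.out.println(words.parallelStream().collect(bracketed())); // [apple, banana, cherry]
    }
}
```

Because no characteristics are passed, the stream always applies the finisher and preserves encounter order, which is why the parallel run prints the same text as the sequential one.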
3. Common Collector Factory Methods in Collectors
The Collectors class provides convenient factory methods for common collection scenarios:
- toList(): Collects elements into a List.
    List<String> names = stream.collect(Collectors.toList());
- toSet(): Collects elements into a Set.
    Set<String> uniqueNames = stream.collect(Collectors.toSet());
- toMap(keyMapper, valueMapper): Collects elements into a Map. Requires functions to map elements to keys and values.
    Map<String, Integer> nameToAge = stream.collect(
        Collectors.toMap(person -> person.getName(), person -> person.getAge())
    );
  - Handling Key Collisions: If duplicate keys are encountered, toMap throws an IllegalStateException by default. You can provide a merge function to resolve collisions:
      Map<String, Integer> nameToAge = stream.collect(
          Collectors.toMap(person -> person.getName(),
                           person -> person.getAge(),
                           (oldValue, newValue) -> oldValue + newValue) // Merge function
      );
- toCollection(supplier): Collects elements into a collection created by the provided Supplier.
    LinkedList<String> names = stream.collect(Collectors.toCollection(LinkedList::new));
- counting(): Counts the number of elements in the stream. Returns a Long.
    long count = stream.collect(Collectors.counting());
- summingInt(toIntFunction) / summingLong(toLongFunction) / summingDouble(toDoubleFunction): Calculates the sum of elements. Requires a function to map elements to int, long, or double.
    int totalAge = stream.collect(Collectors.summingInt(person -> person.getAge()));
- averagingInt(toIntFunction) / averagingLong(toLongFunction) / averagingDouble(toDoubleFunction): Calculates the average of elements.
    double averageAge = stream.collect(Collectors.averagingInt(person -> person.getAge()));
- minBy(comparator) / maxBy(comparator): Finds the minimum or maximum element based on the provided Comparator. Returns an Optional.
    Optional<Person> youngestPerson = stream.collect(Collectors.minBy(Comparator.comparingInt(Person::getAge)));
- groupingBy(classifier): Groups elements based on a classifier function. Returns a Map where the key is the result of the classifier and the value is a List of elements belonging to that group.
    Map<String, List<Person>> peopleByCity = stream.collect(
        Collectors.groupingBy(person -> person.getCity())
    );
- partitioningBy(predicate): Partitions elements into two groups based on a predicate. Returns a Map with two keys: true (for elements that satisfy the predicate) and false (for elements that don't).
    Map<Boolean, List<Person>> adultsAndMinors = stream.collect(
        Collectors.partitioningBy(person -> person.getAge() >= 18)
    );
- collectingAndThen(collector, finisher): Applies a collector and then a finishing function to the result.
    String namesConcatenated = stream.collect(
        Collectors.collectingAndThen(Collectors.toList(), list -> String.join(", ", list))
    );
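The snippets above assume an existing stream and a Person type. As a runnable sketch of several of these factory methods together, the example below defines its own small Person class and sample data; the class CommonCollectorsSketch, the PEOPLE list, and all method names are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CommonCollectorsSketch {

    // Hypothetical value class used only for this illustration.
    static final class Person {
        final String name;
        final String city;
        final int age;

        Person(String name, String city, int age) {
            this.name = name;
            this.city = city;
            this.age = age;
        }
    }

    // Sample data; note that the name "Ann" appears twice.
    static final List<Person> PEOPLE = Arrays.asList(
            new Person("Ann", "Oslo", 34),
            new Person("Bob", "Rome", 17),
            new Person("Ann", "Rome", 20),
            new Person("Cid", "Oslo", 41));

    // toMap with a merge function: ages of people sharing a name are summed.
    static Map<String, Integer> ageByName() {
        return PEOPLE.stream()
                .collect(Collectors.toMap(p -> p.name, p -> p.age, Integer::sum));
    }

    // groupingBy: one List<Person> per city.
    static Map<String, List<Person>> byCity() {
        return PEOPLE.stream().collect(Collectors.groupingBy(p -> p.city));
    }

    // partitioningBy: adults under true, minors under false.
    static Map<Boolean, List<Person>> adultsAndMinors() {
        return PEOPLE.stream().collect(Collectors.partitioningBy(p -> p.age >= 18));
    }

    // averagingInt: the Double result is unboxed to double.
    static double averageAge() {
        return PEOPLE.stream().collect(Collectors.averagingInt(p -> p.age));
    }

    // collectingAndThen: collect names to a List, then join them into one String.
    static String namesJoined() {
        return PEOPLE.stream()
                .map(p -> p.name)
                .collect(Collectors.collectingAndThen(
                        Collectors.toList(), list -> String.join(", ", list)));
    }

    public static void main(String[] args) {
        System.out.println(ageByName().get("Ann"));            // 54 (34 + 20)
        System.out.println(byCity().get("Oslo").size());       // 2
        System.out.println(adultsAndMinors().get(false).size()); // 1
        System.out.println(averageAge());                      // 28.0
        System.out.println(namesJoined());                     // Ann, Bob, Ann, Cid
    }
}
```

Without the Integer::sum merge function, ageByName() would throw an IllegalStateException on the second "Ann".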
4. Custom Collectors
You can create custom collectors by implementing the Collector interface. This is useful when you need very specific accumulation logic.
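import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class CustomCollectorExample {

    // Collector takes three type parameters: element type, container type, result type.
    public static Collector<String, List<String>, List<String>> toUpperCaseCollector() {
        return new Collector<String, List<String>, List<String>>() {
            @Override
            public Supplier<List<String>> supplier() {
                return ArrayList::new; // New empty container
            }

            @Override
            public BiConsumer<List<String>, String> accumulator() {
                return (list, str) -> list.add(str.toUpperCase());
            }

            @Override
            public BinaryOperator<List<String>> combiner() {
                // Merges two partial lists (used by parallel streams)
                return (list1, list2) -> { list1.addAll(list2); return list1; };
            }

            @Override
            public Function<List<String>, List<String>> finisher() {
                return Function.identity(); // No finishing needed
            }

            @Override
            public Set<Characteristics> characteristics() {
                // No hints declared; IDENTITY_FINISH would also be valid here
                return Collections.emptySet();
            }
        };
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "banana", "cherry");
        List<String> upperCaseWords = words.stream()
                .collect(toUpperCaseCollector());
        System.out.println(upperCaseWords); // Output: [APPLE, BANANA, CHERRY]
    }
}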
5. Characteristics
Collector.Characteristics are used to provide hints to the stream pipeline for optimization. The enum defines three values:
- IDENTITY_FINISH: The finisher function simply returns the accumulator without modification, so the pipeline may skip it.
- CONCURRENT: The accumulator can be called concurrently from multiple threads on the same result container, so a parallel stream does not need separate containers and a combine step.
- UNORDERED: The result does not depend on the encounter order of the elements.
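As a small sketch of how these hints are declared and inspected, the example below builds two collectors with Collector.of and reads back their characteristics(). The class CharacteristicsSketch and the collectors toHashSet() and distinctCount() are hypothetical names for this illustration. It relies on the documented behavior that the Collector.of overload without a finisher adds IDENTITY_FINISH on its own, while the overload with a finisher reports only the characteristics passed in.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;
import java.util.stream.Stream;

class CharacteristicsSketch {

    // No finisher: Collector.of adds IDENTITY_FINISH to the UNORDERED hint given here.
    static Collector<String, Set<String>, Set<String>> toHashSet() {
        return Collector.of(
                HashSet::new,
                Set::add,
                (left, right) -> { left.addAll(right); return left; },
                Characteristics.UNORDERED);
    }

    // Finisher present and no hints passed: the characteristics set is empty.
    static Collector<String, Set<String>, Integer> distinctCount() {
        return Collector.of(
                HashSet::new,
                Set::add,
                (left, right) -> { left.addAll(right); return left; },
                Set::size);
    }

    public static void main(String[] args) {
        System.out.println(toHashSet().characteristics());
        System.out.println(distinctCount().characteristics());
        System.out.println(Stream.of("a", "b", "a").collect(distinctCount())); // 2
    }
}
```

Declaring UNORDERED is reasonable for toHashSet() because a HashSet does not preserve encounter order anyway; declaring it on a List-producing collector would let the pipeline reorder elements.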
6. Best Practices
- Choose the right collector: Select the collector that best suits your needs. Avoid creating custom collectors if a standard collector can achieve the same result.
- Consider performance: For large streams, be mindful of the performance implications of your collector. Use characteristics to help the stream pipeline optimize the collection process.
- Handle key collisions in toMap: Provide a merge function whenever duplicate keys are possible. Without one, toMap throws an IllegalStateException on the first duplicate key.
- Use Optional for minBy and maxBy: Remember that minBy and maxBy return Optional objects, so handle the case where the stream is empty.
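The last point can be sketched as follows. The class MinByOptionalSketch, the method shortest(), and the "<none>" fallback value are hypothetical choices for this illustration; any other way of handling the empty Optional (such as orElseThrow or ifPresent) would work as well.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class MinByOptionalSketch {

    // Returns the shortest word, or a fallback when the list is empty.
    static String shortest(List<String> words) {
        Optional<String> result = words.stream()
                .collect(Collectors.minBy(Comparator.comparingInt(String::length)));
        // An empty stream yields Optional.empty(), so supply a default explicitly.
        return result.orElse("<none>");
    }

    public static void main(String[] args) {
        System.out.println(shortest(Arrays.asList("pear", "fig", "banana"))); // fig
        System.out.println(shortest(Collections.<String>emptyList()));        // <none>
    }
}
```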