Custom Collectors in Java 8

Streams in Java 8 are pretty cool out-of-the-box, but what happens if you need something a little more than what comes loaded with the JDK?

Among the many features available in Java 8, streams seem to be one of the biggest game changers regarding the way to write Java code. Usage is quite straightforward: the stream is created from a collection (or from a static method of a utility class), it’s processed using one or many of the available stream methods, and then collected back into a collection or an object. One generally uses one of the static methods that the Collectors utility class offers:

  • Collectors.toList()
  • Collectors.toSet()
  • Collectors.toMap()
  • etc.
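As a quick refresher, here is a minimal sketch of that create, process, collect cycle with two of the built-in collectors. The class and method names (BuiltInCollectorsSketch, longWordsUpperCased, distinctLengths) are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class BuiltInCollectorsSketch {

    // Keeps the words longer than three characters, upper-cased, in encounter order.
    static List<String> longWordsUpperCased(List<String> words) {
        return words.stream()                          // create the stream from a collection
                .filter(word -> word.length() > 3)     // process it
                .map(String::toUpperCase)
                .collect(Collectors.toList());         // collect it back into a collection
    }

    // Collects the distinct lengths of the given words into a Set.
    static Set<Integer> distinctLengths(List<String> words) {
        return words.stream()
                .map(String::length)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("one", "two", "three", "four");
        System.out.println(longWordsUpperCased(words)); // prints [THREE, FOUR]
        System.out.println(distinctLengths(words));     // a set containing 3, 4 and 5
    }
}
```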

Sometimes, however, there’s a need for more. The goal of this post is to describe how to achieve that.

The Collector Interface

Every one of the above static methods returns a Collector. But what is a Collector? In simplified terms, it ties together the four functional interfaces below.

Interface, with its description from the JavaDocs:

  • Supplier: Represents a supplier of results. There is no requirement that a new or distinct result be returned each time the supplier is invoked.
  • BiConsumer: Represents an operation that accepts two input arguments and returns no result. Unlike most other functional interfaces, BiConsumer is expected to operate via side-effects.
  • Function: Represents a function that accepts one argument and produces a result.
  • BinaryOperator: Represents an operation upon two operands of the same type, producing a result of the same type as the operands. This is a specialization of BiFunction for the case where the operands and the result are all of the same type.

The documentation of each dependent interface doesn’t tell much, apart from the obvious. Looking at the Collector documentation yields a little more:

A Collector is specified by four functions that work together to accumulate entries into a mutable result container, and optionally perform a final transform on the result. They are:

  • Creation of a new result container (supplier())
  • Incorporating a new data element into a result container (accumulator())
  • Combining two result containers into one (combiner())
  • Performing an optional final transform on the container (finisher())
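To make the four roles concrete, here is a small sketch that wires them together with the JDK's Collector.of() factory. The collector itself (string concatenation into a StringBuilder) and the names FourFunctionsSketch and concatenating() are only an illustration, not something taken from the JavaDocs.

```java
import java.util.Arrays;
import java.util.stream.Collector;

final class FourFunctionsSketch {

    // A collector that concatenates strings, spelled out as its four functions.
    static Collector<String, StringBuilder, String> concatenating() {
        return Collector.of(
                StringBuilder::new,                           // supplier: new result container
                (builder, element) -> builder.append(element), // accumulator: add one element
                (left, right) -> left.append(right),           // combiner: merge two containers
                StringBuilder::toString);                      // finisher: final transform
    }

    public static void main(String[] args) {
        String joined = Arrays.asList("a", "b", "c").stream().collect(concatenating());
        System.out.println(joined); // prints abc
    }
}
```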

The Stream.collect() Method

The real insight comes from the Stream.collect() method documentation:

Performs a mutable reduction operation on the elements of this stream. A mutable reduction is one in which the reduced value is a mutable result container, such as an ArrayList, and elements are incorporated by updating the state of the result rather than by replacing the result. This produces a result equivalent to:

R result = supplier.get();
for (T element : this stream)
    accumulator.accept(result, element);
return result;

Note that the combiner() method is not used here: it only comes into play with parallel streams and, for simplicity, will be set aside for the rest of this post.
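The pseudo-code above can be turned into something runnable. The sketch below puts the plain loop next to the equivalent call to the three-argument Stream.collect(); the class and method names are invented for the occasion, and the source is assumed to be a List of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MutableReductionSketch {

    // The loop from the documentation, written out for a List<String> source.
    static List<String> withLoop(List<String> source) {
        List<String> result = new ArrayList<>();   // supplier.get()
        for (String element : source) {
            result.add(element);                   // accumulator.accept(result, element)
        }
        return result;
    }

    // The same reduction expressed with Stream.collect(supplier, accumulator, combiner).
    static List<String> withCollect(List<String> source) {
        return source.stream().collect(
                () -> new ArrayList<String>(),          // supplier
                (list, element) -> list.add(element),   // accumulator
                (left, right) -> left.addAll(right));   // combiner, only needed in parallel
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("x", "y", "z");
        System.out.println(withLoop(source).equals(withCollect(source))); // prints true
    }
}
```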


Let’s have some examples to demo the development of custom collectors.

Single-Value Example

To start, let’s compute the size of a collection using a collector. Though not very useful, it’s a good introduction. With the combiner set aside, here are the requirements for the three remaining functions:

  1. Since the end result should be an integer, the supplier should probably also return some kind of integer. The problem is that neither int nor Integer is mutable, and mutability is required for the next step. A good candidate type would be MutableInt from Apache Commons Lang.
  2. The accumulator should only increment the MutableInt, whatever the element in the collection is.
  3. Finally, the finisher just returns the int value wrapped by the MutableInt.
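The three steps above can be sketched as follows. This is not the code from the GitHub repository: to keep the example free of dependencies, a one-element int[] stands in for MutableInt, and the names SizeCollectorSketch and toSize() are hypothetical. Collector.of() also requires a combiner, so a trivial one is supplied even though it is irrelevant for sequential streams.

```java
import java.util.Arrays;
import java.util.stream.Collector;

final class SizeCollectorSketch {

    // Counts the elements of a stream; int[1] is a stand-in for MutableInt.
    static <T> Collector<T, int[], Integer> toSize() {
        return Collector.of(
                () -> new int[1],                         // 1. supplier: mutable counter at 0
                (counter, element) -> counter[0]++,       // 2. accumulator: ignore the element
                (left, right) -> {                        // combiner: sum of partial counts
                    left[0] += right[0];
                    return left;
                },
                counter -> counter[0]);                   // 3. finisher: unwrap the int
    }

    public static void main(String[] args) {
        int size = Arrays.asList("a", "b", "c").stream().collect(toSize());
        System.out.println(size); // prints 3
    }
}
```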

The source is available on GitHub.

Grouping Example

The second example shall be more useful. From a collection of strings, let’s create an Apache Commons Collections multi-valued map:

  • The key should be a char
  • The corresponding values should be the strings that start with this char

  1. The supplier is pretty straightforward: it returns a MultiValuedMap instance
  2. The accumulator just calls the put method from the multi-valued map, using the above “specs”
  3. The finisher returns the map itself
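Here is a rough sketch of those steps. It deliberately swaps MultiValuedMap for a plain Map<Character, List<String>> so that it runs with the JDK alone; the names FirstCharGroupingSketch and byFirstChar() are assumptions, and skipping empty strings is a choice of this sketch, not something the specs above mention.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;

final class FirstCharGroupingSketch {

    // Groups strings by their first char; a Map of Lists stands in for MultiValuedMap.
    static Collector<String, Map<Character, List<String>>, Map<Character, List<String>>> byFirstChar() {
        return Collector.of(
                HashMap::new,                                   // 1. supplier: empty map
                (map, word) -> {                                // 2. accumulator: "put"
                    if (!word.isEmpty()) {                      // empty strings have no first char
                        map.computeIfAbsent(word.charAt(0), key -> new ArrayList<>()).add(word);
                    }
                },
                (left, right) -> {                              // combiner: merge right into left
                    right.forEach((key, words) ->
                            left.computeIfAbsent(key, k -> new ArrayList<>()).addAll(words));
                    return left;
                },
                map -> map);                                    // 3. finisher: the map itself
    }

    public static void main(String[] args) {
        Map<Character, List<String>> grouped =
                Arrays.asList("apple", "banana", "avocado").stream().collect(byFirstChar());
        System.out.println(grouped.get('a')); // prints [apple, avocado]
        System.out.println(grouped.get('b')); // prints [banana]
    }
}
```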

The source is available on GitHub.

Partitioning Example

The third example matches a use-case I encountered this week: given a collection and a predicate, dispatch elements that match into a collection and elements that do not into another.

  1. As the supplier returns a single instance, a new data structure e.g. DoubleList should first be designed
  2. The accumulator must be initialized with the predicate, so that its accept() method keeps the signature required by the BiConsumer contract.
  3. As in the example above, the finisher should return the DoubleList itself
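One possible shape for this collector is sketched below, this time by implementing the Collector interface directly so that the predicate can be passed to the constructor. The DoubleList here is a guess at a minimal structure (two public lists named matching and nonMatching), and PartitioningCollector is a hypothetical name; the code on GitHub may well differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.stream.Collector;

// Hypothetical container holding both result lists.
final class DoubleList<T> {
    final List<T> matching = new ArrayList<>();
    final List<T> nonMatching = new ArrayList<>();
}

final class PartitioningCollector<T> implements Collector<T, DoubleList<T>, DoubleList<T>> {

    private final Predicate<? super T> predicate;

    PartitioningCollector(Predicate<? super T> predicate) {
        this.predicate = predicate;
    }

    @Override
    public Supplier<DoubleList<T>> supplier() {
        return DoubleList::new;
    }

    @Override
    public BiConsumer<DoubleList<T>, T> accumulator() {
        // The predicate is captured from the field, so accept() keeps its two arguments.
        return (lists, element) ->
                (predicate.test(element) ? lists.matching : lists.nonMatching).add(element);
    }

    @Override
    public BinaryOperator<DoubleList<T>> combiner() {
        return (left, right) -> {
            left.matching.addAll(right.matching);
            left.nonMatching.addAll(right.nonMatching);
            return left;
        };
    }

    @Override
    public Function<DoubleList<T>, DoubleList<T>> finisher() {
        return Function.identity();
    }

    @Override
    public Set<Collector.Characteristics> characteristics() {
        return EnumSet.of(Collector.Characteristics.IDENTITY_FINISH);
    }

    public static void main(String[] args) {
        DoubleList<Integer> result = Arrays.asList(1, 2, 3, 4, 5).stream()
                .collect(new PartitioningCollector<Integer>(n -> n % 2 == 0));
        System.out.println(result.matching);    // prints [2, 4]
        System.out.println(result.nonMatching); // prints [1, 3, 5]
    }
}
```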

The source is available on GitHub.

Final Consideration

Developing a custom collector is not that hard, provided one understands the basic concepts behind it.

The real issue behind collectors is the whole Stream API. Streams need to be created first and then collected afterward. Newer languages designed with the Functional Programming paradigm from the start, such as Scala or Kotlin, provide collections with such capabilities directly baked in.

For example, to filter out something from a map in Java:

// map is assumed to be a Map with String keys, as in the Kotlin version
map.entrySet().stream()
        .filter(entry -> entry.getKey().length() == 4)
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

That would translate as the following in Kotlin:

map.entries.filter { it.key.length == 4 }

Published at DZone with permission of Nicolas Frankel. See the original article here.
