The Complete Guide to Stream API and Collectors in Java 8
What is the map/filter/reduce go to them about? Let’s take an example. We are going to make a list of Person instances.
Join the DZone community and get the full member experience.Join For Free
What Is the Map/Filter/Reduce Go To Them About?
Let’s take an example. We are going to make a list of Person instances.
Now, suppose we want to compute the average age of the age of the people from this list older than 20.
How Are We Going To Proceed?
- Mapping step.
- The mapping takes a List of Person and returns a List of Integer.
- The size of both lists is the same.
2. Filtering step
- Takes our list of ages, which is a list of integers, and returns, also, a list of integers.
- If my filtering is just a predicate edge greater than 20, then, in the return list, I have all the ages greater than 20.
3. Reduction step
- We’ll just say that it is equivalent to a SQL aggregation. Now, what is a SQL aggregation? It’s, for instance, the sum of the elements or the max, or the mean, or the average, something like that.
- It’s just a simple function that will resume all the integers in our example, in one integer.
What Is a Stream?
- The stream is a Java-type interface. This type is T. It means that we can have streams of integer, streams of person, streams of customer, streams of strings, etc., etc.
- It gives it a way to efficiently process data inside the JVM. It can efficiently process large amounts of data.
- It can process data in parallel to leverage the computing power of multiple CPUs.
- The process is conducted in a Pipeline, because it will avoid unnecessary intermediary computations.
Definition of a Stream in Java 8
Why Can’t a Collection Be a Stream?
- Stream is a new concept, and we don’t want to change the way the collection API works.
What Is a Stream?
- An object on which I can define operations, and by operations, you can think of a map, a filter, or a reduce operation.
- An object that does not hold any data.
- An object that a stream is not allowed to change the data it processes.
- An object able to process data in one pass.
- An object should be optimized from the algorithm point of view and able to process data in parallel.
Building and Consuming a Stream
How Can We Build a Stream?
Well, in fact, we have many patterns to build streams.
Let us see the first one, probably the most useful one. We have a stream method that has been added to the collection interface, so calling persons. stream will open a stream on the list of persons.
- The forEach method defined on the stream interface and pass it a consumer.
- Let us have a look at that Consumer interface. It’s a functional interface, so it has only one abstract method. This is the definition of a functional interface. It can be implemented by a lambda expression.
- It can also be written as a method reference, System. out::println.
- In fact, Consumer is a bit more complex.
- It will allow us to chain consumers.
- Because forEach() does not return anything .
2. Filtering a Stream
- It takes a stream, defined on a source of data, and it filters out part of that data following a predicate.
- The predicate we are using here, just checking that the age of that person is greater than 20.
- Takes a predicate as a parameter. Here is the Predicate.
- It’s a regular lambda expression, person. getAge > 20.
- Let’s have a look at that Predicate interface.It has a single method called test that takes an object as a parameter and that returns a boolean.
- We have to be a bit careful with this way of writing things, because the priority that is usually in action when I’m writing boolean operations, is not taken into account here.
- Warning: method is called do not handle priorities in that case.
- I also have a static method in the predicate interface called isEqual.
- Now, what does this isEqual method do? It creates a new Predicate by comparing the objects passed as a parameter.
- Let us notice that the of() method of the stream interface is a static method, so it’s a new way of declaring patterns of code on interfaces that are used here, that leverage the file in Java 8. I can write static methods in interfaces and is also another way of creating streams in Java.
- The stream that is written every time by the filter method is a new instance of stream, so the stream1 and the stream2 objects are different objects.
Example Consuming and Filtering a Stream
Lazy Operations on a Stream
What Do I Have in This New Stream, Stream2?
- It is nothing. I don’t have any object in the stream. This is the definition of a stream. A stream cannot hold any data.
- This code does not do anything, declaration of operation on a given stream, but no data is processed in this call.
- The call to the filter method is called a lazy call. It means that when I invoke that method, in fact, it’s only a declaration that is taken into account, but no data is processed.
- All the methods of Stream that return another Stream are lazy.
- Another way of saying it is, an operation on a Stream that returns a Stream is called an intermediary operation.
- If we go back to the consumer and consider another consuming method called peek.
- The peek method looks like the forEach method. The only difference is that a peek method returns the stream, whereas, a forEach method does not return anything. Since the peek method returns a stream, we can safely assume that it is an intermediary operation.
- Then I have a call to the filter method. Again, it’s only a declaration, and the final call is another peek method that is also a declaration.
- So, the answer is, this code does not do anything. First, it does not print anything. The System. out::println is never invoked, and the result list remains empty, too, because the result::add is never invoked in that code.
Example: Intermediary and Terminal Operations
Wrapping Up Intermediary and Terminal Operations
- This stream API defines intermediary operation.
- The forEach operation, that was executed, so it must not be an intermediary operation.
- Then the peek operation, that is indeed intermediary and the filter operation.
- On those operations, we can see that the forEach is not lazy.
- We saw that the peek and filter operations were lazy.
The Map Operation
- The map operation implements the first step of the map/filter/reduce algorithm we saw at the beginning of this article.
- The map operation returns a Stream, so we can safely assume that it is an intermediary operation.
- The mapper function is modeled by the Function interface. it does only one, in fact, method called apply. This method takes an object as a parameter, and returns another object.
- We also have a set of default methods to chain and compose mappings.
- Namely, we have two default methods, compose and andThen.
- Those are the complete signatures. Beware the generics! You have to be extra careful when you design such methods, if you want your call to be allowed by extension of the person class, for instance.
- And, I also have one static utility method called identity. What does an identity do? Well, it’s quite obvious. It takes an object and returns that same object.
The Flatmap Operation
- FlatMapping operation is a bit tricky to understand.
- Let us have a look at the signature of this method.
- A flatMap takes a function as an argument, the same kind of function as the map method.
- Now, if I check carefully I see that a map takes an object and returns another object, whereas the flatMap takes an object and returns as a return type, a stream of objects.
- So, the flatMapper takes an element of type T, returns an element of type Stream of something.
- If the flatMap was a regular map, it would return a Stream of those streams that return by the provided function, thus a stream of streams.
- But, since it is a flatMap, it returns a stream of streams that is flattened, and that becomes a single stream.
- What does it mean? It means that all the objects of the included streams are taken up in the holding stream.
Map and Flatmap Examples
Wrapping up Map and Filter on a Stream
We saw 3 categories of operations
- The forEach and the peek that take consumers as parameters
- The filter method that takes a predicate,
- The map on the flatMap method that takes mappers as parameters.
- The mapper is an instance of the functional interface function.
Reduction, Functions, and Bifunctions
- The last step of our map/filter/reduce algorithm is the Reduction step.
- There are Two kinds of reduction included in the Stream API.
- The 1st kind is the basic and classical SQL operation, like the min, the max, the sum, the average, etc.
- That reduction takes two Integers, age1, and age2, and returns the sum of them.
- The 1st argument, this 0 should be the identity element of the reduction operation.
- The 2nd argument is the reduction operation of type BinaryOperator<T>. Here, T is the type integer. In fact, BinaryOperator is a special case of a BiFunction.
- A BiFunction looks like a function. It takes two objects of type T and U here and returns an object of type R.
- The BinaryOperator is just an extension of a BiFunction where all those three types are, in fact, the same. So, a BiFunction takes two objects as parameters of the same type and returns an object, also, of the same type.
Reduction of the Empty Set: Identity Element
- The bifunction takes two arguments, so we may ask two questions.
- What happens if the Stream is empty? and, What happens if the Stream has only one element?
- If the Stream is empty the reduction of the empty Stream is the provided identity element
- If the Stream has only one element, then the reduction of this stream is that element.
- In fact, it is that element combined with the identity element that has been provided.
- The identity element for the sum is 0, then I can reduce my stream, by providing this identity element, 0 and the sum operation, modeled as this lambda expression.
- Let us take an empty stream. An empty stream can be built by calling the static method empty on the stream interface, and we can run that, and the reduction of that stream is indeed 0.
- Let us see another example where we provide a stream with only one element, by calling the off method, once again, a static method from the stream interface, and here, we can see that the reduction of this stream is indeed 1.
- And, as a last example, let us take a regular stream with several integers in it, 1, 2, 3, 4. Let us reduce that stream. It does print 10, which is, of course, the right answer.
- The problem is that the max operation doesn’t have an identity element for the reduction method.
- So, the max of an empty Stream cannot be defined in that way.
- So, what is the return type of this max method?
- If the return type is int, the primitive type from the Java language, then the default value is 0, and 0 is clearly not the identity element of the max method.
- If the return type is Integer, then the default value is null, and I certainly do not want to return a null value, in that case, because I will have to check ever in my code, if the return value is null to avoid null point exceptions.
- The return type, in fact, of this call, is called optional, Optional of Integer.
- What is Optional? It is a new concept in Java 8. It is a class and we can see it as a wrap type.
- Well, the optional of integer looks like a wrap type, the only difference being that, in a wrap type, I always have a value, whereas, in an optional, I might not have a value.
- Returning an optional means there might be no result, and this is exactly what I want to mean here, because if I take the max of an empty stream, I do not know what is the max.
Pattern for the Optionals
- The isPresent method will return true, if I have a value inside my optional, and false, if it not the case. And, if I have a value, I can get it by calling the get method on this optional object.
- The orElse method encapsulates both calls. In fact, it is just to call isPresent and will call the get method for me, if there is an object. But, I can also decide to throw an exception if I want to.
- This orElseThrow method that will build a new exception. I’m just providing a lambda expression that will create that exception for me in a lazy way. That is the exception. We’ll only build, if needed, and this method will return me the string of character, if it is present, and will throw this exception, if it is not.
Wrapping up Reduction Operations
- Available reductions — max(). min() and count() that will return me the number of elements in a stream.
- Boolean reductions — allMatch(), noneMatch(), anyMatch(). All those three methods take predicates as parameters, and the allMatch method will return true, if the predicate returns true for all the elements of that stream.
- Reductions that return an optional, other than max and mean. findFirst() and findAny() are such methods.
- Reductions are called terminal operations.
- They trigger the processing of data.
Example: Reductions, Optionals
Wrapping up Operations and Optionals
- The reduction is just a classical SQL operation.
- the difference between intermediary and terminal operation, and, namely, that intermediary adds just declarations of operation of a stream and that the terminal operation triggers the computation on the stream.
- This Optional notion is needed because default values can’t always be defined on the reduction step.
Collectors, Collecting in a String, in a List
- Let us see now the second kind of reduction. If you check the Java doc, you will see that this reduction is called the mutable reduction.
- Why? Because we are going to reduce the stream in a container that is mutable because we are going to add all the elements of the stream in that container.
- This collect method takes so collectors. joining with a string of character as a parameter.
- So as a result of the string, we saw the name of the people in the person list older than 20, separated by commas.
- We can also Collect in a List. This is basically the same kind of processing on the stream that we can do, but this time, instead of collecting all the names in a string, we collect them in a list.
Collecting in a Map
- We take this stream built on persons. We filter out all the people younger than 20, and we collect them in a map. Now, what is this map made of? Well, we just pass Person::getAge. Once, again, this is a method reference.
- It is possible to post-process the values.
Example: Processing Streams
- We had a quick explanation on the map/filter/reduce algorithm, and once again, this algorithm is not typical of the Java platform. It’s a general algorithm.
- Then we defined what a Stream is. We saw several patterns to build streams.
- We saw the difference between intermediary and final operations — the first one being lazy, and the second one triggers the processing of the data.
- We saw several consuming operations — the forEach operation, which is final, and the peek operation, which is an intermediary.
- We saw two mapping operations — first, map(), and then, flatMap().
- We saw the filter operation with the filter method.
- We saw the reduction step, with the reduction operations. We have two kinds of reduction operations — the first kind is the SQL aggregation, the max, the min, the sum, the count, etc., and the second kind, being the mutable reduction.
- The mutable reduction is an extremely powerful tool with the collect method, the Collector interface, and the Collectors class. It allows us to build very quickly, a complex structure to reduce the elements of our stream.
- Thank you for reading this article about map/filter/reduce, the stream API, and the Collectors in Java 8.
Opinions expressed by DZone contributors are their own.