Performance With Java 8 Streams

Here, we take a look at the three main mechanisms that Java 8 Streams rely on to boost application performance: parallelism, laziness, and short-circuiting.

Java 8 Streams were a major addition to the Java programming language, introduced in part to make better use of modern hardware. Over the last few years, programming has changed considerably as hardware has evolved: parallel processing, real-time systems, cloud computing, and several other approaches have been introduced to achieve higher performance.

In Java 8 Streams, performance comes from parallelism, laziness, and short-circuit operations. There is a downside as well: we need to be cautious when choosing Streams, because used carelessly they can degrade the performance of your application.

Let us look at each of these factors in turn.

Parallelism

Parallelism makes the most of the available hardware. Computers now ship with multiple CPU cores, so it rarely makes sense to run heavy work on a single thread in a multi-core system. Designing and writing multi-threaded applications is challenging and error-prone, which is why Streams come in two flavors: sequential and parallel. Using parallel Streams is easy, and no thread-handling code is required.

In Java Streams, parallelism is achieved by using the Fork-Join principle. A larger task is divided into smaller sub-tasks (known as forking), the sub-tasks are processed in parallel to utilize all the available hardware, and the partial results are then combined (known as joining) into one integrated result.

You need to choose carefully between sequential and parallel Streams, because parallel is not always faster: splitting the work and merging the results has a cost of its own.
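To make the fork and join steps concrete, here is a minimal hand-written sketch of the same idea using RecursiveTask from java.util.concurrent. It is only an illustration of the principle, not the code Streams run internally; the class names ForkJoinSumSketch and EvenSumTask and the THRESHOLD value are assumptions made for this example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical illustration of fork/join; not the Streams implementation itself.
class ForkJoinSumSketch {

    /** Sums the even values in numbers[from, to) by splitting the range in half. */
    static class EvenSumTask extends RecursiveTask<Long> {
        // Assumed cut-off: ranges this small are summed directly instead of split.
        private static final int THRESHOLD = 10_000;
        private final int[] numbers;
        private final int from;
        private final int to;

        EvenSumTask(int[] numbers, int from, int to) {
            this.numbers = numbers;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    if (numbers[i] % 2 == 0) {
                        sum += numbers[i];
                    }
                }
                return sum;
            }
            int mid = from + (to - from) / 2;
            EvenSumTask left = new EvenSumTask(numbers, from, mid);
            EvenSumTask right = new EvenSumTask(numbers, mid, to);
            left.fork();                     // fork: schedule the left half asynchronously
            long rightSum = right.compute(); // process the right half in the current thread
            return left.join() + rightSum;   // join: wait for the left half and combine
        }
    }

    static long sumEven(int[] numbers) {
        return ForkJoinPool.commonPool().invoke(new EvenSumTask(numbers, 0, numbers.length));
    }

    public static void main(String[] args) {
        int[] numbers = new int[5_000_000];
        for (int i = 0; i < numbers.length; i++) {
            numbers[i] = i;
        }
        System.out.println(sumEven(numbers));
    }
}
```

With a parallel Stream, you get this splitting and combining without writing any of the task code yourself.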

Let us look at an example.

StreamTest.java:

package test;

import java.util.ArrayList;
import java.util.List;

public class StreamTest {

    static List<Integer> myList = new ArrayList<>();

    public static void main(String[] args) {

        for (int i = 0; i < 5000000; i++)
            myList.add(i);

        // Note: the true sum of the even values (6,249,997,500,000) does not fit
        // in an int, so all three variants print the same wrapped value.
        // Use long and mapToLong() if you need the real total.

        // 1. Plain for loop
        int result = 0;
        long loopStartTime = System.currentTimeMillis();
        for (int i : myList) {
            if (i % 2 == 0)
                result += i;
        }
        long loopEndTime = System.currentTimeMillis();

        System.out.println(result);
        System.out.println("Loop total Time = " + (loopEndTime - loopStartTime));

        // 2. Sequential stream
        long streamStartTime = System.currentTimeMillis();
        System.out.println(myList.stream().filter(value -> value % 2 == 0).mapToInt(Integer::intValue).sum());
        long streamEndTime = System.currentTimeMillis();

        System.out.println("Stream total Time = " + (streamEndTime - streamStartTime));

        // 3. Parallel stream
        long parallelStreamStartTime = System.currentTimeMillis();
        System.out.println(myList.parallelStream().filter(value -> value % 2 == 0).mapToInt(Integer::intValue).sum());
        long parallelStreamEndTime = System.currentTimeMillis();

        System.out.println("Parallel Stream total Time = " + (parallelStreamEndTime - parallelStreamStartTime));
    }
}

Output (times in milliseconds):

820084320
Loop total Time = 17
820084320
Stream total Time = 81
820084320
Parallel Stream total Time = 30

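As you can see, the plain for loop is the fastest in this run, so don't replace a for loop with Streams without proper analysis. We can also see that the parallel Stream outperforms the sequential Stream here. Keep in mind that these are single-run wall-clock timings: the first Stream call also pays one-time costs such as class loading and JIT compilation, so the numbers are only a rough indication.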

Note: Results may vary on different hardware.
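Since a single timed run can be misleading, here is a hedged sketch of a slightly more careful comparison. It runs each variant a few times untimed before measuring, uses System.nanoTime(), and accumulates into a long so the total does not overflow. The class name WarmedUpTimingSketch, the helper bestMillis, and the choice of five warm-up and five timed runs are assumptions for illustration; for serious measurements a dedicated harness such as JMH is the better tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical timing helper; a rough sketch, not a rigorous benchmark.
class WarmedUpTimingSketch {

    static long loopSum(List<Integer> values) {
        long sum = 0;
        for (int v : values) {
            if (v % 2 == 0) {
                sum += v;
            }
        }
        return sum;
    }

    static long streamSum(List<Integer> values) {
        return values.stream().filter(v -> v % 2 == 0).mapToLong(Integer::longValue).sum();
    }

    static long parallelStreamSum(List<Integer> values) {
        return values.parallelStream().filter(v -> v % 2 == 0).mapToLong(Integer::longValue).sum();
    }

    /** Runs the task untimed a few times, then returns the best of several timed runs in ms. */
    static double bestMillis(ToLongFunction<List<Integer>> task, List<Integer> values) {
        for (int i = 0; i < 5; i++) {
            task.applyAsLong(values); // warm-up runs, results discarded
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < 5; i++) {
            long start = System.nanoTime();
            task.applyAsLong(values);
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1_000_000.0;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < 5_000_000; i++) {
            values.add(i);
        }
        System.out.println("Loop best time (ms) = " + bestMillis(WarmedUpTimingSketch::loopSum, values));
        System.out.println("Stream best time (ms) = " + bestMillis(WarmedUpTimingSketch::streamSum, values));
        System.out.println("Parallel Stream best time (ms) = " + bestMillis(WarmedUpTimingSketch::parallelStreamSum, values));
    }
}
```

The absolute numbers will still differ from machine to machine, but the gap between the three variants is usually more stable once the warm-up cost is out of the picture.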

Laziness

As we know, Java 8 Streams have two types of operations, known as intermediate and terminal. Intermediate operations describe the processing, and terminal operations produce the end result. You might have noticed that intermediate operations are not executed unless a terminal operation is attached to the pipeline.

In summary, intermediate operations just create another stream and do not perform any processing until the terminal operation is called. Once the terminal operation is called, traversal of the stream begins and the associated functions are applied to the elements one by one. Because intermediate operations are evaluated lazily in this way, Streams support laziness.

Note: With parallel streams, the elements are not traversed one by one when the terminal operation runs. They are split into chunks that are processed concurrently, and the degree of parallelism depends on the number of cores your machine has.

Consider a situation where one part of the application builds a stream using intermediate operations only, and the terminal operation is placed later in the application (and may or may not be needed, depending on the user request). In this case, the intermediate operations only create another stream for the terminal operation and perform no actual processing. If the terminal operation is never called, the work is never done, which helps performance.
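For a sequential stream, the one-by-one traversal described above can be observed with a small sketch that records each call. The class name LazyTraversalSketch and the method trace are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration: records the order in which filter and map are applied.
class LazyTraversalSketch {

    static List<String> trace() {
        List<String> log = new ArrayList<>();

        // Intermediate operations only: nothing is logged while the pipeline is built.
        Stream<Integer> pipeline = Stream.of(1, 2, 3, 4)
            .filter(n -> {
                log.add("filter " + n);
                return n % 2 == 0;
            })
            .map(n -> {
                log.add("map " + n);
                return n * 10;
            });
        log.add("pipeline built");

        // The terminal operation starts the traversal, one element at a time.
        List<Integer> result = pipeline.collect(Collectors.toList());
        log.add("result " + result);
        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

The log starts with "pipeline built", and each element then passes through filter and, if it is even, through map before the next element is touched: filter 1, filter 2, map 2, filter 3, filter 4, map 4, and finally result [20, 40].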

Let us look at the laziness example:

StreamLazinessTest.java:

package test;

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamLazinessTest {

    /** Employee class **/
    static class Employee {
        int id;
        String name;

        public Employee(int id, String name) {
            this.id = id;
            this.name = name;
        }

        public String getName() {
            return this.name;
        }
    }

    public static void main(String[] args) throws InterruptedException {

        List<Employee> employees = new ArrayList<>();

        /** Creating the employee list **/
        for (int i = 1; i < 10000000; i++) {
            employees.add(new StreamLazinessTest.Employee(i, "name_" + i));
        }

        /** Only intermediate operations: this just builds another stream and
         * does not perform any processing yet **/
        Stream<String> employeeNameStreams = employees.parallelStream()
            .filter(employee -> employee.id % 2 == 0)
            .map(employee -> {
                System.out.println("In Map - " + employee.getName());
                return employee.getName();
            });

        /** Adding some delay to make sure nothing has happened so far **/
        Thread.sleep(2000);
        System.out.println("2 sec");

        /** Terminal operation on the stream; it triggers the intermediate
         * operations filter and map **/
        employeeNameStreams.collect(Collectors.toList());
    }
}

If you run the above code, the "In Map" lines appear only after "2 sec" has been printed, which shows that the intermediate operations are not executed until the terminal operation is invoked.

Short-Circuit Behavior

This is another way of optimizing Stream processing. A short-circuiting operation stops the processing as soon as its condition is met, without visiting the remaining elements. A number of short-circuiting operations are available, for example anyMatch, allMatch, findFirst, findAny, and limit.
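A quick way to see short-circuiting at work is to count how many elements a sequential stream actually examines. The sketch below does this with peek() and a counter; the class name ShortCircuitCountSketch and both method names are assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Hypothetical illustration: counts the elements examined by short-circuiting operations.
class ShortCircuitCountSketch {

    /** Elements of 1..1,000,000 examined before anyMatch finds a value above 10. */
    static int examinedByAnyMatch() {
        AtomicInteger examined = new AtomicInteger();
        boolean found = IntStream.rangeClosed(1, 1_000_000)
            .peek(n -> examined.incrementAndGet())
            .anyMatch(n -> n > 10);
        return found ? examined.get() : -1;
    }

    /** Elements of 1..1,000,000 examined when only the first 100 even values are kept. */
    static int examinedByLimit() {
        AtomicInteger examined = new AtomicInteger();
        IntStream.rangeClosed(1, 1_000_000)
            .peek(n -> examined.incrementAndGet())
            .filter(n -> n % 2 == 0)
            .limit(100)
            .sum();
        return examined.get();
    }

    public static void main(String[] args) {
        System.out.println("anyMatch examined " + examinedByAnyMatch() + " elements");
        System.out.println("limit(100) examined " + examinedByLimit() + " elements");
    }
}
```

On a sequential stream, anyMatch stops at the 11th element and limit(100) stops at the 200th, even though the source holds a million values. With a parallel stream the counts may be higher, because several chunks are processed at the same time.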

Let us look at an example.

StreamShortCircuitTest.java:

package test;

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamShortCircuitTest {

    /** Employee class **/
    static class Employee {
        int id;
        String name;

        public Employee(int id, String name) {
            this.id = id;
            this.name = name;
        }

        public int getId() {
            return this.id;
        }

        public String getName() {
            return this.name;
        }
    }

    public static void main(String[] args) throws InterruptedException {

        List<Employee> employees = new ArrayList<>();

        for (int i = 1; i < 10000000; i++) {
            employees.add(new StreamShortCircuitTest.Employee(i, "name_" + i));
        }

        /** Only intermediate operations: this just builds another stream and
         * does not perform any processing yet **/
        Stream<String> employeeNameStreams = employees.stream()
            .filter(e -> e.getId() % 2 == 0)
            .map(employee -> {
                System.out.println("In Map - " + employee.getName());
                return employee.getName();
            });

        long streamStartTime = System.currentTimeMillis();

        /** Short-circuiting operation limit followed by the terminal operation collect **/
        employeeNameStreams.limit(100).collect(Collectors.toList());

        System.out.println(System.currentTimeMillis() - streamStartTime);
    }
}

If you run the above code, you will see a huge performance boost, as it took just 6 milliseconds on my machine. Here, limit(100) stops the traversal as soon as 100 names have been produced, so only the first 200 of the roughly 10 million employees are ever filtered.

Last but not least, intermediate operations fall into two categories depending on whether they keep state: stateful and stateless.

Stateful Intermediate Operations

These intermediate operations need to store state from the elements they have already seen, and hence can hurt the performance of your application, e.g. distinct(), sorted(), limit(), etc.
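The cost of keeping state is easy to see with sorted(), which has to consume every element before it can emit the first one. The sketch below counts how many elements are pulled from the source when the pipeline ends in findFirst(); the class name StatefulBarrierSketch and the method pulled are assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical illustration: a stateful operation forces the whole source to be read.
class StatefulBarrierSketch {

    /** Counts the elements pulled from the source when the pipeline ends in findFirst(). */
    static int pulled(boolean withSorted) {
        AtomicInteger pulled = new AtomicInteger();
        Stream<Integer> source = Stream.of(5, 3, 1, 4, 2)
            .peek(n -> pulled.incrementAndGet());
        Stream<Integer> pipeline = withSorted ? source.sorted() : source;
        pipeline.findFirst();
        return pulled.get();
    }

    public static void main(String[] args) {
        System.out.println("without sorted(): " + pulled(false) + " element(s) pulled");
        System.out.println("with sorted(): " + pulled(true) + " element(s) pulled");
    }
}
```

Without sorted(), findFirst() needs only one element. With sorted() in the pipeline, all five elements are pulled and buffered first, so the short-circuit no longer saves any work upstream.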

Stateless Intermediate Operations

These intermediate operations can process each element independently, as they don't need to remember anything about the elements seen before, e.g. filter(), map(), etc.

Here we have learned that Streams offer several mechanisms for better performance, but they are not always the faster choice, and we need to use them wisely.

Happy learning!