
Should I Parallelize Java 8 Streams?

What do we need to consider before parallelizing Java streams?

In Java 8, the streams API makes it easy to iterate over collections, and a stream can be parallelized simply by calling the parallelStream() method. But should we use parallelStream() wherever we can? What are the considerations?

The following ParallelStreamTester class generates collections of different sizes so that the performance of a parallel stream can be compared against that of a sequential stream.

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class ParallelStreamTester {
    // Number of Person objects to generate; vary this between test runs.
    static int COLLECTION_SIZE = 100000;

    private static Collection<Person> getPersonCollection() {
        List<Person> personList = new ArrayList<>();

        String[] names = {"David", "Marry", "Satya", "Matt", "Patrick", "Bill", "Mike", "Jake", "Amber", "Dianne"};
        int[] age = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
        String[] states = {"NY", "MA", "MO", "CA", "TX", "MN", "WA", "PE", "NE", "NH", "OH"};

        // getRandom() is part of the code not shown here; it supplies an index into the arrays above.
        for (int i = 0; i < COLLECTION_SIZE; i++) {
            personList.add(new Person(names[getRandom()], age[getRandom()], states[getRandom()]));
        }

        System.out.println("Generated the collection \n");
        return personList;
    }

    // more code
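
The Person class itself is not shown in this article. As a rough sketch only, a class along the following lines would satisfy the constructor call above and the getters used in the filters below. The field layout and immutability are assumptions; the version in the original code may well differ.

```java
// Hypothetical stand-in for the Person class, which the article does not show.
// Only the constructor shape and getter names are taken from the article's usage.
final class Person {
    private final String name;
    private final int age;
    private final String state;

    Person(String name, int age, String state) {
        this.name = name;
        this.age = age;
        this.state = state;
    }

    String getName() {
        return name;
    }

    int getAge() {
        return age;
    }

    String getState() {
        return state;
    }
}
```

Keeping the fields final means instances can be read from several threads at once without extra synchronization, which is convenient when the same collection is later fed to a parallel stream.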


Now, consider the following code snippet, which tests the performance of the sequential stream. It counts all of the Persons who are older than 50, are from "NY" or "TX", and have names that start with "M".

private static void sequentialStreamPerformance(Collection<Person> persons) {
    long t1 = System.currentTimeMillis();

    long count = persons.stream()
            .filter(x -> x.getState().equals("NY") || x.getState().equals("TX"))
            .filter(x -> x.getAge() > 50)
            .filter(x -> x.getName().startsWith("M"))
            .count();

    long t2 = System.currentTimeMillis();
    System.out.println("Count = " + count + " Normal Stream Takes " + (t2 - t1) + " ms\n");
}


And for parallel stream performance:

private static void parallelStreamPerformance(Collection<Person> persons) {
    long t1 = System.currentTimeMillis();

    long count = persons.parallelStream()
            .filter(x -> x.getState().equals("NY") || x.getState().equals("TX"))
            .filter(x -> x.getAge() > 50)
            .filter(x -> x.getName().startsWith("M"))
            .count();

    long t2 = System.currentTimeMillis();
    System.out.println("Count = " + count + " Parallel Stream takes " + (t2 - t1) + " ms\n");
}


Now, let's run some tests by varying the value of COLLECTION_SIZE. Start with a value of 100 and steadily increase it up to 10,000,000, noting the time taken on each run. Here is my observed result:

[Figure: Regular vs. parallel streams, time taken]

  • Sequential streams outperformed parallel streams when the number of elements in the collection was less than 100,000.
  • Parallel streams performed significantly better than sequential streams when the number of elements was more than 100,000.
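
Single-shot timings taken with System.currentTimeMillis() can be skewed by JIT warm-up and by the coarse resolution of the clock, so the crossover point will vary from machine to machine. The following is a minimal sketch, not the code used for the results above, of one way to repeat each measurement after a warm-up phase. The class and method names are hypothetical, and a simple IntStream filter stands in for the Person pipeline. For rigorous numbers a dedicated harness such as JMH would be a better fit.

```java
import java.util.function.LongSupplier;
import java.util.stream.IntStream;

// Hypothetical timing sketch; names are illustrative and not from the article's code.
class StreamTimingSketch {
    // Results are accumulated here so the JIT is less likely to discard the work as dead code.
    static volatile long sink;

    // Runs the task untimed `warmups` times, then returns the fastest of `runs` timed runs in nanoseconds.
    static long bestNanos(LongSupplier task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            sink += task.getAsLong();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink += task.getAsLong();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    // Stand-in workload: count the multiples of 3 in [0, n) sequentially.
    static long countSequential(int n) {
        return IntStream.range(0, n).filter(i -> i % 3 == 0).count();
    }

    // The same workload on a parallel stream.
    static long countParallel(int n) {
        return IntStream.range(0, n).parallel().filter(i -> i % 3 == 0).count();
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 10_000, 1_000_000, 10_000_000}) {
            long seq = bestNanos(() -> countSequential(n), 5, 10);
            long par = bestNanos(() -> countParallel(n), 5, 10);
            System.out.printf("n=%d sequential=%d ns parallel=%d ns%n", n, seq, par);
        }
    }
}
```

Reporting the best of several runs is one of several reasonable choices; a median would be less sensitive to a single unusually fast run.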

What about synchronization problems when using parallel Streams?

If the predicates and functions used in the pipeline access a shared resource, we need to make sure that access is controlled and thread-safe.
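
To make the risk concrete, here is a small sketch (hypothetical class and method names, not taken from the article's code) that contrasts a parallel pipeline writing into a shared ArrayList with one that lets the stream build the result through collect(). ArrayList is not thread-safe, so the first variant may silently lose elements or throw, while the second needs no explicit locking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical illustration of shared mutable state in a parallel stream.
class SharedStateSketch {
    // Unsafe: several worker threads call add() on the same unsynchronized ArrayList.
    static List<Integer> unsafeEvens(int n) {
        List<Integer> result = new ArrayList<>();
        IntStream.range(0, n).parallel()
                .filter(i -> i % 2 == 0)
                .boxed()
                .forEach(x -> result.add(x));
        return result;
    }

    // Safe: collect() gives each thread its own container and merges them afterwards.
    static List<Integer> safeEvens(int n) {
        return IntStream.range(0, n).parallel()
                .filter(i -> i % 2 == 0)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("safe size   = " + safeEvens(100_000).size());
        try {
            // The size printed here may be smaller than expected, or the call may throw.
            System.out.println("unsafe size = " + unsafeEvens(100_000).size());
        } catch (RuntimeException e) {
            System.out.println("unsafe version failed: " + e);
        }
    }
}
```

Where a shared resource cannot be avoided, a thread-safe type or explicit synchronization is needed, and that adds to the coordination cost of the parallel stream.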

A parallel stream has a much higher overhead than a sequential stream, because splitting the work and coordinating the threads takes a significant amount of time. Sequential streams are therefore the sensible default, and a parallel stream is worth considering only when there is a performance problem to be addressed.

The code used in this POC can be found on GitHub.
