Parallelism in ConcurrentHashMap

Unni Mana

Sep. 13, 23 · Tutorial

ConcurrentHashMap is used extensively in multi-threaded applications such as online games and chat applications, where many threads read and update shared data at the same time. To let such applications do more work concurrently, ConcurrentHashMap supports a concept called 'parallelism': its bulk operations can split their work across several threads.

In this article, we will learn more about parallelism in ConcurrentHashMap.

What Is Parallelism?

Basically, parallel computing divides a problem into subproblems, solves those subproblems in parallel, and finally joins the results of the subproblems. Here, the subproblems run in separate threads.

Java Support for Parallelism in `ConcurrentHashMap`

To make use of parallelism in ConcurrentHashMap, we need Java 8 (version 1.8) or later. The parallel bulk operations are not available in earlier Java versions.

Common Framework for Parallel Processing

Java provides a framework called 'fork/join' that enables parallel computing. It is built around the java.util.concurrent.ForkJoinPool API. ConcurrentHashMap uses this API, specifically the shared common pool returned by ForkJoinPool.commonPool(), to implement its parallel operations.
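To make the fork/join idea concrete, here is a minimal sketch that is independent of ConcurrentHashMap. It sums an array by splitting it in half until the pieces are small, then joins the partial sums. The class name ForkJoinSum and the cut-off of 1,000 elements are made up for illustration; this is not how ConcurrentHashMap itself is written.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class: sums an array with the fork/join framework.
class ForkJoinSum extends RecursiveTask<Long> {
    private static final int SPLIT_THRESHOLD = 1_000; // illustrative cut-off, not tuned
    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ForkJoinSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= SPLIT_THRESHOLD) {
            // Small enough: solve this subproblem directly.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ForkJoinSum left = new ForkJoinSum(data, from, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, to);
        left.fork();                     // schedule the left half asynchronously
        long rightSum = right.compute(); // work on the right half in this thread
        return left.join() + rightSum;   // join the partial results
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data)); // sum of 1..100000 = 5000050000
    }
}
```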

Parallel Methods in `ConcurrentHashMap`

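ConcurrentHashMap controls parallel computing with the help of a parallelism threshold. It is a numerical value (a long) that the caller passes to each bulk operation; there is no built-in default. If the estimated size of the map is below the threshold, the operation runs sequentially. A threshold of 1 gives maximal parallelism, and Long.MAX_VALUE suppresses parallelism entirely.

The following methods in ConcurrentHashMap have parallelism capabilities: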

forEach()
reduce()
reduceEntries()
forEachEntry()
forEachKey()
forEachValue()

ConcurrentHashMap handles parallelism slightly differently from other collections, which becomes clear when you look at the arguments of the above methods: each of them takes the parallelism threshold as its first argument.

First of all, parallelism is an optional feature. We enable it by passing a suitable parallelism threshold value in the code.
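As a small illustration of how the threshold appears in these signatures, the sketch below calls reduceValues(), reduce(), and forEachKey() on a three-entry map. The class name, helper names, and sample data are invented for this example; only the ConcurrentHashMap methods themselves come from the JDK.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical demo class; the map contents are sample data.
class BulkOperationsDemo {
    static ConcurrentHashMap<String, Integer> sampleMap() {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // reduceValues(threshold, reducer): combines all values into one result.
    static int sumValues(ConcurrentHashMap<String, Integer> map, long threshold) {
        Integer sum = map.reduceValues(threshold, Integer::sum);
        return sum == null ? 0 : sum; // null means the map was empty
    }

    // reduce(threshold, transformer, reducer): maps each entry, then combines.
    static int totalKeyLength(ConcurrentHashMap<String, Integer> map, long threshold) {
        Integer total = map.reduce(threshold, (k, v) -> k.length(), Integer::sum);
        return total == null ? 0 : total;
    }

    // forEachKey(threshold, action): runs an action for every key.
    static long countKeys(ConcurrentHashMap<String, Integer> map, long threshold) {
        LongAdder counter = new LongAdder(); // thread-safe, since the action may run in parallel
        map.forEachKey(threshold, k -> counter.increment());
        return counter.sum();
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = sampleMap();
        System.out.println(sumValues(map, 1));                   // 6
        System.out.println(totalKeyLength(map, 1));              // 3
        System.out.println(countKeys(map, Long.MAX_VALUE));      // 3
    }
}
```

In each call the first argument is the threshold; the result is the same whether the operation ends up running in parallel or sequentially.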

Usage of `ConcurrentHashMap` Without Parallelism

Let us take the example of replacing all the string values of a ConcurrentHashMap. This is done without using parallelism.

Example:

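concurrentHashMap.forEach((k, v) -> concurrentHashMap.put(k, ""));

It is pretty straightforward: we iterate over all the entries in the ConcurrentHashMap and write an empty string back for each key. (Assigning to the lambda parameter v would only change a local variable, so the new value has to be written through the map itself.) Because no parallelism threshold is passed, the whole iteration runs in the calling thread.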
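A complete, runnable version of this sequential case might look like the sketch below. The class and method names are invented for illustration. Note that assigning to the lambda parameter v would not change the map, so the sketch writes each new value back with put().

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class for the sequential (non-parallel) case.
class SequentialReplaceDemo {
    static ConcurrentHashMap<Integer, String> buildMap(int size) {
        ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < size; i++) {
            map.put(i, "value-" + i);
        }
        return map;
    }

    static void clearValues(ConcurrentHashMap<Integer, String> map) {
        // forEach without a threshold: every entry is visited in the calling thread.
        // Updating an existing key during iteration is safe on a ConcurrentHashMap.
        map.forEach((k, v) -> map.put(k, ""));
    }

    public static void main(String[] args) {
        ConcurrentHashMap<Integer, String> map = buildMap(5);
        clearValues(map);
        System.out.println(map); // every key now maps to an empty string
    }
}
```

Another sequential option is map.replaceAll((k, v) -> ""), which replaces each value with the result of the function.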

Usage of `ConcurrentHashMap` With Parallelism

Example:

concurrentHashMap.forEach(2, (k, v) -> concurrentHashMap.put(k, ""));

The above example iterates over a ConcurrentHashMap and replaces each value with an empty string. The arguments to this forEach() overload are the parallelism threshold and a functional interface (a BiConsumer). In this case, the problem will be divided into subproblems.

The problem is replacing the values of the ConcurrentHashMap with an empty string. It is divided into subproblems that each cover part of the map, and the subproblems run on separate threads, each replacing the values in its own part.
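The parallel variant can be sketched as a complete program as follows. The names are again invented for illustration, and the value is written back with put() because assigning to the lambda parameter would not touch the map.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class for the parallel case.
class ParallelReplaceDemo {
    static ConcurrentHashMap<Integer, String> buildMap(int size) {
        ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < size; i++) {
            map.put(i, "value-" + i);
        }
        return map;
    }

    static void clearValues(ConcurrentHashMap<Integer, String> map, long parallelismThreshold) {
        // With a threshold, the map may be split into subtasks that run on
        // fork/join worker threads. The action must therefore be thread-safe;
        // put() on a ConcurrentHashMap is.
        map.forEach(parallelismThreshold, (k, v) -> map.put(k, ""));
    }

    public static void main(String[] args) {
        ConcurrentHashMap<Integer, String> map = buildMap(10_000);
        clearValues(map, 2);
        boolean allEmpty = map.values().stream().allMatch(String::isEmpty);
        System.out.println("all values empty: " + allEmpty);
    }
}
```

The call returns only after every subtask has finished, so the map can be read immediately afterwards.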

What Happens When Parallelism Is Enabled?

When a parallelism threshold is supplied, the operation is split into subtasks that run on ForkJoinPool worker threads, and the call returns once all the subtasks have completed. The significance of the threshold value is that parallel processing is used only when the number of records in the map has reached that level. In the above example, the threshold is 2, so parallel processing is enabled if there is more than one record in the map.

This is a cool feature; we can control the parallelism by adjusting the threshold value. This way, we can take advantage of parallel processing in the application.

Take a look at another example below:

concurrentHashMap.forEach(10000, (k, v) -> concurrentHashMap.put(k, ""));

In this case, the parallelism threshold is 10,000, which means that if the number of records is less than 10,000, the operation runs sequentially in the calling thread when replacing the values with an empty string.
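One way to observe the effect of the threshold is to record which threads actually execute the action. The sketch below does this with a hypothetical helper, threadsUsed(); the class name and map sizes are chosen for illustration only.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class: shows which threads run a bulk operation.
class ThresholdDemo {
    static ConcurrentHashMap<Integer, String> buildMap(int size) {
        ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < size; i++) {
            map.put(i, "value-" + i);
        }
        return map;
    }

    // Returns the names of all threads that executed the forEach action.
    static Set<String> threadsUsed(ConcurrentHashMap<Integer, String> map, long threshold) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        map.forEach(threshold, (k, v) -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        // 100 entries is below the threshold of 10,000: only the calling thread is used.
        System.out.println(threadsUsed(buildMap(100), 10_000));

        // 100,000 entries is above the threshold: worker threads may appear as well,
        // depending on the machine and the size of the common pool.
        System.out.println(threadsUsed(buildMap(100_000), 10_000));
    }
}
```

The first line printed should contain only the name of the calling thread; the second typically lists several threads, but the exact set varies from run to run.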

Fig: Full code example without parallelism

Fig: Full code example with parallelism

In the above example, the parallelism threshold is 10,000.

Performance Comparison of Parallel Processing

The following code replaces all the values in the map with an empty string. This ConcurrentHashMap contains more than 100,000 entries. Let’s compare the performance of the code below without and with parallelism.

Fig: Comparison of the code both with and without parallelism

After running the above code, you can see that the normal forEach operation, without parallelism, is actually slightly faster:

time without parallelism->20 milliseconds

time with parallelism->30 milliseconds

This is because the number of records in the map is fairly low, so the cost of splitting the work and coordinating the threads outweighs the gain.

But if we add 10 million records to the map, parallelism really wins: it takes less time to process the data. Take a look at the code in the image below:

Fig: Threshold of the code with and without parallelism

The above code first replaces all the values in the ConcurrentHashMap with an empty string without using parallelism. Next, it uses parallelism to replace all the values of the ConcurrentHashMap with the string one. This is the output:

time without parallelism->537 milliseconds

time with parallelism->231 milliseconds

You can see that with parallelism, the operation takes less than half the time.

Note: The above values are not constant. The same code may produce different results on different systems.
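A comparison along these lines can be sketched as follows. This is not the code from the figures above: the class name, helper names, and the map size of 1,000,000 are assumptions made for illustration, and the sequential run uses Long.MAX_VALUE as the threshold, which suppresses parallelism. A single timed run without JIT warm-up gives only a rough indication.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical timing sketch; not a rigorous benchmark.
class ReplaceTiming {
    static ConcurrentHashMap<Integer, String> buildMap(int size) {
        ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < size; i++) {
            map.put(i, "value-" + i);
        }
        return map;
    }

    // Replaces every value and returns the elapsed wall-clock time in milliseconds.
    static long timeReplaceMillis(ConcurrentHashMap<Integer, String> map,
                                  long threshold, String newValue) {
        long start = System.nanoTime();
        map.forEach(threshold, (k, v) -> map.put(k, newValue));
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Assumed size; raise it (and the heap size) to approach 10 million entries.
        ConcurrentHashMap<Integer, String> map = buildMap(1_000_000);

        long sequential = timeReplaceMillis(map, Long.MAX_VALUE, ""); // parallelism suppressed
        long parallel = timeReplaceMillis(map, 2, "one");             // parallelism enabled

        System.out.println("time without parallelism->" + sequential + " milliseconds");
        System.out.println("time with parallelism->" + parallel + " milliseconds");
    }
}
```

For measurements that need to be trusted, a benchmarking harness such as JMH is a better fit than hand-rolled timing.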

Thread Dump Analysis for Parallelism

When we enable parallelism in the code, the work runs on the ForkJoinPool framework, which creates worker threads based on the demand of the current processing. Let’s take a look at the thread dump analysis for the above code with parallelism enabled, using the fastthread.io tool.

Fig: fastThread report showing the thread count with parallelism enabled

Fig: fastThread report showing the identical stacktrace by enabling parallelism

You can see from the above picture that more threads are in use.

The reason for the larger number of running threads is the ForkJoinPool API, which is responsible for implementing the parallelism behind the scenes. You will see the difference when you look at the next section.
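The extra threads can also be observed from inside the program, without a thread dump tool. The sketch below counts live threads by name prefix before and after a parallel forEach. The prefix "ForkJoinPool.commonPool-worker" is an assumption about how common JDKs name the common pool's workers, and the class and method names are invented for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class: counts fork/join worker threads by name.
class WorkerThreadCount {
    // Counts live threads whose name starts with the given prefix.
    static long countThreads(String namePrefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(namePrefix))
                .count();
    }

    public static void main(String[] args) {
        // Assumed worker naming; adjust if your JDK names them differently.
        String prefix = "ForkJoinPool.commonPool-worker";

        ConcurrentHashMap<Integer, String> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 1_000_000; i++) {
            map.put(i, "value-" + i);
        }

        long before = countThreads(prefix);
        map.forEach(1, (k, v) -> map.put(k, "")); // threshold 1: maximal parallelism
        long after = countThreads(prefix);

        System.out.println("worker threads before: " + before);
        System.out.println("worker threads after:  " + after);
    }
}
```

The counts depend on the number of available processors, so they will differ between machines.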

Thread Dump Analysis Without Parallelism

Now let us look at the thread dump analysis without parallelism enabled.

Fig: fastThread report showing thread count without parallelism enabled

Fig: fastThread report showing the identical stacktrace without enabling parallelism

If you look closely at the above image, you can see that fewer threads are used. In this case, there are only 35 threads in total, fewer than in the previous image. Of these, 32 are runnable, while the waiting and timed_waiting threads number 2 and 1, respectively. The reason for the reduced number of runnable threads is that this code path does not call the ForkJoinPool API.

In this way, the fastthread.io tool provides good insight into what the threads in a thread dump are doing.

Summary

We focused on parallelism in ConcurrentHashMap and how this feature can be used in an application. We also looked at what happens inside the JVM when we enable it. Parallelism is a cool feature that can be put to good use in modern concurrent applications.
