
A New Concurrent HashMap


Check out a developer's simplified concurrent HashMap that scales better than java.util.concurrent.ConcurrentHashMap for writes.


The following shows an algorithm for a simple concurrent HashMap that, for writes, scales better than java.util.concurrent.ConcurrentHashMap. java.util.concurrent.ConcurrentHashMap is fast, sophisticated, and rather complicated. And I needed a similar data structure for vmlens, a tool to test concurrent Java.

I cannot use java.util.concurrent.ConcurrentHashMap itself, since I also want to trace its internals with this tool.

So, I decided to implement only the bare minimum: the method computeIfAbsent. This led to a simple yet scalable concurrent HashMap.

You can download the source code from GitHub here.

The Algorithm

The algorithm uses open addressing with linear probing. It is based on work by Jeff Preshing, which in turn is based on work by Dr. Cliff Click. This GitHub repository contains the source code of the HashMap from Dr. Cliff Click.

Open addressing means that the key-value pairs are stored in a single array. The index to store a key-value pair is given by the hash code modulo the array size.

Linear probing means that if this array element is already occupied, we use the next array element, and so on, until we find an empty slot. Here is an example of an insert after two filled slots; a small sketch of the probe sequence follows the diagram:

HashMap implementation diagram
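
To make the probe sequence concrete, here is a minimal single-threaded sketch. The helper name findSlot and the bare key array are illustrative, not part of the actual implementation; the concurrent version follows below.

// Single-threaded sketch of open addressing with linear probing.
// 'keys' stands for the key part of the key-value array; the name is illustrative.
static int findSlot(Object[] keys, Object key) {
    // Open addressing: the starting slot is the hash code modulo the array size.
    int index = Math.floorMod(key.hashCode(), keys.length);
    // Linear probing: skip slots that are already occupied by a different key.
    // (A completely full array would loop forever; the real map resizes before that happens.)
    while (keys[index] != null && !keys[index].equals(key)) {
        index = (index + 1) % keys.length;
    }
    return index; // either an empty slot or the slot that already holds 'key'
}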

To make the algorithm concurrent, we fill an empty array element using compareAndSet. By using compareAndSet, we make the check for null and the setting of the new element atomic. If compareAndSet succeeds, we have successfully inserted a new element.

If it fails, we need to check whether the key inserted by the other thread equals our key. If it does, we are done and can return the value of that element. If not, we need to find a new empty slot and try again.

The full source code for the algorithm is available in the GitHub repository linked above.
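
As a rough illustration, here is a minimal sketch of such a computeIfAbsent built on open addressing, linear probing, and compareAndSet on empty slots. The class and member names are illustrative rather than the original vmlens source, and resizing is left out here (it is covered in the next section):

import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.Function;

// Minimal sketch of the insert-only map described above. Illustrative names, no resizing.
public class InsertOnlyHashMap<K, V> {

    private static final class Entry<K, V> {
        final K key;
        final V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final AtomicReferenceArray<Entry<K, V>> table;

    public InsertOnlyHashMap(int capacity) {
        table = new AtomicReferenceArray<>(capacity);
    }

    public V computeIfAbsent(K key, Function<? super K, ? extends V> mappingFunction) {
        int index = Math.floorMod(key.hashCode(), table.length()); // hash code modulo array size
        while (true) {
            Entry<K, V> current = table.get(index);
            if (current == null) {
                // Empty slot: checking for null and setting the new element
                // happen atomically through compareAndSet.
                Entry<K, V> created = new Entry<>(key, mappingFunction.apply(key));
                if (table.compareAndSet(index, null, created)) {
                    return created.value;                          // insert succeeded
                }
                current = table.get(index);                        // another thread won the race
            }
            if (key.equals(current.key)) {
                return current.value;                              // the slot already holds our key
            }
            index = (index + 1) % table.length();                  // linear probing: next slot
        }
    }
}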


Why Does It Work?

The trick is that if a thread has read a filled array element, the array element will stay filled and the key will stay the same. So, the only time we need to check if another thread has modified an already read value is when we read null. We do this by using compareAndSet as outlined above.

Therefore, our algorithm is possible because we do not allow keys to be removed and the keys are immutable.

Resizing

During resizing, we need to make sure that newly added elements do not get lost. So, we need to block the current array for updates during resizing. We do this by setting each empty array element to a special value, MOVED_NULL_KEY. If a thread sees such a value, it knows that a resize is running and waits with its insert until the resize is done.

The resize of the hash map consists of the following steps:

  1. Set all null values in the current array to the special value MOVED_NULL_KEY.
  2. Create a new array.
  3. Copy all values from the current array to the new array.
  4. Set the current array to the new array.

The full source code for the resize is available in the GitHub repository linked above.
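
As a rough illustration, here is a sketch of these four steps, continuing the illustrative class from the sketch above. For resizing, the table field would have to be volatile rather than final, and computeIfAbsent would have to re-read it and retry whenever it encounters MOVED_NULL_KEY; the coordination of which thread resizes is simplified here to a synchronized method:

// Sentinel that marks an empty slot as "blocked because a resize is running".
private final Entry<K, V> MOVED_NULL_KEY = new Entry<>(null, null);

// Volatile instead of final so the new array can be published to all threads.
private volatile AtomicReferenceArray<Entry<K, V>> table;

private synchronized void resize() {
    AtomicReferenceArray<Entry<K, V>> current = table;

    // Step 1: block inserts by turning every remaining null slot into MOVED_NULL_KEY.
    for (int i = 0; i < current.length(); i++) {
        current.compareAndSet(i, null, MOVED_NULL_KEY);
    }

    // Step 2: create a new, larger array.
    AtomicReferenceArray<Entry<K, V>> next = new AtomicReferenceArray<>(current.length() * 2);

    // Step 3: copy every real entry; the slot index has to be recomputed for the new size.
    for (int i = 0; i < current.length(); i++) {
        Entry<K, V> entry = current.get(i);
        if (entry != MOVED_NULL_KEY) {
            int index = Math.floorMod(entry.key.hashCode(), next.length());
            while (next.get(index) != null) {
                index = (index + 1) % next.length();
            }
            next.set(index, entry);
        }
    }

    // Step 4: publish the new array; threads that saw MOVED_NULL_KEY re-read 'table'
    // and retry their insert against the new array.
    table = next;
}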


Benchmark Results

The performance of the different HashMaps depends on the specific workload and the quality of the hash function. So, take the following results with a grain of salt. To measure the performance, I call the method computeIfAbsent with a random key using JMH; the full benchmark source is linked below.
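
As a rough sketch of what such a benchmark method looks like (the map under test, the key range, and the identity mapping function are assumptions, not the original benchmark code):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

// Illustrative JMH benchmark: one computeIfAbsent call with a random key per invocation.
@State(Scope.Benchmark)
public class ComputeIfAbsentBenchmark {

    private final ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();

    @Benchmark
    public Integer computeIfAbsent() {
        // Random key out of five million, matching the memory comparison below.
        Integer key = ThreadLocalRandom.current().nextInt(5_000_000);
        return map.computeIfAbsent(key, k -> k);
    }
}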


You can download the source for the benchmark code from GitHub here. Here are the results for an Intel Xeon Platinum 8124M CPU @ 3.00GHz with two sockets, eighteen cores per socket, and two hardware threads per core. I used Amazon Corretto 11 as the JVM.

Concurrent HashMap performance comparison

The memory footprint of the new map is similar to that of ConcurrentHashMap: for five million random keys, java.util.concurrent.ConcurrentHashMap needs 285 megabytes and the new map needs 253 megabytes.

Differences in Behavior

The new map retries the insert if another thread has updated an empty slot first. java.util.concurrent.ConcurrentHashMap instead uses the array bins as monitors for a synchronized block, so only one thread at a time can modify the content of a bin.

This leads to the following difference in behavior: the new map might call the callback method compute multiple times, since every failed update leads to another method call. java.util.concurrent.ConcurrentHashMap, on the other hand, calls compute at most once. The small example below illustrates the difference.
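
One illustrative way to observe this is a counter inside the mapping function. With java.util.concurrent.ConcurrentHashMap it is incremented at most once per key; with a CAS-and-retry map it can be incremented more than once when two threads race for the same empty slot. The class below is a sketch, not part of either map:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative demo: count how often the mapping function of computeIfAbsent runs.
public class MappingFunctionCallCount {

    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        AtomicInteger calls = new AtomicInteger();

        Runnable task = () -> map.computeIfAbsent(42, key -> {
            calls.incrementAndGet(); // side effect stays visible even if the result is discarded
            return key * 2;
        });

        Thread first = new Thread(task);
        Thread second = new Thread(task);
        first.start();
        second.start();
        first.join();
        second.join();

        // Prints 1 for java.util.concurrent.ConcurrentHashMap, because the bin is locked
        // while the mapping function runs. A CAS-and-retry map may print 2 here, since a
        // losing thread has already executed the function before its compareAndSet fails.
        System.out.println("mapping function calls: " + calls.get());
    }
}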

Conclusion

By implementing only the computeIfAbsent method, we can write a concurrent hash map that, for writes, scales better than java.util.concurrent.ConcurrentHashMap. The algorithm stays simple because we avoid deleting and changing keys. I use this map to store classes together with related metadata.

Topics:
concurrency, data structures, hashmap, java, performance, tutorial

Published at DZone with permission of Thomas Krieger, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
