A New Concurrent HashMap
A New Concurrent HashMap
Check out a developer's simplified implementation of Java's Concurrent HashMap that is more efficient for writes.
Join the DZone community and get the full member experience.Join For Free
The following shows an algorithm for a simple concurrent HashMap, which, for writes, scales better than
java. is fast, sophisticated, and rather complicated. And I needed a similar data structure for vmlens, a tool to test concurrent Java.
I cannot use
java., since I also want to trace the internals of
java. with this tool.
So, I decided to implement only the bare minimum, the method
computeIfAbsent. This led to a simple, yet scalable concurrent HashMap.
You can download the source code it from GitHub here.
The algorithm uses open addressing with linear probing. It is based on work from Jeff Preshing, which again is based on work from Dr. Cliff Click. This GitHub repository contains the source code of the HashMap from Dr. Cliff Click.
Open addressing means that the key-value pairs are stored in a single array. The index to store a key-value pair is given by the hash code modulo the array size.
Linear probing means that if this array element is already occupied, we use the next array element. And so on, until we have found an empty slot. Here is an example for an insert after two filled slots:
To make the algorithm concurrent, we need to fill the empty array element using
compareAndSet. By using
compareAndSet, we make the checking for null and the setting of the new element atomic. If
compareAndSet succeeds, we have successfully inserted a new element.
If it fails, we need to check if the key inserted by the other threads equals our key. If yes, we are done and can return the value of this element. If not, we need to find a new empty slot and try again.
Here is the source code for the algorithm:
Why Does It Work?
The trick is that if a thread has read a filled array element, the array element will stay filled and the key will stay the same. So, the only time we need to check if another thread has modified an already read value is when we read null. We do this by using
compareAndSet as outlined above.
Therefore, our algorithm is possible because we do not allow to remove keys and the keys are immutable.
During resizing, we need to make sure that newly added elements do not get lost. So, we need to block the current array for updates during resizing. We do this by setting each empty array element to a special value
MOVED_NULL_KEY. If a thread sees such a value, it knows that a resize is running. And will wait with an insert till the resize is done.
The resize of the hash map consists of the following steps:
- Set all null values in the current array to the special value
- Create a new array.
- Copy all values from the current array to the new array.
- Set the current array to the new array.
Here is the source code:
The performance of the different HashMaps depends on the specific workload and the quality of the hash function. So, take the following results with a grain of salt. To measure the performance, I call the method
computeIfAbsent with a random key using JMH. Here is the source code for the benchmark method:
You can download the source for the benchmark code from GitHub here. Here are the results for Intel Xeon Platinum 8124M CPU @ 3.00GHz with two sockets with eighteen cores per socket each and two hardware threads per core. I used Amazon Corretto 11 as JVM.
The size of the map is similar to the size of
ConcurrentHashMap. For five million random keys,
java. needs 285 megabytes and the new map 253 megabytes.
The new map retries to insert an element if another thread has updated an empty slot.
java. uses the array bins as monitors for a synchronized block. So only one thread at a time can modify the content of those bins.
This leads to the following difference in behavior: The new map might call the callback method compute multiple times. Every failed update leads to a method call.
java. on the other side only calls compute at maximum once.
By implementing only the computeIfAbsent method, we can write a concurrent hash map which for writes scales better than
java.. The algorithm is so easy because we avoided the deletion and the change of keys. I use this map to store classes together with related metadata.
Published at DZone with permission of Thomas Krieger , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.