How to Write Thread-Safe-Yet-Scalable Classes
The main issue with thread-safe, scalable classes is being able to separate data into multiple independent parts.
Join the DZone community and get the full member experience.Join For Free
When writing thread-safe classes, the main issue is to separate the data into multiple independent parts — and to choose the right size for those parts. If the part is too small, our class is not thread-safe. If the part is too large, the class is not scalable.
You may also like: 7 Techniques for Thread-Safe Classes
Let us look at an example that further illustrates this scenario:
Suppose we want to track how many people live in a city. We want to support two methods, one to get the current count of people living in a city and one to move a person from one city to another. So we have the following interface:
You can download the source code of all examples from GitHub here.
Since we want to use this interface from multiple threads in parallel we have to options to implement this interface. Either use the class
java.util.concurrent. ConcurrentHashMap or uses the class
java.util.HashMap and a single lock. Here is the implementation using the class
The method move uses the thread-safe method compute to decrement the count in the source city. Then, compute is used to increment the count in the target city. The count method uses the thread-safe method
And here is the implementation using the class
move also uses the method
compute to increment and decrement the count in the source and target city. Only this time, since the
compute method is not thread-safe, both methods are surrounded by a synchronized block. The
count method uses the
get method again surrounded by a synchronized block.
Both solutions are thread-safe.
But in the solution using
ConcurrentHashMap, multiple cities can be updated from different threads in parallel. And in the solution using a
HashMap, since the lock is around the complete
HashMap, only one thread can update the
HashMap at a given time. So, the solution using
ConcurrentHashMap should be more scalable. Let us see.
To compare the scalability of the two implementations, I use the following benchmark:
The benchmark uses jmh, an OpenJDK framework for micro-benchmarks. In the benchmark, I move people from one city to another. Each worker thread updates different cities. The name of the source city is simply the thread id and the target city the thread id plus two. I ran the benchmark on an Intel i5 4 core CPU with these results:
As we can see, the solution using
ConcurrentHashMap scales better. Starting with two threads, it performs better than the solution using a single lock.
Now I want an additional method to get the complete count overall cities. Here is this method for the implementation using the class
To see if this solution is thread-safe, I use the following test:
I need two threads to test if the method
completeCount is thread-safe. In one thread, I move one person from Springfield to South Park. In the other thread, I get the
completeCount and check if the result equals the expected result.
To test all thread interleavings, we put the complete test in a while loop iterating over all thread interleavings using the class
AllInterleavings from vmlens, line 7. Running the test, I see the following error:
The vmlens report shows what went wrong:
As we see, the problem is that the calculation of the complete count is done while the other thread still moves a person from Springfield to South Park. The decrement for Springfield was already executed but not the increment for South Park.
By allowing the parallel update of different cities, the combination between
move leads to the wrong results. If we have methods, which operate over all cities, we need to lock all cities during this method. So, to support such a method, we need to use the second solution using a single lock. For this solution, we can implement a thread-safe
countComplete method, as shown below:
This simple example surely does not reflect the complexity of your data structure. But what is true in the example is also true in the real world. There is no way to update multiple dependent fields in a thread-safe way except to update them one thread after the other.
There is no way to update multiple dependent fields in a thread-safe way except to update them one thread after the other.
So, the only way to achieve scalability and thread safety is to find independent parts in your data. And then, update them in parallel from multiple threads.
Published at DZone with permission of Thomas Krieger, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.