JVM Thread Pooling Trends
Multithreading has always been an area of interest for most developers, who have long tried to find the best strategy for dividing work among threads. Various attempts have been made in the past to standardize such solutions, and the rise of new problem domains like Big Data and real-time analytics has introduced new challenges. One of the steps taken in this direction was the work (great work) by Doug Lea, available to us in the form of the concurrency framework (JSR 166).
Now we have started distinguishing between concurrency and parallelism. These are different strategies, and a number of frameworks are available that help us achieve either one. When choosing among them, we benefit a lot from knowing their internal implementation details. In this article we will explore some of the well-established options available for thread pooling/sharing in the JVM. The availability of multicore processors has also brought new issues with it, and developers have started to think about and exploit “mechanical sympathy” to gain performance from superior hardware.
In my opinion, the following are the main mechanisms that are currently widespread when we discuss thread pooling:
1. Thread pools available in Executor framework
2. Ring Buffer concept by LMAX
3. Actor (event) based implementations
Pool options under Concurrency framework:
First of all, I personally disagree with the widespread term “thread pool” and would instead call it a worker queue. In a nutshell, all the pooling options available in the Executor framework are based on some kind of sequential data structure such as an array or a queue (blocking or non-blocking), for example ConcurrentLinkedQueue, ArrayBlockingQueue, or LinkedBlockingQueue. Their documentation shows that they are meant to be used under different circumstances, but the underlying data structures share the same property: sequential insertion and traversal. Pooling threads this way has two main benefits:
a. Delay introduced in thread creation is reduced.
b. By proper tuning of thread count, we can address resource thrashing.
These pools can be used in rendering applications and server applications to improve response time. Using a thread pool might seem an acceptable solution, but it suffers from a fundamental flaw: sequential contention, because every worker thread competes for the same queue. Here is a good discussion on some pooling options available under concurrency frameworks in Java.
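The worker-queue idea can be sketched with the JDK's own ThreadPoolExecutor. This is only an illustration: the class name WorkerQueueSketch, the method sumOfSquares, the pool size of 4, and the queue capacity are assumptions chosen for the example, not recommendations. It shows a fixed set of worker threads that all take tasks from one bounded ArrayBlockingQueue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class WorkerQueueSketch {

    // Sums the squares of 1..n by submitting one small task per number.
    // All tasks wait in a single FIFO queue shared by the worker threads.
    static long sumOfSquares(int n, int workers)
            throws InterruptedException, ExecutionException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers,               // core and maximum pool size
                0L, TimeUnit.MILLISECONDS,      // keep-alive for threads above core size
                new ArrayBlockingQueue<>(n));   // bounded queue, large enough for all n tasks
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final long value = i;
                results.add(pool.submit(() -> value * value));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();          // blocks until that task has finished
            }
            return total;
        } finally {
            pool.shutdown();                    // let the worker threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(100, 4));   // prints 338350
    }
}
```

The threads are created once and reused across tasks, which is benefit (a) above, and the workers argument is the tuning knob from benefit (b). Every worker still takes its next task from the same queue, which is where the contention discussed here comes from.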
Disruptor (Ring Buffer):
Developers at LMAX tried to address this issue of sequential contention with the Disruptor framework, which is based upon a data structure called the “ring buffer”. It is a way of sending messages between threads as efficiently as possible. It can be used as an alternative to a queue, but it also shares a number of features with SEDA and Actors. Putting messages into the Disruptor is a 2-phase process. First, a slot is claimed in the ring buffer, which provides the user with the Entry that can be filled with the appropriate data. Then the entry must be committed. This 2-phase approach is necessary to allow the flexible use of memory, and it is the commit that makes the message visible to the consumer threads. The figure below depicts the ring buffer data structure (the core of the Disruptor).
Disruptor achieves very low latency and high throughput on multicore platforms even when threads share data and pass messages to one another.
What makes it so unique is its lock-free, low-contention architecture. It avoids locks altogether and relies instead on memory barriers, plus a CAS on the sequence counter when several producers publish to the same ring buffer. For more details about this, here is a good article and the official website. One of the shortcomings (not actually a drawback) of using Disruptor is that you need to tell it up front the approximate number of threads the application will need to complete the task.
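The claim-then-commit sequence can be illustrated with a much simplified ring buffer written in plain Java. This is not the LMAX Disruptor API: the class RingBufferSketch and the methods claim, entry, publish, take, and transfer are hypothetical names, and the sketch assumes exactly one producer thread and one consumer thread. It uses AtomicLong sequence counters so that the commit is a volatile write.

```java
import java.util.concurrent.atomic.AtomicLong;

public class RingBufferSketch {

    // Pre-allocated, mutable slot that is reused on every lap around the ring.
    static final class Entry {
        long value;
    }

    private final Entry[] entries;
    private final int mask;
    private long claimed = -1;                                // used by the producer thread only
    private final AtomicLong published = new AtomicLong(-1);  // last sequence visible to the consumer
    private final AtomicLong consumed = new AtomicLong(-1);   // last sequence the consumer has finished

    RingBufferSketch(int size) {
        if (size < 1 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        entries = new Entry[size];
        for (int i = 0; i < size; i++) {
            entries[i] = new Entry();
        }
        mask = size - 1;
    }

    // Phase 1: claim the next sequence, spinning while the consumer is a full lap behind.
    long claim() {
        long sequence = ++claimed;
        while (sequence - consumed.get() > entries.length) {
            Thread.onSpinWait();
        }
        return sequence;
    }

    // Maps a sequence number to its slot in the ring.
    Entry entry(long sequence) {
        return entries[(int) (sequence & mask)];
    }

    // Phase 2: commit. The volatile write makes the filled entry visible to the consumer.
    void publish(long sequence) {
        published.set(sequence);
    }

    // Consumer side: wait until the sequence is published, read it, then release the slot.
    long take(long sequence) {
        while (published.get() < sequence) {
            Thread.onSpinWait();
        }
        long value = entry(sequence).value;
        consumed.set(sequence);
        return value;
    }

    // Demo: one producer thread sends 1..count, the calling thread consumes and sums them.
    static long transfer(int count, int size) throws InterruptedException {
        RingBufferSketch ring = new RingBufferSketch(size);
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= count; i++) {
                long sequence = ring.claim();    // phase 1: claim a slot
                ring.entry(sequence).value = i;  // fill the claimed entry
                ring.publish(sequence);          // phase 2: commit
            }
        });
        producer.start();
        long sum = 0;
        for (long sequence = 0; sequence < count; sequence++) {
            sum += ring.take(sequence);
        }
        producer.join();
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(transfer(1000, 8));   // prints 500500
    }
}
```

Because the entries are allocated once and overwritten in place, no per-message objects are created, and neither side ever takes a lock. The real framework adds batching, configurable wait strategies, and multi-producer support that this sketch leaves out.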
Actor (event) based implementations:
One of the powerful alternatives to traditional thread pooling mechanisms is based on the event model. Event-based thread pooling and scheduling is very common in functional programming. One of the most popular implementations of this concept is the actor-based system, where “Akka” has become the de facto standard.
Actors are very lightweight concurrent entities. They process messages asynchronously using an event-driven receive loop. Pattern matching against messages is a convenient way to express an actor's behavior. Actors raise the abstraction level and make it much easier to write, test, understand and maintain concurrent and/or distributed systems. You focus on workflow (how the messages flow in the system) instead of low-level primitives like threads, locks and socket IO. A thread can be assigned a single actor or multiple actors, and either model can be used as needed.
Some of the benefits of using actor based systems like Akka are:
· Configurable execution
· Location transparency
· Re-try mechanism
Note: Debugging an actor based system could be challenging.
The Disruptor uses a 1 thread - 1 consumer model, whereas Actors use an N:M model, i.e. you can have as many actors as you like and they will be distributed across a fixed number of threads (generally 1 per core). Otherwise the actor model is very close to the Disruptor, especially when used for batch processing.
Also, my initial search on the internet revealed that there are contributions in the open source space towards benchmarking such options on the JVM. One such option is ExecutorBenchmark. It is an open-source test framework for parallelizable tasks. It is written in Scala, and can be used with Java and Scala loads.
In a nutshell, the evolving software and hardware industry has presented us with new challenges but has also given us a wide range of solutions to make our applications more responsive and fault-resistant. For a small and unpredictable number of threads, I recommend the pooling mechanisms available in the concurrency framework (part of the JDK). For a large number of similar-sized tasks, I recommend Disruptor. Disruptor has a slight learning curve, but the benefits gained in performance and scalability outweigh the time invested. If your application requires some sort of re-try mechanism or supervision, or its tasks are distributed in nature, I recommend the “Actor” model (Akka). Other factors might also influence the decision; for a distributed application, for example, you might choose something like map reduce, the fork/join model, or some custom implementation.
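The N:M model can be sketched in plain Java without any actor library. This is not Akka's API: the class ActorSketch, the nested Actor class, and the methods tell and run are hypothetical names used only for illustration. Each actor owns a mailbox and processes at most one message at a time, while all actors share one fixed thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

public class ActorSketch {

    // A minimal actor: a mailbox plus a behavior that runs on a shared executor.
    static final class Actor<M> {
        private final Queue<M> mailbox = new ConcurrentLinkedQueue<>();
        private final AtomicBoolean scheduled = new AtomicBoolean(false);
        private final Executor executor;
        private final Consumer<M> behavior;

        Actor(Executor executor, Consumer<M> behavior) {
            this.executor = executor;
            this.behavior = behavior;
        }

        // Asynchronous send: enqueue the message and make sure a drain is scheduled.
        void tell(M message) {
            mailbox.add(message);
            trySchedule();
        }

        private void trySchedule() {
            // The flag guarantees that at most one thread drains this mailbox at a time.
            if (!mailbox.isEmpty() && scheduled.compareAndSet(false, true)) {
                executor.execute(this::drain);
            }
        }

        private void drain() {
            try {
                M message;
                while ((message = mailbox.poll()) != null) {
                    behavior.accept(message);
                }
            } finally {
                scheduled.set(false);
                trySchedule();   // re-check: a message may have arrived after the last poll
            }
        }
    }

    // Demo: many actors share a few threads; each actor sums the messages it receives.
    static long[] run(int actors, int threads, int messagesPerActor)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long[] totals = new long[actors];   // slot i is written only by actor i
        CountDownLatch done = new CountDownLatch(actors * messagesPerActor);
        List<Actor<Integer>> all = new ArrayList<>();
        for (int a = 0; a < actors; a++) {
            final int id = a;
            all.add(new Actor<Integer>(pool, message -> {
                totals[id] += message;
                done.countDown();
            }));
        }
        for (int m = 1; m <= messagesPerActor; m++) {
            for (Actor<Integer> actor : all) {
                actor.tell(m);
            }
        }
        done.await();      // wait until every message has been processed
        pool.shutdown();
        return totals;
    }

    public static void main(String[] args) throws InterruptedException {
        long[] totals = run(100, 4, 10);
        System.out.println(totals[0]);   // prints 55
    }
}
```

Here 100 actors run on 4 threads. Because only one thread at a time drains a given mailbox, the actor's own state (its slot in totals) needs no lock. Supervision, re-try, and location transparency are features of a full framework such as Akka and are not modelled here.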