Is Spring Reactive Already Obsolete? A Look at Inversion of Thread Coupling
Is Spring Reactive already obsolete? Let's take a look.
Beyond Spring's dependency injection that only solves 1/5 of the Inversion of Control problem, Spring Reactive bases itself on the event loop. While there are other popular event loop driven solutions (NodeJS, Nginx), the single threaded event loop is a pendulum swing in the other direction from thread-per-request (thread pools). With event loops competing against thread-per-request, is there not some pattern that underlies both of them? Well, actually, yes!
But before getting to this, let's look at the issues regarding event loops and thread-per-request. If you are more interested in the solution, you can skip the next two sections.
Thread Coupling Problems
Event Loop
First of all, "thread coupling"? Why is this a concern? Well, for event loops, the single threaded nature requires all I/O to be undertaken asynchronously. Should a database or HTTP call need to block, it will block the single event loop thread and hold up the system. This restriction is, in itself, a big coupling problem: to go Reactive, all your I/O must now be asynchronous. This means there are no more ORMs, like JPA, to make access to databases easier (as JPA requires blocking database calls). Yep, something that used to remove 40-60 percent of the boilerplate code in applications is now not usable (enjoy writing this all over again!).
Beyond the restrictive I/O in your decision to use Reactive patterns, the ability to use multiple processors is restricted as there is only one thread. OK, instances of the Reactive engine are duplicated to each CPU; however, they cannot share state. The multi-threaded implications of sharing state between two event loops are difficult. Reactive programming is hard enough, let alone adding multi-threading into it. Yes, communication between event loops can be via events. However, using events to keep duplicated copies of shared state in sync across event loops creates problems that are typically just avoided rather than solved. Basically, you are told to design your Reactive systems to avoid this with immutability.
Therefore, you are stuck coupled to the one thread. So what? Well if you have computationally expensive operations, such as security cryptography (JWT), it creates scheduling problems. By being on a single thread, this operation must be completed before anything else can be undertaken. With multiple threads, other threads can be time sliced in by the operating system to progress other less CPU intensive requests. However, you only have the one thread so all that lovely operating system thread scheduling is now lost. You're stuck waiting for the expensive CPU intensive operations to complete before servicing anything else.
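To make the scheduling problem concrete, here is a minimal sketch using only the JDK. It assumes a single-threaded `ExecutorService` as a stand-in for the event loop; the class name `SingleThreadStarvation` and the `burnCpu` helper are hypothetical and only simulate CPU-heavy work such as a signature check.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: a single-threaded executor stands in for the event loop.
class SingleThreadStarvation {

    // Stand-in for CPU-heavy work (assumption: any tight loop will do for the illustration).
    static long burnCpu(int iterations) {
        long hash = 17;
        for (int i = 0; i < iterations; i++) {
            hash = hash * 31 + i;
        }
        return hash;
    }

    // Queues an expensive task, then a cheap one, and reports the order in which they finished.
    static List<String> completionOrder(ExecutorService executor) {
        List<String> order = new CopyOnWriteArrayList<>();
        executor.execute(() -> {
            burnCpu(50_000_000);
            order.add("expensive");
        });
        executor.execute(() -> order.add("cheap"));
        executor.shutdown();
        try {
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return order;
    }

    public static void main(String[] args) {
        // One thread: the cheap task cannot start until the expensive one is done.
        System.out.println(completionOrder(Executors.newSingleThreadExecutor())); // prints [expensive, cheap]
    }
}
```

With a single thread the cheap task always waits behind the expensive one. Passing a multi-threaded pool to `completionOrder` would let the operating system schedule both, although the resulting order is then no longer guaranteed.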
Oh please, just ignore these problems! We, developers, like performance. Reactive is all in the aim of greater performance and improved scalability. Fewer threads mean lower overheads and therefore improved throughput. Ok, yes, I'll have better-performing production systems potentially reducing hardware costs. However, it's going to be a lot slower to build and enhance that production system due to coupling restrictions that come from single-threaded event loops. Not to mention, having to rewrite algorithms to not hog the CPU. Given the scarcity of developers compared to the overabundant supply of cloud hardware, arguing about costs of scale may only be for those rare significantly large systems.
We do lose a lot going Reactive. This is, possibly, to the point that we have not thought it through enough. Hence, this is possibly why Reactive frameworks warn against changing to it wholesale. They usually indicate Reactive patterns only work for smaller less complicated systems.
Thread-per-Request (Thread Pools)
On the flip side, thread-per-request patterns (such as Servlet 2.x) use thread pools to handle scale. They assign a thread to service the request and scale out by having multiple (typically pooled) threads.
We can probably read many articles touting Reactive over the scale limitations of thread-per-request, but the main issue with thread-per-request is not actually in performance nor scale. The issue with thread-per-request is a lot more pervasive to your application and can actually pollute your whole architecture.
To see this problem, just look at invoking a method:
Result result = object.method(identifier);
Should the implementation of the method be as follows:
@Inject Connection connection;
@Inject HttpClient client;

public Result method(Long identifier) {
    ResultSet resultSet = connection.createStatement()
        .executeQuery("<some SQL> where id = " + identifier);
    resultSet.next();
    String databaseValue = resultSet.getString("value");
    HttpResponse response = client.send("<some URL>/" + databaseValue);
    return new Result(response.getEntity());
}
This creates a coupling problem to the thread of the request that can pollute out to your whole architecture. Yes, you've just placed a coupling on the request thread out to your other systems.
While the database call is synchronous, the HTTP call is also forcing the downstream system to respond synchronously. We can't change the HTTP call to be asynchronous, because the request thread wants to continue with a result to return from the method. This synchronous coupling to the request thread not only limits the call but also limits the downstream system to have to provide a synchronous response. Hence, the thread-per-request thread coupling can pollute out to your other systems and possibly across your entire architecture. No wonder the REST micro-service pattern of synchronous HTTP calls is so popular! It is a pattern that forces itself top down on your system. Sounds like thread-per-request and Reactive share this same opinion on forcing everything top down to support themselves.
Threading to Support I/O
In summary, the problems are as follows.
Single threaded event loops:
couple you to asynchronous communication only (simple JPA code is no longer available)
avoid multi-threading, as two threads executing events from the event queue would create considerable synchronization problems (likely slowing the solution and causing concurrency bugs that are hard to code against, even for the best of developers)
lose the advantage of the thread scheduling that operating systems have spent considerable effort optimizing
While thread-per-request solutions:
couple you to synchronous communication only (as the result is expected immediately, and not sometime later via callback)
have higher overheads (compared to single-threaded event loops) due to managing more threads, and are therefore less scalable
The pendulum swing between thread pools and Reactive single threaded can actually be considered going from synchronous communication (thread-per-request) to asynchronous communication (single threaded event loops). The remaining problems are actually implementation constraints of a threading model built specifically to support each type of communication. Plus, given the coupling on downstream systems that synchronous communication poses, this pendulum swing to asynchronous communication is not all a bad thing.
So the question is: why are we forced to choose only one communication style? Why can't we use synchronous and asynchronous communication styles together?
Well, we can't put asynchronous calls inside synchronous method calls. There is no opportunity for callbacks. Yes, we can block waiting on the callback, but Reactive will consider itself superior in scale due to additional threading overheads involved in this. Therefore, we need asynchronous code to allow synchronous calls.
However, we can't put synchronous calls inside event loops, as it halts the event loop thread. Hence, we need extra threads to undertake the synchronous calls to allow the event loop thread to carry on with other events.
Reactive has the answer. Use a Scheduler:
Mono blockingWrapper = Mono.fromCallable(() -> {
    return /* make a remote synchronous call */
}).subscribeOn(Schedulers.elastic());
NOTE: Code is taken from http://projectreactor.io/docs/core/release/reference/#faq.wrap-blocking.
Yay, now we can do synchronous calls within the event loop. Problem solved (well sort of).
Well, it's sorted if you can trust that you properly wrapped all synchronous calls in Callables. Get one wrong, and well, you are blocking your event loop thread and halting your application. At least in multi-threaded applications, only the particular request suffered, not the whole application.
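The same idea can be sketched without Reactor, using only the JDK's `CompletableFuture`. This is an analogy rather than the Reactor API: the blocking call is handed to a separate pool and the continuation is scheduled back onto a single "event loop" thread. The class name `BlockingOffload`, the thread names, and the `blockingCall` helper are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: offload a blocking call, then continue on the event loop thread.
class BlockingOffload {

    // Stand-in for a remote synchronous call; returns the name of the thread that was blocked.
    static String blockingCall() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return Thread.currentThread().getName();
    }

    // Returns { thread that ran the blocking call, thread that ran the continuation }.
    static String[] run() {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor(r -> new Thread(r, "event-loop"));
        ExecutorService blockingPool = Executors.newCachedThreadPool(r -> new Thread(r, "blocking-pool"));
        try {
            CompletableFuture<String[]> result = CompletableFuture
                // The wrapped call blocks a pool thread, not the event loop.
                .supplyAsync(BlockingOffload::blockingCall, blockingPool)
                // The continuation is explicitly scheduled back onto the event loop.
                .thenApplyAsync(
                    blockedThread -> new String[] { blockedThread, Thread.currentThread().getName() },
                    eventLoop);
            return result.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            eventLoop.shutdown();
            blockingPool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] threads = run();
        System.out.println(threads[0] + " -> " + threads[1]); // prints blocking-pool -> event-loop
    }
}
```

The weak spot is the same as with Schedulers: nothing stops a developer from calling `blockingCall()` directly on the event loop thread, so correctness still depends on every blocking call being wrapped.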
This seems, to me anyway, more a workaround than an actual solution to the problem. Oh wait, everything needs to be Reactive top down so that solves this problem. Just don't do blocking calls and change all your drivers and your whole technology stack to Reactive. The whole "change everything to suit us, in a way that only integrates with us" seems very close to technology vendor lock-in — in my opinion anyway.
Therefore, can we consider a solution that allows synchronous calls and does not rely so heavily on the developer getting it right? Why, yes!
Inverting the Thread Coupling
The asynchronous communication driven Reactive single-threaded event loop (excuse the mouthful) is identified as the right solution. Synchronous communication is solved by developers using Schedulers. In both cases, the Reactive functions are run with a thread dictated for them:
asynchronous functions are executed with the thread of the event loop
synchronous functions are executed with a thread from the Scheduler
The control of the function's executing thread is heavily dependent on the developer getting it right. The developer has enough on their plate focusing on building code to meet feature requirements. Now, the developer is intimately involved in the threading of the application (something thread-per-request always somewhat abstracted away from the developer). This intimacy to threading significantly increases the learning curve for building anything Reactive. Plus, it will have the developer lose a lot of hair when they pull it out at 2 am, trying to get the code working for that deadline or production fix.
So, can we remove the developer from having to get the threading right? Or, more importantly, where do we give control of selecting the thread?
Let's look at a simple event loop:
public interface AsynchronousFunction {
    void run();
}

public void eventLoop() {
    for (;;) {
        AsynchronousFunction function = getNextFunction();
        function.run();
    }
}
Well, the only thing we can target for control is the asynchronous function itself. Using an Executor to specify the thread, we can enhance the event loop as follows:
public interface AsynchronousFunction {
    Executor getExecutor();
    void run();
}

public void eventLoop() {
    for (;;) {
        AsynchronousFunction function = getNextFunction();
        function.getExecutor().execute(() -> function.run());
    }
}
This now allows the asynchronous function to specify its required threading, as:
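using the event loop thread is via a synchronous Executor: getExecutor() { return (runnable) -> runnable.run(); }
using a separate thread for synchronous calls is via an Executor backed by a thread pool: getExecutor() { return Executors.newCachedThreadPool(); } (shown inline for brevity; a real implementation would return one shared pool rather than create a new pool on every call)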
Control is inverted so that the developer is no longer responsible for specifying the thread. The function now specifies the thread for executing itself.
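The enhanced event loop can be sketched as a small runnable program. This is an illustration under assumptions, not a framework implementation: the loop drains a fixed queue and then exits (instead of looping forever), and the class name `ExecutorAwareEventLoop`, the `function` helper, and the thread names are hypothetical.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: each function tells the event loop which Executor should run it.
class ExecutorAwareEventLoop {

    interface AsynchronousFunction {
        Executor getExecutor();
        void run();
    }

    // Synchronous Executor: runs the function on whichever thread calls execute (here, the event loop).
    static final Executor EVENT_LOOP_THREAD = Runnable::run;

    static AsynchronousFunction function(Executor executor, Runnable logic) {
        return new AsynchronousFunction() {
            @Override public Executor getExecutor() { return executor; }
            @Override public void run() { logic.run(); }
        };
    }

    // Runs two functions through the loop and records the thread name each one ran on.
    static Map<String, String> demo() {
        ExecutorService pool = Executors.newCachedThreadPool(r -> new Thread(r, "thread-pool"));
        Map<String, String> ranOn = new ConcurrentHashMap<>();
        CountDownLatch done = new CountDownLatch(2);

        Queue<AsynchronousFunction> events = new ConcurrentLinkedQueue<>();
        events.add(function(EVENT_LOOP_THREAD, () -> {
            ranOn.put("nonBlocking", Thread.currentThread().getName());
            done.countDown();
        }));
        events.add(function(pool, () -> {
            ranOn.put("blocking", Thread.currentThread().getName());
            done.countDown();
        }));

        // Simplified loop: drains the queue once rather than running forever.
        Thread eventLoop = new Thread(() -> {
            while (true) {
                AsynchronousFunction next = events.poll();
                if (next == null) break;
                next.getExecutor().execute(next::run);
            }
        }, "event-loop");
        eventLoop.start();

        try {
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("functions did not complete in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return ranOn;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The loop itself never decides where a function runs; it only asks the function for its Executor. A single shared pool is created once and reused, which avoids creating a new pool for every call.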
But how do we associate an Executor to a function?
We use the ManagedFunction of Inversion of Control:
public interface ManagedFunction {
    void run();
}

public class ManagedFunctionImpl
        implements ManagedFunction, AsynchronousFunction {

    @Inject P1 p1;
    @Inject P2 p2;
    @Inject Executor executor;

    @Override
    public void run() {
        // Hand the logic to the injected Executor, which decides the thread
        executor.execute(() -> implementation(p1, p2));
    }

    private void implementation(P1 p1, P2 p2) {
        // Use injected objects for functionality
    }
}
Note that only the relevant ManagedFunction details have been included. Please see Inversion of (Coupling) Control for more details of the ManagedFunction.
By using the ManagedFunction, we can associate an Executor to each function for the enhanced event loop. (Actually, we can go back to the original event loop, as the Executor is encapsulated within the ManagedFunction).
So now, the developer is no longer required to use Schedulers, as the ManagedFunction takes care of which thread to use for executing the function's logic.
But this just moves the problem of the developer getting it right from code to configuration. How can we make it possible to reduce developer error in specifying the correct thread (Executor) for the function?
Deciding the Executing Thread
One property of the ManagedFunction is that all objects are dependency-injected. Unless dependency-injected, there are no references to other aspects of the system (and static references are highly discouraged). Hence, the dependency injection meta-data of the ManagedFunction provides details of all the objects used by the ManagedFunction.
Knowing the objects used by a function helps in determining the asynchronous/synchronous nature of the function. To use JPA with the database, a Connection (or DataSource) object is required. To make synchronous calls to micro-services, an HttpClient object is required. Should none of these be required by the ManagedFunction, it is likely safe to consider no blocking communication being undertaken. In other words, if the ManagedFunction does not have an HttpClient injected, it can't make HttpClient synchronous blocking calls. The ManagedFunction is, therefore, safe to be executed by the event loop thread and not halt the whole application.
We can, therefore, identify a set of dependencies that indicate if the ManagedFunction requires execution by a separate thread pool. As we know all the dependencies in the system, we can categorize them as asynchronous/synchronous. Or, more appropriately, by whether the dependency is safe to use on the event loop thread. If the dependency is not safe, then the ManagedFunctions requiring that dependency are executed by a separate thread pool. But what thread pool?
Do we just use a single thread pool? Well, Reactive Schedulers give the flexibility to use/re-use varying thread pools for the various functions involving blocking calls. Hence, we need similar flexibility in using multiple thread pools.
We use multiple thread pools by mapping thread pools to dependencies. OK, this is a little bit to get your head around. So, let's illustrate with an example:
public class ManagedFunctionOne implements ManagedFunction {
    // No dependencies
    // ... remaining omitted for brevity
}

public class ManagedFunctionTwo implements ManagedFunction {
    @Inject InMemoryCache cache;
    // ...
}

public class ManagedFunctionThree implements ManagedFunction {
    @Inject HttpClient client;
    // ...
}

public class ManagedFunctionFour implements ManagedFunction {
    @Inject EntityManager entityManager;
    // meta-data also indicates transitive dependency on Connection
    // ...
}
Now, we have the thread configuration as follows:
Dependency | Thread Pool |
HttpClient | Thread Pool One |
Connection | Thread Pool Two |
We then use the dependencies to map ManagedFunctions to Thread Pools:
ManagedFunction | Dependency | Executor |
ManagedFunctionOne, ManagedFunctionTwo | (none in thread pool table) | Event Loop Thread |
ManagedFunctionThree | HttpClient | Thread Pool One |
ManagedFunctionFour | Connection (as transitive dependency of EntityManager) | Thread Pool Two |
The decision of the thread pool (Executor) to use for the ManagedFunction is now just mapping configuration. Should a dependency invoke blocking calls, it is added to the thread pool mappings. The ManagedFunction using this dependency will no longer be executed on the event loop thread, avoiding the application halting.
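The mapping in the tables above can be sketched as a small lookup. This is an illustrative sketch, not OfficeFloor's implementation: the marker interfaces stand in for the real dependency types, the transitive dependency meta-data is hand-written, and the names `ThreadPoolMapping`, `POOLS`, `TRANSITIVE`, and `resolve` are hypothetical. For simplicity it returns the pool name rather than an actual Executor.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical demo: choose a thread pool from a function's (transitive) dependencies.
class ThreadPoolMapping {

    // Marker interfaces standing in for the dependency types used in the example.
    interface HttpClient {}
    interface Connection {}
    interface InMemoryCache {}
    interface EntityManager {}

    // Assumed dependency injection meta-data: EntityManager transitively requires a Connection.
    static final Map<Class<?>, List<Class<?>>> TRANSITIVE =
        Map.<Class<?>, List<Class<?>>>of(EntityManager.class, List.<Class<?>>of(Connection.class));

    // Thread pool configuration: only dependencies that block appear here.
    static final Map<Class<?>, String> POOLS = Map.<Class<?>, String>of(
        HttpClient.class, "Thread Pool One",
        Connection.class, "Thread Pool Two");

    // Walks the dependencies (including transitive ones) and returns the first mapped pool,
    // falling back to the event loop thread when nothing blocking is found.
    static String resolve(Class<?>... dependencies) {
        Deque<Class<?>> toVisit = new ArrayDeque<>(List.of(dependencies));
        Set<Class<?>> visited = new HashSet<>();
        while (!toVisit.isEmpty()) {
            Class<?> dependency = toVisit.pop();
            if (!visited.add(dependency)) {
                continue; // already checked, guards against cycles
            }
            String pool = POOLS.get(dependency);
            if (pool != null) {
                return pool;
            }
            toVisit.addAll(TRANSITIVE.getOrDefault(dependency, List.of()));
        }
        return "Event Loop Thread";
    }

    public static void main(String[] args) {
        System.out.println(resolve());                    // ManagedFunctionOne: prints Event Loop Thread
        System.out.println(resolve(InMemoryCache.class)); // ManagedFunctionTwo: prints Event Loop Thread
        System.out.println(resolve(HttpClient.class));    // ManagedFunctionThree: prints Thread Pool One
        System.out.println(resolve(EntityManager.class)); // ManagedFunctionFour: prints Thread Pool Two
    }
}
```

Marking another dependency as blocking is then one more entry in `POOLS`, with no change to the functions that use it.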
Furthermore, the likelihood of missing blocking calls is significantly reduced. As it is relatively easy to categorize the dependencies, it leaves less chance of missing blocking calls. Plus, if a dependency is missed, it is only a configuration change to the thread pool mappings. It is fixed without code changes, which is especially useful as the application grows and evolves. This is unlike Reactive Schedulers that require code changes and significant thought by the developer.
As the thread that executes a ManagedFunction is now controlled by the framework (not the application code), it effectively inverts control of the executing thread. No longer does the developer code threading. The framework configures it based on the dependency characteristics of the ManagedFunctions.
OfficeFloor
This is all good in theory, but show me the working code!
OfficeFloor is an implementation of the inversion of thread control patterns discussed in this article. We find frameworks are too rigid with their threading models, which leads to workarounds such as Reactive Schedulers. We are looking for the underlying patterns to create a framework that does not require such workarounds. Code examples can be found in the tutorials, and we value all feedback.
Note that while OfficeFloor follows inversion of thread control, its actual threading model is more complex to take other aspects into consideration (e.g. dependency context, mutating state, thread locals, thread affinity, back pressure, and reduced locking to increase performance). These, however, are topics for other articles. But, as this article highlights, the threading for OfficeFloor applications is a simple configuration file based on dependency mappings.
Conclusion
Inversion of control for the thread allows the function to specify its own thread. As the thread is controlled by the injected Executor, this pattern is named Thread Injection. By allowing the injection, the choice of thread is determined by configuration rather than code. This relieves the developer of the potentially error-prone, buggy task of coding threading into applications.
The side benefit of Thread Injection is that thread mapping configurations can be tailored to the machine the application is running on. On a machine with many CPUs, more thread pools can be configured to take advantage of thread scheduling by the operating system. On smaller machines (e.g. embedded), there can be more re-use of thread pools (potentially even none for single-purpose applications that can tolerate blocking to keep thread counts down). This would involve no code changes to your application, just configuration changes.
Furthermore, computationally expensive functions that may tie up the event loop can also be moved to a separate thread pool. Just add in a dependency for this computation to the thread pool mappings and all ManagedFunctions undertaking the computation are now not holding up the event loop thread. The flexibility of Thread Injection is beyond just supporting synchronous/asynchronous communication.
As Thread Injection is all driven from the configuration, it does not require code changes. It actually does not require any threading coding by the developer at all. This is something Reactive Schedulers are incapable of providing.
So the question is: do you want to tie yourself to the single-threaded event loop that really is just a single purpose implementation for asynchronous I/O? Or, do you want to use something a lot more flexible?
Published at DZone with permission of Daniel Sagenschneider. See the original article here.