Non-Blocking SQL in Scala
The JDBC API is helpful, but it's not very concurrency-friendly. There are, however, drivers and techniques, such as Fork/Join in Scala, that can help your IO.
To be reactive, according to The Reactive Manifesto, you have to be Responsive, Resilient, Elastic, and Message Driven. The last criterion in this list caused a big movement toward asynchronous communication. This includes asynchronous RPC and messaging libraries, database drivers, and more. RDBMSs are quite powerful and useful. The official instruments for database access on the JVM provided by database vendors are drivers implementing the JDBC API. This is true for the Scala world as well. But JDBC is designed to be blocking and consumes a thread per database call. The API itself has no methods or interfaces that let you receive query results in another thread or that return without waiting for the database response. The question is what we should do when we are building applications on top of a SQL database with reactivity in mind.
Threads Overhead
Non-blocking communication allows recipients to only consume resources while active, leading to less system overhead. - The Reactive Manifesto
CPUs can switch to other threads while some thread is waiting for a database response. This is definitely true and scales to some extent. But after a while, the overhead from each thread starts hurting. The more threads you have, the less performant you become. On the other side, the database suffers from a similar issue with regard to the number of connections. A large number of connections doesn’t help scaling for the same reason as a large number of threads. Check this awesome explanation why. The rule of thumb is to keep the number of threads and connections small on either side. But if we do so and keep blocking IO inside threads, the CPU will sit idle because all the threads quickly become blocked. So blocking IO inside the thread doesn’t scale well.
Green Threads Approach
Another reason why we should not block is the green thread approach. Green threads emulate multithreaded environments without using OS threads directly. Starting a green thread is faster and cheaper than starting a native OS thread. Switching from one green thread to another is much faster than switching between OS threads. Under the hood, green threads rely on a small number of OS threads. And if you block, you block not a green thread but the underlying OS thread. This means you cannot benefit from the green thread approach if you do a lot of blocking calls.
Fork/Join in Scala
In July 2011, Java SE 7 introduced the Fork/Join framework for concurrent execution of lightweight, non-blocking tasks as a kind of green thread approach. In 2012, Scala brought Fork/Join on board together with Future abstraction support. This changed the rules of concurrency in Scala. ForkJoinPool is now used as the default ExecutionContext, and blocking calls should be avoided. If that's not possible, wrap them in a blocking{} construct, and if there are too many of those, execute the calls on another ExecutionContext backed by a dedicated thread pool. As we know, blocking calls don’t let ForkJoinPool do its magic (which is explained here by the author of Fork/Join in Java). As for how this can change your performance, I think one of the most impressive examples is the results shared by the Akka team.
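As a minimal sketch of the blocking{} construct mentioned above: the code below marks a blocking call inside a Future running on the global ExecutionContext, so the underlying pool is told that the thread is about to block and may compensate. The names BlockingHint, slowCall, fetch, and fetchAll are made up for illustration, and slowCall only simulates a blocking call with Thread.sleep.

```scala
import scala.concurrent.{ExecutionContext, Future, blocking}

object BlockingHint {
  import ExecutionContext.Implicits.global

  // Hypothetical stand-in for any blocking call (a JDBC query, a file read, ...).
  def slowCall(id: Int): Int = {
    Thread.sleep(100)
    id * 2
  }

  // blocking { } tells the ExecutionContext that this code may block,
  // so the global pool is allowed to start a compensating thread.
  def fetch(id: Int): Future[Int] = Future {
    blocking {
      slowCall(id)
    }
  }

  // Runs one Future per id and collects the results in input order.
  def fetchAll(ids: Seq[Int]): Future[Seq[Int]] = Future.traverse(ids)(fetch)
}
```

This is only a hint to the pool, not a fix: each blocked call still occupies a thread, which is why a dedicated pool is the better choice once there are many such calls.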
Avoid Blocking IO in Global ExecutionContext
Avoiding blocking IO in the main ExecutionContext is a high-throughput recipe for Scala in general and for frameworks like Akka or Finagle. The main reasons for this, in my opinion, are:
- Blocking in threads doesn’t scale to high numbers.
- Blocking breaks the Fork/Join magic.
Non-Blocking SQL Drivers
The solution to avoid blocking IO is non-blocking IO. The non-blocking approach allows you to release the thread while IO is in progress and execute a callback when IO is finished. This allows you to reduce the number of threads to something close to the number of CPU cores and to multiplex multiple IO requests over fewer connections. And this is definitely not a new technology. J2SE 1.4 introduced NIO as the non-blocking IO enabler for the JVM in early 2002.
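The callback style described above maps naturally onto Scala Futures through a Promise. The sketch below assumes a hypothetical callback-based call, queryAsync, which is not part of any real driver; a single timer thread simulates the IO event loop that would normally deliver the response.

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}
import scala.concurrent.{Future, Promise}
import scala.util.{Success, Try}

object CallbackToFuture {
  // One daemon timer thread simulates the driver's IO event loop.
  private val ioLoop = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "fake-io-loop")
      t.setDaemon(true)
      t
    }
  })

  // Hypothetical callback-style driver call: it returns immediately and
  // invokes the callback later, once the simulated response arrives.
  def queryAsync(sql: String)(callback: Try[String] => Unit): Unit = {
    ioLoop.schedule(new Runnable {
      def run(): Unit = callback(Success(s"result of: $sql"))
    }, 50L, TimeUnit.MILLISECONDS)
    ()
  }

  // Adapter: complete a Promise from the callback and expose its Future.
  def query(sql: String): Future[String] = {
    val promise = Promise[String]()
    queryAsync(sql) { result =>
      promise.complete(result)
      ()
    }
    promise.future
  }
}
```

The calling thread is free as soon as query returns; nothing waits for the response except the callback registered with the event loop.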
To access a SQL database on the JVM, you don’t necessarily need blocking JDBC. This is the idea behind a number of asynchronous database drivers for the JVM that don’t follow the JDBC specs. Some examples include here and here.
However, you should note that none of these drivers is officially supported by database vendors yet. Also, I was not able to find any benchmarks to compare these drivers with JDBC implementations in terms of performance.
“Non-Blocking” JDBC
We know that the JDBC API is blocking. But nobody prevents us from implementing this idea on top of JDBC. We could wrap a database call into a Future with a dedicated ExecutionContext for blocking calls. The number of threads allocated in the ExecutionContext for blocking calls should be equal to the number of connections in the connection pool. We don’t need more. This will allow us to reduce the overall number of threads and will let the CPU serve non-blocking tasks in the main ExecutionContext while waiting for database responses. An additional benefit of a separate ExecutionContext for blocking calls is better resiliency and failure isolation. You get an additional layer of protection against unexpected latencies and query or socket timeouts.
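The wrapping described above can be sketched as follows. This is a simplified illustration under assumptions, not a complete implementation: JdbcDispatcher, dbCall, and findUserName are hypothetical names, maxConnections stands for whatever size your connection pool is configured with, and findUserName simulates a JDBC query with Thread.sleep instead of touching a real database.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{ExecutionContext, Future}

object JdbcDispatcher {
  // Assumed to match the size of the connection pool (hypothetical value).
  val maxConnections = 4

  private val counter = new AtomicInteger(0)

  // Fixed-size pool of daemon threads reserved for blocking database calls.
  private val pool = Executors.newFixedThreadPool(maxConnections, new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, s"jdbc-${counter.incrementAndGet()}")
      t.setDaemon(true)
      t
    }
  })

  val jdbcEc: ExecutionContext = ExecutionContext.fromExecutor(pool)

  // Runs any blocking database call on the dedicated pool and returns a Future.
  def dbCall[A](body: => A): Future[A] = Future(body)(jdbcEc)

  // Stand-in for a real blocking JDBC query.
  def findUserName(id: Long): String = {
    Thread.sleep(50)
    s"user-$id"
  }
}
```

A caller would write JdbcDispatcher.dbCall(JdbcDispatcher.findUserName(42L)) and compose the resulting Future on the main ExecutionContext; only the threads named jdbc-N ever block.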
However, there are difficulties with this approach. The main one is transaction management. In JDBC, transactions are possible only within a single java.sql.Connection. To make several operations in one transaction, they have to share a connection. If we want to make some calculations in between them, we have to hold on to the connection. This is not very efficient, as connections from a limited pool sit idle while we do the calculations in between. Another issue is that the java.sql.Connection documentation does not define thread-safety requirements. That means you should not compose your database operations in separate Futures and let them share the connection to run transactions unless your particular java.sql.Connection implementation is known to be thread safe for that usage.
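One way to stay within these constraints is to run the whole transaction inside a single Future, so the connection is used by one thread only and never shared between Futures. The sketch below is an assumption-laden illustration: Transactions and inTransaction are hypothetical names, and the DataSource is expected to come from your connection pool. It uses only standard java.sql.Connection methods (setAutoCommit, commit, rollback, close).

```scala
import java.sql.Connection
import javax.sql.DataSource
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

object Transactions {
  // Runs `work` in one transaction, on one thread, with one connection.
  // The connection never leaves this Future, so thread safety of the
  // Connection implementation does not come into play.
  def inTransaction[A](ds: DataSource)(work: Connection => A)(
      implicit ec: ExecutionContext): Future[A] =
    Future {
      val conn = ds.getConnection()
      try {
        conn.setAutoCommit(false)
        val result = work(conn)
        conn.commit()
        result
      } catch {
        case NonFatal(e) =>
          conn.rollback() // undo partial work, then propagate the failure
          throw e
      } finally {
        conn.close() // with a pool, this returns the connection to the pool
      }
    }
}
```

The trade-off named above still applies: any calculation done inside `work` keeps the connection checked out for its whole duration, so the function passed in should be kept as short as possible.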
This idea of an asynchronous JDBC wrapper is implemented in Slick 3, where the only available API is asynchronous, with a dedicated ExecutionContext for blocking calls. But nobody prevents you from using this approach on top of synchronous Scala JDBC wrappers and implementing the asynchrony yourself.
Finally, non-blocking JDBC may come along on the Java roadmap. It was announced at JavaOne in September 2016, and it is possible that we will see it in Java 10.
Design Impact
Non-blocking on the database access level influences your design significantly. To be fully non-blocking, you have to build your whole execution cycle as a set of functional compositions around concurrent structures, such as Futures. Changing existing blocking applications to non-blocking database access might result in a complete refactoring. You must be sure that this effort is really required, and you must decide which option of non-blocking implementation you need.
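To give a feel for what such a composition looks like, here is a small sketch. The domain types and the functions findUser and findOrders are hypothetical and return canned data; in a real application they would be backed by one of the non-blocking options discussed above.

```scala
import scala.concurrent.{ExecutionContext, Future}

object OrderService {
  final case class User(id: Long, name: String)
  final case class Order(userId: Long, total: BigDecimal)

  // Hypothetical non-blocking data access functions returning canned data.
  def findUser(id: Long)(implicit ec: ExecutionContext): Future[User] =
    Future(User(id, s"user-$id"))

  def findOrders(user: User)(implicit ec: ExecutionContext): Future[List[Order]] =
    Future(List(Order(user.id, BigDecimal(10)), Order(user.id, BigDecimal(32))))

  // The whole flow is a composition of Futures: each step starts when the
  // previous one completes, and no thread waits in between.
  def totalSpent(userId: Long)(implicit ec: ExecutionContext): Future[BigDecimal] =
    for {
      user   <- findUser(userId)
      orders <- findOrders(user)
    } yield orders.map(_.total).sum
}
```

Note how the Future type spreads through every signature up to the caller. This is the refactoring cost mentioned above: a blocking code base returning plain values has to change at every layer.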
Published at DZone with permission of Grygoriy Gonchar, DZone MVB. See the original article here.