Multiverse: Open Source Software Transactional Memory for Java. Interview with creator
Today we have the pleasure of talking with Peter Veentjer, creator and lead developer of a very interesting open source project called Multiverse.
Hi Peter! Thank you for taking the time to talk with DZone. Why don't you start by telling us a bit about what Multiverse is?
Hi Alex, Multiverse is a Software Transactional Memory implementation for the Java platform I have been working on for the last 18 months. It has 2 important mission statements:
1) To create an STM that seamlessly integrates with the Java language
2) To create an STM that can be integrated with other JVM languages like Scala, Groovy or JRuby
Software Transactional Memory. That sounds cool. What does it do?
Software transactional memory is something new for Java. The idea is that it combines the power of traditional concurrency control with the ease of programming of databases. With a database you need to worry less about race problems, partial writes and deadlocks, which makes writing a concurrent system much easier. And the advantage of using more traditional concurrency control is that you get very high performance because everything can be done in memory (or even within cache), that it integrates seamlessly in Java, and that you have advanced features like blocking operations.
With the ever increasing demand for concurrent code, programmers need to deal with concurrency control more often. My experience so far is that most developers find this a very complicated subject, and code is often filled with all kinds of bugs, ranging from race problems to more advanced problems like visibility and reordering issues. The idea of STM is to make life easy again for mainstream software developers.
Maybe you could share a piece of code with us so we can understand the concept a little better?
This is a simple example of a bank account:
public class Account{
    private int balance;

    public Account(int balance){
        setBalance(balance);
    }

    public int getBalance(){ return balance; }

    public void setBalance(int newBalance){
        if(newBalance < 0){
            throw new IllegalStateException("negative balance not allowed");
        }
        balance = newBalance;
    }
}
If this account is used by multiple threads, strange things could start to happen. If 2 threads want to transfer money from the account, both read the balance, subtract an amount and write the new balance. If the account has 10 euro and both threads want to transfer 5 euro, the result should eventually be 0 euro. But if this is done concurrently, the final result could be 5 euro, because both threads may read a balance of 10 before either of them writes. This is caused by a race problem, and that is one of the hardest parts of concurrent programming: figuring out what needs to be locked. The standard solution would be to sprinkle the code with synchronized, e.g.:
public class Account{
    private int balance;

    public Account(int balance){
        setBalance(balance);
    }

    public synchronized int getBalance(){
        return balance;
    }

    public synchronized void setBalance(int newBalance){
        if(newBalance < 0){
            throw new IllegalStateException("negative balance not allowed");
        }
        balance = newBalance;
    }
}
But this still doesn't solve the race problem: each call is atomic on its own, but the read-modify-write sequence as a whole is not, so the final result could still be 5 instead of 0. What needs to be done is to acquire a lock on the Account to do a safe transfer:
synchronized(account){
    int balance = account.getBalance();
    account.setBalance(balance-amount);
}
The problem with this approach is that the account object needs to expose the internal lock. So you need to expose your implementation details. It gets even more complicated when 2 accounts are used and money is transferred from one to another, because the system could get into a deadlock.
OK, so how does STM help here?
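To make the deadlock remark concrete, here is a minimal sketch of the usual lock-based workaround: always acquire the two account locks in a fixed order. It is plain Java and has nothing to do with Multiverse; the `OrderedAccount` class, its `id` field (assumed to be unique per account) and the `LockOrderingDemo` helper are hypothetical names introduced only for this illustration.

```java
// Hypothetical account: 'id' is assumed unique and is used only to order locks.
final class OrderedAccount {
    final long id;
    private int balance;

    OrderedAccount(long id, int balance) {
        this.id = id;
        this.balance = balance;
    }

    synchronized int getBalance() { return balance; }

    synchronized void setBalance(int newBalance) {
        if (newBalance < 0) {
            throw new IllegalStateException("negative balance not allowed");
        }
        balance = newBalance;
    }
}

final class LockOrderingDemo {
    // Always lock the account with the smaller id first, so two opposite
    // transfers can never each hold one lock while waiting for the other.
    static void transfer(OrderedAccount from, OrderedAccount to, int amount) {
        OrderedAccount first = from.id < to.id ? from : to;
        OrderedAccount second = first == from ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.setBalance(from.getBalance() - amount);
                to.setBalance(to.getBalance() + amount);
            }
        }
    }

    // Runs transfers in opposite directions on two threads and returns the
    // combined balance, which should stay at 2000.
    static int runOppositeTransfers(int rounds) {
        OrderedAccount a = new OrderedAccount(1, 1000);
        OrderedAccount b = new OrderedAccount(2, 1000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(b, a, 1); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a.getBalance() + b.getBalance();
    }
}
```

The ordering rule works, but every piece of code that touches two accounts has to know about it and follow it, which is exactly the kind of leaked implementation detail described above.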
An example explains more than a thousand words:
@TransactionalObject
public class Account{
    private int balance;

    public Account(int balance){
        setBalance(balance);
    }

    public int getBalance(){return balance;}

    public void setBalance(int newBalance){
        if(newBalance < 0){
            throw new IllegalStateException("negative balance not allowed");
        }
        balance = newBalance;
    }
}
By adding the annotation "TransactionalObject" the STM will make sure that no race problems can happen. All instance methods of the Account have become atomic, and the cool thing is that they also can be composed without being scared of deadlocks, e.g.:
@TransactionalMethod
public static void transfer(Account from, Account to, int amount){
    from.setBalance(from.getBalance()-amount);
    to.setBalance(to.getBalance()+amount);
}
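For readers who wonder how a method like transfer can be atomic without explicit locks, the following is a minimal sketch of one common optimistic technique: compute the new state from a snapshot, commit it with a compare-and-set, and retry if another thread committed in the meantime. This is not Multiverse's API or implementation. It uses only `java.util.concurrent.atomic.AtomicReference` from the JDK; `Balances`, `atomically` and `runTwoTransfers` are hypothetical names, and the retry limit of 1000 simply mirrors the default Peter mentions in this interview.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Immutable snapshot of both balances; a "transaction" computes a new snapshot.
final class Balances {
    final int from;
    final int to;

    Balances(int from, int to) {
        this.from = from;
        this.to = to;
    }
}

final class OptimisticRetrySketch {
    // Runs 'update' against the current snapshot and commits with compare-and-set.
    // If another thread committed in between, the update is re-run on the new snapshot.
    static Balances atomically(AtomicReference<Balances> ref,
                               UnaryOperator<Balances> update,
                               int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            Balances snapshot = ref.get();
            Balances result = update.apply(snapshot); // if this throws, nothing is committed
            if (ref.compareAndSet(snapshot, result)) {
                return result;
            }
            Thread.yield(); // conflict: a deliberately trivial backoff before retrying
        }
        throw new IllegalStateException("too many retries");
    }

    static void transfer(AtomicReference<Balances> ref, int amount) {
        atomically(ref, b -> {
            if (b.from - amount < 0) {
                throw new IllegalStateException("negative balance not allowed");
            }
            return new Balances(b.from - amount, b.to + amount);
        }, 1000);
    }

    // Two threads each move 5 euro out of an account holding 10 euro.
    static Balances runTwoTransfers() {
        AtomicReference<Balances> ref = new AtomicReference<>(new Balances(10, 0));
        Thread t1 = new Thread(() -> transfer(ref, 5));
        Thread t2 = new Thread(() -> transfer(ref, 5));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ref.get(); // from == 0, to == 10: no update is lost
    }
}
```

Because the update function may run more than once, it must not have side effects outside the snapshot, which is the same caveat that applies to any retried transaction.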
So as a developer you can work in a declarative mode: you specify what needs to be transactional and the STM will make sure that it is implemented. This is similar to the database programming model, where you define your isolation levels and the database makes sure they are not violated.
What will happen if one of the operations fails?
It can happen that an operation fails, for example because a deadlock was detected or some kind of conflict was encountered. The default behaviour is that the operation is retried. If the instrumentation or the TransactionalTemplate is used, the retrying is done for you automatically. The behaviour can be controlled by setting the maximum number of retries (default 1000) and configuring the backoff policy that prevents an overload on the system. Because operations can be retried, it is very important to realise what could happen to non-transactional objects and resources used inside transactions, because these resources are not rolled back. Doing IO, for example, is something that could be problematic inside a transaction.
That sounds too good. What is the price we pay for such goodies? Performance?
That really depends. There is an impact on performance because there is no hardware support for transactional memory. Violations need to be checked in software, which increases CPU usage. There is also an increase in memory usage because more information needs to be tracked. I don't expect that STM is going to replace traditional concurrency control; I see it more as a hybrid between databases and traditional concurrency control.
You use Java annotations and told us about code instrumentation, so you probably need some agent or post-compile bytecode processing. How does it work?
A very good question. If you want to have a seamless integration in the Java programming language, some form of instrumentation is needed.
For the 0.4 release a Javaagent is available, and for the 0.5 release compile-time instrumentation will be provided as well. Javaagents are 'great' for development environments, but you don't want to have them in production. One of the mission statements of Multiverse is to provide an STM implementation that can easily be integrated with other languages, so the actual STM implementation only cares about interfaces and you can bypass instrumentation completely. And to make it even easier, I provided a managed reference that doesn't rely on instrumentation. This is the approach used in the Scala-based Akka project of Jonas Boner.
This term reminds me of another question: how would you compare STM with other approaches to concurrency problems, such as async message passing from Erlang or managed references to persistent data structures popularised by Clojure?
That is a trickier question. If you use a pure actor-based model, there is no shared state (apart from the mailbox), so there is no need for an STM. But if these actors also touch shared state, they will need some form of concurrency control. This is the approach taken in the Akka project.
The cool thing about STM (depending on the implementation) is that you get persistent data structures for free (persistence not in the sense of durability, but in the sense of having a consistent view in a transaction without interference from other transactions). Multiverse essentially is a Multi Version Concurrency Control implementation (hence the name), and it can happen that at any given moment multiple versions of the same object are in memory.
I think that the Clojure STM shares a lot of features with Multiverse since both are built on similar concepts. The big difference is that Clojure STM is made for Clojure and Multiverse for Java and other JVM-based languages. And certain problems are also solved in a different way, for example blocking operations or dealing with isolation anomalies like the write-skew problem.
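Since write skew may be unfamiliar, here is a toy, single-threaded model of it. It is a sketch under stated assumptions, not how Multiverse or Clojure work internally: `ToyStore`, `Tx` and `WriteSkewDemo` are hypothetical names, each transaction sees a private snapshot taken when it begins, and a commit is rejected only if a checked key was committed by someone else after that point. Two withdrawals each verify the invariant x + y >= 100 on their own snapshot and write to different keys, so a check on written keys alone lets both commit.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy store: committed values plus the commit version that last wrote each key.
final class ToyStore {
    final Map<String, Integer> committed = new HashMap<>();
    final Map<String, Long> writtenAt = new HashMap<>();
    long version = 0;

    Tx begin(boolean validateReads) { return new Tx(this, validateReads); }
}

final class Tx {
    private final ToyStore store;
    private final boolean validateReads;
    private final long startVersion;
    private final Map<String, Integer> snapshot;   // private, consistent view
    private final Set<String> reads = new HashSet<>();
    private final Map<String, Integer> writes = new HashMap<>();

    Tx(ToyStore store, boolean validateReads) {
        this.store = store;
        this.validateReads = validateReads;
        this.startVersion = store.version;
        this.snapshot = new HashMap<>(store.committed);
    }

    int read(String key) {
        reads.add(key);
        Integer own = writes.get(key);
        return own != null ? own : snapshot.get(key);
    }

    void write(String key, int value) { writes.put(key, value); }

    // Returns true when committed, false when a conflict was detected.
    boolean commit() {
        Set<String> toCheck = new HashSet<>(writes.keySet());
        if (validateReads) {
            toCheck.addAll(reads);   // also treat stale reads as conflicts
        }
        for (String key : toCheck) {
            if (store.writtenAt.getOrDefault(key, 0L) > startVersion) {
                return false;
            }
        }
        store.version++;
        for (Map.Entry<String, Integer> e : writes.entrySet()) {
            store.committed.put(e.getKey(), e.getValue());
            store.writtenAt.put(e.getKey(), store.version);
        }
        return true;
    }
}

final class WriteSkewDemo {
    // Returns x + y after two overlapping withdrawals of 100.
    static int run(boolean validateReads) {
        ToyStore store = new ToyStore();
        store.committed.put("x", 60);
        store.committed.put("y", 60);
        Tx t1 = store.begin(validateReads);
        Tx t2 = store.begin(validateReads);
        if (t1.read("x") + t1.read("y") >= 100) t1.write("x", t1.read("x") - 100);
        if (t2.read("x") + t2.read("y") >= 100) t2.write("y", t2.read("y") - 100);
        t1.commit();
        t2.commit();   // rejected only when reads are validated
        return store.committed.get("x") + store.committed.get("y");
    }
}
```

In this model `WriteSkewDemo.run(false)` returns -80, so the invariant is broken even though no transaction overwrote another's data, while `WriteSkewDemo.run(true)` returns 20 because the second commit is rejected and would have to be retried.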
I expect that in the future we will see more cross-pollination.
Do people develop real applications with Multiverse? Please tell us a bit about the community around the project.
At the moment there is a small set of committers on Multiverse. However, it is being used in the Akka project of Jonas Boner, so I keep spamming his mailing list as well. The 0.4 release of Multiverse is the first step in moving from a 'cool pet project' to a usable open source product. The site has been completely replaced, a big manual has been written, and the goal is to make more noise so other developers start to realise that concurrent programming doesn't need to be this hard.
You do uncommonly interesting things. Tell us a little bit about your background, please.
I have been in the Java business for more than 10 years and I have worked with various companies in different roles. I started as a developer, did consultancy for a few years where I helped clients with all kinds of problems (performance, concurrency, databases), and 5 months ago I started to work as a hands-on architect for a startup that is building a distributed application environment. Unfortunately I can't go into much detail about that. From the beginning I liked complex but explainable problems. I started with expert systems and Prolog compilers (I have written a few as well), and 5 years ago I started to focus on everything related to concurrency: databases, normal concurrency control and application architecture. For the last 18 months I have been working on Multiverse.
What should we expect? What is the next major milestone for Multiverse?
Multiverse 0.5 is expected in 10 weeks and will get:
more transactional implementations of the Java collections framework: TransactionalTreeMap, TransactionalTreeSet, TransactionalArray and TransactionalArrayList
compile-time instrumentation
annotations that work on methods of interfaces; normally they don't inherit but I'll make sure they do. Transactional settings are part of the interface and not part of the implementation.
a lot of performance optimizations in the instrumentation
and if I have enough time I'll also add commuting operations. This makes it possible to create better-scaling transactional data structures, which is very important for the transactional collection implementations in Multiverse. A simple example would be the size field of a collection. If you take a linked blocking queue with independent head and tail, it is possible to put and take items concurrently. But the size field of the (transactional) linked blocking queue is going to cause a conflict when one of the transactions commits.
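The size-field example can be sketched in a few lines. This is a toy, single-threaded model and not the planned Multiverse feature: `VersionedInt`, `SizeTx` and `CommuteDemo` are hypothetical names. A classic increment reads the value from the transaction's snapshot, so any concurrent commit invalidates it. A commuting increment only records the function to apply, and the function is run against the latest committed value at commit time, so the order in which such transactions commit does not matter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Shared counter with a version that is bumped on every commit.
final class VersionedInt {
    int value;
    long version;
}

final class SizeTx {
    private final VersionedInt size;
    private final long startVersion;
    private final int snapshot;
    private Integer written;   // set only by the read-then-write path
    private final List<IntUnaryOperator> commuting = new ArrayList<>();

    SizeTx(VersionedInt size) {
        this.size = size;
        this.startVersion = size.version;
        this.snapshot = size.value;
    }

    // Classic increment: depends on the snapshot, so it conflicts with
    // any commit that happened after this transaction started.
    void incrementByReadAndWrite() {
        written = (written != null ? written : snapshot) + 1;
    }

    // Commuting increment: records the operation without reading the value.
    void incrementCommuting() {
        commuting.add(v -> v + 1);
    }

    // Returns true when committed, false on conflict (caller would retry).
    boolean commit() {
        if (written != null) {
            if (size.version != startVersion) {
                return false;
            }
            size.value = written;
        }
        for (IntUnaryOperator op : commuting) {
            size.value = op.applyAsInt(size.value);   // applied to the latest value
        }
        size.version++;
        return true;
    }
}

final class CommuteDemo {
    // Two overlapping transactions each add one element; returns the final size.
    static int run(boolean commute) {
        VersionedInt size = new VersionedInt();
        SizeTx put1 = new SizeTx(size);
        SizeTx put2 = new SizeTx(size);
        if (commute) {
            put1.incrementCommuting();
            put2.incrementCommuting();
        } else {
            put1.incrementByReadAndWrite();
            put2.incrementByReadAndWrite();
        }
        put1.commit();
        put2.commit();   // fails in the read-then-write case
        return size.value;
    }
}
```

Here `CommuteDemo.run(false)` returns 1 because the second transaction hits a conflict and would need a retry, while `CommuteDemo.run(true)` returns 2 with both transactions committing on the first attempt.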
The longer term goals are:
provide a transparent persistence solution. So no need to deal with setting up a database, caching, or mapping, sql queries etc.
distributed transactional objects and distributed transactions. At the moment I'm inspecting Terracotta to see if it is a good fit. The idea is that I'm going to bypass their original instrumentation completely (so no need to deal with a bootclasspath) and hook up to the internal Terracotta APIs directly.
profiler support so you have feedback on performance problems (contention, deadlocks, livelocks etc). Information is going to be available on transaction level (so you can see which transactions are causing problems) but also on class level
inferring optimal settings for transactions.
pessimistic locking and contention management to provide fairness guarantees. This is very important for longer running transactions because, without contention management, the chance of successfully committing decreases as the duration of the transaction increases.
Where can our curious readers find more information about Multiverse?
The multiverse website can be found at http://multiverse.codehaus.org
The mailinglist can be found at http://groups.google.com/group/googlemultiverse
But if you have any questions about how to use Multiverse or how Multiverse can be integrated with your language/platform, you can always send me an email directly at alarmnummer at gmail dot com.
Peter, thanks a lot for your time. We will keep an eye on your really cool project.