Call for concurrency benchmarks of JVM languages
Join the DZone community and get the full member experience.
Join For FreeWe live in the era of polyglot programming on JVM. For good (as I believe) or for bad we have now many really nice languages to write for JVM. Clojure, Groovy/Groovy++, JRuby/Mirah, Jython, Scala - just to mention most popular ones in lexicographical order. And of course old good Java, which is still far from being dead despite of all efforts of owners. Very often it is really hard to choose, which language is the best fit for the job.
What is even more interesting and important it that most of these languages suggest some concurrency story. I believe that nobody today needs to be convinced in importance of multicore programming. Just two facts to remind - processor speed stopped to double every 18-24 months but number of cores per processor still grow and 4 cores per desktop/laptop already almost became commodity. So we need to utilize processor power we have, otherwise we just waste money.
But what is the best tool for that? What is right combination of language expressiveness and runtime performance? What language/approach can make concurrency really available not only for gurus but for everybody?
This is really important question. It is so easy to do concurrency wrongly. Not only because of potential deadlocks (which is not such a big deal as most of the people think) but mostly because almost every naive approach kills performance and you still utilize only one core.
Of course, I have my own answer. I believe that Groovy/Groovy++ gives the best combination of expressiveness and performance. But I am 100% sure that Scala people believes the same about Scala and that there belief is based on experience and not just personal preferences. Adepts of Clojure, of course, would say that Closure is the best tool. The same I guess will happen for any language.
How to find the truth?
We have very good toolset of algorithms - java.util.concurrent, different kinds of actors and agents, data flow variables/streams, message channels, blocking and non-blocking queues, persistent data structures, software transactional memory (just to mention few)
Which one should be used for which task? What language is most expressive for particular class of algorithms?
The whole idea of this article came from post of my friend Vaclav Pech on Sieve of Eratosthenes using GPars actors. GPars is brilliant library for Groovy concurrency and the implementation in the post is really elegant. The problem is that actors by itself is very wrong tool for the job. Such code in real application is performance killer. And I don't mean performance of Groovy dynamic dispatch (you can always do it as fast as Java with Groovy++) what I mean is algorithmic performance and right utilization of your processor power.
So, what do I suggest
I invite gurus of different JVM languages to participate in development of set of benchmarks showing different languages/tools applied solving different kinds of concurrent problems using their favorite tool.
To initiate the process I want to suggest fiirst benchmark - prime factorization of integer numbers.
For different number of concurrently running threads we split 2000000 of largest integers and factorize all of them by prime factors.
For benchmarking purposes it is important that we always factorise the same numbers as time complexity of factorization depends on algebraic structure of the number.
Here is sketch of agorithm on Groovy++
def pool = Executors.newFixedThreadPool(256)
for(threads in [1,2,4,8,16,32,64,128,256,128,64,32,16,8,4,2,1]) {
def numbersPerThreads = 2000000 / threads
CountDownLatch cdl = [threads]
println "$threads started"
def start = System.currentTimeMillis()
def sieve = new SieveImpl ()
for(i in 0..<threads) {
pool {
def baseNum = Integer.MAX_VALUE - threads * numbersPerThreads + i * numbersPerThreads
for(j in 0..<numbersPerThreads) {
def number = baseNum + j
def factors = sieve.factor(number)
}
cdl.countDown()
}
}
cdl.await()
println """done in ${System.currentTimeMillis()-start} ms
"""
}
pool.shutdown()
Class SieveImpl mentioned in the code above should be concrete implementation of the following abstract class
abstract class Sieve {
/**
* Returns list of prime factors for given integer
*/
abstract List<Integer> factor(int value)
}
The only requirement for the implementation is that it must use The Sieve of Erathosphenes for checking if number is prime or not. I don't provide implementation because it depends on language and concurrency framework in use. I've experimented with three different aproaches and results are noticably different depending on algorithm in use.
Some one can argue that this benchmark is not good for dynamic languges as it requires a lot of computation. Fortunately, Groovy has Groovy++ and JRuby has Mirah, so calculations itself can be done fastly. But I claim that main thing in the benchmark is concurrent access to shared sieve, which can be done both fast and slow.
Please suggest some other concurrent benchmarks (we definitely need something on message passing) and of course implement this one.
I've also created Google Group and GitHub project (still empty) - please feel free to join.
Thank you for reading and till next time.
Opinions expressed by DZone contributors are their own.
Comments