How come Groovy++ outperforms Java?
This article continues my writings on statically typed Groovy. More information on the motivations and goals of the project can be found in my previous articles published on DZone.
To make a long story short, "statically typed Groovy" (aka Static Groovy, aka Groovy Booster, aka Groovy++) is an attempt to combine performance and compile-time checks with the expressiveness of the Groovy programming language, and to develop high-performance runtime libraries utilizing this combination.
I started the project as a research/hobby project, but due to the huge interest from the community, my company MB Technologies decided to open source it and sponsor the development.
Let me start with a short update on what has happened since the last post:
- A public preview of statically typed Groovy is now available at Google Code
- There is a discussion group at Google Groups
- The project is not open source yet, but it has been announced that it will be open-sourced soon
Now let's get to our main subject.
We will deal with a non-trivial benchmark and provide several different implementations. The purpose of our exercise is to see how the power of an expressive language and statically typed compilation play together.
The benchmark under investigation is counting word frequencies across more than 20,000 text files of 4-20 KB each. The benchmark was initiated in a blog post by Zach Cox and continued by Lau B. Jensen.
All the code below is now available at Google Code.
Let us start with a pure Groovy solution. I want to take this opportunity to express my gratitude to Paul King, who brought this story to the Groovy++ discussion group, provided the pure Groovy implementation below, and helped a lot with benchmarking.
package org.mbte.groovypp.examples.wordcount

for (i in 0..<10) {
    t1 = System.currentTimeMillis()
    counts = [:]
    new File("./20_newsgroups").eachFileRecurse{ f ->
        if (!f.isDirectory() && !f.path.contains(".svn")) {
            f.text.toLowerCase().eachMatch(/\w+/) { w ->
                def c = counts[w] ?: 0
                counts[w] = c + 1 } } }
    new File("counts-descreasing-groovy").withWriter { out ->
        counts.sort { a, b -> b.value <=> a.value }.each { k, v -> out << "$k\t$v\n" } }
    new File("counts-alphabetical-groovy").withWriter { out ->
        counts.sort { it.key }.each { k, v -> out << "$k\t$v\n" } }
    println "Finished in ${System.currentTimeMillis() - t1} millis"
}
First of all, I think it's nice: short and clear. Secondly, it does the job, which is always important.
The only problem is that it is a little bit slow: on average it takes 51.1 seconds to process the files. I should mention here that all benchmarks were run on a 2-core MacBook Pro with "-Xmx512m -server". To calculate the average, I take the 10 measurements produced by the script, drop the highest and lowest numbers, and average the rest.
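In code, that trimmed average is a one-liner; a minimal sketch (the timings below are made-up numbers, purely for illustration):

// Trimmed mean over the 10 runs: drop the fastest and slowest, average the rest
def timings = [52100, 50800, 51400, 49900, 53200, 51000, 50500, 51900, 50200, 51600] // ms, made-up values
def trimmed = timings.sort()[1..-2]            // drop the lowest and the highest
def average = trimmed.sum() / trimmed.size()
println "average: ${average} ms"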
Let us now look at the statically typed solution:
@Typed package org.mbte.groovypp.examples.wordcount

for (i in 0..<10) {
    def t1 = System.currentTimeMillis()
    def counts = [:]
    new File("./20_newsgroups").eachFileRecurse{ f ->
        if (!f.isDirectory() && !f.path.contains(".svn")) {
            f.text.toLowerCase().eachMatch(/\w+/) { w ->
                def c = counts[w] ?: 0
                counts[w] = c + 1 } } }
    new File("counts-descreasing-groovy").withWriter { out ->
        counts.sort { a, b -> b.value <=> a.value }.each { k, v -> out << "$k\t$v\n" } }
    new File("counts-alphabetical-groovy").withWriter { out ->
        counts.sort { it.key }.each { k, v -> out << "$k\t$v\n" } }
    println "Finished in ${System.currentTimeMillis() - t1} millis"
}
Frankly speaking, the difference from the previous script is minor. We just added the @Typed annotation on line 1 and the def keyword on lines 4 and 5.
Amazingly, that was enough to achieve roughly a sixfold performance gain: the statically typed script processes all our files in 8.6 seconds on average.
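As a side note, my understanding is that @Typed does not have to cover the whole package; it can also be applied to an individual class or method. Here is a hypothetical sketch of that usage (this is an assumption on my part, not code from the benchmark):

// Hypothetical example: statically compile just this class
// (assumption: @Typed accepts class-level targets)
@Typed
class WordCounter {
    Map count(String text) {
        def counts = [:]
        text.toLowerCase().eachMatch(/\w+/) { w ->
            counts[w] = (counts[w] ?: 0) + 1
        }
        counts
    }
}

println new WordCounter().count("Groovy loves Groovy")   // [groovy:2, loves:1]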
Now is a good time to try to utilize the multiple cores we have. By the nature of the problem there are two groups of activities, each of which can be run in parallel: accumulating statistics and generating reports. Let us see what we can do with Groovy++.
Here is the code.
@Typed package org.mbte.groovypp.examples.wordcount

import java.util.concurrent.*
import groovy.util.concurrent.*

for (i in 0..<10) {
    def t1 = System.currentTimeMillis()
    def pool = Executors.newFixedThreadPool(30)
    def counts = new AtomicIntegerMap()
    new File("./20_newsgroups").recurseFileIterator().filter{ file ->
        !file.directory && !file.path.contains(".svn")
    }.each(pool, 4) { file ->
        file.charSequence.eachMatch(/\w+/) { String w ->
            w = w.toLowerCase()
            counts[w].incrementAndGet()
        }
    }.whenBound {
        pool.execute {
            new File("counts-descreasing-groovy").withWriter { Writer out ->
                counts.asList().sort { a, b -> b.get() <=> a.get() }.each { e -> out << "$e.key\t${e.get()}\n" }
            }
        }
        pool.execute {
            new File("counts-alphabetical-groovy").withWriter { Writer out ->
                counts.asList().sort { a, b -> a.key <=> b.key }.each { e -> out << "$e.key\t${e.get()}\n" }
            }
        }
        pool.shutdown()
    }
    pool.awaitTermination(30, TimeUnit.SECONDS)
    println "Finished in ${System.currentTimeMillis() - t1} millis"
}
Obviously, it is lengthier. Fortunately, it is also much faster: on average it processes the files in 5.5 seconds.
Here is a brief explanation of the tools used. We create a thread pool, start with a recursive iterator over our files, apply a filter to skip unneeded directories and files, and then start concurrent iteration over the filtered iterator using our thread pool. This is done with the each(pool, 4){...} call, which immediately returns something called a BindLater.
BindLater is a specialized, high-performance version of java.util.concurrent.Future that also supports adding listeners for calculation completion. We add such a listener with the whenBound{...} call. When the calculation completes, the listener starts concurrent generation of the reports. Voilà.
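I cannot reproduce the BindLater internals here, but for readers who think in terms of the standard JDK, the general "compute, then notify listeners on completion" pattern can be sketched with CompletableFuture. This is only an analogy under my own naming, not the Groovy++ API:

import java.util.concurrent.*
import java.util.function.*

def pool = Executors.newFixedThreadPool(2)

// Kick off the accumulation step asynchronously
def accumulation = CompletableFuture.supplyAsync({
    // ... walk the files and count words here; a tiny stand-in result:
    [groovy: 2, java: 1]
} as Supplier, pool)

// Completion listeners, analogous in spirit to whenBound{...}:
// each report runs on the pool once the counts are ready
def report1 = accumulation.thenAcceptAsync({ counts ->
    println "descending-frequency report for $counts"
} as Consumer, pool)
def report2 = accumulation.thenAcceptAsync({ counts ->
    println "alphabetical report for $counts"
} as Consumer, pool)

// Wait for both reports, then shut the pool down
CompletableFuture.allOf(report1, report2).join()
pool.shutdown()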
The last important thing to notice is the AtomicIntegerMap used to accumulate the results. AtomicIntegerMap is a specialized, high-performance concurrent map of AtomicIntegers. It is based on the same implementation that in the past allowed us to noticeably improve the performance of the normal Groovy runtime. The standard library of static Groovy also provides Atomic(Boolean|Long|Reference)Map, which come in handy in many situations.
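The Groovy++ implementation itself is out of scope here, but conceptually such a map can be approximated with standard JDK classes. A rough, hypothetical stand-in (not the actual AtomicIntegerMap code):

import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in: a concurrent map that lazily creates an
// AtomicInteger per key, so increments from many threads are safe
def counts = new ConcurrentHashMap<String, AtomicInteger>()

def increment = { String word ->
    def fresh = new AtomicInteger(0)
    def counter = counts.putIfAbsent(word, fresh) ?: fresh   // existing counter, or the one we just added
    counter.incrementAndGet()
}

increment('groovy')
increment('groovy')
assert counts['groovy'].get() == 2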
Obviously, there is room for further improvement. Two things that come to mind immediately are that we could precompile the regular expression pattern instead of compiling it for each file, and that we could separate reading the files from accumulating the statistics so that reading also happens in parallel.
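To illustrate the first idea, here is roughly what precompiling the pattern could look like; the helper name is mine, not part of the benchmark code:

import java.util.regex.Pattern

// Compile the word pattern once instead of once per file
final Pattern WORD = Pattern.compile(/\w+/)

// Hypothetical helper: count the words of one file's text into the given map
def countWords = { String text, Map counts ->
    def m = WORD.matcher(text.toLowerCase())
    while (m.find()) {
        def w = m.group()
        counts[w] = (counts[w] ?: 0) + 1
    }
}

def counts = [:]
countWords('Groovy loves Groovy', counts)
assert counts == [groovy: 2, loves: 1]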
Now we come to the original question of the article: why does Groovy++ outperform Java?
Frankly speaking, I don't know. The best average I was able to achieve (only once) with non-concurrent Java code was 10.5 seconds to process all the files. I am sure it has nothing to do with the bytecode generated by javac, but rather with the slightly different algorithm used by the Java code. As that code was written neither by me nor by Paul, and neither of us did any performance tuning on it, the comparison itself is probably not entirely fair.
It would be very interesting if someone were curious enough to produce the fastest possible Java implementation.
But does it really matter? Even if the statically typed Groovy code performed exactly as slowly as the Java code, we would still have 16 lines of code instead of 65, and those 16 lines are pure logic with no boilerplate.
Thank you for reading, and give Groovy++ a try yourself!