Groovy++ in action: how to make $5000 in one hour
Groovy's slowness has two dimensions. First of all, being a fully dynamic language, Groovy dispatches every method invocation (and in reality any operation) dynamically. It means we need to check the class of the instance, find its meta class, look at the types of the parameters, find the method most specific for these parameters and finally make the invocation itself. BTW, the invocation itself can also involve some on-the-fly code generation. The second dimension is derived from the first one. In a multi-threaded environment some of these checks require volatile reads/writes (for each method call etc.), which slows things down noticeably.
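To give a feeling for how much work hides behind a single dynamic call, here is a minimal Java sketch of the steps listed above, built on plain reflection. It is only an illustration, not how the Groovy runtime is implemented: the class `DynamicDispatchSketch` and the helper `invokeDynamically` are hypothetical names, there is no meta class or call-site caching, and the helper takes the first applicable method instead of the most specific one.

```java
import java.lang.reflect.Method;

final class DynamicDispatchSketch {

    // Hypothetical helper: resolves and calls a public method at run time.
    // Simplification: picks the first applicable method, not the most specific one.
    static Object invokeDynamically(Object receiver, String name, Object... args) {
        Class<?> type = receiver.getClass();                  // 1. find the class of the instance
        for (Method candidate : type.getMethods()) {          // 2. look through its public methods
            if (!candidate.getName().equals(name)) {
                continue;
            }
            Class<?>[] params = candidate.getParameterTypes();
            if (params.length != args.length) {
                continue;
            }
            boolean applicable = true;
            for (int i = 0; i < params.length; i++) {         // 3. check the run-time argument types
                if (!params[i].isInstance(args[i])) {
                    applicable = false;
                    break;
                }
            }
            if (applicable) {
                try {
                    return candidate.invoke(receiver, args);  // 4. finally make the invocation
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        throw new IllegalArgumentException("No applicable method: " + name);
    }

    public static void main(String[] args) {
        String text = "groovy";
        System.out.println(text.concat("++"));                       // resolved at compile time
        System.out.println(invokeDynamically(text, "concat", "++")); // resolved on every call
    }
}
```

The statically compiled call goes straight to `String.concat`, while the second one repeats the whole lookup each time it runs. A real dynamic runtime caches much of this, but the checks that keep those caches valid are exactly where the volatile reads come from.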
Of course the Groovy Core team, which I have the honor to belong to, does everything possible to speed things up. We did a very serious speed-up job in Groovy 1.5 and 1.6, which was improved a bit in 1.7, and right now extremely promising development is going on in the Groovy 1.8 branch (beta4 was released just a few days ago).
Nevertheless, as of today Groovy cannot compete with Java in performance. My experience with Groovy 1.7.x is that it is 4 to 10 times slower than Java code. You can try the https://github.com/alextkachman/fib-benchmark project, which runs the same benchmarks against Groovy 1.7/1.8, Java and Groovy++.
The situation is a real pity: we have a beautiful general-purpose language, which is slow only because dynamic dispatch, an extremely important but very specific feature, requires it.
Of course, you know that a solution exists. It is called Groovy++, a statically typed extension of Groovy. You can find more information at the Groovy++ project page. To make a really long story short, Groovy++ allows you to annotate a piece of code (a class, a method or a whole module) to be compiled statically or dynamically (there is also a mixed mode, where the compiler allows you to mix static and dynamic calls together). Such an approach gives you the best of both worlds: performance where it is important and dynamic features where you need them.
My personal experience with Groovy++ is pretty simple: approximately 80% of Groovy code can be made statically typed with zero or minimal effort, thanks to the extremely powerful type inference implemented in Groovy++. Saying the same thing in other words, 80% of Groovy code can be sped up 4 to 10 times with zero or minimal effort. BTW, the only required effort usually is adding a type to some method parameter or local variable; the rest usually happens magically.
I want to show you one example of such a benchmark, which I used recently to prove to my colleague that using regular Groovy for a multi-threaded application is not such a good idea.
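import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicInteger
import java.util.concurrent.locks.ReentrantLock

class LockPerf {
    public static void main(String[] args) {
        def processors = Runtime.runtime.availableProcessors()
        // 1, 2, ..., 2*processors, then doubling up to 1024
        for(def threadNum = 1; threadNum <= 1024; threadNum = threadNum < 2*processors ? threadNum+1 : threadNum*2) {
            def counter = new AtomicInteger ()
            def cdl = new CountDownLatch(threadNum)
            def lock = new ReentrantLock()
            def start = System.currentTimeMillis()
            for(i in 0..<threadNum) {
                Thread.start {
                    for(;;) {
                        // every thread competes for the same lock
                        lock.lock()
                        try {
                            if(counter.get() == 100000000) {
                                // target reached: report completion and stop
                                cdl.countDown()
                                break
                            }
                            else {
                                counter.incrementAndGet()
                            }
                        }
                        finally {
                            lock.unlock()
                        }
                    }
                }
            }
            // wait until all threads have stopped, then print the elapsed time
            cdl.await()
            println "$threadNum ${System.currentTimeMillis() - start}"
        }
    }
}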
What does this benchmark do?
Pretty much nothing.
We can think about this benchmark as an emulation of the synchronization logic in a complex multi-threaded algorithm. What we try to evaluate is how much overhead we bring by using Groovy instead of Java/Scala/Groovy++.
- It iterates over a sequence of thread counts (from one to twice the number of cores, then doubling until 1024)
- For each such number it starts that many concurrent threads
- Each thread competes for a shared lock
- When the lock is acquired, the thread increments a shared atomic counter and releases the lock
- If the shared counter has reached 100000000, the thread stops
- When all threads have stopped, the iteration of the benchmark is complete and "$threadNum $iterationElapsedTimeInMillis" is printed
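The first step is easy to misread, so here is a small Java sketch that only reproduces the thread-count sequence, using the same loop update as the benchmark. The class and method names (`ThreadCountSchedule`, `threadCounts`) are mine, made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ThreadCountSchedule {

    // Same update rule as the benchmark loop: step by one up to
    // 2 * processors, then keep doubling while the value stays <= max.
    static List<Integer> threadCounts(int processors, int max) {
        List<Integer> counts = new ArrayList<>();
        for (int n = 1; n <= max; n = n < 2 * processors ? n + 1 : n * 2) {
            counts.add(n);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints [1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128, 256, 512, 1024]
        System.out.println(threadCounts(4, 1024));
    }
}
```

With 4 processors that gives 15 iterations, each of which counts up to 100000000 under the lock.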
Just for completeness, here is the Java code implementing the same benchmark. I cannot miss the opportunity to point out that it is more verbose than the Groovy version.
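import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class LockPerf {
    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        // 1, 2, ..., 2*processors, then doubling up to 1024
        for (int threadNum = 1; threadNum <= 1024; threadNum = threadNum < 2 * processors ? threadNum + 1 : threadNum * 2) {
            final AtomicInteger counter = new AtomicInteger();
            final CountDownLatch cdl = new CountDownLatch(threadNum);
            final ReentrantLock lock = new ReentrantLock();
            long start = System.currentTimeMillis();
            for (int i = 0; i < threadNum; ++i) {
                new Thread(new Runnable() {
                    public void run() {
                        for (;;) {
                            // every thread competes for the same lock
                            lock.lock();
                            try {
                                if (counter.get() == 100000000) {
                                    // target reached: report completion and stop
                                    cdl.countDown();
                                    break;
                                } else {
                                    counter.incrementAndGet();
                                }
                            } finally {
                                lock.unlock();
                            }
                        }
                    }
                }).start();
            }
            try {
                // wait until all threads have stopped
                cdl.await();
            } catch (InterruptedException e) {
                // not expected in this benchmark; keep the interrupt flag set
                Thread.currentThread().interrupt();
            }
            System.out.println(threadNum + " " + (System.currentTimeMillis() - start));
        }
    }
}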
You probably want to ask me about the Groovy++ code. It is interesting (and I am not tricking you) that the only change needed in the original Groovy code was to add the @Typed annotation to the class:
@Typed
class LockPerf {
// all the rest is exactly as above
}
Here are the benchmark results on a 4-core MacBook Pro (elapsed time in milliseconds for each thread count):
Threads | Groovy 1.7.7 | Groovy 1.8.0-beta-4 | Java 1.6 | Groovy++ 0.4.155 |
1 | 9221 | 8804 | 2243 | 2791 |
2 | 19889 | 22306 | 9513 | 5332 |
3 | 18256 | 19549 | 3762 | 3696 |
4 | 18701 | 19376 | 3808 | 3560 |
5 | 18036 | 20753 | 3981 | 3785 |
6 | 18630 | 20078 | 3863 | 3684 |
7 | 19837 | 20267 | 3742 | 3528 |
8 | 19271 | 20511 | 4112 | 3555 |
16 | 19166 | 21014 | 3967 | 3616 |
32 | 19823 | 21496 | 4064 | 3751 |
64 | 21144 | 22402 | 3964 | 3807 |
128 | 22217 | 23754 | 3753 | 3884 |
256 | 22848 | 24919 | 4039 | 3942 |
512 | 23197 | 25399 | 3658 | 3851 |
1024 | 25301 | 27179 | 4017 | 3706 |
What can we notice in the benchmark results?
- The Groovy versions are roughly 4 to 7 times slower than Java (ignoring the noisy 2-thread run), for example 18701 ms against 3808 ms at 4 threads
- Java and Groovy++ compete very closely (which is correct, as theoretically both should perform equally)
- Groovy 1.8 is a bit slower than 1.7 (which is fair for a not-yet-released version)
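If you want to check these observations against the numbers yourself, the arithmetic is trivial. Here is a small Java sketch that computes the ratios for a few rows of the results above (1, 4, 8 and 1024 threads). Only the timings come from the benchmark; the class name `SlowdownRatios` and the helper `ratio` are made up for this illustration.

```java
import java.util.Locale;

final class SlowdownRatios {

    // How many times slower the first timing is compared to the second.
    static double ratio(long slowerMillis, long fasterMillis) {
        return (double) slowerMillis / fasterMillis;
    }

    public static void main(String[] args) {
        // Timings in milliseconds, copied from the results for 1, 4, 8 and 1024 threads.
        int[] threads = {1, 4, 8, 1024};
        long[] groovy17 = {9221, 18701, 19271, 25301};
        long[] java16 = {2243, 3808, 4112, 4017};
        long[] groovyPlusPlus = {2791, 3560, 3555, 3706};

        for (int i = 0; i < threads.length; i++) {
            System.out.println(String.format(Locale.ROOT,
                    "%4d threads: Groovy 1.7.7 / Java = %.2f, Groovy++ / Java = %.2f",
                    threads[i],
                    ratio(groovy17[i], java16[i]),
                    ratio(groovyPlusPlus[i], java16[i])));
        }
    }
}
```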
My conclusion is obvious and predictable: use Groovy and extend its power with Groovy++ where needed. It will make your code stronger and faster.
Hope you enjoyed it and till next time!
Ah, I promised to tell you about $5000 in one hour.
Usually I don't do consulting. But sometimes you get a challenge or an offer you cannot refuse.
Last week I was in Russia and an old friend approached me with a very urgent problem. His team had an important milestone and a customer presentation of a new version of a big Grails application, and one of the most important new reports took more than seven seconds to generate, which was totally unacceptable (mostly because of a wrong database design done at earlier stages, which required a lot of data aggregation at the application level). He knew that performance is my hobby and asked if I could help.
I said, "You know what, I can sit with you for a few hours and let us see what we can do. If we get a great result, you pay well; if not, we had fun together and you owe me a good dinner." That was the deal and, truly speaking, I did not expect much. My plan was to review the algorithms and see what could be optimized or targeted to be rewritten in Java.
The problematic code was more than 3000 lines of Groovy split across 12 classes. Fortunately, it was well separated from the report generation logic and reasonably covered with tests (in my opinion the only way to go with any code, especially code that is not statically typed).
It became very clear that it was not possible for me to understand what was going on in a short time, so I used a simple trick: I tried to apply Groovy++ (meaning I installed the Groovy++ Grails plugin and put @Typed here and there).
THAT WAS AMAZING!
We had exactly 10 compilation errors:
- 3 errors were fixed by adding a type to a method parameter
- 2 errors were fixed by setting the correct generic type on a collection
- 1 was about setting the correct type of a local variable
- 2 were really dynamic code that required the mixed mode of compilation
- and the remaining 2 were outright bugs in uncovered code, which the static compiler helped us to find
This iteration (except fixing the bugs in uncovered code, which I left to the original author) took less than 30 minutes (of course it would not have been possible without IntelliJ and the guys who knew the whole codebase). As a result of this iteration we brought 7 seconds down to 2.5 seconds.
What was even more important is that the two pieces of really dynamic code we discovered were really strange. Two components were communicating with each other by generating and storing a large intermediate XML document in the database. As nobody could explain why it was done this way, we simply replaced all this XML generation, storing, loading and parsing with a direct in-memory call, and 2.5 seconds became 1.2 seconds. All that took another 40 minutes (mostly spent in the smoking area in a philosophical discussion about programmers and idiots).
The last 0.3 seconds were caught by my local friend, who noticed in one compiler error that part of the calculation was unnecessarily done in BigDecimals instead of doubles because of the use of the 1.0 notation instead of 1.0d. Here I have a huge feature request for Groovy++: to have reasonable warnings in such cases. BTW, I must admit that during the exercise we had one NPE crash of the Groovy++ compiler, which fortunately was easy to fix.
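For readers who have not been bitten by this: in Groovy a decimal literal without a suffix, such as 1.0, is a BigDecimal, while 1.0d is a double. The Java sketch below spells out what the two variants of such a calculation boil down to. It is not the code from that project; the class `DecimalLiteralSketch`, the method names and the factor 1.5 are invented for the illustration.

```java
import java.math.BigDecimal;

final class DecimalLiteralSketch {

    // What a Groovy expression with a literal like 1.5 amounts to:
    // every multiplication and addition allocates a new BigDecimal.
    static BigDecimal weightedSumAsBigDecimal(int[] values) {
        BigDecimal factor = new BigDecimal("1.5");
        BigDecimal sum = BigDecimal.ZERO;
        for (int value : values) {
            sum = sum.add(factor.multiply(BigDecimal.valueOf(value)));
        }
        return sum;
    }

    // The same calculation with a literal like 1.5d: plain primitive arithmetic.
    static double weightedSumAsDouble(int[] values) {
        double factor = 1.5d;
        double sum = 0.0d;
        for (int value : values) {
            sum += factor * value;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3};
        System.out.println(weightedSumAsBigDecimal(values));
        System.out.println(weightedSumAsDouble(values));
    }
}
```

BigDecimal is the right choice when you need exact decimal results, money for example. In a hot aggregation loop where doubles are good enough, the object arithmetic is pure overhead.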
That's it. In a bit less than an hour and a half we (myself and two local guys) managed to speed up the critical part of the application from unacceptable to the required performance using Groovy++, IntelliJ IDEA and a bit of common sense. What is important here is that going to Java was not an option due to time constraints.
I must tell you it was one of the best paid hour and a half in my life :)
I must also mention that yesterday my friends wrote to me that the presentation went successfully and they got approval from the customer to continue the project till the end of the year.