
Contrasting Performance : Languages, styles and VMs – Java, Scala, Python, Erlang, Clojure, Ruby, Groovy, Javascript

There’s a better place to look specifically at performance comparisons across languages than this post: The computer languages benchmarks game. This post attempts to look at performance comparisons a little differently: based on coding idioms as well, and for a much narrower range of problems (namely one).

There are languages which are tightly opinionated on a particular way of doing things. And there are languages which allow you to implement a given piece of logic in multiple ways. Yet, depending upon the language (and, as we shall see, the runtime), the performance can vary quite substantially based on the nature of the code we write. This post takes a small piece of logic and implements it in up to 3 different styles in 8 languages (10 if you count the runtime variations as well).

This is also a sequel to a post I wrote 3 years ago, Performance Comparison – C++ / Java / Python / Ruby/ Jython / JRuby / Groovy. In effect it supersedes the findings in that post by using more contemporary versions and hardware, and also by shuffling the languages a bit. Gone is C++. After Alberto Bignotti suggested a modification in the memory allocation model, the C++ code became too fast, way too fast to be interesting. I really did not think it worthwhile spending time to write it in different styles when it was already so incredibly fast to begin with. Gone is PHP. It’s a useful language, but I really don’t see it competing in the minds of the programmers who are interested in alternative coding idioms and their implications on performance. In comes Javascript using the V8 engine. It has since become a more interesting server side language thanks to V8 and node.js. In come Erlang, Scala and Clojure, all three exciting languages for different reasons. I would have liked to incorporate Haskell, but decided not to be too adventurous. And as far as runtimes go, for Python, Jython and Psyco go away, replaced by PyPy, an exciting new runtime for Python (Quora, as an example, now uses PyPy in production). I saw Charles Nutter’s tweets about a new jRuby runtime using invokedynamic. But apparently there’s still stuff being shaken out between Java 7 update 1 and the jRuby version using invokedynamic. I hope to update this post once both of these are released.


Quoting from The Josephus Problem,

Flavius Josephus was a Roman historian of Jewish origin. During the Jewish-Roman wars of the first century AD, he was in a cave with fellow soldiers, 40 men in all, surrounded by enemy Roman troops. They decided to commit suicide by standing in a ring and counting off each third man. Each man so designated was to commit suicide…Josephus, not wanting to die, managed to place himself in the position of the last survivor.

In the general version of the problem, there are n soldiers numbered from 1 to n and each k-th soldier will be eliminated. The count starts from the first soldier. What is the number of the last survivor? In the code I benchmarked, n = 40 and k = 3.
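To make the problem statement concrete, here is a minimal Python sketch. It is not the benchmarked code from the repository. It assumes the common counting convention in which the count starts at soldier 1, so the k-th soldier counted is the first to go; the function names are hypothetical. The first function simulates the ring directly, the second uses the well-known recurrence J(1) = 0, J(m) = (J(m-1) + k) mod m on 0-based positions.

```python
def josephus_simulated(n, k):
    """Simulate the ring and return the 1-based number of the survivor."""
    soldiers = list(range(1, n + 1))
    index = 0  # position from which the next count starts
    while len(soldiers) > 1:
        # Count k soldiers, starting with the one at `index`.
        index = (index + k - 1) % len(soldiers)
        soldiers.pop(index)  # the next count starts at the same index
    return soldiers[0]


def josephus_recurrence(n, k):
    """Same answer via the recurrence, without building the ring."""
    survivor = 0  # 0-based survivor position for a ring of one soldier
    for ring_size in range(2, n + 1):
        survivor = (survivor + k) % ring_size
    return survivor + 1  # convert to the 1-based soldier number


if __name__ == "__main__":
    print(josephus_simulated(40, 3))   # prints 28
    print(josephus_recurrence(40, 3))  # prints 28
```

Under this convention the survivor for n = 40 and k = 3 is soldier 28. Code that numbers soldiers from 0, or starts counting elsewhere, will report a correspondingly shifted number.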


I have considered three idioms :

  • Object Oriented : This code has classes reflecting a person (or a soldier) and the chain. Each person object maintains references to the previous and next people in the circle (a doubly linked list), and as the counting progresses, whenever a person needs to eliminate themselves, they do so by updating the next / prev references in the prev / next objects. This style results in perhaps the fewest operations involving mutation or memory allocation / deallocation. One would’ve imagined it to be the fastest, but as you will see that is not necessarily true.
  • List reduction : This code starts with a list of integers, each element representing a soldier. It performs an operation which effectively creates a subset of the list by removing every third soldier. The result of one such pass is a smaller list. Rinse and repeat if the smaller list is more than 1 element long. It emphasises looping over lists (using comprehensions or other constructs) and focuses on reducing the list by conducting an operation on the entire list in every pass.
  • Element recursion : This is a more fine grained logic which emphasises recursion (and often accumulation) for every element in the list. This is a particularly apt scenario for pattern matching (both the Erlang and Scala code use pattern matching). One would imagine this to be always slower than list reduction since it is much more fine grained and involves many more function calls.
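The three idioms above can be sketched in Python roughly as follows. This is an illustrative sketch written for this description, not the code from the repository. All class and function names are hypothetical, soldiers are numbered from 1, and k is assumed to be at least 2.

```python
class Person:
    """Object oriented idiom: a node in a circular doubly linked list."""

    def __init__(self, number):
        self.number = number
        self.prev = self
        self.next = self

    def remove(self):
        """Unlink this person from the ring and return the next person."""
        self.prev.next = self.next
        self.next.prev = self.prev
        return self.next


def survivor_oo(n, k):
    first = last = Person(1)
    for number in range(2, n + 1):
        person = Person(number)
        person.prev, person.next = last, first
        last.next = person
        first.prev = person
        last = person
    current = first
    for _ in range(n - 1):          # n - 1 eliminations leave one survivor
        for _ in range(k - 1):      # walk to the k-th person counted
            current = current.next
        current = current.remove()
    return current.number


def survivor_list_reduction(n, k):
    """List reduction idiom: each pass filters the whole list."""
    soldiers = list(range(1, n + 1))
    offset = 0  # count carried over from the previous pass
    while len(soldiers) > 1:
        size = len(soldiers)
        soldiers = [s for i, s in enumerate(soldiers)
                    if (offset + i + 1) % k != 0]
        offset = (offset + size) % k
    return soldiers[0]


def survivor_element_recursion(n, k):
    """Element recursion idiom: one call per soldier, with an accumulator."""

    def step(remaining, survivors, count):
        if not remaining:
            if len(survivors) == 1:
                return survivors[0]
            return step(survivors, [], count)  # start the next pass
        head, tail = remaining[0], remaining[1:]
        if count == k:
            return step(tail, survivors, 1)    # head is eliminated
        return step(tail, survivors + [head], count + 1)

    return step(list(range(1, n + 1)), [], 1)


if __name__ == "__main__":
    for solve in (survivor_oo, survivor_list_reduction,
                  survivor_element_recursion):
        print(solve.__name__, solve(40, 3))
```

Python does not eliminate tail calls, so the recursive version is only suitable for small n such as the 40 used here; languages like Erlang and Scala handle this style more naturally.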

I’ve attempted to implement code in all languages using the styles above as far as reasonably feasible and appropriate. Since (barring C/C++) Java continues to be the language to beat from a performance perspective, I’ve attempted to implement roughly equivalent logic in all styles using Java as well. All programs typically run the code once to print the results (to verify correctness), then run 100000 or a million iterations to warm up, and then repeat the iterations while measuring the elapsed time. There is a slight inconsistency between the various code snippets: the counter varies either from 0 to 39 or from 1 to 40.
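The measurement procedure described above (run once to check the result, warm up, then time the same number of iterations) could look roughly like this in Python. This is a sketch under assumptions rather than the harness from the repository: the `measure` helper and the small `survivor` workload are hypothetical names, and the demo uses fewer iterations than the post did so that it finishes quickly.

```python
import time


def survivor(n=40, k=3):
    """Small stand-in workload: simulate the ring, return the survivor."""
    soldiers = list(range(1, n + 1))
    index = 0
    while len(soldiers) > 1:
        index = (index + k - 1) % len(soldiers)
        soldiers.pop(index)
    return soldiers[0]


def measure(func, iterations=100_000):
    """Warm up, then return the mean elapsed microseconds per call."""
    for _ in range(iterations):      # warm-up pass, not timed
        func()
    start = time.perf_counter()
    for _ in range(iterations):      # timed pass
        func()
    elapsed_seconds = time.perf_counter() - start
    return elapsed_seconds * 1_000_000 / iterations


if __name__ == "__main__":
    print("survivor:", survivor())   # run once to verify correctness
    # The post used 100000 or a million iterations; fewer are used here.
    print("microseconds per iteration:", measure(survivor, 10_000))
```

The warm-up pass matters most on JIT-compiled runtimes (the JVM, V8, PyPy), where the first iterations run before the hot code has been compiled.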


I can’t write the fastest possible code across all these languages. This is the best I could do. However if you can find a better way to implement the code, do let me know in the comments (or send me a pull request on github). I shall certainly include better solutions here if and as they are identified. At the point in time of publishing this, at least two authors had contributed to the code. I imagine (based on my experience with the prior post), more might be interested in suggesting tweaks to further improve performance. These are all listed here.

  • Paddy3118 had suggested some Python code in the comments on the last blog post, which I have substantially reused for the Python list-reduction logic
  • Rahul Göma Phuloré (missingfaktor) contributed substantial improvements to the Scala code
  • Viktor Klang contributed an improved version of the Scala element recursion code
  • David Nolen (swannodette) contributed a substantially improved version of the Clojure element recursion code
  • Fred Hebert suggested native compilation by adding “-compile(native).” and a couple of other minor improvements over github
  • Isaac Gouy offered an improved version for Java List Reduction

Hardware / Software

The specs of the machine used for measurement are as follows

OS : Ubuntu Maverick Meerkat 10.10
Kernel : 2.6.35-28-server 64 bit
CPU : Intel(R) Core(TM)2 Quad CPU Q8300 @ 2.50GHz

Metrics :

Here are the results. All timings are in microseconds per iteration; lower is better. A dash means that style was not measured for that language :

Language / runtime    Object Oriented   List Reduction   Element Recursion
Java 1.6                        0.637            1.435               2.816
Scala                           0.647            1.676               2.367
Erlang R14B03 HiPE                  -            3.489               3.192
Javascript v8                       -            2.984               9.029
PyPy 1.5.0α0                    4.387            6.559              30.291
Jruby 1.6.3                    29.174           25.776              22.441
Clojure 1.2.1                       -           25.966              28.753
Groovy 1.8.1                   50.271           14.776              27.490
Ruby 1.9.1                     73.933           34.950              41.234
CPython 2.6.6                 119.243           29.985              61.420

Observations : (Updated)

Some interesting observations (in no particular order).

  • These are observations based only on a particular problem. For broader coverage of cross-language benchmarks, I encourage you to refer to The computer languages benchmarks game
  • Java and Scala performance is pretty close.
  • One would’ve imagined OO code to be generally the fastest due to minimal mutations or memory allocations. Yet that remains true only for the statically typed languages (and, as one notable exception, PyPy). For all others, the list reduction approach is faster than the OO approach.
  • Erlang and Javascript V8 also show some very good performance. Erlang seems to especially excel in deep recursion and pattern matching, where it is faster than all but Java and Scala.
  • The real surprise in the pack comes from PyPy. It performs outstandingly well on the OO and List Reduction approaches. It still is a bit slow in the element recursion approach. I would hope given the very young age of the VM, that the same would get further optimised in times and versions to come. Interestingly, the pause it undergoes (perhaps when jit’ting the code) is quite noticeable.
  • jRuby consistently shows superior performance to Ruby 1.9.1. Coupled with further expected improvements arising out of invokedynamic in JDK 7, this is one runtime to watch. Though as the metrics show, it still has some ways to go to catch up.
  • The groovy performance using the list reduction is quite nice.
  • It’s surprising to see Clojure roughly similar to jRuby in performance.
  • It is interesting to see multiple runtimes do well: JVM, Erlang BEAM, Javascript V8 and Python PyPy. At the same time it is also apparent that just targeting a particular runtime does not imply strong performance, since a lot still depends on the nature of the bytecode produced. Assuming that code will be fast simply because it runs on, say, the JVM might not necessarily hold.

Full Source code is available on github at https://github.com/dnene/josephus

Finally, thanks to the number of folks I had a chance to preview the post with, and especially to Saager Mhatre for suggesting moving the code from an attached zip file to github.


  • Updated metrics for groovy 1.8.1 (instead of earlier groovy 1.7)
  • Updated code to reflect suggestions by Eric Rozendaal and another almost similar one by Viktor Klang – Viktor’s code was very marginally faster. Leads to a reduction in Scala Element Recursive benchmark from 5.213 to 2.334 microseconds
  • Updated clojure element recursion code as per suggestion by David Nolen. Now time down from 135.36 to 29.170 microseconds
  • Thanks to the persistent questioning by Isaac, upgraded the metrics to jRuby 1.6.3. That turned out to be a very good step. There is a substantial improvement in the performance metrics, which are now updated in the numbers above.
  • Fred Hebert submitted a pull request to turn on native compilation, which in turn required HiPE, which Isaac had suggested earlier. After verifying that Erlang-HiPE is a valid Synaptic target (and thus a different, readily available VM), I built the same and updated the readings
  • Isaac Gouy offered some helpful suggestions in terms of converting the main block also into a function. Also he demonstrated some potential issues in terms of whether the resulting performance was stable. I have made across the board changes now to run all the benchmarks ten times each for a million iterations and used the last 5 readings after visually ensuring that the readings did not vary much
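The repeated-run procedure in the last update (ten runs, keeping the last five readings) can be sketched as below. This is an assumption-laden illustration, not the harness from the repository. The helper names are hypothetical, averaging the kept readings is my assumption since the post does not say how they were combined, and the relative spread is a numeric stand-in for the visual stability check.

```python
import statistics
import time


def time_run(func, iterations):
    """Return the mean elapsed microseconds per call for one run."""
    start = time.perf_counter()
    for _ in range(iterations):
        func()
    return (time.perf_counter() - start) * 1_000_000 / iterations


def stable_reading(func, iterations=1_000_000, runs=10, keep=5):
    """Run `runs` times and summarise only the last `keep` readings."""
    readings = [time_run(func, iterations) for _ in range(runs)]
    kept = readings[-keep:]          # earlier runs serve as warm-up
    mean = statistics.mean(kept)
    # Relative spread: a rough substitute for eyeballing the readings.
    spread = (max(kept) - min(kept)) / mean
    return mean, spread


if __name__ == "__main__":
    # Tiny placeholder workload and iteration count so the sketch runs fast.
    mean, spread = stable_reading(lambda: sum(range(40)), iterations=10_000)
    print("mean microseconds per iteration:", mean)
    print("relative spread of kept readings:", spread)
```

A large spread would suggest the runtime had not reached a steady state, in which case more runs or more iterations per run would be needed before trusting the number.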


From http://blog.dhananjaynene.com/2011/08/cperformance-comparison-languages-styles-and-vms-java-scala-python-erlang-clojure-ruby-groovy-javascript/
