The Notorious SunSpider Examples
Consider the bitops-bitwise-and.js performance test: bitwiseAndValue will be 0 after the first loop iteration and will remain 0 for every subsequent iteration. The preconditions are that bitwiseAndValue is either a regular property of the global object or not present before you execute the script, that there is no interceptor on the global object or its prototypes, and so on. But if you really want to win this benchmark, and you are willing to go all in, then you can execute this test in less than 1ms. However, this optimization is limited to this special case, and slight modifications of the test would probably no longer trigger it.
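For context, the entire test boils down to a loop of roughly this shape (reconstructed from memory of the SunSpider source, so treat the exact constants as approximate):

```javascript
// bitops-bitwise-and.js, approximately: a global property is repeatedly
// bitwise-and'ed with the loop counter. In the original there is no `var`,
// so bitwiseAndValue becomes a property of the global object, which is what
// the precondition above is about.
var bitwiseAndValue = 4294967296; // 2^32; ToInt32 turns this into 0
for (var i = 0; i < 600000; i++)
  bitwiseAndValue = bitwiseAndValue & i;
// After the first iteration (4294967296 & 0 === 0), the value is 0 and
// stays 0, so an engine that proves this can skip the loop entirely.
```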
Ok, so that bitops-bitwise-and.js test was definitely the worst example of a micro-benchmark. Let’s move on to something more real-world-ish in SunSpider, the string-tagcloud.js test, which essentially runs a very early version of the json.js polyfill. The test arguably looks a lot more reasonable than the bitwise-and test, but looking at the profile of the benchmark immediately reveals that a lot of time is spent on a single eval expression (up to 20% of the overall execution time for parsing and compiling, plus up to 10% for actually executing the compiled code).
Looking closer reveals that this eval is executed exactly once, and is passed a JSONish string that contains an array of 2501 objects.
Obviously, parsing these object literals, generating native code for them, and then executing that code comes at a high cost. It would be a lot cheaper to just parse the input string as JSON and generate an appropriate object graph. So, one trick to speed up this benchmark is to special-case eval: always try to interpret the data as JSON first, and only fall back to a real parse-compile-execute cycle if the attempt to read it as JSON fails (some additional magic is required to skip the parentheses, though). Back in 2007, this wouldn’t even have been a bad hack, since there was no JSON.parse yet.
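The trick can be sketched in plain JavaScript (the function name and the parenthesis handling here are my own illustration, not V8’s actual implementation):

```javascript
// Hypothetical sketch: try to treat the eval'd source as pure JSON data
// first, and only fall back to a real parse-compile-execute cycle if that
// fails.
function evalAsJsonFirst(source) {
  // json.js-era code passed eval a payload wrapped in parentheses,
  // i.e. eval('(' + text + ')'), so strip one pair before trying JSON.
  const trimmed = String(source).trim();
  const unwrapped = trimmed.startsWith("(") && trimmed.endsWith(")")
    ? trimmed.slice(1, -1)
    : trimmed;
  try {
    return JSON.parse(unwrapped); // fast path: build the object graph directly
  } catch (e) {
    return eval(source); // slow path: it really was code after all
  }
}
```

With this, `evalAsJsonFirst('({"a": 1})')` builds the object graph without any code generation, while `evalAsJsonFirst("1 + 2")` still falls back to a real eval.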
This hack yields an immediate performance boost, dropping runtime from 36ms to 26ms for V8 LKGR as of today, a 30% improvement!
This is a common problem with static benchmarks and performance test suites. Today, no one would seriously use eval to parse JSON data (for obvious security reasons, not just the performance issues); code written in the last five years sticks to JSON.parse instead. In fact, using eval to parse JSON would probably be considered a bug in production code today! So the engine writers’ effort of focusing on the performance of newly written code is not reflected in this ancient benchmark; instead, it would be beneficial to make eval unnecessarily complex in order to win on string-tagcloud.js.
Ok, so let’s look at yet another example: 3d-cube.js. This benchmark does a lot of matrix operations, where even the smartest compiler can’t do much and just has to execute them. Essentially, the benchmark spends a lot of time executing the Loop function and the functions called by it.
The sine and cosine computations here always receive the same constant input, 0.05235987755982989, so the results are always the same, obviously. So, one thing you could do here to avoid recomputing the same sine and cosine values all the time is to cache the previously computed values, and in fact, that’s what V8 used to do in the past, and other engines like SpiderMonkey still do. We removed the so-called transcendental cache from V8 because its overhead was noticeable in actual workloads, where you don’t always compute the same values in a row, which is unsurprisingly very common in the wild. We took serious hits on the SunSpider benchmark when we removed these benchmark-specific optimizations back in 2013 and 2014, but we firmly believe that it doesn’t make sense to optimize for a benchmark while at the same time penalizing real-world use cases in such a way.
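The idea of a transcendental cache can be sketched in a few lines of JavaScript (a toy model of the concept, not V8’s actual C++ implementation):

```javascript
// Toy model of a transcendental cache: remember the last input/result pair
// for Math.sin, so repeated calls with the same angle skip the computation.
const lastSin = { input: NaN, result: NaN }; // NaN never compares equal, so the first call misses
function cachedSin(x) {
  if (x === lastSin.input) return lastSin.result; // cache hit: no recomputation
  lastSin.input = x;
  lastSin.result = Math.sin(x); // cache miss: compute and remember
  return lastSin.result;
}
// The catch: in real workloads the inputs rarely repeat, so every call pays
// for the comparison and bookkeeping without ever hitting the cache.
```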
Obviously, a better way to deal with the constant sine/cosine inputs is a sane inlining heuristic that balances the costs and benefits of inlining and takes into account factors like preferring to inline at call sites where constant folding can be beneficial, as in the case of the RotateZ call sites. This was not really possible with the Crankshaft compiler for various reasons, but with Ignition and TurboFan it becomes a sensible option, and we are already working on better inlining heuristics.
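To illustrate why inlining matters here, consider a simplified rotation function (my own sketch, not the actual 3d-cube.js code): once the call is inlined at a site where the angle is a compile-time constant, the Math.sin and Math.cos calls apply to a constant and can be folded away.

```javascript
// Simplified RotateZ-style rotation around the z axis (illustrative only).
function rotateZ(p, angle) {
  const s = Math.sin(angle);
  const c = Math.cos(angle);
  return { x: c * p.x - s * p.y, y: s * p.x + c * p.y, z: p.z };
}
// At this call site the angle is a constant; after inlining, an optimizing
// compiler sees sin/cos applied to 0.05235987755982989 and can replace both
// trig calls with precomputed numbers.
const q = rotateZ({ x: 1, y: 0, z: 0 }, 0.05235987755982989);
```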
Garbage Collection Considered Harmful
Besides these very test-specific issues, there’s another fundamental problem with the SunSpider benchmark: the overall execution time. V8 on decent Intel hardware currently runs the whole benchmark in roughly 200ms (with the default configuration). A minor GC can take anything between 1ms and 25ms (depending on the number of live objects in new space and on old-space fragmentation), while a major GC pause can easily take 30ms (not even taking into account the overhead from incremental marking); that’s more than 10% of the overall execution time of the whole SunSpider suite! So any engine that doesn’t want to risk a 10-20% slowdown due to a GC cycle has to somehow ensure that it doesn’t trigger GC while running SunSpider.
There are different tricks to accomplish this, none of which has any positive impact in the real world as far as I can tell. V8 uses a rather simple one: since every SunSpider test is run in a new <iframe>, which corresponds to a new native context in V8 speak, we just detect rapid <iframe> creation and disposal (all SunSpider tests take less than 50ms each), and in that case perform a garbage collection between the disposal and the creation, to ensure that we never trigger a GC while actually running a test. This trick works pretty well, and in 99.9% of cases doesn’t clash with real uses; every now and then, however, it can hit you hard: if for whatever reason you do something that makes your application look like the SunSpider test driver to V8, you get forced GCs, and that can have a negative effect on your application. So the rule of thumb is: don’t let your application look like SunSpider!
I could go on with more SunSpider examples here, but I don’t think that’d be very useful. By now it should be clear that optimizing further for SunSpider above the threshold of good performance will not reflect any benefits in the real world. In fact, the world would probably benefit a lot from not having SunSpider anymore, as engines could drop weird hacks that are only useful for SunSpider and can even hurt real-world use cases. Unfortunately, SunSpider is still being used heavily by the (tech) press to compare what they think is browser performance, or even worse, to compare phones! So there’s a certain natural interest from phone makers, and from Android in general, to have Chrome look somewhat decent on SunSpider (and other nowadays meaningless benchmarks, FWIW). The phone makers generate money by selling phones, so getting good reviews is crucial for the success of the phone division or even the whole company, and some of them even went as far as shipping old versions of V8 in their phones that had a higher score on SunSpider, exposing their users to all kinds of unpatched security holes that had long been fixed, and shielding their users from any real-world performance benefits that come with more recent V8 versions! (Source: Galaxy S7 and S7 Edge review: Samsung's finest get more polished, www.engadget.com.)
I always loved this in Myles Borins’ talks, so I had to shamelessly steal his idea. Anyway, hope that refreshed your palate. And now that we've recovered from the SunSpider rant, we can all get back to whatever it is we were doing. Tomorrow, we'll continue looking into the other classic benchmarks…
Stay tuned for part three. Here's part one in case you missed it.