The Truth About Traditional JavaScript Benchmarks (Part 1 - Introduction)

Hate it or love it, JavaScript is at the core of the web today, and it continues to spread. So, how do you measure the performance of JS? Benedikt Meurer, tech lead of JavaScript Execution Optimization at Google, says that traditional JS benchmarks are not the answer.


It is probably fair to say that JavaScript is the most important technology these days when it comes to software engineering. To many of us who have been into programming languages, compilers, and virtual machines for some time, this still comes as a bit of a surprise, as JavaScript is neither very elegant from the language designer’s point of view, nor very optimizable from the compiler engineer’s point of view, nor does it have a great standard library. Depending on who you talk to, you can enumerate shortcomings of JavaScript for weeks and still find another odd thing you didn’t know about. Despite what seem to be obvious obstacles, JavaScript is not only at the core of the web today, but is also becoming the dominant technology on the server/cloud side (via Node.js), and is even finding its way into the IoT space.

That raises the question: why is JavaScript so popular/successful? There is no one great answer to this that I’m aware of. There are many good reasons to use JavaScript today, probably most importantly the great ecosystem that was built around it and the huge amount of resources available out there. But all of this is actually a consequence, to some extent. Why did JavaScript become popular in the first place? Well, it was the lingua franca of the web for ages, you might say. But that was the case for a long time, and people hated JavaScript with a passion. Looking back, it seems the first JavaScript popularity boosts happened in the second half of the last decade. Unsurprisingly, this was the time when JavaScript engines accomplished huge speed-ups on various workloads, which probably changed the way many people looked at JavaScript.

Back in the day, these speed-ups were measured with what are now called traditional JavaScript benchmarks, starting with Apple’s SunSpider benchmark, the mother of all JavaScript micro-benchmarks, followed by Mozilla’s Kraken benchmark and Google’s V8 benchmark. Later, the V8 benchmark was superseded by the Octane benchmark, and Apple released its new JetStream benchmark. These traditional JavaScript benchmarks drove amazing efforts to bring a level of performance to JavaScript that no one would have expected at the beginning of the century. Speed-ups of up to a factor of 1000 were reported, and all of a sudden using <script> within a website was no longer a dance with the devil, and doing work client-side was not only possible, but even encouraged.

Measuring performance: a simplified history of benchmarking JS. Source: Advanced JS Performance with V8 and Web Assembly, Chrome Developer Summit 2016, @s3ththompson.

Now in 2016, all (relevant) JavaScript engines have reached an incredible level of performance, and web apps are as snappy as native apps (or at least can be as snappy as native apps). The engines ship with sophisticated optimizing compilers that generate short sequences of highly optimized machine code by speculating on the types/shapes that hit certain operations (e.g., property access, binary operations, comparisons, calls), based on feedback collected about types/shapes seen in the past. Most of these optimizations were driven by micro-benchmarks like SunSpider or Kraken, and static test suites like Octane or JetStream. Thanks to JavaScript-based technologies like asm.js and Emscripten, it is even possible to compile large C++ applications to JavaScript and run them in your web browser without having to download or install anything. For example, you can play AngryBots on the web out of the box, whereas in the past gaming on the web required special plugins like Adobe Flash or Chrome’s PNaCl.
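To make the idea of speculating on shapes a little more concrete, here is a toy TypeScript model of a single property-load site that remembers the last shape it saw and takes a fast path when the next object has the same shape. It is a sketch under heavy assumptions: `Shape`, `Obj`, and `PropertyLoadSite` are hypothetical names, objects are modeled explicitly as a shape plus a slot array, and real engines implement this feedback (and the optimized machine code that relies on it) inside the VM with much more machinery, such as polymorphic caches and deoptimization.

```typescript
// Toy model of a monomorphic inline cache for a property load such as `obj.x`.
// Hypothetical: real engines keep shapes (hidden classes) internal to the VM.

interface Shape {
  readonly id: number;
  // Maps a property name to its slot index in objects of this shape.
  readonly offsets: ReadonlyMap<string, number>;
}

interface Obj {
  readonly shape: Shape;
  readonly slots: readonly unknown[];
}

class PropertyLoadSite {
  private cachedShape: Shape | null = null;
  private cachedOffset = -1;
  hits = 0;
  misses = 0;

  constructor(private readonly key: string) {}

  load(obj: Obj): unknown {
    // Fast path: same shape as last time, so reuse the cached slot index
    // without looking up the property name.
    if (obj.shape === this.cachedShape) {
      this.hits++;
      return obj.slots[this.cachedOffset];
    }
    // Slow path: full lookup, then record feedback about the shape seen.
    this.misses++;
    const offset = obj.shape.offsets.get(this.key);
    if (offset === undefined) return undefined;
    this.cachedShape = obj.shape;
    this.cachedOffset = offset;
    return obj.slots[offset];
  }
}

// Usage: two objects with the same shape, then one with a different layout.
const pointXY: Shape = { id: 1, offsets: new Map([["x", 0], ["y", 1]]) };
const pointYX: Shape = { id: 2, offsets: new Map([["y", 0], ["x", 1]]) };

const site = new PropertyLoadSite("x");
site.load({ shape: pointXY, slots: [3, 4] }); // miss: first time this site runs
site.load({ shape: pointXY, slots: [5, 6] }); // hit: same shape as before
site.load({ shape: pointYX, slots: [7, 8] }); // miss: new shape replaces the cache
console.log(site.hits, site.misses); // 1 2
```

A monomorphic site like this stays on the fast path only as long as the same shape keeps arriving; every new shape sends it back to the slow lookup.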

The vast majority of these accomplishments were due to the presence of these micro-benchmarks and static performance test suites, and the vital competition that resulted from having these traditional JavaScript benchmarks. You can say what you want about SunSpider, but it’s clear that without SunSpider, JavaScript performance would likely not be where it is today. Okay, so much for the praise… now on to the flip side of the coin: any kind of static performance test, be it a micro-benchmark or a large application macro-benchmark, is doomed to become irrelevant over time! Why? Because the benchmark can only teach you so much before you start gaming it. Once you get above (or below) a certain threshold, the general applicability of optimizations that benefit a particular benchmark will decrease exponentially. For example, we built Octane as a proxy for the performance of real-world web applications, and it probably did a fairly good job at that for quite some time, but nowadays the distribution of time in Octane vs. the real world is quite different, so optimizing for Octane beyond where it is currently is likely not going to yield any significant improvements in the real world (neither general web nor Node.js workloads).
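For readers who haven’t looked inside one, the core of a static micro-benchmark is small: a fixed kernel run in a loop and timed. The sketch below is hypothetical (`runMicroBenchmark` and `sumOfSquares` are illustrative names, not code from SunSpider, Kraken, or any other suite), and it uses `Date.now()` for portability, which only has millisecond resolution. Its structure hints at why such tests reward a narrow set of optimizations: after warm-up, the same few lines run over and over, so the score mostly reflects how fast the engine executes that one optimized loop.

```typescript
// Hypothetical minimal micro-benchmark harness, for illustration only.
interface BenchResult {
  name: string;
  iterations: number;
  msPerIteration: number;
  checksum: number;
}

function runMicroBenchmark(
  name: string,
  kernel: (i: number) => number,
  iterations: number,
  warmup = 1000,
): BenchResult {
  if (!Number.isInteger(iterations) || iterations <= 0) {
    throw new RangeError("iterations must be a positive integer");
  }
  let checksum = 0;
  // Warm-up runs give the engine a chance to collect feedback and optimize
  // the kernel before timing starts.
  for (let i = 0; i < warmup; i++) checksum += kernel(i);

  const start = Date.now();
  for (let i = 0; i < iterations; i++) checksum += kernel(i);
  const elapsedMs = Date.now() - start;

  // Returning the checksum keeps every result observable, which makes it
  // harder for an engine to discard the kernel's work as dead code.
  return { name, iterations, msPerIteration: elapsedMs / iterations, checksum };
}

// Example kernel: a tiny arithmetic loop of the kind micro-benchmarks favor.
function sumOfSquares(i: number): number {
  let s = 0;
  for (let k = 0; k <= i % 100; k++) s += k * k;
  return s;
}

const result = runMicroBenchmark("sum-of-squares", sumOfSquares, 100000);
console.log(`${result.name}: ${result.msPerIteration.toFixed(6)} ms/iteration`);
```

Note that the timed region never includes parsing the script that defines the kernel, a cost that a harness of this kind simply leaves out.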

Distribution of time in benchmarks vs. the real world. Source: Real-World JavaScript Performance, BlinkOn 6 conference, @tverwaes.

Since it became more and more obvious that all the traditional benchmarks for measuring JavaScript performance, including the most recent versions of JetStream and Octane, might have outlived their usefulness, we started investigating new ways to measure real-world performance at the beginning of the year, and added a lot of new profiling and tracing hooks to V8 and Chrome. In particular, we added mechanisms to see exactly where we spend time when browsing the web, i.e., whether it’s script execution, garbage collection, compilation, etc., and the results of these investigations were highly interesting and surprising. As you can see from the slide above, running Octane spends more than 70% of the time executing JavaScript and collecting garbage, while browsing the web you always spend less than 30% of the time actually executing JavaScript, and never more than 5% collecting garbage. Instead, a significant amount of time goes to parsing and compiling, which is not reflected in Octane. So, spending a lot of time optimizing JavaScript execution will boost your score on Octane, but won’t have any positive impact on loading. In fact, spending more time on optimizing JavaScript execution might even hurt your real-world performance, since the compiler takes more time, or you need to track additional feedback, thus eventually adding more time to the compile, IC, and runtime buckets.
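One back-of-the-envelope way to see why execution-focused work pays off so differently is Amdahl’s law (an added framing, not a formula from the slides): if a fraction f of total time is sped up by a factor s, the overall speedup is 1 / ((1 - f) + f / s). The sketch plugs in the rough fractions quoted above, 0.7 for Octane (which actually covers execution plus garbage collection) and 0.3 for browsing (an upper bound on execution time), so the results are optimistic upper bounds that also ignore any extra compile, IC, or runtime cost.

```typescript
// Amdahl's law: speeding up a fraction `f` of total time by a factor `s`
// yields an overall speedup of 1 / ((1 - f) + f / s).
function overallSpeedup(f: number, s: number): number {
  if (f < 0 || f > 1) throw new RangeError("f must be in [0, 1]");
  if (!(s > 0)) throw new RangeError("s must be positive");
  return 1 / ((1 - f) + f / s);
}

// Rough fractions of time spent executing JavaScript, taken from the text.
const workloads: Array<[string, number]> = [
  ["Octane-like", 0.7],
  ["web-browsing-like", 0.3],
];

for (const [label, f] of workloads) {
  for (const s of [2, 10, Infinity]) {
    // Infinity models making JavaScript execution free (f / s becomes 0).
    console.log(`${label}: execution ${s}x faster -> ${overallSpeedup(f, s).toFixed(2)}x overall`);
  }
}
```

Even infinitely fast execution caps the browsing-like case at about 1.43x (1 / 0.7), versus about 3.33x (1 / 0.3) for the Octane-like case, which mirrors the point that execution-focused work moves Octane scores far more than real-world loading.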


There’s another set of benchmarks that tries to measure overall browser performance, including JavaScript and DOM performance, the most recent addition being the Speedometer benchmark. The benchmark tries to capture real-world performance more faithfully by running a simple TodoMVC application implemented with different popular web frameworks (it’s a bit outdated now, but a new version is in the works). The various tests are included in the slide above next to Octane (Angular, Ember, React, Vanilla, Flight, and Backbone), and as you can see, these seem to be a better proxy for real-world performance at this point in time. Note, however, that this data is already six months old at the time of this writing, and things might have changed as we optimized more real-world patterns (for example, we are refactoring the IC system to reduce overhead significantly, and the parser is being redesigned). Also note that while this looks like it’s only relevant in the browser space, we have very strong evidence that traditional peak performance benchmarks are also not a good proxy for real-world Node.js application performance.

Speedometer vs. Octane. Source: Real-World JavaScript Performance, BlinkOn 6 conference, @tverwaes.

All of this is probably already known to a wider audience, so I’ll use the coming posts to highlight a few concrete examples of why I think it’s not only useful, but crucial for the health of the JavaScript community to stop paying attention to static peak performance benchmarks above a certain threshold. Come back to this series tomorrow, and let me walk you through a couple of examples of how JavaScript engines can and do game benchmarks.

Stay tuned for part two!

Published at DZone with permission of Benedikt Meurer, DZone MVB. See the original article here.
