Stacked Area Charts and Mathematical Approximations

A review of the usefulness of communicating data via stacked area charts.

I've previously noted that I think stacked area charts are frequently used when a conventional line chart would be a better option. Here is the (fictional) example I used previously and the conventional line chart alternative.

In short, if you want people to be able to make reasonably accurate judgments of the magnitudes of the individual components, and how they change depending on some other variable (such as time), the conventional line chart design is almost always going to be the best option. The lack of a steady baseline for all but the bottom component makes this task difficult for the stacked area chart.
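To make the baseline point concrete, here is a minimal sketch using made-up numbers (the component names `A`, `B`, `C` and their values are purely illustrative and are not the data from my earlier example). It works out the baseline that each layer of a stack would sit on.

```python
# Hypothetical component values at four time points (illustrative only).
components = {
    "A": [2, 3, 5, 4],
    "B": [3, 3, 3, 3],  # constant over time
    "C": [1, 2, 1, 3],
}

def stack_baselines(series):
    """Return, for each layer, the baseline it sits on in a stacked chart."""
    baselines = {}
    running = [0] * len(next(iter(series.values())))
    for name, values in series.items():
        baselines[name] = list(running)  # this layer starts where the stack so far ends
        running = [r + v for r, v in zip(running, values)]
    return baselines

baselines = stack_baselines(components)
for name in components:
    print(name, "baseline:", baselines[name])
```

Only the bottom layer gets a flat baseline of zeros. Component B never changes, yet in a stack its band would ride up and down on top of A, which is exactly what makes its magnitude hard to judge.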

Stacked area charts can be useful if you want to illustrate an ordered sum of components that change with another variable. Previously I suggested that a suitable example might be how the cost of milk production, from farm to shop, changes with time. Here I'd like to consider something very different: selected mathematical series.

You're probably familiar with trigonometric functions like sine and cosine and you may also know about the exponential function and hyperbolic functions. It's fairly easy to draw graphs of these functions if you have a calculator of some sort. When tied up in complicated equations, these functions may become awkward to deal with. Consequently, alternative ways of approximating these functions can come in very handy.

The functions mentioned above are all analytic functions. What this really means is quite complicated to explain, so I won't try to do so here. Instead, I'll just stick to the following: these functions can all be written as a sum of powers of their argument (typically denoted x), that is, as a polynomial with infinitely many terms (a power series). Being explicit helps, so here is a way of rewriting the exponential function:

$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots$$

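In a similar manner, here is another way of expressing the cosine function:

$$\cos x = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!} = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots$$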

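And here is the hyperbolic cosine function (typically written as cosh):

$$\cosh x = \sum_{n=0}^{\infty} \frac{x^{2n}}{(2n)!} = 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \frac{x^6}{6!} + \cdots$$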

In general, to get an exact value for one of these functions using summation, we need to sum to infinity. This is not the case at the origin, where all but the first term equal 0. Close to the origin, a few terms also give a good approximation, because higher powers of a small x shrink rapidly. But how close, and how good? We can plot the first few terms of, for example, the exponential function expression and see. The black line in the GIF below shows the exact exponential function. The blue wedges show the result of adding more and more terms from the right-hand side of the equation for the exponential function above (from the zeroth power of x up to the 8th). The translucent red wedge indicates the area not covered by the polynomial approximation.
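For readers who would rather check numbers than read them off a chart, here is a minimal sketch of the same idea in Python. The function name `exp_partial_sum` and the sample point x = 0.5 are my own choices for illustration.

```python
import math

def exp_partial_sum(x, max_power):
    """Sum the exponential series from the zeroth power of x up to max_power."""
    return sum(x**n / math.factorial(n) for n in range(max_power + 1))

x = 0.5
for max_power in (0, 1, 2, 4, 8):
    approx = exp_partial_sum(x, max_power)
    print(f"up to x^{max_power}: {approx:.6f}  (exact: {math.exp(x):.6f})")
```

Each successive partial sum corresponds to the top edge of one more wedge in the stack.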

Below about x=1 we can see that the first three terms of the polynomial are a pretty good approximation for the exponential function. To get a good approximation around x=3 we need to go up to the sixth or seventh power of x (i.e. seven or eight terms of the polynomial). As the GIF below shows, even going to the eighth power of x isn't sufficient around x=6.
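These visual judgments can be backed up with a rough numerical check. The sketch below (the helper names are my own, and "good" is left for you to define) prints the relative error of the truncated series at the points discussed above.

```python
import math

def exp_partial_sum(x, max_power):
    """Sum the exponential series from the zeroth power of x up to max_power."""
    return sum(x**n / math.factorial(n) for n in range(max_power + 1))

def relative_error(x, max_power):
    """Relative shortfall of the truncated series compared with math.exp."""
    exact = math.exp(x)
    return abs(exact - exp_partial_sum(x, max_power)) / exact

for x, max_power in [(1, 2), (3, 6), (3, 7), (6, 8)]:
    err = relative_error(x, max_power)
    print(f"x={x}, up to x^{max_power}: relative error {err:.1%}")
```

The errors come out at roughly 8% for three terms at x=1, about 3% and 1% for the sixth and seventh powers at x=3, and about 15% for the eighth power at x=6.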

We can look at the hyperbolic cosine function in a similar way, though there are no terms with odd powers of x.

As you might expect, when we look at large distances from the origin, we need more and more terms of the polynomial in order to closely match the exact function. At x=±6, adding up terms up to the 8th power of x is not sufficient to get a good approximation.
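The same kind of check works for the hyperbolic cosine. This is a minimal sketch with a function name of my own choosing. It sums only the even powers of x, so the result is identical for x and -x.

```python
import math

def cosh_partial_sum(x, max_power):
    """Sum the even-power terms x^n / n! for n = 0, 2, 4, ..., max_power."""
    return sum(x**n / math.factorial(n) for n in range(0, max_power + 1, 2))

for x in (-6, -3, 0, 3, 6):
    approx = cosh_partial_sum(x, 8)
    exact = math.cosh(x)
    print(f"x={x:+d}: partial sum {approx:.3f}, exact {exact:.3f}")
```

At x=±3 the partial sum up to the 8th power is very close to the exact value, while at x=±6 it falls short by roughly a tenth.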

I think these are cases where stacked area charts can be of real use. We're genuinely interested in the progressive sums of components rather than the individual parts, and that's where stacked charts excel.

You probably noticed that I skipped over producing charts for the cosine function. That's because stacked charts fail. Why? Because successive terms have opposite signs. While including more and more terms in the polynomial approximation does get you closer and closer to the exact function, you can't show this as a simple stack because some terms add to the total while others subtract. This is also a problem for the exponential function when x is negative: terms involving even powers of x will be positive while those involving odd powers of x will be negative. This is a purely visual issue that doesn't crop up when we plot lines instead of stacks.
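A quick way to see the sign problem is to print the individual cosine terms alongside the running total. This is a sketch with names and a sample point (x = 2) picked by me for illustration.

```python
import math

def cos_terms(x, max_power):
    """Individual terms (-1)^k x^(2k) / (2k)! of the cosine series."""
    return [(-1)**k * x**(2 * k) / math.factorial(2 * k)
            for k in range(max_power // 2 + 1)]

x = 2.0
terms = cos_terms(x, 8)
running = 0.0
for k, term in enumerate(terms):
    running += term
    print(f"x^{2 * k} term: {term:+.4f}  running total: {running:+.4f}")
print(f"exact cos({x}) = {math.cos(x):+.4f}")
```

The running total homes in on the exact value, but it does so by overshooting and then coming back. In a stack, each new layer would have to be drawn over the previous one rather than on top of it.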

Hopefully, I've shown that stacked area charts can be useful when it is the ordered sums of components that are of interest, provided the conditions are right. For the conditions to be right, all components of the stack must share the same sign (or be 0) at each (visible) point along the horizontal axis.
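That condition is easy to test before reaching for a stacked chart. The sketch below is one possible check. The helpers `can_stack` and `series_terms` are hypothetical names of my own, not part of any charting library.

```python
import math

def can_stack(components):
    """True if, at every x position, all component values are >= 0 or all are <= 0."""
    for values_at_x in zip(*components):
        all_non_negative = all(v >= 0 for v in values_at_x)
        all_non_positive = all(v <= 0 for v in values_at_x)
        if not (all_non_negative or all_non_positive):
            return False
    return True

def series_terms(xs, powers, signs=None):
    """One list per term: sign * x^n / n! evaluated at every x in xs."""
    signs = signs or [1] * len(powers)
    return [[s * x**n / math.factorial(n) for x in xs]
            for n, s in zip(powers, signs)]

positive_xs = [i / 2 for i in range(13)]   # 0.0 to 6.0
all_xs = [i / 2 for i in range(-12, 13)]   # -6.0 to 6.0

exp_pos = series_terms(positive_xs, range(9))
exp_all = series_terms(all_xs, range(9))
cosh_all = series_terms(all_xs, range(0, 9, 2))
cos_all = series_terms(all_xs, range(0, 9, 2), signs=[1, -1, 1, -1, 1])

for label, comps in [("exp, x >= 0", exp_pos), ("exp, all x", exp_all),
                     ("cosh, all x", cosh_all), ("cos, all x", cos_all)]:
    print(f"{label}: stackable = {can_stack(comps)}")
```

The exponential terms pass for non-negative x and the hyperbolic cosine terms pass everywhere, while the cosine terms and the exponential terms at negative x fail, matching the cases discussed above.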

Published at DZone with permission of Josh Anderson, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
