# Some Fun with R Visualization


Originally posted by

In my previous post, I finished with a graph showing unstable results. Now let's explore some different ways to present those results. I enjoy working with R, and though I'm far from proficient in it, I want to share some graphs you can build with R and ggplot2.

The benchmark conditions are the same as in the previous post, except that these results cover the 4-table and 16-table cases running MySQL 5.5.20.

Let me remind you how I take measurements. I run each benchmark for 1 hour, taking a measurement every 10 seconds, so we have 360 data points.
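The sampling arithmetic, and the shape of the data the plotting code expects, can be sketched in base R. The column names `sec`, `Throughput`, and `Tables` are taken from the `aes()` calls in this post; the throughput values are synthetic random numbers rather than benchmark results, and the real `dv.ver` data frame may well contain more columns than this.

```r
# Synthetic stand-in for the benchmark data: values are random, NOT measured.
set.seed(42)

duration_sec <- 60 * 60                   # 1-hour run
interval_sec <- 10                        # one measurement every 10 seconds
n_points <- duration_sec / interval_sec   # 360 measurements per run

# Timestamps of the measurements: 10, 20, ..., 3600
sec <- seq(interval_sec, duration_sec, by = interval_sec)

# One row per measurement, for the 4-table and the 16-table case.
# The means and standard deviations are arbitrary illustration values.
dv.ver <- data.frame(
  sec        = rep(sec, times = 2),
  Tables     = rep(c(4, 16), each = n_points),
  Throughput = c(rnorm(n_points, mean = 3700, sd = 300),
                 rnorm(n_points, mean = 3500, sd = 500))
)

str(dv.ver)
```

With a data frame of this shape, the plotting snippets in this post should run as written.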

If we draw them all, it will look like:

I will also show my R code:

```
library(ggplot2)

# Throughput over time, one color per table count
m <- ggplot(dv.ver,
            aes(x = sec, y = Throughput, color = factor(Tables)))
m + geom_point()
```

The previous graph is hard to read, so we can add lines to make the trend visible.

`m + geom_point() + geom_line()`

This looks better, but you may still have a hard time answering: which case shows the better throughput? What number should we take as the final result?

A jitter graph may help:

```
# One column of jittered points per table count
m <- ggplot(dv.ver,
            aes(x = factor(Tables), y = Throughput, color = factor(Tables)))
m + geom_jitter(alpha = 0.75)
```

With jitter we can see some dense areas, which show the "most likely" throughput.

So let's build density graphs:

```
# Distribution of throughput values, one filled curve per table count
m <- ggplot(dd,
            aes(x = Throughput, fill = factor(Tables)))
m + geom_density(alpha = 0.7)
```

or

`m + geom_density(alpha = 0.7) + facet_wrap(~Tables, ncol = 1)`

In these graphs, the X axis is throughput and the Y axis represents the density of measurements at a given throughput.

That may give you an idea of how to compare the two results, and it shows that the highest density is around 3600-3800 tps.
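To turn "where the density is highest" into a number, one option is to compute a kernel density estimate directly with base R's `density()` and look up its peak. This is only a sketch: the `throughput` vector is synthetic, and the peak location depends on the bandwidth that `density()` picks, so treat the result as a rough estimate rather than an exact figure.

```r
# Synthetic throughput sample: random values, NOT benchmark results.
set.seed(1)
throughput <- rnorm(360, mean = 3700, sd = 200)

# Kernel density estimate (Gaussian kernel and default bandwidth).
d <- density(throughput)

# Throughput value at which the estimated density is highest.
peak <- d$x[which.max(d$y)]
peak
```

Running the same lookup separately for each table count gives one "most likely" throughput per case to compare.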

Moving on to numbers, we can build boxplots:

```
# One box per table count, summarizing the throughput distribution
m <- ggplot(dd,
            aes(x = factor(Tables), y = Throughput, fill = factor(Tables)))
m + geom_boxplot()
```

That may not be easy to read if you have never seen boxplots before. There is good reading available on this way of representing data. In short: the middle line inside the box is the median (the value that divides the top 50% from the bottom 50% of results), the line at the top of the box is the 75% quantile (it separates the bottom 75% from the top 25% of results), and correspondingly the line at the bottom of the box is the 25% quantile (it separates the bottom 25% from the top 75%). You can decide which measure you want to use to compare the results: the median, the 75% quantile, and so on.
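The three lines of each box can also be computed as plain numbers with base R's `quantile()`. The sketch below uses a synthetic `dd` with the `Tables` and `Throughput` columns referenced in this post; the values are random, not benchmark results.

```r
# Synthetic stand-in for dd: random values, NOT benchmark results.
set.seed(7)
dd <- data.frame(
  Tables     = rep(c(4, 16), each = 360),
  Throughput = c(rnorm(360, mean = 3700, sd = 300),
                 rnorm(360, mean = 3500, sd = 500))
)

# 25% quantile, median (50%), and 75% quantile for each table count.
# Rows are the quantiles, columns are the Tables values.
q <- sapply(split(dd$Throughput, dd$Tables),
            quantile, probs = c(0.25, 0.5, 0.75))
q
```

Comparing, say, the `"50%"` row across the columns gives a single-number comparison of the two cases.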

And finally we can combine jitter and boxplot to get:

```
# Boxplot summary with the individual measurements overlaid
m <- ggplot(dd,
            aes(x = factor(Tables), y = Throughput, color = factor(Tables)))
m + geom_boxplot() + geom_jitter()
```
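One thing to watch for when layering the two: `geom_boxplot()` draws outliers as points, and `geom_jitter()` then draws the same measurements again. A possible variation, not taken from the original script, hides the boxplot's outlier points and limits the jitter to the horizontal direction so the throughput values themselves are not shifted. The data here is synthetic.

```r
library(ggplot2)

# Synthetic stand-in for dd: random values, NOT benchmark results.
set.seed(3)
dd <- data.frame(
  Tables     = rep(c(4, 16), each = 360),
  Throughput = c(rnorm(360, mean = 3700, sd = 300),
                 rnorm(360, mean = 3500, sd = 500))
)

p <- ggplot(dd,
            aes(x = factor(Tables), y = Throughput, color = factor(Tables))) +
  # outlier.shape = NA hides the boxplot's own outlier points,
  # since the jitter layer already draws every measurement.
  geom_boxplot(outlier.shape = NA) +
  # height = 0 keeps the y values exact; only x positions are jittered.
  geom_jitter(width = 0.2, height = 0, alpha = 0.5) +
  labs(x = "Tables", color = "Tables")

# Call print(p) (or just type p at the console) to render the plot.
```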

That's it for today.

You can get the full script, sysbench-4-16.R, along with the data, on the benchmarks launchpad.

If you want to see more visualization ideas, you may check out Brendan's blog.

And if you're wondering what to do with such unstable results in MySQL, stay tuned. There is a solution.


Published at DZone with permission of

Opinions expressed by DZone contributors are their own.