# Some Fun with R Visualization



In my previous post, I finished with a graph showing unstable results. Now let's explore some different ways to present those results. I enjoy working with R, and though I'm not even close to being proficient in it, I want to share some graphs you can build with R + ggplot2.

The benchmark conditions are the same as in the previous post, except that this time there are results for the 4-table and 16-table cases, running MySQL 5.5.20.

Let me remind you how I take measurements: I run each benchmark for 1 hour, with a measurement every 10 seconds.

So we have 360 points (metric samples) per run.
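The 360 figure follows directly from the sampling scheme. A minimal sketch of that arithmetic in base R; the variable names are hypothetical, and it assumes the first sample is taken at 10 s and the last at 3600 s:

```r
# Sampling scheme from the post: 1-hour run, one measurement every 10 seconds
run_length_sec  <- 60 * 60
sample_interval <- 10

# Assumption: samples are taken at 10, 20, ..., 3600 seconds
sec <- seq(sample_interval, run_length_sec, by = sample_interval)

n_points <- length(sec)
print(n_points)  # prints [1] 360
```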

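If we draw all of them as points, we get a scatter plot. Here is my R code for it:

```r
library(ggplot2)

m <- ggplot(dv.ver, aes(x = sec, y = throughput, color = factor(tables)))  # one color per table count
m + geom_point()  # one point per 10-second sample
```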

The previous graph is hard to read, so we can add lines to see a trend.

```r
m + geom_point() + geom_line()  # connect consecutive samples of each case to show the trend
```

This looks better, but you may still have a hard time answering: which case shows better throughput? Which number should we take as the final result?

A jitter graph may help:

```r
m <- ggplot(dv.ver, aes(x = factor(tables), y = throughput, color = factor(tables)))
m + geom_jitter(alpha = 0.75)  # semi-transparent points, spread sideways within each case
```

With jitter we can see some dense areas, which show the "most likely" throughput.
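Jittering itself is simple: each point keeps its throughput value but gets a small random horizontal offset, so points that would sit on top of each other spread out and dense regions become visible. A rough base R sketch of the idea; `jitter_x` is a hypothetical helper, and the ±0.4 default mirrors geom_jitter()'s default width of 40% of the data resolution on a discrete axis:

```r
# Hypothetical helper: map each category to an integer position (1, 2, ...)
# and add uniform random noise in [-width, width] to spread the points sideways.
# The y values (throughput) are not touched here.
jitter_x <- function(category, width = 0.4) {
  x <- as.numeric(factor(category))
  x + stats::runif(length(x), -width, width)
}
```

For example, `plot(jitter_x(dv.ver$tables), dv.ver$throughput)` gives a rough base-graphics equivalent. Note that geom_jitter() also adds a little vertical noise by default; passing `height = 0` keeps the throughput values exact.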

So let's build density graphs:

```r
m <- ggplot(dd, aes(x = throughput, fill = factor(tables)))
m + geom_density(alpha = 0.7)  # overlapping, semi-transparent density curves
```

Or, with one panel per case:

```r
m + geom_density(alpha = 0.7) + facet_wrap(~tables, ncol = 1)  # panels stacked in one column
```

In these graphs, the x axis is throughput and the y axis is the density of hitting a given throughput.

That gives an idea of how to compare both results, and shows that the highest density is around 3600-3800 tps.
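To turn the density curves into a single number per case, one option is to read off the peak of the kernel density estimate. A minimal base R sketch, assuming a data frame shaped like `dd` (columns `throughput` and `tables`); `density_peak` and `peaks_by_tables` are hypothetical helper names, not part of the author's script:

```r
# Hypothetical helper: the x value where the kernel density estimate of x
# reaches its maximum (Gaussian kernel, default nrd0 bandwidth).
density_peak <- function(x) {
  d <- stats::density(x)
  d$x[which.max(d$y)]
}

# Hypothetical helper: one density peak per table count.
# Assumes columns `throughput` and `tables`, as used in the plots above.
peaks_by_tables <- function(dd) {
  sapply(split(dd$throughput, dd$tables), density_peak)
}
```

`peaks_by_tables(dd)` returns a named vector with one peak throughput per table count. Because geom_density() also defaults to a Gaussian kernel with the nrd0 bandwidth, the result should land close to the peaks of the plotted curves, although the evaluation grid differs slightly.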

As we move toward numbers, we can build boxplots:

```r
m <- ggplot(dd, aes(x = factor(tables), y = throughput, fill = factor(tables)))
m + geom_boxplot()  # one box per table count
```

That may not be easy to read if you have never seen boxplots; there is good reading available on this way to represent data. In short: the middle line inside a box is the median (the line that divides the top 50% of results from the bottom 50%), the line at the top of a box is the 75% quantile (it divides the bottom 75% of results from the top 25%), and correspondingly the line at the bottom of a box is the 25% quantile. You can decide which measurement you want to use to compare the results: the median, the 75% quantile, and so on.
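The box edges and the middle line can also be computed directly, which helps when you want a number rather than a picture. A sketch under the same assumption about `dd` (columns `throughput` and `tables`); `throughput_quartiles` is a hypothetical helper:

```r
# Hypothetical helper: the 25% quantile, median and 75% quantile of
# throughput for each table count, i.e. the three lines of each box.
throughput_quartiles <- function(dd) {
  by_tables <- split(dd$throughput, dd$tables)
  # sapply gives one column per case; transpose to one row per case
  t(sapply(by_tables, stats::quantile, probs = c(0.25, 0.5, 0.75)))
}
```

`throughput_quartiles(dd)` returns a matrix with one row per table count and columns `25%`, `50%` and `75%`. These should match the boxes drawn by geom_boxplot(), which, as far as I know, uses quantile() with its default method; base R's boxplot() uses Tukey's hinges instead and can differ slightly.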

Finally, we can combine jitter and boxplot in one graph:

```r
m <- ggplot(dd, aes(x = factor(tables), y = throughput, color = factor(tables)))
m + geom_boxplot() + geom_jitter()  # raw samples drawn over the boxes
```

That's it for today.

The full script, sysbench-4-16.r, together with the data, is available on benchmarks Launchpad.

If you want to see more visualization ideas, check out Brendan's blog:

- http://dtrace.org/blogs/brendan/2011/12/18/visualizing-device-utilization/
- http://dtrace.org/blogs/brendan/2012/02/06/visualizing-process-snapshots/
- http://dtrace.org/blogs/brendan/2012/02/12/visualizing-process-execution/

And if you're wondering what to do with such unstable results in MySQL, stay tuned: there is a solution.

Published at DZone with permission of Peter Zaitsev, DZone MVB.
