Some Fun with R Visualization
In my previous post, I finished with a graph with unstable results. Now let's explore some different ways to present those results. I enjoy working with R, and though I'm not even close to being proficient in it, I want to share some graphs you can build with R + ggplot2.
The conditions of the benchmark are the same as in the previous post, except that there are now results for the 4-table and 16-table cases running MySQL 5.5.20.
Let me remind you how I take measurements. I run each benchmark for 1 hour, with a measurement every 10 seconds,
so we have 360 data points.
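As a quick check of that arithmetic, and to have something to experiment with, here is a minimal sketch that builds a data frame with the columns the plotting code in this post refers to (sec, throughput, tables). The throughput values are randomly generated placeholders, not the benchmark results; the name `sim` and the chosen means and spreads are assumptions made purely for illustration.

```r
# Synthetic stand-in for the benchmark data: NOT the real results.
set.seed(42)

duration_sec <- 60 * 60   # 1 hour run
interval_sec <- 10        # one measurement every 10 seconds
sec <- seq(interval_sec, duration_sec, by = interval_sec)
n_points <- length(sec)   # 3600 / 10 = 360

# One row per measurement and per table count (4 and 16 tables).
# Means and standard deviations are made-up placeholders.
sim <- data.frame(
  sec        = rep(sec, times = 2),
  tables     = rep(c(4, 16), each = n_points),
  throughput = c(rnorm(n_points, mean = 3700, sd = 300),
                 rnorm(n_points, mean = 3500, sd = 500))
)

print(n_points)
print(nrow(sim))
```

A data frame shaped like this could be passed to ggplot() in place of the real data to try out the graphs.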
If we draw them all, it will look like:
I will also show my R code:
library(ggplot2)
m <- ggplot(dv.ver, aes(x = sec, y = throughput, color = factor(tables)))
m + geom_point()
The previous graph is not very representative, so we may add some lines to see a trend.
m + geom_point() + geom_line()
This looks better, but you may still have a hard time answering: which case shows the better throughput? What number should we take as the final result?
A jitter graph may help:
m <- ggplot(dv.ver, aes(x = factor(tables), y = throughput, color = factor(tables)))
m + geom_jitter(alpha = 0.75)
With jitter we see some dense areas, which show the "most likely" throughput.
So let's build density graphs:
m <- ggplot(dd, aes(x = throughput, fill = factor(tables)))
m + geom_density(alpha = 0.7)
or
m+geom_density(alpha = 0.7)+facet_wrap(~tables,ncol=1)
In these graphs the x axis is throughput and the y axis represents the density of hitting a given throughput.
That may give you an idea of how to compare both results, and it shows that the highest density is around 3600-3800 tps.
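If you want the location of the density peak as a number rather than reading it off the graph, one possible approach is to compute a kernel density estimate with base R's density() and take the throughput at which it is highest. This is only a sketch: the sample below is randomly generated, not the benchmark data, and the peak reported by density() can differ slightly from what a plot shows depending on bandwidth settings.

```r
# Synthetic throughput samples: placeholders, NOT the benchmark data.
set.seed(1)
throughput <- rnorm(360, mean = 3700, sd = 300)

# Kernel density estimate from base R (package 'stats').
d <- density(throughput)

# Throughput value at which the estimated density is highest.
peak_tps <- d$x[which.max(d$y)]
print(peak_tps)
```

With real data you would run this once per table count, for example on throughput values subset by tables.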
Moving on to numbers, we can build boxplots:
m <- ggplot(dd, aes(x = factor(tables), y = throughput, fill = factor(tables)))
m + geom_boxplot()
That may not be easy to read if you have never seen boxplots. There's good reading on this way to represent data. In short: the middle line inside a box is the median (the line that divides the top 50% from the bottom 50%), the line at the top of the box is the 75% quantile (it divides the bottom 75% of results from the top 25%), and correspondingly the line at the bottom of the box is the 25% quantile (you should already have an idea of what that means). You may decide which measurement you want to use to compare the results: the median, the 75% quantile, etc.
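To put numbers on those three lines, here is a minimal sketch that computes the 25%, 50% and 75% quantiles per table count with base R. The data frame `sim` and its values are synthetic placeholders made up for illustration, not the benchmark data; with the real data you would presumably substitute the data frame used in the plots.

```r
# Synthetic data: placeholders, NOT the benchmark results.
set.seed(7)
sim <- data.frame(
  tables     = rep(c(4, 16), each = 360),
  throughput = c(rnorm(360, mean = 3700, sd = 300),
                 rnorm(360, mean = 3500, sd = 500))
)

# 25%, 50% (median) and 75% quantiles of throughput for each table count.
# tapply() splits throughput by tables and passes probs on to quantile().
q <- tapply(sim$throughput, sim$tables, quantile, probs = c(0.25, 0.5, 0.75))
print(q)
```

Each element of `q` is a named vector, so `q[["4"]]["50%"]` gives the median for the 4-table case.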
And finally, we can combine jitter and boxplot to get:
m <- ggplot(dd, aes(x = factor(tables), y = throughput, color = factor(tables)))
m + geom_boxplot() + geom_jitter()
That's it for today.
You can get the full script, sysbench-4-16.r, along with the data, on benchmarks launchpad.
If you want to see more visualization ideas, you may check out Brendan's blog:
- http://dtrace.org/blogs/brendan/2011/12/18/visualizing-device-utilization/
- http://dtrace.org/blogs/brendan/2012/02/06/visualizing-process-snapshots/
- http://dtrace.org/blogs/brendan/2012/02/12/visualizing-process-execution/
And if you're wondering what to do with such unstable results in MySQL, stay tuned. There is a solution.
Published at DZone with permission of Peter Zaitsev, DZone MVB. See the original article here.