
Some Fun with R Visualization

by Peter Zaitsev · Sep. 18, 12 · Interview

Originally posted by Vadim Tkachenko


In my previous post, I finished with a graph with unstable results. Now let's explore some different ways to present those results. I enjoy working with R, and though I'm not even close to being proficient in it, I want to share some graphs you can build with R + ggplot2.

The conditions of the benchmark are the same as in the previous post, with the difference that there are results for the 4-table and 16-table cases running MySQL 5.5.20.

Let me remind you how I take measurements. I run each benchmark for 1 hour, with a measurement every 10 seconds, so we have 360 points (metrics) per run.
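To try the plotting snippets in this post without the original benchmark output, a stand-in data frame can be simulated. This is only a minimal sketch: the column names sec, throughput, and tables are taken from the aes() calls in this post, but the values are synthetic random numbers with arbitrary centers and spreads, not benchmark results, and make_fake_run is a hypothetical helper. The data frame is named dd to match the later snippets; the first snippets use dv.ver, and the post does not spell out how the two relate.

```r
# Synthetic stand-in for the benchmark data (NOT real results).
set.seed(42)

# A one-hour run sampled every 10 seconds gives 3600 / 10 = 360 points.
n_points <- 3600 / 10

# Hypothetical helper: one simulated run for a given number of tables.
# center and spread are arbitrary illustration values.
make_fake_run <- function(tables, center, spread) {
  data.frame(
    sec        = seq(10, 3600, by = 10),
    throughput = rnorm(n_points, mean = center, sd = spread),
    tables     = tables
  )
}

# Two simulated runs stacked into one long data frame.
dd <- rbind(make_fake_run(4, 3700, 300),
            make_fake_run(16, 3600, 500))
str(dd)
```

With two runs of 360 points each, dd has 720 rows, one per measurement.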

If we draw them all, it will look like this:

I will also show my R code:

library(ggplot2)
# dv.ver holds one row per measurement: sec, throughput, tables
m <- ggplot(dv.ver,
            aes(x = sec, y = throughput, color = factor(tables)))
m + geom_point()

The previous graph is not very informative, so we can add lines to see the trend.


m + geom_point() + geom_line()

This looks better, but you may still have a hard time answering: which case shows the better throughput? What number should we take as the final result?

A jitter graph may help:


m <- ggplot(dv.ver,
            aes(x = factor(tables), throughput, color=factor(tables)))
m + geom_jitter(alpha=0.75)

With jitter we see some dense areas, which show the "most likely" throughput.

So let's build density graphs:


m <- ggplot(dd,
            aes(x = throughput,fill=factor(tables)))
m+geom_density(alpha = 0.7)

Or, with one panel per case:


m+geom_density(alpha = 0.7)+facet_wrap(~tables,ncol=1)

In these graphs the x axis is throughput, and the y axis represents the density of hitting a given throughput.

That may give you an idea of how to compare the two results, and it shows that the highest density is around 3600-3800 tps.
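The location of the highest density can also be computed instead of read off the graph by eye. Here is a minimal sketch using base R's density(); peak_throughput is a hypothetical helper, and the sample vector is synthetic, not benchmark data. Note that geom_density() and density() may use different smoothing settings, so the peak found this way is an approximation of what the graph shows.

```r
# Hypothetical helper: throughput value at which the kernel density peaks.
peak_throughput <- function(x) {
  d <- density(x)        # kernel density estimate (base R, stats package)
  d$x[which.max(d$y)]    # x position of the highest estimated density
}

# Synthetic sample with arbitrary parameters, not benchmark data.
set.seed(1)
x <- rnorm(360, mean = 3700, sd = 100)
peak <- peak_throughput(x)
peak

# Per-case version, assuming a data frame dd with columns throughput and tables:
# tapply(dd$throughput, dd$tables, peak_throughput)
```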

Moving on to numbers, we can build boxplots:


m <- ggplot(dd,
            aes(x = factor(tables),y=throughput,fill=factor(tables)))
m+geom_boxplot()

That may not be easy to read if you have never seen boxplots. There's good reading on this way to represent data. In short: the middle line inside a box is the median (the line that divides the top 50% from the bottom 50%), the line at the top of a box is the 75% quantile (it divides the bottom 75% from the top 25% of results), and correspondingly the line at the bottom of a box is the 25% quantile (you should already have an idea of what that means). You can decide which measurement to use when comparing the results: the median, the 75% quantile, etc.
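The three lines of a box can be obtained as plain numbers with base R's quantile(). This is a sketch on a synthetic sample (not benchmark data); for real data, the values should correspond to the box edges and middle line drawn by geom_boxplot().

```r
# Synthetic throughput sample with arbitrary parameters, not benchmark data.
set.seed(7)
x <- rnorm(360, mean = 3700, sd = 200)

# The three horizontal lines of the box: 25% quantile, median, 75% quantile.
box <- quantile(x, probs = c(0.25, 0.50, 0.75))
box

# Per-case version, assuming a data frame dd with columns throughput and tables:
# aggregate(throughput ~ tables, data = dd,
#           FUN = quantile, probs = c(0.25, 0.50, 0.75))
```

Whichever of these numbers you pick as "the result", use the same one for every case you compare.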

And finally, we can combine jitter and boxplot to get:


m <- ggplot(dd,
            aes(x = factor(tables),y=throughput,color=factor(tables)))
m+geom_boxplot()+geom_jitter()

That's it for today.

You can get the full script sysbench-4-16.r, with data, on benchmarks launchpad.

If you want to see more visualization ideas, you may check out Brendan's blog:

  • http://dtrace.org/blogs/brendan/2011/12/18/visualizing-device-utilization/
  • http://dtrace.org/blogs/brendan/2012/02/06/visualizing-process-snapshots/
  • http://dtrace.org/blogs/brendan/2012/02/12/visualizing-process-execution/

And if you're wondering what to do with such unstable results in MySQL, stay tuned. There is a solution.


Published at DZone with permission of Peter Zaitsev, DZone MVB. See the original article here.
