DZone

Tableau + R: Back Your Data Visualizations With Statistical Testing

We look at how R and Tableau can work together to allow you to gather insights and visualize your findings in a graph.

by Abhijit Telang · Updated Mar. 01, 19 · Tutorial

To speak bluntly, Tableau, for all the promise of its visualization capabilities, is astonishingly weak at integrating with statistical, hypothesis-driven testing. You may be let down repeatedly if you need not only to visualize your observations but also to compare them between groups on hard statistical grounds.

Hence, one must admit that there is still a significant gap between visualization tools like Tableau and pure statistical software such as Minitab, SPSS, SAS, and, of course, the humble yet tremendously powerful open-source workhorse, R.

Tableau's tables and their corresponding computations, at least at the time of writing this piece, do not support statistical testing, such as testing for normality, pairwise comparisons, accounting for interactions between variables, linear regression, logistic regression, or statistical modeling in general. As of now, only basic statistical measures (central tendency and measures of variation) can be computed.

Here's some example data I've been playing with:

[Image: Tableau]

As I look at the above shipping costs, a question comes to mind: How do I form a hypothesis about whether shipping cost varies by sub-category or not?

All I can get from Tableau is perhaps a box plot to visually compare costs. Is that enough? Of course not.

What I would like to do is to find the average or mean shipping cost for each category or sub-category and then form a simple yes/no hypothesis.
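As a rough illustration of that idea, here is a minimal sketch in R. It assumes a toy data frame with hypothetical column names (`sub_category`, `shipping_cost`) rather than the actual data set, and it uses a one-way ANOVA as the yes/no check, which is my assumption about a reasonable first test rather than something this article prescribes.

```r
# Hypothetical toy data standing in for the shipping-cost table;
# the column names and values are assumptions for illustration only.
set.seed(42)
shipping <- data.frame(
  sub_category  = factor(rep(c("Chairs", "Phones", "Binders"), each = 20)),
  shipping_cost = c(rnorm(20, mean = 60, sd = 10),
                    rnorm(20, mean = 45, sd = 10),
                    rnorm(20, mean = 12, sd = 4))
)

# Mean shipping cost per sub-category.
group_means <- aggregate(shipping_cost ~ sub_category, data = shipping, FUN = mean)
print(group_means)

# Null hypothesis: every sub-category has the same mean shipping cost.
fit <- aov(shipping_cost ~ sub_category, data = shipping)
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]

# Yes/no answer at the conventional 5% significance level.
costs_differ <- p_value < 0.05
print(costs_differ)
```

A "yes" here only says that at least one sub-category differs from the others; it does not say which pairs differ, which is where pairwise comparisons come in.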

Sadly, Tableau falls silent on this question. Maybe we can try different types of charts or compare data visually/manually across or down a table.

Typical Tableau users may find themselves constrained if they have to conduct such simple statistical comparisons, but there is a way out of this through Tableau's scripting interface.

For this discussion, I am using the popular (and great) open source statistical computing environment, R. 

If you are not familiar with R (R users can skip to the next section directly), here is the link to learn about, install, and get going with R: https://www.r-project.org/.

  • Install the R runtime environment for your OS (the Windows installer is at https://cran.r-project.org/bin/windows/base/).

  • Install the Rserve package, which provides the server component that listens for requests from Tableau and bridges this chasm between mere visualization and hard statistical testing.

  • Load the package by typing library(Rserve).

  • Start Rserve by issuing the following command: 

    • Rserve(debug = FALSE, args = NULL)

So, let's come back to the question of multiple, simultaneous comparisons of average shipping cost between categories.

The test you can use is called the Tukey test. What you need to do is:

  • write a script in R that runs just fine on its own, and then...

  • retool that script so it can be invoked from within the Tableau Script interface (which, considering its limitations on supported return types and the clumsy format adjustments, can take some time). 
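For the first of these steps, a standalone script might look roughly like the sketch below. It uses `TukeyHSD()` from base R's stats package on a toy data frame; the data and the column names `sub_category` and `shipping_cost` are hypothetical, and this is not the exact script used for this article.

```r
# Hypothetical toy data; replace with the real shipping-cost table.
set.seed(42)
shipping <- data.frame(
  sub_category  = factor(rep(c("Chairs", "Phones", "Binders"), each = 20)),
  shipping_cost = c(rnorm(20, mean = 60, sd = 10),
                    rnorm(20, mean = 45, sd = 10),
                    rnorm(20, mean = 12, sd = 4))
)

# Fit a one-way model, then run all pairwise comparisons of the means.
fit   <- aov(shipping_cost ~ sub_category, data = shipping)
tukey <- TukeyHSD(fit, "sub_category", conf.level = 0.95)

# One row per pair: difference in means, confidence bounds, adjusted p-value.
pairs <- as.data.frame(tukey$sub_category)
pairs$comparison <- rownames(pairs)
print(pairs)
```

Each row of `pairs` describes one sub-category pair, with the adjusted p-value in the `p adj` column.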

Here is a screenshot:

[Image]

The approach I have taken is to simply define a Boolean calculated field that evaluates to True or False, which is sufficient to indicate whether the R script invoking the Tukey test has returned successfully or not. 

Now, given the poor support for data format conversions between R and Tableau, a safer and cleaner approach would be to have the script save the results as a delimited file (typically CSV) and then either import or paste the contents back into a Tableau environment. 
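A minimal sketch of this pattern is shown below, under some assumptions: the function name `run_tukey_to_csv` and the output path are hypothetical, and the two vectors stand in for the values Tableau would hand to R (in a script calculation such as SCRIPT_BOOL these arrive as `.arg1`, `.arg2`, and so on). The function writes the Tukey results to a CSV file and returns only TRUE or FALSE, so nothing complicated has to travel back to Tableau.

```r
# Hypothetical helper: run the Tukey test, save the results as CSV,
# and report success or failure as a single logical value.
run_tukey_to_csv <- function(sub_category, shipping_cost, out_file) {
  tryCatch({
    d      <- data.frame(g = factor(sub_category), y = shipping_cost)
    fit    <- aov(y ~ g, data = d)
    tukey  <- TukeyHSD(fit, "g")
    result <- as.data.frame(tukey$g)
    result$comparison <- rownames(result)
    write.csv(result, out_file, row.names = FALSE)
    TRUE
  }, error = function(e) FALSE)
}

# Usage with toy vectors in place of the values Tableau would pass in.
set.seed(1)
cats  <- rep(c("Chairs", "Phones", "Binders"), each = 15)
costs <- c(rnorm(15, 60, 10), rnorm(15, 45, 10), rnorm(15, 12, 4))

out_file <- file.path(tempdir(), "tukey_results.csv")  # assumed location
ok <- run_tukey_to_csv(cats, costs, out_file)
print(ok)
```

The resulting file can then be imported into Tableau like any other delimited data source.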

This is what I have chosen to do, and the results look like this:

[Image: Tableau]

The key from now on is to have some way of recognizing the significant pairs visually, and that's why these results are back in a Tableau environment. For this, we will use the magnitude of adjusted p-values.  

This is where transformation functions become relevant. Taking the negative logarithm of the adjusted p-value for each comparison is one approach: smaller p-values map to larger transformed values, which makes them stand out and lets you split the differences into two classes (significant and not significant).
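To make the transformation concrete, here is a small sketch in R. The comparison labels and adjusted p-values are made up for illustration, and the 0.05 cut-off is an assumed convention, not a value taken from the results above. The same arithmetic could equally be written as a calculated field inside Tableau.

```r
# Hypothetical adjusted p-values for three pairwise comparisons.
results <- data.frame(
  comparison = c("Phones-Binders", "Chairs-Binders", "Phones-Chairs"),
  p_adj      = c(0.0004, 0.03, 0.62)
)

# Guard against p-values reported as exactly 0, which would give Inf.
safe_p <- pmax(results$p_adj, .Machine$double.eps)

# Negative log10: the smaller the p-value, the larger the score.
results$neg_log10_p <- -log10(safe_p)

# Binary classification against an assumed 5% significance level.
threshold <- -log10(0.05)
results$significant <- results$neg_log10_p > threshold
print(results)
```

On this scale a p-value of 0.05 sits at about 1.3, so any pair scoring above that line is flagged as significant.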

From this point on, it is fairly straightforward. You can apply the transformed values to create a data visualization as shown below:

[Image: Tableau]

It should not be difficult now to weed out the insignificant sub-category pairs as far as the difference between average shipping costs is concerned.

Reference: I have used the Global superstore sales data which can be found here.
