Exploratory and Confirmatory Analysis: What's the Difference?

Learn about the differences and uses of exploratory data analysis and confirmatory analysis by considering the process a detective goes through.

By Shelby Blitz · Nov. 29, 17 · Opinion

How does a detective solve a case? She pulls together all the evidence she has, all the data that's available to her, and she looks for clues and patterns.

At the same time, she takes a good hard look at individual pieces of evidence. What supports her hypothesis? What bucks the trend? Which factors work against her narrative? What questions does she still need to answer... and what does she need to do next in order to answer them?

Then, adding to the mix her wealth of experience and ingrained intuition, she builds a picture of what really took place — and perhaps even predicts what might happen next.

But that's not the end of the story. We don't simply take the detective's word for it that she's solved the crime. We take her findings to a court and make her prove it.

In a nutshell, that's the difference between exploratory and confirmatory analysis.

Data analysis is a broad church, and managing this process successfully involves several rounds of testing, experimenting, hypothesizing, checking, and interrogating both your data and approach.

Putting your case together, and then ripping apart what you think you're certain about to challenge your own assumptions, are both crucial to Business Intelligence.

Before you can do either of these things, however, you have to be sure that you can tell them apart.

What Is Exploratory Data Analysis?

Exploratory data analysis (EDA) is the first part of your data analysis process. There are several important things to do at this stage, but it boils down to this: figuring out what to make of the data, establishing the questions you want to ask and how you're going to frame them, and coming up with the best way to present and manipulate the data you have to draw out those important insights.

That's what it is, but how does it work?

As the name suggests, you're exploring — looking for clues. You're teasing out trends and patterns, as well as deviations from the model, outliers, and unexpected results, using quantitative and visual methods. What you find out now will help you decide the questions to ask, the research areas to explore and, generally, the next steps to take.

Exploratory data analysis involves things like: establishing the data's underlying structure, identifying mistakes and missing data, establishing the key variables, spotting anomalies, checking assumptions and testing hypotheses in relation to a specific model, estimating parameters, establishing confidence intervals and margins of error, and figuring out a "parsimonious model" — i.e. one that you can use to explain the data with the fewest possible predictor variables.
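A few of the checks listed above (missing data, a quick look at centre and spread, and spotting anomalies) can be sketched in a handful of lines. This is only an illustrative sketch: the `monthly_spend` values and the `summarize` helper are hypothetical, and the 1.5 × IQR rule is one common convention for flagging outliers, not something the article prescribes.

```python
import statistics

# Hypothetical sample: monthly spend per customer; None marks a missing value.
monthly_spend = [42.0, 38.5, None, 41.2, 39.9, 250.0, 40.3, None, 37.8, 43.1]


def summarize(values):
    """Report missing values, centre, and outliers flagged by the 1.5 * IQR rule."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    # Quartiles of the non-missing values (statistics.quantiles sorts internally).
    q1, median, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # Anything outside the fences is worth a closer look, not automatic deletion.
    outliers = [v for v in present if v < low or v > high]

    return {
        "count": len(present),
        "missing": missing,
        "mean": statistics.mean(present),
        "median": median,
        "outliers": outliers,
    }


summary = summarize(monthly_spend)
print(summary)
```

A flagged value is a prompt for a question (data entry mistake, or a genuinely unusual customer?), which is exactly the kind of lead this stage is meant to surface.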

In this way, your exploratory data analysis is your detective work. To make it stick, though, you need confirmatory data analysis.

What Is Confirmatory Data Analysis?

Confirmatory data analysis is the part where you evaluate your evidence using traditional statistical tools such as significance testing, statistical inference, and confidence intervals.

At this point, you're really challenging your assumptions. A big part of confirmatory data analysis is quantifying things like the extent to which any deviation from the model you've built could have happened by chance, and at what point you need to start questioning your model.

Confirmatory data analysis involves things like testing hypotheses, producing estimates with a specified level of precision, regression analysis, and variance analysis. In this way, your confirmatory data analysis is where you put your findings and arguments to trial.
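One way to quantify how easily a deviation could have happened by chance is a permutation test. The sketch below is an assumption-laden illustration rather than the article's own method: the function name `permutation_p_value`, the sample numbers, and the choice of a difference in means as the test statistic are all hypothetical.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate how often a difference in means at least as large as the
    observed one appears when group labels are shuffled at random."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        # Small tolerance guards against floating-point noise in the comparison.
        if diff >= observed - 1e-12:
            hits += 1

    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical measurements for two groups of customers.
group_a = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
group_b = [10.2, 10.5, 9.9, 10.1, 10.4, 10.0]

p_value = permutation_p_value(group_a, group_b)
print(f"Estimated p-value: {p_value:.4f}")
```

A small p-value says the observed gap would rarely appear if the group labels were arbitrary; a large one is the signal to start questioning the model.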

Uses of Confirmatory and Exploratory Data Analysis

In reality, exploratory and confirmatory data analyses aren't performed one after another, but continually intertwine to help you create the best possible model for analysis.

Let's take an example of how this might look in practice.

Imagine that in recent months, you've seen a surge in the number of users canceling their product subscription. You want to find out why, so that you can tackle the underlying cause and reverse the trend.

This would begin with exploratory data analysis. You'd take all of the data you have on the defectors, as well as on happy customers of your product, and start to sift through it looking for clues. After plenty of time spent manipulating the data and looking at it from different angles, you notice that the vast majority of people who defected had signed up during the same month.
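Looking at the data "from different angles" often means grouping it by one attribute at a time and comparing rates. Here is a minimal sketch of grouping cancellations by signup month; the `customers` records and the `cancellation_rate_by_month` helper are hypothetical and stand in for whatever your own data looks like.

```python
from collections import defaultdict

# Hypothetical customer records: (signup_month, cancelled)
customers = [
    ("2017-03", False), ("2017-03", False), ("2017-03", True),
    ("2017-04", True), ("2017-04", True), ("2017-04", True), ("2017-04", False),
    ("2017-05", False), ("2017-05", False), ("2017-05", True),
]


def cancellation_rate_by_month(records):
    """Return the share of customers who cancelled, keyed by signup month."""
    totals = defaultdict(int)
    cancelled = defaultdict(int)
    for month, did_cancel in records:
        totals[month] += 1
        if did_cancel:
            cancelled[month] += 1
    return {month: cancelled[month] / totals[month] for month in sorted(totals)}


rates = cancellation_rate_by_month(customers)
worst_month = max(rates, key=rates.get)

for month, rate in rates.items():
    print(f"{month}: {rate:.0%} cancelled")
print("Highest cancellation rate:", worst_month)
```

A month that stands out like this is a clue, not a conclusion; it tells you where to dig next.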

On closer investigation, you find out that during the month in question, your marketing team was shifting to a new customer management system and, as a result, the introductory documentation that you usually send to new customers wasn't always going through. That documentation would have helped to troubleshoot many of the teething problems that new users face.

Now you have a hypothesis: people are defecting because they didn't get the welcome pack (and the easy solution is to make sure they always get a welcome pack!).

But first, you need to be sure that you are right about this cause. Based on your exploratory data analysis, you now build a new predictive model that allows you to compare defection rates between customers who received the welcome pack and those who did not. This is rooted in confirmatory data analysis.

The results show a broad correlation between not receiving the welcome pack and defecting. Bingo! You have your answer.
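The comparison of defection rates described above could be checked with a two-proportion z-test. The following is a sketch under stated assumptions: the counts are invented for illustration, `two_proportion_z_test` is a hypothetical helper, and the normal approximation it relies on is only reasonable when both groups are fairly large.

```python
import math


def two_proportion_z_test(defected_a, total_a, defected_b, total_b):
    """Two-sided z-test for a difference between two proportions,
    using the pooled standard error."""
    p_a = defected_a / total_a
    p_b = defected_b / total_b
    pooled = (defected_a + defected_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value


# Hypothetical counts: (customers who defected, customers in group)
no_pack = (60, 200)   # did not receive the welcome pack
got_pack = (45, 300)  # received the welcome pack

rate_no_pack, rate_got_pack, z, p_value = two_proportion_z_test(*no_pack, *got_pack)

print(f"Defection rate without pack: {rate_no_pack:.0%}")
print(f"Defection rate with pack:    {rate_got_pack:.0%}")
print(f"z = {z:.2f}, p = {p_value:.5f}")
```

Because the customers were not randomly assigned to receive the pack, a result like this supports the hypothesis without proving the cause; sending the pack to a random half of new signups would be the stronger test.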

Exploratory Data Analysis and Big Data

Getting a feel for the data is one thing, but what about when you're dealing with enormous data pools?

After all, there are already many different ways you can approach exploratory data analysis: transforming the data through nonlinear operators, projecting it into a different subspace and examining the resulting distribution, or slicing and dicing it along different combinations of dimensions. Add sprawling amounts of data into the mix and suddenly the whole "playing detective" element feels a lot more daunting.
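To make "slicing and dicing along different combinations of dimensions" concrete, here is a small sketch that computes a cancellation rate for every pair of dimensions. The `records`, the dimension names, and the `slice_rates` helper are all hypothetical; the point is only to show how quickly the number of slices grows as dimensions are added.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical customer records with three categorical dimensions.
records = [
    {"plan": "basic", "region": "EU", "channel": "web", "cancelled": True},
    {"plan": "basic", "region": "EU", "channel": "app", "cancelled": True},
    {"plan": "basic", "region": "US", "channel": "web", "cancelled": False},
    {"plan": "pro", "region": "EU", "channel": "web", "cancelled": False},
    {"plan": "pro", "region": "US", "channel": "app", "cancelled": False},
    {"plan": "pro", "region": "US", "channel": "web", "cancelled": True},
]
dimensions = ["plan", "region", "channel"]


def slice_rates(rows, dims, size=2):
    """Cancellation rate for every slice defined by each combination of `size` dimensions."""
    results = {}
    for combo in combinations(dims, size):
        counts = defaultdict(lambda: [0, 0])  # slice key -> [cancelled, total]
        for row in rows:
            key = tuple(row[d] for d in combo)
            counts[key][0] += int(row["cancelled"])
            counts[key][1] += 1
        results[combo] = {key: c / t for key, (c, t) in counts.items()}
    return results


rates_by_pair = slice_rates(records, dimensions)

for combo, slices in rates_by_pair.items():
    for key, rate in sorted(slices.items()):
        print(combo, key, f"{rate:.0%}")
```

With three dimensions there are only three pairs to inspect, but the count of combinations climbs steeply as dimensions are added, which is part of why large datasets make this kind of exploration harder.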

The important thing is to ensure that you have the right tech stack in place to cope with this, and to make sure you have access to the data you need in real time.

Two of the best statistical programming packages available for conducting exploratory data analysis are R and S-Plus; R is particularly powerful and easily integrated with many BI platforms. That's the first thing to consider.

The next step is ensuring that your BI platform has a comprehensive set of data connectors that, crucially, allow data to flow in both directions. This means that you can keep importing exploratory data analysis results and models from, for example, R to visualize and interrogate them, and also send data back from your BI solution so that your model and results update automatically as new information flows into R.

In this way, you not only strengthen your exploratory data analysis, you incorporate confirmatory data analysis, too — covering all your bases of collecting, presenting and testing your evidence to help reach a genuinely insightful conclusion.

Your honor, we rest our case.

Published at DZone with permission of Shelby Blitz, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
