What, When, and How of Scatterplot Matrix in Python - Data Analytics

By Ajitesh Kumar · Sep. 10, 20 · Tutorial

In this post, you will learn the following about the scatterplot matrix, which is also known as a pairplot. Later in the post, you will find a Python code example that creates a scatterplot matrix using the pairplot function of the seaborn package.

  • What is a scatterplot matrix?
  • When to use a scatterplot matrix/pairplot?
  • How to use a scatterplot matrix in Python?

What Is a Scatterplot Matrix?

A scatter plot matrix is a matrix (or grid) of scatter plots in which each plot shows a different pair of variables. In other words, a scatter plot matrix lays out the bivariate, or pairwise, relationships between combinations of variables in grid form. Here is a sample scatter plot matrix created from the sklearn Iris dataset.


Fig 1. Scatter plot matrix/pairplot for Sklearn Iris Dataset


A scatter plot matrix is also referred to as a pair plot because it consists of scatter plots of variables taken in pairs. In the matrix of scatter plots above, note the following:

  • On the diagonal, from top left to bottom right, each plot shows the univariate distribution of the variable in that column.
  • The off-diagonal plots are the pairwise scatter plots of sepal length against petal length.

Here is another pair plot, this time comprising three different variables.


Fig 2. Pairwise relationships between three different variables in the sklearn Iris dataset


When to Use a Scatterplot Matrix/Pairplot?

A scatterplot matrix can be used when you would like to assess the following:

  • Feature correlation: Assess pairwise relationships between three or more variables. Understanding how features relate to one another is important when building a machine learning model.
  • Multicollinearity: Assess collinearity or multicollinearity by examining the correlation between two or more variables. Recall that multicollinearity means two or more predictor variables provide much the same information about the response variable, which leads to unreliable coefficient estimates for those predictors (especially in linear models).
  • Linear separability: Assess whether the data is linearly separable. Linearly separable data can be split by a straight line (or, in higher dimensions, a hyperplane). Data that is not linearly separable may call for non-linear techniques such as kernel methods. The plot can therefore help you decide which machine learning algorithm to use.
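What the eye picks out in a pairplot can be backed up with a quick numeric check. The sketch below is a minimal illustration, not part of the original post: it tests whether a single threshold on petal length separates one Iris class from the other two. The threshold value of 2.5 and the choice of class 0 (setosa in sklearn's encoding) are assumptions made for this example.

```python
import pandas as pd
from sklearn import datasets

# Build the same Iris DataFrame used in the pairplot example
iris = datasets.load_iris()
df = pd.DataFrame(
    iris.data,
    columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
)
df['class'] = iris.target

# Class 0 is setosa in sklearn's target encoding
is_setosa = df['class'] == 0

# Illustrative threshold chosen by eye; it is an assumption, not a fitted value
threshold = 2.5

setosa_max = df.loc[is_setosa, 'petal_length'].max()
others_min = df.loc[~is_setosa, 'petal_length'].min()

# If every setosa value lies below the threshold and every other value lies
# above it, a single vertical line on the petal_length axis separates the groups
separable_by_threshold = bool(setosa_max < threshold < others_min)
print(setosa_max, others_min, separable_by_threshold)
```

A threshold on one feature is the simplest kind of linear boundary. If two classes separate in any one panel of the pairplot, they are also linearly separable in the full feature space. The reverse does not hold: classes that overlap in every two-dimensional panel may still be separable when all features are used together.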

You can analyse pairwise relationships at several stages of a machine learning pipeline, including the following:

  • Data analysis
  • Before and after feature transformations
  • Feature engineering
  • Feature selection
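During feature selection, a correlation matrix can put numbers on the relationships that a pairplot shows visually. The following is a minimal sketch under stated assumptions: it uses pandas' default Pearson correlation, and the cutoff of 0.8 is an arbitrary illustrative value, not a recommendation from the original post.

```python
import pandas as pd
from sklearn import datasets

# Build a DataFrame of the four Iris features
iris = datasets.load_iris()
df = pd.DataFrame(
    iris.data,
    columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
)

# Pairwise Pearson correlation between all numeric columns
corr = df.corr()
print(corr.round(2))

# Illustrative cutoff (an assumption for this sketch)
threshold = 0.8

# Collect each unordered pair of features whose absolute correlation
# exceeds the cutoff; these are candidates for a closer look
high_pairs = [
    (a, b, round(float(corr.loc[a, b]), 2))
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
print(high_pairs)
```

Pearson correlation only captures linear association, so it complements the pairplot rather than replacing it: a curved relationship that is obvious in a scatter plot can still produce a low coefficient.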

Scatterplot Matrix Python Code Example

This section demonstrates the pairplot function of the seaborn package. By default, pairplot creates a grid of Axes in which each numeric variable in the data shares the y-axis across a single row and the x-axis across a single column. Here is sample code that creates a pairplot:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn import datasets

    # Load iris dataset
    iris = datasets.load_iris()

    # Create dataframe using IRIS dataset
    df = pd.DataFrame(iris.data)
    df.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
    df['class'] = iris.target

    # Create pairplot of all the variables with hue set to class
    sns.pairplot(df, hue='class')
    plt.show()



Fig 3. Scatter plot matrix/pairplot of all variables with hue parameter


Note the hue parameter: it takes a categorical variable and maps each of its categories to a different color. It is also possible to show only a subset of variables, or to plot different variables on the rows and columns. The vars parameter restricts the plot to a subset of variables, as shown in the code below. The plots in Fig 1 and Fig 2 are pairplots of such subsets of variables.

    sns.pairplot(df, hue='class', vars=['sepal_length', 'sepal_width', 'petal_length'])
    plt.show()
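To plot different variables on the rows and columns, pairplot also accepts the x_vars and y_vars parameters. The sketch below is an illustration added for this walkthrough rather than code from the original post; the particular choice of variables for the rows and columns is an assumption.

```python
import pandas as pd
import seaborn as sns
from sklearn import datasets

# Build the same Iris DataFrame used in the earlier examples
iris = datasets.load_iris()
df = pd.DataFrame(
    iris.data,
    columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
)
df['class'] = iris.target

# x_vars sets the variables on the columns, y_vars those on the rows.
# Here one row (petal_length) is plotted against two columns.
grid = sns.pairplot(
    df,
    hue='class',
    x_vars=['sepal_length', 'sepal_width'],
    y_vars=['petal_length'],
)

# pairplot returns a PairGrid; its axes array has one entry per panel,
# arranged as (number of y_vars, number of x_vars)
print(grid.axes.shape)
```

As in the earlier examples, call plt.show() (after importing matplotlib.pyplot as plt) to display the figure. Because the grid is no longer square, there is no diagonal of univariate plots; every panel is a scatter plot.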


Conclusions

Here are some takeaways from this post:

  • Use a scatter plot matrix, or pairplot, to assess pairwise (bivariate) relationships between predictor variables.
  • Use a scatter plot matrix, or pairplot, to analyze multicollinearity between predictor variables.
  • Use a scatter plot matrix, or pairplot, to assess whether the data is linearly separable.

Published at DZone with permission of Ajitesh Kumar, DZone MVB. See the original article here.
