DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Because the DevOps movement has redefined engineering responsibilities, SREs now have to become stewards of observability strategy.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Related

  • How To Create a Resource Chart in JavaScript
  • Creating a Polar Chart to Measure Electromagnetic Field Strength
  • JS Animated Radar Chart With TypeScript and NodeJS
  • Create a Millisecond-Precision Time Ticks Chart with NodeJS

Trending

  • Code Reviews: Building an AI-Powered GitHub Integration
  • Efficient API Communication With Spring WebClient
  • How To Introduce a New API Quickly Using Quarkus and ChatGPT
  • Introducing Graph Concepts in Java With Eclipse JNoSQL
  1. DZone
  2. Data Engineering
  3. Data
  4. How to Use Charts to Display Data Relationships

How to Use Charts to Display Data Relationships

In this article, take a look at how to use charts to display data relationships.

By 
Mustapha Mekhatria user avatar
Mustapha Mekhatria
·
Updated Jan. 06, 20 · Presentation
Likes (2)
Comment
Save
Tweet
Share
17.7K Views

Join the DZone community and get the full member experience.

Join For Free

Man in suite working on laptop

How to Use Charts to Display Data Relationships

Have you ever found yourself in life, business or academia, explaining relationships or correlations between data? Chances are, you resorted to a visual aid such as a chart made on your computer or drawn on the back of a napkin when trying to get others to grasp what you are trying to convey.

When doing so, it is important to understand that a chart is always subjective. When you try to visualize data, you are — sometimes unconsciously — deciding what information to highlight and obscure, as various chart types “force” the data into specific data-stories. The charts you use may also exclude certain audiences if they require specialized knowledge (e.g., some charts used in science or finance).

So picking the right chart is actually important if you want to communicate clearly. Once you become aware of your charting options, the appropriate choices often become apparent. Here is a quick guide to help you pick the right chart when your goal is to illustrate relationships between data.

The article is divided into three main sections according to the different types of data sets to visualize:

  1. Categorical data sets
  2. Continuous data sets
  3. Mixed data sets (categorical and continuous)

1. Categorical Datasets

Bar and Column Chart

Bar, stacked bar, column, and stacked column charts are commonly used to visualize relationships between categorical data sets.

The following stacked bar chart shows the top 10 female-dominated professions in Sweden in 2017.

The chart shows the relationship between two categorical data: the data on the x-axis represents the professions, and the data on the y-axis represents the percentage of women vs. men in each profession.

The stacked bar chart can be easily adapted to suit the color-blind community (see below), by using a monochrome chart:

Another way to visualize the data sets above is by using a bar chart, where we use the number of men vs women employed in each profession instead of their ratio (percentage).

Take a look at these two charts. The underlying data are the same, but the story they tell (the data they make easy to compare) is different. Among other things, the Stacked Bar chart doesn’t tell us about the size of the universe, i.e., the number of people in each profession. It could lead us to believe that all of these professions are well-staffed.

The Bar Chart makes it clear that there are more “Home care nurses” than “Accountants” in the job market, and that there are also more male Home Care Nurses than there are male Accountants in the job market. The question to ask when choosing your chart is what point are you trying to make?

Network Chart

The network chart is popular for visualizing complex relationships and connections that involve categorical data.

The chart below shows relationships (domestic connections) between different airports in South Korea. The links connect the airports, and the bubbles with colors refer to the number of destinations (domestic and international), ranked by size into three groups.

  • Red bubble: airports with more than 50 direct destinations
  • Yellow bubble: airports with more than 10 and less than 50 direct destinations
  • Green bubble: airports with less than 10 direct destinations

Maps

Geographical maps are used to display categorical as well as continuous data.

The chart below shows the same previous network chart (categorical data) on a map:

Maps offer the possibility to view the connections on a real geographic area, where the users can compare actual distances between each point. Nevertheless, from the accessibility perspective, it is not an optimal visualization since the colors that distinguish the number of international connections for each city could be a big challenge for the colorblind community. One way to solve the issue is by combining the network chart and the map chart to produce a sort of hybrid chart (see chart below)

None colorblind and colorblind alike can read the map thanks to the following features:

  • Big size red circles: for airports with more than 50 direct destinations.
  • Medium size yellow circles: for airports with more than 10, but less than 50 direct destinations.
  • Small size green circles: for airports with less than 10 direct destinations.
  • Solid lines: for Jeju flight routes.
  • Dot lines: for Busan flight routes.
  • Dash lines: for Seoul flight routes.

Now you might ask: Which map to choose? Well, it depends on your audience, and the details/information you would like highlights. The first map is simple and easy to read since the focus is mainly on the domestic connections between the different Korean airports; this map is not convenient for the colorblind community.

The second map is more complex and details oriented. Every element on the map is highlighted, such as the airport’s size by destination using different colors/size, and the domestic flight for the main airport using different dash styles; this map suits a broad audience (colorblind and none colorblind).

2. Both Data Sets Are Continuous

Scatter Chart

A scatter chart is commonly used to display correlations between data.

This example illustrates the relationship between the height and weight of athletes competing in the 2012 Summer Olympics.

Height and weight displayed respectively along the x-axis and y-axis, are continuous data. The red dot represents the average height and weight of all athletes. When the height increases, the weight increases as well; the pattern looks pretty consistent with some outliers (it can be fun to hypothesize about what sports the outliers compete in!) In statistics, this is called a positive correlation, and it is easily determined with a quick glance at this chart. However, note that in some cases identifying outliers may be the purpose of the analysis.

This chart type also allows you to identify data that doesn’t “fit” easily. A scatter chart comes with some disadvantages, such as novice could have some issues to get insight from a scatter chart, as it is rarely found outside of scientific materials. Also, keep in mind that an overplotted scatter chart could make patterns and relationships hard to detect, so be sure to not overplot the chart to the point where the insights get lost. In this case, like all others, make sure your charts are aligned with your audience’s level of proficiency. 

3. Data Sets Are a Mix (Categorical and Continuous)

Box Plot

The Box plot offers a practical way to visualize complex datasets in a compact form. It is also colorblind-friendly, as it is monochrome by default. While it allows for easy identification of outliers, like the Scatter Plot, the Box plot does a better job dealing with categorical data. A Box Plot visualizes useful statistical information such as the maximum/minimum values, the 1st, and the 3rd quartile, and the median (see demo below)

The Box Plot demo below shows the EU road accident fatalities by category in 2017. Whereas a bar chart may have shown only the average across the EU countries, with a box plot, we can see that there is great variation between countries. (In case you are curious, the country with the most passenger car fatalities, is Croatia, with 45 fatalities per million, and the county with the least, is Switzerland with 9.5 fatalities per million :) )

The box plot above visualizes the data without indicating including extreme values/outliers. (In case you are curious: The “Min” and “Max” value in the box plot is actually a calculated value, not the absolute highest or lowest value. In statistics, outliers are data points that are above Q3 + 1.5.IQR, or lower than Q1 - 1.5.IQR; where IQR is the interquartile range or Q3-Q1.) Outliers could be a good way to detect errors either during the data editing or during the data collecting process; outliers could also point out patterns.

Below is a boxplot with the same EU road accident data, but with outliers explicitly highlighted. The upper outliers (above the max limits) are mainly recorded in eastern European countries, Iceland, and Portugal. To check if it is a pattern or just a coincidence, we had to go back to the data and do further analyses. It turns out that the countries that appear as upper outliers in one category also have the highest fatalities recorded in other categories as well. By displaying outliers, we found a pattern that helps us to get a better insight into the data.

It is easy to see that Busses and Coaches are relatively safe compared to Passenger cars, where we find the highest number of fatalities.

Now you have a good idea of how to visualize data relationships depending on the nature of the data.

I hope you enjoyed this article as much as I did.

Feel free to share your favorite charts to display data relationships in the comment section.

Further Reading

Spring Data JPA and Database Relationships

Database Design Best Practices

Chart Data (computing)

Opinions expressed by DZone contributors are their own.

Related

  • How To Create a Resource Chart in JavaScript
  • Creating a Polar Chart to Measure Electromagnetic Field Strength
  • JS Animated Radar Chart With TypeScript and NodeJS
  • Create a Millisecond-Precision Time Ticks Chart with NodeJS

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!