How to Choose the Best Chart for Your Data

One struggle with data visualization is that there are so many different chart types. How do you know which one will bring the most meaning to your data?

The amount of data at our fingertips today can be overwhelming. How do you know what really matters? Data visualizations can help, but how can you set up your data to best visualize it? Which chart will help you analyze and digest the data into actionable insights?

One struggle is that there are so many different chart types. How do you know which one will bring the most meaning to your data? Are you measuring performance? Do you have one or many variables? Is your data time-based? Is it geospatial? How many data points do you have? Are you comparing different categories of data?

Kinds of Charts

There are seven different types of relationships that charts are typically used to display.

1. Nominal Comparison

Displays a series of discrete quantitative values so they can be easily seen and compared.

2. Time Series

Shows quantitative values that are associated with categorical subdivisions of time. It helps you view trends over time in sequential, chronological order.

3. Ranking

Shows individual quantitative values associated with a set of categorical subdivisions related to each other sequentially by size. Data is displayed from highest to lowest or lowest to highest.
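
As a minimal sketch of preparing data for a ranking chart, the values can be sorted by size before plotting. The `rank` helper and the regional revenue figures are hypothetical and only illustrate the idea.

```python
def rank(values, descending=True):
    """Return (category, value) pairs ordered by value.

    descending=True lists highest to lowest; False lists lowest to highest.
    """
    return sorted(values.items(), key=lambda item: item[1], reverse=descending)


# Hypothetical revenue per region, used only for illustration.
revenue = {"North": 120, "South": 95, "East": 143, "West": 101}

ranked = rank(revenue)
for region, amount in ranked:
    print(f"{region}: {amount}")
```

Feeding the sorted pairs to a bar chart keeps the visual order aligned with the ranking.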

4. Part to Whole

Shows individual quantitative values associated with a set of categorical subdivisions related to the complete set of values (and each other). The common unit of measurement is a percentage.
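
A part-to-whole view usually starts by converting raw values into percentages of the total. The sketch below shows one way to do that; the `part_to_whole` helper and the sales numbers are assumptions made for illustration.

```python
def part_to_whole(values):
    """Return each category's share of the total as a percentage."""
    total = sum(values.values())
    if total == 0:
        raise ValueError("total must be non-zero to compute shares")
    return {name: 100.0 * value / total for name, value in values.items()}


# Hypothetical sales per product line, used only for illustration.
sales = {"Widgets": 50, "Gadgets": 30, "Gizmos": 20}

shares = part_to_whole(sales)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```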

5. Deviation

Shows how one or more sets of quantitative values differ from a primary set of values. The units of measure are actual units, ratios relative to the primary value, or positive or negative ratios.
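
To make the two kinds of units concrete, the following sketch computes a deviation both in actual units and as a ratio relative to the primary value. The `deviations` helper, the choice of a budget as the primary set, and the quarterly figures are all hypothetical.

```python
def deviations(actual, primary):
    """Map each key to (difference in actual units, ratio relative to primary)."""
    result = {}
    for key, reference in primary.items():
        if reference == 0:
            raise ValueError(f"primary value for {key!r} is zero; ratio undefined")
        diff = actual[key] - reference
        result[key] = (diff, diff / reference)
    return result


# Hypothetical budget (primary set) and actual spend, for illustration only.
budget = {"Q1": 100, "Q2": 200}
spend = {"Q1": 110, "Q2": 150}

dev = deviations(spend, budget)
for quarter, (units, ratio) in dev.items():
    print(f"{quarter}: {units:+d} units ({ratio:+.0%})")
```

A positive ratio means the value sits above the primary value and a negative ratio means it sits below.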

6. Distribution

Shows how a set of quantitative values is distributed across its entire range, from lowest to highest.

7. Correlation

Displays whether two paired sets of quantitative values vary in relation to each other. The correlation will show the direction and the degree: high or low, positive or negative.
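
One common way to quantify direction and degree is the Pearson correlation coefficient, which the article itself does not name. The sketch below is an illustration under that assumption; the function names and the 0.7 cutoff between "high" and "low" are arbitrary choices, not a standard.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two paired sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


def describe_correlation(r, strong=0.7):
    """Label direction and degree; the `strong` cutoff is an assumed threshold."""
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    degree = "high" if abs(r) >= strong else "low"
    return direction, degree


# Hypothetical paired measurements, for illustration only.
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(r, describe_correlation(r))
```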

Choosing Your Chart

When choosing a chart type, it is best to keep four communication methods in mind.

1. Do You Want to Communicate a Composition of Your Data?

A composition chart is designed to show different parts of information that make up a whole, such as your total sales broken down by product line or sales rep.

Charts that best communicate a composition are:

  • Stacked column/bar: Changing over time; few periods; relative and absolute differences matter.
  • 100% stacked column/bar: Changing over time; few periods; only relative differences matter.
  • Stacked area: Changing over time; many periods; relative and absolute differences matter.
  • 100% stacked area: Changing over time; many periods; only relative differences matter.
  • Pie: Static; simply show the share of a total.
  • Waterfall: Static; accumulation or subtraction to the total.
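
The composition rules above can be written as a small decision function. This is only a sketch: the function name and its boolean parameters are hypothetical, and what counts as "few" or "many" periods is left to the caller because the list does not give a threshold.

```python
def composition_chart(changes_over_time, many_periods=False,
                      absolute_matters=True, accumulates=False):
    """Suggest a composition chart from the criteria in the list above.

    changes_over_time: True if the composition changes over time.
    many_periods:      True for many periods, False for few.
    absolute_matters:  True if absolute as well as relative differences matter.
    accumulates:       static data only; True if parts add to or subtract
                       from the total.
    """
    if not changes_over_time:
        return "waterfall" if accumulates else "pie"
    if many_periods:
        return "stacked area" if absolute_matters else "100% stacked area"
    return "stacked column/bar" if absolute_matters else "100% stacked column/bar"


print(composition_chart(changes_over_time=False))
print(composition_chart(changes_over_time=True, many_periods=True,
                        absolute_matters=False))
```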

2. Do You Want to Communicate a Relationship Between Two Datasets?

A relationship chart shows a correlation between two or more variables in the data you pull together. It can reveal whether the given variables have a positive or a negative effect on each other.

Charts that best communicate a relationship are:

  • Scatter: Two variables.
  • Bubble: Three or more variables.
  • Line: Two variables.

3. Do You Want to Communicate the Distribution of Your Data?

A distribution chart is used to show the behavior of a certain variable over time, helping you identify outliers, normal tendencies, the range of your information, and trends.

Charts that best communicate a distribution are:

  • Scatter: Two variables.
  • Line: Single variable; many data points.
  • Column: Single variable; few data points.
  • Bar: Single variable; few data points.
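
For a single variable, a column or bar distribution chart is typically drawn from binned counts. The sketch below shows equal-width binning as one possible preparation step; the `histogram` helper, the bin count, and the sample values are assumptions for illustration.

```python
def histogram(values, bins):
    """Count values in `bins` equal-width intervals between min and max."""
    if bins <= 0:
        raise ValueError("bins must be positive")
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if lo == hi:
        # All values are identical, so they all fall into the first bin.
        counts[0] = len(values)
        return counts
    width = (hi - lo) / bins
    for v in values:
        # The maximum value belongs to the last bin, not a bin past the end.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


# Hypothetical measurements, for illustration only.
counts = histogram([1, 2, 2, 3, 3, 3, 4, 4, 5], bins=4)
print(counts)
```

Each count becomes the height of one column; with many data points and narrow bins, the same counts can be drawn as a line instead.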

4. Do You Want to Compare Different Variables of Your Data?

A comparison chart is used for comparing one or more sets of data, and it makes the minimum and maximum values of each set easy to see. This chart sets each set of variables against the others and displays how those variables compare. For example:

  • Bar: Among items; one variable per item; few categories; many items.
  • Column: Among items; one variable per item; few categories; few items.
  • Column: Over time; few periods; single or few categories.
  • Circular area: Over time; many periods; cyclical data.
  • Line: Over time; many periods; non-cyclical data.
  • Line: Over time; few periods; many categories.
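
The over-time rules in this list can be sketched as a decision function. The function name and its boolean parameters are hypothetical, the among-items rules are left out, and the caller decides what counts as "few" or "many".

```python
def comparison_over_time_chart(many_periods, cyclical=False,
                               many_categories=False):
    """Suggest a chart for comparing data over time, per the list above.

    many_periods:    True for many periods, False for few.
    cyclical:        only relevant with many periods.
    many_categories: only relevant with few periods.
    """
    if many_periods:
        return "circular area" if cyclical else "line"
    return "line" if many_categories else "column"


print(comparison_over_time_chart(many_periods=True, cyclical=True))
print(comparison_over_time_chart(many_periods=False))
```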

Topics:
big data, data analytics, data visualization, tutorial

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
