
Global Warming Caused by Pirates (and Other Bad Graph Lessons)

The cause of global warming is the crisis of declining pirate numbers. Don't believe me? Well, it must be true — there's a graph to prove it!

It's official! The stats are in, the data's been analyzed, and the true cause of global warming has finally been revealed: a worldwide crisis in declining pirate numbers.

Don't believe me? Well, it must be true, because look — there's a graph to prove it:

The problem, as you've no doubt heard before, is that you can prove anything with facts.

The data itself might be irrefutable, and top-end visualizations can bring the key trends and insights to life. But if you display it disingenuously, cherry-pick the bits you like, or mistake correlation for causation, you can seriously mislead people (or even yourself) about the truth. That can be very bad news indeed.
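To see how easily two unrelated trends can look linked, here is a minimal Python sketch. The numbers are made up purely for illustration (they are not the pirate or temperature figures from any real graph), and `pearson` is a hypothetical helper written for this example. Any series that falls steadily will correlate almost perfectly with any series that rises steadily, whether or not one has anything to do with the other.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up values, for illustration only:
# one series falls steadily, the other rises steadily.
falling = [50, 40, 30, 20, 10]
rising = [14.0, 14.2, 14.4, 14.6, 14.8]

r = pearson(falling, rising)
print(round(r, 3))  # very close to -1: a "perfect" relationship on paper
```

A coefficient that strong says only that the two lines move together over time. It says nothing about whether one causes the other.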

After all, as Lea Gaslowitz explains in this awesome TED talk, we're all used to being sold wildly exaggerated or misleading statements by advertisers and politicians, but most of us presume that stats and graphs don't lie.

This makes it way too easy for companies, media outlets, or other parties with a particular bias to add legitimacy to an otherwise suspect claim just by whacking a questionable graph next to it.

One of the simplest ways to mislead people about the results without actually interfering with the data is by distorting the scale of a graph to exaggerate or downplay change over time.

Take this graph from UK newspaper The Times, for example:

At a glance, it looks like The Times is getting more than double the number of full-price subscriptions of its competitor, the Daily Telegraph. Look a little closer, though, and you'll spot that the Y-axis values start at 420,000, not at 0.

By snipping just the top of the bar chart for display, The Times makes its modest 10% lead look much, much more dramatic than it really is.
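The arithmetic behind that trick is easy to check. The sketch below is a rough illustration in Python: the two subscription figures are hypothetical, chosen only to give a 10% lead, and `apparent_ratio` is a made-up helper name. Only the 420,000 axis start comes from the graph described above.

```python
def apparent_ratio(a, b, axis_start=0):
    """Ratio of the drawn bar lengths for values a and b
    when the value axis starts at axis_start instead of zero."""
    return (a - axis_start) / (b - axis_start)

# Hypothetical figures giving a 10% lead (not the newspapers' real numbers).
leader = 495_000
rival = 450_000

true_ratio = apparent_ratio(leader, rival)            # bars drawn from zero
drawn_ratio = apparent_ratio(leader, rival, 420_000)  # bars drawn from 420,000

print(f"true ratio: {true_ratio:.2f}, drawn ratio: {drawn_ratio:.2f}")
# prints "true ratio: 1.10, drawn ratio: 2.50"
```

With these assumed numbers, a 10% difference in the data turns into one bar that is two and a half times as long as the other.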

Or, at the other end of the scale, take climate change. Most scientists agree that a change of just a few degrees would tip us over the edge, so a rise of even half a percent is a massive deal. When you're dealing with such teeny tiny, sensitive values, a graph like this is kinda meaningless:

...as plenty of people were quick to illustrate with similar graphs of their own.

Like this quick-witted response:

Or this tongue-in-cheek gem from Business Week:

Of course, sometimes the reason a graph is so bad is a lot less sinister... and a lot more silly. Like this printing slip-up from the Washington Post:

Or this hilarious poll from the Winnipeg Sun:

So... what can you do to make sure you don't fall into the same trap? Here are some pointers to keep you on the right side of the data gods.

1. Pay Close Attention to Your Data Sources

If there are errors or inconsistencies in your data, you can't help but end up with misleading graphs. Don't skip the hard graft of checking for accuracy and cleaning, harmonizing, and streamlining your data before you start, and resist the urge to cherry-pick from conflicting data sources until you get what you want. You should have a single version of the truth to work with.

2. Choose the Right Type of Data Visualization

Should you use a bar chart? Line graph? Pie chart? Heat map? Or something else entirely? Each of these types of visualization comes with its own set of benefits and limitations, so think carefully about whether it's the right way to present this particular set of information.

3. Remember that Graphs Are Only Part of the Story

All data needs interpretation, even when you have a lovely shiny visualization to help you make sense of it. Provide as much context, explanation, and annotation as possible so that the results aren't misinterpreted, and never use a graph to push a narrative that you know the data doesn't fully support!


Published at DZone with permission of
