Data Visualization vs. Information Management

It's not enough to have an aesthetically appealing data visualization. You must also show the crucial elements of the data.

by deepsense.io Blog · Jul. 28, 15 · Tutorial

Yesterday I gave a presentation titled "Data Visualization vs. Information Management." At its core were two examples, which I present below. The punch line comes down to a simple statement: it is not enough to present data graphically; the presentation must show the crucial elements of the data. Good communication between the people making decisions and the people processing the data is a necessary condition for determining what is important and what is not.

As the first example I used the Space Shuttle Challenger disaster of 1986. The direct cause of the disaster was the low elasticity of the O-rings (sealing rings) resulting from low temperature. To cut a long story short, the weather on the day of the launch was too cold for the O-rings to function properly.

Yet tests examining O-ring damage had been conducted before the launch. So why did the engineers allow the launch despite the bad temperature conditions? Because the data presentation (and hence the understanding of the data) was bad.

The data was presented graphically in the following diagram. Silhouettes of rockets show the temperature of each test and the extent of the damage.

image1

This data visualization may be considered aesthetically pleasing and interesting, yet it has one considerable defect.

It does not show the most important facts. Edward Tufte, in his book, displayed the same data in a different manner: he plotted damage as a function of temperature. The diagram below clearly shows that the lower the temperature, the greater the damage. On the day of the launch the temperature was considerably lower than in the test conditions (30°F, that is, below 0°C). Although Tufte's diagram does not have nice pictures of rockets like the previous one, it emphatically tells us that the shuttle should not have been launched at such a low temperature.

image2
(source: Representing Industry Information Using Graphs)
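
The idea of showing damage as a function of temperature can be sketched in a few lines of Python. This is only an illustration: the records below are invented placeholders, not the real test results, and the names `by_temperature` and `pearson` are hypothetical helpers introduced for this example.

```python
from math import sqrt

# Invented placeholder records: (test label, temperature in °F, damage score).
# These are NOT the real Challenger test results.
tests = [
    ("T1", 75, 0),
    ("T2", 53, 3),
    ("T3", 67, 0),
    ("T4", 58, 1),
    ("T5", 70, 1),
    ("T6", 63, 2),
]

def by_temperature(records):
    """Return the records ordered from coldest to warmest."""
    return sorted(records, key=lambda r: r[1])

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

ordered = by_temperature(tests)
temps = [t for _, t, _ in ordered]
damage = [d for _, _, d in ordered]
r = pearson(temps, damage)

for label, temp, dmg in ordered:
    # One row per test: a bar of '#' makes the damage visible at a glance.
    print(f"{temp:>3} F  {'#' * dmg:<3}  ({label})")
print(f"correlation between temperature and damage: {r:.2f}")
```

Ordering the rows by temperature, rather than by test label or date, is the interpretive step: it puts the variable that matters on the axis the reader scans first.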

Data presentation is not about presentation alone; it is about presentation of the crucial facts. Data needs to be interpreted before it can be displayed correctly.

Let us now turn to another example: data from voting-intention polls conducted before the 2015 Polish presidential election. In tabular form the data looks as follows:

Can you read anything from that table? Most people react to such dense rows of numbers with a headache. The rest notice that support for Bronisław Komorowski in the surveys drops while support for Andrzej Duda grows. But can we observe the pace or nature of the changes? Was there a turning point, or were the changes gradual? No one can tell.

Let us plot this data. While the table is objective and does not impose any interpretation, every diagram imposes one. First, we select the data for only two candidates and present their support as a function of time.
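
As a rough sketch of that selection step, assuming the poll table is held as one row per survey: the support values, pollster names, and the helper `support_series` below are invented for illustration and are not the real poll results.

```python
from datetime import date

# Invented rows in the shape of a poll table: one row per survey.
# Support values (in percent) and pollster names are placeholders, not real results.
polls = [
    {"date": date(2015, 5, 4), "pollster": "P2", "BK": 41.0, "AD": 29.0, "PK": 14.0},
    {"date": date(2015, 4, 20), "pollster": "P1", "BK": 45.0, "AD": 27.0, "PK": 10.0},
    {"date": date(2015, 4, 27), "pollster": "P1", "BK": 43.0, "AD": 28.0, "PK": 12.0},
]

def support_series(rows, candidates):
    """Return {candidate: [(date, support), ...]} ordered by date,
    keeping only the chosen candidates."""
    series = {name: [] for name in candidates}
    for row in sorted(rows, key=lambda r: r["date"]):
        for name in candidates:
            series[name].append((row["date"], row[name]))
    return series

# Keep only two candidates; every other column of the table is dropped.
series = support_series(polls, ["BK", "AD"])
for name, points in series.items():
    print(name, [(d.isoformat(), s) for d, s in points])
```

Each resulting list of (date, support) pairs is what a scatter plot of support against time would draw, one color per candidate.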

All right, we can see a cloud of dots. The blue dots go higher and higher, and the orange dots go lower.

Let us add a trend, and let us do this separately for the periods before and after the first round of the election. The type of trend we add is our subjective choice. In this case we choose a linear function.
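
A minimal sketch of such a split linear trend, written in plain Python with an ordinary least-squares fit. The support values are invented placeholders, and `linear_fit` and `trends_around` are hypothetical helper names. The cutoff date of May 10, 2015 is the date of the first round.

```python
from datetime import date

def linear_fit(points):
    """Ordinary least-squares line through (x, y) points; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def trends_around(series, cutoff):
    """Fit one line to the polls before `cutoff` and another to those on or after it.
    Dates are converted to days since the earliest poll."""
    origin = min(d for d, _ in series)
    before = [((d - origin).days, s) for d, s in series if d < cutoff]
    after = [((d - origin).days, s) for d, s in series if d >= cutoff]
    return origin, linear_fit(before), linear_fit(after)

first_round = date(2015, 5, 10)

# Invented support values (percent) for one candidate; not real poll results.
bk = [
    (date(2015, 4, 20), 45.0),
    (date(2015, 4, 25), 44.0),
    (date(2015, 4, 30), 43.0),
    (date(2015, 5, 12), 40.0),
    (date(2015, 5, 17), 41.0),
    (date(2015, 5, 22), 42.0),
]

origin, before, after = trends_around(bk, first_round)

# Extrapolate the pre-election line to the day of the first round.
day = (first_round - origin).days
extrapolated = before[0] * day + before[1]

print(f"slope before: {before[0]:+.2f} points/day, after: {after[0]:+.2f} points/day")
print(f"pre-election trend extrapolated to the first round: {extrapolated:.1f}%")
```

Fitting the two periods separately is itself a choice: a single line through all the polls would hide any change of direction at the first round.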

We can read more and more information. Support for Bronisław Komorowski (BK) declines faster than support for Andrzej Duda (AD) increases. As we know, Paweł Kukiz gained from that difference. The trend line lets us see that the individual surveys are evenly distributed around the linear trend. It also shows that even extrapolating the survey results would not have given an accurate forecast of the first round. A day before the first round the surveys showed a 10% lead for BK, very far from the actual result (it was AD who won the first round).

This was not the result of one random poll but of many individual polls. Still, the results of the first round were far from the polls.

Answers usually provoke further questions. With these results we could ask what the trends would look like if we based our analysis on only a few polling centers.
Maybe the remaining centers made some mistakes in their calculations?
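
One way to explore that question is to recompute the trend for a chosen subset of polling centers. The sketch below assumes one row per survey with a day number and a pollster name. All values are invented, and `slope` and `trend_for_pollsters` are hypothetical helpers.

```python
def slope(points):
    """Least-squares slope of (day, support) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

def trend_for_pollsters(rows, candidate, pollsters=None):
    """Slope of support per day for `candidate`, using only the named
    pollsters (all pollsters when `pollsters` is None)."""
    chosen = [r for r in rows if pollsters is None or r["pollster"] in pollsters]
    return slope([(r["day"], r[candidate]) for r in chosen])

# Invented rows: day number, pollster name, support in percent. Not real poll results.
rows = [
    {"day": 0, "pollster": "P1", "BK": 45.0},
    {"day": 7, "pollster": "P1", "BK": 43.6},
    {"day": 14, "pollster": "P1", "BK": 42.2},
    {"day": 1, "pollster": "P2", "BK": 40.0},
    {"day": 8, "pollster": "P2", "BK": 39.3},
    {"day": 15, "pollster": "P2", "BK": 38.6},
]

p1_only = trend_for_pollsters(rows, "BK", {"P1"})
p2_only = trend_for_pollsters(rows, "BK", {"P2"})
everyone = trend_for_pollsters(rows, "BK")

print(f"P1 only: {p1_only:+.2f}  P2 only: {p2_only:+.2f}  all: {everyone:+.2f} points/day")
```

If the slopes differ noticeably between subsets, the choice of polling centers is shaping the picture, which is exactly the kind of decision the reader of a chart never sees.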

Here an interactive application would be useful, allowing us not only to view the results but also to explore them.
Yet which options should it offer? That is yet another subjective choice.

In this way we moved from a table of numbers to an interactive application. Along the way, however, we had to make several decisions that imposed a certain interpretation on the chart. Good data visualization is always an interpretation of numbers.

Let us go back to the topic of communication in companies. Usually a person who wants to make a decision on the basis of data (product manager, director, management board) orders a data extraction or a data visualization from someone else (analyst, statistician, data scientist, etc.). The more clearly that person states what data is needed and for what purpose, the greater the chance that the presentation of the results reveals the important facts instead of running aground.

Next week you will learn how to make the diagrams presented above in R and ggplot2.


Przemek Biecek


Published at DZone with permission of deepsense.io Blog, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
