Moving Beyond Data Visualization to Data Applications

Marty McFly's father was right: when you put your mind to it, you can accomplish anything. In this article, learn how to go beyond just plain data visualization.

One thing we love doing at Exaptive, aside from creating tools that facilitate innovation, is hiring intelligent, creative, and compassionate people to fill our ranks. Frank Evans is one of our data scientists. He was invited to present at the TEDxOU event on January 26, 2018.

Frank gave a great talk about how to go beyond data visualization. We wanted to share his script verbatim.

Exaptive data scientist Frank Evans speaks about using data science for discovery at the TEDxOU event in Norman, OK, on January 26, 2018. Photo by Jill Macchiaverna.

What do you see? [On the projector, Frank showed an image of a tiger camouflaged by long grass.] More than a million pixels on the screen, everything a mix of orange and brown, and in a moment, your brain constructs the information it needs. A million dollars' worth of computer technology and every computer vision machine learning trick invented can't match, as reliably or as quickly, what your brain does before you even realize it. It has taken in the individual pixels, recognized dozens of patterns simultaneously, and constructed a mental, abstract model that says in no uncertain terms: danger, run.

This is what our brains do: we take individual facts and create knowledge from them that we can use. I'm a data scientist, which is a fancy buzzword for someone who uses computer tricks to do the same thing: take a collection of individual facts and weave them together into knowledge. These "collections of individual facts" are just data.

The buzzword du jour for this particular object is "dashboard." The visualizations have remained largely unchanged over the last 75 years, even if their artistic quality has gotten markedly better. Dashboards are seen not only as the pinnacle of communicating "knowledge" to consumers, but also as its terminus: this is how we know we are done. Data analysts see the dashboard as the end product, the thing that lets the user answer the question.

Instead of building a visualization to answer a single question, we should build worlds that are explorable to answer any question. Let a user look around, browse with or without a specific purpose, and focus in on what catches their eye. I want someone to interact with data not as a consumer watching a TV show, but as a collaborator, contributing knowledge, experience, and wisdom to something that will be greater for their interaction. I don't want to stop at answering a question; I want to let them formulate new questions they may not have even known they were capable of asking.

True to good buzzword parlance, this is where data visualizations become data applications. An application is anything but static. A good data application should let you walk around in a world, "pick up" some data like an object, look it over, and perhaps discover something new. It should let you see something familiar in a completely different light, or maybe something different in a completely familiar light.

In my opinion, it's appropriate that these are called "dashboards." As with the dashboard in your car, you glance at them from time to time to make sure everything is okay and to answer specific questions: How fast am I going? Is the engine running at the right speed? Do I have enough gas? Are the oil and water... both still there? But dashboards aren't the purpose of the car; the purpose of the car is to go somewhere. You drive not with your eyes fixed on the dashboard but looking out the windshield, seeing the world around you. I am not concerned with building a better dashboard; I want to make a windshield.

Making knowledge creation active and interactive opens up two types of opportunities: the power of exploration and the power of collaboration.

I want to start with the relatively obscure: neurophysiological medical research related to post-traumatic stress disorder. The primary method of publishing, consuming, and interacting with medical research findings in general has remained largely unchanged over the past several decades: experiments of some kind are conducted, the findings and associated context are written into an article, and that article is published in a peer-reviewed journal to be read by other scientists. At their core, these findings and their context are a collection of facts, ripe for a better way of being explored.

A given article contains several fact patterns: this particular gene is associated with this particular protein; this hormone, when in abundance, prevents the expression of this other protein. Taking hundreds of these articles and encoding the fact patterns of their findings creates a very rich data set, including the necessary metadata: this experiment was conducted on 100 subjects, all of them mice; this other one was conducted on 12 human subjects and has been replicated three times.

At this point, it would be easy to summarize the data, look for a few interesting patterns, and create some data visualizations that describe the data from a few interesting perspectives. But to do so would belie what could truly be possible if the right person could explore the underlying knowledge base of these hundreds of medical research articles all at once.

We built a data application that starts from simple visualizations and then allows the user to explore the underlying knowledge base, defining what is important to them. It lets them look not only at what is there, but at what is missing and should be investigated; not only at what is connected directly, but at what is associated indirectly through pathways that run via other concepts. While the species studied may be of primary importance to one person, the number of subjects and replicability may be the focus of another.
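The talk doesn't describe how the application was implemented. Purely as an illustration of the idea, here is a minimal Python sketch, assuming each encoded finding is stored as a (source, relation, target) record carrying study metadata such as species, number of subjects, and replications. The `Finding` schema, the helpers `filter_findings` and `find_pathway`, and the placeholder concepts (`GENE_A`, `PROTEIN_B`, and so on) are all hypothetical and the data is made up: a user first filters findings by whatever metadata matters to them, then a breadth-first search looks for an indirect pathway linking two concepts.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One encoded fact pattern from an article (hypothetical schema)."""
    source: str        # e.g. a gene or hormone
    relation: str      # e.g. "associated_with", "inhibits"
    target: str        # e.g. a protein
    species: str       # study metadata: species of the subjects
    subjects: int      # study metadata: number of subjects
    replications: int  # study metadata: times the result was replicated


def filter_findings(findings, species=None, min_subjects=0, min_replications=0):
    """Keep only the findings that meet the criteria one particular user cares about."""
    return [
        f for f in findings
        if (species is None or f.species == species)
        and f.subjects >= min_subjects
        and f.replications >= min_replications
    ]


def find_pathway(findings, start, goal):
    """Breadth-first search for the shortest directed chain of findings from start to goal.

    Returns the list of findings along the chain, or None if no chain exists.
    """
    edges = {}
    for f in findings:
        edges.setdefault(f.source, []).append(f)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for f in edges.get(node, []):
            if f.target not in visited:
                visited.add(f.target)
                queue.append((f.target, path + [f]))
    return None


# Made-up placeholder data, for illustration only.
findings = [
    Finding("GENE_A", "associated_with", "PROTEIN_B", "mouse", 100, 0),
    Finding("PROTEIN_B", "regulates", "HORMONE_C", "human", 12, 3),
    Finding("HORMONE_C", "inhibits", "PROTEIN_D", "human", 40, 1),
]

if __name__ == "__main__":
    path = find_pathway(findings, "GENE_A", "PROTEIN_D")
    print(" -> ".join(["GENE_A"] + [f.target for f in path]))
    # prints: GENE_A -> PROTEIN_B -> HORMONE_C -> PROTEIN_D

    # Restricting to human studies removes the only link out of GENE_A.
    print(find_pathway(filter_findings(findings, species="human"), "GENE_A", "PROTEIN_D"))
    # prints: None
```

Filtering before searching is the design choice that reflects the point of the talk: the same knowledge base can reveal a pathway to one user and none to another, depending on which studies each is willing to trust.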

The ability to explore and interrogate a world created from the data unshackles the researcher from any single perspective or question. Pet theories, hunches, harebrained schemes, and vague curiosities can be explored vividly, quickly, and easily. Asking questions becomes so open and easy that you can ask anything you want. Aimless wandering can yield the same insights that would otherwise require relentless focus.

Making knowledge interactive isn't only about exploration; it is also about collaboration. For inspiration, I want to turn to what I think is the most successful collaborative data application in history: Wikipedia.

Wikipedia may be the world's easiest data visualization to conceive: it's just text you read like a book. But the power of Wikipedia as a data application lies in its ability to facilitate effective collaboration. The underlying technology is just a method for multiple people to build and edit documents. Perhaps it's telling that the first major thing we did with it was make one huge document to catalog our collective knowledge.

Now, Wikipedia on its own is impressive enough, and there are great visualizations showing that if Wikipedia were printed out like a regular encyclopedia, it would be the size of a small motel. But I want to set Wikipedia in its entirety aside and focus instead on one tiny corner: the part of Wikipedia related to Star Trek.

I want to compare it to the first edition of the Encyclopedia Britannica, which comprised just shy of 2,400 pages. Envisioned as a "dictionary of human knowledge," the initial edition of Britannica took years of combing through many sources of science, history, and literature. The part of Wikipedia currently related to Star Trek is roughly 12% larger than the entire first edition of the Encyclopedia Britannica.

I can feel you right now saying to yourself how depressing that is, what a massive waste of our potential and attention. But I'm not going to let this story off the hook that easily. I want to explore what that really means. Wikipedia and its underlying technology as a data application made collaboration around collecting and organizing information so much more efficient and effective that we built a knowledge base larger than the collected organized knowledge of the world a few generations before us, all about a relatively trivial television show. And we did it together, in our spare time, for free. That is not depressing — that is inspiring.

Whether or not you are willing to grant me the decidedly generous leeway needed to refer to the Star Trek Wikipedia portal as a "grand challenge of humanity," I will unapologetically label as such the underlying ability of people to collaborate cognitively in such an efficient and effective manner. Creating knowledge from facts is not a product to be consumed, but a project to be contributed to.

Tackling everything from the grand challenges of humanity to the daily challenges of modernity will require us to systematically create knowledge from facts. And to do so effectively will require exploration, interaction, and collaboration. Luckily, these are things people are naturally good at.

Humanity is curious — we love to explore and learn something new. Humanity is social — we love to work together and help each other. We love to overcome a challenge and solve problems big and small, from cataloging exactly how many red shirts were killed in the original series (43), to exactly how the human mind works.

Marty McFly's father was right: when you put your mind to it, you can accomplish anything. When we can collectively put our minds to something, we can accomplish everything. Humanity is ambitious in all endeavors. Let's run with that.

Thank you.

Published at DZone with permission of
