
The Hidden Biases of Big Data



Big Data is the thing to talk about. It has a relatively low barrier to entry for discussion and represents an incredible opportunity for a diverse array of people – data scientists, businessmen, teachers, journalists and more – to benefit from the lessons it can provide. But beneath all of the hype and excitement lie some very real pitfalls.

Along the same lines as Scientific American's caution against following the Big Data hype comes an article from the Harvard Business Review, which uses concrete examples to dismantle the pervasive idea that Big Data is the savior of proactive problem-solving.

The hype becomes problematic when it leads to what I call “data fundamentalism,” the notion that correlation always indicates causation, and that massive data sets and predictive analytics always reflect objective truth.

The Review uses the example of data gathered from social media during Hurricane Sandy. The data showed a high concentration of tweets and other mobile activity in and around New York City, which supported conclusions such as a spike in grocery store visits on the eve of the storm. But that data was not representative of the areas hit hardest by the storm.

Data and data sets are not objective; they are creations of human design. We give numbers their voice, draw inferences from them, and define their meaning through our interpretations. Hidden biases in both the collection and analysis stages present considerable risks, and are as important to the big-data equation as the numbers themselves.

One of the primary dangers in misinterpreting Big Data is a misallocation of public resources, the Review says.

While massive data sets may feel very abstract, they are intricately linked to physical place and human culture. And places, like people, have their own individual character and grain.

While it may be easy to draw surprising conclusions when digging into Big Data, the numbers aren't always right.



Opinions expressed by DZone contributors are their own.
