The Parable of Google Flu

In a white paper made publicly available by Harvard University, researchers broach the topic of Google Flu Trends – commonly hailed as an innovative and thorough application of Big Data – and some of its shortcomings. 

The subject is a familiar one: Big Data is trendy right now, and many hail it as the next big thing in public health, marketing, and any other field that could benefit from thoughtful, analytics-based insights. But the researchers caution against Big Data hubris:

“Big data hubris” is the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis. We have asserted that there are enormous scientific possibilities in big data. However, quantity of data does not mean that one can ignore foundational issues of measurement, construct validity and reliability, and dependencies among data. The core challenge is that most big data that have received popular attention are not the output of instruments designed to produce valid and reliable data amenable for scientific analysis.

In the Google Flu Trends example, a model intended to anticipate the traditional flu-tracking reports of the Centers for Disease Control and Prevention (CDC) instead substantially overestimated the spread of the flu. It offers a cautionary tale about what is useful in Big Data and how we should approach it in the future: by digging deep into the algorithms, being transparent about analytic methods, and not relying solely on the size of the data as a panacea for all problems. (This comes on the heels of considerable criticism that Big Data encourages confirmation bias.)

You can read the white paper here. BBC News also covered the story with a comprehensive piece that looks at the public response.

