
The Parable of Google Flu



In a white paper made publicly available by Harvard University, researchers broach the topic of Google Flu Trends – commonly hailed as an innovative and thorough application of Big Data – and some of its shortcomings. 

It's a familiar subject: Big Data is trendy right now, and many hail it as the next big thing in public health, marketing, and any field that could benefit from thoughtful, analytics-based insights. But the researchers caution against Big Data hubris:

“Big data hubris” is the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis. We have asserted that there are enormous scientific possibilities in big data. However, quantity of data does not mean that one can ignore foundational issues of measurement, construct validity and reliability, and dependencies among data. The core challenge is that most big data that have received popular attention are not the output of instruments designed to produce valid and reliable data amenable for scientific analysis.

In the Google Flu Trends example, a model intended to anticipate the flu-season figures reported by the Centers for Disease Control and Prevention (CDC) instead substantially overestimated the spread of the flu. It provides a cautionary tale about what is useful in Big Data and how we should approach it in the future: by digging deep into the algorithms, being transparent about analytic methods, and not relying solely on the size of the data as a panacea for all problems. (This comes on the heels of considerable criticism that Big Data encourages confirmation bias.)

You can read the white paper here. BBC News also covered the story with a comprehensive piece that looks at the public response.




Opinions expressed by DZone contributors are their own.
