In a white paper made publicly available by Harvard University, researchers broach the topic of Google Flu Trends – commonly hailed as an innovative and thorough application of Big Data – and some of its shortcomings.
It's a familiar subject: Big Data is trendy right now, and many hail it as the next big thing in public health, marketing, and any field that could benefit from thoughtful, analytics-based insights. But the researchers caution against Big Data hubris:
“Big data hubris” is the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis. We have asserted that there are enormous scientific possibilities in big data. However, quantity of data does not mean that one can ignore foundational issues of measurement, construct validity and reliability, and dependencies among data. The core challenge is that most big data that have received popular attention are not the output of instruments designed to produce valid and reliable data amenable for scientific analysis.
In the Google Flu Trends example, a model that was intended to predict the output of traditional Centers for Disease Control and Prevention (CDC) models that track the spread of disease during flu season instead substantially overestimated the spread of the flu. It provides a cautionary tale about what is useful about Big Data and how we should approach it in the future: by digging deep into the algorithms, being transparent about analytic methods, and not relying solely on the size of the data to be a panacea for all problems. (This comes on the heels of a lot of criticism of Big Data for encouraging confirmation bias.)