Over a million developers have joined DZone.

A Statistical Problem with "Nothing to Hide"

· Big Data Zone

Learn how you can maximize big data in the cloud with Apache Hadoop. Download this eBook now. Brought to you in partnership with Hortonworks.

One problem with the nothing-to-hide argument is that it assumes innocent people will be exonerated certainly and effortlessly. That is, it assumes that there are no errors, or if there are, they are resolved quickly and easily.

Suppose the probability of a correctly analyzing an email or phone call is not 100% but 99.99%. In other words, there’s one chance in 10,000 of an innocent message being incriminating. Imagine authorities analyzing one message each from 300,000,000 people, roughly the population of the United States. Then around 30,000 innocent people will have some ‘splaining to do. They will have to interrupt their dinner to answer questions from an agent knocking on their door, or maybe they’ll spend a few weeks in custody. If the legal system is 99.99% reliable, then three of them will go to prison.

Now suppose false positives are really rare, one in a million. If you analyze 100 messages from each person rather than just one, you’re approximately back to the scenario above.

Scientists call indiscriminately looking through large amounts of data “a fishing expedition” or “data dredging.” One way to mitigate the problem of massive false positives from data dredging is to demand a hypothesis: before you look through the data, say what you’re hoping to prove and why you think it’s plausible.

The legal analog of a plausible hypothesis is a search warrant. In statistical terms, “probable cause” is a judge’s estimation that the prior probability of a hypothesis is moderately high. Requiring scientists to have a hypothesis and requiring law enforcement to have a search warrant both dramatically reduce the number of false positives.

Hortonworks DataFlow is an integrated platform that makes data ingestion fast, easy, and secure. Download the white paper now.  Brought to you in partnership with Hortonworks

Topics:

Published at DZone with permission of John Cook, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}