
Big Data’s Industrial Problems of Pollution, Waste and Leakage


[This post was originally written by Jeanne Roue-Taylor]

For centuries now, we’ve created goods through mass production, using machines, assembly lines and ever-larger ways to melt, mix, cut, stamp, rivet and paste. We’ve gotten used to using many materials to create something of higher value, with the waste being given off as liquid, gases and material headed for a landfill, or worse, put in containers to end up in a mountain in Nevada.

Why would we expect ‘clean’ technology — information — to be any different?

Strangely enough, the same thing is happening with data, and the more data we have (truly Big Data), the more we feel the unintended side effects of data leakage, outright pollution, and data waste. They’re just electrons, you think? They’re easily secured or destroyed, you say? It isn’t quite that simple.

Google and others love our data waste

Search data is a great example of our data waste. We search on any term we desire, blissfully ignorant of the implications of giving detailed insight into our private matters that is being stored as our “search flotsam.” This data is amazingly good at defining our interests, thus our propensities, far more than we realize.

The NSA sniffs our data pollution

When we walk around with our smartphones in our pockets, the data leakage of every cell phone tower check-in is highly interesting to governments (and others) who’d like to understand exactly where we go and when, and who else is at that same spot. We’re leaking data without realizing it, and that leakage gives up more privacy than we’d ever imagine.

Businesses leak data

And just like a poorly designed industrial process, businesses leak data through their employees’ habits, their poor data hygiene, and outright fraud and corporate espionage conducted against unsecured systems. Some companies are hemorrhaging data without realizing it until it makes the news. Many more are on the verge of losing data they’ll never realize is gone.

So what now?

In the first century of the industrial revolution, we gave industrialists a free hand with polluting our world because we craved the benefits and didn’t feel the effects right away. Acid rain, dead rivers and lakes and sick humans changed our minds and we got tougher about pollution. There will come a point in the future, who’s to say how far off, where we take the same approach with data that we took to pollution…we’ll realize that the data byproducts of our modern life have an impact on our safety, our privacy and possibly our quality of life.

At that point, we’ll work to contain the damage and we’ll find out what’s repairable and what’s not. We’ll be surprised by what can’t be put back in the bottle, and we’ll make choices that won’t be driven simply by convenience and up-front cost.



Published at DZone with permission of Christopher Taylor, DZone MVB. See the original article here.

