Big Data’s Industrial Problems of Pollution, Waste and Leakage

[This post was originally written by Jeanne Roue-Taylor]

For centuries now, we’ve created goods through mass production, using machines, assembly lines and ever-larger ways to melt, mix, cut, stamp, rivet and paste. We’ve gotten used to consuming many materials to create something of higher value, with the waste given off as liquids, gases and solid material headed for a landfill or, worse, packed into containers that end up inside a mountain in Nevada.

Why would we expect ‘clean’ technology — information — to be any different?

Strangely enough, the same thing is happening with data. The more data we have (really Big Data), the more we feel the unintended side effects of data leakage, outright pollution and data waste. They’re just electrons, you think? They’re easily secured or destroyed, you say? It isn’t quite that simple.

Google and others love our data waste

Search data is a great example of our data waste. We search on any term we desire, blissfully ignorant of the implications of giving detailed insight into our private matters that is being stored as our “search flotsam.” This data is amazingly good at defining our interests, thus our propensities, far more than we realize.

The NSA sniffs our data pollution

When we walk around with our smartphones in our pockets, the data leaked by every cell phone tower check-in is highly interesting to governments (and others) who’d like to understand exactly where we go and when, and who else is at that same spot. We’re leaking data we don’t realize we’re leaking, and it gives up more privacy than we’d ever imagine.

Businesses leak data

And just like a poorly designed industrial process, businesses leak data through their employees’ habits, their poor data hygiene, and outright fraud and corporate espionage conducted against unsecured systems. Some companies are hemorrhaging data without realizing it until it makes the news. Many more are on the verge of losing data they’ll never realize is gone.

So what now?

In the first century of the industrial revolution, we gave industrialists a free hand with polluting our world because we craved the benefits and didn’t feel the effects right away. Acid rain, dead rivers and lakes and sick humans changed our minds and we got tougher about pollution. There will come a point in the future, who’s to say how far off, where we take the same approach with data that we took to pollution…we’ll realize that the data byproducts of our modern life have an impact on our safety, our privacy and possibly our quality of life.

At that point, we’ll work to contain the damage and find out what’s repairable and what’s not. We’ll be surprised by what can’t be put back in the bottle, and we’ll make choices that aren’t driven simply by convenience and up-front cost.

Published at DZone with permission of