
Nearly 90 Percent of Enterprises Report Polluted Data Stores


Sam Lewis reports on a survey of 300 data management professionals that suggests enterprises are struggling to manage big data flows.


StreamSets, a data performance management company, released the results of a global survey of more than 300 data management professionals conducted by Dimensional Research. The survey found that enterprises are struggling to manage big data flows as the sources of today's big data are constantly in motion.

The study showed that enterprises of all sizes face challenges on a range of key data performance management issues from stopping bad data to keeping data flows operating effectively. In particular, nearly 90 percent of respondents report flowing bad data into their data stores while just 12 percent consider themselves good at the key aspects of data flow performance management.

"In today's world of real-time analytics, data flows are the lifeblood of an enterprise," said Girish Pancha, CEO, StreamSets. "The industry has long been fixated on managing data at rest and this myopia creates a real risk for enterprises as they attempt to harness big and fast data. It is imperative that we shift our mindset towards building continuous data operations capabilities that are in tune with the time-sensitive, dynamic nature of today's data."

For developers, the survey found that low-level coding or use of schema-driven ETL tools, in combination with the changing nature of data, created ample opportunities for big data to turn into bad data. Enterprises are constantly tweaking data pipelines: Eighty-five percent of respondents said that unexpected changes to data structure or semantics create a substantial operational impact. Over half (53%) reported that they have to alter each data flow pipeline several times a month, with 23% making changes several times a week or more.

Making frequent changes to pipelines with these inflexible approaches is not only highly inefficient but also error-prone. Moreover, such tools offer no visibility into data in motion, leaving teams flying blind and unable to detect data quality or data flow issues as they occur.
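To make the drift problem concrete, here is a minimal sketch of how a pipeline might flag unexpected structural changes in records as they flow through, rather than discovering them later in the data store. The field names and record shapes are purely illustrative assumptions, not taken from any particular tool.

```python
def detect_drift(expected_fields, record):
    """Compare a record's fields against the expected schema.

    Returns (missing, unexpected): fields the record lacks,
    and fields the schema does not know about.
    """
    actual = set(record)
    missing = expected_fields - actual
    unexpected = actual - expected_fields
    return missing, unexpected


# Illustrative schema and records for an event stream.
expected = {"user_id", "event", "timestamp"}

records = [
    {"user_id": 1, "event": "click", "timestamp": 1700000000},
    {"user_id": 2, "event": "click", "ts": 1700000001},  # field renamed upstream: drift
]

for record in records:
    missing, unexpected = detect_drift(expected, record)
    if missing or unexpected:
        print(f"drift detected: missing={missing} unexpected={unexpected}")
```

A check like this, run inline on the stream, is what turns a silent semantic change (such as `timestamp` becoming `ts`) into an immediate alert instead of a polluted table.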

This pervasive data pollution implies that analytic results may be wrong, leading to false insights that drive poor business decisions. Even if companies can detect their bad data, the process of cleaning it after the fact wastes data scientists' time and delays its use, which is deadly in a world increasingly reliant on real-time analysis.
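The alternative to after-the-fact cleanup is validating records in-flight and routing failures to a quarantine for review, so bad data never reaches the main store. A minimal sketch of that routing pattern, with assumed field names and validation rules chosen only for illustration:

```python
def is_valid(record):
    """Illustrative validation rule: typed user_id and a non-negative amount."""
    return (
        isinstance(record.get("user_id"), int)
        and isinstance(record.get("amount"), (int, float))
        and record["amount"] >= 0
    )


def route(records):
    """Split a batch into records fit for the data store and quarantined ones."""
    clean, quarantine = [], []
    for record in records:
        (clean if is_valid(record) else quarantine).append(record)
    return clean, quarantine


records = [
    {"user_id": 1, "amount": 9.99},
    {"user_id": "2", "amount": 5.00},  # wrong type: quarantined
    {"user_id": 3, "amount": -1.0},    # negative amount: quarantined
]

clean, bad = route(records)
```

Quarantining preserves the bad records for diagnosis while keeping downstream analytics running on trustworthy data only.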

For more information and complete survey results, please visit https://streamsets.com/big-data-global-survey/



Opinions expressed by DZone contributors are their own.

