Nearly 90 Percent of Enterprises Report Polluted Data Stores
Sam Lewis reports on a survey of 300 data management professionals that suggests enterprises are struggling to manage big data flows.
StreamSets, a data performance management company, released the results of a global survey of more than 300 data management professionals conducted by Dimensional Research. The survey found that enterprises are struggling to manage big data flows because the sources of today's big data are constantly changing.
The study showed that enterprises of all sizes face challenges across a range of key data performance management issues, from stopping bad data to keeping data flows running effectively. In particular, nearly 90 percent of respondents report that bad data flows into their data stores, while just 12 percent consider themselves good at the key aspects of data flow performance management.
"In today's world of real-time analytics, data flows are the lifeblood of an enterprise," said Girish Pancha, CEO, StreamSets. "The industry has long been fixated on managing data at rest and this myopia creates a real risk for enterprises as they attempt to harness big and fast data. It is imperative that we shift our mindset towards building continuous data operations capabilities that are in tune with the time-sensitive, dynamic nature of today's data."
For developers, the survey found that relying on low-level coding or schema-driven ETL tools, combined with the changing nature of data, creates ample opportunities for big data to turn into bad data. Enterprises are constantly tweaking data pipelines: 85 percent of respondents said that unexpected changes to data structure or semantics have a substantial operational impact. Over half (53 percent) reported that they have to alter each data flow pipeline several times a month, and 23 percent make changes several times a week or more.
Making frequent changes to pipelines with these inflexible approaches is not only highly inefficient but also error-prone. These tools also do not let you watch data in motion, which means you are flying blind and cannot detect data quality or data flow issues.
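The survey does not describe a specific mechanism, but the idea of watching data in motion can be illustrated with a minimal sketch. It assumes records arrive as Python dicts and that the expected structure is known up front. The schema, field names, and helpers (`EXPECTED_SCHEMA`, `validate_record`, `route_records`) are hypothetical and are not part of StreamSets or any particular tool. Each record is checked in flight, and records whose structure has drifted are diverted before they can pollute a data store.

```python
from __future__ import annotations

from typing import Any, Iterable

# Hypothetical expected structure for one record in a data flow.
# Type checks are strict: an int value does not satisfy a float field.
EXPECTED_SCHEMA: dict[str, type] = {"id": int, "email": str, "amount": float}


def validate_record(record: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return the problems found in one record (an empty list means it conforms)."""
    problems: list[str] = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field '{field}'")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"field '{field}' is {type(record[field]).__name__}, "
                f"expected {expected_type.__name__}"
            )
    # Fields the schema does not know about signal structural drift.
    for field in sorted(record.keys() - schema.keys()):
        problems.append(f"unexpected field '{field}'")
    return problems


def route_records(
    records: Iterable[dict[str, Any]], schema: dict[str, type]
) -> tuple[list[dict[str, Any]], list[tuple[dict[str, Any], list[str]]]]:
    """Split a stream into clean records and rejected (record, problems) pairs."""
    clean: list[dict[str, Any]] = []
    rejected: list[tuple[dict[str, Any], list[str]]] = []
    for record in records:
        problems = validate_record(record, schema)
        if problems:
            rejected.append((record, problems))
        else:
            clean.append(record)
    return clean, rejected


# Example stream: the second record changed a type, the third gained a field.
stream = [
    {"id": 1, "email": "a@example.com", "amount": 9.99},
    {"id": "2", "email": "b@example.com", "amount": 5.0},
    {"id": 3, "email": "c@example.com", "amount": 1.5, "currency": "USD"},
]
clean, rejected = route_records(stream, EXPECTED_SCHEMA)
print(len(clean), len(rejected))  # 1 2
```

Rejecting drifted records at ingest, rather than cleaning them after they land, is one way to avoid the after-the-fact cleanup the survey describes; a real pipeline would typically send rejected records to an error store for inspection instead of keeping them in memory.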
The result is pervasive data pollution, which means analytic results may be wrong, leading to false insights that drive poor business decisions. Even when companies can detect their bad data, cleaning it after the fact wastes data scientists' time and delays its use, which is deadly in a world increasingly reliant on real-time analysis.
For more information and complete survey results, please visit https://streamsets.com/big-data-global-survey/
Opinions expressed by DZone contributors are their own.