Concerns With Big Data
Concerns regarding the state of big data revolve around security, privacy, quality, volume, and the business problem to be solved.
To understand the current and future state of big data, we spoke to 31 IT executives from 28 organizations. We asked them, "Do you have any concerns regarding the state of big data?" Here's what they told us:
- Not enough emphasis on quality and contextual relevance. The trend in technology is to collect more raw data closer to the end user. The danger is that data in raw format has quality issues, and reducing the gap between the end user and the raw data increases those issues. It's great that the middle is being streamlined, but the focus on data quality has to be maintained. Once you start handing processing over to AI/ML, you need an understanding of the data: its quality, format, and context become more important.
- As analytics speeds up, there is a need for faster access to data. Humans are starting to be removed from the process. Where is the oversight? How do we know the data being used to drive analytics and operations should be used? How do we know that the algorithms are proper and ethical and unbiased and that they are continuing to execute in those ways? What happens when “bad data” gets into the system, even accidentally? Will it be discovered and rejected, or will it be processed with all resulting actions being tainted? Those are some concerns for where we are with big data right now and issues that need to be addressed.
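The "discovered and rejected" path raised above can be pictured as a small quality gate placed in front of the analytics step. What follows is a minimal Python sketch, not something the interviewees described: the `user_id` and `amount` fields, the rule names, and the `quality_gate` function are all hypothetical, and real rules would depend on the data and its business context.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


def has_user_id(record: Record) -> bool:
    # A record without a non-empty user_id cannot be attributed to anyone.
    return bool(record.get("user_id"))


def amount_is_number(record: Record) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    value = record.get("amount")
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def amount_non_negative(record: Record) -> bool:
    # Only meaningful for numeric amounts; non-numeric values fail this rule too.
    return amount_is_number(record) and record["amount"] >= 0


# Hypothetical rule set, checked in this order for every record.
RULES: Dict[str, Callable[[Record], bool]] = {
    "has_user_id": has_user_id,
    "amount_is_number": amount_is_number,
    "amount_non_negative": amount_non_negative,
}


def quality_gate(
    records: List[Record],
) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into accepted ones and quarantined ones.

    Quarantined records are kept together with the names of the rules they
    failed, so a person can review them instead of the data silently
    flowing into downstream analytics.
    """
    accepted: List[Record] = []
    quarantined: List[Tuple[Record, List[str]]] = []
    for record in records:
        failures = [name for name, rule in RULES.items() if not rule(record)]
        if failures:
            quarantined.append((record, failures))
        else:
            accepted.append(record)
    return accepted, quarantined


if __name__ == "__main__":
    sample = [
        {"user_id": "u1", "amount": 12.5},
        {"user_id": "", "amount": 3},
        {"user_id": "u3", "amount": -1},
    ]
    good, bad = quality_gate(sample)
    print(len(good), "accepted;", len(bad), "quarantined")
    for record, failures in bad:
        print(record, "failed:", failures)
```

Keeping the failed rule names next to each quarantined record is a deliberate choice in this sketch: it leaves a trail that a human reviewer can audit, which speaks to the oversight question in the bullet above.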
Amount of Data
- I'm more concerned with the hyperbole around AI/ML. We need to get back to solving problems and creating value. Big data in general has been through the hype arc, and AI/ML is in it now. We need to create value from data.
- One concern is market disillusionment. There has been so much hype about big data that some organizations have unrealistic expectations, and as the hype has morphed into hype about machine learning and AI, there is a risk that projects will lose their mandate or that failed projects will cause a backlash. This is particularly true with data lake initiatives, which too often start without a clear application in mind and become data swamps that don’t visibly deliver value.
- I personally worry about the ethical treatment of data. We're still in a mode where we're eager to get our arms around everything versus looking at the long-term implications of how data is being used. What’s acceptable and what’s not? Businesses are where some of these acquisitions in open source are going (Red Hat, Cloudera). How does the platform space evolve from there? At the end of the day, big data as a concept survives. How it is implemented is likely to change.
- The state of big data is in flux, with a lot of experimentation going on. Here are my top three concerns with where it’s headed: 1) Collapse of the Hadoop market: While touted as a silver bullet that offers an economical solution to big data, Hadoop hasn’t lived up to its hype, and we see all the vendors pivoting to AI and ML next. 2) Buzzword bingo: Another concern I have regarding big data is that all solutions sound the same. One of the things I keep hearing from our customers is that they need to try things before they buy. They see “buzzword bingo” played by so many big data vendors that they won’t trust any of them going forward. 3) NoSQL not living up to its hype: NoSQL claims that its scale-out architecture addresses the web-scale issues that have plagued RDBMSs for 40+ years. However, NoSQL databases are starting to fail just like Hadoop. They give up SQL and ACID in the process of scaling out. That’s like throwing the baby out with the bathwater and not something customers want.
Here’s who we spoke to:
- Cheryl Martin, V.P. Research Chief Data Scientist, Alegion
- Adam Smith, COO, Automated Insights
- Amy O’Connor, Chief Data and Information Officer, Cloudera
- Colin Britton, Chief Strategy Officer, Devo
- OJ Ngo, CTO and Co-founder, DH2i
- Alan Weintraub, Office of the CTO, DocAuthority
- Kelly Stirman, CMO and V.P. of Strategy, Dremio
- Dennis Duckworth, Director of Product Marketing, Fauna
- Nikita Ivanov, founder and CTO, GridGain Systems
- Tom Zawacki, Chief Digital Officer, Infogroup
- Ramesh Menon, Vice President, Product, Infoworks
- Ben Slater, Chief Product Officer, Instaclustr
- Jeff Fried, Director of Product Management, InterSystems
- Bob Hollander, Senior Vice President, Services & Business Development, InterVision
- Ilya Pupko, Chief Architect, Jitterbit
- Rosaria Silipo, Principal Data Scientist, and Tobias Koetter, Big Data Manager and Head of Berlin Office, KNIME
- Bill Peterson, V.P. Industry Solutions, MapR
- Jeff Healey, Vertica Product Marketing, Micro Focus
- Derek Smith, CTO and Co-founder, and Katie Horvath, CEO, Naveego
- Michael LaFleur, Global Head of Solution Architecture, Provenir
- Stephen Blum, CTO, PubNub
- Scott Parker, Director of Product Marketing, Sinequa
- Clarke Patterson, Head of Product Marketing, StreamSets
- Bob Eve, Senior Director, TIBCO
- Yu Xu, Founder and CEO, and Todd Blaschka, CTO, TigerGraph
- Bala Venkatrao, V.P. of Product, Unravel
- Madhup Mishra, VP of Product Marketing, VoltDB
- Alex Gorelik, Founder and CTO, Waterline Data