Issues Preventing Big Data Success
Lack of skilled data professionals (i.e., resources and internal technical ability) is the biggest issue. There's also a lack of high-value business cases.
To gather insights on the state of Big Data today, we spoke with 22 executives from 20 companies who are working in big data themselves or providing big data solutions to clients.
Here's what they told us when we asked, "What are the most common issues you see preventing companies from realizing the benefits of big data?"
- Belief that if you build a big data lake, the results will become obvious. Data management is an issue. Plan with expected outcomes and the insights you want to achieve. Think about how to do more advanced analytics. Use the right tool for the job. Identify what you want to use in the data warehouse.
- Companies do not understand what big data is at the business level. They have not identified the business problem they need to solve. Understand what’s working and what you can do to add value.
- Half of an IT project is integrating the application. Get access. How to cleanse and apply data governance. Seeing the two converge. Who has the capability for you to outsource to? Barrier to entry can be high with Hadoop and Cassandra. Platforms offer less costly access.
- Different formats need to be normalized, insights gathered, tagged and put in a searchable format.
- One common issue is simply underestimating the difficulty of implementing a fully functioning big data system. There are lots of great tools out there that will get you started, and lots of open-source software that is great for sandboxing. But standing up a production-grade big data system is a whole different ballgame. And keeping that system up and running and moving it forward as business needs change is yet another major challenge. We hear the same story again and again. They learn about our big data solution, and say, “Thanks for the idea. We have some big data experience, and we think we can build that ourselves.” Often, those same teams are back knocking on our door in a few months saying, “That was a lot harder than we thought it would be.”
- Ability to dynamically connect different sources keeping humans out of the process as much as possible so they can focus on higher level activities.
- Complexity exacerbated by the skill needed to integrate and operationalize the data. Try to get all the data together so you can change the 80:20 ratio of getting access to data versus analyzing it for insights.
- You cannot find the data you’re looking for because there’s too much of it. File names are cryptic. Afraid to give people access to data because you’re not sure what’s in the data. Hadoop is very hard. You need to ingest, catalog, and wrangle the data. There’s an additional layer of capabilities to address problems. Cataloging is one of many pieces.
- Inertia. Not getting started.
- It varies by the company’s aptitude. Perception of big data clusters is 10 to 50 nodes; only a handful of customers have thousands of nodes. Get up and running and stay abreast of releases. Standardization of tools became extra work.
- Cultural: large companies benefit from big data analytics. Get away from assumptions that projects must succeed. Allow for failure and learning. Allow for iteration and experimentation. Innovation leaders like Siemens and Philips can show the business team how successful you can be when you allow for failure.
- Fixation on a particular technology. Determine what problem you are trying to solve now and be prepared to move over time.
- Having the right people. The talent issue is huge. We have a qualified candidate crisis. Data scientists must keep their skills at the cutting edge and know what tools are evolving to solve their problems.
- They need guidance. The ecosystem is moving quickly and you have to be on the bleeding edge to know what’s the optimal solution to the problem. Spark requires a different architecture, going from storage-intensive to compute-intensive. It’s more difficult for a traditional enterprise with legacy systems. They tend to move more slowly and methodically and have slower adoption. We’ve created a team of business value consultants for banks and healthcare companies. Have clients set specific goals (e.g., reduce churn by 4%), meet or beat the goals, and then move to the next project. The speed of movement in open source is overwhelming for most people. You need to know what’s coming next so you can plan accordingly. We’re driving open standards so customers can be more flexible and have the wherewithal in the market with more skill sets and portability. Promise for flexibility with big data in the cloud and on-premises.
- Lack of high-value business use cases. A lot of marketing implies that use cases and planning are obsolete – ad hoc is just fine. We could not disagree more. You need repeatable and scalable processes. We take open source and put an abstraction layer over it so that the users on the business side can search for what’s most important to them.
- People don’t believe in it or people believe in it blindly without thinking through and evaluating the tools and technologies they need to accomplish a specific goal. We run workshops to help identify possibilities and frameworks.
- Lack of resources and internal technical ability. Everyone needs to understand what people are doing on their site and blog. There are several good products to tell you these things, like Mixpanel and Google Analytics, where you don’t need a data scientist.
- Data residing in silos: Too difficult to integrate and extract meaningful insights in a timely fashion. Store and forget approach to big data: no clear strategy for analyzing big data for business benefits. Skill set gap: big data systems/tools are too complicated to use for most employees.
- Fear of legal concerns when collecting data that involves the behavior of specific individuals. In B2B, this is a real concern. The “is data good enough” question always comes into play. This is a valid concern – but not doing anything does not answer the question. Jump into it and you will learn. And if you fail, you will know where your data collection should improve. Companies do understand the use cases that can be applied – but it is a new type of project and there are not that many system integrators that currently can support them.
- Inability to define clear business objectives. Access to people with the skill sets to achieve the goals. There aren’t enough people who have the knowledge and the experience required to deliver big data projects. A software engineer must not only understand the concepts and the possibilities but also how to deliver them. People often think they need a data scientist, but they need product owners, a data engineering team, a data scientist and so on.
What issues do you see preventing companies from realizing the benefits of big data?
By the way, here’s who we talked to!
- Nitin Tyagi, Vice President Enterprise Solutions, Cambridge Technology Enterprises.
- Ryan Lippert, Senior Marketing Manager, and Sean Anderson, Senior Product Marketing Manager, Cloudera.
- Sanjay Jagad, Senior Manager, Product Marketing, Coho Data.
- Amy Williams, COO, Data Conversion Laboratory (DCL).
- Andrew Brust, Senior Director Market Strategy and Intelligence, Datameer.
- Eric Haller, Executive Vice President, Experian DataLabs.
- Julie Lockner, Global Product Marketing, Data Platforms, InterSystems.
- Jim Frey, V.P. Strategic Alliances, Kentik.
- Eric Mizell, Vice President Global Engineering, Kinetica.
- Rob Consoli, Chief Revenue Officer, Liaison.
- Dale Kim, Senior Director of Industrial Solutions, MapR.
- Chris Cheney, CTO, MPP Global.
- Amit Satoor, Senior Director, Product and Solution Marketing, SAP.
- Guy Levy-Yurista, Head of Product, Sisense.
- Jon Bock, Vice President of Product and Marketing, Snowflake Computing.
- Bob Brodie, CTO, SUMOHeavy.
- Kim Hanmark, Director of Professional Services EMEA, TARGIT.
- Dennis Duckworth, Director of Product Marketing, VoltDB.
- Alex Gorelik, Founder and CEO, and Todd Goldman, CMO, Waterline Data.
- Oliver Robinson, Director and Co-Founder, World Programming.