To gather insights on the state of Big Data today, we spoke with 22 executives from 20 companies who are working with Big Data themselves or providing Big Data solutions to clients. Here’s who we talked to:
Nitin Tyagi, Vice President Enterprise Solutions, Cambridge Technology Enterprises
Ryan Lippert, Senior Marketing Manager, Cloudera
Sean Anderson, Senior Product Marketing Manager, Cloudera
Sanjay Jagad, Senior Manager, Product Marketing, Coho Data
Amy Williams, COO, Data Conversion Laboratory (DCL)
Andrew Brust, Senior Director Market Strategy and Intelligence, Datameer
Eric Haller, Executive Vice President, Experian DataLabs
Julie Lockner, Global Product Marketing, Data Platforms, InterSystems
Eric Mizell, Vice President Global Engineering, Kinetica
Jim Frey, V.P. Strategic Alliances, Kentik
Rob Consoli, Chief Revenue Officer, Liaison
Dale Kim, Senior Director of Industrial Solutions, MapR
Chris Cheney, CTO, MPP Global
Amit Satoor, Senior Director, Product and Solution Marketing, SAP
Guy Levy-Yurista, Head of Product, Sisense
Jon Bock, Vice President of Product and Marketing, Snowflake Computing
Bob Brodie, CTO, SUMOHeavy
Kim Hanmark, Director of Professional Services EMEA, TARGIT
Dennis Duckworth, Director of Product Marketing, VoltDB
Alex Gorelik, Founder and CEO, Waterline Data
Todd Goldman, CMO, Waterline Data
Oliver Robinson, Director and Co-Founder, World Programming
Here are the key findings from the subjects we covered:
The key to a successful Big Data strategy is knowing what problem you are trying to solve before you invest in software and tools. Without a clear problem statement and the metrics that will define success, you cannot specify the software and tools that will help you achieve your goals. The second key, closely related to the first, is knowing what insights you are looking for and the value you are attempting to bring to your business. The more specific you are about the business need and the problem, the more likely you are to solve it. Pursuing a “Big Data” strategy simply because you are collecting a lot of data will waste money and time. A Big Data initiative is not an inexpensive proposition, so identify a specific use case, execute the solution, demonstrate the value provided, and then move to the next use case.
The 80 percent of companies that aren’t getting more out of Big Data can start with strategic planning. Know what you're going to do with the information you’ve collected, the insights you want to uncover, and the source of the data, and understand that the data must be cleaned and prepped before it can be integrated with other data. Empower others in the organization to access the data. Ultimately, you want to enable real-time decision-making at every level of the business; however, you need to implement several successful use cases before you can achieve this goal. Crawl, walk, then run.
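To make the cleaning-and-prep step concrete, here is a minimal Python sketch. The field names (`id`, `name`, `email`) and the normalization rules are invented for illustration; real pipelines would apply domain-specific rules before integrating sources.

```python
import csv
import io
import re

def clean_records(raw_csv):
    """Normalize hypothetical customer records before integration:
    trim whitespace, collapse internal spaces, lowercase emails,
    and drop rows missing an ID (they can't be joined later)."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        cust_id = row.get("id", "").strip()
        if not cust_id:  # incomplete rows are unusable downstream
            continue
        cleaned.append({
            "id": cust_id,
            "name": re.sub(r"\s+", " ", row.get("name", "").strip()),
            "email": row.get("email", "").strip().lower(),
        })
    return cleaned

raw = ("id,name,email\n"
       " 1 , Ada   Lovelace ,ADA@EXAMPLE.COM\n"
       ",missing,skip@example.com\n")
print(clean_records(raw))
```

Even a toy version like this shows why prep belongs before integration: the messy row is repaired and the unjoinable one is discarded before either can pollute downstream analysis.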
The biggest change in Big Data over the past year has been the uptick in real-time streaming of data and in ingestion engines that can handle the volume and scale of that data. Streams are part of a Big Data strategy and help break down silos. With machine learning and natural language processing, Big Data is becoming accessible to everyone. At least one company is already enabling its clients to use Alexa and natural language processing to run queries and obtain insights from oceans of data.
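The streaming-ingestion pattern respondents describe can be illustrated with a toy Python sketch. Real deployments would sit behind an engine such as Kafka, Storm, or Spark Streaming; the event shape and window size below are assumptions made for illustration.

```python
from collections import Counter

def ingest(stream, window=3):
    """Toy ingestion loop: consume events one at a time, keep a
    running count per event type, and emit a point-in-time snapshot
    every `window` events (the kind of view a dashboard would poll)."""
    counts = Counter()
    snapshots = []
    for i, event in enumerate(stream, start=1):
        counts[event["type"]] += 1
        if i % window == 0:
            snapshots.append(dict(counts))
    return snapshots

events = [{"type": "click"}, {"type": "view"}, {"type": "click"},
          {"type": "view"}, {"type": "click"}, {"type": "click"}]
print(ingest(events))
```

The point of the sketch is the shape of the problem: events arrive continuously rather than in batches, so the aggregation must run incrementally and expose intermediate results instead of waiting for the data to "finish."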
Hadoop, Spark, and Tableau were the most frequently mentioned solutions for collecting and analyzing data, with several other tools sitting atop Hadoop. Open-source solutions like Kafka, NiFi, and Storm were mentioned by a couple of respondents, with Python and R cited as useful languages for data analysis. SAS used to have a monopoly on analytics tools, but that has changed in the last 12 months as more people adopt R and H2O. However, this is not the end of SAS, since it’s well entrenched in 99 percent of the Fortune 500.
Retail, healthcare, media, and telecommunications are the four most frequently mentioned industries where Big Data is solving real-world problems, though examples were also provided in financial services, government, IT, and fleet management. In healthcare and financial services, Big Data is being used to improve patient/customer care and to identify fraud and abuse. Natural language processing enables the monitoring and reporting of sentiment on social media channels, helping telcos, retailers, CPG manufacturers, and pharmaceutical companies understand consumer sentiment and predict trends and churn. Retailers are focused on personalization across multiple devices and brick-and-mortar stores to provide a better customer experience.
A lack of skilled data professionals is the most frequently mentioned issue preventing companies from realizing the benefits of Big Data. Having the right people to build out a Big Data team is key, but there’s currently a huge talent gap. Data scientists must keep their skills sharp and know which tools are evolving to tackle the problems their companies are attempting to solve. The Big Data ecosystem is moving very quickly, and it takes time to learn what tools are available, what their best use cases are, and whether they'll still be relevant in a year. People underestimate the difficulty of implementing a fully functioning Big Data system. In addition to data scientists, you need product owners, a data engineering team, and other professionals familiar with data preparation, integration, and operationalization.
The future of Big Data is real-time decision-making with machine learning and natural language processing. This will provide insights everywhere for everyone — not just the data elite. We will be collecting more data and getting actionable insights with automated processes to get near-term value from data. Big Data analytics will be integrated into day-to-day operations.
The proliferation of data and tools ranks alongside privacy and security as the biggest concern around the state of Big Data today. There is confusion around the technologies, and there’s too much data to ingest. We have a separate tool for every problem; the tools are complex, some differ only slightly, and they are changing daily. A year ago, MapReduce was the “big thing.” Today it’s Spark. How do I know where to invest my money and my time? Security and privacy continue to be secondary concerns, with more emphasis on widgets than on where the data is coming from and how to keep it safe. Google, Apple, and the telcos are collecting data on everyone, and we don’t know what they’re doing with it. Companies are collecting more data than they can protect. The black hats are ahead of the white hats.
The skills developers need to work on for Big Data projects fall into two areas: languages and business skills. The most frequently recommended languages were Java and Python, and knowing Apache Spark was also highly encouraged. The most frequently mentioned business skills were 1) understanding the business and business problem; 2) collaboration; and 3) understanding machine learning and natural language processing.
Additional considerations from the respondents were varied:
Does Big Data technology include relational databases? What are the types of data and speeds that it includes? Can it scale in different formats and different engines? Can it integrate with disparate data?
We talk about Big Data but we don’t talk about the need to clean the data and put it in a searchable format.
We need to help people find a faster path to building solutions and learn how to estimate project delivery times.
Where is Big Data going as an industry to produce tangible value?
Specific industries, such as healthcare and financial services, are seeing the need for a very specific set of tools. What technologies and trends are emerging for particular industries with particular needs?
Voice search is a massive opportunity for data and it’s going to get hotter.
How do others see the cloud playing into Big Data? Playgrounds in the cloud are great for developers, but how do we bring what they’ve done back on-premises?
Focus on machine learning and, thereafter, natural language processing (e.g., Alexa and Echo).
How can companies that aren’t big enough to invest in Big Data solutions find a place to host their data that lets them analyze it and then move it to a larger platform when they’re ready? I know about Mixpanel, but are there others?