To gather insights on the state of Big Data today, we spoke with 15 executives providing Big Data products and services to clients. Specifically, we spoke to:
- Uri Maoz, Head of U.S. Sales and Marketing, Anodot
- Dave McCrory, CTO, Basho
- Carl Tsukahara, CMO, Birst
- Bob Vaillancourt, Vice President, CFB Strategies
- Mikko Jarva, CTO Intelligent Data, Comptel
- Sham Mustafa, Co-Founder and CEO, Correlation One
- Andrew Brust, Senior Director Marketing Strategy, Datameer
- Tarun Thakur, CEO/Co-Founder, Datos IO
- Guy Yehiav, CEO, Profitect
- Hjalmar Gislason, Vice President of Data, Qlik
- Guy Levy-Yurista, Head of Product, Sisense
- Girish Pancha, CEO, StreamSets
- Ciaran Dynes, Vice President of Products, Talend
- Kim Hanmark, Director, Professional Services, TARGIT
- Dennis Duckworth, Director of Product Marketing, VoltDB.
01 The keys to working with Big Data are to be prepared for:
- the number of sources from which data is coming;
- the high volume of data;
- the different forms of data;
- the speed with which the data is coming; and,
- the elasticity of the database, the enterprise, and the applications.
Innovative solutions are becoming available to address the volume and elasticity of data, as well as the integration of the different data types (unstructured, semi-structured, and structured) from different sources. The key is to understand up front what you want to get from the data and then to plan accordingly to ensure the data is operationalized to provide value for the end user—typically in a corporate environment.
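The variety point above can be illustrated with a minimal sketch: normalizing records that arrive as JSON (semi-structured) and CSV (structured) from two different sources into one common shape before analysis. The field names and values here are hypothetical, not drawn from any respondent's product.

```python
import csv
import io
import json

def normalize_json(raw):
    """Map a semi-structured JSON event onto the common schema."""
    record = json.loads(raw)
    return {"user": record["user"], "amount": float(record["amount"])}

def normalize_csv(raw):
    """Map structured CSV rows onto the same schema."""
    reader = csv.DictReader(io.StringIO(raw))
    return [{"user": row["user"], "amount": float(row["amount"])} for row in reader]

# Two sources, two formats, one unified dataset ready for analysis.
events = [normalize_json('{"user": "alice", "amount": "9.50"}')]
events += normalize_csv("user,amount\nbob,4.25\ncarol,12.00\n")

total = sum(e["amount"] for e in events)
```

Once every source lands in the same schema, downstream analytics never needs to know which system or format a record came from.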
02 The most significant changes to Big Data tools and technologies in the past year have been:
- Spark supplanting MapReduce and Hadoop;
- machine learning and clustered computing coming to the forefront;
- the cloud enabling larger datasets at ever lower prices; and,
- the new tools that make the analysis of large, disparate datasets even faster.
Spark has sucked the energy out of some of the newer frameworks, while Google Cloud has made machine learning and artificial intelligence accessible. Many companies are using Apache Spark as their Big Data platform because it analyzes and hands off datasets more quickly. The cloud has provided prominent deployment options for companies not in the IT business: Big Data becomes an operating expense rather than a capital expense, so it's easier to get funding. Cloud and Big Data go hand-in-hand, and cloud providers are ensuring this by developing tools that make Big Data accessible to business professionals rather than just data scientists.
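Part of why Spark hands off datasets faster than classic MapReduce is that it chains transformations over in-memory datasets instead of writing intermediate results to disk between stages. The toy class below is a pure-Python analogue of that chained style, written to illustrate the idea only; it is not the Spark API.

```python
from functools import reduce

class MiniDataset:
    """Toy in-memory dataset: transformations chain without
    materializing intermediate results to disk between stages."""

    def __init__(self, items):
        self.items = list(items)

    def map(self, fn):
        return MiniDataset(fn(x) for x in self.items)

    def filter(self, pred):
        return MiniDataset(x for x in self.items if pred(x))

    def reduce(self, fn):
        return reduce(fn, self.items)

# A small chained pipeline: keep evens, square them, sum the result.
data = MiniDataset(range(10))
result = (data.filter(lambda x: x % 2 == 0)
              .map(lambda x: x * x)
              .reduce(lambda a, b: a + b))
```

Each stage feeds the next directly in memory; in a disk-based MapReduce flow, every intermediate dataset would be written out and read back in between stages.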
03 Our 15 respondents mentioned 29 different technical solutions they use on Big Data projects. The most frequently mentioned technical solutions are:
- Open Source;
- Kafka; and,
- Spark.
04 Our respondents provided a wide array of use cases and examples of how Big Data is being used to solve real-world problems. The most frequently mentioned use cases involve:
- real-time analytics;
- IoT; and,
- predictive analytics.
Real-time analytics are being used by e-commerce companies and telcos to provide more personalized services, dynamic pricing, and better customer experiences. It's clear that real-time data is more valuable to clients and end users, so the speed of ingestion and analysis is key. IoT is most prevalent in industry and utilities, where usage, production, maintenance, and outages are tracked to optimize performance, productivity, and efficiency. Predictive analytics are being used for maintenance to reduce downtime in airlines, turbines, and other complex mechanisms, as well as on Wall Street to project the price of commodities based on IoT data collected from farmers' combines in the Midwest. The latter is a great example of how the same data is being integrated and analyzed to fulfill several end users' needs.
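The speed-of-ingestion point can be sketched with a rolling window over an event stream, the pattern behind many real-time monitoring dashboards. The sensor readings below are made up for illustration.

```python
from collections import deque

class RollingAverage:
    """Keep a fixed-size window of the most recent measurements."""

    def __init__(self, window):
        self.values = deque(maxlen=window)  # old readings fall off automatically

    def add(self, value):
        self.values.append(value)

    def average(self):
        return sum(self.values) / len(self.values)

# Simulated sensor readings arriving one at a time.
monitor = RollingAverage(window=3)
for reading in [10.0, 12.0, 11.0, 30.0]:
    monitor.add(reading)

# Only the last three readings count: (12 + 11 + 30) / 3
latest = monitor.average()
```

Because the window is bounded, each new event is incorporated in constant time, which is what makes this shape viable at streaming ingestion rates.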
05 The most common issue affecting Big Data projects is a "lack of vision," though this was expressed in several ways by respondents. Lack of talent and security were also mentioned by multiple respondents. Companies are slow to see how Big Data can provide them with a business advantage and they tend to be vague about what they want to accomplish. Companies interviewed frequently serve as consultants, helping their clients understand what they can and cannot do with Big Data to address specific business problems and how to make the data they do have actionable for their business. Goals and expectations for data quality can be unrealistic given the inconsistency of the data and the preparation required for analysis. Companies don't know what they don't know, and there is a lack of qualified talent in the market and in the enterprise. The skillset shortage is not going away. In addition, moving data around is inherently unsafe. You need someone who understands the infrastructure and the security protocols; however, these people are nearly as few and far between as Big Data professionals.
06 There was a consistent response regarding the future of Big Data: more data, faster, in more formats, from more sources, with faster analysis, as well as real-time integration and decision making to solve problems before they occur. Data is the oil of the 21st century. The innovation gap is shrinking. More businesses will focus on what they need to achieve to see an ROI on their Big Data initiatives. IoT will drive an order-of-magnitude increase in the amount of data collected and stored. As such, we'll need to decide on the fly what data to analyze, store, and throw away. While data is getting bigger and faster, we need to ensure security, governance, oversight, and policies are in place to protect the data and personally identifiable information (PII).
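Deciding on the fly what to keep and what to throw away is a well-studied streaming problem. Reservoir sampling, for example, keeps a fixed-size uniform sample of an unbounded stream without ever storing the whole stream; the sketch below is the textbook algorithm, not tied to any respondent's product.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)          # fill the reservoir first
        else:
            # Each later item replaces a kept one with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

# Sample 10 records from a stream of 100,000 while storing only 10 at a time.
sample = reservoir_sample(range(100_000), k=10)
```

Memory stays at k items no matter how large the stream grows, which is exactly the trade-off the respondents anticipate for IoT-scale data volumes.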
07 The biggest concerns around the state of Big Data today are:
- privacy and security;
- lack of collaboration and an ecosystem mentality; and,
- the need to deliver business value and solve problems.
We need to establish standards so everyone involved is part of a Big Data ecosystem addressing clashes between protocols, data transfer, and disparate data sources. When we’re connecting data across different entities, we need to ensure the connections are secure on both ends.
08 There is very little alignment with regard to what developers need to know to be successful working on Big Data projects. In general, developers need traditional programming principles and skills that result in the creation of "rock-hard" applications, while remaining nimble and prepared for change, since change is inevitable. The most recommended tools and languages are Java, SQL, Scala, Spark, C, R, and Python. Learn the ecosystems for the packages you control. Separate ingestion from data analytics and get more comfortable with data science. Ramp up statistics and applied-math coding skills, since statistics is the foundation of data science. Understand the architecture and how to build a system from the ground up that scales and handles large amounts of data. Lastly, go to the balcony and see how others are using the tools in an unbiased way. Make sure your end users are receiving value from the data you're providing.
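The "separate ingestion from data analytics" advice can be sketched as a two-stage pipeline whose stages share only a plain record format, so either side can be swapped out independently. The stage names and the sensor data are illustrative assumptions, not from the interviews.

```python
def ingest(raw_lines):
    """Ingestion stage: parse and validate raw input, yield clean records."""
    for line in raw_lines:
        parts = line.strip().split(",")
        if len(parts) == 2:                  # drop malformed rows at the boundary
            yield {"sensor": parts[0], "value": float(parts[1])}

def analyze(records):
    """Analytics stage: consumes clean records, knows nothing about parsing."""
    totals = {}
    for r in records:
        totals[r["sensor"]] = totals.get(r["sensor"], 0.0) + r["value"]
    return totals

raw = ["a,1.0", "b,2.5", "bad-row", "a,3.0"]
result = analyze(ingest(raw))
```

Because the analytics stage never sees raw input, the ingestion side can move from files to Kafka or any other source without touching the analysis code.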
09 Asking respondents what else they had on their mind regarding Big Data raised a diverse collection of questions and thoughts:
- As the next generation of tools comes online, what do we need to keep in mind from an application perspective with regard to cloud, scale, encryption, and security?
- What are some new use cases given the capabilities of the platforms and tools? Give developers the opportunity to use all of the new extensions and see what they come up with.
- How sophisticated and “bought in” are developers to Big Data?
- Where does the data science/predictive analytics world align with business intelligence?
- Systems will be able to handle the size, types, and velocity of data. If analytical tools don't keep up with the changes, they'll fall by the wayside.
- “Big Data” will simply be replaced with “data.”
- Where are people placing their Big Data to do analytics: in the cloud, on-premises, hybrid, local, global?
- What about Blockchain? It’s on our radar since it will result in a sea change for economic transactions. It’s the next big topic to be hyped.