Big Data 2018 Surprises and 2019 Predictions
Data lakes gain in popularity as organizations learn to get real value from their contents.
Given the speed with which technology changes, we thought it would be interesting to ask IT executives to share their thoughts on the biggest surprises in 2018 and their predictions for 2019. Here's what they told us about big data:
Although it came very late in the year, the merger of Hortonworks and Cloudera was the surprise of 2018. We have seen a broader shift away from Hadoop and toward data lakes, as many Hadoop science experiments have not shown significant business value or ROI. The first Hadoop cluster went live at Yahoo on January 28, 2006, marking a transformational shift in the ability to process “Big Data.” While Hadoop was a great innovation in 2006, it also brought a complex infrastructure of components that, in many cases, complicated operations for businesses that didn’t deal with Internet-scale data. Along with the desire for a simpler infrastructure with predictable costs, the shift to Python as a general-purpose data language has increased the adoption of data lakes backed by Spark data pipelines.
Another surprise is the accelerated adoption of cloud services, such as Azure, AWS, and GCP, for data storage and data pipelines. This is especially true of PaaS products, which address operational costs, time to market, and business agility.
In 2019, the adoption of data lakes and metadata will accelerate further. In particular, metadata will provide the critical contextual information needed to support the value proposition of these initiatives and give them strategic value.
It’s great that data can be stored cheaply and moved quickly, but without trust, which is made possible through metadata and, more generally, through data governance, predictive and prescriptive models will certainly fall short of delivering the strategic information assets they promise. In addition, data lakes backed by cheap cloud storage and Spark-based pipelines will continue to gain significant ground over traditional Hadoop implementations.
Finally, fueled by the explosion of information available from data lakes, metadata, and proper data governance, mainstream businesses will continue to adopt machine learning.
Dan Potter, VP of Product Management and Marketing, Attunity
The merger of Hortonworks and Cloudera was a shock to many in the industry. Nobody expected two direct competitors with very different business models to come together in a strong big data and analytics market. It will be interesting to see how this combination of an open source Hadoop big data platform with a data warehouse and AI/ML platform plays out in 2019, and what the consolidation will mean for the rest of the players in this space.
DataOps, a more collaborative data management practice, will become more important to IT leaders because it focuses on improving communication, integration, and automation of real-time data loads between business managers and the data/IT teams. Just as DevOps changed how applications are developed and tested, DataOps will revolutionize how data is shared, integrated, and made available, resulting in a more agile, efficient approach. With a DataOps strategy, enterprises will be able to move data and use it to operate at the speed of change and remain highly competitive.
Additionally, enterprises are starting to understand the value of real-time data pipelines, in which source data changes are streamed to the data lake and the merging of data and metadata changes is handled automatically for greater resiliency. The data lake then becomes a much more valuable resource for provisioning analytics-ready subsets to a diverse set of business users and needs.
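To make the idea of a real-time pipeline concrete, here is a minimal Python sketch of the pattern: source changes are streamed into the data lake, and data and metadata changes are merged automatically. It is illustrative only and rests on several assumptions. `ChangeEvent` and `LakeTable` are hypothetical names. An in-memory dictionary stands in for lake storage. "Metadata changes" are reduced to new columns appearing in the change stream. It is not any vendor's implementation.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ChangeEvent:
    # Hypothetical change-data-capture record emitted by a source database.
    op: str   # "insert", "update", or "delete"
    key: Any  # primary key of the changed row
    row: dict[str, Any] = field(default_factory=dict)  # new column values (empty for deletes)


@dataclass
class LakeTable:
    # Toy stand-in for a data lake table: rows keyed by primary key,
    # plus an ordered column list acting as the table's metadata.
    columns: list[str] = field(default_factory=list)
    rows: dict[Any, dict[str, Any]] = field(default_factory=dict)

    def apply(self, event: ChangeEvent) -> None:
        # Merge metadata changes: a column seen for the first time is appended to the schema.
        for col in event.row:
            if col not in self.columns:
                self.columns.append(col)
        if event.op == "delete":
            self.rows.pop(event.key, None)
        elif event.op in ("insert", "update"):
            # Upsert: merge new values into the existing row, creating it if absent.
            self.rows.setdefault(event.key, {}).update(event.row)
        else:
            raise ValueError(f"unknown op: {event.op}")

    def snapshot(self) -> list[dict[str, Any]]:
        # Analytics-ready view: every row exposes every known column (missing values are None).
        return [{c: self.rows[k].get(c) for c in self.columns} for k in sorted(self.rows)]


# Usage: replay a short stream of change events into the table.
table = LakeTable()
events = [
    ChangeEvent("insert", 1, {"name": "Ada", "city": "London"}),
    ChangeEvent("insert", 2, {"name": "Grace", "city": "New York"}),
    ChangeEvent("update", 1, {"city": "Cambridge", "tier": "gold"}),  # introduces a new column
    ChangeEvent("delete", 2),
]
for e in events:
    table.apply(e)

print(table.columns)     # ['name', 'city', 'tier']
print(table.snapshot())  # [{'name': 'Ada', 'city': 'Cambridge', 'tier': 'gold'}]
```

The design choice to illustrate is that consumers never have to stop the pipeline when the source schema changes. The new `tier` column is absorbed as the event arrives, and older rows simply report `None` for it. A production system would typically persist changes to lake storage and track schema versions rather than hold everything in memory.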
Neil Barton, CTO, WhereScape
Business leaders often talk about the "time to value" of investment in new projects. One of the trends we’ve seen through 2018 is how this thinking has spilled over into the way organizations leverage data. In 2018, the spotlight was on automation and its associated efficiency benefits for IT teams.
Further eliminating the manual, repetitive elements within the development process will be even more of a priority in 2019. As the speed of business continues to increase, organizations must shorten the time it takes to unlock the value of data. Automation does just that, and additionally enables companies to redeploy valuable developer resources away from routine data infrastructure management processes and onto value-add tasks, such as delivering new solutions and services that will better guide the business.
Dale Kim, Sr. Director, Products/Solutions, Arcadia Data
The obvious choice for the surprise of the year is the announced merger of Cloudera and Hortonworks, as many of us saw each company as a viable long-term public company. The other big surprise, as seen in analyst surveys, was how much analytics success business teams are gaining from their modern data architectures. These deployments are being used by more than data scientists and data engineers; non-technical users are increasingly getting value from them.
We expect clearer categories of analytical processing to emerge, so we won’t still be talking broadly about AI, ML, and BI. Instead, organizations will pursue specific segments such as "data lake BI" and "microservices-based ML" that call out new and differentiated innovations.
Opinions expressed by DZone contributors are their own.