2018 Big Data Predictions (Part 2)
Big data continues to get bigger, and is increasingly analyzed in the cloud or on the edge. Explore this and more intriguing information in this research article.
Given how fast technology is changing, we thought it would be interesting to ask IT executives to share their thoughts on the biggest surprises in 2017 and their predictions for 2018.
Here's the second of two articles covering what they told us about their predictions for big data and analytics in 2018.
Rapid Kubernetes adoption forms the foundation for multi-cloud deployments.
We predict runaway success for Kubernetes, but it is winning adoption so fast that in 2018 this may quickly become more of an observation than a prediction.
So far, however, almost everybody thinks of Kubernetes as a way of organizing and orchestrating computation within a single cloud. Over the next year, we expect Kubernetes to increasingly become the way that leading-edge companies organize and orchestrate computation across multiple clouds, both public and private. On-premises computation is moving to containers and orchestration at light speed, but the real revolution will come when you can schedule work interchangeably wherever it makes sense to run it.
But… this only talks about the computation. What about the data? Well, there are two specific predictions related to that...
Big data systems will become the center of gravity (and building a global data fabric is one key way to do that).
In the past, big data and the projects built around it have been isolated — in many cases, special projects or experiments that at best complemented traditional systems. Now, big data is becoming an essential asset and enterprises are transforming into data-driven concerns. This transformation naturally leads to big data systems becoming the center of gravity for enterprises, in terms of data size, storage, and access, as well as operations and analytics.
As a result, more businesses will be looking for ways to build a global data fabric that breaks down silos to give comprehensive access to data from many sources and to computation for truly multi-tenant systems.
Leading organizations knit data flows into a data fabric.
This coming year, we will see more and more businesses treat computation in terms of data flows rather than data that is just processed and landed in a database. These data flows capture key business events and mirror business structure. A unified data fabric will be the foundation for building these large-scale flow-based systems. Such a fabric will necessarily support multiple kinds of computation that are appropriate in different contexts. More and more, databases will become the natural partner and complement of a data flow. The emerging trend is to have a data fabric that provides data-in-motion and data-at-rest needed for multi-cloud computation provided by things like Kubernetes.
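To make the "flow first, database as complement" idea concrete, here is a minimal sketch, assuming a hypothetical `OrderEvent` type and an in-memory list standing in for a durable event log. The append-only list plays the role of data in motion, and `materialize_totals` derives a table from it, the role a database plays as the partner of the flow. None of these names come from a real product; a real data fabric would replace the list with a persistent, replicated stream.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class OrderEvent:
    """Hypothetical business event: one order placed by a customer."""
    customer: str
    amount: float


def event_log() -> List[OrderEvent]:
    # Data in motion (illustrative): an ordered, append-only sequence of events.
    return [
        OrderEvent("alice", 30.0),
        OrderEvent("bob", 12.5),
        OrderEvent("alice", 7.5),
    ]


def materialize_totals(events: Iterable[OrderEvent]) -> Dict[str, float]:
    # Data at rest: a per-customer table derived entirely from the flow.
    totals: Dict[str, float] = defaultdict(float)
    for event in events:
        totals[event.customer] += event.amount
    return dict(totals)


if __name__ == "__main__":
    print(materialize_totals(event_log()))  # {'alice': 37.5, 'bob': 12.5}
```

Because the table is derived from the log, it can be rebuilt at any time by replaying the events, which is one reason flow-based designs treat the stream, not the table, as the system of record.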
DataOps emerges as a key organizational approach to drive agility.
We have lately seen the beginning of a trend toward embedding data scientists and data-focused developers into otherwise traditional DevOps teams to form what we call a DataOps team. This approach brings better communication and sharper, goal-oriented focus from cross-skilled teams, and (importantly) results in faster time to value and greater agility. Organizing work in a DataOps style gives an enterprise better ability to respond to changing conditions in a timely and appropriate way — it provides the flexibility and efficiency at the human level needed to take advantage of new technologies and architectures.
For example, as machine learning becomes mainstream (see item 1), switching to DataOps teams becomes very natural, and we expect this to become very popular this year. This will let some companies pull away from the pack, but it can be incredibly hard for core IT to keep up with the resulting demands. Security teams will also be hard-pressed.
Processing extends to the IoT edge.
In this upcoming year, we aren’t just going to see data fabrics and computation that span on-premises facilities into multiple clouds. We are also going to see full-scale data fabric extend right to the edge next to devices, and, in some cases, we will see threads of the fabric extend right into the devices themselves.
Big data continues to be a huge buzzword and an expensive investment area for a lot of companies. So far, only larger organizations have made headway in realizing value from it, and even they face the high cost of data scientists and a scarcity of skilled people to build solutions that deliver real value. The data is there and being collected everywhere. The struggle comes when organizations try to draw valuable conclusions from it. As these toolsets continue to get simpler, we'll see more organizations taking on big data projects in the coming months.
In 2018, business and IT executives will explore opportunities to innovate with data to improve business efficiency and performance. In order to do so, organizations will increase their reliance on automated analysis such as machine learning, as data scientists and data engineers are tasked with providing broader support to business analysts.
The ability to leverage fully functional data lakes and bring together information from data lakes and other sources, like relational databases, will become a top priority. Finding hidden, undocumented data or implied relationships will also continue to be critical, together with "truly" understanding these data dependencies.
Lastly, the key will be moving away from time-consuming and ineffective manual efforts to innovative technology solutions that empower collaboration and optimum productivity, save time, and remove silos by discovering and delivering a unified view of all data assets across geographies and heterogeneous technologies.
I think we'll see more adoption of big data analytics in the cloud, as well as continued investment from cloud vendors offering big data and analytics services.
Big data will enter an era of “spring cleaning.” Companies have long been hoarding data, knowing it will have value someday, but not actually using most of it. Recently, natural language search and visualization tools have emerged that can help them put this data to work, not just store it. In 2018, more and more CIOs will get the chance to exclaim, “See? I told you we’d use it someday!” after they roll up their sleeves and tackle their big data repositories with these new tools, uncovering valuable insights — not just information — through their efforts.
Stream processing technologies will become mainstream in the enterprise by the end of 2018, moving beyond technology companies. At data Artisans, we are seeing strong adoption from large organizations in financial services, telecommunications, manufacturing, and other industries. The adoption is accelerating, as well, and surpassing our expectations. Backing this up are analyst predictions that the streaming data applications market will reach more than $13 billion by 2021.
Enterprises will invest in new products and tools to productionize and institutionalize data stream processing. As companies move real-time data processing to large scale, both in terms of data processed and the number of applications, they will need to seek out new tools that make it easy to run streaming applications in production and reduce the manpower, cost, and effort required.
Stream processing will expand beyond the fast movement of data or simple analytical applications to operational applications that make true use of stateful capabilities. Large global organizations across industries are adopting streaming data applications for fraud detection, sales and marketing management, predictive asset maintenance, real-time inventory, risk management, and operations management, among other use cases.
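What "stateful" means here can be shown with a small fraud-detection sketch. It assumes a hypothetical rule (flag a card with more than `MAX_TXNS` transactions within `WINDOW_SECONDS`) and events arriving in timestamp order; both thresholds are illustrative assumptions, not figures from any real deployment. The per-card deque is keyed state that survives from one event to the next, which is what separates this from a stateless filter. Engines such as Apache Flink manage this kind of state for you (with checkpointing and out-of-order handling); this plain-Python version only illustrates the idea.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# Hypothetical rule parameters, chosen only for illustration.
WINDOW_SECONDS = 60
MAX_TXNS = 3

Transaction = Tuple[str, int]  # (card_id, timestamp in seconds)


def detect_bursts(transactions: Iterable[Transaction]) -> Iterator[Transaction]:
    """Yield (card_id, timestamp) for each transaction that breaks the rule.

    Assumes events arrive in timestamp order.
    """
    # Keyed state: recent transaction timestamps, one deque per card.
    recent: Dict[str, Deque[int]] = defaultdict(deque)
    for card_id, ts in transactions:
        window = recent[card_id]
        window.append(ts)
        # Evict timestamps that have slid out of the (ts - WINDOW_SECONDS, ts] window.
        while window and window[0] <= ts - WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_TXNS:
            yield card_id, ts


if __name__ == "__main__":
    stream = [("c1", 0), ("c1", 10), ("c2", 15), ("c1", 20), ("c1", 30), ("c1", 100)]
    print(list(detect_bursts(stream)))  # [('c1', 30)]
```

The fourth transaction on card `c1` within a minute is flagged; by timestamp 100 the earlier ones have aged out of the window, so that event passes.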
Evidence of this growth is the increase in Kubernetes deployments relative to YARN deployments, and the fact that the development of these new operational applications is increasingly led by product teams rather than big data teams. Apache Flink is also becoming more developer-friendly and can now be used without Hadoop, further opening up streaming data applications to developers who are not using Hadoop. Flink programs that do not rely on Hadoop components can now be much smaller, which is particularly beneficial in container-based setups because it means less network traffic and better performance.
The days in which we distinguish between “batch” and “stream” data processing will soon come to an end. There is no fundamental reason for this distinction, and the evolution of technology will make it disappear. Many applications need both of these capabilities, so, in the end, we will simply talk about data processing.
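One way to see why the distinction is not fundamental: a batch is just a bounded stream. The sketch below, assuming a hypothetical log format where error lines contain the text `ERROR`, writes one processing function against a Python iterator and runs it unchanged over a finite list (batch) and an endless generator (stream). It illustrates the principle only; real unified engines also handle event time, windowing, and fault tolerance, which this does not attempt.

```python
import itertools
from typing import Iterable, Iterator


def running_error_count(lines: Iterable[str]) -> Iterator[int]:
    """Emit the running number of log lines containing 'ERROR'.

    Works the same whether the input is bounded (a batch) or unbounded (a stream).
    """
    count = 0
    for line in lines:
        if "ERROR" in line:
            count += 1
        yield count


def endless_log() -> Iterator[str]:
    # Hypothetical unbounded source: alternates INFO and ERROR lines forever.
    for i in itertools.count():
        yield "ERROR disk full" if i % 2 else "INFO ok"


if __name__ == "__main__":
    # Batch: a bounded list, consumed to the end.
    print(list(running_error_count(["INFO a", "ERROR b", "ERROR c"])))  # [0, 1, 2]
    # Stream: an unbounded source; take only the first four results.
    print(list(itertools.islice(running_error_count(endless_log()), 4)))  # [0, 1, 1, 2]
```

The only difference between the two calls is how much of the input the caller chooses to consume, not the processing logic itself.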
Companies will realize business ROI faster with real-time data technologies than with Hadoop. In our experience, the adoption of Flink in the enterprise starts with very specific use cases rather than the open-ended projects that were very often the norm with Hadoop. Companies realize business ROI faster because getting live applications up and running is the first step. For example, after Flink had been in production at Alibaba for one year, Alibaba reported a 30% increase in conversion rate during Singles Day 2016. (This year, $25 billion of merchandise was sold in a single day.)
In 2018, expect more of the same, although a leader may emerge. Initial confusion will give way to standardization, with most companies picking a favorite. While Spark Streaming appears to be the lead horse, expect sprawl due to prior investments and multiple frameworks persisting across the business. Fortunately, businesses can use multiple frameworks without losing control over their data by selecting a data operations platform that includes a living data map with auto-updating capabilities. This allows continuous integration and continuous deployment methods to be applied to stream processing within data flows.
As data becomes more self-aware and diverse, we will see the ability of data to self-govern and determine the processes to be executed upon it. With this and the emergence of microservices and compositional programming, metadata will meld with the data and the focus on processing will move from programs at the core to the data itself.
GDPR will face company backlash
In today’s global, digital economy, companies are collecting more data than ever on their customers, and that data is becoming more diverse and complex, coming from different sources and in different formats. The creation and exchange of data have also increased significantly as bring-your-own-data (the new BYOD) and enterprise collaboration software have become mainstays of the modern workplace.
GDPR shows once again how out of touch the government is with the tech world, only this time, instead of being behind, it’s out in front. The GDPR regulations are so far ahead of organizations’ ability to manage data that most are not ready for them. This inability of organizations to comply with the regulations will force a negotiation.
Ethical lines will be drawn, detailing data morality (AKA data virtue)
Consumers are giving companies more information than they’re even aware of with every purchase and search. In 2018, data and the "morality/virtue" of using that data will come to a crossroads. Organizations collect massive amounts of information on their customers, and while the EU is aggressively moving forward with privacy regulations like GDPR, there is still a lot of grey area when it comes to the ethical implications of third parties gaining this much information about a company's customers.
Social media platforms will become regulated by 2020
Tech companies will face serious financial penalties for not removing fake news or banned content. The more data these platforms have, the harder it is for them to discern what’s real and what’s fake. These social platforms are now viewed as the “new media,” and therefore they have a social responsibility to manage their public output. It will be imperative for social media companies to implement a data strategy to maintain credibility with the public and with the companies leveraging their sites.
Now acting as the fuel source for machine learning algorithms, big data will continue to grow, and data repositories will reach unheard-of sizes.
Data gravity to the cloud. 2018 is shaping up to be the tipping point for cloud analytics with the growth of cloud data and applications. Most organizations need to operate in a hybrid mode with analytics across data and applications as they transition and take advantage of the cloud. An analytics strategy that is able to address this transition will be critical to run the business while bridging to the future.
A new plateau for ease of use. Natural language will allow users to ask their data a question and receive contextual insight and recommendations without having to understand the underlying data schemas. This will drive new use cases and adoption.
Value of end-to-end cloud analytics. Cloud-based analytics platforms will emerge to deliver a rich set of analytic capabilities to discover, plan, predict, visualize, prepare, collaborate, model, simulate, and manage, all leveraging a common data logic. The SaaS model gives the business a way to take advantage of continual product innovations in a seamless experience with a common UX. This will address analytical requirements throughout the organization at a lower TCO than fragmented solutions that cause inconsistencies.
Contextual insights delivered in the moment. Organizations will take analytics to the edge, delivering insight to users inside their applications at the most beneficial moment and with relevant context. Customer churn analysis, workforce planning, sales compensation, and supply chain logistics are just a few examples that will benefit from timely insights delivered in context within application workflows.
Insight as a service will rise. With the growth of cloud data and AI/ML automation, businesses will tap into context-rich insight that they do not own or control. By connecting to an advanced data network, users will access relevant external sources and combine them with internal data to offer new digital services to their customers.
In the past, people were focused on learning the various big data technologies: Hadoop, Spark, Kafka, Cassandra, etc. It took time for users to understand, differentiate, and ultimately deploy them. There was a lot of debate and plenty of hype. Now that organizations have cut through the noise and figured all that out, they’re concerned about actually putting their data to use.
Take the recommendation engine, for example, a critical app for almost all web companies. Consider Netflix: its recommendation engine isn’t just a nice add-on that enhances the user experience; it’s absolutely fundamental to the experience and to Netflix’s bottom line. The platform depends on the ability to accurately suggest relevant movies and TV shows to people; otherwise, it would be almost impossible for viewers to dig through its enormous library.
Netflix, like any enterprise, doesn’t really care about the technology being used. It’s not important which distribution, database, or analytics tools they’re using; what matters is the result. Enterprises have realized this, and we can expect to see increased adoption of an application-centric approach to big data in the coming year.