2018 Big Data Predictions (Part 1)
Data continues to grow, as does the number of tools and solutions for obtaining meaningful real-time insights. Learn what execs think will happen in the world of big data in 2018.
Given how fast technology is changing, we thought it would be interesting to ask IT executives to share their thoughts on the biggest surprises in 2017 and their predictions for 2018.
Here's the first of two articles on what they told us about their predictions for big data and analytics in 2018. We'll cover additional predictions for 2018 in a subsequent article.
We’re going to see a lot more “componentization” of big data components and platforms going forward. Google has executed on this brilliantly: a significant portion of its general cloud offerings, such as Bigtable and Dataflow, is in fact built on top of Hadoop and other big data technologies, both open-source (HBase, Beam) and proprietary (BigQuery). I think we’ll see more products pushed out this way, reducing confusion for practitioners looking to benefit from the Hadoop ecosystem of big data.
Data is being created and collected at massive scale across all industries. Over the past year, new analytics tools with integrated machine learning have been developed to help companies glean as much valuable insight from that data as possible. I think search capabilities will become an increasingly important component of analyzing this content and furthering the understanding of the user experience. In 2018, robust search tools will become the standard for companies that create and access large reserves of data and content.
Organizations with a global reach or those in regulated fields have to keep data in certain places for legal reasons, making it a challenge to have a single logical view of this data. In many cases, the same technologies that make it easier to develop applications, like cloud infrastructure, services, and serverless backends, make it challenging even to enforce data access protections.
Stream processing will become further integrated into standard backend databases.
More companies will embrace multi-cloud as competition heats up between cloud vendors and fear of lock-in becomes more prevalent.
Graph database use cases will become less art and a lot more science as the technology matures.
The push for data autonomy stems from the fear that the big cloud players will become the main drivers of large digital transformation projects. More and more brands will want data autonomy in a multi-cloud world in order to compete and stay ahead. The need and urgency to meet the big cloud players head-on with data-driven applications will intensify.
Real-time analysis of operational data will be a qualifying feature for most infrastructures so that their applications can explore emerging trends, provide timely alerts to operators and end users, and reduce the latency between the appearance of a condition and the moment it becomes visible to business owners on their dashboards. The traditional model of dumping online data into an analytics data warehouse nightly will not suffice. This will be made all the more challenging by the following:
The continued explosion of data. The philosophy of “store everything” will lead to even faster data growth, forcing organizations to choose between 1) sacrificing real-time access, 2) throwing away data, or 3) inventing new solutions at tremendous cost. However, the requirement to compete will make options 1 and 2 tantamount to ceding ground to competitors. This is compounded by:
A broadening of the definition of IoT. We are moving into a world where everything is generating data, and everything is consuming data. This year, more devices will become data-enabled, and first-gen IoT devices will be replaced by newer models with 5x the number of sensors. Devices that used to primarily report their state and allow a few commands will instead participate in an elaborate network of mutually coordinated behavior, all relying on a constant stream of data.
Different sources and schemas. Ingesting all of the data and making it actionable is complicated by its different sources and schemas. Organizations seldom have the capability to dictate the shape of data from all the sources they need to harness, and kicking the can down the road by dumping it into a shapeless data lake doesn’t help. They will need to find a way to make it all queryable.
Letting developers work more closely and naturally with data. Many technologies that facilitate managing and deriving insight from data at scale introduce impedance to the development process in the form of special-purpose languages or multiple new architectural layers.
Cross-cloud. Public cloud infrastructure and services are a boon to organizations, but they are also a source of lock-in. Organizations will face pressure to mitigate this lock-in, so they can take advantage of regions across cloud providers or mix services offered by separate cloud providers. Perhaps data storage is most cost-effective in one, while another offers the best price/performance ratio on GPU resources. Or they might want to migrate from one provider to another.
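To make the contrast with a nightly warehouse load concrete, here is a minimal Python sketch of evaluating operational data as it arrives. It assumes readings arrive in timestamp order and applies a simple rule (alert when the mean over a trailing time window exceeds a threshold); the `Reading` and `RollingAlert` names and the threshold rule are hypothetical, chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Reading:
    """One operational data point (hypothetical schema)."""
    timestamp: float  # seconds; readings are assumed to arrive in order
    value: float


class RollingAlert:
    """Checks each reading on arrival instead of waiting for a nightly batch.

    Signals an alert when the mean of the readings inside the trailing
    time window exceeds a fixed threshold.
    """

    def __init__(self, window_seconds: float, threshold: float) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._window = deque()  # readings currently inside the window
        self._total = 0.0       # running sum of values in the window

    def observe(self, reading: Reading) -> bool:
        """Add a reading; return True if the windowed mean crosses the threshold."""
        self._window.append(reading)
        self._total += reading.value
        # Evict readings that have fallen out of the trailing window.
        cutoff = reading.timestamp - self.window_seconds
        while self._window and self._window[0].timestamp < cutoff:
            self._total -= self._window.popleft().value
        return self._total / len(self._window) > self.threshold


if __name__ == "__main__":
    alert = RollingAlert(window_seconds=60.0, threshold=100.0)
    for t, v in [(0.0, 90.0), (30.0, 95.0), (45.0, 130.0)]:
        print(t, alert.observe(Reading(t, v)))
```

The running sum keeps each update cheap. In the usage example, the third reading lifts the 60-second mean to 105 and triggers the alert the moment the condition appears, rather than after the next batch load.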
Keeping with the election theme, everyone will be watching whether the polls get it right for the midterm elections. We will continue to see the adoption of Smart City technologies. More sensors will get deployed and new applications will be created to take advantage of these data streams.
Big data will balloon into "overweight data" due to technologies such as IoT, autonomous vehicles, and the Fourth Industrial Revolution. This will drive new startups to address the need to rapidly process and act upon this data.
Beginning of the end of the traditional data warehouse. As the volume, velocity, and variety of data being generated continue to grow, and the requirements to manage and analyze this data grow at a furious pace as well, the traditional data warehouse is increasingly struggling to keep up with this data and analysis. While in-memory databases have alleviated the problem to some extent by providing better performance, data analytics workloads continue to become more and more compute-bound.
These workloads can run up to 100x faster on the latest advanced processors, such as GPUs; however, taking advantage of them means a nearly complete rewrite of the traditional data warehouse. In 2018, enterprises will start to seriously rethink their traditional data warehousing approach and look at moving to next-generation databases that leverage memory, advanced processor architectures (GPU, SIMD), or both.
Artificial intelligence (AI) deserves the same treatment Hadoop and other big data technologies have received lately. If the industry is trying to balance the hype around big data-oriented products, it has to make sure not to overhype the arrival of AI. This is not to suggest that AI has no place in current and future-looking big data projects, just that we are not yet at a point where we can reliably turn business decision-making processes over entirely to machines. Instead, in 2018 the industry will begin to modernize BI with machine assistance rather than AI-driven tasks. Think of it as power steering versus self-driving cars. Business users will get more direction on how to gain better insights faster; they don’t need to be told what the right insights are. We’re so enamored with the idea of AI, but the reality is it’s not ready to act on its own in the context of analyzing data for business users.
In modernizing BI, we’ll also start to see a shift in which organizations bring BI to the data. BI and big data have hit a bit of a brick wall. Companies have spent a lot of money on their data infrastructures, but many are left wondering why they have to wait so long for their reports. Part of the problem is that companies are capturing their data in a data lake built on a technology like Hadoop, but they are not taking full advantage of the power of the data lake. Rather than moving operations to the data, which would be ideal, businesses move data from the lake to external BI-specific environments. This process of “moving data to the compute” adds significant overhead to the analytics lifecycle and introduces trade-offs around agility, scale, and data granularity. Next year and beyond, we’ll start to see more companies bringing the processing to the data, a core tenet of Hadoop and data lakes, with respect to their BI workloads. This will speed the time to insight and improve the ROI companies see on their big data infrastructure investments.
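The difference between “moving data to the compute” and bringing the processing to the data can be illustrated with a small, self-contained Python simulation. Lists of partitions stand in for data lake storage nodes, and each function reports how many records cross the network; the partition layout, the revenue-by-region query, and all function names are hypothetical. Running a partial aggregation on each partition is similar in spirit to a MapReduce combiner.

```python
from collections import defaultdict
from typing import Iterable

Row = tuple[str, float]  # hypothetical lake record: (region, revenue)


def sum_by_region(rows: Iterable[Row]) -> dict[str, float]:
    """Aggregate revenue per region."""
    totals: dict[str, float] = defaultdict(float)
    for region, revenue in rows:
        totals[region] += revenue
    return dict(totals)


def move_data_to_compute(partitions: list[list[Row]]) -> tuple[dict[str, float], int]:
    """Copy every raw row into a separate BI environment, then aggregate there."""
    extracted = [row for part in partitions for row in part]
    return sum_by_region(extracted), len(extracted)


def move_compute_to_data(partitions: list[list[Row]]) -> tuple[dict[str, float], int]:
    """Aggregate inside each partition and ship only the partial results."""
    partials = [sum_by_region(part) for part in partitions]
    shipped = sum(len(p) for p in partials)
    merged: dict[str, float] = defaultdict(float)
    for partial in partials:
        for region, subtotal in partial.items():
            merged[region] += subtotal
    return dict(merged), shipped


if __name__ == "__main__":
    lake = [
        [("east", 10.0), ("west", 5.0), ("east", 2.5)],
        [("west", 1.0), ("east", 4.0), ("west", 3.5)],
    ]
    print(move_data_to_compute(lake))  # same totals, 6 records moved
    print(move_compute_to_data(lake))  # same totals, 4 records moved
```

Both approaches produce identical totals. With real data, the gap in records moved grows with the ratio of raw rows to distinct groups, which is the overhead the paragraph above attributes to moving data out of the lake.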
Don Boxley, Co-Founder and CEO, DH2i
In 2018, organizations will turn to Best Execution Venue (BEV) technologies to enable and speed digital transformation, laying last year’s fears to rest. Organizations will reap immediate business and technological benefits, as well as dramatic reductions in associated costs, by employing technologies that dynamically decide where workloads and data can function at peak performance and efficiency, and then move them there to achieve the desired outcome.
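The decide-then-move idea behind BEV can be sketched as a placement function. This is not DH2i's implementation; it is a hypothetical Python sketch that assumes each candidate venue reports an observed p95 latency and an hourly cost, and it picks the cheapest venue that meets a latency target, falling back to the fastest venue when none does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Venue:
    """A candidate place to run a workload (hypothetical metrics)."""
    name: str
    p95_latency_ms: float  # observed latency for this workload at this venue
    cost_per_hour: float   # price of running the workload there


def best_execution_venue(venues: list[Venue], max_latency_ms: float) -> Venue:
    """Pick the cheapest venue that meets the latency target.

    If no venue meets the target, fall back to the fastest one.
    """
    if not venues:
        raise ValueError("no candidate venues")
    eligible = [v for v in venues if v.p95_latency_ms <= max_latency_ms]
    if eligible:
        # Cheapest first; break cost ties by lower latency.
        return min(eligible, key=lambda v: (v.cost_per_hour, v.p95_latency_ms))
    return min(venues, key=lambda v: v.p95_latency_ms)


if __name__ == "__main__":
    candidates = [
        Venue("on-prem", 40.0, 1.2),
        Venue("cloud-a", 25.0, 2.0),
        Venue("cloud-b", 70.0, 0.8),
    ]
    print(best_execution_venue(candidates, max_latency_ms=50.0).name)  # on-prem
    print(best_execution_venue(candidates, max_latency_ms=20.0).name)  # cloud-a
```

Re-running the selection as the measurements change, and migrating the workload whenever the answer changes, is what would make such a placement dynamic.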
Datasets are getting larger and more comprehensive. Accessible storage is getting larger and less expensive, and compute resources are getting bigger and less expensive. This opens up a natural opportunity for advanced artificial intelligence to be used to process and analyze that data. Finding useful trends and patterns and detecting anomalies are natural use cases for AI on these large datasets.
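As a point of reference for the anomaly detection use case, here is a deliberately simple statistical baseline, a z-score test, rather than an AI model. The function name and the default threshold of three standard deviations are illustrative assumptions.

```python
import statistics


def zscore_anomalies(values: list[float], threshold: float = 3.0) -> list[int]:
    """Return the indices of values more than `threshold` standard deviations from the mean."""
    if len(values) < 2:
        return []
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:  # all values identical: nothing stands out
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


if __name__ == "__main__":
    readings = [10.0] * 20 + [100.0]
    print(zscore_anomalies(readings))  # [20]
```

One caveat of this baseline: an extreme value also inflates the standard deviation, so in small samples an outlier can partly mask itself.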
Opinions expressed by DZone contributors are their own.