How the Cloud is Disrupting Data Warehousing
Take a look at how the cloud is changing the way big data is stored.
As enterprises move their compute and application infrastructure to the cloud, they are re-evaluating their data technology stacks as well, ushering in a new wave of innovation that comes with significant challenges. By 2020, over 40% of enterprise workloads are expected to run on public cloud platforms (Cloud Vision 2020, LogicMonitor), yet migrating data platforms and the data itself to the cloud remains difficult and costly. A majority of enterprise data infrastructure teams, 62% of them, report that managing their data warehouse solution is either difficult or very difficult (Data Warehouse Trends Report 2018, Panoply). For IT, data spread across multiple locations adds a new layer of complexity on top of existing challenges: managing multiple legacy data platforms and meeting the demand for self-service analytics and AI.
In this article, we look more closely at the future of data warehousing in the cloud, the new opportunities it creates, and the hurdles that data infrastructure and data science teams will need to overcome.
Cloud-Based Data Warehouses Dominate the Big Data Landscape and Cloud-Based File Systems Supplant Hadoop as the Data Lake
The promise of agility and cost-effective data storage drove the data lake wave pioneered by the open-source platform Hadoop. However, the cost savings and flexibility failed to materialize for many enterprises because Hadoop proved complex and difficult to manage. Many enterprises are abandoning their on-premises data lakes in favor of cloud storage: Amazon S3, Google Cloud Storage, and Microsoft Azure Data Lake Storage (ADLS). These distributed, cloud-based object stores meet the need for cost-effective, flexible storage and serve as the storage foundation for a new class of cloud-based data warehouses such as Snowflake, Amazon Redshift, and Google BigQuery. Expect to see more analytics and AI workloads move to these managed, cloud-based data warehouses at the expense of self-managed, on-premises Hadoop. Analysts predict that the Data Warehouse-as-a-Service (DWaaS) market will be worth $3.4 billion by 2023 (MarketsandMarkets).
Sticker Shock Drives New Demands for Predictable Cloud-Based Data Pricing Models
A primary motivation for moving to the cloud is to save money, yet many enterprises are unpleasantly surprised when the first bill from their cloud provider arrives. The usage-based, “pay per query” models of the new generation of cloud-based data warehouses such as Google BigQuery and Snowflake make costs difficult or impossible to predict. Even with “all you can eat” flat-rate pricing, enterprises still have to choose a subscription size, which is likely to be larger than they need. Expect to see a new slate of performance and caching tools that dramatically lower per-query costs and restore budget discipline and predictability.
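To make the predictability problem concrete, here is a minimal Python sketch of a scan-based, pay-per-query cost model and the effect of a result cache. The prices, the workload, and the function names are hypothetical assumptions for illustration, not any vendor's actual rates or billing rules; real warehouse billing and caching behavior are more nuanced.

```python
from typing import Iterable, Tuple

# Hypothetical prices for illustration only; real rates differ by vendor,
# region, edition, and contract.
PRICE_PER_TB = 5.00           # assumed on-demand price, USD per TB scanned
FLAT_RATE_MONTHLY = 2_000.00  # assumed flat-rate subscription, USD per month


def on_demand_cost(queries: Iterable[Tuple[str, float]],
                   price_per_tb: float = PRICE_PER_TB,
                   use_cache: bool = False) -> float:
    """Bill each (sql, tb_scanned) query by the amount of data it scans.

    With use_cache=True, a repeat of an identical SQL string is assumed to be
    served from a result cache at no charge (a simplifying assumption).
    """
    seen = set()
    total = 0.0
    for sql, tb_scanned in queries:
        if use_cache and sql in seen:
            continue
        seen.add(sql)
        total += tb_scanned * price_per_tb
    return total


def break_even_tb(flat_rate: float = FLAT_RATE_MONTHLY,
                  price_per_tb: float = PRICE_PER_TB) -> float:
    """TB scanned per month at which on-demand and flat-rate cost the same."""
    return flat_rate / price_per_tb


# One hypothetical month: a dashboard query refreshed 100 times (0.5 TB each)
# plus 20 distinct ad-hoc queries (2 TB each).
dashboard = "SELECT region, SUM(amount) FROM sales GROUP BY region"
workload = [(dashboard, 0.5)] * 100 + [
    (f"SELECT * FROM events WHERE batch = {i}", 2.0) for i in range(20)
]

print(on_demand_cost(workload))                  # 450.0
print(on_demand_cost(workload, use_cache=True))  # 202.5
print(break_even_tb())                           # 400.0
```

Under these assumed numbers, caching the repeated dashboard query cuts the bill by more than half, and the break-even figure shows the scanned volume at which a flat-rate subscription starts to pay off. The on-demand total still scales directly with how much data ad-hoc users happen to scan, which is what makes it hard to budget.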
IT Finds New Ways to Prevent Public Cloud Lock-In
When customers talk about cloud migration, their top priority is architecting solutions that take advantage of the agility and speed of the cloud while avoiding “cloud lock-in.” The proprietary tools and platforms offered by public cloud providers make that goal much harder to achieve, and enterprises need to be aware of the potential pitfalls. For example, without an abstraction layer, moving to a cloud database still requires applications and tools to connect to it directly, which limits future flexibility and can result in sky-high switching costs. Expect to see customers turn to cross-platform, cloud-independent formats and tools instead of direct connections to cloud vendors’ proprietary tools. Cloud-independent tools for managing compute, provisioning, databases, BI, and AI will become essential to keeping enterprises flexible and agile.
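One way to build such an abstraction layer is to have applications depend on a vendor-neutral interface and choose the backend from configuration. The Python sketch below assumes a hypothetical `Warehouse` interface, `connect()` factory, and adapter registry; the standard-library `sqlite3` module stands in for a cloud warehouse driver so the example runs anywhere.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Tuple


class Warehouse(ABC):
    """Vendor-neutral interface that applications and BI tools code against."""

    @abstractmethod
    def run(self, sql: str, params: Tuple = ()) -> List[Tuple]:
        """Execute a statement and return any result rows."""


class SQLiteWarehouse(Warehouse):
    """Stand-in adapter; a real adapter would wrap a cloud vendor's driver."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)

    def run(self, sql: str, params: Tuple = ()) -> List[Tuple]:
        return self._conn.execute(sql, params).fetchall()


# Adapters keyed by a configuration value. Supporting another vendor means
# registering one more adapter, not rewriting every application.
ADAPTERS: Dict[str, Callable[..., Warehouse]] = {"sqlite": SQLiteWarehouse}


def connect(config: dict) -> Warehouse:
    """Build a backend from configuration instead of a hard-coded driver."""
    factory = ADAPTERS[config["backend"]]
    return factory(**config.get("options", {}))


def revenue_by_region(wh: Warehouse) -> List[Tuple]:
    # Application logic sees only the Warehouse interface.
    return wh.run("SELECT region, SUM(amount) FROM sales "
                  "GROUP BY region ORDER BY region")


wh = connect({"backend": "sqlite", "options": {"path": ":memory:"}})
wh.run("CREATE TABLE sales (region TEXT, amount REAL)")
wh.run("INSERT INTO sales VALUES (?, ?), (?, ?), (?, ?)",
       ("east", 10.0, "west", 5.0, "east", 2.5))
print(revenue_by_region(wh))  # [('east', 12.5), ('west', 5.0)]
```

Switching vendors then means writing one new adapter and changing the `backend` value in configuration, while application code such as `revenue_by_region` stays untouched.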
Advances in Data Virtualization Mitigate the Data Locale Challenge
The data landscape for self-service BI and AI users is already complex. Today, analysts and data scientists need to know (1) which data platform has the data they need, (2) how to access it, and (3) how to query it. As data migrates to the cloud, citizen data scientists also need to know “where” the data is: on-premises, in the cloud, or spread across both. The latest innovations in data virtualization address these problems by making the form and location of data irrelevant to the people querying it. By adopting a business intelligence virtualization approach, enterprises can move data to new platforms and locations without disrupting downstream users. Even better, intelligent data virtualization can centralize business logic and provide a single point of control for governing data access and security.
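The core idea of virtualization, routing queries by logical name rather than physical location, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any commercial virtualization product is implemented: `VirtualLayer`, its methods, and the table names are hypothetical, and two in-memory `sqlite3` databases stand in for an on-premises system and a cloud warehouse.

```python
import sqlite3
from typing import Dict, List, Tuple


class VirtualLayer:
    """Toy virtualization layer: users query logical names, and the layer
    routes each query to whichever source currently holds the data."""

    def __init__(self) -> None:
        self.sources: Dict[str, sqlite3.Connection] = {}
        self.catalog: Dict[str, Tuple[str, str]] = {}  # logical -> (source, table)
        self.metrics: Dict[str, str] = {}              # shared business logic

    def add_source(self, name: str, conn: sqlite3.Connection) -> None:
        self.sources[name] = conn

    def register(self, logical: str, source: str, table: str) -> None:
        self.catalog[logical] = (source, table)

    def define_metric(self, name: str, expression: str) -> None:
        self.metrics[name] = expression

    def metric(self, name: str, logical: str, group_by: str) -> List[Tuple]:
        # Identifiers are interpolated directly; acceptable only because this
        # sketch assumes trusted, admin-defined names.
        source, table = self.catalog[logical]
        sql = (f"SELECT {group_by}, {self.metrics[name]} FROM {table} "
               f"GROUP BY {group_by} ORDER BY {group_by}")
        return self.sources[source].execute(sql).fetchall()

    def move(self, logical: str, target: str, table: str) -> None:
        """Copy a table to another source, then repoint the catalog entry."""
        source, old_table = self.catalog[logical]
        cur = self.sources[source].execute(f"SELECT * FROM {old_table}")
        columns = [d[0] for d in cur.description]
        rows = cur.fetchall()
        dst = self.sources[target]
        dst.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        marks = ", ".join("?" for _ in columns)
        dst.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
        self.catalog[logical] = (target, table)


on_prem = sqlite3.connect(":memory:")  # stand-in for an on-premises database
cloud = sqlite3.connect(":memory:")    # stand-in for a cloud data warehouse
on_prem.execute("CREATE TABLE sales_2019 (region TEXT, amount REAL)")
on_prem.executemany("INSERT INTO sales_2019 VALUES (?, ?)",
                    [("east", 10.0), ("west", 5.0), ("east", 2.5)])

vl = VirtualLayer()
vl.add_source("on_prem", on_prem)
vl.add_source("cloud", cloud)
vl.register("sales", "on_prem", "sales_2019")
vl.define_metric("revenue", "SUM(amount)")

before = vl.metric("revenue", "sales", "region")
vl.move("sales", "cloud", "fact_sales")
after = vl.metric("revenue", "sales", "region")
print(before == after, vl.catalog["sales"])  # True ('cloud', 'fact_sales')
```

The user's request, revenue by region from `sales`, is identical before and after the move; only the catalog entry changes. Defining `revenue` once in the layer is a small-scale version of centralizing business logic.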
New Tools for Managing Data Security and Data Governance Emerge to Address the Complexity of Managing Data Across Multiple Domains
As self-service data consumption continues to grow, enterprises are struggling to manage access to the vast amount of data stored on-premises and in the cloud. On top of that, consumer privacy legislation is coming to the Americas following Europe’s implementation of the General Data Protection Regulation (GDPR). Centrally governed data access addresses both challenges, and solutions such as data catalogs and data virtualization will bring order to the chaos. Expect enterprises to devote more of their budgets to these areas, as security spending is set to exceed $124 billion in 2019 (Gartner).
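Centrally governed access means policy is enforced at one point rather than separately in every tool. A minimal Python sketch of that idea follows, assuming a hypothetical column-level policy that masks restricted fields by role and records each access in an audit log; the `Policy` class, roles, and column names are illustrative, and production governance tools offer far richer controls.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Policy:
    """Hypothetical central policy: restricted column -> roles allowed to see it.

    Columns not listed are treated as unrestricted (a simplifying assumption;
    a stricter design would deny by default).
    """
    allowed: Dict[str, Set[str]]
    mask: str = "***"


def apply_policy(rows: List[Dict[str, str]], role: str, policy: Policy,
                 audit: List[str]) -> List[Dict[str, str]]:
    """Mask the columns this role may not see and append an audit entry."""
    masked = {col for col, roles in policy.allowed.items() if role not in roles}
    audit.append(f"role={role} masked={','.join(sorted(masked)) or '-'}")
    return [{k: policy.mask if k in masked else v for k, v in row.items()}
            for row in rows]


policy = Policy(allowed={"email": {"privacy_officer"}, "birth_date": set()})
customers = [{"id": "1", "email": "a@example.com",
              "birth_date": "1980-01-01", "country": "DE"}]
audit_log: List[str] = []

analyst_view = apply_policy(customers, "analyst", policy, audit_log)
officer_view = apply_policy(customers, "privacy_officer", policy, audit_log)
print(analyst_view)
# [{'id': '1', 'email': '***', 'birth_date': '***', 'country': 'DE'}]
print(officer_view)
# [{'id': '1', 'email': 'a@example.com', 'birth_date': '***', 'country': 'DE'}]
print(audit_log)
# ['role=analyst masked=birth_date,email', 'role=privacy_officer masked=birth_date']
```

Because every query passes through `apply_policy`, a new regulation or role change is handled by editing one policy object rather than each BI tool or notebook.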
Traditional Cloud Migration Is Being Disrupted by Virtualization
The journey of cloud data migration remains long and complicated, but advances in data virtualization are removing once-difficult obstacles such as vendor lock-in, data location, and interoperability between databases and BI tools. Other challenges, such as data security and governance, are not eliminated but are substantially addressed because virtualization spans multiple databases. Data virtualization is dramatically changing both the difficulties and the possibilities of cloud migration, making the process accessible to enterprises sooner.