Introducing the Open Hybrid Architecture Initiative
A big data expert discusses his team's work to create an open-source, cloud-based data architecture that will let data scientists collaborate better.
The concept of a modern data architecture has evolved dramatically over the past 10-plus years. Turn the clock back and recall the days of legacy data architectures, which had many constraints. Storage was expensive and had associated hardware costs. Compute often involved appliances and more hardware investments. Networks were expensive, deployments were only on-premises, and proprietary software and hardware were locking in enterprises everywhere you turned.
This was (and for many organizations still is) a world of transactional silos where the architecture only allowed for post-transactional analytics of highly structured data. The weaknesses in these legacy architectures were exposed by the advent of new data types, such as mobile and sensor data, and new analytics, such as machine learning and data science. Couple that with the advent of cloud computing and you have a perfect storm.
A multitude of interconnected factors disrupted that legacy data architecture era. Storage became cheaper and new software such as Apache Hadoop took center stage. Compute also went the software route and we saw the start of edge computing. Networks became ubiquitous and provided the planet with 3G/4G/LTE connectivity, deployments started to become hybrid, and enterprises embraced open source software. This led to a rush of innovation as customer requirements changed, influencing the direction that vendors had to take to modernize the data architecture.
The emergence of cloud created the need to evolve again to take advantage of its unique characteristics, such as decoupled storage and compute. This led to connected data architectures, with the Hadoop ecosystem evolving for IaaS and PaaS models and innovations such as Hortonworks DataPlane Service (DPS) connecting deployments in the data center and the public cloud.
Given that data has "mass" and is responsible for the rapid rise of cloud adoption, the data architecture must evolve again to meet the needs of today's enterprises and take advantage of the unique benefits of cloud. So much more is required in a data architecture today to achieve our dreams of digital transformation, real-time analytics, and artificial intelligence, just to name a few. This paves the way for pre-transaction analysis and drives use cases such as a 360-degree view of the customer. Organizations need a unified hybrid architecture for on-premises, multi-cloud, and edge environments. The time has come to once again reimagine the data architecture, with hybrid as a key requirement.
What does it take to be hybrid? We've been innovating to answer this question for some time. Hybrid requires:
- Cloud-native Hadoop for public cloud — delivered with Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF) on IaaS.
- Data flow and management to and from the edge — delivered with HDF, and specifically with MiNiFi.
- Consistent security and data governance across all tiers — delivered with DPS.
- A consistent architecture in the cloud and on-premises. This is the last mile.
The last point on consistent architectures is critical, not just from a technology standpoint but because the differences fundamentally change the interaction model between the user and the technology. As an example, in the Hadoop ecosystem today, users walk up to a shared, multi-tenant cluster and simply submit their SQL queries, Spark applications, and so on. In the cloud, however, users have to provision their workloads, such as query instances or Spark clusters, before they can run analytics.
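To make the contrast concrete, here is a minimal Python sketch of the two interaction models. The `SharedCluster` and `EphemeralProvisioner` classes are hypothetical stand-ins, purely for illustration, not any real Hortonworks API.

```python
# Hypothetical classes illustrating the two interaction models; not a real API.

class SharedCluster:
    """On-premises model: a long-running, multi-tenant cluster already exists."""
    def submit_sql(self, query: str) -> None:
        # Users simply walk up and submit work; no provisioning step.
        print(f"Submitted to shared cluster: {query}")

class EphemeralProvisioner:
    """Cloud model: compute is provisioned per workload, then released."""
    def spin_up(self, workload: str) -> SharedCluster:
        print(f"Provisioning an ephemeral {workload} cluster...")
        return SharedCluster()

    def spin_down(self) -> None:
        print("Releasing compute; data persists in decoupled storage.")

# On-premises today: walk up to the shared cluster and submit.
SharedCluster().submit_sql("SELECT COUNT(*) FROM events")

# In the cloud today: provision first, run, then tear down.
provisioner = EphemeralProvisioner()
cluster = provisioner.spin_up("Spark")
cluster.submit_sql("SELECT COUNT(*) FROM events")
provisioner.spin_down()
```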
The Open Hybrid Architecture Initiative
Today, we are excited to announce the Open Hybrid Architecture initiative — the last mile of our endeavor to deliver on the promise of hybrid. This initiative is a broad effort across the open-source communities, the partner ecosystem and Hortonworks platforms to enable a consistent experience by bringing the cloud architecture on-premises for the enterprise.
Another key benefit is helping customers settle on a consistent architecture and interaction model, one that allows them to seamlessly move data and workloads across on-premises and multiple clouds using platforms such as DPS.
Through the initiative, we deliver an architecture in which it no longer matters where your data lives: in any cloud, on-premises, or at the edge, enterprises can leverage open-source analytics in a secure and governed manner. The benefits of ensuring a consistent interaction model cannot be overstated; it is the key to unlocking a seamless experience.
The Open Hybrid Architecture initiative will make this possible by:
- De-coupling storage, with both file system interfaces and an object-store interface to data (see the sketch after this list).
- Containerizing compute resources for elasticity and software isolation.
- Sharing services for metadata, governance and security across all tiers.
- Providing DevOps/orchestration tools for managing services/workloads via the "infrastructure as code" paradigm, allowing programmatic spin-up/spin-down.
- Designating workloads specific to use cases such as EDW, data science, etc., rather than sharing everything in a multi-tenant Hadoop cluster.
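As a concrete illustration of the first bullet, the sketch below reads the same dataset through both a file-system interface and an object-store interface using pyarrow. The host names, port, bucket, and paths are placeholders assumed for illustration only; they do not refer to real endpoints.

```python
# A minimal sketch of decoupled storage: one dataset, two interfaces.
import pyarrow.fs as fs
import pyarrow.parquet as pq

# File-system interface (HDFS-compatible); host/port are placeholders.
hdfs = fs.HadoopFileSystem(host="namenode.example.com", port=8020)
events_fs = pq.read_table("/warehouse/events", filesystem=hdfs)

# Object-store interface (S3-compatible) over the same physical data;
# the endpoint and bucket name are placeholders.
s3 = fs.S3FileSystem(endpoint_override="object-store.example.com:9878")
events_obj = pq.read_table("analytics/warehouse/events", filesystem=s3)

# Either path yields the same table; compute can attach to whichever
# interface suits the workload.
print(events_fs.num_rows, events_obj.num_rows)
```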
So, What Happens Next?
After careful consideration, we've determined the best path forward is a phased approach, similar to how Hortonworks delivered enterprise-grade SQL on Hadoop via the Stinger and Stinger.Next initiatives.
The Open Hybrid Architecture initiative will include the following development phases:
- Phase 1: Containerization of HDP and HDF workloads with DPS driving the new interaction model for orchestrating workloads by programmatic spin-up/down of workload-specific clusters (different versions of Hive, Spark, NiFi, etc.) for users and workflows.
- Phase 2: Separation of storage and compute by adopting scalable file-system and object-store interfaces via the Apache Hadoop Ozone project.
- Phase 3: Containerization for portability of big data services, leveraging technologies such as Kubernetes for containerized HDP and HDF. Red Hat and IBM are partnering with us on this journey to accelerate containerized big data workloads for hybrid. As part of this phase, we will certify HDP, HDF, and DPS as Red Hat Certified Containers on Red Hat OpenShift, an industry-leading enterprise container and Kubernetes application platform. This allows customers to more easily adopt a hybrid architecture for big data applications and analytics, all with the common and trusted security, data governance, and operations that enterprises require.
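For Phase 3, a hedged sketch of what a containerized workload could look like with the official Kubernetes Python client is shown below. The image name, namespace, and pod names are illustrative placeholders, and this is not an official HDP/HDF deployment recipe.

```python
# A sketch of spinning up and down one containerized worker on Kubernetes.
from kubernetes import client, config

config.load_kube_config()  # authenticate using the local kubeconfig
api = client.CoreV1Api()

# Describe a single containerized worker; image and names are placeholders.
pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(name="spark-worker-0", namespace="analytics"),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="spark-worker",
                image="registry.example.com/spark-worker:latest",
            )
        ]
    ),
)

# Spin up the workload-specific container...
api.create_namespaced_pod(namespace="analytics", body=pod)

# ...and spin it down once the workload completes.
api.delete_namespaced_pod(name="spark-worker-0", namespace="analytics")
```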
Just as we enabled the modern data architecture with HDP and YARN back in the day, we're at it again, but this time we're bringing the innovation we've done in the cloud down to our products in the data center.
Hortonworks has been on a multi-year journey toward cloud-first and cloud-native architectures. The Open Hybrid Architecture initiative is the final piece of the puzzle. Not only will this initiative bring cloud-native to the data center, but it will also help our customers embrace and master the unified hybrid architectural model that is required to get the full benefits of on-premises, cloud and edge computing. We, along with our partner ecosystem and the open-source community, are excited to tackle this next redesign of the modern data architecture.
Published at DZone with permission of Arun Murthy, DZone MVB.