
Introducing the Open Hybrid Architecture Initiative

A big data expert discusses his team's work to create an open source, cloud-based data architecture that lets data scientists collaborate better.

By Arun Murthy · Sep. 14, 2018 · Opinion · 4.11K Views


The concept of a modern data architecture has evolved dramatically over the past 10-plus years. Turn the clock back and recall the days of legacy data architectures, which had many constraints. Storage was expensive and had associated hardware costs. Compute often involved appliances and more hardware investments. Networks were expensive, deployments were only on-premises, and proprietary software and hardware were locking in enterprises everywhere you turned.

This was (and for many organizations still is) a world of transactional silos where the architecture only allowed for post-transactional analytics of highly structured data. The weaknesses in these legacy architectures were exposed with the advent of new data types such as mobile and sensors, and new analytics such as machine learning and data science. Couple that with the advent of cloud computing and you have a perfect storm.

A multitude of interconnected factors disrupted that legacy data architecture era. Storage became cheaper and new software such as Apache Hadoop took center stage. Compute also went the software route and we saw the start of edge computing. Networks became ubiquitous and provided the planet with 3G/4G/LTE connectivity, deployments started to become hybrid, and enterprises embraced open source software. This led to a rush of innovation as customer requirements changed, influencing the direction that vendors had to take to modernize the data architecture.

The emergence of cloud created the need to evolve again to take advantage of its unique characteristics, such as decoupled storage and compute. The result was connected data architectures, with the Hadoop ecosystem evolving for IaaS and PaaS models and innovations such as Hortonworks DataPlane Service (DPS) for connecting deployments in the data center and the public cloud.

Given that data has "mass" and is responsible for the rapid rise of cloud adoption, the data architecture must evolve again to meet the needs of today's enterprises and take advantage of the unique benefits of cloud. So much more is required in a data architecture today to achieve our dreams of digital transformation, real-time analytics, and artificial intelligence, just to name a few. This paves the way for pre-transaction analysis and drives use cases such as a 360-degree view of the customer. Organizations need a unified hybrid architecture for on-premises, multi-cloud, and edge environments. The time has come to once again reimagine the data architecture, with hybrid as a key requirement.

What does it take to be hybrid? We've been innovating to answer this question for some time. Hybrid requires:

  • Cloud-native Hadoop for public cloud — delivered with Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF) on IaaS.
  • Data flow and management to and from the edge — delivered with HDF, and specifically with MiNiFi.
  • Consistent security and data governance across all tiers — delivered with DPS.
  • A consistent architecture in the cloud and on-premises. This is the last mile.

The last point on consistent architectures is critical: not just from a technology standpoint, but because the differences fundamentally change how users interact with the technology. For example, in the Hadoop ecosystem today, users walk up to a shared, multi-tenant cluster and simply submit their SQL queries, Spark applications, and so on. In the cloud, however, users must provision their workloads (query instances, Spark clusters, etc.) before they can run analytics.
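The difference between the two interaction models can be sketched in a few lines. This is a purely illustrative toy, not a Hortonworks or cloud-vendor API; the class and method names are hypothetical:

```python
# Toy contrast between the two interaction models described above.
# All names here are hypothetical, for illustration only.

class SharedCluster:
    """On-premises model: a long-running multi-tenant cluster.
    Users just walk up and submit work."""
    def submit(self, job):
        return f"running {job} on shared cluster"

class CloudWorkload:
    """Cloud model: compute must be provisioned before any work runs.
    Provisioning is an extra, user-visible step."""
    def __init__(self):
        self.provisioned = False
    def provision(self, instance_type, count):
        self.provisioned = True
        return f"provisioned {count}x {instance_type}"
    def submit(self, job):
        if not self.provisioned:
            raise RuntimeError("provision compute before submitting work")
        return f"running {job} on ephemeral cluster"

# On-prem: submit directly.
SharedCluster().submit("SELECT * FROM sales")

# Cloud: provision first, then submit.
w = CloudWorkload()
w.provision("m5.xlarge", 4)
w.submit("SELECT * FROM sales")
```

The initiative's goal of a "consistent interaction model" amounts to hiding that provisioning step behind a uniform front end, wherever the compute actually lives.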


The Open Hybrid Architecture Initiative

Today, we are excited to announce the Open Hybrid Architecture initiative — the last mile of our endeavor to deliver on the promise of hybrid. This initiative is a broad effort across the open-source communities, the partner ecosystem and Hortonworks platforms to enable a consistent experience by bringing the cloud architecture on-premises for the enterprise.

Another key benefit is helping customers settle on a consistent architecture and interaction model which allows them to seamlessly move data and workloads across on-premises and multiple clouds using platforms such as DPS.

Through the initiative, we will deliver an architecture in which it no longer matters where your data lives: in any cloud, on-premises, or at the edge, enterprises can leverage open-source analytics in a secure and governed manner. The benefits of a consistent interaction model cannot be overstated; it is the key to unlocking a seamless experience.

The Open Hybrid Architecture initiative will make this possible by:

  • De-coupling storage, with both file system interfaces and an object-store interface to data.
  • Containerizing compute resources for elasticity and software isolation.
  • Sharing services for metadata, governance and security across all tiers.
  • Providing DevOps/orchestration tools for managing services/workloads via the "infrastructure as code" paradigm to allow spin-up/down in a programmatic manner.
  • Designating workloads specific to use cases such as EDW, data science, etc., rather than sharing everything in a multi-tenant Hadoop cluster.
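The "infrastructure as code" point above, combined with workload-specific clusters, maps naturally onto a scoped-lifetime pattern: compute exists only for the duration of a job. The following sketch is hypothetical (the `Orchestrator` class and its methods are invented for illustration, not a real DPS interface):

```python
# Hypothetical sketch of programmatic spin-up/down of workload-specific
# clusters. Orchestrator, spin_up, and spin_down are illustrative names,
# not a real Hortonworks DPS API.

from contextlib import contextmanager

class Orchestrator:
    def __init__(self):
        self.clusters = {}

    def spin_up(self, name, workload, version):
        """Provision a cluster dedicated to one workload and version."""
        self.clusters[name] = {"workload": workload, "version": version}
        return name

    def spin_down(self, name):
        """Release the cluster's resources."""
        del self.clusters[name]

@contextmanager
def workload_cluster(orch, name, workload, version):
    """Cluster lives only for the duration of the job: elastic by construction."""
    orch.spin_up(name, workload, version)
    try:
        yield name
    finally:
        orch.spin_down(name)

orch = Orchestrator()
with workload_cluster(orch, "etl-nightly", "spark", "2.3") as c:
    assert c in orch.clusters       # compute exists only while the job runs
assert "etl-nightly" not in orch.clusters
```

Because each cluster is dedicated to one workload and version, two teams can run different versions of Hive or Spark side by side without the coordination overhead of a shared multi-tenant cluster.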

So, What Happens Next?

After careful consideration, we've determined the best path forward is a phased approach, similar to how Hortonworks delivered enterprise-grade SQL queries-on-Hadoop via the Stinger and Stinger.Next initiatives.

The Open Hybrid Architecture initiative will include the following development phases:

  • Phase 1: Containerization of HDP and HDF workloads with DPS driving the new interaction model for orchestrating workloads by programmatic spin-up/down of workload-specific clusters (different versions of Hive, Spark, NiFi, etc.) for users and workflows.
  • Phase 2: Separation of storage and compute by adopting scalable file-system and object-store interfaces via the Apache Hadoop HDFS Ozone project.
  • Phase 3: Containerization for portability of big data services, leveraging technologies such as Kubernetes for containerized HDP and HDF. Red Hat and IBM are partnering with us on this journey to accelerate containerized big data workloads for hybrid. As part of this phase, we will certify HDP, HDF, and DPS as Red Hat Certified Containers on Red Hat OpenShift, an industry-leading enterprise container and Kubernetes application platform. This allows customers to more easily adopt a hybrid architecture for big data applications and analytics, all with the common and trusted security, data governance, and operations that enterprises require.
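Phase 2's central idea, one storage layer addressable through both a file-system interface and an object-store interface, can be illustrated with a small toy. This is not Apache Ozone code; the classes below are invented to show the dual-interface concept only:

```python
# Toy illustration of phase 2: a single storage layer exposed through
# both a hierarchical file-system view and a flat bucket/key object-store
# view. Not Apache Ozone code; all names are hypothetical.

class StorageLayer:
    def __init__(self):
        self._blobs = {}  # shared backing store for both views

class FileSystemView:
    """Hierarchical, path-based access, as HDFS clients expect."""
    def __init__(self, store):
        self.store = store
    def write(self, path, data):
        self.store._blobs[path.strip("/")] = data
    def read(self, path):
        return self.store._blobs[path.strip("/")]

class ObjectStoreView:
    """Flat bucket/key access, as S3-style clients expect."""
    def __init__(self, store):
        self.store = store
    def put(self, bucket, key, data):
        self.store._blobs[f"{bucket}/{key}"] = data
    def get(self, bucket, key):
        return self.store._blobs[f"{bucket}/{key}"]

store = StorageLayer()
fs, objs = FileSystemView(store), ObjectStoreView(store)
fs.write("/warehouse/t1", b"rows")
assert objs.get("warehouse", "t1") == b"rows"  # same data, either interface
```

With storage decoupled this way, containerized compute (phases 1 and 3) can attach to the same data through whichever interface a given workload prefers, which is what makes the on-premises architecture consistent with the cloud one.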

Just as we enabled the modern data architecture with HDP and YARN back in the day, we're at it again — but this time it's bringing the innovation we've done in the cloud down to our products in the data center.

Hortonworks has been on a multi-year journey toward cloud-first and cloud-native architectures. The Open Hybrid Architecture initiative is the final piece of the puzzle. Not only will this initiative bring cloud-native to the data center, but it will also help our customers embrace and master the unified hybrid architectural model that is required to get the full benefits of on-premises, cloud and edge computing. We, along with our partner ecosystem and the open-source community, are excited to tackle this next redesign of the modern data architecture.


Published at DZone with permission of Arun Murthy, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
