DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Related

  • Three Must-Have Data Center Security Practices
  • Multi-Threaded Geo Web Crawler In Java
  • OWASP TOP 10 API Security Part 2 (Broken Object Level Authorization)
  • Micronaut in the Cloud: Micronaut Data

Trending

  • Introduction to Retrieval Augmented Generation (RAG)
  • Code Reviews: Building an AI-Powered GitHub Integration
  • Building Resilient Networks: Limiting the Risk and Scope of Cyber Attacks
  • Unlocking Data with Language: Real-World Applications of Text-to-SQL Interfaces
  1. DZone
  2. Data Engineering
  3. Data
  4. Demystifying Cloud-Native Data Management: Layers of Operation

Demystifying Cloud-Native Data Management: Layers of Operation

This piece addresses the pros and cons of various data management approaches across several attributes including consistency, storage requirements, and performance.

By 
Gaurav Rishi user avatar
Gaurav Rishi
·
Jun. 24, 22 · Analysis
Likes (1)
Comment
Save
Tweet
Share
7.4K Views

Join the DZone community and get the full member experience.

Join For Free

As containerized applications go through an accelerated pace of adoption, Day 2 services have become a here-and-now problem. These Day 2 services include data management functions such as backup and disaster recovery along with application mobility. In this new world of containerized cloud-native applications, microservices use multiple data services (MongoDB, Redis, Kafka, etc.) and storage technologies to store state and are typically deployed in multiple locations (regions, clouds, on-premises).

In this environment, where legacy infrastructure or hypervisor-based solutions don’t work, what are the right constructs for designing and implementing these data management functions for cloud-native applications? How should you reason about the various data management options provided by storage vendors, data services vendors, and cloud vendors to decide the right approach for your environment and needs? This piece dives under the covers and addresses the pros and cons of various data management approaches across several attributes including consistency, storage requirements, and performance.

Defining a Vocabulary

To start, we’ll deconstruct and simplify a stack to show where the data may reside in a cloud-native application.

Simplified Cloud-Native Application Stack

When thinking of data management, we could be operating on one (or more!) of the layers shown in the figure above. Let’s enumerate these layers:

1. Physical Storage

Physical Storage Layer in a Cloud-Native Stack

This layer includes various storage hardware options that can store state in non-volatile memory with a choice of physical media ranging from NVMe and SSD devices to spinning disks and even tape. They come in different form factors including arrays and stand-alone rack servers.

Physical storage could be located:

  • On-premises, where you may encounter storage hardware from vendors such as Seagate, Western Digital, and Micron.
  • In a managed cloud provider’s data centers. While you may never encounter a physical device, you know it’s there giving that "cloud" gravity!

2. File and Block Storage

File and Block Storage Layer in a Cloud-Native Stack

This software layer provides File or Block level constructs to enable efficient read and write operations from the underlying physical storage. In both cases (file and block), the underlying storage could be stand-alone (local disks) or a shared network resource (NAS or SAN).

  • Block storage implementations allow you to create raw storage volumes from local or remote disks that have low latency and are accessible through protocols such as iSCSI and FiberChannel. Block storage implementations on cloud providers include Amazon EBS and GCE Persistent Disks.
  • File Storage provides shared storage for file semantics and operations using protocols such as NFS and SMB. File storage implementations commonly found on-premises include products from NetApp and Dell EMC. File storage implementations on cloud providers include Amazon EFS, Google Cloud Filestore, and Azure Files.

This layer often provides snapshotting capabilities to make a point-in-time copy of your volume for protection. Also, in Kubernetes environments, this layer provides Container Storage Interface (CSI) drivers to normalize the APIs that higher layers can exercise to call out snapshotting functions. Note that not all implementations of CSI are created equal in terms of the supported capabilities.

3. Data Services

Data Services Layer in a Cloud-Native Stack

This layer resides on top of the file/block storage implementation. It provides various database implementations as well as an increasingly popular storage type, namely object (a.k.a., blob) storage. This is the layer that applications typically interface with and the choice of the underlying database implementations is based on the workloads and business logic. With microservices-based applications, polyglot persistence is a norm since every microservice picks the most appropriate data service for the job at hand.

Some of the database types and a subset of example implementations include:

  • SQL Databases: MySQL, PostgreSQL, SQL Server
  • NoSQL Databases:
    • Key-Value Stores: Redis, BerkeleyDB
    • Time-series Databases: InfluxDB, Prometheus
    • Graph Databases: Neo4j, GraphDB
    • Wide-column stores: Cassandra, Azure Cosmos
    • Document stores: MongoDB, CouchDB
  • Message Queues: Kafka, RabbitMQ, Amazon SQS
  • Object Stores1: Amazon S3, Google Cloud Storage, Minio

There are also several hosted implementations of these databases, commonly referred to as Database-as-a-Service (DBaaS) systems. These typically include one of the database categories listed above and can sometimes provide auto-scaling along with the consumption economics of an as-a-Service (-aaS) business. Examples of DBaaS systems include Amazon RDS, MongoDB Atlas, and Azure SQL.

From the perspective of data protection, each one of the database implementations provides a specific set of utilities (pg_dump or WAL-E for PostgreSQL, mongodump for MongoDB, etc.) to backup and restore data. It is important to note that there are many utilities available with widely varying capabilities in terms of consistency, recovery granularity, and speed. They are usually restricted to a particular database implementation or, at best, a database type regardless of whether they are provided as a standalone utility or as a part of an -aaS offering.

4. Stateful Application

Stateful Applications Layer in a Cloud-Native Stack

The application layer is where the business logic resides and, in the cloud-native world, applications are typically based on modern agile development methodologies and implemented as distributed microservices. Almost all applications have a state that needs to persist. While there are several patterns of storing application state, we need to persist and protect the following information in the context of a stateful Kubernetes application as an atomic unit:

  • Application Data: Across various Data Services, Block, and File storage implementations spread across multiple containers.
  • Application Definition and Configuration: The application image and the associated environment configuration are spread across various Kubernetes objects including ConfigMaps, Secrets, etc.
  • Other Configuration State: Including CI/CD pipeline state, release information, and associated Helm deployment metadata.

Under the Hood of a Typical Cloud-Native Application

An example of a stateful application is shown in the figure above which highlights some of the components and the associated state that needs to be protected. It is important to note that, for real-world deployment, the application is composed of hundreds of these underlying components. Also, in a cloud-native construct, the unit of atomicity for protection needs to be the application vs. the underlying data service or storage infrastructure layer. As enumerated earlier, this is because an application's state includes the application data, definition, and configuration that is spread across multiple physical or virtual nodes and data services.

Conclusion

From a backup/recovery and application portability perspective, a good data management solution needs to treat the entire application as the unit of atomicity, rendering traditional hypervisor-centric solutions obsolete. We also illustrated a simple stack diagram to show where the application state actually resides from the perspective of various data services, block, and file stores as well as physical storage across both on-premises and cloud implementations. This defines a basic vocabulary that allows us to dive under the covers of the layers of operations of cloud data management.

Footnote

1. Some may argue that Object Stores should belong in the same layer as File/Block. In this paper Object stores will be treated as another data service with a key-value interface that, if needed, can be run within Kubernetes. 

Data management Database Implementation application Cloud Data (computing)

Published at DZone with permission of Gaurav Rishi. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Three Must-Have Data Center Security Practices
  • Multi-Threaded Geo Web Crawler In Java
  • OWASP TOP 10 API Security Part 2 (Broken Object Level Authorization)
  • Micronaut in the Cloud: Micronaut Data

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!