Containers Resources

Why Use LocalPV with NVMe for Your Workload?

Containerized applications are ephemeral: any data created by a container is lost as soon as its process terminates. Orchestrating containers with Kubernetes therefore requires a pragmatic approach to data persistence and management. To deal with this, Kubernetes uses Volume plugins to decouple storage consumption from the provisioned hardware. A Persistent Volume (PV) is a Kubernetes API resource that provides persistent storage for Pods. Cluster resources can use a PV to mount any storage unit (a file system folder or a block storage device) on Kubernetes nodes. Pods request a PV through a Persistent Volume Claim (PVC). These storage integrations make it possible for containerized applications to share data with other containers and to preserve container state.

PVs can be provisioned statically by the cluster administrator or dynamically using Storage Classes. Important features that distinguish Storage Classes include capacity, volume mode, access modes, performance and resiliency. A local disk attached directly to a single Kubernetes node and exposed as a PV is known as a Local PV. It offers the best performance but is accessible only from the node to which it is attached. This post explores why LocalPV and NVMe storage should be used for Kubernetes workloads.

Non-Volatile Memory Express (NVMe) for Kubernetes

NVMe is a high-speed access protocol that delivers low latency and high throughput for SSD storage devices by connecting them to the processor through a PCIe interface. Early SSDs connected to the CPU through SATA or Serial Attached SCSI (SAS). These interfaces relied on legacy standards designed for hard-disk speeds and were inefficient for flash, since each connection to the processor remained limited by synchronized locking or by the SAS Host Bus Adapter (HBA).
To overcome this challenge, NVMe unlocks the potential of flash storage by using Peripheral Component Interconnect Express (PCIe), which supports high-performance Non-Uniform Memory Access (NUMA). NVMe also supports parallel processing, with up to 64K I/O queues and up to 64K entries per queue. Applications hosted on this high-bandwidth, low-latency storage can create as many I/O queues as the system configuration, the workload and the NVMe controller allow. Following a NUMA-based storage protocol, NVMe allows different CPUs to manage I/O queues using various arbitration mechanisms.

Modern enterprises are data-driven, with users and devices generating huge amounts of data that may overwhelm companies. By making better use of multi-core CPUs, NVMe provides low latency and fast transfer rates for better access to, and processing of, large data sets. NVMe devices typically rely on NAND flash memory, which can be hosted on various SSD form factors including conventional SSDs, U.2 drives, M.2 cards, and PCIe add-in cards. NVMe over Fabrics (NVMe-oF) extends the advantages of NVMe storage access by implementing the NVMe protocol for remotely connected devices. The architecture allows one node to directly access a storage device of another computer over several transport protocols.

NVMe Architecture

In NVMe architecture, the host computer is connected to SSD storage devices via a high-throughput Host-Controller Interface. The storage service is composed of three main elements:

- SSD controllers
- The PCIe host interface
- Non-volatile memory (e.g., NAND flash)

To submit and complete I/O commands, the NVMe controller uses memory-mapped controller registers and queues in the host system's DRAM. The number of mapped registers determines the number of parallel I/O operations the protocol can support.

[Figure: A typical NVMe storage architecture]

Advantages of Using NVMe for Kubernetes Clusters

PCIe reduces the need for various abstract implementation layers, allowing for faster, more efficient storage.
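To put the queue limits quoted above into perspective, the following is a rough back-of-the-envelope sketch rather than a statement about any particular device. It treats "64K" as 65,536 for both the queue count and the queue depth, and it compares that against the single 32-command queue of AHCI (the host interface commonly used with SATA), a figure that comes from outside this article. Real controllers usually expose far fewer queues than the protocol maximum.

```python
# Theoretical upper bounds on outstanding commands, using the article's
# "64K queues x 64K entries" figures (assumed here to mean 65,536 each).
NVME_MAX_QUEUES = 64 * 1024
NVME_MAX_QUEUE_DEPTH = 64 * 1024

# Assumption from outside the article: AHCI exposes one queue of 32 commands.
AHCI_QUEUES = 1
AHCI_QUEUE_DEPTH = 32


def max_outstanding(queues: int, depth: int) -> int:
    """Return the protocol-level ceiling on commands in flight."""
    return queues * depth


nvme_total = max_outstanding(NVME_MAX_QUEUES, NVME_MAX_QUEUE_DEPTH)
ahci_total = max_outstanding(AHCI_QUEUES, AHCI_QUEUE_DEPTH)
ratio = nvme_total // ahci_total

print(f"NVMe ceiling: {nvme_total:,} commands in flight")
print(f"AHCI ceiling: {ahci_total:,} commands in flight")
print(f"Ratio:        {ratio:,}x")
```

These are protocol ceilings only; the parallelism a workload actually sees depends on how many queues the controller and the operating system driver create, which is typically tied to the number of CPU cores.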
Some benefits of using NVMe for storage include:

- Efficient memory transfer - The NVMe protocol requires only one ring per CPU to communicate directly with non-volatile memory, reducing locking overhead for I/O controllers. NVMe also enables parallelism by combining Message Signalled Interrupts with multi-core CPUs to further reduce latency.
- Secured cluster data - NVMe-oF enables secure tunnelling protocols developed and managed by reputable data security communities such as the Trusted Computing Group (TCG). This enables enterprise-grade security features such as encryption at rest, access control and crypto-erase for cluster nodes and SSD storage devices.
- Supports multi-core computing - The NVMe protocol uses a private queueing strategy to support up to 64K commands per queue over 64K queues. Since every controller has its own set of queues, throughput increases linearly with the number of CPU cores available.
- Requires fewer instructions to process I/O requests - NVMe relies on an efficient command set that halves the number of CPU instructions required to implement I/O operations. This reduces latency while enabling advanced features like reservations and power management for cluster administrators.

Why use LocalPV with NVMe Storage for Kubernetes Clusters?

While most storage systems used to persist data for Kubernetes clusters are remote and independent of the source nodes, it is possible to attach a local disk directly to a single node. Locally attached storage typically offers higher performance and tighter security than remote storage. A Kubernetes LocalPV represents a portion of local disk storage that can be used for data persistence in StatefulSets. With LocalPV, the local disk is specified as a persistent volume that can be consumed with the same PVC and Storage Class abstractions used for remote storage.
This results in low-latency storage that is suitable for fault-tolerant use cases such as:

- Distributed data stores that share replicated data across multiple nodes
- Caches for data sets that benefit from faster processing close to where the data resides (data gravity)

LocalPV vs. hostPath Volumes

Before the introduction of LocalPV volumes, hostPath volumes were used for accessing local storage. Orchestrating local storage with hostPath was challenging because it didn't support important Kubernetes features, such as StatefulSets. Additionally, hostPath volumes required separate operators for disk management, Pod scheduling and topology, making them difficult to use in production environments. LocalPV volumes were designed in response to issues with the scheduling, disk accounting and portability of hostPath volumes. One major distinction is that the Kubernetes control plane knows which node owns a LocalPV. With hostPath, data is lost when a Pod referencing the volume is scheduled to a different node. LocalPV volumes can only be referenced through a Persistent Volume Claim (PVC), while hostPath volumes can be referenced both directly in the Pod definition file and via a PVC.

How to Configure a Kubernetes Cluster with LocalPV NVMe Storage

Workloads can be configured to access NVMe SSDs on a local machine using LocalPV and a Persistent Volume Claim, or a StatefulSet with volume claim templates. This section explores how to attach a local disk to a Kubernetes cluster with NVMe storage configured. The first step is to create a storage class that enables volume topology-aware scheduling. This instructs Kubernetes not to bind a PVC until a Pod consuming the PVC is scheduled.
The configuration file for the storage class will be similar to:

    $ cat sc.yaml
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: openebs-device-sc
    allowVolumeExpansion: true
    parameters:
      devname: "test-device"
    provisioner: device.csi.openebs.io
    volumeBindingMode: WaitForFirstConsumer

Check the storage class documentation for all the parameters supported by Device LocalPV. If a device with a meta partition is available on certain nodes only, use topology to list the nodes where the devices are available. As shown in the storage class below, allowedTopologies can be used to describe device availability on nodes.

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: openebs-device-sc
    allowVolumeExpansion: true
    parameters:
      devname: "test-device"
    provisioner: device.csi.openebs.io
    allowedTopologies:
    - matchLabelExpressions:
      - key: kubernetes.io/hostname
        values:
          - device-node1
          - device-node2

The storage class above states that a device with the meta partition test-device is available on nodes device-node1 and device-node2 only. The Device CSI driver will create volumes on those nodes only.

The OpenEBS Device driver has its own scheduler, which tries to distribute PVs across the nodes so that no single node is loaded with all the volumes. Currently, the driver supports two scheduling algorithms, VolumeWeighted and CapacityWeighted. Across all the nodes where the devices are available, VolumeWeighted looks for the device with the fewest volumes provisioned on it, while CapacityWeighted looks for the device with the least capacity already provisioned from it. To learn how to select a scheduler via the storage class, refer to the driver documentation. Once the scheduler has found a node, it creates a PV for that node and also creates a DeviceVolume custom resource for the volume with the node information.
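The two scheduling strategies described above can be illustrated with a minimal sketch. This is not the OpenEBS driver's code: the `Device` record, its field names and the `pick_device` helper are hypothetical, and the sketch only mirrors the selection rule stated in the text (fewest volumes versus least provisioned capacity).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical view of one candidate device (not an OpenEBS type)."""
    node: str
    volume_count: int      # volumes already provisioned on the device
    provisioned_gib: int   # capacity already carved out of the device


def pick_device(devices, algorithm="VolumeWeighted"):
    """Pick a device using one of the two strategies named in the text."""
    if not devices:
        raise ValueError("no candidate devices")
    if algorithm == "VolumeWeighted":
        # Prefer the device hosting the fewest volumes.
        return min(devices, key=lambda d: d.volume_count)
    if algorithm == "CapacityWeighted":
        # Prefer the device with the least capacity already provisioned.
        return min(devices, key=lambda d: d.provisioned_gib)
    raise ValueError(f"unknown algorithm: {algorithm}")


# Illustrative numbers only, reusing the node names from the storage class.
candidates = [
    Device("device-node1", volume_count=3, provisioned_gib=12),
    Device("device-node2", volume_count=1, provisioned_gib=40),
]

by_volumes = pick_device(candidates, "VolumeWeighted")
by_capacity = pick_device(candidates, "CapacityWeighted")
print(by_volumes.node, by_capacity.node)
```

With these made-up numbers the two strategies disagree: one node holds fewer volumes while the other has less capacity in use, which shows why the choice of algorithm matters when volume sizes vary widely.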
The watcher for this DeviceVolume CR reads all the information for this object and creates a partition of the given size on the specified node.

The scheduling algorithm currently accounts only for either the number of volumes or the total capacity occupied on a device. It does not account for other factors, such as available CPU or memory, when making scheduling decisions. So if you want to use node selector or affinity rules on the application Pod, or have CPU or memory constraints, the Kubernetes scheduler should be used. To use the Kubernetes scheduler, set volumeBindingMode to WaitForFirstConsumer in the storage class. This causes delayed binding, i.e., the Kubernetes scheduler schedules the application Pod first and then asks the Device driver to create the PV. The driver then creates the PV on the node where the Pod is scheduled.

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: openebs-device-sc
    allowVolumeExpansion: true
    parameters:
      devname: "test-device"
    provisioner: device.csi.openebs.io
    volumeBindingMode: WaitForFirstConsumer

Please note that once a PV is created for a node, the application using that PV will always be scheduled to that particular node, as the PV is sticky to that node. The scheduling algorithm of the Device driver or of Kubernetes comes into play only at deployment time. Once the PV is created, the application cannot move elsewhere, because the data resides on the node where the PV is.

Create a PVC using the storage class created for the device driver.

    $ cat pvc.yaml
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: csi-devicepv
    spec:
      storageClassName: openebs-device-sc
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 4Gi

Create the application Pod manifest using the PVC backed by device driver storage.
    $ cat fio.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: fio
    spec:
      restartPolicy: Never
      containers:
      - name: perfrunner
        image: openebs/tests-fio
        command: ["/bin/bash"]
        args: ["-c", "while true ;do sleep 50; done"]
        volumeMounts:
          - mountPath: /datadir
            name: fio-vol
        tty: true
      volumes:
      - name: fio-vol
        persistentVolumeClaim:
          claimName: csi-devicepv

After the application is deployed, we can go to the node and see that the partition has been created and is being used as a volume by the application for reading and writing data.

Advantages of Using LocalPV with NVMe for Kubernetes Operators

Some benefits of integrating LocalPV into clusters using NVMe for storage include:

- Compared to remotely connected storage systems, Local Persistent Volumes support more Input-Output Operations Per Second (IOPS) and higher throughput, since the volume directory is mounted directly on the node. This means that with LocalPV volumes, organizations can take full advantage of the high performance offered by NVMe SSDs.
- LocalPV also enables the dynamic reservation of storage resources needed for stateful services. This makes it easy to relaunch a process on the same node using the same SSD volume.
- LocalPV volume configuration pins tasks to the nodes where their data resides, eliminating the need for scheduling constraints and enabling quicker access to SSDs through NVMe.
- Destroying a LocalPV is as easy as deleting the PVC consuming it, allowing for simpler storage management.

Summary

Non-Volatile Memory Express (NVMe) enhances data storage and access by leveraging the performance benefits of flash memory for SSD-based storage. By connecting storage devices to the CPU directly via the PCIe interface, data companies eliminate the bottlenecks associated with SATA- or SAS-based access. LocalPV shortens the data path between storage and Kubernetes nodes by mounting a volume directly on a Kubernetes node. This results in higher throughput and IOPS, suitable for fault-tolerant stateful applications.
OpenEBS by MayaData is one of the popular open-source, agile storage stacks for performance-sensitive databases orchestrated by Kubernetes. Mayastor, OpenEBS's latest storage engine, adds very low overhead relative to the performance capabilities of the underlying devices. OpenEBS Mayastor does not require NVMe devices, nor that workloads consume NVMe, although performance improves in both cases. OpenEBS Mayastor is currently unique among open-source storage projects in using NVMe internally to communicate with optional OpenEBS replicas.

To learn more about how OpenEBS Mayastor, leveraging NVMe as a protocol, performs on some of the fastest NVMe devices currently available on the market, visit this article. OpenEBS Mayastor builds a foundational layer that enables workloads to coalesce and control storage as needed in a declarative, Kubernetes-native way. This lets the user focus on what's important: deploying and operating stateful workloads.

If you're interested in trying out Mayastor for yourself, instructions for setting up your own cluster and running a benchmark like `fio` can be found at https://docs.openebs.io/docs/next/mayastor.html.

Related Blogs:

- https://blog.mayadata.io/the-benefits-of-using-nvme-for-kubernetes
- https://blog.mayadata.io/mayastor-nvme-of-tcp-performance

August 12, 2021

by Sudip Sengupta

· 7,000 Views · 2 Likes

Container Attached Storage (CAS) vs. Shared Storage: Which One to Choose?

An Overview of Storage in Kubernetes

Kubernetes supports a powerful storage architecture that is often complex to implement unless done right. The Kubernetes orchestrator relies on volumes, abstracted storage resources that help to save and share data between ephemeral containers. Since these storage resources abstract the underlying infrastructure, volumes enable dynamic provisioning of storage for containerized workloads. In Kubernetes, shared storage is typically achieved by mounting volumes and connecting to an external filesystem or block storage solution. Container Attached Storage (CAS) is a relatively newer solution that allows Kubernetes administrators to deploy storage as containerized microservices in a cluster. The CAS architecture makes workloads more portable and makes it simpler to modify storage based on application needs. Because CAS is deployed per workload or per cluster, it also eliminates the cross-workload and cross-cluster blast radius of traditional shared storage.

This article compares CAS with traditional shared storage to explore their similarities, differences and architecture.

Container Attached Storage:

Container Attached Storage (CAS) is a solution for stateful workloads that deploys storage as a cluster running in the cloud or on-premises. Unlike traditional storage options, where storage is a shared filesystem or block storage running externally, CAS enables storage controllers that can be managed by Kubernetes. These storage controllers can run anywhere a Kubernetes distribution runs, whether on top of traditional shared storage systems or on managed storage services like Amazon EBS. Data stored in CAS is accessed directly from containers within the cluster, significantly reducing read/write times.

Architecture Overview:

CAS leverages the container orchestrator's environment to enable persistent storage. The CAS software has storage targets in containers that run as services.
If desired, these services are replicated as microservice-based storage replicas that can easily be scheduled and scaled independently of each other. CAS services can be orchestrated as containerized workloads using Kubernetes or any other orchestration platform, preserving the autonomy and agility of software development teams.

For any CAS solution, the cluster is typically divided into two layers:

- The control plane consists of the storage controllers, storage policies, and instructions on how to configure the data plane. Control plane components are responsible for provisioning volumes and other storage-related tasks.
- The data plane components receive and execute instructions from the control plane on how to save and access container information. The main element of the data plane is the storage engine, which implements pooled storage. The engine is essentially responsible for the I/O path of the volume.

Some popular storage engines of OpenEBS include Mayastor, cStor, Jiva and OpenEBS LocalPV.

Some prominent users of OpenEBS include the CNCF, ByteDance (TikTok), Optro, Flipkart, Bloomberg and others.

Features:

Container Attached Storage is built primarily to run on Kubernetes and other cloud-native container orchestrators. This makes the solution inherently platform-agnostic and portable, so it can be deployed on any platform without the inconvenience of vendor lock-in.

- CAS decomposes storage controllers into constituent units that can be scaled and run independently.
- Every storage controller is attached to a Persistent Volume and typically runs in user space, achieving storage granularity and independence from the underlying operating system.
- Control plane entities are deployed as Custom Resource Definitions that deal with physical storage entities such as disks.
- Data plane entities are deployed as a collection of Pods running in the same cluster as the workload.
- The CAS architecture can offer synchronous replication to add availability.

When to Use:

Container Attached Storage is steadily becoming the de facto standard for persistent storage of stateful Kubernetes workloads. CAS most closely resembles the Direct Attached Storage that many current workloads expect, such as NoSQL databases, logging, machine learning pipelines, Kafka and Pulsar. Many workload communities and users have embraced CAS. CAS also allows small teams to retain control over their workloads. In short, CAS may be preferred where:

- The workloads expect local storage
- Teams want to efficiently turn local storage, including disks or cloud volumes, into volumes on demand for Kubernetes workloads
- Performance is a concern
- The loose coupling of the architecture should be maintained at the storage layer
- Increased density of workloads on hosts is desired
- Small-team autonomy should be maintained

Traditional Shared Storage:

Shared storage was designed to allow multiple users or machines to access and store data in a pool of devices. Shared storage provided additional availability to workloads that could not provide for their own availability. It was also able to work around the poor performance of the underlying disks, which at the time were able to deliver no more than 150 I/O operations per second. Today's drives can be 10,000 times more performant, massively faster than the performance requirements of most workloads.
A shared storage infrastructure typically consists of block storage systems in Storage Area Networks (SANs) or file-system-based storage devices in Network Attached Storage (NAS) configurations.

Adoption

The storage industry was once a rapidly growing industry, with growth rates in excess of 30% - 50% YoY in the late 1990s and early 2000s. In the 2010s this growth rate moderated and in certain years stopped entirely. In the 2020s growth started again, but at a rate much slower than the exponential growth in the amount of data stored. Meanwhile, Direct Attached Storage and cloud storage each grew more quickly in terms of capacity shipped and overall spending.

Architecture Overview

In traditional shared storage, all nodes in a network share the same physical storage resources but have their own private memory and processing devices. Files and other data can be accessed by any machine connected to the central storage. For a Kubernetes application, traditional shared storage is first implemented by using monolithic storage software to virtualize physical storage resources, which could be bare-metal servers, SAN/NAS networks or block storage solutions. The software then connects to Persistent Volumes that store cluster data. Each Persistent Volume (PV) is bound to a Persistent Volume Claim (PVC), which application Pods use to request a portion of the shared storage.

Both CAS and shared storage can utilize the Container Storage Interface (CSI). CSI is used to issue commands to the underlying storage, such as requests to provision a PV or to expand or snapshot that capacity.

Features:

- Embraces centralized, consolidated storage for block and file storage systems, allowing administration from a single interface.
- Traditional storage is distinctly divided into 3 layers: the hosts tier, which has client machines; the fabric layer, which includes switches and other networking devices; and the storage layer, which includes the controllers used to read and write data on physical disks.
- Shared storage integrates redundancy into the design of storage devices, allowing systems to withstand failure to a sizable degree.
- To scale up traditional shared storage, additional storage devices must be deployed and configured into the existing array.

When to Use

Shared storage is used to manage large amounts of data generated and accessed by a number of different machines, because traditional shared storage enables high performance for large files with no bottlenecks or downtime. Shared storage is also the go-to storage solution for organizations that depend on collaboration between teams. As data and files are managed centrally, shared storage allows efficient version control and consolidated information management. Traditional shared storage is also used to eliminate the need for multiple drives containing the same information, which reduces duplication and thus frees up storage capacity.

CAS vs. Shared Storage

The two storage options vary greatly in how they persist application data. While traditional shared storage relies on an external array of storage devices to persist data, CAS uses containers within an orchestrated environment. Following are a few similarities and differences between CAS and traditional shared storage:

Similarities:

- Both CAS and traditional shared storage offer high-availability storage for applications. CAS provides high availability using data Pod replicas that ensure storage is always available for the CAS cluster, while traditional shared storage uses a redundant design to ensure that the storage system can withstand failure.
- Both options provide quick storage for critical applications.
  CAS uses agile microservices to ensure quick I/O times, while shared storage allows multiple machines to quickly read and write data on a shared pool of storage devices, reducing the need to create connections between individual machines.
- Both solutions accommodate software-defined storage, which combines the performance of physical devices with the agility of software.
- Both can utilize the Container Storage Interface (CSI) to issue commands to the underlying storage.
- Both can be open source, extending the openness of Kubernetes to the data layer. It appears that container attached storage is somewhat more likely to be open source; however, that is yet to be determined conclusively.

Differences

- CAS follows a container-based microservice framework for storage, which means teams can take advantage of the agility and portability of containers to ensure faster, more efficient storage. In contrast, traditional shared storage involves different virtual or physical machines reading from and writing to a shared pool of storage devices, increasing latency and reducing access speeds.
- CAS is platform-agnostic. This means CAS-based storage solutions can run either on-premises or in the cloud without requiring extensive configuration changes. Shared storage, by contrast, relies on kernel modifications, making it inefficient to deploy for workloads across different environments.
- While traditional shared storage relies on consolidated monolithic storage software, CAS runs in user space, enabling independent management capabilities for efficient storage administration at a granular level.
- CAS allows linear scalability since storage containers can be brought up as required, while in traditional shared storage, scaling involves adding newer devices to an existing storage array.

Summary

Designed for Kubernetes, CAS enables agility, granularity and linear scalability, making it a favourite for cloud-native applications.
Traditional shared storage offers a mature stack of storage technology that mainly falls short in persisting storage for stateful applications due to its inherent lack of linear scalability. CAS is a novel solution that allows storage controllers to run in user space, allowing maximum scalability. OpenEBS, a popular CAS-based storage solution, has helped several enterprises run stateful workloads. Originally developed by MayaData, OpenEBS is now a CNCF project with a vibrant community of organizations and individuals alike. This was also evident from CNCF's 2020 Survey Report, which highlighted MayaData (OpenEBS) in the top-5 list of popular storage solutions.

Resources:

- Canonical definition of Container Attached Storage: https://www.cncf.io/blog/2020/09/22/container-attached-storage-is-cloud-native-storage-cas/
- To read adopter use-cases or contribute your own, visit: https://github.com/openebs/openebs/blob/master/ADOPTERS.md
- CNCF 2020 Survey Report: https://www.cncf.io/wp-content/uploads/2020/11/CNCF_Survey_Report_2020.pdf
- OpenEBS LocalPV Quick Start Guide: https://docs.openebs.io/docs/next/localpv.html

This article was originally published at https://blog.mayadata.io/container-attached-storage-cas-vs.-shared-storage-which-one-to-choose and has been authorized by MayaData for republication.

August 10, 2021

by Sudip Sengupta

· 4,760 Views · 3 Likes

Microservices vs Monolith: The Ultimate Comparison 2021

Microservices vs. monolith: why microservices are the newer approach gaining ground in the software market while the monolithic approach is losing its value.

August 9, 2021

by Alfonso Valdes

· 8,460 Views · 7 Likes

Managing Secrets in Node.js With HashiCorp Vault

Safely manage your company's secrets by learning how to access Vault via Node.js applications, retrieve secrets, and interface with Vault via Web UI and CLI.

August 5, 2021

by Kentaro Wakayama

· 20,009 Views · 2 Likes

IBM App Connect Enterprise

In this article, we describe and provide a walkthrough of the configuration required to run a SAP Inbound scenario in an IBM App Connect Enterprise Container.

August 5, 2021

by Amar Shah

· 9,846 Views · 3 Likes

Advanced Kubernetes Setup for Spring Boot App With PostgreSQL DB

Minikube setup with Spring Actuators for probes, resource limits, and use of JVM Container support.

August 5, 2021

by Sven Loesekann

· 18,507 Views · 4 Likes

One CKA/CKAD/CKS Requirement: Mastering Kubectl

Mastering Kubectl is mandatory to manage a Kubernetes cluster. This post aims to give some guidelines to operate any Kubernetes objects in the command line.

August 4, 2021

by Nicolas Giron

· 5,521 Views · 5 Likes

PostgreSQL HA and Kubernetes

I share my thoughts about how to set up a PostgreSQL Database in Kubernetes with some level of high availability, introducing 3 different architectural styles to do so.

July 30, 2021

by Ralph Soika

· 18,529 Views · 3 Likes

A Beginner's Guide to Machine Learning: What Aspiring Data Scientists Should Know

Learn all about machine learning and its different subsets, such as supervised learning, unsupervised learning, and the subsets within those topics.

Updated July 30, 2021

by Manoj Rupareliya

· 26,996 Views · 20 Likes

Container Attached Storage (CAS) vs. Software-Defined Storage - Which One to Choose?

Hardware abstraction involves the creation of a programming layer that allows the computer operating system to interact with hardware devices at a general rather than detailed level. This layer involves logical code implementation that makes the hardware available to any software program. For storage devices, abstraction provides a uniform interface for users accessing shared storage, concealing the hardware's implementation from the operating system. This allows software running on user machines to get the highest possible performance from the storage devices. It also allows for device-independent programs, since storage hardware abstraction enables device drivers to access each storage device directly.

Kubernetes is, by nature, infrastructure-agnostic. To achieve this, it relies on plugins and volume abstractions to decouple storage hardware from applications and services. Containers, on the other hand, are ephemeral and immediately lose data when they terminate. Kubernetes persists data created and processed by containerized applications on physical storage devices using Volumes and Persistent Volumes. These abstractions connect to storage hardware through various types of Hardware Abstraction Layer (HAL) implementations. Two commonly used HAL storage implementations for Kubernetes clusters are Container Attached Storage (CAS) and Software-Defined Storage (SDS).

This blog delves into the fundamental differences between CAS and SDS, the benefits of each, and the most appropriate use-cases for typical HAL storage implementations.

Container Attached Storage Vs Software-Defined Storage

Kubernetes employs abstracted storage for portable, highly available and distributed storage. The Kubernetes API supports various CAS and SDS storage solutions connecting through the Container Storage Interface (CSI). Let us take a closer look at how both abstraction models function and the purpose each serves for storage in a Kubernetes cluster.
Container Attached Storage

Container Attached Storage (CAS) introduces a novel approach to persisting data for stateful workloads in Kubernetes clusters. With CAS, storage controllers are managed and run in containers as part of the Kubernetes cluster. This allows storage portability since these controllers can be run on any Kubernetes platform, whether on personal machines, on-premises data centres or public cloud offerings. Since CAS leverages a microservice architecture, the storage solution remains closely associated with the application that binds to physical storage devices, reducing I/O times.

Container Attached Storage Architecture

CAS leverages the Kubernetes environment to enable the persistence of cluster data. The storage solution runs storage targets in containers. These targets are microservices that can be replicated for independent scaling and management. For enhanced autonomy and agility, these microservice-based storage targets can then be orchestrated using a platform like Kubernetes.

A CAS cluster uses the control plane layer for storage management, while the data plane layer is used to run storage targets and workloads. Storage controllers in the control plane provision volumes, spin up storage target replicas and perform other management tasks. Data plane components execute storage policies and instructions from control plane elements. These instructions typically include file paths, storage and access methods. The data plane additionally contains the storage engine, which is responsible for implementing the actual I/O path for file storage.

Benefits of Container Attached Storage

Container Attached Storage enables agile storage for stateful containerized applications. This is because it follows a microservice-based pattern, which allows the storage controller and target replicas to be upgraded seamlessly.
Containerization of storage software means that administrative teams can dynamically allocate and update storage policies for each volume. With CAS, low-level storage resources are represented using Kubernetes Custom Resource Definitions. This allows for seamless integration between storage and cloud-native tooling, which enables easier management and monitoring. CAS also keeps storage vendor-agnostic, since stateful workloads can be moved from one Kubernetes deployment environment to another without disrupting services.
Container Attached Storage Use-Cases
CAS uses storage target replication to ensure high availability, avoiding the blast radius limitations of traditional distributed storage architecture. This makes CAS a strong storage choice for cloud-native applications. CAS is also appropriate for organizations looking to orchestrate their storage across multiple clouds, because it can be deployed on any Kubernetes platform. Container Attached Storage enables simple storage backup and replication, making it well suited to applications that require scale-out storage. It also suits development teams that want to improve read-write times for their Continuous Integration and Continuous Delivery (CI/CD) pipelines. Popular CAS solution providers for Kubernetes include OpenEBS, StorageOS, Portworx and Longhorn.
Software-Defined Storage
Software-Defined Storage architecture relies on data programs to decouple running applications from storage hardware. This simplifies the management of storage devices by abstracting them into virtual partitions. Management is then enabled on a Data Management Interface (DMI) that hosts command and control functions.
Features of Software-Defined Storage
With Software-Defined Storage, the data/service management interface is hosted on a master server that controls storage layers consisting of shared storage pools. This makes provisioning and allocation of storage easy and flexible.
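The idea of a shared storage pool can be pictured with a small toy model. This is only a sketch under simplifying assumptions: the `StoragePool` class, its method names, and the device names and sizes are invented for illustration and do not correspond to any real SDS product's API. It shows several devices presented as one pool from which volumes are carved, without the caller needing to know which hardware backs each volume.

```python
class StoragePool:
    """Toy model: several physical devices presented as one logical pool."""

    def __init__(self, device_sizes_gib):
        # Free capacity remaining on each (hypothetical) device, in GiB.
        self.free = dict(device_sizes_gib)
        # Volume name -> list of (device, GiB) extents backing it.
        self.volumes = {}

    def total_free(self):
        """Free capacity across the whole pool, in GiB."""
        return sum(self.free.values())

    def create_volume(self, name, size_gib):
        """Carve a volume out of the pool, spanning devices if needed."""
        if name in self.volumes:
            raise ValueError(f"volume {name!r} already exists")
        if size_gib > self.total_free():
            raise ValueError("not enough free capacity in the pool")
        extents = []
        remaining = size_gib
        for device, free in self.free.items():
            if remaining == 0:
                break
            take = min(free, remaining)
            if take:
                self.free[device] -= take
                extents.append((device, take))
                remaining -= take
        self.volumes[name] = extents

    def delete_volume(self, name):
        """Return a volume's extents to the pool."""
        for device, size in self.volumes.pop(name):
            self.free[device] += size


if __name__ == "__main__":
    pool = StoragePool({"disk-a": 100, "disk-b": 50})
    pool.create_volume("db-data", 120)  # larger than either device alone
    print(pool.volumes["db-data"])
    print(pool.total_free())
```

In this model a 120 GiB volume can be created even though no single device is that large, which is the kind of flexibility pooling is meant to provide. A real system would add replication, tiering and failure handling on top.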
Following are some of the key features of software-defined storage:
Device Abstraction - Data I/O services should be delivered uniformly to users regardless of the underlying hardware. Through SDS, storage abstraction constructs such as repositories, file shares, volumes, and Logical Unit Numbers (LUNs) are used to create a clear divide between physical hardware and the logical aspects of data storage.
Automation - The SDS solution implements workflows and algorithms that reduce the amount of manual work performed by administrators. To enable efficient automation, SDS storage systems adapt to varying performance and data needs with little human intervention.
Disaggregated, Pooled Storage - Physical storage devices are part of a shared pool from which the software can carve out storage for services and applications. This allows SDS to use available storage efficiently when required, resulting in optimum usage of resources.
Advantages of Software-Defined Storage
Some benefits of using SDS include:
Enhanced Scalability - Decoupling hardware resources allows administrators to allocate physical storage dynamically depending on workload. Pooled, disaggregated storage enabled by SDS allows for both vertical and horizontal scaling of physical volumes, supporting larger capacity and higher performance.
Improved I/O Performance - SDS enables input-output parallelism to process host requests dynamically across multiple CPUs. SDS also supports large caching memory of up to 8TB, while enabling automatic data tiering. This allows faster input-output operations for quicker data processing.
Interoperability - SDS uses the Data Management Interface as a translator that allows storage solutions running on different platforms to interact with each other. It also groups physically isolated storage hardware into logical pools, allowing organizations to host shared storage from different vendors.
Reduced Costs - SDS storage solutions typically run on existing commodity hardware while optimizing the consumption of storage. SDS also enables automation that reduces the number of administrators required to manage storage infrastructure. These factors lead to lower upfront and operational expenses for managing workloads.
When to Use Software-Defined Storage
SDS offers several benefits for teams looking to enhance storage flexibility at reduced costs. Some common use-cases for SDS include:
Data centre infrastructure modernization
Creating robust systems for mobile and challenging environments
Creating Hybrid Cloud Implementations to be managed on the same platform
Leveraging existing infrastructure for Remote and Branch Offices
Comparing Container Attached Storage with Software-Defined Storage
Similarities
Both CAS and SDS enable isolation between physical storage hardware and running applications. While doing so, both technologies abstract data management from data storage resources. The two HAL implementations share several features, including:
Vendor-agnostic - Both CAS and SDS architectures allow multiple workloads to run on a single host. This gives administrators a separation between storage devices and the access software. As a result, organizations can choose either CAS or SDS to implement a storage solution that can run on any platform, regardless of who develops or manages the tooling.
Allow dynamic storage allocation - SDS and CAS allow for the dynamic attachment and detachment of storage tools, thereby enabling automatic provisioning of data backups and replicas for high availability applications. Both SDS and CAS allow for automatic deployment of storage infrastructure, which allows for storage technology diversity and heterogeneity.
Allow efficient infrastructure scaling - CAS and SDS allow horizontal and vertical infrastructure scaling to automate data workflows.
The two HAL approaches enable the creation of a composable, disaggregated infrastructure that supports versatile, distributed environments.
Differences
While SDS enables distributed storage management and reduced hardware dependencies, CAS allows for disaggregated storage that can be run using any container orchestration platform. This introduces various differences between CAS and SDS, including:
Software-Defined Storage relies on traditional shared software with limitations on blast radius, while Container Attached Storage (CAS) allows the replication of storage software, allowing for independent management and scaling.
CAS allows for scaling up and out in both storage and volume performance, while SDS enables the scaling up of storage nodes to improve storage capacity.
SDS enables a Hyper-Converged Infrastructure (HCI), while CAS enables a Highly Disaggregated Storage Infrastructure.
Container Attached Storage and Software-Defined Storage both allow cluster administrators to leverage the benefits of hardware abstraction to persist data for stateful applications in Kubernetes. CAS allows the flexible management of storage controllers through microservices-based storage orchestration using Kubernetes. Software-Defined Storage, on the other hand, abstracts storage hardware using a programmable data control plane. CAS has all the features that a typical SDS provides, albeit tailored for container workloads and built with the latest software and hardware primitives.
OpenEBS, a popular CAS-based storage solution, has helped several enterprises run stateful workloads. Originally developed by MayaData, OpenEBS is now a CNCF project with a vibrant community of organizations and individuals alike. This was also evident from CNCF’s 2020 Survey Report, which highlighted MayaData (OpenEBS) in the top-5 list of most popular storage solutions. To know more about how OpenEBS can help your organization run stateful workloads, contact us here.
This article was originally published at https://blog.mayadata.io/container-attached-storage-cas-vs.-software-defined-storage-which-one-to-choose and is republished with authorization from MayaData.

July 30, 2021

by Sudip Sengupta

· 7,955 Views · 1 Like

Automated GitOps With Flux

GitOps: Git as a single source of truth for all our Kubernetes-related deployments. Let's discuss the later aspects of GitOps through Kubernetes and Flux.

July 28, 2021

by Vasu Maganti

· 7,441 Views · 2 Likes

Where Are Docker Images Stored on the Host Machine?

Docker images and other objects are stored inside the docker directory on the local machine, depending on the default storage driver the machine uses.

July 28, 2021

by Raunak Jain

· 11,035 Views · 1 Like

Enabling Docker Volume Local Directory for a Docker React Application

Docker Volume Local Directory setup helps make the development workflow a lot better for developers. We see how it can be achieved with a simple step.

July 28, 2021

by Saurabh Dashora

· 5,939 Views · 2 Likes

Running Apache Spark on Kubernetes

This article covers using Spark on K8s to overcome dependency on cloud providers and running Apache Spark on Kubernetes.

July 26, 2021

by Ramiro Alvarez Fernandez

· 12,053 Views · 3 Likes

13 Best Practices for Using Helm

Helm is a tool for deploying applications to Kubernetes clusters. Here are 13 best practices to help you create, operate, and upgrade applications using Helm.

July 26, 2021

by Kentaro Wakayama

· 6,400 Views · 2 Likes

The Perfect SaaS Tech Stack

Learn how to create your perfect SaaS tech stack with the best programming language and build a multi-tenant architecture on AWS for your SaaS web app.

July 24, 2021

by Alfonso Valdes

· 9,168 Views · 4 Likes

Next-Gen Data Pipes With Spark, Kafka and k8s

This article examines the architecture patterns and provides some sample code for the readers to implement in their own environment.

July 23, 2021

by Subhendu Dey

· 14,585 Views · 4 Likes

Top 25 DevOps Tools for 2021

DevOps is transforming the state of software development worldwide. This article takes a detailed look at the Top 25 DevOps tools currently available.

July 22, 2021

by Vishnu Vasudevan

· 16,365 Views · 10 Likes

Simplifying IAC Using Terraform, Terragrunt, and Atlantis

In this article, we discuss integrating Terraform with Terragrunt and how to automate Terraform operations with Atlantis.

July 20, 2021

by Hitendra Verma

· 5,816 Views · 3 Likes

How to Prepare for CKAD and CKA Certification

With around 50% of our developers CKA or CKAD certified, we share our experiences, study material, mistakes to avoid, FAQs, and more about the CKA and CKAD certifications.

July 20, 2021

by Gaurav Gahlot

· 7,325 Views · 2 Likes

The Latest Containers Topics