Machine learning (ML) has seen explosive growth in recent years, leading to increased demand for robust, scalable, and efficient deployment methods. Traditional approaches often struggle to operationalize ML models due to factors like discrepancies between training and serving environments or the difficulty of scaling up. This article proposes a technique using Docker, an open-source platform designed to automate application deployment, scaling, and management, as a solution to these challenges. The proposed methodology encapsulates the ML models and their environment into a standardized Docker container unit. Docker containers offer numerous benefits, including consistency across development and production environments, ease of scaling, and simplicity in deployment. The following sections present an in-depth exploration of Docker, its role in ML model deployment, and a practical demonstration of deploying an ML model using Docker, from the creation of a Dockerfile to the scaling of the model with Docker Swarm, all exemplified by relevant code snippets. Furthermore, the integration of Docker in a Continuous Integration/Continuous Deployment (CI/CD) pipeline is presented, culminating with the conclusion and best practices for efficient ML model deployment using Docker. What Is Docker? As a platform, Docker automates software application deployment, scaling, and operation within lightweight, portable containers. The fundamental underpinnings of Docker revolve around the concept of 'containerization.' This virtualization approach allows software and its entire runtime environment to be packaged into a standardized unit for software development. A Docker container encapsulates everything an application needs to run (including libraries, system tools, code, and runtime) and ensures that it behaves uniformly across different computing environments. This facilitates the process of building, testing, and deploying applications quickly and reliably, making Docker a crucial tool for software development and operations (DevOps). When it comes to machine learning applications, Docker brings forth several advantages. Docker's containerized nature ensures consistency between ML models' training and serving environments, mitigating the risk of encountering discrepancies due to environmental differences. Docker also simplifies the scaling process, allowing multiple instances of an ML model to be easily deployed across numerous servers. These features have the potential to significantly streamline the deployment of ML models and reduce associated operational complexities. Why Dockerize Machine Learning Applications? In the context of machine learning applications, Docker offers numerous benefits, each contributing significantly to operational efficiency and model performance. Firstly, the consistent environment provided by Docker containers ensures minimal discrepancies between the development, testing, and production stages. This consistency eliminates the infamous "it works on my machine" problem, making it a prime choice for deploying ML models, which are particularly sensitive to changes in their operating environment. Secondly, Docker excels in facilitating scalability. Machine learning applications often necessitate running multiple instances of the same model for handling large volumes of data or high request rates. Docker enables horizontal scaling by allowing multiple container instances to be deployed quickly and efficiently, making it an effective solution for scaling ML models.
Finally, Docker containers run in isolation, meaning they have their own runtime environment, including system libraries and configuration files. This isolation provides an additional layer of security, ensuring that each ML model runs in a controlled and secure environment. The consistency, scalability, and isolation provided by Docker make it an attractive platform for deploying machine learning applications. Setting up Docker for Machine Learning This section focuses on the initial setup required for utilizing Docker with machine learning applications. The installation process of Docker varies slightly depending on the operating system in use. For Linux distributions, Docker is typically installed via the command-line interface, whereas for Windows and macOS, a version of Docker Desktop is available. In each case, the Docker website provides detailed installation instructions that are straightforward to follow. Installation is followed by pulling a Docker image from Docker Hub, a cloud-based registry service that allows developers to share applications or libraries. As an illustration, one can pull a slim Python 3.8 image for use in machine learning applications using the command: Shell docker pull python:3.8-slim-buster Subsequently, running the Docker container from the pulled image involves the docker run command. For example, if an interactive Python shell is desired, the following command can be used: Shell docker run -it python:3.8-slim-buster /bin/bash This command initiates a Docker container with an interactive terminal (-it) and provides a shell (/bin/bash) inside the Python container. By following this process, Docker is effectively set up to assist in deploying machine learning models. Creating a Dockerfile for a Simple ML Model At the heart of Docker's operational simplicity is the Dockerfile, a text document that contains all the commands required to assemble a Docker image. Users can automate the image creation process by executing the Dockerfile through the Docker command line. A Dockerfile comprises a set of instructions and arguments laid out in successive lines. Instructions are Docker commands like FROM (specifies the base image), RUN (executes a command), COPY (copies files from the host to the Docker image), and CMD (provides defaults for executing the container). Consider a simple machine learning model built using Scikit-learn's Linear Regression algorithm as a practical illustration. The Dockerfile for such an application could look like this: Dockerfile # Use an official Python runtime as a parent image FROM python:3.8-slim-buster # Set the working directory in the container to /app WORKDIR /app # Copy the current directory contents into the container at /app ADD . /app # Install any needed packages specified in requirements.txt RUN pip install --no-cache-dir -r requirements.txt # Make port 80 available to the world outside this container EXPOSE 80 # Run app.py when the container launches CMD ["python", "app.py"] The requirements.txt file mentioned in this Dockerfile lists all the Python dependencies of the machine learning model, such as Scikit-learn, Pandas, and Flask. On the other hand, the app.py script contains the code that loads the trained model and serves it as a web application. By defining the configuration and dependencies in this Dockerfile, an image can be created that houses the machine learning model and the runtime environment required for its execution, facilitating consistent deployment.
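The article does not show how the model artifact itself is produced; as a minimal sketch (not part of the original article), assuming a scikit-learn Linear Regression model saved as model.pkl and a requirements.txt that simply lists flask, scikit-learn, pandas, and joblib (exact versions are a matter of choice), the training step might look like this:

Python
# train.py -- hypothetical sketch, not from the original article.
# It produces the model.pkl file that app.py loads later; the four-feature
# layout is an assumption made purely for illustration.
import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data: four numeric features per sample
X = np.array([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]])
y = np.array([10.0, 14.0, 18.0, 22.0])

model = LinearRegression()
model.fit(X, y)

# Save the trained model next to app.py so the Dockerfile copies it into the image
joblib.dump(model, "model.pkl")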
Building and Testing the Docker Image Upon successful Dockerfile creation, the subsequent phase involves constructing the Docker image. The Docker image is constructed by executing the docker build command, followed by the path to the directory that contains the Dockerfile. The -t flag tags the image with a specified name. An instance of such a command would be: Shell docker build -t ml_model_image:1.0 . Here, ml_model_image:1.0 is the name (and version) assigned to the image, while '.' indicates that the Dockerfile resides in the current directory. After constructing the Docker image, the following task involves initiating a Docker container from this image, thereby allowing the functionality of the machine learning model to be tested. The docker run command aids in this endeavor: Shell docker run -p 4000:80 ml_model_image:1.0 In this command, the -p flag maps the host's port 4000 to the container's port 80 (as defined in the Dockerfile). Therefore, the machine learning model is accessible via port 4000 of the host machine. Testing the model requires sending a request to the endpoint exposed by the Flask application within the Docker container. For instance, if the model provides a prediction based on data sent via a POST request, the curl command can facilitate this: Shell curl -d '{"features":[1, 2, 3, 4]}' -H 'Content-Type: application/json' http://localhost:4000/predict The proposed method ensures a seamless flow from Dockerfile creation to testing the ML model within a Docker container. Deploying the ML Model With Docker Deployment of machine learning models typically involves exposing the model as a service that can be accessed over the internet. A standard method for achieving this is by serving the model as a REST API using a web framework such as Flask. Consider an example where a Flask application encapsulates a machine learning model. The following Python script illustrates how the model could be exposed as a REST API endpoint: Python from flask import Flask, request import joblib app = Flask(__name__) model = joblib.load('model.pkl') @app.route('/predict', methods=['POST']) def predict(): data = request.get_json(force=True) prediction = model.predict([data['features']]) return {'prediction': prediction.tolist()} if __name__ == '__main__': app.run(host='0.0.0.0', port=80) In this example, the Flask application loads a pre-trained Scikit-learn model (saved as model.pkl) and defines a single API endpoint /predict. When a POST request is sent to this endpoint with a JSON object that includes an array of features, the model makes a prediction and returns it as a response. Once the ML model is deployed and running within the Docker container, it can be communicated with using HTTP requests. For instance, using the curl command, a POST request can be sent to the model with an array of features, and it will respond with a prediction: Shell curl -d '{"features":[1, 2, 3, 4]}' -H 'Content-Type: application/json' http://localhost:4000/predict This practical example demonstrates how Docker can facilitate deploying machine learning models as scalable and accessible services. Scaling the ML Model With Docker Swarm As machine learning applications grow in scope and user base, the ability to scale becomes increasingly paramount. Docker Swarm provides a native clustering and orchestration solution for Docker, allowing multiple Docker hosts to be turned into a single virtual host. Docker Swarm can thus be employed to manage and scale deployed machine learning models across multiple machines.
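If curl is not convenient, the same request can be issued from Python. The following small sketch is illustrative only (it is not part of the original article) and assumes the container is running with the -p 4000:80 mapping used above:

Python
# client.py -- illustrative only; assumes the container is running with -p 4000:80
import requests

payload = {"features": [1, 2, 3, 4]}
response = requests.post("http://localhost:4000/predict", json=payload, timeout=10)
response.raise_for_status()
print(response.json())  # e.g. {"prediction": [...]}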
Initializing a Docker Swarm is a straightforward process, started by executing the docker swarm init command. This command initializes the current machine as a Docker Swarm manager: Shell docker swarm init --advertise-addr $(hostname -i) In this command, the --advertise-addr flag specifies the address at which the Swarm manager can be reached by the worker nodes. The hostname -i command retrieves the IP address of the current machine. Following the initialization of the Swarm, the machine learning model can be deployed across the Swarm using a Docker service. The service is created with the docker service create command, where flags like --replicas can dictate the number of container instances to run: Shell docker service create --replicas 3 -p 4000:80 --name ml_service ml_model_image:1.0 In this command, --replicas 3 ensures three instances of the container are running across the Swarm, -p 4000:80 maps port 4000 of the Swarm to port 80 of the container, and --name ml_service assigns the service a name. Thus, the deployed machine learning model is effectively scaled across multiple Docker hosts by implementing Docker Swarm, thereby bolstering its availability and performance. Continuous Integration/Continuous Deployment (CI/CD) With Docker Continuous Integration/Continuous Deployment (CI/CD) is a vital aspect of modern software development, promoting automated testing and deployment to ensure consistency and speed in software release cycles. Docker's portable nature lends itself well to CI/CD pipelines, as Docker images can be built, tested, and deployed across various stages in a pipeline. An example of integrating Docker into a CI/CD pipeline can be illustrated using a Jenkins pipeline. The pipeline is defined in a Jenkinsfile, which might look like this: Groovy pipeline { agent any stages { stage('Build') { steps { script { sh 'docker build -t ml_model_image:1.0 .' } } } stage('Test') { steps { script { sh 'docker run -d -p 4000:80 ml_model_image:1.0' sh 'curl -d \'{"features":[1, 2, 3, 4]}\' -H \'Content-Type: application/json\' http://localhost:4000/predict' } } } stage('Deploy') { steps { script { sh 'docker service create --replicas 3 -p 4000:80 --name ml_service ml_model_image:1.0' } } } } } In this Jenkinsfile, the Build stage builds the Docker image, the Test stage runs the Docker container in the background and sends a request to the machine learning model to verify its functionality, and the Deploy stage creates a Docker service and scales it across the Docker Swarm. Therefore, with Docker, CI/CD pipelines can achieve reliable and efficient deployment of machine learning models. Conclusion and Best Practices Wrapping up, this article underscores the efficacy of Docker in streamlining the deployment of machine learning models. The ability to encapsulate the model and its dependencies in an isolated, consistent, and lightweight environment makes Docker a powerful tool for machine learning practitioners. Further enhancing its value is Docker's potential to scale machine learning models across multiple machines through Docker Swarm and its seamless integration with CI/CD pipelines. However, to extract the most value from Docker, certain best practices are recommended: Minimize Docker image size: Smaller images use less disk space, reduce build times, and speed up deployment. This can be achieved by using smaller base images, removing unnecessary dependencies, and efficiently utilizing Docker's layer caching.
Use .dockerignore: Similar to .gitignore in Git, .dockerignore prevents unnecessary files from being included in the Docker image, reducing its size. Ensure that Dockerfiles are reproducible: Using specific versions of base images and dependencies can prevent unexpected changes when building Docker images in the future. By adhering to these guidelines and fully embracing the capabilities of Docker, it becomes significantly more feasible to navigate the complexity of deploying machine learning models, thereby accelerating the path from development to production.
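As a brief illustration of the last two practices, below is one possible version-pinned variant of the earlier Dockerfile, with the role of a .dockerignore file noted in the comments. The copied file names and the example version pins are assumptions for illustration, not part of the original article.

Dockerfile
# Illustrative, version-pinned variant of the earlier Dockerfile (assumptions only)
FROM python:3.8-slim-buster

WORKDIR /app

# Copy only the dependency list first to take advantage of layer caching;
# pin versions inside requirements.txt (e.g. flask==2.0.3, scikit-learn==1.0.2)
# so rebuilds stay reproducible.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# A .dockerignore file next to the Dockerfile (listing e.g. .git, __pycache__,
# notebooks/, tests/) keeps those paths out of the build context and the image.
COPY app.py model.pkl ./

EXPOSE 80
CMD ["python", "app.py"]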
The software industry has discovered effective solutions to its development-oriented problems in DevOps, CI/CD, and containers. Although not mandatory to use all three together, they often complement and rely on each other. DevOps promotes collaboration between development and IT teams, while CI/CD simplifies the software delivery process for quicker outcomes. Containerization combines an application with its dependencies to establish consistent development and deployment environments. Implementing these approaches optimizes software development automation, enhancing agility and scalability, reducing downtime, and improving digital product quality. Despite their perceived complexity, implementing these technologies can be manageable. This article delves into these concepts' intricacies, illustrates their real-world impact, and uncovers the keys to unlocking remarkable efficiency and productivity. The Power Trio: DevOps, CI/CD, and Containerization Picture this: A perfect harmony between development and operations, seamless code integration, and rocket-speed deployments. That's the magic of DevOps, CI/CD, and containerization. Let us take a detailed look at these technologies and how they help in software development: DevOps: Bridging the Gap Between Development and Operations DevOps is a collaborative methodology that merges development and operations teams to optimize the software development lifecycle (SDLC). This approach facilitates improved communication, collaboration, and integration among these teams, eliminating barriers and ensuring a smooth workflow from the conception of ideas to their deployment. Through goal alignment, shared responsibilities, and process automation, DevOps empowers organizations to achieve quicker time-to-market, higher-quality software, and enhanced customer satisfaction. By working in tandem, development and operations teams leverage automation tools and practices to streamline the entire SDLC. This cohesive approach enables faster and more efficient delivery of software solutions. In summary, DevOps unifies previously separate development and operations functions, fostering a culture of collaboration and leveraging automation to optimize the software development process. This results in faster delivery of high-quality software, ultimately satisfying customer needs and expectations. CI/CD: Accelerating Software Delivery Through Continuous Integration and Continuous Deployment CI/CD encompasses a range of methodologies that streamline the process of integrating and deploying code alterations, guaranteeing swift and consistent software development. Continuous integration involves frequently merging code changes into a communal repository and employing automated tests to identify any integration conflicts promptly. On the other hand, continuous deployment automates software release into production environments, eliminating the need for manual and error-prone deployment procedures. By combining CI/CD, the time and resources needed for each release are minimized while also facilitating rapid feedback loops and fostering a culture of continual enhancement. While using the continuous integration (CI) model, teams implement frequent small changes and verify the code using version control repositories. This ensures consistency in app building, packaging, and testing, improving collaboration and software quality.
On the other hand, continuous deployment (CD) automates code deployment to different environments (production, development, testing) and executes service calls to databases and servers. The combination of CI/CD minimizes the time and effort required for each release, enables rapid feedback cycles, and promotes a culture of continuous improvement. Containerization: Empowering Software Deployment With Efficiency and Portability Containerization is a robust technology that packages applications and their dependencies in self-contained units called containers. These containers provide a lightweight, isolated, consistent runtime environment, ensuring applications run reliably across different platforms and infrastructures. They summarize every single thing that is required to run an application, right from the code to the system tools and libraries. Containerization simplifies software deployment by abstracting away the underlying infrastructure details, making it easier to package, distribute, and deploy applications consistently. It also enables efficient resource utilization, scalability, and portability, as containers can be easily moved between different environments. Combining CI/CD With DevOps Organizations can achieve an efficient and automated software delivery pipeline by integrating CI/CD practices into a DevOps environment. Development and operations teams collaborate closely to implement CI/CD workflows that enable seamless integration, automated testing, and continuous deployment of applications. DevOps provides the cultural foundation for effective collaboration, while CI/CD practices automate the processes and ensure a consistent and reliable release cycle. They enable organizations to respond rapidly to customer feedback, quickly deliver new features and enhancements, and maintain high software quality. Harnessing the Power of Containerization in CI/CD Containerization plays a vital role in the CI/CD process by offering an efficient and consistent application runtime environment. By packaging applications and their dependencies into portable and self-contained units, containers allow organizations to streamline their deployment workflows. One of the key advantages of containerization is the ease with which CI/CD pipelines can create and manage container images. This ensures consistent and reproducible deployments across different environments, paving the way for efficient software delivery. Moreover, containerization enables the implementation of zero-downtime deployments and rollbacks, offering organizations a safety net in case of any issues during the release process. By automating the deployment process through containers, development teams can allocate more time and resources to innovation, feature development, and enhancing the overall user experience. Containers also facilitate scalability, allowing organizations to horizontally scale their applications by spinning up multiple instances of the same container image. This flexibility is precious in cloud-native environments, enabling organizations to adjust resources and meet changing demands dynamically. Conclusion In a world where software development is a race against time, organizations must leverage the power of DevOps, CI/CD, and containerization to fuel their journey toward success. DevOps serves as a bridge between development and operations, fostering collaboration and automating processes. By embracing DevOps, organizations can break down silos and enhance communication, increasing efficiency. 
CI/CD practices play a crucial role in accelerating software delivery. They enable teams to release software rapidly and reliably, ensuring that new features and updates reach users in a timely manner. Containerization provides organizations with efficiency, scalability, and portability. By encapsulating applications and their dependencies, containers offer a consistent environment across different platforms, enabling seamless deployment and scaling. DevOps, CI/CD, and containerization streamline the software development lifecycle. They promote integration, deployment, and continuous improvement, allowing organizations to adapt quickly and deliver innovative solutions. Organizations unlock remarkable efficiency, productivity, and innovation by embracing the transformative potential of DevOps, CI/CD, and containerization in today's rapidly evolving technology landscape.
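To make the interplay of the three practices slightly more concrete, here is a minimal, hypothetical pipeline sketch in GitHub Actions syntax that builds a container image, smoke-tests it, and pushes it to a registry. The registry host, secret names, and health endpoint are illustrative assumptions, not a prescription.

YAML
name: build-test-push
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Continuous integration: build the image for every change on main
      - name: Build image
        run: docker build -t registry.example.com/demo-app:${{ github.sha }} .
      # Smoke test: run the container and hit a (hypothetical) health endpoint
      - name: Test container
        run: |
          docker run -d --name demo -p 8080:8080 registry.example.com/demo-app:${{ github.sha }}
          sleep 5
          curl -f http://localhost:8080/health
      # Continuous deployment: push the tested image so downstream environments can pull it
      - name: Push image
        run: |
          echo "${{ secrets.REGISTRY_PASSWORD }}" | docker login registry.example.com -u "${{ secrets.REGISTRY_USER }}" --password-stdin
          docker push registry.example.com/demo-app:${{ github.sha }}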
In the realm of Docker, a volume refers to a durable storage destination that exists independently of the container. Volumes prove invaluable for preserving data that should endure beyond the container's lifecycle, even if the container is halted or deleted. The volume will be created when the container is built, and it can be accessed and modified by processes running inside the container. Utilizing Volumes in a Docker Container Offers Several Compelling Advantages: Data Persistence: When you have critical data that must endure beyond a container's lifecycle, volumes provide the ideal solution. Storing items like database files or application logs in a volume ensures their preservation even if the container is halted or deleted. Sharing Data Among Containers: Volumes facilitate seamless data sharing among multiple containers. By leveraging a volume, you allow different containers to access the same data, making it convenient for storing shared configuration files or data utilized by multiple containers. Streamlining Data Management: Volumes contribute to efficient data management by decoupling data from the container itself. For instance, you can employ a volume to store data generated by the container and then easily access that data by mounting the volume on a host system, simplifying data handling and manipulation. In Jenkins pipelines, Docker volumes offer a convenient solution for building Angular projects without requiring the installation of the Angular library on the Jenkins node machine. The concept behind this approach involves the following steps: Copy Angular application source code into a Docker volume. Utilize a Docker image pre-installed with Node.js to create a new Docker container. Attach the Docker volume containing the source code to this new container. Once the Docker container is set up, it will efficiently build and compile the Angular application. At the completion of the job, the container will publish the compiled Angular application as a zip file. This process enables seamless development and deployment without cluttering the Jenkins node machine with Angular dependencies. The Following Code Implements the Ideas Mentioned Above Groovy pipeline { agent any { label 'AngularApp' } options { timestamps() } environment { SRC_PATH = '/home/node/angular-realworld-example-app' } stages { stage('Preparation') { agent { label 'AngularApp' } steps { script { // Pull angular app code and clean the workspace // Create volumes to be used across all stages. sh( label: 'Pre-create all the docker volumes', script: ''' rm -rf ./* git clone https://github.com/gothinkster/angular-realworld-example-app.git ls -l docker volume create --name=src-volume docker volume create --name=node_modules-volume docker volume create --name=dist-prod-volume docker container prune -f || true ''' ) /* Copy the source code in the jobs workspce to docker volume '/ws' in the docker container maps to the current workspace ${WORKSPACE} */ withDockerContainer( image: 'alpine', args: '-u 0 -v src-volume:/tmp/src -v ${WORKSPACE}:/ws' ) { sh( label: 'Copy source files into src-volume', script: ''' # Going to source code directory cd /ws/angular-realworld-example-app # copying code to docker volume cp -arf . 
/tmp/src # Printing the contents on console ls -l ''' ) } } } } stage('Install Dependencies') { agent { docker { image 'node:18-alpine' reuseNode true args '-u 0 \ -v src-volume:${SRC_PATH} \ -v node_modules-volume:${SRC_PATH}/node_modules' } } steps { /* Installing dependencies defined in package.json by calling yarn install. Here the docker volume "src-volume" is mapped to path ${SRC_PATH} inside the container. */ sh( label: 'Install dependencies', script: ''' cd ${SRC_PATH} npm config set registry "http://registry.npmjs.org/" yarn install ''' ) } } stage('Build') { agent { /* use the root user (-u 0) to access the docker container this is needed to write to volumes. dist-prod-volume is used to expose build artifacts outside of this stage node_modules-volume is to reuse dependencies from cache "reuseNode true" is necessary to run the stage on the same node used at the begning of the script as docker volumes cannot work across nodes. */ docker { image 'node:18-alpine' reuseNode true args '-u 0 \ -v src-volume:${SRC_PATH} \ -v dist-prod-volume:${SRC_PATH}/dist/ \ -v node_modules-volume:${SRC_PATH}/node_modules' } } steps { sh( label: 'Clean and rebuild application distribution files', script: ''' cd ${SRC_PATH} # clean old files under dist directory yarn rimraf dist/* export NODE_OPTIONS=--openssl-legacy-provider # Build the angular application using yarn yarn build ''' ) } post { success { // Jenkins job cannot access the files from the docker container/volumes // directly, hence we need to first copy them into the workspace before // archiving the artifacts. sh( label: 'Copy artifacts from docker container to workspace', script: ''' mkdir -p $WORKSPACE/artifacts/AngularApp cp -ar ${SRC_PATH}/dist/. $WORKSPACE/artifacts/AngularApp ''' ) zip archive: true, dir: 'artifacts/AngularApp', zipFile: 'AngularApp.zip', overwrite: true archiveArtifacts artifacts: 'AngularApp.zip' } cleanup { // wipe out this workspace after the job is completed (successful or not) cleanWs() } } } } post { cleanup { cleanWs() // clean docker stuff before finishing the job sh( label: 'Clean dangling docker artifacts', script: ''' docker volume rm src-volume || true docker volume rm node_modules-volume || true docker volume rm dist-prod-volume || true docker container prune -f docker image prune -f ''' ) } } }
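Outside of Jenkins, the volume behaviors described at the start of this post (persistence and sharing data between containers) can be tried by hand. The following standalone sketch uses arbitrary names and is not part of the pipeline above:

Shell
# Create a named volume that outlives any single container
docker volume create build-cache

# Write data into the volume from one container...
docker run --rm -v build-cache:/data alpine sh -c 'echo "hello from container A" > /data/message.txt'

# ...and read the same data from a second container
docker run --rm -v build-cache:/data alpine cat /data/message.txt

# Inspect where the volume lives on the host, then clean it up
docker volume inspect build-cache
docker volume rm build-cache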
Zhejiang Lab is a research institute in China, focusing on intelligent sensing, AI, computing, and networks. We specialize in various scientific fields, like materials, genetics, pharmaceuticals, astronomy, and breeding. Our aim is to become a global leader in basic research and innovation. Finding a storage solution for our ultra-heterogeneous computing cluster was challenging. We tried two solutions: object storage with s3fs + network-attached storage (NAS) and Alluxio + Fluid + object storage, but they had limitations and performance issues. Finally, we chose JuiceFS, an open-source POSIX-compatible distributed file system, as it offers easy deployment, rich metadata engine options, atomic and consistent file visibility across clusters, and caching capability for reduced pressure on the underlying storage. Our test shows that JuiceFS excelled over s3fs in sequential reads/writes, with sequential reads 200% faster. Now we use it in general file storage, storage volumes, and data orchestration. In this article, we’ll dive into our storage challenges, why we chose JuiceFS over s3fs and Alluxio, how we use it, and the issues we encountered and their solutions. We hope this post can help you choose a suitable storage system based on your application needs. Our Storage Challenges We Built an Ultra-Heterogeneous Computing Cluster Our lab is developing an intelligent operating system, which consists of two components: A general-purpose computing platform solution that supports various applications in domains such as computing materials, pharmaceuticals, and astronomy A heterogeneous resource aggregation solution for managing multiple clusters, including CPU, GPU, and high-performance computing (HPC) resources At our computing and data center, we deployed multiple sets of heterogeneous clusters. These clusters are abstracted and managed using Kubernetes (K8s) for unified control. A meta-scheduler allocates application instructions and jobs based on scheduling policies like computing and performance priorities. Now we’ve integrated about 200 PFLOPS of AI computing power and 7,000 cores of HPC computing power. Storage Requirements for Computing Power The heterogeneous nature of computing resources and the diverse system architectures and instruction sets employed led to software incompatibility. This impeded efficient computing power usage. To address this issue, we consolidated various heterogeneous computing resources into a vast computing pool. Our computing power storage requirements included: Abstraction and unification of the storage layer: Many computing scenarios, such as HPC and AI training, utilize the POSIX interface. Therefore, we wanted an interface to provide unified services at this layer. Generalization of the storage solution: Our integrated computing power clusters are heterogeneous, requiring a solution that can be applied to different types of clusters. Data arrangement conditions: We have both hot and cold data. During task computation, the data being actively used is hot data. As time passes or after a few days, the data becomes cold, resulting in fewer reads and operations. High performance for storage: Efficient data read and write performance, particularly for hot data reads, is crucial. In computing power clusters, computing resources are valuable, and slow data reading leading to CPU and GPU idle waiting would result in significant waste. Why We Rejected s3fs and Alluxio Before JuiceFS, we tried two solutions, but they were not ideal. 
Solution 1: Object Storage With s3fs + NAS Advantages of this solution: Simple architecture Easy deployment Out-of-the-box usability Disadvantages of this solution: Directly using object storage had poor performance. The s3fs mount point for object storage frequently got lost. Once the mount point was lost, containers could not access it, and restoring the mount point required restarting the entire container. This caused significant disruption to user operations. As the cluster gradually scaled and the number of nodes expanded from initially 10+ to 100+, this solution did not work. Solution 2: Alluxio + Fluid + Object Storage Advantage of this solution: Better performance compared to object storage with s3fs + NAS. Disadvantages of this solution: Complex architecture with multiple components. Alluxio was not a file system with strong consistency but rather a caching glue layer. We operate in a highly heterogeneous multi-cluster environment, where metadata consistency is critical. Given the diverse range of user applications and the need to avoid interfering with their usage patterns, inconsistent data across clusters could lead to severe issues. The underlying object storage posed challenges to metadata performance. As the data scale grew, operations like metadata synchronization and cache layer initialization in new clusters faced significant performance bottlenecks. Alluxio's compatibility and client performance did not meet our expectations. While Alluxio provides unified access to different data sources for reading, it may not be ideal for frequent writes or updates. Why We Chose JuiceFS JuiceFS is an open-source, high-performance, distributed file system designed for the cloud. The following figure shows its architecture: JuiceFS architecture Advantages of JuiceFS JuiceFS has these advantages: Easy deployment: JuiceFS provides detailed community documentation, making it user-friendly for quick setup. It performed exceptionally well in our test clusters and final production deployment. JuiceFS supports CSI for containerized deployment, making it our chosen storage foundation for computing power. Rich metadata engine options: JuiceFS offers various metadata engines such as Redis and TiKV, resulting in excellent metadata performance. Currently, we use a three-node TiKV setup as the metadata engine in our lab. However, as the performance is no longer sufficient, we plan to gradually enhance it. Initially, we considered using Redis as the metadata engine, but it lacked horizontal scalability. Using TiKV allows us to incrementally scale as the file system grows, which is indeed better. Atomic and consistent file visibility across clusters: JuiceFS enables atomicity and consistency of files in a cross-cluster environment. Files written in Cluster A are immediately visible in Cluster B. Caching capability: JuiceFS supports client-side caching, allowing for reduced pressure on underlying storage in computing clusters. Excellent POSIX compatibility: JuiceFS has strong compatibility with the POSIX interface. An active community: JuiceFS benefits from a vibrant and engaged community. JuiceFS Outperforms s3fs and NAS in I/O Performance We conducted performance tests on JuiceFS, s3fs, and NAS using the following tools: Flexible I/O Tester (FIO) with 16 threads 4 MB block size 1 GB of data The test results are as follows: JuiceFS vs. s3fs vs. NAS The test results show that: JuiceFS outperformed both s3fs and NAS in terms of I/O performance. 
JuiceFS excelled over s3fs in sequential read/write performance, with sequential reads 200% faster. During the evaluation, NAS was still providing services in a production environment, with approximately seventy-plus nodes running concurrently. The limited bandwidth severely affected its performance. Evolution of Our Storage-Compute Decoupled Architecture Previous architecture Initially, the high-performance computing process consisted of multiple stages, with data scattered across different storage systems, posing challenges in efficiency and convenience. To simplify data management and flow, we implemented a unified storage infrastructure as the foundation. This storage foundation prioritizes high reliability, low cost, and high throughput, leading us to choose object storage. Storing data in object storage allows for seamless data tiering, optimizing storage space. However, using object storage directly within the computing cluster presented certain issues: Poor metadata performance, especially when dealing with a large number of files in the same directory, resulting in lengthy operations. High bandwidth consumption due to the data lake's reliance on a regular IP network rather than a high-speed remote direct memory access (RDMA) network, leading to limited overall bandwidth. Current architecture By adhering to the following principles, our evolving architecture ensures efficient storage and compute separation while optimizing data flow and performance: Alongside object storage, we established a metadata cluster utilizing the TiKV database. Building upon object storage and TiKV, we developed the JuiceFS distributed file system. The computing clusters access the file system by installing the JuiceFS client on nodes. Therefore, we enhanced metadata performance, reduced bandwidth consumption, and overcame the limitations of object storage. To enable efficient data flow, we introduced a file management system, powered by the JuiceFS S3 gateway, enabling file upload and download operations. To facilitate seamless data flow between computing clusters and the object storage data lake foundation, we deployed a high-speed caching cluster within the computing clusters. This cluster focuses on achieving optimal I/O performance. Users can seamlessly access data without being concerned about its location, whether in object storage or the high-speed caching cluster. The computing system manages data flow, utilizing a 200 G RDMA high-speed network connection between the computing clusters and the high-speed caching cluster. The high-speed caching cluster incorporates the BeeGFS high-performance parallel file system, mounted as a directory on the computing clusters. This allows for straightforward utilization of the caching system as if it were a local directory. How We’re Using JuiceFS Storage requirements and performance metrics vary across different application scenarios. To efficiently serve users, we proposed the concept of productizing storage capabilities. JuiceFS is currently being applied in the following storage product categories. General File Storage JuiceFS stores data in a designated directory and generates a unique access path based on the user's organizational structure. By directly mounting this path to containers, data isolation is achieved. Users can upload and download files through the web interface or perform file operations using our provided commands and tools. 
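As a rough sketch of the architecture described above (object storage for data, TiKV for metadata, a JuiceFS client on each compute node), creating and mounting such a file system with the JuiceFS community-edition CLI might look roughly like the following. The bucket URL, PD addresses, credentials, and volume name are placeholders, not our actual configuration.

Shell
# Create a JuiceFS volume whose data lives in object storage and whose
# metadata lives in a three-node TiKV cluster (placeholders only)
juicefs format \
    --storage s3 \
    --bucket https://jfs-data.s3.example.com \
    --access-key <ACCESS_KEY> \
    --secret-key <SECRET_KEY> \
    tikv://pd1:2379,pd2:2379,pd3:2379/jfs \
    myjfs

# Mount it in the background on a compute node; any node that mounts the same
# metadata URL sees a consistent view of the files
juicefs mount -d tikv://pd1:2379,pd2:2379,pd3:2379/jfs /mnt/jfs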
Storage Volumes During the initial development phase, general file storage faced scalability issues in terms of capacity. The underlying object storage cluster had limited capacity, preventing users from obtaining additional storage space. To address this, we introduced the concept of storage volumes. Storage volumes can be likened to cloud drives, where different volumes represent different types of cloud drives. For scenarios involving frequent read and write operations on numerous small files, a storage product with low latency and high throughput is required. To meet this demand, we repurposed our high-speed caching cluster into a high-speed storage volume. Users can directly access the file system directory, experiencing the performance advantages of high-speed storage without relying on JuiceFS. For users who need to store large amounts of data with infrequent access, we offer a standard storage volume that combines JuiceFS and object storage. This provides larger storage capacity and acceptable throughput performance while enabling network connectivity across clusters, unlike high-speed storage volumes. Some users have higher performance requirements, such as needing local disk products while ensuring data persistence. In Kubernetes environments, writing data to local disks carries the risk of data loss due to unexpected restarts or physical node issues. In such cases, a persistent solution is necessary. We allocate a portion of storage space from the affected node's local disk as a local storage volume and schedule tasks to designated nodes based on user-specified storage volumes. This solution balances performance and data persistence. Additionally, different storage products vary in capacity, throughput, and inter-cluster connectivity capabilities. For example: High-speed storage enables communication within a cluster but lacks cross-cluster capabilities. Storage products also differ in capacity and cost. High-speed storage uses all-flash clusters, resulting in higher construction costs, while object storage has relatively lower construction costs and larger storage capacities. By packaging different storage hardware capabilities into various storage products, we can cater to diverse user business scenarios. Data Orchestration We implemented data orchestration functionality for JuiceFS. Administrators can upload commonly used datasets to a specific directory within the file system, abstracted as publicly accessible datasets. Different users can mount these datasets when creating jobs. Ordinary users can also upload their private datasets and utilize JuiceFS' warm-up feature to optimize access to these datasets. In the computing clusters, we established a high-speed caching cluster. By using the warmup command, users' datasets can be warmed up from both ends to the high-speed caching cluster on computing nodes. This enables users to directly interact with their self-built high-performance clusters when performing extensive model training, eliminating the need for remote object storage cluster interaction and improving overall performance. Moreover, this setup helps alleviate network bandwidth pressure on the object storage foundation. The entire cache eviction process is automatically managed by the JuiceFS client, as the capacity limit for access directories can be configured. For users, this functionality is transparent and easy to use. 
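For the data orchestration flow just described, the cache warm-up step could look roughly like the sketch below; the dataset path and thread count are illustrative, and the exact options depend on the JuiceFS version in use.

Shell
# Illustrative only: pre-populate the cache for a dataset directory before a
# training job starts, so subsequent reads no longer hit remote object storage
juicefs warmup -p 8 /mnt/jfs/datasets/public/imagenet-mini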
Issues Encountered and Solutions Slow File Read Performance Issue We conducted internal tests and collaborated with our algorithm team to evaluate the file read performance. The test results showed that JuiceFS had significantly slower read performance compared to NAS. We investigated the reasons behind this performance disparity. Workaround When we used TiKV as the metadata engine, we discovered that certain API operations, such as directory listing, were random and did not guarantee a consistent order like NAS or other file systems. This posed a challenge when algorithms relied on random file selection or assumed a fixed order. It led to incorrect assumptions about the selected files. We realized that we needed to manually index the directory in specific scenarios. This is because processing a large number of small files incurred high metadata overhead. Without caching the metadata in memory, fetching it from the metadata engine for each operation resulted in high performance overhead. For algorithms that handled hundreds of thousands or even millions of files, maintaining consistency during training required treating these files as index files forming their own index directory tree. By reading the index files instead of invoking the list dir operation, the algorithm training process ensured consistency in the file directory tree. Response From the JuiceFS Team The JuiceFS team found that the slow read performance primarily depended on the user's specific application scenario. They did not make adjustments to the random directory read functionality after evaluation. If you encounter similar issues, contact the JuiceFS team by joining their discussions on GitHub and community on Slack. TiKV Couldn’t Perform Garbage Collection (GC) Issue We found that TiKV couldn't perform GC while using JuiceFS. Despite a displayed capacity of 106 TB and 140 million files, TiKV occupied 2.4 TB. This was abnormal. Solution We found that the lack of GC in TiKV's metadata engine might be the cause. The absence of GC metrics in the reports raised concerns. This issue arose because we deployed only TiKV without TiDB. However, TiKV's GC relies on TiDB, which can be easily overlooked. Response From the JuiceFS Team The JuiceFS team addressed this issue by incorporating a background GC task for TiKV in pull requests (PRs) #3262 and #3432. They’ve been merged in v1.0.4. High Memory Usage of the JuiceFS Client Issue When mounting the JuiceFS client, we configured the cache cluster as the storage directory with a capacity of up to 50 TB. The JuiceFS client periodically scanned the cache directory and built an in-memory index to track data in the cache. This resulted in high memory consumption. For directories with a large number of files, we recommend disabling this scanning feature. Solution During testing, we found acceptable performance for random I/O on small files. However, we encountered a significant issue with sequential I/O. For example, when using the dd command to create a 500 MB file, we noticed that JuiceFS generated an excessive number of snapshots. This indicated that the storage and operations on the object storage far exceeded what should be expected for creating a 500 MB file. Further investigation revealed that enabling the -o writeback_cache parameter transformed sequential writes into random writes, thereby reducing overall sequential write performance. This parameter is only suitable for exceptionally advanced scenarios involving high randomness. 
Using this parameter outside of such scenarios can lead to serious issues. Response From the JuiceFS Team This issue primarily pertains to scenarios where NAS is used as a cache. It has been addressed and optimized in JuiceFS 1.1 beta. The memory footprint during scanning has been significantly reduced, resulting in improved speed. JuiceFS introduced the --cache-scan-interval option in PR #2692, allowing users to customize the scan interval and choose whether to perform scanning only once during startup or completely disable it. If you use local disks as cache, you do not need to make any adjustments. Our Future Plans: Expanded Range of Storage Products A Broader Range of Software and Hardware Products We are dedicated to offering a wider variety of software and hardware products and tailoring these capabilities into different storage volumes to meet the diverse storage requirements of users in various scenarios. Improved Data Isolation We plan to adopt the container storage interface (CSI) mode with customized path configurations to ensure effective data isolation. Currently, there are concerns regarding data security, as all user data is stored within a single large-scale file system and mounted on bare metal machines via hostpath. This setup presents a potential risk where users with node login permissions can access the entirety of the file system's data. We’ll introduce a quota management feature, providing users with a means to enforce storage capacity limits and gain accurate insights into their actual consumption. The existing method of using the du command to check capacity incurs high overhead and lacks convenience. The quota management functionality will resolve this concern. We'll improve our capacity management capabilities. In metering and billing scenarios, it’s vital to track user-generated traffic and power consumption and bill users based on their actual storage usage. Monitoring and Operations When we use JuiceFS, we mount it on bare metal servers and expose a monitoring port. Our production cluster interacts with these ports, establishing a monitoring system that collects and consolidates all pertinent data. We’ll enhance our data resilience and migration capabilities. We’ve encountered a common scenario where existing clusters lack sufficient capacity, necessitating the deployment of new clusters. Managing data migration between old and new clusters, along with determining suitable migration methods for different data types, while minimizing disruptions to production users, remains a challenging task. Consequently, we’re seeking solutions to enhance these capabilities. We’re developing a versatile capability based on JuiceFS and CSI plugins to facilitate dynamic mounting across diverse storage clients. In production environments, users often need to adjust mounting parameters to cater to various application products. However, directly modifying mounting parameters may lead to disruptions across the entire physical node. Therefore, enabling dynamic mounting capability would empower users to make appropriate switches to their applications without the need for restarts or other disruptive operations. Conclusion We use JuiceFS as a storage layer for our ultra-heterogeneous computing cluster, as it offers easy deployment, rich metadata engine options, atomic and consistent file visibility across clusters, caching capability, and excellent POSIX compatibility. Compared to other solutions like s3fs and Alluxio, JuiceFS outperforms them in I/O performance.
Now we use JuiceFS for general file storage, storage volumes, and data orchestration.
In this blog, you will take a look at Podman Compose. With Podman Compose, you can use compose files according to the Compose Spec in combination with a Podman backend. Enjoy! Introduction A good starting point and a must-read is this blog provided by RedHat. In short, Podman Compose is not directly maintained by the Podman team, and neither is Docker Compose, of course. Podman Compose has a more limited feature set than Docker Compose and in general, it is advised to use Kubernetes YAML files for this purpose. See "Podman Equivalent for Docker Compose" for how this can be used. However, the Podman team will fix issues in Podman when required for Podman Compose. It is also possible to use Docker Compose in combination with Podman, as is described in the post "Use Docker Compose with Podman to Orchestrate Containers on Fedora Linux." In this post, you will use Podman Compose to run some basic containers. Sources used in this blog are available at GitHub and the container image is available at DockerHub. The container image is built in a previous blog post "Is Podman a Drop-in Replacement for Docker?". You might want to check it out when you want to know more about Podman compared to Docker. The image contains a basic Spring Boot application with one REST endpoint which returns a hello message. Prerequisites The prerequisites needed for this blog are: Basic Linux knowledge Basic container knowledge Basic Podman knowledge Basic Docker Compose knowledge Podman Compose With One Container Let’s start with a basic single container defined in compose file docker-compose/1-docker-compose-one-service.yaml. One service is defined as exposing port 8080 in order to be able to access the endpoint. YAML services: helloservice-1: image: docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT ports: - 8080:8080 It is assumed that you have Podman installed already. Execute the following command from the root of the repository. Shell $ podman-compose -f docker-compose/1-docker-compose-one-service.yaml up -d Error: unknown shorthand flag: 'f' in -f As expected, this does not work. Podman Compose is not included or available when you have installed Podman. Docker Compose does not require an additional installation. Podman Compose is a community project not maintained by the Podman team and that is why it is not part of the Podman installation. Installation instructions for Podman Compose can be found on the Podman Compose GitHub pages. The easiest way is to use pip, the Python package installer. This requires an installed Python version and pip. Shell $ pip3 install podman-compose Run the Podman Compose command again. 
Shell $ podman-compose -f docker-compose/1-docker-compose-one-service.yaml up -d podman-compose version: 1.0.6 ['podman', '--version', ''] using podman version: 3.4.4 ** excluding: set() ['podman', 'ps', '--filter', 'label=io.podman.compose.project=docker-compose', '-a', '--format', '{{ index .Labels "io.podman.compose.config-hash"}'] ['podman', 'network', 'exists', 'docker-compose_default'] podman run --name=docker-compose_helloservice-1_1 -d --label io.podman.compose.config-hash=b2928552303ab947ea3497ef5e1eff327c1c9672a8454f18f9dbee4578061370 --label io.podman.compose.project=docker-compose --label io.podman.compose.version=1.0.6 --label PODMAN_SYSTEMD_UNIT=podman-compose@docker-compose.service --label com.docker.compose.project=docker-compose --label com.docker.compose.project.working_dir=/home/.../mypodmanplanet/docker-compose --label com.docker.compose.project.config_files=docker-compose/1-docker-compose-one-service.yaml --label com.docker.compose.container-number=1 --label com.docker.compose.service=helloservice-1 --net docker-compose_default --network-alias helloservice-1 -p 8080:8080 docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT 302dcf140babae07c416e8556e11fc13918bd1fe3c52b737f4ab091f3599291e exit code: 0 Verify whether the container has started successfully. Shell $ podman ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 3104b5a4c418 docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT About a minute ago Up About a minute ago 0.0.0.0:8080->8080/tcp docker-compose_helloservice-1_1 Verify whether the endpoint can be reached. Shell $ curl http://localhost:8080/hello Hello Podman! Shut down the container. Shell $ podman-compose -f docker-compose/1-docker-compose-one-service.yaml down podman-compose version: 1.0.6 ['podman', '--version', ''] using podman version: 3.4.4 ** excluding: set() podman stop -t 10 docker-compose_helloservice-1_1 docker-compose_helloservice-1_1 exit code: 0 podman rm docker-compose_helloservice-1_1 3104b5a4c4189777b21c2658a3d3c4df91b3f804d5c9bced63532f0318e9e9df exit code: 0 Running a basic service just worked. Podman Compose With Two Containers Let’s run two containers defined in compose file docker-compose/2-docker-compose-two-services.yaml. One service is available at port 8080 and the other one at port 8081. Shell services: helloservice-1: image: docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT ports: - 8080:8080 helloservice-2: image: docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT ports: - 8081:8080 Run the Compose file from the root of the repository. 
Shell $ podman-compose -f docker-compose/2-docker-compose-two-services.yaml up -d podman-compose version: 1.0.6 ['podman', '--version', ''] using podman version: 3.4.4 ** excluding: set() ['podman', 'ps', '--filter', 'label=io.podman.compose.project=docker-compose', '-a', '--format', '{{ index .Labels "io.podman.compose.config-hash"}'] ['podman', 'network', 'exists', 'docker-compose_default'] podman run --name=docker-compose_helloservice-1_1 -d --label io.podman.compose.config-hash=d73c6cebbe901d7e4f27699b4308a39acaa4c3517293680c24ea1dce255177cf --label io.podman.compose.project=docker-compose --label io.podman.compose.version=1.0.6 --label PODMAN_SYSTEMD_UNIT=podman-compose@docker-compose.service --label com.docker.compose.project=docker-compose --label com.docker.compose.project.working_dir=/home/.../mypodmanplanet/docker-compose --label com.docker.compose.project.config_files=docker-compose/2-docker-compose-two-services.yaml --label com.docker.compose.container-number=1 --label com.docker.compose.service=helloservice-1 --net docker-compose_default --network-alias helloservice-1 -p 8080:8080 docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT ff344fb5e22800c2c7454d66572bbcad22a47536d0dc930960ea06844b7838f3 exit code: 0 ['podman', 'network', 'exists', 'docker-compose_default'] podman run --name=docker-compose_helloservice-2_1 -d --label io.podman.compose.config-hash=d73c6cebbe901d7e4f27699b4308a39acaa4c3517293680c24ea1dce255177cf --label io.podman.compose.project=docker-compose --label io.podman.compose.version=1.0.6 --label PODMAN_SYSTEMD_UNIT=podman-compose@docker-compose.service --label com.docker.compose.project=docker-compose --label com.docker.compose.project.working_dir=/home/.../mypodmanplanet/docker-compose --label com.docker.compose.project.config_files=docker-compose/2-docker-compose-two-services.yaml --label com.docker.compose.container-number=1 --label com.docker.compose.service=helloservice-2 --net docker-compose_default --network-alias helloservice-2 -p 8081:8080 docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT e88b6d4eb33b8869b2fa7c964cd153a2920558a27b7bc8cf0d2b9f4d881a89ee exit code: 0 Verify whether both containers are running. Shell $ podman ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES ff344fb5e228 docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT 35 seconds ago Up 35 seconds ago 0.0.0.0:8080->8080/tcp docker-compose_helloservice-1_1 e88b6d4eb33b docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT 33 seconds ago Up 33 seconds ago 0.0.0.0:8081->8080/tcp docker-compose_helloservice-2_1 Verify whether both endpoints are accessible. Shell $ curl http://localhost:8080/hello Hello Podman! $ curl http://localhost:8081/hello Hello Podman! In the output after executing the podman-compose command, you can see that a default network docker-compose_default is created. A network alias helloservice-1 is created for container 1 and a network alias helloservice-2 is created for container 2. Enter container 1 and try to access the endpoint of container 2 using the network alias. Beware to use the internal port 8080 instead of the external port mapping to port 8081! 
Shell $ podman exec -it docker-compose_helloservice-1_1 sh /opt/app $ wget http://helloservice-2:8080/hello Connecting to helloservice-2:8080 (10.89.1.3:8080) saving to 'hello' hello 100% |*********************************************************************************************************************| 13 0:00:00 ETA 'hello' saved As you can see, the services can access each other using the network alias. Finally, stop the containers. Shell $ podman-compose -f docker-compose/2-docker-compose-two-services.yaml down podman-compose version: 1.0.6 ['podman', '--version', ''] using podman version: 3.4.4 ** excluding: set() podman stop -t 10 docker-compose_helloservice-2_1 docker-compose_helloservice-2_1 exit code: 0 podman stop -t 10 docker-compose_helloservice-1_1 docker-compose_helloservice-1_1 exit code: 0 podman rm docker-compose_helloservice-2_1 e88b6d4eb33b8869b2fa7c964cd153a2920558a27b7bc8cf0d2b9f4d881a89ee exit code: 0 podman rm docker-compose_helloservice-1_1 ff344fb5e22800c2c7454d66572bbcad22a47536d0dc930960ea06844b7838f3 exit code: 0 Conclusion Podman Compose works for basic Compose files. However, it lacks features compared to Docker Compose, and a definitive list of what is missing is hard to come by (at least, I could not find one). That makes replacing Docker Compose with Podman Compose a risk. Besides that, Red Hat itself recommends using Kubernetes YAML files instead of Compose files. If you are using Podman, it is better to use Kubernetes YAML files for container orchestration.
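As a minimal, hypothetical sketch of that Kubernetes-YAML route (using the same image as above, and assuming your Podman version ships the generate kube and play kube subcommands, including the --down flag), you can let Podman generate the manifest for you and replay it afterwards:
Shell
# Start a container once, export it as a Kubernetes manifest, then remove it.
podman run -d --name helloservice -p 8080:8080 docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT
podman generate kube helloservice > helloservice-pod.yaml
podman rm -f helloservice

# Recreate the workload from the generated manifest and tear it down again.
podman play kube helloservice-pod.yaml
curl http://localhost:8080/hello
podman play kube --down helloservice-pod.yaml
The generated helloservice-pod.yaml can usually be applied to a real Kubernetes cluster as well, which is exactly the kind of portability the Red Hat recommendation is aiming at.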
Have you ever seen the “exec /docker-entrypoint.sh: exec format error” error message on your server while running any docker image or Kubernetes pods? This is most probably because you are running some other CPU architecture container image on your server, or did you ever use --platform linux/x86_64 option on your Apple silicon M1, M2 MacBook? If yes, then you are not getting the native performance of Apple silicon, and it may be draining your MacBook battery. To avoid this kind of error and performance issue, we need to run the correct multi-arch container image, or we may need to build our own image because all container public image does not have the multi-arch image available. In this blog post, we will learn what are multi-arch container images. How does it work? How to build and promote them? and we will write a sample code for building a multi-arch image in the CI/CD pipeline. What Is a Multi-Arch Container Image? A multi-arch Docker image is a list of images that has references to binaries and libraries compiled for multiple CPU architectures. This type of image is useful when we need to run the same application on different CPU architectures (ARM, x86, RISC-V, etc) without creating separate images for each architecture. Multi-Arch Container Use Cases Performance and Cost Optimization: Container multi-arch is used to optimize performance on different CPU architectures. By building and deploying images that are optimized for a specific architecture, we can achieve better performance and reduce resource usage. Using Karpenter, we can easily deploy our workload to arm64 and get the benefit of AWS Graviton’s performance and cost savings. Cross-Platform Development: If you are developing an application that needs to run on multiple platforms, such as ARM and x86, you can use buildx to build multi-arch Docker images and test the application on different architectures. IoT Devices: Many IoT/Edge devices use ARM processors, which require different binaries and libraries than x86 processors. With multi-arch images, you can create an image that can run on ARM, x86, and RISCV devices, making it easier to deploy your application to a wide range of IoT devices. Benefits of Using Multi-Arch Container Image Several advantages of using multi-arch container images are: Ability to run Docker images on multiple CPU architectures. Enables us to choose eco-friendly CPU architecture. Seamless migration from one architecture to another. Better performance and cost saving using arm64. Ability to support more cores per CPU using arm64. How To Build Multi-Arch Container Image? There are multiple ways to build a multi-arch container, but we will be focusing on widely used and easy methods. Traditional Docker build command. Using Docker buildx Using Traditional Docker Build Command In this tutorial, we will manually build both images on different CPU architecture machines and push them to the container registry (e.g., Dockerhub) and then create the manifest file, which has both image references. A manifest file is a simple JSON file containing the index of container images and their metadata like the size of the image, sha256 digest, OS, etc. We will see more about the manifest file later in this blog. For example, this is our basic Dockerfile. HTML FROM nginx RUN echo “Hello multiarch” > /usr/share/nginx/html/index.html HTML ########## on amd64 machine ########## docker build -t username/custom-nginx:v1-amd64 . 
docker push username/custom-nginx:v1-amd64 ########## on arm64 machine ########## docker build -t username/custom-nginx:v1-arm64 . docker push username/custom-nginx:v1-arm64 ########## Create a manifest index file ########## docker manifest create \ username/custom-nginx:v1 \ username/custom-nginx:v1-amd64 \ username/custom-nginx:v1-arm64 ########## Push manifest index file ########## docker manifest push username/custom-nginx:v1 Using Docker Buildx With buildx, we just need to run one single command with parameterized architecture. HTML docker buildx build \ --push \ --platform linux/arm64,linux/amd64 \ -t username/custom-nginx:v1 . In the background, the Docker buildx command uses buildkit, so when we run the above command, it creates one container with moby/buildkitd image, which has QEMU binary for multiple CPU architectures, which are responsible for the emulating CPU instruction sets. We can view these QEMU binaries by running ls /usr/bin/buildkit-qemu-* inside the running buildkit container. In the above command, we passed --platform linux/arm64,linux/amd64 so it uses the /usr/bin/buildkit-qemu-aarch64 QEMU binary for building linux/arm64 image and linux/amd64 are natively built on the host machine. Once both images are built, then it uses the --push option to create the manifest file and pushes both images to the registry server with the manifest file. By inspecting the manifest file, we can see “Ref” contains the actual image link, which will be fetched when platform[0].architecture matches the host system architecture. HTML $ docker manifest inspect -v nginx [ { "Ref": "docker.io/library/nginx:latest@sha256:bfb112db4075460ec042ce13e0b9c3ebd982f93ae0be155496d050bb70006750", "Descriptor": { "mediaType": "application/vnd.docker.distribution.manifest.v2+json", "digest": "sha256:bfb112db4075460ec042ce13e0b9c3ebd982f93ae0be155496d050bb70006750", "size": 1570, "platform": { "architecture": "amd64", "os": "linux" } }, "SchemaV2Manifest": { "schemaVersion": 2, "mediaType": "application/vnd.docker.distribution.manifest.v2+json", "config": { "mediaType": "application/vnd.docker.container.image.v1+json", "size": 7916, "digest": "sha256:080ed0ed8312deca92e9a769b518cdfa20f5278359bd156f3469dd8fa532db6b" }, …. { "Ref": "docker.io/library/nginx:latest@sha256:3be40d1de9db30fdd9004193c2b3af9d31e4a09f43b88f52f1f67860f7db4cb2", "Descriptor": { "mediaType": "application/vnd.docker.distribution.manifest.v2+json", "digest": "sha256:3be40d1de9db30fdd9004193c2b3af9d31e4a09f43b88f52f1f67860f7db4cb2", "size": 1570, "platform": { "architecture": "arm64", "os": "linux", "variant": "v8" } }, "SchemaV2Manifest": { "schemaVersion": 2, "mediaType": "application/vnd.docker.distribution.manifest.v2+json", "config": { "mediaType": "application/vnd.docker.container.image.v1+json", "size": 7932, "digest": "sha256:f71a4866129b6332cfd0dddb38f2fec26a5a125ebb0adde99fbaa4cb87149ead" } We can also use buildx imagetools command to view the same output in a more human-readable format. 
HTML $ docker buildx imagetools inspect sonarqube:10.0.0-community Name: docker.io/library/sonarqube:10.0.0-community MediaType: application/vnd.docker.distribution.manifest.list.v2+json Digest: sha256:51588fac6153b949af07660decfe20b5754da9fd12c82db5d95a0900b6024196 Manifests: Name: docker.io/library/sonarqube:10.0.0-community@sha256:8b536568cd64faf15e1e5be916cf21506df70e2177061edfedfd22f255a7b1a0 MediaType: application/vnd.docker.distribution.manifest.v2+json Platform: linux/amd64 Name: docker.io/library/sonarqube:10.0.0-community@sha256:2163e9563bbba2eba30abef8c25e68da4eb20e6e0bb3e6ecc902a150321fae6b MediaType: application/vnd.docker.distribution.manifest.v2+json Platform: linux/arm64/v8 If you are having any issues building multi-arch images, you can run the following command to reset the /proc/sys/fs/binfmt_misc entries. HTML docker run --rm --privileged multiarch/qemu-user-static --reset -p yes We can also build multi-arch container images using Buildah as well. How Do Multi-Arch Container Images Work? As we can see in the diagram, the host machine has x86/amd64 CPU architecture, and on top of that, we install operating systems which can be Windows or Linux. Windows requires WSL or LinuxKit to run Docker. It uses QEMU to emulate multiple CPU architectures, and Dockerfile builds run inside this emulation. When we run the docker pull or build command, it fetches the requested manifest file from the registry server. These manifest files are JSON file that can have one Docker image reference or contains more than one image list. It fetches the correct image depending on the host machine’s CPU architecture. How To Integrate Multi-Arch Container Build With CI/CD? If your workload runs on multiple machines with different CPU architectures, it is always better to build multi-arch Docker images for your application. Integrating multi-arch build into CI/CD streamlines the image build and scan process easier, adds only one Docker tag, and saves time. Below we have written Jenkins and GitHub CI sample code for building multi-arch images. Jenkins Multi-Arch CI Currently, the Jenkins Docker plugin does not support multi-arch building, so we can use buildx to build multi-arch images. HTML pipeline { agent { label 'worker1' } options{ timestamps() timeout(time: 30, unit: 'MINUTES') buildDiscarder(logRotator(numToKeepStr: '10')) } environment { DOCKER_REGISTRY_PATH = "https://registry.example.com" DOCKER_TAG = "v1" } stages { stage('build-and-push') { steps{ script{ docker.withRegistry(DOCKER_REGISTRY_PATH, ecrcred_dev){ sh ''' ####### check multiarch env ########### export DOCKER_BUILDKIT=1 if [[ $(docker buildx inspect --bootstrap | head -n 2 | grep Name | awk -F" " '{print $NF}') != "multiarch" ]] then docker buildx rm multiarch | exit 0 docker buildx create --name multiarch --use docker buildx inspect --bootstrap fi ####### Push multiarch ########### docker buildx build --push --platform linux/arm64,linux/amd64 -t "$DOCKER_REGISTRY_PATH"/username/custom-nginx:"$DOCKER_TAG" . ''' } } } } } } Otherwise, we can use the traditional Docker build command as shown above in Jenkins stages with different sets of Jenkins worker nodes. GitHub CI Pipeline for Building Multi-Arch Container Images GitHub Actions also supports multi-arch container images. It also uses QEMU CPU emulation in the background. 
HTML name: docker-multi-arch-push on: push: branches: - 'main' jobs: docker-build-push: runs-on: ubuntu-20.04 steps: - name: Checkout Code uses: actions/checkout@v3 - name: Set up QEMU uses: docker/setup-qemu-action@v2 - name: Set up Docker Buildx uses: docker/setup-buildx-action@v2 - name: Login to docker.io container registry uses: docker/login-action@v2 with: username: $ password: $ - name: Build and push id: docker_build uses: docker/build-push-action@v3 with: context: . file: ./Dockerfile platforms: linux/amd64,linux/arm64 push: true tags: username/custom-nginx:latest How To Promote Your Multi-Arch Image to Higher Environments? Promoting Docker multi-arch requires a few extra steps, as the docker pull command only pulls a single image based on the host machine’s CPU architecture. To promote multi-arch Docker images, we need to pull all CPU architecture images one by one by using –plarform=linux/$ARCH and then create a manifest file and push them to the new registry server. To avoid these complex steps, we can leverage the following tools. Skopeo or Crane can be used to promote our multi-arch image from one account to another using just a single command. In the background, what these tools do is use Docker API to fetch all multi-arch images and then create a manifest and push all images and manifest. HTML $ skopeo login --username $USER docker.io $ skopeo copy -a docker://dev-account/custom-nginx:v1 docker://prod-account/custom-nginx:v1 What if you want to use only the Docker command to promote this image to a higher environment (Production)? HTML ####### Pull DEV images ########### docker pull --platform=amd64 "$DOCKER_IMAGE_NAME_DEV":"$DOCKER_TAG" docker pull --platform=arm64 "$DOCKER_IMAGE_NAME_DEV":"$DOCKER_TAG" ####### Tag DEV image with STAGE ########### docker tag "$DOCKER_IMAGE_NAME_DEV":"$DOCKER_TAG" "$DOCKER_IMAGE_NAME_STAGE":"$DOCKER_TAG"-amd64 docker tag "$DOCKER_IMAGE_NAME_DEV":"$DOCKER_TAG" "$DOCKER_IMAGE_NAME_STAGE":"$DOCKER_TAG"-arm64 ####### Push amd64 and arm64 image to STAGE ########### docker push "$DOCKER_IMAGE_NAME_STAGE":"$DOCKER_TAG"-amd64 docker push "$DOCKER_IMAGE_NAME_STAGE":"$DOCKER_TAG"-arm64 ####### Create mainfest and push to STAGE ########### docker manifest create \ "$DOCKER_IMAGE_NAME_STAGE":"$DOCKER_TAG" \ --amend "$DOCKER_IMAGE_NAME_STAGE":"$DOCKER_TAG"-amd64 \ --amend "$DOCKER_IMAGE_NAME_STAGE":"$DOCKER_TAG"-arm64 docker manifest push "$DOCKER_IMAGE_NAME_STAGE":"$DOCKER_TAG" How To Scan Multi-Arch Images for Vulnerabilities? We can use any tool like Trivy, Gryp, or Docker scan for image scanning, but we have to pull multi-arch images one by one and then scan them because, by default, the Docker pull command will only fetch the one image that matches with the Host CPU. We can leverage the Docker pull command with --platform={amd64, arm64} to pull different CPU arch images. Here is how we can do that: HTML ####### Pull amd64 image and scan ########### docker pull --platform=amd64 nginx:latest trivy image nginx:latest ####### Pull arm64 image and scan ########### docker pull --platform=arm64 nginx:latest trivy image nginx:latest Some Caveats of Using Multi-Arch Container There are prominent benefits of using multi-arch containers, but there are some caveats that you should certainly be aware of before taking the leap. It takes extra storage for storing other arch images. Takes time to build the multi-arch container image also, while building the arm64 on QEMU emulation consumes lots of time and resources. 
Performance is significantly slower when binaries for a different CPU run under emulation compared to running them natively. There are still some rough edges when building arm64 images with buildx: a base image may not be available for arm64, and steps that need sudo-level access or cross-compiled, statically linked binaries require extra work. Container scanning has to be done for each architecture's image one by one. Buildx multi-arch builds are only supported on amd64 CPU architecture. Conclusion In this blog, we saw what multi-arch containers are and what their use cases look like. We integrated a multi-arch build with Jenkins and GitHub CI using sample code, covered several ways to promote and scan multi-arch container images, and finally looked at the caveats of using multi-arch containers. Multi-arch images give us the ability to build once and run everywhere and to migrate from one CPU architecture to another with ease. Also, by deploying images that are optimized for specific architectures, we can achieve better performance and reduce resource costs.
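As a final, hypothetical sanity check (using the custom-nginx tag built earlier and assuming QEMU binfmt emulation is registered on the host, as described above), you can run each architecture variant behind the single tag and confirm what it reports:
Shell
# Run the amd64 and arm64 variants of the same tag; the non-native one runs under QEMU emulation.
docker run --rm --platform linux/amd64 username/custom-nginx:v1 uname -m   # expected output: x86_64
docker run --rm --platform linux/arm64 username/custom-nginx:v1 uname -m   # expected output: aarch64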
2023 has seen rapid growth in native-cloud applications and platforms. Organizations are constantly striving to maximize the potential of their applications, ensure seamless user experiences, and drive business growth. The rise of hybrid cloud environments and the adoption of containerization technologies, such as Kubernetes, have revolutionized the way modern applications are developed, deployed, and scaled. In this digital arena, Kubernetes is the platform of choice for most cloud-native applications and workloads, which is adopted across industries. According to a 2022 report, 96% of companies are already either using or evaluating the implementation of Kubernetes in their cloud system. This popular open-source utility is helpful for container orchestration and discovery, load balancing, and other capabilities. However, with this transformation comes a new set of challenges. As the complexity of applications increases, so does the need for robust observability solutions that enable businesses to gain deep insights into their containerized workloads. Enter Kubernetes observability—a critical aspect of managing and optimizing containerized applications in hybrid cloud environments. In this blog post, we will delve into Kubernetes observability, exploring six effective strategies that can empower businesses to unlock the full potential of their containerized applications in hybrid cloud environments. These strategies, backed by industry expertise and real-world experiences, will equip you with the tools and knowledge to enhance the observability of your Kubernetes deployments, driving business success. Understanding Observability in Kubernetes Let us first start with the basics. Kubernetes is a powerful tool for managing containerized applications. But despite its powerful features, keeping track of what's happening in a hybrid cloud environment can be difficult. This is where observability comes in. Observability is collecting, analyzing, and acting on data in a particular environment. In the context of Kubernetes, observability refers to gaining insights into the behavior, performance, and health of containerized applications running within a Kubernetes cluster. Kubernetes Observability is based on three key pillars: 1. Logs: Logs provide valuable information about the behavior and events within a Kubernetes cluster. They capture important details such as application output, system errors, and operational events. Analyzing logs helps troubleshoot issues, understand application behavior, and identify patterns or anomalies. 2. Metrics: Metrics are quantitative measurements that provide insights into a Kubernetes environment's performance and resource utilization. They include CPU usage, memory consumption, network traffic, and request latency information. Monitoring and analyzing metrics help identify performance bottlenecks, plan capacity, and optimize resource allocation. 3. Traces: Traces enable end-to-end visibility into the flow of requests across microservices within a Kubernetes application. Distributed tracing captures timing data and dependencies between different components, providing a comprehensive understanding of request paths. Traces help identify latency issues, understand system dependencies, and optimize critical paths for improved application performance. Kubernetes observability processes typically involve collecting and analyzing data from various sources to understand the system's internal state and provide actionable intelligence. 
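For the first two pillars, plain kubectl already exposes the raw signals; the sketch below is a minimal, hypothetical example (the namespace, deployment name, and label are placeholders, and kubectl top requires the metrics-server add-on). Traces, by contrast, require instrumenting the application itself, for example with OpenTelemetry.
Shell
# Logs: recent output from the pods behind a deployment.
kubectl -n shop logs deployment/checkout --since=15m

# Metrics: CPU and memory usage per pod (needs metrics-server).
kubectl -n shop top pod -l app=checkout

# Cluster events, ordered by time, often the quickest pointer to failures.
kubectl -n shop get events --sort-by=.metadata.creationTimestamp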
By implementing the right observability strategies, you can gain a deep understanding of your applications and infrastructure, which will help you to: Detect and troubleshoot problems quickly Improve performance and reliability Optimize resource usage Meet compliance requirements Observability processes are being adopted at a rapid pace by IT teams. By 2026, 70% of organizations will have successfully applied observability to achieve shorter latency for decision-making while increasing distributed, organized, and simplified data management processes. 1. Use Centralized Logging and Log Aggregation For gaining insights into distributed systems, centralized logging is an essential strategy. In Kubernetes environments, where applications span multiple containers and nodes, collecting and analyzing logs from various sources becomes crucial. Centralized logging involves consolidating logs from different components into a single, easily accessible location. The importance of centralized logging lies in its ability to provide a holistic view of your system's behavior and performance. With Kubernetes logging, you can correlate events and identify patterns across your Kubernetes cluster, enabling efficient troubleshooting and root-cause analysis. To implement centralized logging in Kubernetes, you can leverage robust log aggregation tools or cloud-native solutions like Amazon CloudWatch Logs or Google Cloud Logging. These tools provide scalable and efficient ways to collect, store, and analyze logs from your Kubernetes cluster. 2. Leverage Distributed Tracing for End-to-End Visibility In a complex Kubernetes environment with microservices distributed across multiple containers and nodes, understanding the flow of requests and interactions between different components becomes challenging. This is where distributed tracing comes into play, providing end-to-end visibility into the execution path of requests as they traverse through various services. Distributed tracing allows you to trace a request's journey from its entry point to all the microservices it touches, capturing valuable information about each step. By instrumenting your applications with tracing libraries or agents, you can generate trace data that reveals each service's duration, latency, and potential bottlenecks. The benefits of leveraging distributed tracing in Kubernetes are significant. Firstly, it helps you understand the dependencies and relationships between services, enabling better troubleshooting and performance optimization. When a request experiences latency or errors, you can quickly identify the service or component responsible and take corrective actions. Secondly, distributed tracing allows you to measure and monitor the performance of individual services and their interactions. By analyzing trace data, you can identify performance bottlenecks, detect inefficient resource usage, and optimize the overall responsiveness of your system. This information is invaluable with regard to capacity planning and ensuring scalability in your Kubernetes environment. Several popular distributed tracing solutions are available. These tools provide the necessary instrumentation and infrastructure to effectively collect and visualize trace data. By integrating these solutions into your Kubernetes deployments, you can gain comprehensive visibility into the behavior of your microservices and drive continuous improvement. 3. 
Integrate Kubernetes With APM Solutions To achieve comprehensive observability in Kubernetes, it is essential to integrate your environment with Application Performance Monitoring (APM) solutions. APM solutions provide advanced monitoring capabilities beyond traditional metrics and logs, offering insights into the performance and behavior of individual application components. One of the primary benefits of APM integration is the ability to detect and diagnose performance bottlenecks within your Kubernetes applications. With APM solutions, you can trace requests as they traverse through various services and identify areas of high latency or resource contention. Armed with this information, you can take targeted actions to optimize critical paths and improve overall application performance. Many APM solutions offer dedicated Kubernetes integrations that streamline the monitoring and management of containerized applications. These integrations provide pre-configured dashboards, alerts, and instrumentation libraries that simplify capturing and analyzing APM data within your Kubernetes environment. 4. Use Metrics-Based Monitoring Metrics-based monitoring forms the foundation of observability in Kubernetes. It involves collecting and analyzing key metrics that provide insights into your Kubernetes clusters and applications' health, performance, and resource utilization. When it comes to metrics-based monitoring in Kubernetes, there are several essential components to consider: Node-Level Metrics: Monitoring the resource utilization of individual nodes in your Kubernetes cluster is crucial for capacity planning and infrastructure optimization. Metrics such as CPU usage, memory usage, disk I/O, and network bandwidth help you identify potential resource bottlenecks and ensure optimal allocation. Pod-Level Metrics: Pods are the basic units of deployment in Kubernetes. Monitoring metrics related to pods allows you to assess their resource consumption, health, and overall performance. Key pod-level metrics include CPU and memory usage, network throughput, and request success rates. Container-Level Metrics: Containers within pods encapsulate individual application components. Monitoring container-level metrics helps you understand the resource consumption and behavior of specific application services or processes. Metrics such as CPU usage, memory usage, and file system utilization offer insights into container performance. Application-Specific Metrics: Depending on your application's requirements, you may need to monitor custom metrics specific to your business logic or domain. These metrics could include transaction rates, error rates, cache hit ratios, or other relevant performance indicators. Metric-based monitoring architecture diagram 5. Use Custom Kubernetes Events for Enhanced Observability Custom events communicate between Kubernetes components and between Kubernetes and external systems. They can signal important events, such as deployments, scaling operations, configuration changes, or even application-specific events within your containers. By leveraging custom events, you can achieve several benefits in terms of observability: Proactive Monitoring: Custom events allow you to define and monitor specific conditions that require attention. For example, you can create events to indicate when resources are running low, when pods experience failures, or when specific thresholds are exceeded. By capturing these events, you can proactively detect and address issues before they escalate. 
Contextual Information: Custom events can include additional contextual information that helps troubleshoot and analyze root causes. You can attach relevant details, such as error messages, timestamps, affected resources, or any other metadata that provides insights into the event's significance. This additional context aids in understanding and resolving issues more effectively. Integration with External Systems: Kubernetes custom events can be consumed by external systems, such as monitoring platforms or incident management tools. Integrating these systems allows you to trigger automated responses or notifications based on specific events. This streamlines incident response processes and ensures the timely resolution of critical issues. To leverage custom Kubernetes events, you can use Kubernetes event hooks, custom controllers, or even develop your event-driven applications using the Kubernetes API. By defining event triggers, capturing relevant information, and reacting to events, you can establish a robust observability framework that complements traditional monitoring approaches. 6. Incorporating Synthetic Monitoring for Proactive Observability Synthetic monitoring simulates user journeys or specific transactions that represent everyday interactions with your application. These synthetic tests can be scheduled to run regularly from various geographic locations, mimicking user behavior and measuring key performance indicators. There are several key benefits to incorporating synthetic monitoring in your Kubernetes environment: Proactive Issue Detection: Synthetic tests allow you to detect issues before real users are affected. By regularly simulating user interactions, you can identify performance degradations, errors, or unresponsive components. This early detection enables you to address issues proactively and maintain high application availability. Performance Benchmarking: Synthetic monitoring provides a baseline for performance benchmarking and SLA compliance. You can measure response times, latency, and availability under normal conditions by running consistent tests from different locations. These benchmarks serve as a reference for detecting anomalies and ensuring optimal performance. Geographic Insights: Synthetic tests can be configured to run from different geographic locations, providing insights into the performance of your application from various regions. This helps identify latency issues or regional disparities that may impact user experience. By optimizing your application's performance based on these insights, you can ensure a consistent user experience globally. You can leverage specialized tools to incorporate synthetic monitoring into your Kubernetes environment. These tools offer capabilities for creating and scheduling synthetic tests, monitoring performance metrics, and generating reports. An approach for gaining Kubernetes observability for traditional and microservice-based applications is by using third-party tools like Datadog, Splunk, Middleware, and Dynatrace. This tool captures metrics and events, providing several out-of-the-box reports, charts, and alerts to save time. Wrapping Up This blog explored six practical strategies for achieving Kubernetes observability in hybrid cloud environments. 
By utilizing centralized logging and log aggregation, leveraging distributed tracing, integrating Kubernetes with APM solutions, adopting metrics-based monitoring, incorporating custom Kubernetes events, and running synthetic monitoring, you can enhance your understanding of the behavior and performance of your Kubernetes deployments. Implementing these strategies will provide comprehensive insights into your distributed systems, enabling efficient troubleshooting, performance optimization, proactive issue detection, and an improved user experience. Whether you are operating a small-scale Kubernetes environment or managing a complex hybrid cloud deployment, applying these strategies will contribute to the success and reliability of your applications.
Docker technology has revolutionized the infrastructure management landscape in such a way that Docker has now become a synonym for containers. It is important to understand that all Docker containers are containers, but not all containers are Docker containers. While Docker is the most commonly used container technology, there are several alternatives to Docker. In this blog, we will explore Docker alternatives for your SaaS application. What Is Docker? Docker is an application containerization platform that is quite popular in IT circles. This open-source software enables developers to easily package applications along with their dependencies, OS, libraries, and other run-time-related resources in containers and automatically deploy them on any infrastructure. With cloud-native architecture and multi-cloud environments becoming popular choices for most organizations, Docker is the most convenient choice for building, sharing, deploying, and managing containers using APIs and simple commands in these environments. How Does It Work? Docker was initially created for the Linux platform. However, it now supports macOS and Windows environments as well. Unlike virtual machines that encapsulate the entire OS, Docker isolates resources at the OS kernel level, enabling you to run multiple containers on the same operating system. Docker Engine is the main component of the Docker ecosystem. The Docker engine creates a server-side daemon and a client-side CLI. The server-side daemon hosts containers, images, data, and networks, while the client-side CLI enables you to communicate with the server using APIs. Docker images are built from recipes called Dockerfiles. What Are Docker’s Features and Benefits? Docker offers multiple benefits to organizations. Here are some of the key benefits offered by the tool: Increased Productivity Seamless Movement Across Infrastructures Lightweight Containers Container Creation Automation Optimized Costs Extensive Community Support Increased Productivity Docker containers are easy to build, deploy, and manage compared to virtual machines. They complement cloud-native architecture and DevOps-based CI/CD pipelines, allowing developers to deliver quality software faster. Seamless Movement Across Infrastructures Contrary to Linux containers that use machine-specific configurations, Docker containers are machine-agnostic, platform-agnostic, and OS-agnostic. As such, they are easily portable across any infrastructure. Lightweight Containers Each Docker container runs a single process, making it extremely lightweight. At the same time, it allows you to update the app granularly: you can edit or modify a single process without taking down the application. Container Creation Automation Docker can take your application source code and automatically build a container. It can also take an existing container image as a base template and recreate containers, enabling you to reuse containers. It also comes with a versioning mechanism, meaning each Docker image can be easily rolled back. Optimized Costs The ability to run more code on each server allows you to increase productivity with minimal costs. Optimized utilization of resources ultimately results in cost savings. In addition, standardized operations enable automation, saving time, human resources, and therefore costs. Extensive Community Support Docker enjoys large and vibrant community support. You enjoy the luxury of thousands of user-uploaded images in the public registry instead of spending time reinventing the wheel.
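To make the client/daemon workflow described above concrete, here is a minimal, hypothetical example (the image and container names are placeholders): the CLI sends the build context and the run request to the Docker daemon, which builds the image and starts the container.
Shell
# A trivial image: package a static page on top of the official nginx image.
cat > Dockerfile <<'EOF'
FROM nginx:alpine
RUN echo "Hello from a container" > /usr/share/nginx/html/index.html
EOF

docker build -t hello-nginx:0.1 .                        # client sends the build context to the daemon
docker run -d --name hello -p 8080:80 hello-nginx:0.1    # daemon creates and starts the container
curl http://localhost:8080                               # the packaged page is served on the mapped port
docker rm -f hello                                       # stop and remove the container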
Why Is Microservices Better Than Monolith Architecture? Microservices architecture has become the mainstream architecture in recent times. Before understanding the importance of Microservices, it is important to know the downsides of a monolith architecture. Traditionally organizations used a monolithic architecture to build applications. This architecture uses a waterfall model for software development wherein the software is designed and developed first. The code is then sent to the QA team for testing purposes. When bugs are found, the code is sent back to the developers. After successful testing, the code is pushed to a testing environment and then to a live production environment. You must repeat the entire process for any code changes or updates. When you look at monolithic software from a logical perspective, you’ll find 3 layers: the front-end layer, the business layer, and the data layer. When a user makes a request, the business layer runs the business logic, the data layer manages the data, and the presentation layer displays it to the user. Code related to all 3 layers is maintained in a single codebase. Everyone commits changes to the same codebase. As the codebase grows, the complexity of managing it grows as well. When a developer is working on a single feature, he has to pull out the entire code to the local machine to make it work. Moreover, for every change, all artifacts have to be generated. The biggest problem is seamless coordination between teams. Monolithic architecture is not flexible, scalable, and is expensive. Microservices architecture solves all these challenges. Microservices architecture facilitates a cloud-native software development approach wherein the software is developed as loosely-coupled, independently deployable microservices that communicate with each other via APIs. Each service comes with its technology stack that can be categorized by business capability, allowing you to update or scale components with ease independently. Microservices uses a cloud-native architecture which is highly suitable for DevOps-based continuous delivery. As each app runs inside a container, you can easily make changes to the app inside the container without distributing the underlying infrastructure, gaining 99.99% uptime. CI/CD environments and the ability to easily move apps between various environments bring faster time to market. It also gives the flexibility to monitor market trends and quickly make changes to your application to always stay competitive. As each app runs in a separate container, developers have the luxury of choosing a diverse technology stack to build quality software instead of getting stuck with a specialized tool for a specific function. It also optimizes costs. Microservices and Docker While microservices architecture offers multiple benefits to organizations, it comes with certain challenges. Firstly, tracking services that are distributed across multiple hosts is a challenge. Secondly, as the microservices architecture scales, the number of services grows. As such, you need to allocate resources for each small host carefully. Moreover, certain services are so small that they don’t fully utilize the AWS EC2 instance. So, wasted resources can increase your overall costs. Thirdly, the microservices architecture comprises multiple services that are developed using multiple programming languages, technologies, and frameworks. 
When it comes to deploying microservices code, different sets of libraries and frameworks increase the complexity and costs. Docker technology solves all these challenges while delivering more. Docker enables you to package each microservice into a separate container. You can run multiple containers for a single instance, eliminating overprovisioning issues. Docker helps you abstract data storage by hosting data on a container and referencing it from other containers. Another advantage of this approach is persistent data storage, which is stored separately even after you destroy the container. The same approach can be applied to programming languages. You can group libraries and frameworks required for a language, package them inside a container, and link them to the required containers to efficiently manage cross-platform solutions. Using a log monitoring tool, you can monitor logs of individual containers to get clear insights into data flow and app performance. Why Do Some IT Managers Look For Docker Alternatives? While Docker is the most popular containerization technology, few IT managers are looking for Docker alternatives. Here are some reasons for them to do so. Docker is not easy to use. There is a steep learning curve. There are several issues that administrators have to handle. For instance, application performance monitoring doesn’t come out of the box. While Docker offers basic statistics, you need to integrate 3rd party tools for this purpose. Persistent data storage is not straightforward, so you must move data outside the container and securely store it. Container orchestration requires considerable expertise in configuring and managing an orchestration tool such as Docker Swarm, Kubernetes, or Apache Mesos. Docker containers require more layers to be secured when compared with a traditional stack. All these factors add up to the administrative burden. Without properly understanding the tool, running Docker becomes complex and expensive. However, the benefits of Docker outweigh these minor disadvantages. Moreover, these challenges will also greet you when you use alternatives to Docker. The time and effort spent in understanding Docker will reward you big time in the long run. In case you are still curious about alternatives to Docker, here are the top 10 Docker alternatives for your SaaS application: Docker Alternatives 1: Serverless Architecture Serverless architecture is a popular alternative to Docker containerization technology. As the name points out, a serverless architecture eliminates the need to manage a server or the underlying infrastructure to run an application. It doesn’t mean that servers are not needed but the cloud vendor handles that job. Developers can simply write an application code, package it and deploy it on any platform. They can choose to buy specific backend services needed for the app and deploy them on the required platform. Serverless architecture removes infrastructure management burdens or Docker/Kubernetes configuration complexities, scalability, and upgrades to deliver faster time to market. The ability to trigger events makes it a good choice for sequenced workflows and CI/CD pipelines. One of the biggest advantages of serverless computing is that you can extend applications beyond the cloud provider capacities. The flexibility to purchase specific functionalities required for the application significantly reduces costs. 
For instance, when you run docker containers and experience unpredictable traffic spikes, you’ll have to increase the ECS environment capacity. However, you’ll be paying more for the extra service containers and container instances. With a SaaS business, cost optimization is always a priority. When you implement serverless architecture using AWS Lambda, you will only scale functions that are required at the application runtime and not the entire infrastructure. That way, you can optimize costs. Moreover, it streamlines the deployment process allowing you to deploy multiple services without the configuration hassles. As you can run code from anywhere, you can use the nearest server to reduce latency. On the downside, application troubleshooting gets complex as the application grows, as you don’t know what’s happening inside. For this reason, serverless is termed as a black box technology. It is important to design the right app strategy. Otherwise, you will pay more for the expensive overhead human resource costs. Autodesk, Droplr, PhotoVogue, and AbstractAI are a few examples of companies using a serverless model. Docker Alternatives 2: Virtual Machines (VMs) from VMware Deploying virtual machines from VMware is another alternative for Docker. VMware is the leader in the virtualization segment. While Docker abstracts resources at the OS level, VMware virtualizes the hardware layer. One of the important offerings of VMware is the vSphere suite that contains different tools for facilitating cloud computing virtualization OS. vSphere uses ESXi, which is the hypervisor that enables multiple OSs to run on a single host. So, each OS runs with its dedicated resources. When it comes to containerization, VMware virtualizes the hardware along with underlying resources which means they are not fully isolated. Compared to Docker, VMware VMs are more resource-intensive and not lightweight and portable. For apps that require a full server, VMware works best. Though Docker apps are lightweight and run faster, VMware is quickly catching up. The current ESXi versions equal or outperform bare-metal machines. There are multiple options to use VMware for containerization tasks. For instance, you can install VMware vSphere ESXi hypervisor and then install any OS on top of it. Photon is an open-source, container-focused OS offered by VMware. It is optimized for cloud platforms such as Google Compute Engine and Amazon Elastic Compute. It offers a lifecycle management system called tdnf that is package-based and yum-compatible. Photon apps are lightweight, boot faster and consume a lesser footprint. Alternatively, you can run any Linux distributions on top of ESXi and run containers inside the OS. Docker containers contain more layers to be secured compared to VMware virtual machines. VMware is a good choice for environments requiring high security and persistent storage. VMware VMs are best suited for machine virtualization in an IaaS environment. While VMware VMs can be used as alternatives to Docker, they are not competing technologies and complement each other. To get the best of both worlds, you can run Docker containers inside a VMware virtual machine, making it ultra-lightweight and highly portable. Docker Alternatives 3: Monolithic Instances From AWS, Azure, and GCP Another alternative to Docker is to deploy your monolithic applications using AWS instances or Azure and GCP VMs. When you implement an AWS EC2 instance, it will install the basic components of the OS and other required packages. 
You can use Amazon Machine Image (AMI) to create VMs within the EC2 instance. They contain instructions to launch an instance. AMIs should be specified by developers in AWS. There are preconfigured AMIs for specific use cases. You can use Amazon ECS for orchestration purposes. AWS AMI images are not lightweight when compared with Docker containers. Docker Alternatives 4: Apache Mesos Apache Mesos is an open-source container and data center management software developed by Apache Software Foundation. It was formerly known as Nexus. Mesos is written in C++. It acts as an abstraction tool separating virtual resources from the physical hardware and provides resources to apps running on it. You can run apps such as Kubernetes, Elastic Search, Hadoop, Spark, etc., on top of Mesos. Mesos was created as a cluster management tool similar to Tupperware and Borg but differs in the fact that it is open-source. It uses a modular architecture. An important feature of Mesos is that it abstracts data center resources while grouping them into a single pool, enabling administrators to efficiently manage resource allocation tasks while delivering a consistent and superior user experience. It offers higher extensibility wherein you can add new applications and technologies without disturbing the clusters. It comes with a self-healing and fault-tolerant environment powered by Zookeeper. It reduces footprint and optimizes resources by allowing you to run diverse workloads on the same infrastructure. For instance, you can run traditional applications, distributed data systems, or stateless microservices on the same infrastructure while individually managing workloads. Apache Mesos allows you to run a diverse set of workloads on top of it, including container orchestration. For container orchestration, Mesos uses an orchestration framework called Marathon. It can easily run and manage mission-critical workloads, which makes it a favorite for enterprise architecture. Mesos doesn’t support service discovery. However, you can use apps running on Mesos, such as Kubernetes, for this purpose. It is best suited for data center environments that involve the complex configuration of several clusters of Kubernetes. It is categorized as a cluster management tool and enables organizations to run, build and manage resource-efficient distributed systems. Mesos allows you to isolate tasks within Linux containers and rapidly scales to hundreds and thousands of nodes. Easy scaling is what differentiates it from Docker. If you want to run a mission-critical and diverse set of workloads on a reliable platform along with portability across clouds and data centers, Mesos is a good choice. Twitter, Uber, Netflix, and Apple (Siri) are some of the popular enterprises that use Apache Mesos. Docker Alternatives 5: Cloud Foundry Container Technology Cloud Foundry is an open-source Platform-as-a-Service (PaaS) offering that the Cloud Foundry Foundation manages. The tool was written in Ruby, Go, and Java by VMware engineers and released in 2011. Cloud Foundry is popular for its continuous delivery support, facilitating product life cycle management. Its container-based architecture is famous for multi-cloud environments as it facilitates the deployment of containers on any platform while allowing you to seamlessly move workloads without disturbing the application. The key feature of Cloud Foundry is its ease of use which allows rapid prototyping. 
It allows you to write and edit code from any local IDE and deploy containerized apps to the cloud. Cloud Foundry picks the right build pack and automatically configures it for simple apps. The tool limits the number of opened ports for increased security. It supports dynamic routing for high performance. Application health monitoring and service discovery come out of the box. Cloud Foundry uses its own container format called Garden and a container orchestration engine called Diego. However, as Docker gained popularity and the majority of users started using Docker containers, Cloud Foundry had to support Docker. To do so, it encapsulated docker containers in Garden image format. However, moving those containers to other orchestration engines was not easy. Another challenge for Cloud Foundry came in the form of Kubernetes. While Cloud Foundry supported stateless applications, Kubernetes was flexible enough to support stateful and stateless applications. Bowing down to user preferences, Cloud Foundry replaced its orchestration engine Diego with Kubernetes. Without its container runtime and orchestration platform, the Cloud Foundry container ecosystem became less relevant. The failure of Cloud Foundry emphasizes the importance of making an organization future-proof. It also emphasizes the importance of using Docker and Kubernetes solutions. Docker Alternatives 6: Rkt from CoreOS Rkt from CoreOS is a popular alternative for Docker container technology. Rkt was introduced in 2014 as an interoperable, open-source, and secure containerization technology. It was formerly known as CoreOS Rocket. Rkt comes with a robust ecosystem and offers end-to-end container support, making it a strong contender in the containerization segment. The initial releases of Docker ran as root, enabling hackers to gain super-user privileges when the system was compromised. Rkt was designed with security and fine-grained control in mind. Rkt uses the appt container format and can be easily integrated with other solutions. It uses Pods for container configuration and gRPC framework for RESTful APIs. Kubernetes support comes out of the box. You can visually manage containers. Rkt offers a comprehensive container technology ecosystem. However, there is a steep learning curve. The community support is good. While the tool is open-source and free, Rkt charges for the support. For instance, Kubernetes support is $3000 for 10 servers. Verizon, Salesforce.com, CA Technologies, and Viacom are prominent enterprises using CoreOS Rkt. Though Rkt quickly became popular, its future is now in the dark. In 2018, RedHat acquired CoreOS. Since then, Rkt lost its direction. Adding to its woes is the withdrawal of support by the Cloud Native Computing Foundation (CNCF) in 2019. The Rkt Github page shows that the project has ended. Being an open-source project, anyone can fork it to develop their own code project. Docker Alternatives 7: LXD LXD is a container and virtual machine manager that is powered by the Linux Container technology (LXC and is managed by Canonical Ltd., a UK-based software company. It enables administrators to deliver a unified and superior user experience across the Linux ecosystem of VMs and containers. LXD is written in Go and uses a privileged daemon that can be accessed from the CLI via REST APIs using simple commands. LXD focuses on OS virtualization, allowing you to run multiple VMs or processes inside a single container. For instance, you can run Linux, Apache, MySQL, and PHP servers inside a single container. 
You can also run nested Docker containers. As it runs VMs that start quickly, it is cost-effective compared to regular VMs. LXD is more like a standalone OS that offers the benefits of both containers and VMs. As it uses a full OS image with network and storage dependencies, it is less portable when compared with Docker. LXD offers limited options when it comes to interoperability. You can integrate it with fewer technologies, such as OpenNebula or OpenStack. LXD runs only on Linux distributions and doesn’t support the Windows platform. LXD uses Ubuntu and Ubuntu-daily image repositories for Ubuntu distributions. For other distributions, it uses a public image server. Docker Alternatives 8: Podman Podman is a popular containerization technology that is rapidly maturing to compete with Docker. Unlike Docker, which uses a daemon for managing containers, Podman approaches containers with a Daemon-less technology called Conmon that handles the tasks of creating containers, storing the state and pulling out container images, etc. This ability to manage multiple containers out-of-the-box using pod-level commands is what makes Podman special. Compared to Docker technology, Common uses a lesser memory footprint. To create a pod, you need to create a manifest file using the declarative format and YAML data serialization language. Kubernetes consumes these manifests for its container orchestration framework. Podman is similar to Docker, which means you can interact with Podman containers using Docker commands. Being daemon-less, Podman is more secure with a lesser attack surface. To remotely access all supported resources, you can use REST APIs. Moreover, Podman containers don’t need root access, which means you can control them from being run as the host’s root user for greater security. Another ability that separates Podman from Docker is that you can group containers as pods and efficiently manage a cluster of containers. To use this feature in Docker, you need to create a docker-compose YAML file. The ability to efficiently manage pods is what gives Podman an advantage over other containerization tools. Here is the link to the Podman site. Docker Alternatives 9: Containerd Containerd is not a replacement for Docker, but it is actually a part of Docker technology. Containerd is a container runtime that creates, manages, and destroys containers in real time, implementing the Container Runtime Interface (CRI) specifications. It is a kernel abstraction layer that abstracts OS functionality or Syscalls. It pulls out docker images and sends them to the low-level runtime called runc that manages containers. When Docker was released in 2013, it was a comprehensive containerization tool that helped organizations create and run containers. But it lacked the container orchestration system. So, Kubernetes was introduced in 2014 to simplify container orchestration processes. However, Kubernetes had to use Docker to interact with containers. As Kubernetes only needed components that are required to manage containers, it was looking for a way to bypass certain aspects of the tool. The result was Dockershim. While Kubernetes developed Dockershim to bypass Docker, Docker came up with a container orchestration tool called Docker Swarm that performed the tasks of Kubernetes. As the containerization technology evolved and multiple 3rd party integrations came into existence, managing docker containers became a complex task. To standardize container technology, Open Container Initiative (OCI) was introduced. 
The job of OCI was to define specifications for container and runtime standards. To make Docker technology more modular, Docker extracted its runtime into another tool called containerd, which was later donated to the Cloud Native Computing Foundation (CNCF). With containerd, Kubernetes was able to access low-level container components without Docker. In today’s distributed systems, containerd helps administrators abstract syscalls and provide users with what they need. The latest containerd version comes with a complete storage and distribution system supporting both Docker images and OCI formats. To summarize, containerd helps you build a container platform without worrying about the underlying OS. See the official containerd documentation to learn more. Docker Alternatives 10: runC Similar to containerd, runC is a part of the Docker container ecosystem that provides low-level functionality for containers. In 2015, Docker released runC as a standalone container runtime tool. As Docker is a comprehensive containerization technology that runs distributed apps on various platforms and environments, it uses a sandboxing environment to abstract the required components of the underlying host without rewriting the app. To make this abstraction possible, Docker consolidated these features into a unified low-level container runtime component called runC. runC is highly portable, secure, lightweight, and scalable, making it suitable for large deployments. As there is no dependency on the Docker platform, runC makes standard containers available everywhere. runC offers native support for Windows and Linux containers, for hardware from manufacturers such as Arm, IBM, Intel, and Qualcomm, and for bleeding-edge hardware features such as TPM and DPDK. The runC container configuration format is governed by the Open Container Initiative; runC is OCI-compliant and implements the OCI runtime specification. Extra Docker Alternative: Vagrant Vagrant is an open-source software tool from HashiCorp that helps organizations build and manage portable software deployment environments on top of providers such as VirtualBox. With its easy workflow and automation, Vagrant enables developers to set up portable development environments automatically. While Docker can cost-effectively run software on containerized Windows, Linux, and macOS systems, it doesn’t offer full functionality on certain operating systems such as BSD. When you are deploying apps to BSD environments, Vagrant offers better production parity than Docker. However, Vagrant doesn’t offer full containerization features, and in microservices-heavy environments it lacks the functionality containers provide. So, Vagrant is useful when you are looking for consistent and easy development workflows or when BSD deployments are involved. The direct alternative to Docker technology is serverless architecture. However, it makes organizations heavily dependent on cloud providers, and it doesn’t suit long-running applications well. VMware doesn’t offer a comprehensive containerization system. Rkt and Cloud Foundry are heading toward a dead end. Apache Mesos was on the verge of becoming obsolete but got the support of its members at the last hour. Containerd and runC are low-level tools and work well with high-level container software such as Docker. Most of the Docker alternatives are developer-focused. Docker offers a comprehensive and robust container ecosystem that suits DevOps, microservices, and cloud-native architectures!
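To make the "low-level" distinction concrete, here is a minimal, hypothetical sketch that drives containerd directly through its bundled ctr client instead of the Docker CLI (it assumes containerd is installed and running; exact cleanup steps can vary between containerd versions):
Shell
# Pull an image and start it as a containerd task; containerd hands execution to runc.
sudo ctr images pull docker.io/library/nginx:latest
sudo ctr run -d docker.io/library/nginx:latest web
sudo ctr tasks ls                 # list running tasks

# Tear down: stop the task, then remove the task and the container record.
sudo ctr tasks kill web
sudo ctr tasks delete web
sudo ctr containers delete web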
Container Orchestration Solutions
When you use containers, you need a container orchestration tool to manage deployments of container clusters. Container orchestration is about automating container management tasks such as scheduling, deploying, scaling, and monitoring containers. For instance, in a containerized environment, each server runs multiple applications that are written in different programming languages using different technologies and frameworks. When you scale this setup to hundreds or thousands of deployments, it becomes a challenge to maintain operational efficiency and security, and if you have to move workloads between on-premises, cloud, and multi-cloud environments, the complexity adds up. Identifying overprovisioned resources, load balancing across multiple servers, handling updates and rollbacks, and implementing organizational security standards across the infrastructure are some additional challenges you face. Manually performing these operations for enterprise-level deployments is not feasible. A container orchestration tool helps you in this regard.

Container orchestration uses a declarative programming model wherein you define the required outcome, and the platform ensures that the environment is maintained at that desired state. It means your deployments always match the predefined state. When you deploy containers, the orchestration tool automatically schedules those deployments, choosing the best available host. It simplifies container management operations, boosts resilience, and adds security to operations. Kubernetes, Docker Swarm, and Apache Mesos are some of the popular container orchestration tools available in the market. Kubernetes has become so popular in recent times that many container management tools were built on top of it, such as Azure Kubernetes Service (AKS), Google Kubernetes Engine (GKE), Amazon Elastic Kubernetes Service (EKS), etc.

Container Orchestration Solution 1: Kubernetes
Kubernetes, or K8s, is the most popular container orchestration tool that helps organizations efficiently manage containers at a massive scale. It was released in 2014 by Google engineers and is now offered as an open-source tool. The tool is written in Go and uses declarative programming and YAML-based deployment. Kubernetes is a comprehensive container management and container orchestration engine. It offers load balancing, auto-scaling, secrets management, and volume management out of the box. It uses 'pods' that allow you to group containers and provision resources based on predefined values. It also provides a web UI to view and manage clusters of containers. Kubernetes is vendor-agnostic and comes with built-in security. It offers comprehensive support for Docker containers and also supports the rkt engine from CoreOS. Kubernetes enjoys vibrant community support. Google Kubernetes Engine (GKE, formerly Google Container Engine) natively supports Kubernetes, and Azure and Red Hat OpenShift support it as well. However, Kubernetes is not easy to configure and use; there is a steep learning curve.

Container Orchestration Solution 2: Amazon ECS
Amazon Elastic Container Service (ECS) is a comprehensive container orchestration tool offered by Amazon for Docker containers. It allows organizations to efficiently run clusters of VMs on the Amazon cloud while easily managing container groups on these VMs across the infrastructure.
Running a serverless architecture, ECS deploys VMs and manages containers for you, so you can operate containers without worrying about managing VMs. You can define apps as tasks using JSON. The biggest USP of ECS is its simplicity and ease of use: deployments can be made right from the AWS Management Console, and the service itself is free to use. ECS comes integrated with an extensive set of AWS tools such as CloudWatch, IAM, CloudFormation, ELB, etc., which means you don't have to look elsewhere for container management tasks. You can write code to programmatically manage container operations, perform health checks, or easily access other AWS services. Leveraging the immutable nature of containers, you can use AWS Spot Instances and save up to 90% on costs. All containers are launched inside a virtual private cloud, so you get added security out of the box.

Container Orchestration Solution 3: Amazon EKS
Amazon Elastic Kubernetes Service (EKS) is another powerful offering from AWS for efficiently managing Kubernetes on the AWS cloud. It is a certified Kubernetes tool, meaning you can run all the tools used in the Kubernetes ecosystem, and it supports hybrid and multi-cloud environments. While AWS ECS is easy to use, EKS can take some time to get used to, as deploying and configuring CloudFormation or kops templates is a complex task. However, it allows more customization and portability across multi-cloud and hybrid environments and best suits large deployments. Amazon EKS adds $144 per cluster per month to your AWS bill.

Container Orchestration Solution 4: Azure Kubernetes Service
Azure Kubernetes Service (AKS) is a managed Kubernetes service offered by Azure. Formerly, it was called Azure Container Service and supported Docker Swarm, Mesos, and Kubernetes. The best thing about AKS is that the tool is updated in line with newer Kubernetes releases more quickly than EKS and GKE. If you are a heavy Microsoft user, AKS works well for you because you can easily integrate it with other Microsoft services; for instance, it integrates seamlessly with Azure Active Directory. Azure Monitor and Application Insights help you monitor and log environmental issues, Azure Policy is integrated with AKS, and automatic node health repair is a useful feature of the tool. A Kubernetes extension in Visual Studio Code allows you to edit and deploy Kubernetes from the editor. The developer community is strong. AKS charges only for the nodes; the control plane is free. On the downside, AKS offers a 99.9% SLA only when paired with the chargeable Azure Availability Zones; for free clusters, the uptime SLA is 99.5%.

Container Orchestration Solution 5: Google Kubernetes Engine
Google Kubernetes Engine (GKE) is the managed Kubernetes service offered by Google. As Google engineers developed Kubernetes, Google was the first to introduce a managed Kubernetes service in the form of GKE, and it offers the most advanced solutions compared to EKS and AKS. It automatically updates master and node machines, and CLI support is available. You can use the Stackdriver tool for resource monitoring. Autoscaling is available out of the box. It supports node pools, wherein you can choose the best available resource to deploy each service. When it comes to pricing, cluster management is free; you are charged only for the resources used.

EKS vs. AKS vs. GKE
Which Is the Best Tool for Container Orchestration?
Using the right technology stack, you can efficiently schedule containers, gain high availability, perform health checks, and handle load balancing and service discovery. When it comes to containerization technology, Docker is the most comprehensive and feature-rich container ecosystem, second to none; Docker is the de facto containerization standard. When it comes to container orchestration tools, Kubernetes is the best choice. It offers robust performance to efficiently manage thousands of clusters while allowing you to move workloads between different platforms seamlessly. Going for a Docker alternative can be risky. As mentioned above, organizations that used Cloud Foundry and Rkt had to realign their containerization strategies. I recommend using AWS ECS or EKS with Docker! AWS ECS with Docker is a powerful and cost-effective choice for organizations that implement simple app deployments. If your organization deals with containerization at a massive scale, AWS EKS with Docker is a good choice. AWS is the leading provider of cloud platform solutions, and AWS EKS comes with high interoperability and flexibility and is cost-effective. So, AWS ECS or EKS with Docker gives you the best of breed!

Conclusion
As businesses aggressively embrace cloud-native architecture and move workloads to the cloud, containerization has become mainstream in recent times. With its robust standalone ecosystem, Docker has become the de facto standard for containerization solutions. Though Docker is used by millions of users across the globe, there are other containerization tools available in the market that cater to specific needs. However, when exploring Docker alternatives, it is important to clearly identify your containerization requirements and check each alternative's host OS support and use cases before making a decision.
Importance of Business Continuity
Business continuity is having a strategy to deal with major disruptions and disasters. Disaster recovery (DR) helps an organization recover and resume business-critical functions or normal operations when there are disruptions or disasters. High-availability clusters are groups of servers that support business-critical applications: applications run on a primary server, and in the event of a failure, application operation is moved to a secondary server, where it continues. DR strategies work significantly differently today than they did in pre-container days, when the relationship was simple and straightforward: a one-to-one mapping between the application and the application server. Taking a backup or snapshot of everything and restoring it in case of failure is the dated approach.

Disaster Recovery Types
Before we discuss different DR approaches, it's important to understand the different types of disaster recovery sites. There are three types of DR sites: cold site, warm site, and hot site.
Cold Site: This is the most basic option, with minimal or no hardware/equipment. There is no connectivity, backup, or data synchronization. Although this is one of the most basic, least expensive options, it is not ready to take the hit of a failover.
Warm Site: This type has a few upgrades compared to the cold site. There can be options for network connectivity and hardware. It has data synchronization, and failover can be addressed within hours or days, depending on the type of setup.
Hot Site: This is the most premium option of the lot, with fully equipped hardware, connectivity, and near-perfect data synchronization. It is an expensive setup compared to the other two types of sites.
The impact of a disaster on an organization can be very expensive, so it is important to make the right choice in the first place. Disaster recovery management can mitigate the impact of such disruptive incidents. No approach or option is perfect, and the choice can vary depending on the requirements and the type of business or organization.

Traditional DR Approaches
Option 1: We can have a cold standby with periodic backups, or a warm standby with batch/scheduled replication of the data. The major difference here is the type of replication from the primary data center to the DR site. In this option, the application and data are not available until the standby is brought online, and there is a high chance of data loss because of the periodic/scheduled backups.
Option 2: In this case, we have continuous replication in place with very minimal lag between replications. This is a hot standby; a variant is the hot standby with read replicas, meaning both sites serve the same data for reads, while data can be written only at the primary data center location. The standby is available for immediate use in case of a disruption.
Option 3: This is the most robust disaster recovery setup. In this case, you need to maintain two active data centers with seamless, real-time data replication. This model requires an advanced setup with the latest technology and tool stack. It is a comprehensive model, but it can be expensive, and configuration and maintenance can be complicated; niche skills are required to run this kind of setup.

Disaster Recovery for Containers
Now, let's discuss how we can do disaster recovery management in a containerized ecosystem.
Kubernetes Cluster: When you deploy Kubernetes, you get a cluster. A Kubernetes cluster consists of a set of worker machines, called nodes, that run containerized applications. Every cluster has at least one worker node. The worker node(s) host the Pods that are the components of the application workload. The control plane manages the worker nodes and the Pods in the cluster. In production environments, the control plane usually runs across multiple computers, and a cluster usually runs multiple nodes, providing fault tolerance and high availability. To learn more about cluster components, refer to the link. In this setup, the application is not deployed onto one defined server; it can be scheduled on any of the worker nodes. Capacity management is done in the cluster, and since Kubernetes is an orchestration tool, deployments are assigned based on the availability of the nodes.

What We Need To Back Up
The Kubernetes ecosystem is very dynamic by nature, which makes it harder for traditional backup systems and techniques to work well in the context of Kubernetes nodes and applications. Both the recovery point objective (RPO) and the recovery time objective (RTO) may need to be far stricter, since applications need to be up and running constantly. Below is the list of important things to back up:
Configurations
Container Images
Policies
Certificates
User Access Controls
Persistent Volumes
There are two types of components in the cluster: stateful and stateless. Stateful components keep track of state: they expect a response, track information, and resend the request if no response is received. ETCD and volumes are the stateful components, while the rest of the Kubernetes control plane, the worker nodes, and the workloads are stateless. It's important to back up all the stateful components.

ETCD Backup
ETCD is a distributed key-value store used to hold and manage the critical information that distributed systems need to keep running. Most notably, it manages the configuration data, state data, and metadata for Kubernetes, the popular container orchestration platform. We can back up ETCD by using its built-in snapshot feature (a minimal snapshot sketch follows the persistent volume discussion below). The second option is to take a snapshot of the underlying storage volume, and the third option is to back up the Kubernetes objects/resources. The restore can then be done from the snapshot, the volume, or the objects, respectively.

Persistent Volume Backup
Kubernetes persistent volumes are administrator-provisioned volumes, created with a particular filesystem, size, and identifying characteristics such as volume IDs and names. A Kubernetes persistent volume has the following attributes:
It is provisioned either dynamically or by an administrator
Created with a particular filesystem
Has a particular size
Has identifying characteristics such as volume IDs and a name
In order for pods to start using these volumes, they need to be claimed, and the claim referenced in the spec for a pod. A PersistentVolumeClaim describes the amount and characteristics of the storage required by the pod, finds any matching persistent volumes, and claims them. Storage Classes describe default volume information.
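As referenced in the ETCD backup discussion above, here is a minimal sketch of taking, inspecting, and restoring an ETCD snapshot with etcdctl. The endpoint, certificate paths (typical of a kubeadm-based control plane), and backup locations are assumptions; adjust them to your environment.
Shell
# Take a snapshot of the ETCD database (endpoint and paths are assumptions)
$ ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    snapshot save /backup/etcd-snapshot.db

# Verify the snapshot
$ ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-snapshot.db

# Restore the snapshot into a fresh data directory, to be used when
# rebuilding the ETCD member (see the restore steps later in this section)
$ ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
    --data-dir=/var/lib/etcd-restore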
Create a volume snapshot from a persistent volume:
YAML
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: new-snapshot-test
spec:
  volumeSnapshotClassName: csi-hostpath-snapclass
  source:
    persistentVolumeClaimName: pvc-test

Restore the Volume Snapshot
You can reference a VolumeSnapshot in a PersistentVolumeClaim to provision a new volume with data from an existing volume or to restore a volume to a state that you captured in the snapshot. To reference a VolumeSnapshot in a PersistentVolumeClaim, add the dataSource field to your PersistentVolumeClaim. In this example, you reference the VolumeSnapshot that you created in a new PersistentVolumeClaim and update the Deployment to use the new claim.
YAML
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-restore
spec:
  dataSource:
    name: my-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  storageClassName: standard-rwo
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

Restore Kubernetes Platform Operations
We can restore the Kubernetes platform in two ways: rebuilding or restoring. Below are a few strategies to restore platform operations:
Platform backup and restore: Run this operation using a backup tool that takes the backup from the source cluster (applications, ETCD, configurations, and images) and stores this information in a backup repository. Once the backup is done, run the restore operation from the target cluster using the same backup tool to restore the information from the replication repository.
Restore VMs from the snapshot: This strategy is only for ETCD recovery. The steps involved in restoring a Kubernetes cluster from an ETCD snapshot can vary depending on how the Kubernetes environment is set up, but the steps described below are intended to familiarize you with the basic process. It is also worth noting that this process replaces the existing ETCD database, so if an organization needs to preserve the database contents, it must create a backup copy of the database before moving forward.
Install the ETCD client
Identify appropriate IP addresses
Edit a manifest file to update paths
Locate the Spec section
Add the initial cluster token to the file
Update the mount path
Replace the name of the host path
Verify the newly restored database
Failover to another cluster: In case of the failure of one cluster, we use a failover cluster. These clusters are identical in terms of infrastructure and stateless applications; however, the configurations and secrets can be different. After setup, the two clusters can be kept in sync with CI/CD. This can be expensive in terms of setup and maintenance, as we have two clusters running in parallel.
Failover to another site in case of multisite: In this strategy, we need to build a cluster across multiple sites. This is applicable to the cloud as well as on-premises. It is always recommended to have more than two sites, and an odd number of sites, because of the ETCD quorum, so that the cluster keeps operating in case of failure at one site. This is a popular and efficient approach compared to the other options. Savings depend on how we manage the capacity.
Rebuild from scratch: This is the GitOps approach, and the concept here is: why not rebuild the system in case of failure instead of repairing it? If there is a failure in the cluster, we can build the entire cluster from the Git repository, and no backup is required for ETCD.
This works perfectly for stateless applications, but if you combine it with persistent data, then you need to look at options for backing up and restoring the storage.

Conclusion/Summary
It is very important to plan and design your own disaster recovery strategy depending on your requirements, complexity, and budget, and to plan well ahead of time. You need to know the tolerance level of the infrastructure, how much service loss you can bear, and so on, in order to design a cost-efficient disaster recovery strategy. One other key piece of understanding required is about the workloads: are we running stateful or stateless workloads? We also need to know which underlying technologies and dependencies are connected to the backup and restore. This matters especially when it comes to DevOps for mission-critical cloud-native applications requiring 100% uptime and availability. In the event of a disaster, the applications need to continue to be available and perform without a hitch. It is important to understand the components and strategies organizations need to consider for effective disaster recovery for Kubernetes.
In this blog, you will learn how to use Podman with the built-in equivalent for Docker Compose. You will learn how to use Podman kube play and how to deploy your Podman pod to a local Minikube cluster. Enjoy!

Introduction
The first reaction to the short intro will be: "You need to use Podman Compose for that!" However, this blog is not about Podman Compose, but about using the basic concept of Podman by using pods and deploying them to a Kubernetes cluster. Podman Compose is a different concept and deserves its own blog. So, what will you do and learn in this blog? You will create a pod locally, generate a Kubernetes YAML file for it, and use the YAML file to recreate the pod locally, but also for deploying it to a local Minikube Kubernetes cluster. You will notice that Podman has built-in functionality which resembles the functionality of Docker Compose. In other words, you can accomplish exactly the same thing, so you might not need something like Podman Compose at all! Sources used in this blog are available at GitHub, and the container image is available at DockerHub. The container image was built in a previous blog; you might want to check it out when you want to know more about Podman compared to Docker. The image contains a basic Spring Boot application with one REST endpoint which returns a hello message.

Prerequisites
The prerequisites needed for this blog are:
Basic Linux knowledge
Basic container knowledge
Basic Podman knowledge
Basic Kubernetes knowledge

Create Pod Locally
The first thing to do is to create a Podman pod locally. The pod contains two containers based on the same image. Create the pod with the following command. The port range 8080 up to and including 8081 is exposed externally. One container will expose the endpoint at port 8080 and the other container at port 8081.
Shell
$ podman pod create -p 8080-8081:8080-8081 --name hello-pod
Create both containers. With the environment variable added to container 1, you can configure the Spring Boot application to run on a different port; otherwise, the default port 8080 is used.
Shell
$ podman create --pod hello-pod --name mypodmanplanet-1 --env 'SERVER_PORT=8081' docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT env
$ podman create --pod hello-pod --name mypodmanplanet-2 docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT
Start the pod.
Shell
$ podman pod start hello-pod
Check the status of the pod. It is in the running state.
Shell
$ podman pod ps
POD ID        NAME       STATUS   CREATED        INFRA ID      # OF CONTAINERS
bef893686468  hello-pod  Running  3 minutes ago  1f0c0ebf2248  3
Verify whether you can access both endpoints. Both endpoints return the same hello message.
Shell
$ curl http://localhost:8080/hello
Hello Podman!
$ curl http://localhost:8081/hello
Hello Podman!

Generate Kubernetes YAML
Based on the local pod you created, you can generate a Kubernetes YAML file which will contain the configuration of your pod. The generate kube command is used for that, followed by the pod name hello-pod, followed by the file you want to write the configuration to.
Shell
$ podman generate kube hello-pod -f kubernetes/hello-pod-1-initial.yaml
Take a closer look at the generated Kubernetes YAML file. It contains the pod definition and the two containers that need to run in the pod.
YAML
# Save the output of this file and use kubectl create -f to import
# it into Kubernetes.
#
# Created with podman-3.4.4
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2023-05-13T08:21:40Z"
  labels:
    app: hello-pod
  name: hello-pod
spec:
  containers:
  - args:
    - env
    image: docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT
    name: mypodmanplanet-1
    ports:
    - containerPort: 8080
      hostPort: 8080
    - containerPort: 8081
      hostPort: 8081
    resources: {}
    securityContext:
      capabilities:
        drop:
        - CAP_MKNOD
        - CAP_NET_RAW
        - CAP_AUDIT_WRITE
  - image: docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT
    name: mypodmanplanet-2
    resources: {}
    securityContext:
      capabilities:
        drop:
        - CAP_MKNOD
        - CAP_NET_RAW
        - CAP_AUDIT_WRITE
  restartPolicy: Never
status: {}

Recreate the Pod
Now that the configuration is stored in a Kubernetes YAML file, you can verify whether the pod can be recreated based on this file. If this is the case, you can commit this file to a Git repository, and you and your colleagues can use it to set up a development environment, for example. First, stop and remove the running pod.
Shell
$ podman pod stop hello-pod
$ podman pod rm hello-pod
Verify whether the containers and pod are really removed.
Shell
$ podman pod ps
POD ID      NAME        STATUS      CREATED     INFRA ID    # OF CONTAINERS
$ podman ps --all
CONTAINER ID  IMAGE       COMMAND     CREATED     STATUS      PORTS       NAMES
Start the pod based on the generated Kubernetes YAML file with the play kube command.
Shell
$ podman play kube kubernetes/hello-pod-1-initial.yaml
Verify the status of the pod. You will notice that the status is degraded.
Shell
$ podman pod ps
POD ID        NAME       STATUS    CREATED         INFRA ID      # OF CONTAINERS
a7eac7991adc  hello-pod  Degraded  53 seconds ago  8471932f5741  3
Verify the status of the containers. The status of container hello-pod-mypodmanplanet-2 shows you that something went wrong with this container. It exited for some reason.
Shell
$ podman ps --all
CONTAINER ID  IMAGE                                                      COMMAND  CREATED             STATUS                         PORTS                             NAMES
8471932f5741  k8s.gcr.io/pause:3.5                                                About a minute ago  Up About a minute ago          0.0.0.0:8080-8081->8080-8081/tcp  a7eac7991adc-infra
0f25b7105d2b  docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT  env      About a minute ago  Up About a minute ago          0.0.0.0:8080-8081->8080-8081/tcp  hello-pod-mypodmanplanet-1
840f307cb67b  docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT           About a minute ago  Exited (1) About a minute ago  0.0.0.0:8080-8081->8080-8081/tcp  hello-pod-mypodmanplanet-2
Stop and remove the pod again.

Fix the Kubernetes YAML File
What went wrong here? When you take a closer look at the generated Kubernetes YAML file, you will notice that the generation of the file was a bit messed up for the container mypodmanplanet-1. The environment variable is not correctly set up, and it contains port mappings for both port 8080 and port 8081. The container mypodmanplanet-2 does not contain any port mapping at all.
YAML
...
spec:
  containers:
  - args:
    - env
    image: docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT
    name: mypodmanplanet-1
    ports:
    - containerPort: 8080
      hostPort: 8080
    - containerPort: 8081
      hostPort: 8081
...
Let's fix this in file hello-pod-2-with-env.yaml. Add the environment variable to the container mypodmanplanet-1 and remove the port mapping for port 8080. Add the port mapping for port 8080 to the container mypodmanplanet-2.
YAML
...
spec:
  containers:
  - env:
    - name: SERVER_PORT
      value: 8081
    image: docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT
    name: mypodmanplanet-1
    ports:
    - containerPort: 8081
      hostPort: 8081
    resources: {}
...
  - image: docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT
    name: mypodmanplanet-2
    ports:
    - containerPort: 8080
      hostPort: 8080
    resources: {}
...
Start the pod again based on this new configuration.
Shell
$ podman play kube kubernetes/hello-pod-2-with-env.yaml
Verify the status of the pod. It is now in the running state.
Shell
$ podman pod ps
POD ID        NAME       STATUS   CREATED         INFRA ID      # OF CONTAINERS
ea387d67b646  hello-pod  Running  41 seconds ago  c62a0f7f1975  3
Verify the status of the containers. All are running now.
Shell
$ podman ps --all
CONTAINER ID  IMAGE                                                      COMMAND  CREATED             STATUS                 PORTS                             NAMES
c62a0f7f1975  k8s.gcr.io/pause:3.5                                                About a minute ago  Up About a minute ago  0.0.0.0:8080-8081->8080-8081/tcp  ea387d67b646-infra
97c47b2420cf  docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT           About a minute ago  Up About a minute ago  0.0.0.0:8080-8081->8080-8081/tcp  hello-pod-mypodmanplanet-1
16875b941867  docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT           About a minute ago  Up About a minute ago  0.0.0.0:8080-8081->8080-8081/tcp  hello-pod-mypodmanplanet-2
The endpoints are accessible as well.
Shell
$ curl http://localhost:8080/hello
Hello Podman!
$ curl http://localhost:8081/hello
Hello Podman!

Minikube
Let's see whether you can use the generated Kubernetes YAML file in order to run the pod in a Minikube Kubernetes cluster. Minikube allows you to run a Kubernetes cluster locally and is mainly used during application development.

Generate Kubernetes YAML
You need to generate the Kubernetes YAML file just like you did before, but this time you need to add the -s option to the command. This will generate a Kubernetes service, which allows you to access the containers from outside the Kubernetes cluster. Execute the following command:
Shell
$ podman generate kube hello-pod -s -f kubernetes/hello-pod-3-minikube.yaml
Replace the pod part of this newly generated YAML file hello-pod-3-minikube.yaml with the one from hello-pod-2-with-env.yaml, because the issues with the environment variable and port mapping are again present in this newly generated file. The generated YAML file contains the following extra service description:
YAML
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2023-05-18T07:07:31Z"
  labels:
    app: hello-pod
  name: hello-pod
spec:
  ports:
  - name: "8081"
    nodePort: 32696
    port: 8081
    targetPort: 8081
  - name: "8080"
    nodePort: 31435
    port: 8080
    targetPort: 8080
  selector:
    app: hello-pod
  type: NodePort
---
...
In short, without going into too much detail, this service will map port 8080 to external port 31435 and port 8081 to external port 32696. External means external to the pod. Before continuing, stop and remove the locally running pod.

Install and Start Minikube
If you have not installed Minikube yet, it is now time to do so. The installation instructions can be found here. The following instructions are executed on an Ubuntu 22.04 OS.
Shell
$ curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
$ sudo install minikube-linux-amd64 /usr/local/bin/minikube
Start Minikube.
Shell
$ minikube start
minikube v1.30.1 on Ubuntu 22.04
Using the docker driver based on existing profile
Starting control plane node minikube in cluster minikube
Pulling base image ...
Updating the running docker "minikube" container ...
Preparing Kubernetes v1.26.3 on Docker 23.0.2 ...
Using image gcr.io/k8s-minikube/storage-provisioner:v5
Verifying Kubernetes components...
Enabled addons: storage-provisioner, default-storageclass
kubectl not found.
If you need it, try: 'minikube kubectl -- get pods -A'
Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
Verify whether the default Minikube Pods are running and whether kubectl is available. You will need kubectl to load the Kubernetes YAML file.
Shell
$ minikube kubectl -- get pods -A
NAMESPACE     NAME                               READY   STATUS    RESTARTS      AGE
kube-system   coredns-787d4945fb-qb56z           1/1     Running   1 (86s ago)   2m52s
kube-system   etcd-minikube                      1/1     Running   2 (85s ago)   3m7s
kube-system   kube-apiserver-minikube            1/1     Running   2 (75s ago)   3m7s
kube-system   kube-controller-manager-minikube   1/1     Running   2 (85s ago)   3m4s
kube-system   kube-proxy-cl5bh                   1/1     Running   2 (85s ago)   2m52s
kube-system   kube-scheduler-minikube            1/1     Running   1 (91s ago)   3m7s
kube-system   storage-provisioner                1/1     Running   0             63s

Create Pod In Minikube
Now that a Minikube cluster is running, you can create the pod based on the Kubernetes YAML file.
Shell
$ minikube kubectl -- create -f kubernetes/hello-pod-3-minikube.yaml
service/hello-pod created
Error from server (BadRequest): error when creating "kubernetes/hello-pod-3-minikube.yaml": Pod in version "v1" cannot be handled as a Pod: json: cannot unmarshal number into Go struct field EnvVar.spec.containers.env.value of type string
Unfortunately, this returns an error. The reason is that the environment variable port value must be enclosed in double quotes. Replace the following snippet:
YAML
spec:
  containers:
  - env:
    - name: SERVER_PORT
      value: 8081
With the following:
YAML
spec:
  containers:
  - env:
    - name: SERVER_PORT
      value: "8081"
The new Kubernetes YAML file is hello-pod-4-minikube.yaml. Execute the command again, but this time with the new Kubernetes YAML file.
Shell
$ minikube kubectl -- create -f kubernetes/hello-pod-4-minikube.yaml
pod/hello-pod created
The Service "hello-pod" is invalid: spec.ports[0].nodePort: Invalid value: 32696: provided port is already allocated
Now an error is returned indicating that the external port 32696 is already allocated. Verify whether any service is running.
Shell
$ minikube kubectl -- get svc
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                         AGE
hello-pod    NodePort    10.99.254.70   <none>        8081:32696/TCP,8080:31435/TCP   4m59s
kubernetes   ClusterIP   10.96.0.1      <none>        443/TCP                         6m56s
It appears that the Kubernetes service was created even though, initially, the creation of the pod failed. The pod has also been created.
Shell
$ minikube kubectl -- get pod
NAME        READY   STATUS    RESTARTS   AGE
hello-pod   2/2     Running   0          3m8s
Remove the pod and the service.
Shell
$ minikube kubectl delete pod hello-pod
pod "hello-pod" deleted
$ minikube kubectl delete svc hello-pod
service "hello-pod" deleted

Final Attempt
Create the pod and the service again based on the hello-pod-4-minikube.yaml file. This time it is successful.
Shell
$ minikube kubectl -- create -f kubernetes/hello-pod-4-minikube.yaml
service/hello-pod created
pod/hello-pod created
Verify the status of the service. The service is created.
Shell
$ minikube kubectl -- get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                         AGE
hello-pod    NodePort    10.105.243.28   <none>        8081:32696/TCP,8080:31435/TCP   71s
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP                         12m
Check the status of the pod. It is running.
Shell
$ minikube kubectl -- get pods
NAME        READY   STATUS    RESTARTS   AGE
hello-pod   2/2     Running   0          118s
But can you access the endpoints? Retrieve the IP address of the Minikube cluster.
Shell
$ minikube kubectl -- describe pods | grep Node:
Node:             minikube/192.168.49.2
Verify whether the endpoints can be accessed using the Minikube IP address and the external ports defined in the service. Beware that the external ports can be different when you generate the YAML files yourself.
Shell
$ curl http://192.168.49.2:32696/hello
Hello Podman!
$ curl http://192.168.49.2:31435/hello
Hello Podman!

Cleanup
In order to clean up, you first stop the local Kubernetes cluster.
Shell
$ minikube stop
Stopping node "minikube" ...
Powering off "minikube" via SSH ...
1 node stopped.
Finally, you delete the cluster.
Shell
$ minikube delete
Deleting "minikube" in docker ...
Deleting container "minikube" ...
Removing /home/<user>/.minikube/machines/minikube ...
Removed all traces of the "minikube" cluster.

Conclusion
In this blog, you created a local pod, generated a Kubernetes YAML file for it with Podman, and used this YAML file to recreate the pod locally and to create the pod in a Minikube Kubernetes cluster. It did not work out of the box, but with some minor tweaks, it worked just fine. The Kubernetes YAML file can be stored in a Git repository and shared with your colleagues, just like you would do with a Docker Compose file. This way, a development environment can be set up and shared quite easily.
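To illustrate that sharing workflow, here is a minimal sketch of how a colleague might bring up the same environment from the shared YAML file. The repository URL is hypothetical; the remaining commands are the same ones used earlier in this blog.
Shell
# Clone the shared repository (hypothetical URL) containing the Kubernetes YAML file
$ git clone https://github.com/your-org/mypodmanplanet-config.git
$ cd mypodmanplanet-config

# Recreate the pod locally from the shared YAML file
$ podman play kube kubernetes/hello-pod-2-with-env.yaml

# Verify the endpoints
$ curl http://localhost:8080/hello
$ curl http://localhost:8081/hello

# Tear the environment down again when done
$ podman pod stop hello-pod
$ podman pod rm hello-pod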
Yitaek Hwang
Software Engineer,
NYDIG
Abhishek Gupta
Principal Developer Advocate,
AWS
Alan Hohn
Director, Software Strategy,
Lockheed Martin
Marija Naumovska
Product Manager,
Microtica