This GitHub Actions workflow builds a Docker image, tags it, and pushes it to one of three container registries. Here's a Gist with the boilerplate code.

Building Docker Images and Pushing to a Container Registry

If you haven't yet integrated GitHub Actions with your private container registry, this tutorial is a good place to start. The resulting workflow will log in to your private registry using the provided credentials, build existing Docker images by path, and push the resulting images to a container registry. We'll discuss how to do this for GHCR, Docker Hub, and Harbor.

Benefits and Use Cases

Building and pushing Docker images using your CI/CD platform is a best practice. Here's how it can improve your developer QoL:

- Shared builds: Streamline the process, configuration, and dependencies across all builds for easy reproducibility.
- Saves build minutes: Team members can access existing images instead of rebuilding from the source.
- Version control: Easily duplicate previous builds with image tags, allowing teams to trace and pinpoint bugs.

Building a Docker Image

Using GitHub Actions to automate Docker builds will ensure you keep your build config consistent. This only requires substituting your existing build command(s) into the workflow YAML. In this workflow, the image is named after your GitHub repo using the GITHUB_REPOSITORY environment variable as ${{ github.repository }}.

```yaml
name: Build Docker image
on:
  push:
    branches:
      - main
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Build and tag image
        run: |
          COMMIT_SHA=$(echo $GITHUB_SHA | cut -c1-7)
          docker build -t ${{ github.repository }}:$COMMIT_SHA -f path/to/Dockerfile .
```

Versioning Your Docker Image Tags

Never rely on latest tags to version your images. We recommend choosing one of these two versioning conventions when tagging your images: using the GitHub commit hash or following the SemVer spec.

Using the GitHub Hash

GitHub Actions sets default environment variables that you can access within your workflow. Among these is GITHUB_SHA, which is the commit hash that triggered the workflow. This is a valuable versioning approach because you can trace each image back to its corresponding commit. In general, this convention uses the hash's first seven characters. Here's how we can access the variable and extract these characters:

```yaml
- name: Build and tag image
  run: |
    COMMIT_SHA=$(echo $GITHUB_SHA | cut -c1-7)
    docker build -t ${{ github.repository }}:$COMMIT_SHA -f path/to/Dockerfile .
```

Semantic Versioning

When using version numbers, it is best practice to follow the SemVer spec. This way, you can increment your version numbers following a consistent structure when releasing new updates and patches. Assuming you store your app's version in a root file version.txt, you can extract the version number from this file and tag the image in two separate actions:

```yaml
- name: Get version
  run: |
    VERSION=$(cat version.txt)
    echo "Version: $VERSION"
    # Persist the value for later steps; a plain `export` does not carry across steps
    echo "VERSION=$VERSION" >> $GITHUB_ENV
- name: Build and tag image
  run: docker build -t ${{ github.repository }}:$VERSION -f path/to/Dockerfile .
```

Pushing a Docker Image to a Container Registry

You can easily build, tag, and push your Docker image to your private container registry of choice within only two or three actions. Here's a high-level overview of what you'll be doing:

1. Manually set your authentication token or access credential(s) as repository secrets.
2. Use the echo command to pipe credentials to standard input for registry authentication. This way, no action is required on the user's part.
3. Populate the workflow with your custom build command. Remember to follow your registry's tagging convention.
4. Add the push command. You can find the proper syntax in your registry's docs.

You may prefer to split each item into its own action for better traceability on a workflow failure.

Pushing to GHCR

Step 1: Setting up GHCR Credentials

In order to access the GitHub API, you'll want to generate a personal access token. You can do this by going to Settings → Developer settings → New personal access token (classic), from where you'll generate a custom token to allow package access. Make sure to select write:packages in the Select scopes section. Store this token as a repository secret called GHCR_TOKEN.

Step 2: Action Recipe To Push to GHCR

You can add the following actions to your GitHub Actions workflow. This code will log in to GHCR, build, and push your Docker image. Note that ${{ github.repository }} already expands to owner/repo, which matches GHCR's ghcr.io/NAMESPACE/NAME convention.

```yaml
- name: Log in to ghcr.io
  run: echo "${{ secrets.GHCR_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
- name: Build and tag image
  run: |
    COMMIT_SHA=$(echo $GITHUB_SHA | cut -c1-7)
    docker build -t ghcr.io/${{ github.repository }}:$COMMIT_SHA -f path/to/Dockerfile .
- name: Push image to GHCR
  run: docker push ghcr.io/${{ github.repository }}:$COMMIT_SHA
```

Pushing to Docker Hub

Step 1: Store Your Docker Hub Credentials

Using your Docker Hub login credentials, set the following repository secrets:

- DOCKERHUB_USERNAME
- DOCKERHUB_PASSWORD

Note: You'll need to set up a repo on Docker Hub before you can push your image.

Step 2: Action Recipe To Push to Docker Hub

Adding these actions to your workflow will automate logging in to Docker Hub, building and tagging an image, and pushing it. Docker Hub expects a two-level name (username/repository), so the image uses the repository name rather than the full owner/repo path.

```yaml
- name: Log in to Docker Hub
  run: echo ${{ secrets.DOCKERHUB_PASSWORD }} | docker login -u ${{ secrets.DOCKERHUB_USERNAME }} --password-stdin
- name: Build and tag image
  run: |
    COMMIT_SHA=$(echo $GITHUB_SHA | cut -c1-7)
    docker build -t ${{ secrets.DOCKERHUB_USERNAME }}/${{ github.event.repository.name }}:$COMMIT_SHA -f path/to/Dockerfile .
- name: Push image to Docker Hub
  run: docker push ${{ secrets.DOCKERHUB_USERNAME }}/${{ github.event.repository.name }}:$COMMIT_SHA
```

Pushing to Harbor

Step 1: Store Your Harbor Access Credentials

Create two new repository secrets to store the following info:

- HARBOR_CREDENTIALS: Your Harbor username and password formatted as username:password
- HARBOR_REGISTRY_URL: The URL corresponding to your personal Harbor registry

Note: You'll need to create a Harbor project before you can push an image to Harbor.

Step 2: Action Recipe To Push to Harbor

The actions below will authenticate into Harbor, build and tag an image using Harbor-specific conventions, and push the image. The login step splits the username:password secret on the colon and pipes only the password to standard input.

```yaml
- name: Log in to Harbor
  run: |
    echo "${{ secrets.HARBOR_CREDENTIALS }}" | cut -d ':' -f2 | docker login -u $(cut -d ':' -f1 <<< "${{ secrets.HARBOR_CREDENTIALS }}") --password-stdin ${{ secrets.HARBOR_REGISTRY_URL }}
- name: Build and tag image
  run: |
    COMMIT_SHA=$(echo $GITHUB_SHA | cut -c1-7)
    docker build -t ${{ secrets.HARBOR_REGISTRY_URL }}/project-name/${{ github.repository }}:$COMMIT_SHA -f path/to/Dockerfile .
- name: Push image to Harbor
  run: docker push ${{ secrets.HARBOR_REGISTRY_URL }}/project-name/${{ github.repository }}:$COMMIT_SHA
```

Thanks for Reading!

I hope you enjoyed today's featured recipes. I'm looking forward to sharing more easy ways you can automate repetitive tasks and chores with GitHub Actions.
Continuous Integration (CI) has become an integral part of modern software development practices. CI servers automate the process of building, testing, and integrating code changes, enabling development teams to deliver high-quality software with efficiency and confidence. In this article, we will explore several popular Continuous Integration servers, their features, and how they facilitate seamless integration and collaboration in software development workflows.

What Is Continuous Integration?

Continuous Integration (CI) is a software development practice that involves regularly integrating code changes from multiple developers into a shared repository. The main goal of CI is to catch integration issues and bugs early in the development cycle, ensuring that the codebase remains stable and functional. It emphasizes frequent, automated builds, tests, and code integration.

In a CI workflow, developers merge their changes into a central code repository frequently, often daily. Every merge starts an automatic build process that compiles the code, runs tests, and looks for any build or test failures. This process helps identify integration problems, conflicts, and errors as early as possible.

Central to CI is a specialized CI server or platform that manages the build process. This server continuously checks the repository for code modifications and automatically starts the build and test procedures when changes are detected. It gives developers feedback on their changes, such as whether the build was successful or whether any tests failed. This quick feedback loop enables programmers to address problems right away, cutting down on the time and work needed for bug fixing.

Principles and Benefits of CI

Key principles and benefits of Continuous Integration include:

- Automated Builds: CI emphasizes the automation of build processes to ensure consistent and reproducible builds. This reduces the risk of errors introduced by manual builds and helps catch issues early.
- Automated Testing: CI promotes the use of automated testing frameworks to run tests on the integrated code. This includes unit tests, integration tests, and other forms of automated verification. Automated testing ensures that the codebase remains functional and meets the expected requirements (a minimal workflow illustrating these first two principles is sketched after this list).
- Early Bug Detection: By integrating code frequently and running automated tests, CI helps identify integration issues, conflicts, and bugs at an early stage. This prevents the accumulation of issues and reduces the time and effort required for bug fixing.
- Continuous Feedback: CI provides developers with rapid feedback on the status of their changes, including the outcome of builds and tests. This enables them to quickly address failures or issues, fostering a collaborative and responsive development environment.
- Collaboration and Integration: CI encourages a collaborative approach to software development, where developers regularly integrate their changes into a shared codebase. This promotes better communication, reduces conflicts, and facilitates smoother teamwork.
- Continuous Delivery and Deployment: CI is often a precursor to Continuous Delivery and Deployment practices. By ensuring a stable and tested codebase, CI sets the foundation for automated release processes, allowing for frequent and reliable software deployments.
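To make the automated-build and automated-testing principles concrete, here is a minimal sketch of a CI workflow expressed as a GitHub Actions YAML file. It is only an illustration under assumptions: the Node.js toolchain, the npm scripts, and the runner image are hypothetical choices, not something prescribed by this article.

```yaml
# Hypothetical minimal CI workflow: build and test on every integration (illustrative sketch)
name: ci

on:
  push:           # run for every push to the shared repository
  pull_request:   # and for proposed changes, to catch issues before they are merged

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3        # fetch the integrated code
      - uses: actions/setup-node@v3      # toolchain for this (assumed) Node.js project
        with:
          node-version: 18
      - run: npm ci                      # reproducible, automated dependency install
      - run: npm test                    # automated test suite provides fast feedback
```

Any failing step marks the run red, which is exactly the continuous-feedback loop described above.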
Jenkins Jenkins is a widely adopted open-source Continuous Integration server that offers a flexible and extensible platform for automating the software development lifecycle. With its extensive plugin ecosystem, Jenkins supports a wide range of programming languages, build systems and version control systems. Its key features include continuous integration, continuous delivery, and distributed build capabilities. Jenkins allows developers to define and automate complex build pipelines, run tests, generate reports, and trigger deployments. Its web-based interface and user-friendly configuration make it accessible to both beginners and experienced developers. Jenkins enjoys a large and active community, providing continuous support and regular updates. Travis CI Travis CI is a popular cloud-based CI server primarily used for testing and deploying code hosted on GitHub repositories. It offers seamless integration with Git, making it effortless to trigger builds whenever changes are pushed to a repository. Travis CI supports various programming languages and provides a simple YAML-based configuration file to define build processes. It offers a range of build environments and allows parallel builds, enabling faster feedback loops. Travis CI also integrates with popular cloud platforms and deployment services, facilitating streamlined deployment pipelines. Its user-friendly interface and built-in pull request testing make it a preferred choice for open-source projects. CircleCI CircleCI is a cloud-based CI/CD platform that simplifies the process of automating builds, tests, and deployments. It supports a wide range of programming languages, build systems and cloud platforms. CircleCI offers a highly customizable environment with extensive configuration options. It allows developers to define complex build pipelines, run tests in parallel, and deploy applications to various environments. CircleCI provides seamless integration with popular version control systems, including GitHub and Bitbucket. Its cloud-based infrastructure eliminates the need for maintaining dedicated build servers, enabling faster scaling and reducing infrastructure management overhead. GitLab CI/CD GitLab CI/CD is an integrated CI/CD platform provided by GitLab, a popular web-based Git repository management solution. It offers built-in CI/CD capabilities within the same platform, simplifying the setup and configuration process. GitLab CI/CD leverages YAML-based configuration files, known as .gitlab-ci.yml, to define CI/CD pipelines. It supports parallel and distributed builds, enabling efficient utilization of resources. GitLab CI/CD provides a comprehensive set of features, including built-in code quality analysis, container-based deployments, and multi-project pipeline visualization. Its seamless integration with GitLab's version control system makes it an attractive choice for organizations using GitLab as their primary code repository. TeamCity TeamCity is a powerful CI server developed by JetBrains. It offers extensive features for automating the build, test, and deployment processes. TeamCity supports a wide range of programming languages, build systems and version control systems. It provides a user-friendly web interface and supports complex build pipelines with customizable workflows. TeamCity offers distributed builds, allowing parallel and concurrent testing on multiple agents. It provides comprehensive test reporting, code coverage analysis, and integration with popular development tools. 
TeamCity also offers integrations with cloud platforms, issue trackers, and other external services. Its commercial license allows for scaling across large enterprise environments.

Bamboo

Bamboo, developed by Atlassian, is a commercial CI server that offers a robust set of features for automating the build, test, and deployment processes. Bamboo integrates seamlessly with other Atlassian products, such as Jira and Bitbucket, creating a unified development ecosystem. It provides a user-friendly interface and supports the creation of complex build pipelines through its intuitive configuration. Bamboo offers parallel and distributed builds, allowing for efficient resource utilization. It also provides comprehensive test reporting, code coverage analysis, and integration with various testing frameworks. Bamboo offers deployment capabilities to multiple environments, enabling streamlined release management. Its seamless integration with Atlassian's suite of tools makes it a preferred choice for organizations already utilizing other Atlassian products.

Conclusion

Continuous Integration servers greatly aid the build, test, and deployment workflows of contemporary software development. Jenkins, Travis CI, CircleCI, GitLab CI/CD, TeamCity, and Bamboo are just a few of the tools mentioned in this article that offer a variety of features to speed up development, increase code quality, and foster collaboration. Each CI server has its strengths and target audience, and the choice depends on project requirements, preferred programming languages, integration with other tools, and scalability needs. Evaluating the specific needs of your development team and project is essential to select the most suitable CI server. By integrating a robust CI server into the development process, teams can accelerate software delivery, reduce manual errors, and foster a culture of continuous improvement in software development practices.

Jenkins is unique in that it has a large ecosystem of plugins and is highly customizable, making it ideal for teams with complex build pipelines and specialized requirements. GitLab CI/CD's close integration with GitLab provides a complete solution for end-to-end DevOps workflows. JetBrains' TeamCity is a popular option for teams looking for an easy-to-use CI server because it combines powerful features with user-friendliness. For open-source projects and lone developers, Travis CI is practical due to its simplicity and seamless integration with GitHub repositories. CircleCI is a versatile option for teams of all sizes thanks to its scalable cloud-based infrastructure.

When selecting a Continuous Integration server, it is crucial to consider aspects like compatibility with existing tools and version control systems, support for programming languages, simplicity of installation and configuration, scalability, and the user community. The best CI server will be found by weighing these considerations against the needs of the team and the project's requirements. Teams that integrate a Continuous Integration server into the software development process gain quicker feedback on code quality, fewer integration problems, improved teamwork, and faster delivery of software updates. These servers automate necessary tasks so that teams can concentrate on writing high-quality code and providing value to end users.
In closing, Continuous Integration servers are essential to contemporary software development. The wide range of options includes Jenkins, GitLab CI/CD, TeamCity, Travis CI, and CircleCI, to name just a few. Each server has distinct features and advantages that help development teams deliver high-quality software quickly while streamlining their processes and fostering collaboration. To ensure the best fit and success in the software development process, carefully consider the unique requirements and objectives of the project when selecting a Continuous Integration server.
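Several of the servers above (Travis CI, CircleCI, GitLab CI/CD) are driven by a YAML file kept in the repository. As a point of reference only, here is a minimal, hedged sketch of a .gitlab-ci.yml pipeline; the stage names, container image, and Maven commands are illustrative assumptions rather than anything prescribed in this article.

```yaml
# Hypothetical .gitlab-ci.yml: a two-stage pipeline (build, then test) — illustrative only
stages:
  - build
  - test

build-job:
  stage: build
  image: maven:3.9-eclipse-temurin-17   # assumed build toolchain image
  script:
    - mvn -B package -DskipTests        # compile and package the project

test-job:
  stage: test
  image: maven:3.9-eclipse-temurin-17
  script:
    - mvn -B test                       # automated tests run on every push
```

The same build/test structure maps onto the other YAML-driven servers, which is why evaluating them mostly comes down to hosting model, integrations, and scalability rather than pipeline syntax.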
Shift-left is an approach to software development and operations that emphasizes testing, monitoring, and automation earlier in the software development lifecycle. The goal of the shift-left approach is to prevent problems before they arise by catching them early and addressing them quickly. When you identify a scalability issue or a bug early, it is quicker and more cost-effective to resolve it. Moving inefficient code to cloud containers can be costly, as it may activate auto-scaling and increase your monthly bill. Furthermore, you will be in a state of emergency until you can identify, isolate, and fix the issue.

The Problem Statement

I would like to demonstrate a case where we managed to avert a potential issue with an application that could have caused a major problem in a production environment. I was reviewing the performance report of the UAT infrastructure following the recent application change. It was a Spring Boot microservice with MariaDB as the backend, running behind an Apache reverse proxy and an AWS application load balancer. The new feature was successfully integrated, and all UAT test cases passed. However, I noticed the performance charts in the MariaDB performance dashboard deviated from pre-deployment patterns. This is the timeline of the events.

On August 6th at 14:13, the application was restarted with a new Spring Boot jar file containing an embedded Tomcat. (Figure: Application restarts after migration)

At 14:52, the query processing rate for MariaDB increased from 0.1 to 88 queries per second and then to 301 queries per second. (Figure: Increase in query rate)

Additionally, system CPU utilization rose from 1% to 6%. (Figure: Rise in CPU utilization)

Finally, the JVM time spent on G1 Young Generation garbage collection increased from 0% to 0.1% and remained at that level. (Figure: Increase in GC time on the JVM)

The application, in its UAT phase, was abnormally issuing 300 queries per second, far beyond what it was designed to do. The new feature caused an increase in database connections, which is why the jump in queries was so drastic. However, the monitoring dashboard showed that these problematic measures were normal before the new version was deployed.

The Resolution

It is a Spring Boot application that uses JPA to query a MariaDB database. The application is designed to run on two containers for minimal load but is expected to scale up to ten. (Figure: Web - app - db topology)

If a single container can generate 300 queries per second, can the database handle 3,000 queries per second once all ten containers are operational? Will it have enough connections left to meet the needs of the other parts of the application? We had no other choice but to go back to the developer's table and inspect the changes in Git. The new change takes a few records from a table and processes them. This is what we observed in the service class:

```java
List<X> findAll = this.xRepository.findAll();
```

Using the findAll() method without pagination in Spring's CrudRepository is not efficient. Pagination reduces the time it takes to retrieve data from the database by limiting the amount of data fetched. This is what our primary RDBMS education taught us. Additionally, pagination helps keep memory usage low, preventing the application from crashing due to an overload of data, and reduces the garbage collection effort of the Java Virtual Machine, which was mentioned in the problem statement above. This test was conducted using only 2,000 records in one container.
If this code were to move to production, where there are around 200,000 records and up to 10 containers, it could have caused the team a lot of stress and worry that day. The application was rebuilt with the addition of a WHERE clause to the method:

```java
List<X> findAll = this.xRepository.findAllByY(y);
```

Normal functioning was restored. The number of queries per second decreased from 300 to 30, the garbage collection effort returned to its original level, and the system's CPU usage decreased as well. (Figure: Query rate becomes normal)

Learning and Summary

Anyone who works in Site Reliability Engineering (SRE) will appreciate the significance of this discovery. We were able to act upon it without having to raise a Severity 1 flag. If this flawed package had been deployed in production, it could have triggered the customer's auto-scaling threshold, resulting in new containers being launched even without additional user load. There are three main takeaways from this story.

Firstly, it is best practice to turn on an observability solution from the beginning, as it provides a history of events that can be used to identify potential issues. Without this history, I might not have taken a 0.1% garbage collection percentage and 6% CPU consumption seriously, and the code could have been released into production with disastrous consequences. Expanding the scope of the monitoring solution to UAT servers helped the team identify potential root causes and prevent problems before they occur.

Secondly, performance-related test cases should exist in the testing process, and these should be reviewed by someone with experience in observability. This ensures that both the functionality of the code and its performance are tested.

Thirdly, cloud-native performance tracking techniques are good for receiving alerts about high utilization, availability, and the like. To achieve observability, you may need the right tools and expertise in place.

Happy Coding!
DevOps is an advanced framework that connects development and operations, and it is a vital change in the software development industry. DevOps's primary objective is to improve automation, collaboration, and continuous delivery. It's changing the way businesses design, deploy, and manage software solutions. This post will guide you about the DevOps essentials, core principles, and how they allow collaboration, automation, and continuous delivery. Businesses can achieve new levels of productivity, efficiency, and creativity in their software development processes by using DevOps. What Is DevOps? DevOps is a modern technique implemented in software development whose fundamental function is to link development and operations teams together to enhance communication, automation, and collaboration across the whole software development lifecycle. The purpose of DevOps is to narrow the gap between software development, quality assurance, and IT operations inside a software developer company by creating a culture of shared responsibility and continuous enhancement. How DevOps Enhances Collaboration, Automation, and Continuous Delivery DevOps puts teams, procedures, and tools together to improve collaboration, automation, and continuous delivery. Let's explore all of them separately to have a better idea. Collaboration DevOps breaks down common barriers between development and operations teams, promoting a culture of collaboration and shared responsibility. It improves team members' closer alignment and communication, resulting in increased productivity, faster feedback loops, and better decision-making. Collaboration in DevOps is increased through the following practices: Cross-Functional Teams DevOps promotes the production of cross-functional teams, where everyone from different fields collaborates closely. This allows knowledge sharing, reduces transfers, and boosts a comprehensive approach and understanding of the entire software development lifecycle. Continuous Feedback DevOps develops a feedback cycle that encourages regular communication and learning between teams. Automated testing, monitoring systems, or direct engagement among team members can all provide daily feedback. This enables teams to identify and fix issues quickly, creating a culture of continuous improvement. Shared Tools and Processes DevOps enhances the use of shared tools and uniform processes. Collaborative tools like version control systems, issue trackers, and communication platforms allow smooth information sharing and collaboration. Standardized procedures ensure consistency, reduce errors, and enhance overall efficiency. Automation Automation is an essential feature of DevOps that allows companies to simplify processes, reduce manual errors, and enhance software delivery. Teams can concentrate on higher-value activities by automating repetitive and time-consuming tasks. DevOps accomplishes automation through the following key areas: Infrastructure as Code (IaC) DevOps uses Infrastructure as Code technologies such as Terraform and Ansible to automatically create and manage infrastructure resources. This strategy promises uniform infrastructure provisioning, configuration, and scaling, minimizing the chance of mistakes and increasing scalability. Continuous Integration and Continuous Deployment (CI/CD) DevOps supports CI/CD pipelines to automate the build, integration, testing, and deployment of software changes. 
Automated procedures are initiated with every code change, allowing quick feedback, early problem discovery and guaranteeing that only high-quality code enters production environments. Testing and Quality Assurance DevOps integrates automated testing frameworks into the CI/CD pipeline. These frameworks include unit tests, integration tests, and regression tests that validate software changes continuously. Automated testing reduces manual effort, accelerates testing cycles, and increases whole product quality. Continuous Delivery Continuous Delivery is a core principle of DevOps that aims to deliver software changes to production environments frequently, reliably, and with minimum manual involvement. It allows companies to respond quickly to market demands and customer feedback. Fundamental elements of continuous delivery in DevOps include: Version Control DevOps highlights the use of version control systems like Git to manage code databases. Version control ensures that changes are tracked, allows collaboration, and provides a safety net for going backward to previous states if necessary. Build Automation DevOps automates the build process to ensure consistent and repeatable results. Automated builds package code objects efficiently, minimizing human error and maintaining a reliable basis for deployment. Deployment Automation DevOps automates the deployment process, allowing faster and more reliable releases. Techniques such as blue-green deployments and canary releases improve controlled implementations, reduce downtime, and eliminate risks associated with software deployments. Advantages of DevOps We will look into the numerous benefits that DevOps offers to organizations, ranging from increased efficiency and agility to improved quality and customer satisfaction. Accelerated Software Delivery One of the primary advantages of adopting DevOps is the ability to accelerate software delivery cycles. Organizations can automate workflows, establish continuous integration and delivery strategies, and streamline processes by reducing barriers between development and operations teams. This results in faster deployment of features and bug fixes, enabling businesses to deliver value to their customers quickly. Enhanced Collaboration and Communication DevOps promotes a culture of collaboration and effective communication among cross-functional teams. DevOps reduces information gaps and promotes common objectives and responsibilities by bringing together developers, operations engineers, testers, and other stakeholders. Collaboration tools, shared metrics, and regular meetings ensure that teams work cohesively towards common objectives, leading to higher productivity and better outcomes. Improved Quality and Stability DevOps emphasizes automated testing, continuous monitoring, and rapid feedback loops, which enhance software quality and stability. By automating testing processes, organizations can detect issues early in the development cycle, allowing for quicker resolution and preventing the accumulation of technical debt. Continuous monitoring enables proactive identification and remediation of performance bottlenecks and other issues, leading to more stable and reliable software deployments. Increased Efficiency and Cost Savings By automating repetitive tasks, leveraging infrastructure-as-code, and adopting cloud technologies, DevOps enables organizations to achieve higher levels of efficiency and cost savings. 
Automated provisioning and configuration management reduce manual errors, shorten deployment times, and minimize downtime. Cloud infrastructure allows for flexible and scalable resource allocation, optimizing costs and eliminating the need for extensive on-premises hardware investments. Agility and Adaptability In today's rapidly evolving business landscape, agility is paramount. DevOps enables organizations to respond swiftly to changing market conditions and customer needs. Continuous integration and delivery practices allow for rapid iterations and feedback incorporation, enabling businesses to iterate and adapt their software quickly. This agility gives organizations a competitive edge, allowing them to seize new opportunities and deliver value faster. Improved Customer Satisfaction DevOps places customer satisfaction at the core of the software development process. By continuously delivering features, bug fixes, and enhancements, organizations can respond to customer feedback and address their needs promptly. With faster response times, higher software quality, and greater stability, organizations can create a positive customer experience, leading to increased customer satisfaction and loyalty. Cultural Transformation DevOps is not just a set of tools and practices; it is a cultural shift that promotes collaboration, learning, and innovation. Organizations may encourage employee empowerment and engagement by developing a culture of trust, shared responsibility, and constant improvement. This cultural transformation promotes knowledge sharing, experimentation, and a growth mindset, driving innovation and paving the way for future success. Conclusion DevOps has shown to be an inspiration for software development collaboration, automation, and continuous delivery. It encourages teams to work together smoothly, boosting efficiency and enhancing communication by breaking down barriers, promoting a culture of collaboration, and using shared tools and procedures. Automation is critical for optimizing processes, reducing manual errors, and allowing quick software delivery via continuous integration, deployment, and testing. Finally, DevOps helps businesses to accomplish continuous delivery, reducing time to market and assuring high-quality software releases. Implementing DevOps principles and techniques allows companies to stay ahead in the fast-paced, competitive technology world by providing value to consumers through speed, reliability, and innovation.
Machine learning (ML) has seen explosive growth in recent years, leading to increased demand for robust, scalable, and efficient deployment methods. Traditional approaches often struggle to operationalize ML models due to factors like discrepancies between training and serving environments or the difficulties in scaling up. This article proposes a technique using Docker, an open-source platform designed to automate application deployment, scaling, and management, as a solution to these challenges. The proposed methodology encapsulates the ML model and its environment into a standardized Docker container unit. Docker containers offer numerous benefits, including consistency across development and production environments, ease of scaling, and simplicity in deployment. The following sections present an in-depth exploration of Docker, its role in ML model deployment, and a practical demonstration of deploying an ML model using Docker, from the creation of a Dockerfile to the scaling of the model with Docker Swarm, all exemplified by relevant code snippets. Furthermore, the integration of Docker in a Continuous Integration/Continuous Deployment (CI/CD) pipeline is presented, culminating with the conclusion and best practices for efficient ML model deployment using Docker.

What Is Docker?

As a platform, Docker automates software application deployment, scaling, and operation within lightweight, portable containers. The fundamental underpinnings of Docker revolve around the concept of 'containerization.' This virtualization approach allows software and its entire runtime environment to be packaged into a standardized unit for software development. A Docker container encapsulates everything an application needs to run (including libraries, system tools, code, and runtime) and ensures that it behaves uniformly across different computing environments. This facilitates the process of building, testing, and deploying applications quickly and reliably, making Docker a crucial tool for software development and operations (DevOps).

When it comes to machine learning applications, Docker brings forth several advantages. Docker's containerized nature ensures consistency between ML models' training and serving environments, mitigating the risk of encountering discrepancies due to environmental differences. Docker also simplifies the scaling process, allowing multiple instances of an ML model to be easily deployed across numerous servers. These features have the potential to significantly streamline the deployment of ML models and reduce associated operational complexities.

Why Dockerize Machine Learning Applications?

In the context of machine learning applications, Docker offers numerous benefits, each contributing significantly to operational efficiency and model performance. Firstly, the consistent environment provided by Docker containers ensures minimal discrepancies between the development, testing, and production stages. This consistency eliminates the infamous "it works on my machine" problem, making it a prime choice for deploying ML models, which are particularly sensitive to changes in their operating environment.

Secondly, Docker excels in facilitating scalability. Machine learning applications often necessitate running multiple instances of the same model for handling large volumes of data or high request rates. Docker enables horizontal scaling by allowing multiple container instances to be deployed quickly and efficiently, making it an effective solution for scaling ML models.
Finally, Docker containers run in isolation, meaning they have their own runtime environment, including system libraries and configuration files. This isolation provides an additional layer of security, ensuring that each ML model runs in a controlled and secure environment. The consistency, scalability, and isolation provided by Docker make it an attractive platform for deploying machine learning applications.

Setting up Docker for Machine Learning

This section focuses on the initial setup required for utilizing Docker with machine learning applications. The installation process of Docker varies slightly depending on the operating system in use. For Linux distributions, Docker is typically installed via the command-line interface, whereas for Windows and macOS, a version of Docker Desktop is available. In each case, the Docker website provides detailed installation instructions that are straightforward to follow.

The installation is followed by pulling a Docker image from Docker Hub, a cloud-based registry service allowing developers to share applications or libraries. As an illustration, one can pull the latest Python image for use in machine learning applications using the command:

```shell
docker pull python:3.8-slim-buster
```

Subsequently, running the Docker container from the pulled image involves the docker run command. For example, if an interactive Python shell is desired, the following command can be used:

```shell
docker run -it python:3.8-slim-buster /bin/bash
```

This command initiates a Docker container with an interactive terminal (-it) and provides a shell (/bin/bash) inside the Python container. By following this process, Docker is effectively set up to assist in deploying machine learning models.

Creating a Dockerfile for a Simple ML Model

At the heart of Docker's operational simplicity is the Dockerfile, a text document that contains all the commands required to assemble a Docker image. Users can automate the image creation process by executing the Dockerfile through the Docker command line. A Dockerfile comprises a set of instructions and arguments laid out in successive lines. Instructions are Docker commands like FROM (specifies the base image), RUN (executes a command), COPY (copies files from the host to the Docker image), and CMD (provides defaults for executing the container).

Consider a simple machine learning model built using Scikit-learn's Linear Regression algorithm as a practical illustration. The Dockerfile for such an application could look like this:

```dockerfile
# Use an official Python runtime as a parent image
FROM python:3.8-slim-buster

# Set the working directory in the container to /app
WORKDIR /app

# Copy the current directory contents into the container at /app
ADD . /app

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Make port 80 available to the world outside this container
EXPOSE 80

# Run app.py when the container launches
CMD ["python", "app.py"]
```

The requirements.txt file mentioned in this Dockerfile lists all the Python dependencies of the machine learning model, such as Scikit-learn, Pandas, and Flask. On the other hand, the app.py script contains the code that loads the trained model and serves it as a web application. By defining the configuration and dependencies in this Dockerfile, an image can be created that houses the machine learning model and the runtime environment required for its execution, facilitating consistent deployment.
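As an optional aside, a small Docker Compose file can make local runs of this image reproducible before any CI/CD or Swarm setup is involved. This is only a sketch under assumptions: the image name ml_model_image:1.0 and the 4000:80 port mapping are borrowed from the later sections of this article, while the service name and file layout are hypothetical.

```yaml
# docker-compose.yml — hypothetical local-run sketch for the containerized ML model
services:
  ml-model:
    # Build from the Dockerfile in this directory...
    build: .
    # ...and tag the result with the image name used later in the article (assumed)
    image: ml_model_image:1.0
    ports:
      - "4000:80"        # host port 4000 -> container port 80 (the Flask app's port)
    restart: unless-stopped
```

Running `docker compose up --build` then exposes the same /predict endpoint used in the testing section below.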
Building and Testing the Docker Image

Upon successful Dockerfile creation, the subsequent phase involves constructing the Docker image. The Docker image is constructed by executing the docker build command, followed by the directory that contains the Dockerfile. The -t flag tags the image with a specified name. An instance of such a command would be:

```shell
docker build -t ml_model_image:1.0 .
```

Here, ml_model_image:1.0 is the name (and version) assigned to the image, while '.' indicates that the Dockerfile resides in the current directory. After constructing the Docker image, the following task involves initiating a Docker container from this image, thereby allowing the functionality of the machine learning model to be tested. The docker run command aids in this endeavor:

```shell
docker run -p 4000:80 ml_model_image:1.0
```

In this command, the -p flag maps the host's port 4000 to the container's port 80 (as defined in the Dockerfile). Therefore, the machine learning model is accessible via port 4000 of the host machine. Testing the model requires sending a request to the endpoint exposed by the Flask application within the Docker container. For instance, if the model provides a prediction based on data sent via a POST request, the curl command can facilitate this:

```shell
curl -d '{"features":[1, 2, 3, 4]}' -H 'Content-Type: application/json' http://localhost:4000/predict
```

The proposed method ensures a seamless flow from Dockerfile creation to testing the ML model within a Docker container.

Deploying the ML Model With Docker

Deployment of machine learning models typically involves exposing the model as a service that can be accessed over the internet. A standard method for achieving this is by serving the model as a REST API using a web framework such as Flask. Consider an example where a Flask application encapsulates a machine learning model. The following Python script illustrates how the model could be exposed as a REST API endpoint:

```python
from flask import Flask, request
import joblib  # joblib is now a standalone package; sklearn.externals.joblib is deprecated

app = Flask(__name__)
model = joblib.load('model.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(force=True)
    prediction = model.predict([data['features']])
    return {'prediction': prediction.tolist()}

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)
```

In this example, the Flask application loads a pre-trained Scikit-learn model (saved as model.pkl) and defines a single API endpoint, /predict. When a POST request is sent to this endpoint with a JSON object that includes an array of features, the model makes a prediction and returns it as a response. Once the ML model is deployed and running within the Docker container, it can be reached using HTTP requests. For instance, using the curl command, a POST request can be sent to the model with an array of features, and it will respond with a prediction:

```shell
curl -d '{"features":[1, 2, 3, 4]}' -H 'Content-Type: application/json' http://localhost:4000/predict
```

This practical example demonstrates how Docker can facilitate deploying machine learning models as scalable and accessible services.

Scaling the ML Model With Docker Swarm

As machine learning applications grow in scope and user base, the ability to scale becomes increasingly paramount. Docker Swarm provides a native clustering and orchestration solution for Docker, allowing multiple Docker hosts to be turned into a single virtual host. Docker Swarm can thus be employed to manage and scale deployed machine learning models across multiple machines.
Initializing a Docker Swarm is a straightforward process, commenced by executing the docker swarm init command. This command initializes the current machine as a Docker Swarm manager:

```shell
docker swarm init --advertise-addr $(hostname -i)
```

In this command, the --advertise-addr flag specifies the address at which the Swarm manager can be reached by the worker nodes. The hostname -i command retrieves the IP address of the current machine. Following the initialization of the Swarm, the machine learning model can be deployed across the Swarm using a Docker service. The service is created with the docker service create command, where flags like --replicas dictate the number of container instances to run:

```shell
docker service create --replicas 3 -p 4000:80 --name ml_service ml_model_image:1.0
```

In this command, --replicas 3 ensures three instances of the container are running across the Swarm, -p 4000:80 maps port 4000 of the Swarm to port 80 of the container, and --name ml_service assigns the service a name. Thus, the deployed machine learning model is effectively scaled across multiple Docker hosts by implementing Docker Swarm, thereby bolstering its availability and performance.

Continuous Integration/Continuous Deployment (CI/CD) With Docker

Continuous Integration/Continuous Deployment (CI/CD) is a vital aspect of modern software development, promoting automated testing and deployment to ensure consistency and speed in software release cycles. Docker's portable nature lends itself well to CI/CD pipelines, as Docker images can be built, tested, and deployed across various stages in a pipeline. An example of integrating Docker into a CI/CD pipeline can be illustrated using a Jenkins pipeline. The pipeline is defined in a Jenkinsfile, which might look like this:

```groovy
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                script {
                    sh 'docker build -t ml_model_image:1.0 .'
                }
            }
        }
        stage('Test') {
            steps {
                script {
                    // -d runs the container in the background so the pipeline is not blocked
                    sh 'docker run -d -p 4000:80 ml_model_image:1.0'
                    // Double-quoted Groovy string avoids clashing with the quotes in the JSON payload
                    sh "curl -d '{\"features\":[1, 2, 3, 4]}' -H 'Content-Type: application/json' http://localhost:4000/predict"
                }
            }
        }
        stage('Deploy') {
            steps {
                script {
                    sh 'docker service create --replicas 3 -p 4000:80 --name ml_service ml_model_image:1.0'
                }
            }
        }
    }
}
```

In this Jenkinsfile, the Build stage builds the Docker image, the Test stage runs the Docker container and sends a request to the machine learning model to verify its functionality, and the Deploy stage creates a Docker service and scales it across the Docker Swarm. Therefore, with Docker, CI/CD pipelines can achieve reliable and efficient deployment of machine learning models.

Conclusion and Best Practices

Wrapping up, this article underscores the efficacy of Docker in streamlining the deployment of machine learning models. The ability to encapsulate the model and its dependencies in an isolated, consistent, and lightweight environment makes Docker a powerful tool for machine learning practitioners. Further enhancing its value is Docker's potential to scale machine learning models across multiple machines through Docker Swarm and its seamless integration with CI/CD pipelines. However, to extract the most value from Docker, certain best practices are recommended:

- Minimize Docker image size: Smaller images use less disk space, reduce build times, and speed up deployment. This can be achieved by using smaller base images, removing unnecessary dependencies, and efficiently utilizing Docker's layer caching.
- Use .dockerignore: Similar to .gitignore in Git, .dockerignore prevents unnecessary files from being included in the Docker image, reducing its size.
- Ensure that Dockerfiles are reproducible: Using specific versions of base images and dependencies can prevent unexpected changes when building Docker images in the future.

By adhering to these guidelines and fully embracing the capabilities of Docker, it becomes significantly more feasible to navigate the complexity of deploying machine learning models, thereby accelerating the path from development to production.
In the fast-paced world of software development, Continuous Integration and Continuous Deployment (CI/CD) have become indispensable practices in DevOps services. CI/CD enables teams to deliver software updates more frequently, efficiently, and with higher quality. To achieve these goals, developers rely on a range of cutting-edge tools and technologies that streamline their workflows and automate various stages of the development process. In this blog post, we will explore the key tools and technologies that contribute to an effective CI/CD pipeline in DevOps services, ensuring smooth and reliable software delivery. Version Control System (VCS) A solid foundation for any CI/CD process in DevOps services is a robust version control system. Git is the most widely used VCS, offering powerful branching and merging capabilities. Developers can collaborate seamlessly, track changes, and resolve conflicts efficiently, ensuring that code remains stable and secure. GitHub, GitLab, and Bitbucket are popular platforms that integrate Git and provide additional features like issue tracking, code reviews, and project management. Continuous Integration Tools Continuous Integration is a fundamental step in the CI/CD pipeline for DevOps services, where developers automatically integrate code changes into a shared repository multiple times a day. CI tools like Jenkins, Travis CI, CircleCI, and GitLab CI/CD help automate the build, test, and deployment process. These tools can run unit tests, check for code quality, and package the application for deployment, ensuring that each code change is verified and validated. Automated Testing Frameworks Automated testing is a crucial aspect of CI/CD in DevOps services to guarantee the stability and functionality of the software. Testing frameworks like Selenium, JUnit, Pytest, and Jest enable developers to create and execute tests automatically. By running these tests after every code change, developers can identify and fix issues early in the development cycle, improving the overall quality of the software. Containerization Technologies Containerization has revolutionized the way applications are deployed and managed in DevOps services. Docker is a popular containerization tool that allows developers to package applications and their dependencies into lightweight containers. Containers provide consistency across different environments, making it easier to test and deploy applications reliably. Kubernetes, an orchestration tool, is often used in DevOps services to automate container deployment, scaling, and management in production environments. Configuration Management Tools Configuration management tools such as Ansible, Puppet, and Chef automate the setup and management of infrastructure and application configurations in DevOps services. By employing these tools, developers can ensure that all environments, from development to production, are consistent and aligned with the desired state. This reduces the risk of errors caused by environment discrepancies. Continuous Deployment Tools Continuous Deployment takes the CI/CD process one step further in DevOps services by automatically deploying code changes to production environments once they pass all tests. Tools like Spinnaker and AWS CodeDeploy facilitate this process by managing deployments and ensuring zero-downtime updates. This automation minimizes manual intervention, reduces deployment time, and allows for faster feature delivery. 
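To tie the containerization and deployment pieces above together, here is a minimal, hedged sketch of the kind of Kubernetes Deployment manifest such pipelines typically apply. The application name, image, replica count, and port are illustrative assumptions, not values taken from this post.

```yaml
# Hypothetical Kubernetes Deployment: three replicas of a containerized web service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app                  # assumed application name
spec:
  replicas: 3                    # Kubernetes keeps three pods running and replaces failed ones
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web-app
          image: registry.example.com/web-app:1.0.0   # image produced by the CI stage (assumed)
          ports:
            - containerPort: 8080
```

A continuous deployment tool (or a plain `kubectl apply -f deployment.yaml`) pushes this desired state to the cluster, and Kubernetes handles scheduling, scaling, and self-healing from there.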
Monitoring and Logging Tools After deploying code changes in DevOps services, it is essential to monitor the application's performance and track potential issues. Monitoring tools like Prometheus and Grafana provide real-time insights into the application's health, resource usage, and response times. Additionally, logging tools such as ELK (Elasticsearch, Logstash, Kibana) or Splunk help developers track and analyze logs, making it easier to identify and troubleshoot issues in production environments. Conclusion A well-implemented CI/CD pipeline can significantly enhance a development team's productivity and product quality in DevOps services. By leveraging the right set of tools and technologies, developers can automate repetitive tasks, streamline code delivery, and improve collaboration across the team. Version control systems, continuous integration tools, automated testing frameworks, containerization technologies, configuration management tools, continuous deployment tools, and monitoring and logging tools form the backbone of an effective CI/CD pipeline in DevOps services. As technology continues to evolve, developers in DevOps services should stay updated with the latest advancements in CI/CD practices and incorporate new tools that align with their project requirements. Embracing these tools and technologies not only fosters a culture of continuous improvement but also ensures that software delivery is efficient, reliable, and scalable in today's fast-paced software development landscape.
In this post, I will provide a step-by-step guide on deploying a MuleSoft application to CloudHub2 using GitHub Actions.

Prerequisites

- GitHub account and basic knowledge of Git.
- Anypoint Platform account.
- Anypoint Studio and basic knowledge of MuleSoft.

Before we start, let's learn about GitHub Actions. GitHub Actions is a versatile and powerful automation platform provided by GitHub. It enables developers to define and automate workflows for their software development projects. With GitHub Actions, you can easily set up custom workflows to build, test, deploy, and integrate your code directly from your GitHub repository.

Deploying the MuleSoft Application

We will outline three key steps involved in this process.

1. Creating a Connected App

- Go to Access Management in the Anypoint Platform.
- Click on "Connected Apps" from the left side menu.
- Click on the "Create App" button.
- Give a suitable name to the application and select the "App acts on its own behalf (client credentials)" radio button.
- Click on the "Add Scopes" button.
- Add the required scopes to the connected app and click on the "Save" button.
- The Connected App will be created. Copy the Id and Secret and keep them aside for further use.

2. Configuring the MuleSoft App

- Open the project in Anypoint Studio and go to the pom.xml file.
- In the pom.xml file, replace the value of "groupId" with the "Business Group Id" of your Anypoint Platform.
- Remove the "-SNAPSHOT" from the version.
- Go to the project folder in the system explorer and add a folder named ".maven" inside the project folder.
- Inside the ".maven" folder, create a file named "settings.xml" and add the following configuration to it.

```xml
<settings>
  <servers>
    <server>
      <id>ca.anypoint.credentials</id>
      <username>~~~Client~~~</username>
      <password>${CA_CLIENT_ID}~?~${CA_CLIENT_SECRET}</password>
    </server>
  </servers>
</settings>
```

- Add the CloudHub2 deployment configuration to the "mule-maven-plugin" inside the "build" tag, as shown below. After the "build" tag, add the "distributionManagement."

```xml
<configuration>
  <cloudhub2Deployment>
    <uri>https://anypoint.mulesoft.com</uri>
    <provider>MC</provider>
    <environment>Sandbox</environment>
    <target>Cloudhub-US-East-2</target>
    <muleVersion>4.4.0</muleVersion>
    <server>ca.anypoint.credentials</server>
    <applicationName>ashish-demo-project-v1</applicationName>
    <replicas>1</replicas>
    <vCores>0.1</vCores>
    <skipDeploymentVerification>${skipDeploymentVerification}</skipDeploymentVerification>
    <integrations>
      <services>
        <objectStoreV2>
          <enabled>true</enabled>
        </objectStoreV2>
      </services>
    </integrations>
  </cloudhub2Deployment>
</configuration>
```

```xml
<distributionManagement>
  <repository>
    <id>ca.anypoint.credentials</id>
    <name>Corporate Repository</name>
    <url>https://maven.anypoint.mulesoft.com/api/v2/organizations/${project.groupId}/maven</url>
    <layout>default</layout>
  </repository>
</distributionManagement>
```

Note:

- Keep the "applicationName" unique.
- "skipDeploymentVerification" is optional.
- "server" should match the "id" provided in "distributionManagement".
- The "id" provided in "distributionManagement" should match the "id" provided in the "settings.xml" file.

For more information, visit the MuleSoft documentation.

3. Creating a Workflow File and Deploying the App

- Create a GitHub repository and push the project code to the repository. In this post, we will be using the "main" branch.
- Click on the "Settings" tab and select "Actions" from the "Secrets and variables" dropdown menu in the left side panel of the "Settings" page.
Click on the "New Repository Secret" button and add the Client-Id of the Connected app that we created in Step 1. Similarly, add the Client-Secret also. Click on the "Actions" tab and select "Simple workflow" from the "Actions" page. Change the name of the pipeline and replace the default code with the pipeline code given below. YAML # This workflow will build a MuleSoft project and deploy to CloudHub name: Build and Deploy to Sandbox on: push: branches: [ main ] workflow_dispatch: jobs: build: runs-on: ubuntu-latest env: CA_CLIENT_ID: ${{ secrets.CA_CLIENT_ID } CA_CLIENT_SECRET: ${{ secrets.CA_CLIENT_SECRET } steps: - uses: actions/checkout@v3 - uses: actions/cache@v3 with: path: ~/.m2/repository key: ${{ runner.os }-maven-${{ hashFiles('**/pom.xml') } restore-keys: | ${{ runner.os }-maven- - name: Set up JDK 11 uses: actions/setup-java@v3 with: java-version: 11 distribution: 'zulu' - name: Print effective-settings (optional) run: mvn help:effective-settings - name: Build with Maven run: mvn -B package -s .maven/settings.xml - name: Stamp artifact file name with commit hash run: | artifactName1=$(ls target/*.jar | head -1) commitHash=$(git rev-parse --short "$GITHUB_SHA") artifactName2=$(ls target/*.jar | head -1 | sed "s/.jar/-$commitHash.jar/g") mv $artifactName1 $artifactName2 - name: Upload artifact uses: actions/upload-artifact@master with: name: artifacts path: target/*.jar upload: needs: build runs-on: ubuntu-latest env: CA_CLIENT_ID: ${{ secrets.CA_CLIENT_ID } CA_CLIENT_SECRET: ${{ secrets.CA_CLIENT_SECRET } steps: - uses: actions/checkout@v3 - uses: actions/cache@v3 with: path: ~/.m2/repository key: ${{ runner.os }-maven-${{ hashFiles('**/pom.xml') } restore-keys: | ${{ runner.os }-maven- - uses: actions/download-artifact@master with: name: artifacts - name: Upload to Exchange run: | artifactName=$(ls *.jar | head -1) mvn deploy \ -s .maven/settings.xml \ -Dmule.artifact=$artifactName \ deploy: needs: upload runs-on: ubuntu-latest env: CA_CLIENT_ID: ${{ secrets.CA_CLIENT_ID } CA_CLIENT_SECRET: ${{ secrets.CA_CLIENT_SECRET } steps: - uses: actions/checkout@v3 - uses: actions/cache@v3 with: path: ~/.m2/repository key: ${{ runner.os }-maven-${{ hashFiles('**/pom.xml') } restore-keys: | ${{ runner.os }-maven- - uses: actions/download-artifact@master with: name: artifacts - name: Deploy to Sandbox run: | artifactName=$(ls *.jar | head -1) mvn deploy -DmuleDeploy \ -Dmule.artifact=$artifactName \ -s .maven/settings.xml \ -DskipTests \ -DskipDeploymentVerification="true" This workflow contains three jobs. 1. Build: This step sets up the required environment, such as the Java Development Kit (JDK) version 11. It then executes Maven commands to build the project, package it into a JAR file, and append the commit hash to the artifact's filename. The resulting artifact is uploaded as an artifact for later use. 2. Upload: This step retrieves the previously built artifact and prepares it for deployment. It downloads the artifact from the artifacts repository and uses Maven to upload the artifact to the desired destination, such as the MuleSoft Exchange. The necessary credentials and settings are provided to authenticate and configure the upload process. 3. Deploy: The final step involves deploying the uploaded artifact to the CloudHub Sandbox environment. The artifact is downloaded, and the Maven command is executed with specific parameters for deployment, including the artifact name and necessary settings. Tests are skipped during deployment, and deployment verification is disabled. 
Commit the workflow file and click on the "Actions" tab. The workflow will start automatically since we made a commit. Click on the workflow and observe the steps as they execute. After completion of the "Upload" stage, go to "Anypoint Exchange," select "root" from the left side menu, append "&type=app" to the URL in the address bar, and hit enter. You will see the uploaded artifact. Wait for the workflow to complete execution. After all three stages have executed successfully, go to "Runtime Manager" in "Anypoint Platform," and you will see your app deployed there. Note: If you change the names of the Client-Id and Client-Secret secrets, make sure to update them in the workflow file as well as in the repository secrets; a minimal sketch of this change follows at the end of this article. In this tutorial, we have used the main branch; you can change the branch in the workflow file to target some other branch. Changes to the CloudHub2 deployment configuration can be made according to the MuleSoft documentation. I hope this tutorial helps you. You can find the source code here.
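For illustration, here is a minimal, hedged sketch of the two changes mentioned in the note above: targeting a different branch and using different secret names. The branch name develop and the secret names ANYPOINT_CLIENT_ID / ANYPOINT_CLIENT_SECRET are assumptions for the example; use whatever names you actually configure in your repository.

YAML
name: Build and Deploy to Sandbox
on:
  push:
    branches: [ develop ]   # assumed branch name; replace with your target branch
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      # The env keys stay CA_CLIENT_ID / CA_CLIENT_SECRET because
      # .maven/settings.xml resolves ${env.CA_CLIENT_ID} and ${env.CA_CLIENT_SECRET}.
      CA_CLIENT_ID: ${{ secrets.ANYPOINT_CLIENT_ID }}         # assumed secret name
      CA_CLIENT_SECRET: ${{ secrets.ANYPOINT_CLIENT_SECRET }} # assumed secret name
    steps:
      - uses: actions/checkout@v3
      # ...remaining steps unchanged from the workflow above

The same env block would be repeated in the upload and deploy jobs, exactly as in the full workflow.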
In this 15-minute lightning talk, Diptesh “Dips” Mishra, CTO for Shoal (a Standard Chartered venture), talks about the governance challenges that financial services organizations face when they look to adopt DevSecOps. Dips has worked for Nationwide, Lloyds Banking Group, and RBS, and he’ll share key strategies behind successful implementations. Transcript I’ve been in financial services looking at the overlap between regulations and DevSecOps over the last six years. It’s been an interesting journey, and I’ve looked at it through the startup lens, both within larger corporate banks, more traditional investment banks, and retail banks, and also within upcoming startups, starting from pre-seed and seed rounds. My last role before Shoal was at Kroo Bank, where we built a digital bank in the UK from scratch, which is one of the most highly regulated environments. And now I’m at Shoal, where we are building a sustainability marketplace, where we are going to offer financial services products, but with a sustainability angle to them. It’s part of Standard Chartered Bank, so we have all the regulations that we need to comply with. Hopefully I’ll keep it nice and interesting — let’s see how much I succeed! As you can tell from the theme, it’s based on Alien vs. Predator. Can I get a quick show of hands from everyone who has at least seen one of the movies or the game? Great, thank you, that really helps — I was worried for a bit! So there’s this complex dichotomy between regulations and DevSecOps, and if you have been in modern financial services organizations, you can’t really escape it. It’s constantly there, whether you’re a fintech startup, a scale-up, a traditional or digital bank, in retail, private, or investment banking, or in a central bank — irrespective of what you do, you have got those challenges and it’s about picking a side, right? Before we pick any sides or decide who might come out on top, it’s probably worth thinking about what each of them is and what their salient points are, right? Regulations: why do people love them? They’re very well defined, clearly documented, and there’s limited scope for interpretation. So you know exactly what you are going to get and what you need to do, right? There’s less room for interpretation there. DevSecOps, on the other hand: it’s a bird, it’s a plane, no, it’s an alien! Even Superman is an alien, for those who don’t know the full story. So it’s one of those where people are still trying to agree on one term for DevSecOps. If all of us went into a room and everyone had to write what they thought the definition of DevSecOps was, I’m sure we’d probably have tens of different answers in the room. Who are the main cheerleaders, the fan groups that drive regulations? You’ve got the regulators themselves obviously; you’ve got the people who hold senior management functions as part of the senior management certification regimes; risk and compliance teams (every single regulated organization has them: the gatekeepers, the change haters, they’re known by different names, none of them were given by me!); and then you’ve got DevSecOps. Who are the cheering squad for them? You’ve got digital and cloud natives, you’ve got startups, as well as established tech behemoths like the FAANG organizations who are big, big proponents of it. You’ve got the disruptors, you’ve got agile coaches who are coming into organizations and saying: “You should have DevSecOps!” And of course you have got tooling companies, not naming anyone. But at the same time, there are some common themes here.
They’re not entirely different: both regulations and DevSecOps see themselves as non-negotiable, as in “you absolutely must have us.” For regulations, it’s about the license to operate. You’ve got a mandate from the regulators within your jurisdiction and you need to be compliant with those regulations to operate. Similarly, on the DevSecOps side, it’s about being agile and able to promote continuous change quickly and efficiently. So they both think they’re non-negotiable, but they get there for different reasons. Final part before we kick off the battle: the weapons! What do they bring to the table? So DevSecOps have got agile and digital transformation on their side. They’ve got innovation, the new kid on the block. The most important weapon at their disposal is DevSecOps tooling. Now, please don’t strain your eyes trying to understand any or all of it — I’m sure you’ll find it online. But the idea is there’s a growing plethora of different tools, new ones coming up every day, all of which provide different and new features as part of it, right? It’s a proper sprawl. Regulations: what are their weapons? You’ve got forms, forms, more forms, death by forms. It starts with frameworks, and you’ve got policies, controls, and audits. So there are different aspects that you would have all come across in any regulated space. What’s their nuclear weapon? You do not have our approval to deploy to production. That one sentence just kills everything. So that’s their main weapon. But is it really a fight? We have looked at how they are set up. We have seen they are potentially trying to achieve different goals and objectives. But are they really complete opposites? Can we not come to a middle ground somewhere? Let’s take a step back and think about where they are coming from — what are the drivers? And that’s one of the cool aspects: when you get into conflict resolution mode with different viewpoints, you take a step back, start from the beginning, and see where people’s viewpoints are coming from. The regulations have a duty of care which starts from a good place — they want to protect the customers, but at the same time, with some of it you sometimes wonder, "Are they doing this to protect the system itself?" They love inertia. A majority of these issues that you see are caused by deployments or releases. Any changes you make to the system are very likely to have been the cause of problems. And of course they want to ensure there are proper processes, governance, and controls. In terms of DevSecOps, they’ve come from that mindset of being lean, having principles like the ability to fail fast, test and learn, continuous rapid change, almost on an ongoing basis. And lots and lots of automation. Does anyone recognize this? This is the famous Agile Manifesto. Some of it has been lost in translation over the years, where people think that what was meant was that only the things on the left are important. The manifesto itself was very clear: while there is value in the things on the right, those on the left are the ones we would look to do more of. But if you talk about a plan in an agile organization, you’ll hear things like, “We don’t do any plans, there is no need to plan, we are fully agile!” That’s not what is meant. Same for documentation — we have working software, our code works, right? But it doesn’t say don’t document anything.
So there is room for all of the eight elements on this, and that’s kind of where you start looking at what the common ground is: where do you get regulations, which are represented on this side, to work with those from DevSecOps? So now that we understand a bit more about them, perhaps we can negotiate? To start with, we can look at testing everything. Figure out how we test — that can be a whole topic in itself, right? We were just discussing during the break that you can test everything, have 100% code coverage, every single test passing, but actually not test anything useful. So there is a lot to be done in that space. It’s about continuously learning in all environments. That is another area where, if you look at changes that fail, once you get your pipeline and your processes sorted in terms of automation, one common thing that’s still different across environments is configuration. You will have a very different configuration in your production environment than what you would have in staging. That might be because you are connecting to a different backend, or because you have a third-party system with separate sandbox and production environments. Any of those will give you differences, and it’s about how your testing works across these environments. And work hard to avoid failing — what people who are on the regulation side do not like is a looser attitude where people are like, “It’s OK to fail.” But it is OK, and that’s where the psychological safety of the team and an understanding of that comes in. But it’s not an excuse for taking things lightly or not being diligent about "Have you tested everything? Have you talked about possible exception and error scenarios?" So it’s about working through those. And when you do fail, fix it, learn from it, and then work harder for next time. Continuous everything — one important point here concerns regulations. People think regulators have a lot of inertia, or that risk and compliance have a lot of inertia to change. But inertia is not just about staying at rest; it can also be about uniform motion. So you can actually move towards a continuous compliance pipeline which enables these to move at pace. And yes, automation of processes, governance, and controls. I think that’s the recurring theme that you will see today at least. So what does it take? You’ve got to have fully engaged organizations — with buy-in from the leadership all the way to the grassroots level. You can’t bring them to the table without one or the other. You need to have empathy and patience. That’s a key part of leadership. It enables teams to understand each other and different viewpoints, and it requires a lot of work. Regulations have been built up over decades, sometimes even centuries, right? DevSecOps hasn’t had that many years to grow. If they are to work together, empathy and patience are both important from all sides. And having a structured process. Think about what the requirements are. What are we trying to achieve? At the end of it, we are trying to release and deploy code. That code itself has a purpose. Why is it needed? What happens if it is not there? So every time we come across a control, think about being clear around why it is needed. An important question is: if I now take the control away, what is it that can happen? What are the risks? And the third part is where it starts getting interesting. How can these requirements be achieved?
That’s an open-ended question, but sometimes you might get the standard answer: fill out this form. By the time you complete this process, you will be well set. But what is it that filling out that form achieves? So the second question comes in: what are the different ways in which this requirement can be achieved? If you ask everyone, each with a different viewpoint, you will get different answers, but in the end you will be better off, because you can understand that there’s not just one way — a document-driven, spreadsheet-based system where you need to fill out forms — and that you can bring different types of automation into place as well. Finally, what would provide assurance on the effectiveness of controls, governance, and processes? That’s the most important part from a regulatory standpoint, and that’s what we’re trying to achieve through automation. And so again, you boil it down to: what are the outcomes and key results that we are looking for? And once you’ve got all of these answers, it’s just figuring out the next steps: who is going to do what and by when? How long does it take? It takes forever. It’s a continuous process, right? It’s the journey. There is no final destination here. It’s about making sure the journey is as enjoyable for the alien as for the predator. So where do we start? Why not define and start with an MVP? Define the minimum process that you need, pick one that you think you can automate, figure out how automated governance can be introduced in that area, and then maybe you can get them both to sit across the table from each other and play a nice game of chess instead! And that’s it from me. Thank you very much.
GitOps is a relatively new addition to the growing list of "Ops" paradigms taking shape in our industry. It all started with DevOps, and while the term DevOps has been around for some years now, it seems we still can't agree whether it's a process, mindset, job title, set of tools, or some combination of them all. We captured our thoughts about DevOps in our introduction to DevOps post, and we dive even deeper in our DevOps engineer's handbook. The term GitOps suffers from the same ambiguity, so in this post we look at: The history of GitOps GitOps goals and ideals The limitations of GitOps The tools that support GitOps The practical implications of adopting GitOps in your own organization The Origins of GitOps The term GitOps was originally coined in a blog post by WeaveWorks called GitOps - Operations by Pull Request. The post described how WeaveWorks used Git as a source of truth and the benefits that followed from that approach. Since that original blog post, initiatives like the GitOps Working Group have been organized to give the concept a formal, vendor-neutral definition. This working group recently released version one of its principles, which state that a system managed by GitOps must have its desired state expressed declaratively, stored in a way that is versioned and immutable, pulled automatically by software agents, and continuously reconciled. The contrast between low-level implementations of GitOps found in most blog posts and the high-level ideals of a GitOps system described by the working group is worth discussing, as the differences between them are a source of much confusion. GitOps Doesn't Imply the Use of Git Most discussions around GitOps center on how building processes on Git gives rise to many of the benefits ascribed to the GitOps paradigm. Git naturally provides an (almost) immutable history of changes, with changes annotated and approved via pull requests, and where the current state of the Git repository naturally represents the desired state of a system, thus acting as a source of truth. The overlap between Git and GitOps is undeniable. However, you may have noticed that Git was never mentioned as a requirement of GitOps by the working group. So while Git is a convenient component of a GitOps solution, GitOps itself is concerned with the functional requirements of a system rather than checking your declarative templates into Git. This distinction is important, because many teams fixate on the "Git" part of GitOps. The term GitOps is an unfortunate name for the concept it's trying to convey, leading many to believe Git is the central aspect of GitOps. But GitOps has won the marketing battle and gained mind share in IT departments. While it may be a restrictive term to describe functional requirements unrelated to Git, GitOps is now the shorthand for describing processes that implement a set of high-level concerns. GitOps Doesn't Imply the Use of Kubernetes Kubernetes was the first widely used platform to combine the ideas of declarative state and continuous reconciliation with an execution environment to implement the reconciliation and host running applications. It really is magic to watch a Kubernetes cluster reconfigure itself to match the latest templates applied to the system. So it's no surprise that Kubernetes is the foundation of GitOps tools like Flux and Argo CD, while posts like 30+ Tools List for GitOps mention Kubernetes 20 times. While continuous reconciliation is impressive, it's not really magic. Behind the scenes, Kubernetes runs a number of operators that are notified of configuration changes and execute custom logic to bring the cluster back to the desired state.
The key requirements of continuous reconciliation are: Access to the configuration or templates declaratively expressing the desired state The ability to execute a process capable of reconciling a system when configuration is changed An environment in which the process can run Kubernetes bakes these requirements into the platform, making it easy to achieve continuous reconciliation. But these requirements can also be met with some simple orchestration, Infrastructure as Code (IaC) tools like Terraform, Ansible, Puppet, Chef, CloudFormation, ARM templates, and an execution environment like a CI server: IaC templates can be stored in Git or in file hosting platforms like S3 or Azure Blob Storage, complete with immutable audit histories. CI/CD systems can poll the storage, be notified of changes via webhooks, or have builds and deployments triggered via platforms like GitHub Actions. The IaC tooling is then executed, bringing the system in line with the desired state (a minimal workflow sketch illustrating this approach appears a little further below). Indeed, a real-world end-to-end GitOps system inevitably incorporates orchestration outside of Kubernetes. For example, Kubernetes is unlikely to manage your DNS records, centralized authentication platforms, or messaging systems like Slack. You'll also likely find at least one managed service for things like databases, message queues, scheduling, and reporting more compelling than attempting to replicate them in a Kubernetes cluster. Also, any established IT department is guaranteed to have non-Kubernetes systems that would benefit from GitOps. So while the initial selection of specialized GitOps tools tends to be tightly integrated into Kubernetes, achieving the functional requirements of GitOps across established infrastructure will inevitably require orchestrating one or more IaC tools. Continuous Reconciliation Is Half the Battle Continuous reconciliation, as described by the working group, describes responses to two types of system changes. The first is what you expect, where deliberate changes to the configuration held in Git or other versioned storage are detected and applied to the system. This is the logical flow of configuration change and represents the normal operation of a correctly configured GitOps workflow. The second is where an agent detects undesirable changes to the system that are not described in the source configuration. In this case, your system no longer reflects the desired state, and the agent is expected to reconcile the system back to the configuration maintained in Git. This ability to resolve the second situation is a neat technical capability, but represents an incomplete business process. Imagine the security guards from your front desk reporting they had evicted an intruder. As a one-off occurrence, this report would be mildly concerning, but the security team did their job and resolved the issue. But now imagine you were receiving these reports every week. Obviously there is a more significant problem forcing the security team to respond to weekly intrusions. In the same manner, a system that continually removes undesirable system states is an incomplete solution to a more fundamental root problem. The real question is who is making those changes, why are the changes being made, and why are they not being made through the correct process? The fact that your system can respond to undesirable states is evidence of a robust process able to adapt to unpredictable events, and this ability should not be underestimated.
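Here is that minimal workflow sketch: a hedged illustration of meeting the three requirements above with Terraform and a CI server instead of Kubernetes. The repository layout (Terraform configuration at the repository root) and the AWS credential secret names are assumptions made for the example.

YAML
# Minimal sketch: continuous reconciliation without Kubernetes.
# Assumes the Terraform configuration lives at the repository root and that
# the AWS credentials below exist as repository secrets (illustrative names).
name: reconcile-infrastructure
on:
  push:
    branches: [ main ]    # deliberate changes: apply the new desired state
  schedule:
    - cron: '0 * * * *'   # hourly run to detect and correct drift
jobs:
  apply:
    runs-on: ubuntu-latest
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    steps:
      - uses: actions/checkout@v3
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform apply -auto-approve   # reconcile actual state with the declared state

The scheduled run is what approximates continuous reconciliation here: even when nothing has been committed, the workflow periodically re-asserts the declared state against the real infrastructure.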
It's a long established best practice that teams should exercise their recovery processes, so in the event of disaster, teams are able to run through a well-rehearsed restoration. Continuous reconciliation can be viewed as a kind of automated restoration process, allowing the process to be tested and verified with ease. But if your system has to respond to undesirable states, it's evidence of a flawed process where people have access that they shouldn't or are not following established processes. An over-reliance on a system that can undo undesirable changes after they've been made runs the risk of masking a more significant underlying problem. GitOps Is Not a Complete Solution While GitOps describes many desirable traits of well-managed infrastructure and deployment processes, it's not a complete solution. In addition to the 4 functional requirements described by GitOps, a robust system must be: Verifiable - infrastructure and applications must be testable once they are deployed. Recoverable - teams must be able to recover from an undesirable state. Visible - the state of the infrastructure and the applications deployed to it must be surfaced in an easily consumed summary. Secure - rules must exist around who can make what changes to which systems. Measurable - meaningful metrics must be collected and exposed in an easily consumed format. Standardized - applications and infrastructure must be described in a consistent manner. Maintainable - support teams must be able to query and interact with the system, often in non-declarative ways. Coordinated - changes to applications and infrastructure must be coordinated between teams. GitOps offers little advice or insight into what happens before configuration is committed to a Git repo or other versioned and immutable storage, but it is "left of the repo" where the bulk of your engineering process will be defined. If your Git repo is the authoritative representation of your system, then anyone who can edit a repo essentially has administrative rights. However, Git repos don't provide a natural security boundary for the kind of nuanced segregation of responsibility you find in established infrastructure. This means you end up creating one repo per app per environment per role. Gaining visibility over each of these repos and ensuring they have the correct permissions is no trivial undertaking. You also quickly find that just because you can save anything in Git doesn't mean you should. It's not hard to imagine a rule that says development teams must create Kubernetes deployment resources instead of individual pods, use ingress rules that respond to very specific hostnames, and always include a standard security policy. This kind of standardization is tedious to enforce through pull requests, so a much better solution is to give teams standard resource templates that they populate with their specific configuration. But this is not a feature inherent to Git or GitOps. We then have those processes "right of the cluster," where management and support tasks are defined. Reporting on the intent of a Git commit is almost impossible. If you looked at a diff between two commits and saw that a deployment image tag was increased, new secret values were added, and a config map was deleted, how would you describe the intent of that change? 
The easy answer is to read the commit message, but this isn't a viable option for reporting tools that must map high-level events like "deployed a new app version" or "bug fix release" (which are critical if you want to measure yourself against standard metrics like those presented in the DORA report) to the diff between two commits. Even if you could divine an algorithm that understood the intent of a Git commit, a Git repo was never meant to be used as a time-series database. GitOps also provides no guidance on how to perform support tasks after the system is in its desired state. What would you commit to a Git repo to delete misbehaving pods so they can be recreated by their parent deployment? Maybe a job could do this, but you have to be careful that Kubernetes doesn't try to apply that job resource twice. But then what would you commit to the repo to view the pod logs of a service like an ingress controller that was preinstalled on your cluster? My mind boggles at the thought of all the asynchronous message handling you would need to implement to recreate kubectl logs mypod in a GitOps model. Ad hoc reporting and management tasks like this don't have a natural solution in the GitOps model. This is not to say that GitOps is flawed, but rather that it solves specific problems and must be complemented with other processes and tools to satisfy basic operational requirements. Git Is the Least Interesting Part of GitOps I'd like to present you with a theory and a thought experiment to apply it to: In any sufficiently complex GitOps process, your Git repo is just another structured database. You start your GitOps journey using the common combination of Git and Kubernetes. All changes are reviewed by pull request, committed to a Git repo, consumed by a tool like Argo CD or Flux, and deployed to your cluster. You have satisfied all the functional requirements of GitOps, and enjoy the benefits of a single source of truth, immutable change history, and continuous reconciliation. But it becomes tedious to have a person open a pull request to bump the image property in a deployment resource every time a new image is published. So you instruct your build server to pull the Git repo, edit the deployment resource YAML file, and commit the changes. You now have GitOps and CI/CD. You now need to measure the performance of your engineering teams. How often are new releases deployed to production? You quickly realize that extracting this information from Git commits is inefficient at best, and that the Kubernetes API was not designed for frequent and complex queries, so you choose to populate a more appropriate database with deployment events. As the complexity of your cluster grows, you find you need to implement standards regarding what kind of resources can be deployed. Engineering teams can only create deployments, secrets, and configmaps. The deployment resources must include resource limits, a set of standard labels, and the pods must not be privileged. In fact, it turns out that of the hundreds of lines of YAML that make up the resources deployed to the cluster, only about 10 should be customized. As you did with the image tag updates, you lift the editing of resources from manual Git commits to an automated process where templates have a strictly controlled subset of properties updated with each deployment. Now that your CI/CD is doing most of the commits to Git, you realize that you no longer need to use Git repos as a means of enforcing security rules.
You consolidate the dozens of repos that were created to represent individual applications and environments to a single repo that only the CI/CD system interacts with on a day-to-day basis. You find yourself having to roll back a failed deployment, only to find that the notion of reverting a Git commit is too simplistic. The changes to the one application you wanted to revert have been mixed in with a dozen other deployments. Not that anyone should be touching the Git repo directly anyway, because merge conflicts can have catastrophic consequences. But you can use your CI/CD server to redeploy an old version of the application, and because the CI/CD server has the context of what makes up a single application, the redeployment only changes the files relating to that application. At this point, you concede that your Git repo is another structured database reflecting a subset of "the source of truth:" Humans aren't to touch it. All changes are made by automated tools. The automated tools require known files of specific formats in specific locations. The Git history shows a list of changes made by bots rather than people. The Git history now reads "Deployment #X.Y.Z", and other commit information only makes sense in the context of the automated tooling. Pull requests are no longer used. The "source of truth" is now found in the Git repo (showing changes to files), the CI/CD platform's history (showing the people who initiated the changes, and the scripts that made them), and the metrics database. You consolidated your Git repos, meaning you have limited ability to segregate access to humans even if you want to. You also realize that the parts of your GitOps process that are adding unique business value are "left of the repo" with metrics collection, standardized templates, release orchestration, rollbacks, and deployment automation; and "right of the cluster" with reports, dashboards, and support scripts. The process between the Git repo and cluster is now so automated and reliable that it's not something you need to think about. Conclusion GitOps has come to encapsulate a subset of desirable functional requirements that are likely to provide a great deal of benefit for any teams that fulfill them. While neither Git nor Kubernetes are required to satisfy GitOps, they are the logical platforms on which to start your GitOps journey, as they're well supported by the more mature GitOps tools available today. But GitOps tooling tends to be heavily focused on what happens between a commit to a Git repo and the Kubernetes cluster. While this is no doubt a critical component of any deployment pipeline, there's much work to be done "left of the repo" and "right of the cluster" to implement a robust CI/CD pipeline and DevOps workflow. GitOps tools also tend to assume that because everything is in Git, the intent of every change is annotated with commit messages, associated with the author, put through a review process, and is available for future inspection. However, this is overly simplistic, as any team advanced enough to consider implementing GitOps will immediately begin iterating on the process by automating manual touch points, usually with respect to how configuration is added to the Git repo in the first place. As you project the natural evolution of a GitOps workflow, you're likely to conclude that so many automated processes rely on the declarative configuration being in a specific location and format, that Git commits must be treated in much the same way as a database migration. 
The inputs to a GitOps process must be managed and orchestrated, and the outputs must be tested, measured, and maintained. Meanwhile the processing between the Git repo and cluster should be automated, rendering much of what we talk about as GitOps today as simply an intermediate step in a specialized CI/CD pipeline or DevOps workflow. Perhaps the biggest source of confusion around GitOps is the misconception that it represents an end-to-end solution, and that you implement GitOps and GitOps-focused tooling to the exclusion of alternative processes and platforms. In practice, GitOps encapsulates one step in your infrastructure and deployment pipelines, and must be complemented with other processes and platforms to fulfill common business requirements. Happy deployments!
Though I always worked on the Dev side of IT, I was also interested in the Ops side. I even had a short experience being a WebSphere admin: I used it several times, helping Ops deal with the Admin console while being a developer. Providing a single package that Ops can configure and deploy in different environments is very important. As a JVM developer, I've been happy using Spring Boot and its wealth of configuration options: command-line parameters, JVM parameters, files, profiles, environment variables, etc. In this short post, I'd like to describe how you can do the same with Apache APISIX in the context of containers. File-Based Configuration The foundation of configuring Apache APISIX is file-based. The default values are found in the /usr/local/apisix/conf/apisix/config-default.yaml configuration file. For example, by default, Apache APISIX runs on port 9080, and the admin port is 9180. That's because of the default configuration: YAML apisix: node_listen: - 9080 #1 #... deployment: admin: admin_listen: ip: 0.0.0.0 port: 9180 #2 Regular port Admin port To override values, we need to provide a file named config.yaml in the /usr/local/apisix/conf/apisix directory: YAML apisix: node_listen: - 9090 #1 deployment: admin: admin_listen: port: 9190 #1 Override values Now, Apache APISIX should run on port 9090, and the admin port should be 9190. Here's how to run the Apache APISIX container with the above configuration: Shell docker run -it --rm \ -p 9090:9090 -p 9190:9190 \ -v "$(pwd)/config.yaml":/usr/local/apisix/conf/apisix/config.yaml \ apache/apisix:3.4.1-debian Environment-Based Configuration The downside of a pure file-based configuration is that you must provide a dedicated file for each environment, even if only a single parameter changes. To account for that, Apache APISIX allows replacement via environment variables in the configuration file. YAML apisix: node_listen: - ${{APISIX_NODE_LISTEN:=}} #1 deployment: admin: admin_listen: port: ${{DEPLOYMENT_ADMIN_ADMIN_LISTEN:=}} #1 Replace the placeholder with its environment variable value at runtime We can reuse the same file in every environment and hydrate it with the context-dependent environment variables: Shell docker run -it --rm \ -e APISIX_NODE_LISTEN=9090 \ -e DEPLOYMENT_ADMIN_ADMIN_LISTEN=9190 \ -p 9090:9090 -p 9190:9190 \ -v "$(pwd)/config.yaml":/usr/local/apisix/conf/apisix/config.yaml \ apache/apisix:3.4.1-debian As icing on the cake, we can also offer a default value: YAML apisix: node_listen: - ${{APISIX_NODE_LISTEN:=9080}} #1 deployment: admin: admin_listen: port: ${{DEPLOYMENT_ADMIN_ADMIN_LISTEN:=9180}} #1 If no environment variable is provided, use those ports; otherwise, use the environment variables' values The trick also works in standalone mode with the apisix.yaml file. You can parameterize every context-dependent parameter and secret with it: YAML routes: - uri: /* upstream: nodes: "httpbin:80": 1 plugins: openid-connect: client_id: apisix client_secret: ${{OIDC_SECRET}} discovery: https://${{OIDC_ISSUER}}/.well-known/openid-configuration redirect_uri: http://localhost:9080/callback scope: openid session: secret: ${{SESSION_SECRET}} A Docker Compose variant of the same environment-based approach is sketched at the end of this article. Conclusion When configuring Apache APISIX, we should ensure it's as operable as possible. In this post, I've described several ways to make it so. Happy Apache APISIX! To Go Further: Default configuration Configuration file switching based on environment variables
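As mentioned above, here is a minimal, hedged Docker Compose sketch of the same environment-based configuration. The service name, port values, and mounted config.yaml path simply mirror the docker run examples in this post; adjust them to your own setup.

YAML
# docker-compose.yaml - minimal sketch, assuming config.yaml sits next to this file
# and uses the APISIX_NODE_LISTEN / DEPLOYMENT_ADMIN_ADMIN_LISTEN placeholders shown earlier.
services:
  apisix:
    image: apache/apisix:3.4.1-debian
    environment:
      APISIX_NODE_LISTEN: "9090"
      DEPLOYMENT_ADMIN_ADMIN_LISTEN: "9190"
    ports:
      - "9090:9090"   # gateway port
      - "9190:9190"   # admin API port
    volumes:
      - ./config.yaml:/usr/local/apisix/conf/apisix/config.yaml:ro

Running docker compose up then behaves like the second docker run command above, with the environment values hydrating the shared config.yaml.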
Boris Zaikin
Lead Solution Architect,
CloudAstro GmBH
Pavan Belagatti
Developer Evangelist,
SingleStore
Nicolas Giron
Site Reliability Engineer (SRE),
KumoMind
Alireza Chegini
DevOps Architect / Azure Specialist,
Coding As Creating