Workflow automation tools play a pivotal role in today's dynamic digital landscape. They help streamline routine tasks, eradicate human errors, and increase productivity. With the help of workflow automation, organizations can automate processes, allowing teams to focus on strategic tasks. Whether it's data processing, application integration, or system monitoring, these tools provide scalable solutions to meet diverse needs. Amazon Web Services (AWS) offers a plethora of services geared towards automating process workflows. AWS Step Functions and AWS Managed Workflow for Apache Airflow (MWAA) are two such prominent services. Step Functions is a serverless workflow service that allows developers to coordinate multiple AWS services into serverless workflows. On the other hand, MWAA is a managed orchestration service for Apache Airflow, which is an open-source platform used to programmatically author, schedule, and monitor workflows. These robust tools have revolutionized businesses across sectors by simplifying complex processes and enhancing operational efficiency. In this article, I will delve into a comprehensive comparison between these two powerful tools, exploring their features, cost implications, ease of use, integration capabilities, and more. Task Management and Control Flow Logic AWS Step Functions shines in task management with its ability to break down complex procedures into manageable tasks. It coordinates different components of distributed applications and microservices using visual workflows, helping developers visualize and manage their applications' execution flow. Its error-handling capabilities allow automatic retries and fallback strategies, providing resilience against task failures and ensuring smooth execution of workflows. AWS MWAA, built on Apache Airflow, offers a rich set of task management features. It allows for defining, scheduling, and monitoring complex pipelines as code. This means developers can create Directed Acyclic Graphs (DAGs), which are series of tasks executed in a particular order. With Airflow's extensive operator library, developers can use pre-built tasks for common operations, enhancing the efficiency of task management. Both AWS Step Functions and AWS MWAA offer flexibility in implementing control flow logic. Step Functions support a range of state types, including Choice State for branching logic, Parallel State for concurrent execution, and Map State for processing array elements in parallel. Meanwhile, AWS MWAA uses Airflow's programming model, where workflows are defined as code, offering maximum flexibility in designing complex workflows. However, MWAA may require a steeper learning curve due to its coding requirement, while Step Functions provides a more intuitive, visual approach. Scalability and Performance AWS Step Functions is built to scale. It can handle thousands of tasks per second, making it a robust solution for large-scale, high-throughput use cases. Step Functions automatically scales the underlying infrastructure to meet the demands of your application, ensuring consistent performance without manual intervention. Moreover, it maintains the execution history for one year, aiding in process transparency and debugging. AWS MWAA offers auto-scaling capability with its integration with AWS Fargate. It scales out the number of task instances based on the demand of your workflows, ensuring efficient resource utilization. 
Furthermore, AWS MWAA isolates the resources for each environment, providing robust performance even for complex workloads. However, due to its reliance on the Airflow scheduler, there might be some delays in task execution during peak times. Both AWS Step Functions and AWS MWAA demonstrate high efficiency in handling large workloads. Step Functions, with its serverless architecture, ensures seamless scalability and consistent performance. AWS MWAA, with its auto-scaling feature and isolated environments, guarantees robust performance for complex workflows. The choice between the two depends on the specific requirements of your workload — if you prefer code-based workflows, AWS MWAA would be the better fit; if you prefer visual workflows with easy scalability, AWS Step Functions would be the way to go. Pricing and Cost Efficiency AWS Step Functions operates on a pay-as-you-go pricing model. You pay for each state transition, with the first 4,000 transitions free each month. For Express Workflows, which are designed for high-volume, short-duration use cases, you pay for the number of invocations, duration, and payload size. This pricing model can be cost-effective for sporadic or low-volume usage but may escalate for high-volume, complex workflows. AWS MWAA pricing is primarily based on the amount of vCPU and memory used per hour. You are billed for the total time your environment is running, regardless of the workload. Additionally, you pay for the storage used by your environment and the logs generated. Hence, while AWS MWAA provides robust capabilities, its pricing structure may result in higher costs for smaller workloads or idle environments. The cost-effectiveness of AWS Step Functions and AWS MWAA depends largely on your specific requirements. Step Functions, with its pay-per-use model, may prove more economical for small, infrequent workflows. Conversely, MWAA, with its charge based on resources, could be more cost-effective for large, complex workflows that run regularly. Therefore, a thorough understanding of your workflow patterns is crucial for a cost-benefit analysis. User Experience and Ease of Use AWS Step Functions boasts a sleek, visual interface for developing workflows. It enables users to design and manage workflows easily without needing extensive coding skills. The service also comes with pre-built templates for common tasks, further simplifying the user experience. However, complex workflows may still necessitate a solid understanding of JSON and the State Machine Language. AWS MWAA leverages the power of Apache Airflow, bringing along its rich feature set. With Airflow's code-as-configuration model, developers can define complex workflows with high precision. Although the learning curve for AWS MWAA might be steeper due to Airflow's programming paradigm, once mastered, it provides a high degree of flexibility and control over your workflows. While both AWS Step Functions and AWS MWAA offer powerful capabilities, there are differences in the user experience they provide. Step Functions, with its visual interface and pre-built templates, offers a more accessible entry point for beginners. On the other hand, AWS MWAA, with its code-based approach, provides a greater level of control and customization, making it a preferred choice for experienced developers and those handling complex, code-driven workflows. Integration With Other AWS Services AWS Step Functions offers seamless integration with a wide array of AWS services such as Lambda, ECS, SNS, DynamoDB, and more. 
This allows developers to orchestrate multiple AWS services into coordinated, automated workflows. Additionally, the service also supports integration with Amazon EventBridge, enabling event-driven workflows. AWS MWAA benefits from Apache Airflow's extensive ecosystem, which includes a rich collection of operators for integrating with various AWS services like RDS, Redshift, EMR, and more. Moreover, with the PythonOperator, developers can write custom Python code to interact with any AWS service, offering limitless possibilities for integration. Both AWS Step Functions and AWS MWAA excel in their integration capabilities. Step Functions' built-in support for various AWS services makes it easy to build workflows that leverage multiple AWS services. On the other hand, AWS MWAA, with its comprehensive library of operators and support for custom Python code, provides unmatched versatility in terms of integration. The choice between the two would again depend on your specific use case and requirements. Use Cases and Applicability AWS Step Functions is particularly suited for simplifying complex, distributed applications. It's ideal for orchestrating multiple AWS services, building serverless applications, automating operational tasks, and creating reliable, scalable data processing pipelines. With its ability to handle thousands of concurrent tasks, it's also a great fit for high-volume, high-throughput use cases. AWS MWAA shines in managing complex, code-based workflows. It's an excellent choice for data engineering tasks where intricate ETL jobs need to be scheduled and monitored. Furthermore, with its support for custom Python code, it's also a great tool for tasks requiring complex logic or integration with non-AWS services. Its auto-scaling capability makes it suitable for handling fluctuating workloads efficiently. The decision between AWS Step Functions and AWS MWAA largely depends on your specific application needs. If you're looking for an easy-to-use visual interface for building serverless workflows that integrate multiple AWS services, then AWS Step Functions might be the right choice. However, if you require highly customizable, code-based workflows, particularly for data engineering tasks, then AWS MWAA could be a better fit. Conclusion To make an informed decision between AWS Step Functions and AWS MWAA, it's crucial to thoroughly evaluate your specific system needs, workflow complexity, integration requirements, and budget constraints. Consider factors such as the level of coding expertise in your team, the scale and nature of your workflows, and the AWS services you need to integrate with. By aligning these considerations, you can select the workflow automation tool that best suits your organization's needs. Through this comparison, we've seen that both AWS Step Functions and AWS MWAA offer powerful features for workflow automation, albeit with distinct approaches. Step Functions excels in the visual orchestration of AWS services, error handling, scalability, and ease of use. AWS MWAA, on the other hand, stands out with its code-based workflows, extensive operator library, integration versatility, and robust performance for complex workloads. While pricing models differ, cost efficiency ultimately depends on your specific usage patterns.
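To make the two styles more concrete, here is a small illustrative sketch of the Step Functions approach: a state machine written in Amazon States Language with a Choice state, registered and started with the AWS CLI. All names, ARNs, and the business logic are placeholders, not real resources; the MWAA equivalent would be a Python DAG defining the same branching with Airflow operators.

Shell

# Illustrative only: a minimal state machine with one Choice state and two Task states.
# The role and Lambda ARNs are placeholders.
cat > order-workflow.json <<'EOF'
{
  "StartAt": "CheckOrderValue",
  "States": {
    "CheckOrderValue": {
      "Type": "Choice",
      "Choices": [
        { "Variable": "$.total", "NumericGreaterThan": 100, "Next": "ManualReview" }
      ],
      "Default": "AutoApprove"
    },
    "ManualReview": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:manual-review",
      "End": true
    },
    "AutoApprove": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:auto-approve",
      "End": true
    }
  }
}
EOF

# Register the workflow, then start one execution with a sample payload.
aws stepfunctions create-state-machine \
  --name order-workflow \
  --definition file://order-workflow.json \
  --role-arn arn:aws:iam::123456789012:role/step-functions-role

aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:order-workflow \
  --input '{"total": 250}'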
If you are looking to create App Connect resources programmatically or provide your own monitoring and administration capabilities, this article offers an introduction to using the public API, with a worked example using Bash for those looking to get started. With the App Connect public API, you can programmatically perform the following tasks: Create and administer integration runtimes. Create and administer configurations. Create and administer BAR Files. Create and administer traces for an integration runtime. In Part One, we are going to: Create a configuration that gives us access to a GitHub repository that stores a BAR file. If you use a public repository, the configuration won’t actually be used, since my repository is public. If you were to use a private repository, the configuration would be necessary so App Connect can authenticate and pull in the BAR file from your repository. Create an integration runtime that will use the configuration. Call the flow’s endpoint that's running on our integration runtime. In Part Two, we will then do something more advanced that involves: Creating a new BAR file that contains a different flow from what is in the GitHub repository. Creating a new integration runtime that will run this BAR file. Editing the first integration runtime to use the BAR file that we have created and observing the changes. We will finish by cleaning up all of the above resources. At the time of this post, you can use this feature in: North Virginia Frankfurt London Sydney Jakarta Mumbai Prerequisites You need a current App Connect trial or paid subscription. For more information, see App Connect on AWS. From the App Connect Dashboard for your instance, navigate to the Settings panel and click the "Public API credentials" tab. Click on the "Generate" button and enter a name when prompted. This will give you a client ID and a client secret. Keep these safe and consider them both sensitive. From the same page, you will see a link to generate an API key. You will need to follow the steps on that page as well to get an API key. Once you have the client ID, client secret, and API key, these are not expected to change frequently. The current expiry date at the time of this post is two years. You will use these three pieces of information as well as your App Connect instance ID to generate an access token that you can use to interact with App Connect. You will need to use this access token for your operations. Note that at the time of this post, this access token will be active for a period of 12 hours only. To get the access token, you will need to use your client ID, client secret, API key, and instance ID, with our /api/v1/tokens POST endpoint. Your token will not automatically renew, so make sure you call this often enough to be able to continue with your App Connect API requirements. Here is the API overview. Note that the format of the token is a JSON Web Token and within the token, you will be able to determine its expiry date. Continue only once you have your token. What follows is a worked example that combines the above information with code snippets for you to adjust for your own needs. Note that for this example I have chosen to use Bash. As our API accepts JSON payloads, you will find lots of quotes around the strings: it is expected that you might use a request helper application or perform your API calls using a language that lets you easily create HTTP requests such as JavaScript, TypeScript, Node.js, Go, or Java – the choice is yours. 
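Since the access token is a JSON Web Token, you can also check its expiry from the shell before reusing it. The following is a rough, illustrative sketch only; it assumes you have exported the token as appConToken (as we do in the next section) and that base64 and jq are available. Note that base64 -d may be -D on older macOS versions.

Shell

# Read the "exp" claim (a Unix timestamp) from the JWT payload segment.
payload=$(printf '%s' "$appConToken" | cut -d '.' -f 2 | tr '_-' '/+')
# JWTs use unpadded base64url, so restore the '=' padding before decoding.
case $(( ${#payload} % 4 )) in
  2) payload="${payload}==" ;;
  3) payload="${payload}=" ;;
esac
printf '%s' "$payload" | base64 -d | jq '.exp'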
To view the API documentation, click "API specification" on the Public API credentials page. You can also see the complete API specification in our documentation.

Getting Started

Make sure you have the following values to hand (in my example, I am going to set them as variables in my Bash shell, because they are going to be used a lot):

Your App Connect instance ID, which I will be setting and using in the cURL commands with: export appConInstanceID=<the instance ID>
Your App Connect client ID, which I will be setting and using in the cURL commands with: export appConClientID=<the client ID>
Your App Connect authentication token (that you made earlier and lasts twelve hours), which I will be setting and using in the cURL commands with: export appConToken=<the App Connect authentication token>
The App Connect endpoint that your instance is valid for, which I will be setting and using in the cURL commands with: export appConEndpoint=<the endpoint>

The endpoint can be set to any of the regions that we mention above (check the documentation for the latest availability), but be aware of your own data processing and any regional needs - you may want to use the endpoint closest to you, or you may have legal or data handling requirements to use only a particular region. You can determine this endpoint from the OpenAPI document that can be downloaded from within App Connect, or via the public documentation we provide.

Part One

Creating a Configuration

Note that the configuration documentation applicable for this environment is available on the configuration types for integration runtimes page. In this example, you will create a configuration that stores a personal access token for a GitHub repository that contains a BAR file. Make sure that you have your personal access token to hand, with sufficient access scope and restrictions as you see fit. It is important that you keep this token to yourself. This personal access token is not to be confused with the token you will be using with App Connect. It is an example of a sensitive piece of data (in the form of an App Connect configuration) and is used for simplicity's sake.

Shell

myGitHubToken="thetokenhere"

gitHubAuthData="{
  \"authType\":\"BASIC_AUTH\",
  \"credentials\":{
    \"username\":\"token\",
    \"password\":\"${myGitHubToken}\"
  }
}"

encodedAuthData=$(echo -e "${gitHubAuthData}" | base64)

configurationBody="{
  \"metadata\": {
    \"name\": \"my-github-configuration\"
  },
  \"spec\": {
    \"data\": \"${encodedAuthData}\",
    \"description\": \"Authentication for GitHub\",
    \"type\": \"barauth\"
  }
}"

curl -X POST https://${appConEndpoint}/api/v1/configurations \
  -H "x-ibm-instance-id: ${appConInstanceID}" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "X-IBM-Client-Id: ${appConClientID}" \
  -H "authorization: Bearer ${appConToken}" \
  -d "${configurationBody}"

If successful, you will then be able to see the configuration in the App Connect Dashboard. You could also perform an HTTP GET call to either list all configurations you have access to, or to get a particular configuration's details.
To get all instances of a particular resource (in this case, a configuration), you would use the following command:

Shell

curl -X GET https://${appConEndpoint}/api/v1/configurations \
  -H "x-ibm-instance-id: ${appConInstanceID}" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "X-IBM-Client-Id: ${appConClientID}" \
  -H "authorization: Bearer ${appConToken}"

To perform an HTTP GET operation on a named resource (in this case, called "my-github-configuration" that we created earlier), you would use the following command:

Shell

curl -X GET https://${appConEndpoint}/api/v1/configurations/my-github-configuration \
  -H "x-ibm-instance-id: ${appConInstanceID}" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "X-IBM-Client-Id: ${appConClientID}" \
  -H "authorization: Bearer ${appConToken}"

Creating an Integration Runtime Using the Configuration

For more information on creating an integration runtime, see the creating an integration runtime documentation. Now that we have a configuration, we can create an integration runtime that uses the configuration like so:

Shell

irBody='{
  "metadata": {
    "name": "http-echo-service"
  },
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "runtime"
          }
        ]
      }
    },
    "barURL": [
      "https://github.com/<your GitHub org>/<your repo with Bar files in>/raw/main/<your BAR file name>.bar"
    ],
    "configurations": [
      "my-github-configuration"
    ],
    "version": "12.0",
    "replicas": 1
  }
}'

curl -X POST https://${appConEndpoint}/api/v1/integration-runtimes \
  -H "x-ibm-instance-id: ${appConInstanceID}" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "X-IBM-Client-Id: ${appConClientID}" \
  -H "authorization: Bearer ${appConToken}" \
  -d "${irBody}"

Use the "spec" field to shape your resource. For more information, see the documentation. Note that we prevent the use of resource names that have certain prefixes in their name; e.g., "default-integration-runtime" (nor do we allow you to get, retrieve, create again, or delete it). You can also see this integration runtime running and using the configuration in the App Connect Dashboard. If you click to edit the integration runtime, you can see the configuration that is in use. In my case, I am using my own GitHub repository, so I would see this:

Once the integration runtime has started successfully, you will be able to invoke your flow as you would any other flow in App Connect.

Invoking the Endpoint

This depends on the flow that's defined in the BAR file that you've used. In my example, it is a simple HTTP echo service, and I can use the following cURL command to invoke it and receive a response. Consult the App Connect documentation for how you would retrieve the endpoint to use in this case.

This request:

curl -X POST https://http-echo-service-https-<my App Connect endpoint which has my instance ID in>/Echo

Gave me this response:

<Echo><DateStamp>2023-08-07T13:04:13.143955Z</DateStamp></Echo>

Part Two

At this point, we know how to use the API to create a couple of simple resources. What we have not yet covered is uploading a BAR file for use in a new integration runtime. We have not yet covered the editing or deleting of resources either.

Uploading a New BAR File

These instructions assume that you already have a BAR file. To learn how to create a BAR file, refer to the App Connect Enterprise documentation. For more information about what resources are supported in BAR files that you import to App Connect, see the supported resources in imported BAR files documentation.
You can use the App Connect API to upload a BAR file like so:

Shell

curl -X PUT https://${appConEndpoint}/api/v1/bar-files/TestBlogAPI \
  -H "x-ibm-instance-id: ${appConInstanceID}" \
  -H "Content-Type: application/octet-stream" \
  -H "Accept: application/json" \
  -H "X-IBM-Client-Id: ${appConClientID}" \
  -H "authorization: Bearer ${appConToken}" \
  --data-binary @/Users/adam/Downloads/TestBlogAPI.bar

Notice the use of "--data-binary" here in order to prevent the BAR file from being unusable once uploaded. It is important to note that at the time of this post, the validation of the BAR file occurs when it is used by an integration runtime. In my case, I have downloaded the BAR file from my GitHub repository. You don't need to include ".bar" in the path for the API call, either. If successful, you will be able to see this BAR file in the App Connect Dashboard. Also, in the HTTP response, you will see the location of this BAR file on the App Connect content server. It will be of the form:

{"name":"TestBlogAPI.bar","url":"https://dataplane-api-dash.appconnect:3443/v1/ac0ikbdsupj/directories/TestBlogAPI?"}

Important: you will need to use the "url" part of this in your next command in order to have the integration runtime use this BAR file.

Creating a New Integration Runtime That Uses the New BAR File

Shell

irBody='{
  "metadata": {
    "name": "second-ir-using-bar"
  },
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "runtime"
          }
        ]
      }
    },
    "barURL": [
      "the exact URL from the previous step – including the question mark"
    ],
    "version": "12.0",
    "replicas": 1
  }
}'

curl -X POST https://${appConEndpoint}/api/v1/integration-runtimes \
  -H "x-ibm-instance-id: ${appConInstanceID}" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "X-IBM-Client-Id: ${appConClientID}" \
  -H "authorization: Bearer ${appConToken}" \
  -d "${irBody}"

The key differences here are the removal of the configurations section and the change in barURL. On success, you should be able to see this integration runtime in the App Connect Dashboard.

Updating the First Integration Runtime To Use This BAR File Instead

In my example, I now have two integration runtimes that use the same BAR file because I used the one from my GitHub repository. Let's assume that I want to:

Keep the first integration runtime
Have it use this BAR file that I've uploaded using the API (instead of pulling from GitHub)
Delete the second integration runtime

We can do this like so:

Shell

irBody='{
  "metadata": {
    "name": "http-echo-service"
  },
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "runtime"
          }
        ]
      }
    },
    "barURL": [
      "the exact URL from earlier – including the question mark"
    ],
    "version": "12.0",
    "replicas": 1
  }
}'

curl -X PUT https://${appConEndpoint}/api/v1/integration-runtimes/http-echo-service \
  -H "x-ibm-instance-id: ${appConInstanceID}" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "X-IBM-Client-Id: ${appConClientID}" \
  -H "authorization: Bearer ${appConToken}" \
  -d "${irBody}"

The BAR URL differs, and we no longer need to provide a configurations section, because the BAR file is now served from the App Connect content server rather than pulled from a GitHub repository, so no repository authorization is required. On success, you will again be able to see this integration runtime in the App Connect Dashboard.

Cleaning Up

Each resource can be cleaned up programmatically through its appropriate delete HTTP request API calls. The order in which you perform these operations doesn't matter.
To delete the configuration created in this example: Shell curl -X DELETE https://${appConEndpoint}/api/v1/configurations/my-github-configuration\ -H "x-ibm-instance-id: ${appConInstanceID}" \ -H "Content-Type: application/json" \ -H "Accept: application/json" \ -H "X-IBM-Client-Id: ${appConClientID}" \ -H "authorization: Bearer ${appConToken}" To delete the BAR file created in this example: Shell curl -X DELETE https://${appConEndpoint}/api/v1/bar-files/TestBlogAPI \ -H "x-ibm-instance-id: ${appConInstanceID}" \ -H "Content-Type: application/json" \ -H "Accept: application/json" \ -H "X-IBM-Client-Id: ${appConClientID}" \ -H "authorization: Bearer ${appConToken}" To delete both integration runtimes created in this example (although if you were following my commands, you should only have the first one): Shell curl -X DELETE https://${appConEndpoint}/api/v1/integration-runtimes/http-echo-service \ -H "x-ibm-instance-id: ${appConInstanceID}" \ -H "Content-Type: application/json" \ -H "Accept: application/json" \ -H "X-IBM-Client-Id: ${appConClientID}" \ -H "authorization: Bearer ${appConToken}" Shell curl -X DELETE https://${appConEndpoint}/api/v1/integration-runtimes/second-ir-using-bar \ -H "x-ibm-instance-id: ${appConInstanceID}" \ -H "Content-Type: application/json" \ -H "Accept: application/json" \ -H "X-IBM-Client-Id: ${appConClientID}" \ -H "authorization: Bearer ${appConToken}" Conclusion Using the public API, you can programmatically create resources for App Connect as you see fit (within the limits we define). The API provides an alternative to using the App Connect Dashboard, although you will be able to see changes in the Dashboard that you've made through the API. The public API provides equivalent functionality to the Dashboard only, and not to the App Connect Designer or App Connect Enterprise Toolkit. This example has demonstrated a broad range of features but more intricate scenarios using different connectors and flows can be explored.
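One small convenience worth mentioning: every call in this article repeats the same five headers, so for ad hoc scripting you could wrap them in a helper function. This is purely an illustrative sketch (it assumes the four variables exported earlier), not part of the App Connect API itself:

Shell

# Illustrative helper: wraps the common App Connect headers around curl.
# Assumes appConEndpoint, appConInstanceID, appConClientID, and appConToken are exported.
appcon() {
  local method="$1" path="$2" body="${3:-}"
  local args=(-X "$method" "https://${appConEndpoint}${path}"
    -H "x-ibm-instance-id: ${appConInstanceID}"
    -H "Content-Type: application/json"
    -H "Accept: application/json"
    -H "X-IBM-Client-Id: ${appConClientID}"
    -H "authorization: Bearer ${appConToken}")
  [ -n "$body" ] && args+=(-d "$body")
  curl "${args[@]}"
}

# For example, list all configurations, then fetch the one created in this article.
appcon GET /api/v1/configurations
appcon GET /api/v1/configurations/my-github-configuration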
Kubernetes is a popular open-source platform used for automating the deployment, scaling, and management of containerized applications. It provides a powerful API for managing resources, but sometimes its built-in resources are not sufficient for your use case. That's where Kubernetes Custom Resource Definitions (CRDs) come in. CRDs allow you to define your own custom resources, which can be managed in the same way as built-in resources like pods and services. In this tutorial, we'll go through the steps to implement a Kubernetes CRD.

Prerequisites

To follow along with this tutorial, you'll need:

A Kubernetes cluster with kubectl installed and configured.
The Kubernetes API Server running with RBAC (Role-Based Access Control) enabled.
Basic understanding of Kubernetes resource manifests and YAML.

Step 1: Define the CRD

First, we'll define the YAML file that describes our CRD. This file specifies the name, version, and schema of the custom resource. For example, let's create a CRD for a fictional application called "myapp" with a version of "v1beta1":

YAML

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: myapps.example.com
spec:
  group: example.com
  versions:
    - name: v1beta1
      served: true
      storage: true
  scope: Namespaced
  names:
    plural: myapps
    singular: myapp
    kind: Myapp

YAML to Create CRD Definition

In this YAML file:

apiVersion: The version of the Kubernetes API to use for this object.
kind: The kind of object this is (in this case, a CustomResourceDefinition).
metadata: Metadata for the object, including its name.
spec: The specification for the CRD.
group: The API group for the custom resource.
versions: A list of supported versions for the custom resource.
name: The name of the version.
served: Whether the version is served by the API server.
storage: Whether the version should be persisted.
scope: The scope of the custom resource (Cluster or Namespaced).
names: The names used to refer to the custom resource.
plural: The plural name of the resource.
singular: The singular name of the resource.
kind: The kind of the resource.
shortNames: A list of short names for the resource.

Save this YAML file as myapp-crd.yaml.

Step 2: Create the CRD

Next, we'll use kubectl to create the CRD in our Kubernetes cluster:

$ kubectl create -f myapp-crd.yaml

Create CRD Definition

This will create the CRD myapps.example.com in the Kubernetes cluster.

Step 3: Define the Custom Resource

Now that we've defined the CRD, we can define the custom resource that will use this CRD. In this example, we'll create a YAML file that defines a custom resource for myapp:

YAML

apiVersion: example.com/v1beta1
kind: Myapp
metadata:
  name: myapp-sample
spec:
  replicas: 3
  image: nginx:latest

YAML to create the custom resource

In this YAML file, we define the following:

apiVersion: The API group and version of the custom resource. Here, it's example.com/v1beta1.
kind: The kind of the custom resource. Here, it's Myapp.
metadata: Metadata associated with the custom resource. Here, we set the name to myapp-sample.
spec: The specification of the custom resource. Here, we specify the number of replicas and the image to use.

Save this YAML file as myapp-sample.yaml.

Step 4: Create the Custom Resource

Next, we'll use kubectl to create the custom resource in our Kubernetes cluster:

$ kubectl create -f myapp-sample.yaml

This will create the custom resource myapp-sample in the Kubernetes cluster.
Step 5: View the Custom Resource

To confirm that the CRD itself was registered, list the CRDs in the cluster and check that myapps.example.com appears:

kubectl get crd

Command to check CRD

To view the custom resource we just created, query the new resource type by its plural name:

kubectl get myapps

You should see the myapp-sample resource listed.

Conclusion

Kubernetes Custom Resource Definitions (CRDs) are a powerful feature that allows you to extend Kubernetes with your own custom resources. With CRDs, you can create your own Kubernetes API resources, which can be used just like any other native Kubernetes resource. With these simple steps, you can easily create CRDs in your cluster.
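One compatibility note if you are following along on a recent cluster: the apiextensions.k8s.io/v1beta1 CRD API used above stopped being served in Kubernetes 1.22, so current clusters require the v1 API, which also expects an explicit schema. A minimal v1 equivalent of the same definition, applied here via a heredoc, would look roughly like this:

Shell

# Sketch of the same CRD using the v1 API (required on Kubernetes 1.22+),
# which additionally needs an OpenAPI v3 schema for each served version.
kubectl apply -f - <<'EOF'
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: myapps.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: myapps
    singular: myapp
    kind: Myapp
  versions:
    - name: v1beta1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
                image:
                  type: string
EOF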
Shift-left is an approach to software development and operations that emphasizes testing, monitoring, and automation earlier in the software development lifecycle. The goal of the shift-left approach is to prevent problems before they arise by catching them early and addressing them quickly. When you identify a scalability issue or a bug early, it is quicker and more cost-effective to resolve it. Moving inefficient code to cloud containers can be costly, as it may activate auto-scaling and increase your monthly bill. Furthermore, you will be in a state of emergency until you can identify, isolate, and fix the issue. The Problem Statement I would like to demonstrate to you a case where we managed to avert a potential issue with an application that could have caused a major issue in a production environment. I was reviewing the performance report of the UAT infrastructure following the recent application change. It was a Spring Boot microservice with MariaDB as the backend, running behind Apache reverse proxy and AWS application load balancer. The new feature was successfully integrated, and all UAT test cases are passed. However, I noticed the performance charts in the MariaDB performance dashboard deviated from pre-deployment patterns. This is the timeline of the events. On August 6th at 14:13, The application was restarted with a new Spring Boot jar file containing an embedded Tomcat. Application restarts after migration At 14:52, the query processing rate for MariaDB increased from 0.1 to 88 queries per second and then to 301 queries per second. Increase in query rate Additionally, the system CPU was elevated from 1% to 6%. Raise in CPU utilization Finally, the JVM time spent on the G1 Young Generation Garbage Collection increased from 0% to 0.1% and remained at that level. Increase in GC time on JVM The application, in its UAT phase, is abnormally issuing 300 queries/sec, which is far beyond what it was designed to do. The new feature has caused an increase in database connection, which is why the increase in queries is so drastic. However, the monitoring dashboard showed that the problematic measures were normal before the new version was deployed. The Resolution It is a Spring Boot application that uses JPA to query a MariaDB. The application is designed to run on two containers for minimal load but is expected to scale up to ten. Web - app - db topology If a single container can generate 300 queries per second, can it handle 3000 queries per second if all ten containers are operational? Can the database have enough connections to meet the needs of the other parts of the application? We had no other choice but to go back to the developer's table to inspect the changes in Git. The new change will take a few records from a table and process them. This is what we observed in the service class. List<X> findAll = this.xRepository.findAll(); No, using the findAll() method without pagination in Spring's CrudRepository is not efficient. Pagination helps to reduce the amount of time it takes to retrieve data from the database by limiting the amount of data fetched. This is what our primary RDBMS education taught us. Additionally, pagination helps to keep memory usage low to prevent the application from crashing due to an overload of data, as well as reducing the Garbage Collection effort of Java Virtual Machine, which was mentioned in the problem statement above. This test was conducted using only 2,000 records in one container. 
If this code were to move to production, where there are around 200,000 records in up to 10 containers, it could have caused the team a lot of stress and worry that day. The application was rebuilt with the addition of a WHERE clause to the method. List<X> findAll = this.xRepository.findAllByY(Y); The normal functioning was restored. The number of queries per second was decreased from 300 to 30, and the amount of effort put into garbage collection returned to its original level. Additionally, the system's CPU usage decreased. Query rate becomes normal Learning and Summary Anyone who works in Site Reliability Engineering (SRE) will appreciate the significance of this discovery. We were able to act upon it without having to raise a Severity 1 flag. If this flawed package had been deployed in production, it could have triggered the customer's auto-scaling threshold, resulting in new containers being launched even without an additional user load. There are three main takeaways from this story. Firstly, it is best practice to turn on an observability solution from the beginning, as it can provide a history of events that can be used to identify potential issues. Without this history, I might not have taken a 0.1% Garbage Collection percentage and 6% CPU consumption seriously, and the code could have been released into production with disastrous consequences. Expanding the scope of the monitoring solution to UAT servers helped the team to identify potential root causes and prevent problems before they occur. Secondly, performance-related test cases should exist in the testing process, and these should be reviewed by someone with experience in observability. This will ensure the functionality of the code is tested, as well as its performance. Thirdly, cloud-native performance tracking techniques are good for receiving alerts about high utilization, availability, etc. To achieve observability, you may need to have the right tools and expertise in place. Happy Coding!
AI is under the hype now, and some products overuse the AI topic a lot — however, many companies and products are automating their processes using this technology. In the article, we will discover AI products and build an AI landing zone. Let’s look into the top 3 companies that benefit from using AI. Github Copilot Github Copilot’s primary objective is to aid programmers by providing code suggestions and auto-completing lines or blocks of code while they write. By intelligently analyzing the context and existing code, it accelerates the coding process and enhances developer productivity. It becomes an invaluable companion for developers throughout their coding journey, capable of supporting various programming languages and comprehending code patterns. Neuraltext Neuraltext strives to encompass the entire content workflow, encompassing everything from generating ideas to executing them, all powered by AI. It is an AI-driven copywriter, SEO content, and keyword research tool. By leveraging AI copywriting capabilities, you can effortlessly produce compelling copy for your campaigns, generating numerous variations. With a vast collection of over 50 pre-designed templates for various purposes, such as Facebook ads, slogan ideas, blog sections, and more, Neuraltext simplifies the content creation process. Motum Motum is the intelligent operating system for operational fleet management. It has damage recognition that uses computer vision and machine learning algorithms to detect and assess damages to vehicles automatically. By analyzing images of vehicles, the AI system can accurately identify dents, scratches, cracks, and other types of damage. This technology streamlines the inspection process for insurance claims, auto body shops, and vehicle appraisals, saving time and improving accuracy in assessing the extent of damages. What Is a Cloud Landing Zone? AI Cloud landing zone is a framework that includes fundamental cloud services, tools, and infrastructure that form the basis for developing and deploying artificial intelligence (AI) solutions. What AI Services Are Included in the Landing Zone? Azure AI Landing zone includes the following AI services: Azure Open AI — Provides pre-built AI models and APIs for tasks like image recognition, natural language processing, and sentiment analysis, making it easier for developers to incorporate AI functionalities; Azure AI services also include machine learning tools and frameworks for building custom models and conducting data analysis. Azure AI Services — A service that enables organizations to create more immersive, personalized, and intelligent experiences for their users, driving innovation and efficiency in various industries; Developers can leverage these pre-built APIs to add intelligent features to their applications, such as face recognition, language understanding, and sentiment analysis, without extensive AI expertise. Azure Bot Services — This is a platform Microsoft Azure provides and is part of AI Services. It enables developers to create chatbots and conversational agents to interact with users across various channels, such as web chat, Microsoft Teams, Skype, Telegram, and other platforms. Architecture We started integrating and deploying the Azure AI Landing Zone into our environment. Three logical boxes separate the AI landing zone: Azure DevOps Pipelines Terraform Modules and Environments Resources that deployed to Azure Subscriptions We can see it in the diagram below. 
Figure 1: AI Landing Zone Architecture (author: Boris Zaikin)

The architecture contains CI/CD YAML pipelines and Terraform modules for each Azure subscription. It contains two YAML files:

tf-provision-ci.yaml is the main pipeline that is based on stages. It reuses the tf-provision-ci-jobs.yaml pipeline for each environment.
tf-provision-ci-jobs.yaml contains the workflow to deploy the Terraform modules.

YAML

trigger:
- none

pool:
  vmImage: 'ubuntu-latest'

variables:
  devTerraformDirectory: "$(System.DefaultWorkingDirectory)/src/tf/dev"
  testTerraformDirectory: "$(System.DefaultWorkingDirectory)/src/tf/test"
  prodTerraformDirectory: "$(System.DefaultWorkingDirectory)/src/tf/prod"

stages:
- stage: Dev
  jobs:
  - template: tf-provision-ci-jobs.yaml
    parameters:
      environment: dev
      subscription: 'terraform-spn'
      workingTerraformDirectory: $(devTerraformDirectory)
      backendAzureRmResourceGroupName: '<tfstate-rg>'
      backendAzureRmStorageAccountName: '<tfaccountname>'
      backendAzureRmContainerName: '<tf-container-name>'
      backendAzureRmKey: 'terraform.tfstate'

- stage: Test
  jobs:
  - template: tf-provision-ci-jobs.yaml
    parameters:
      environment: test
      subscription: 'terraform-spn'
      workingTerraformDirectory: $(testTerraformDirectory)
      backendAzureRmResourceGroupName: '<tfstate-rg>'
      backendAzureRmStorageAccountName: '<tfaccountname>'
      backendAzureRmContainerName: '<tf-container-name>'
      backendAzureRmKey: 'terraform.tfstate'

- stage: Prod
  jobs:
  - template: tf-provision-ci-jobs.yaml
    parameters:
      environment: prod
      subscription: 'terraform-spn'
      workingTerraformDirectory: $(prodTerraformDirectory)
      backendAzureRmResourceGroupName: '<tfstate-rg>'
      backendAzureRmStorageAccountName: '<tfaccountname>'
      backendAzureRmContainerName: '<tf-container-name>'
      backendAzureRmKey: 'terraform.tfstate'

tf-provision-ci.yaml — Contains the main configuration, variables, and stages: Dev, Test, and Prod. The pipeline reuses tf-provision-ci-jobs.yaml in each stage by providing different parameters. After we've added the pipeline to Azure DevOps and executed it, we can see the following stage structure.

Figure 2: Azure DevOps Stages UI

Azure DevOps automatically recognizes the stages in the main YAML pipeline and provides a proper UI. Let's look into tf-provision-ci-jobs.yaml.
YAML

jobs:
- deployment: deploy
  displayName: AI LZ Deployments
  pool:
    vmImage: 'ubuntu-latest'
  environment: ${{ parameters.environment }}
  strategy:
    runOnce:
      deploy:
        steps:
        - checkout: self

        # Prepare working directory for other commands
        - task: TerraformTaskV3@3
          displayName: Initialise Terraform Configuration
          inputs:
            provider: 'azurerm'
            command: 'init'
            workingDirectory: ${{ parameters.workingTerraformDirectory }}
            backendServiceArm: ${{ parameters.subscription }}
            backendAzureRmResourceGroupName: ${{ parameters.backendAzureRmResourceGroupName }}
            backendAzureRmStorageAccountName: ${{ parameters.backendAzureRmStorageAccountName }}
            backendAzureRmContainerName: ${{ parameters.backendAzureRmContainerName }}
            backendAzureRmKey: ${{ parameters.backendAzureRmKey }}

        # Show the current state or a saved plan
        - task: TerraformTaskV3@3
          displayName: Show the current state or a saved plan
          inputs:
            provider: 'azurerm'
            command: 'show'
            outputTo: 'console'
            outputFormat: 'default'
            workingDirectory: ${{ parameters.workingTerraformDirectory }}
            environmentServiceNameAzureRM: ${{ parameters.subscription }}

        # Validate Terraform Configuration
        - task: TerraformTaskV3@3
          displayName: Validate Terraform Configuration
          inputs:
            provider: 'azurerm'
            command: 'validate'
            workingDirectory: ${{ parameters.workingTerraformDirectory }}

        # Show changes required by the current configuration
        - task: TerraformTaskV3@3
          displayName: Build Terraform Plan
          inputs:
            provider: 'azurerm'
            command: 'plan'
            workingDirectory: ${{ parameters.workingTerraformDirectory }}
            environmentServiceNameAzureRM: ${{ parameters.subscription }}

        # Create or update infrastructure
        - task: TerraformTaskV3@3
          displayName: Apply Terraform Plan
          continueOnError: true
          inputs:
            provider: 'azurerm'
            command: 'apply'
            environmentServiceNameAzureRM: ${{ parameters.subscription }}
            workingDirectory: ${{ parameters.workingTerraformDirectory }}

tf-provision-ci-jobs.yaml — Contains the Terraform tasks, including init, show, validate, plan, and apply. Below, we can see the execution process.

Figure 3: Azure DevOps Landing Zone Deployment UI

As we can see, all pipeline stages execute successfully, and each job provides detailed information about state, configuration, and validation errors. Also, we must not forget to fill out the Request Access Form. It takes a couple of days to get a response back. Otherwise, the pipeline will fail with a quota error message.

Terraform Scripts and Modules

By utilizing Terraform, we can encapsulate the code within a Terraform module, allowing for its reuse across various sections of our codebase. This eliminates the need for duplicating and replicating the same code in multiple environments, such as staging and production. Instead, both environments can leverage code from a shared module, promoting code reusability and reducing redundancy. A Terraform module can be defined as a collection of Terraform configuration files organized within a folder. Technically, all the configurations you have written thus far can be considered modules, although they may not be complex or reusable. When you directly deploy a module by running "apply" on it, it is called a root module. However, to truly explore the capabilities of modules, you need to create reusable modules intended for use within other modules. These reusable modules offer greater flexibility and can significantly enhance your Terraform infrastructure deployments. Let's look at the project structure below.
Figure 4: Terraform Project Structure Modules The image above shows that all resources are placed in one Module directory. Each Environment has its directory, index terraform file, and variables where all resources are reused in an index.tf file with different parameters that are inside variable files. We will place all resources in a separate file in the module, and all values will be put into Terraform variables. This allows managing the code quickly and reduces hardcoded values. Also, resource granularity allows organized teamwork with a GIT or other source control (fewer merge conflicts). Let’s have a look into the open-ai tf module. YAML resource "azurerm_cognitive_account" "openai" { name = var.name location = var.location resource_group_name = var.resource_group_name kind = "OpenAI" custom_subdomain_name = var.custom_subdomain_name sku_name = var.sku_name public_network_access_enabled = var.public_network_access_enabled tags = var.tags identity { type = "SystemAssigned" } lifecycle { ignore_changes = [ tags ] } } The Open AI essential parameters lists: prefix: Sets a prefix for all Azure resources domain: Specifies the domain part of the hostname used to expose the chatbot through the Ingress Controller subdomain: Defines the subdomain part of the hostname used for exposing the chatbot via the Ingress Controller namespace: Specifies the namespace of the workload application that accesses the Azure OpenAI Service service_account_name: Specifies the name of the service account used by the workload application to access the Azure OpenAI Service vm_enabled: A boolean value determining whether to deploy a virtual machine in the same virtual network as the AKS cluster location: Specifies the region (e.g., westeurope) for deploying the Azure resources admin_group_object_ids: The array parameter contains the list of Azure AD group object IDs with admin role access to the cluster. We need to pay attention to the subdomain parameters. Azure Cognitive Services utilize custom subdomain names for each resource created through Azure tools such as the Azure portal, Azure Cloud Shell, Azure CLI, Bicep, Azure Resource Manager (ARM), or Terraform. These custom subdomain names are unique to each resource and differ from regional endpoints previously shared among customers in a specific Azure region. Custom subdomain names are necessary for enabling authentication features like Azure Active Directory (Azure AD). Specifying a custom subdomain for our Azure OpenAI Service is essential in some cases. Other parameters can be found in “Create a resource and deploy a model using Azure OpenAI.” In the Next Article Add an Az private endpoint into the configuration: A significant aspect of Azure Open AI is its utilization of a private endpoint, enabling precise control over access to your Azure Open AI services. With private endpoint, you can limit access to your services to only the necessary resources within your virtual network. This ensures the safety and security of your services while still permitting authorized resources to access them as required. Integrate OpenAI with Aazure Kubernetes Services: Integrating OpenAI services with a Kubernetes cluster enables efficient management, scalability, and high availability of AI applications, making it an ideal choice for running AI workloads in a production environment. Describe and compare our lightweight landing zone and OpenAI landing zone from Microsoft. 
Project Repository

GitHub - Boriszn/Azure-AI-LandingZone

Conclusion

This article explores AI products and creating an AI landing zone. We highlight three key players benefiting from AI: GitHub Copilot for coding help, Neuraltext for AI-driven content, and Motum for AI-based fleet management. Moving to AI landing zones, we focus on Azure AI services, such as Azure OpenAI, with pre-built models and APIs. We delve into the architecture using Terraform and CI/CD pipelines, where Terraform's modular approach is vital, emphasizing reusability. We then walk through the Open AI module parameters, especially custom subdomains for Azure Cognitive Services. In this AI-driven era, automation and intelligent decisions are revolutionizing technology.
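If you want to exercise the Terraform side locally before wiring it into Azure DevOps, the TerraformTaskV3 steps shown earlier map onto plain CLI commands. The following is only a sketch for the dev environment; the backend values are placeholders mirroring the pipeline parameters, and the directory comes from the devTerraformDirectory variable, so adjust both to your repository:

Shell

# Illustrative local equivalent of the pipeline's init/validate/plan/apply steps.
cd src/tf/dev
terraform init \
  -backend-config="resource_group_name=<tfstate-rg>" \
  -backend-config="storage_account_name=<tfaccountname>" \
  -backend-config="container_name=<tf-container-name>" \
  -backend-config="key=terraform.tfstate"
terraform validate
terraform plan -out=dev.tfplan
terraform apply dev.tfplan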
Running a large-scale application serving customers across multiple regions can be challenging. You can use different methods to control the Kubernetes scheduler, but if you long for high availability and efficiency across different failure domains, it’s time to explore topology spread constraints. Optimizing performance and availability can be tricky even when your systems don’t comprise numerous data centers across different countries or continents. Node Selector, Node Affinity, Pod Affinity, and Anti-affinity can help you tame the Kubernetes scheduler, but they may be insufficient for setups with multiple failure domains like regions, zones, and nodes. By configuring topology spread constraints, you can ensure that your workloads stay online even if there is an outage or hardware failure in one zone. They can also let you gain a much finer level of control over your Pods’ distribution and support rolling update workloads and scaling replicas smoothly. Here’s how Pod topology spreads work in Kubernetes clusters and how to use them. Explaining Pod Topology Spreads Although the name may initially sound like a philosophical concept, when it comes to the cloud, topology refers to the arrangement of elements within a network. By configuring Pod topology spreads, you get fine-grained control over the distribution of pods across the failure domains. A failure domain refers to a set of resources that can be negatively impacted in case of a failure. In the context of Kubernetes, there are three main types of such domains: Node failure domain refers to individual nodes within a cluster. If a node fails or becomes unreachable, it can affect the Pods running on it. Availability Zone failure domain represents distinct zones within a cloud provider’s infrastructure. Region failure domain involves a broader geographical region and comprises multiple Availability Zones (AZs). Let’s now consider two examples to clarify their significance better. Mitigating Different Failure Domains With Pod Topology Spreads First, imagine a cluster of twenty nodes. You want to run a workload that automatically scales its replica number. It can scale anywhere from two to twenty Pods, and you want to run those replicas on as many separate nodes as possible. This approach helps to minimize the risk of a node failure affecting the workload. Then let’s think about an application with fifteen replicas running on three nodes in the same Availability Zone, with five Pods on each node. You’ve mitigated the node failure risk, but clients interacting with the workload come from three distinct zones — and traffic spanning different AZs results in higher latency and network costs. You can reduce them by distributing Pods across nodes in different AZs and routing clients to the instances inside the relevant zone. Deploying the workload in multiple zones in addition to running in on several nodes further decreases the risk of a failure affecting your Pods. Normally, you’d want to distribute workloads evenly across every failure domain. You can configure that with pod topology constraints — and to do so, use the spec.topologySpreadConstraints field. 
How Pod Topology Spread Constraints Work Here’s an example of a pod topology spread constraint: apiVersion: v1 kind: Pod metadata: name: example-pod spec: # Configure a topology spread constraint topologySpreadConstraints: - maxSkew: <integer> minDomains: <integer> # optional; topologyKey: <string> whenUnsatisfiable: <string> labelSelector: <object> matchLabelKeys: <list> # optional; nodeAffinityPolicy: [Honor|Ignore] # optional; nodeTaintsPolicy: [Honor|Ignore] # optional; You can find a full explanation of each element in the Kubernetes documentation. For now, let’s just briefly outline the obligatory fields: maxSkew is the degree to which your Pods can be distributed unevenly across all zones. Its value must be more than zero. topologyKey is the key of node labels. Nodes with the same label and values belong to the same topology. Each topology instance is a domain to which the scheduler tries to assign a balanced number of pods. whenUnsatisfiable lets you decide what to do with a Pod when it doesn’t satisfy your spread constraint:1. DoNotSchedule instructs the scheduler not to schedule it.2. ScheduleAnyway tells the scheduler to schedule it and prioritize the nodes minimizing the skew. labelSelector allows finding matching Pods. The number of Pods in their corresponding topology domain is based on the Pods matching the label selector. Pod Topology Spread’s Relation to Other Scheduling Policies Before topology spread constraints, Pod Affinity and Anti-affinity were the only rules to achieve similar distribution results. But their uses are limited to two main rules: Prefer or require an unlimited number of Pods to only run on a specific set of nodes; Try to avoid running more than one Pod on the same node. As a more flexible alternative to Pod Affinity/anti-Affinity, topology spread constraints let you separate nodes into groups and assign Pods using a label selector. They also allow you to instruct the scheduler on how (un)evenly to distribute those Pods. Topology spread constraints can overlap with other scheduling policies like Node Selector or taints. The last two fields of a pod topology spread let you decide on the nature of these relations: nodeAffinityPolicy lets you decide how to treat your Pod’s nodeAffinity and nodeSelector when calculating the topology spread skew. You get two options:1. Honor only includes nodes matching nodeAffinity and nodeSelector.2. Ignore, ignoring these settings and including all nodes in the calculations.The Honor approach is the default if you leave this field empty. nodeTaintsPolicy indicates how you wish to treat node taints when calculating pod topology spread skew. Here you also get two options: Honor or Ignore, with the latter being followed if you leave this field empty. How to Use Topology Spread Constraints Pod spread constraints rely on Kubernetes labels to identify the topology domains that each node is in. 
For example, a node may have labels like this: region: us-west-1 zone: us-west-1a So if you have a cluster with four nodes with the following labels: NAME STATUS ROLES AGE VERSION LABELS node1 Ready <none> 2m26s v1.16.0 node=node1,zone=zoneA node2 Ready <none> 7m48s v1.16.0 node=node2,zone=zoneA node3 Ready <none> 2m55s v1.16.0 node=node3,zone=zoneB node4 Ready <none> 2m43s v1.16.0 node=node4,zone=zoneB Then the cluster view would be like this: +---------------+---------------+ | zoneA | zoneB | +-------+-------+-------+-------+ | node1 | node2 | node3 | node4 | +-------+-------+-------+-------+ You can check the Kubernetes documentation for more examples of topology spread constraints. Applying labels can get messy, so you need a mechanism to ensure consistent labeling. To avoid having to apply labels manually, most clusters automatically populate well-known labels such as kubernetes.io/region. When there is more than one topologySpreadConstraint describing a Pod, all these constraints get combined using a logical AND operation. The Kubernetes scheduler then looks for a node satisfying all these constraints. Toward More Affordable Efficiency and Fault-Tolerance Topology spread constraints help to ensure high availability and fault-tolerance in distributed systems. When combining them with CAST AI’s ability to identify and provision cheaper nodes, you can also benefit in terms of optimizing costs. Here’s an example of a workload deployment utilizing spot instances for cost savings and topology spread constraints for increased fault-tolerance: apiVersion: apps/v1 kind: Deployment metadata: name: nginx-deployment spec: selector: matchLabels: app: nginx replicas: 8 template: metadata: labels: app: nginx spec: tolerations: - key: scheduling.cast.ai/spot operator: Exists nodeSelector: scheduling.cast.ai/spot: "true" topologySpreadConstraints: - maxSkew: 1 topologyKey: topology.kubernetes.io/zone whenUnsatisfiable: DoNotSchedule labelSelector: matchLabels: app: nginx containers: - name: nginx image: nginx:1.14.2 resources: requests: memory: "1024Mi" cpu: "500m" Following this configuration, CAST AI’s Autoscaler will find and provision the most affordable instances that fit the workload in different availability zones. After the deployment, workloads get distributed across nodes in different availability zones: NAME STATUS NODE ZONE nginx-deployment-76789fc756-28js7 Running gke-cluster-07-03-cast-pool-263bc096 us-central1-a nginx-deployment-76789fc756-6dt2x Running gke-cluster-07-03-cast-pool-263bc096 us-central1-a nginx-deployment-76789fc756-7crcj Running gke-cluster-07-03-cast-pool-bb05def6 us-central1-b nginx-deployment-76789fc756-7zx6z Running gke-cluster-07-03-cast-pool-e2f8a420 us-central1-c nginx-deployment-76789fc756-8k546 Running gke-cluster-07-03-cast-pool-e2f8a420 us-central1-c nginx-deployment-76789fc756-bnzbq Running gke-cluster-07-03-cast-pool-e2f8a420 us-central1-c nginx-deployment-76789fc756-jm658 Running gke-cluster-07-03-cast-pool-bb05def6 us-central1-b nginx-deployment-76789fc756-jmmdj Running gke-cluster-07-03-cast-pool-bb05def6 us-central1-b As a result, your workload will keep running on the most cost-efficient resources even if a failure occurs in one of the zones. If spot instances become unavailable, the Fallback feature can temporarily move your Pods to on-demand resources to ensure continued running. 
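To check how a spread like this actually landed, you can query the result from the command line. A quick illustrative check, assuming the app=nginx label and the standard zone label used above:

Shell

# List the Pods with their nodes, then count Pods per node.
kubectl get pods -l app=nginx -o wide
kubectl get pods -l app=nginx -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort | uniq -c

# Show which zone each node belongs to via its topology label.
kubectl get nodes -L topology.kubernetes.io/zone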
Topology Spread Constraints: Limitations and Solutions

Topology spread constraints promise a more even distribution of Pods across domains, but as with many other K8s aspects, they come with some nuances. First, there is no guarantee that your constraints remain satisfied when Pods are removed. This could happen, for example, when scaling down a deployment leads to an imbalanced Pod distribution. By default, the K8s scheduler doesn’t know all of your cluster’s zones or other topology domains; they are determined from the existing nodes in the cluster. This can cause a problem in autoscaled clusters when a node pool or group scales down to zero nodes, because a topology domain is not considered until it contains at least one node. One workaround to these challenges is to use a cluster autoscaler that is aware of Pod topology spread constraints and of the overall domain set, like CAST AI. The platform supports the zone topology key topology.kubernetes.io/zone, enabling your Pods to be spread between availability zones and taking advantage of cloud redundancy. Read more in the documentation.

Conclusion

Topology spread constraints are an important asset in the Kubernetes user’s toolkit. Knowing how to use them helps you deploy and run more efficient and highly available workloads.
2023 has seen rapid growth in cloud-native applications and platforms. Organizations are constantly striving to maximize the potential of their applications, ensure seamless user experiences, and drive business growth. The rise of hybrid cloud environments and the adoption of containerization technologies, such as Kubernetes, have revolutionized the way modern applications are developed, deployed, and scaled. In this digital arena, Kubernetes is the platform of choice for most cloud-native applications and workloads, and it is adopted across industries. According to a 2022 report, 96% of companies are already either using or evaluating the implementation of Kubernetes in their cloud systems. This popular open-source utility provides container orchestration, service discovery, load balancing, and other capabilities. However, with this transformation comes a new set of challenges. As the complexity of applications increases, so does the need for robust observability solutions that enable businesses to gain deep insights into their containerized workloads. Enter Kubernetes observability: a critical aspect of managing and optimizing containerized applications in hybrid cloud environments. In this blog post, we will delve into Kubernetes observability, exploring six effective strategies that can empower businesses to unlock the full potential of their containerized applications in hybrid cloud environments. These strategies, backed by industry expertise and real-world experiences, will equip you with the tools and knowledge to enhance the observability of your Kubernetes deployments, driving business success.

Understanding Observability in Kubernetes

Let us first start with the basics. Kubernetes is a powerful tool for managing containerized applications. But despite its powerful features, keeping track of what's happening in a hybrid cloud environment can be difficult. This is where observability comes in. Observability is the practice of collecting, analyzing, and acting on data about a particular environment. In the context of Kubernetes, observability refers to gaining insights into the behavior, performance, and health of containerized applications running within a Kubernetes cluster. Kubernetes observability is based on three key pillars:

1. Logs: Logs provide valuable information about the behavior and events within a Kubernetes cluster. They capture important details such as application output, system errors, and operational events. Analyzing logs helps troubleshoot issues, understand application behavior, and identify patterns or anomalies.
2. Metrics: Metrics are quantitative measurements that provide insights into a Kubernetes environment's performance and resource utilization. They include CPU usage, memory consumption, network traffic, and request latency information. Monitoring and analyzing metrics help identify performance bottlenecks, plan capacity, and optimize resource allocation.
3. Traces: Traces enable end-to-end visibility into the flow of requests across microservices within a Kubernetes application. Distributed tracing captures timing data and dependencies between different components, providing a comprehensive understanding of request paths. Traces help identify latency issues, understand system dependencies, and optimize critical paths for improved application performance.

Kubernetes observability processes typically involve collecting and analyzing data from various sources to understand the system's internal state and provide actionable intelligence.
By implementing the right observability strategies, you can gain a deep understanding of your applications and infrastructure, which will help you to:

Detect and troubleshoot problems quickly
Improve performance and reliability
Optimize resource usage
Meet compliance requirements

Observability processes are being adopted at a rapid pace by IT teams. By 2026, 70% of organizations will have successfully applied observability to achieve shorter latency for decision-making while increasing distributed, organized, and simplified data management processes.

1. Use Centralized Logging and Log Aggregation

For gaining insights into distributed systems, centralized logging is an essential strategy. In Kubernetes environments, where applications span multiple containers and nodes, collecting and analyzing logs from various sources becomes crucial. Centralized logging involves consolidating logs from different components into a single, easily accessible location. The importance of centralized logging lies in its ability to provide a holistic view of your system's behavior and performance. With Kubernetes logging, you can correlate events and identify patterns across your Kubernetes cluster, enabling efficient troubleshooting and root-cause analysis. To implement centralized logging in Kubernetes, you can leverage robust log aggregation tools or cloud-native solutions like Amazon CloudWatch Logs or Google Cloud Logging. These tools provide scalable and efficient ways to collect, store, and analyze logs from your Kubernetes cluster.

2. Leverage Distributed Tracing for End-to-End Visibility

In a complex Kubernetes environment with microservices distributed across multiple containers and nodes, understanding the flow of requests and interactions between different components becomes challenging. This is where distributed tracing comes into play, providing end-to-end visibility into the execution path of requests as they traverse various services. Distributed tracing allows you to trace a request's journey from its entry point through all the microservices it touches, capturing valuable information about each step. By instrumenting your applications with tracing libraries or agents, you can generate trace data that reveals each service's duration, latency, and potential bottlenecks. The benefits of leveraging distributed tracing in Kubernetes are significant. Firstly, it helps you understand the dependencies and relationships between services, enabling better troubleshooting and performance optimization. When a request experiences latency or errors, you can quickly identify the service or component responsible and take corrective action. Secondly, distributed tracing allows you to measure and monitor the performance of individual services and their interactions. By analyzing trace data, you can identify performance bottlenecks, detect inefficient resource usage, and optimize the overall responsiveness of your system. This information is invaluable for capacity planning and ensuring scalability in your Kubernetes environment. Several popular distributed tracing solutions are available. These tools provide the necessary instrumentation and infrastructure to collect and visualize trace data effectively. By integrating these solutions into your Kubernetes deployments, you can gain comprehensive visibility into the behavior of your microservices and drive continuous improvement.
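As one hedged illustration (the article itself does not prescribe a specific tool), a common pattern is to run an OpenTelemetry Collector in the cluster to receive trace data from instrumented services and forward it to whichever tracing backend you use. A minimal configuration sketch, with a placeholder backend endpoint, might look like this:

receivers:
  otlp:
    protocols:
      grpc: {}      # services send spans here via OTLP/gRPC
      http: {}      # or via OTLP/HTTP
processors:
  batch: {}         # batch spans before export to reduce overhead
exporters:
  otlp:
    endpoint: tracing-backend.observability:4317   # placeholder; point this at your backend
    tls:
      insecure: true                               # assumption: plaintext traffic inside the cluster
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]

The collector then becomes the single place where sampling, batching, and export destinations are managed, instead of configuring each service individually.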
3. Integrate Kubernetes With APM Solutions

To achieve comprehensive observability in Kubernetes, it is essential to integrate your environment with Application Performance Monitoring (APM) solutions. APM solutions provide advanced monitoring capabilities beyond traditional metrics and logs, offering insights into the performance and behavior of individual application components. One of the primary benefits of APM integration is the ability to detect and diagnose performance bottlenecks within your Kubernetes applications. With APM solutions, you can trace requests as they traverse various services and identify areas of high latency or resource contention. Armed with this information, you can take targeted actions to optimize critical paths and improve overall application performance. Many APM solutions offer dedicated Kubernetes integrations that streamline the monitoring and management of containerized applications. These integrations provide pre-configured dashboards, alerts, and instrumentation libraries that simplify capturing and analyzing APM data within your Kubernetes environment.

4. Use Metrics-Based Monitoring

Metrics-based monitoring forms the foundation of observability in Kubernetes. It involves collecting and analyzing key metrics that provide insights into the health, performance, and resource utilization of your Kubernetes clusters and applications. When it comes to metrics-based monitoring in Kubernetes, there are several essential components to consider:

Node-Level Metrics: Monitoring the resource utilization of individual nodes in your Kubernetes cluster is crucial for capacity planning and infrastructure optimization. Metrics such as CPU usage, memory usage, disk I/O, and network bandwidth help you identify potential resource bottlenecks and ensure optimal allocation.
Pod-Level Metrics: Pods are the basic units of deployment in Kubernetes. Monitoring metrics related to pods allows you to assess their resource consumption, health, and overall performance. Key pod-level metrics include CPU and memory usage, network throughput, and request success rates.
Container-Level Metrics: Containers within pods encapsulate individual application components. Monitoring container-level metrics helps you understand the resource consumption and behavior of specific application services or processes. Metrics such as CPU usage, memory usage, and file system utilization offer insights into container performance.
Application-Specific Metrics: Depending on your application's requirements, you may need to monitor custom metrics specific to your business logic or domain. These metrics could include transaction rates, error rates, cache hit ratios, or other relevant performance indicators.

Metric-based monitoring architecture diagram

5. Use Custom Kubernetes Events for Enhanced Observability

Custom events enable communication between Kubernetes components and between Kubernetes and external systems. They can signal important events, such as deployments, scaling operations, configuration changes, or even application-specific events within your containers. By leveraging custom events, you can achieve several benefits in terms of observability:

Proactive Monitoring: Custom events allow you to define and monitor specific conditions that require attention. For example, you can create events to indicate when resources are running low, when pods experience failures, or when specific thresholds are exceeded. By capturing these events, you can proactively detect and address issues before they escalate.
Contextual Information: Custom events can include additional contextual information that helps with troubleshooting and root-cause analysis. You can attach relevant details, such as error messages, timestamps, affected resources, or any other metadata that provides insights into the event's significance. This additional context aids in understanding and resolving issues more effectively.
Integration with External Systems: Kubernetes custom events can be consumed by external systems, such as monitoring platforms or incident management tools. Integrating these systems allows you to trigger automated responses or notifications based on specific events. This streamlines incident response processes and ensures the timely resolution of critical issues.

To leverage custom Kubernetes events, you can use Kubernetes event hooks, custom controllers, or even develop your own event-driven applications using the Kubernetes API. By defining event triggers, capturing relevant information, and reacting to events, you can establish a robust observability framework that complements traditional monitoring approaches.

6. Incorporating Synthetic Monitoring for Proactive Observability

Synthetic monitoring simulates user journeys or specific transactions that represent everyday interactions with your application. These synthetic tests can be scheduled to run regularly from various geographic locations, mimicking user behavior and measuring key performance indicators. There are several key benefits to incorporating synthetic monitoring in your Kubernetes environment:

Proactive Issue Detection: Synthetic tests allow you to detect issues before real users are affected. By regularly simulating user interactions, you can identify performance degradations, errors, or unresponsive components. This early detection enables you to address issues proactively and maintain high application availability.
Performance Benchmarking: Synthetic monitoring provides a baseline for performance benchmarking and SLA compliance. You can measure response times, latency, and availability under normal conditions by running consistent tests from different locations. These benchmarks serve as a reference for detecting anomalies and ensuring optimal performance.
Geographic Insights: Synthetic tests can be configured to run from different geographic locations, providing insights into the performance of your application from various regions. This helps identify latency issues or regional disparities that may impact user experience. By optimizing your application's performance based on these insights, you can ensure a consistent user experience globally.

You can leverage specialized tools to incorporate synthetic monitoring into your Kubernetes environment. These tools offer capabilities for creating and scheduling synthetic tests, monitoring performance metrics, and generating reports. An approach to gaining Kubernetes observability for traditional and microservice-based applications is to use third-party tools like Datadog, Splunk, Middleware, and Dynatrace. These tools capture metrics and events, providing several out-of-the-box reports, charts, and alerts to save time.

Wrapping Up

This blog explored six practical strategies for achieving Kubernetes observability in hybrid cloud environments.
By utilizing centralized logging and log aggregation, leveraging distributed tracing, integrating Kubernetes with APM solutions, adopting metrics-based monitoring, incorporating custom Kubernetes events, and adding synthetic monitoring, you can enhance your understanding of the behavior and performance of your Kubernetes deployments. Implementing these strategies will provide comprehensive insights into your distributed systems, enabling efficient troubleshooting, performance optimization, proactive issue detection, and an improved user experience. Whether you are operating a small-scale Kubernetes environment or managing a complex hybrid cloud deployment, applying these strategies will contribute to the success and reliability of your applications.
Enterprises nowadays are keen on adopting a microservices architecture, given its agility and flexibility. Containers and the rise of Kubernetes — the go-to container orchestration tool — made the transformation from monolith to microservices easier for them. However, a new set of challenges emerged while using microservices architecture at scale:

It became hard for DevOps teams and architects to manage traffic between services.
As microservices are deployed into multiple clusters and clouds, data goes out of the (firewall) perimeter and is vulnerable; security becomes a big issue.
Getting overall visibility into the network topology became a nightmare for SREs.

Implementing new security tools, or tuning existing API gateways or Ingress controllers, is just patchwork and not a complete solution to the above problems. What architects need is a radical rework of their infrastructure to deal with their growing network, security, and observability challenges. And that is where the concept of service mesh comes in.

What Is a Service Mesh?

A service mesh decouples the communication between services from the application layer to the infrastructure layer. The abstraction at the infrastructure level happens by proxying the traffic between services (see Fig. A).

Fig A — Service-to-service communication before and after service mesh implementation

The proxy is deployed alongside the application as a sidecar container. The traffic that goes in and out of the service is intercepted by the proxy, which provides advanced traffic management and security features. On top of that, a service mesh provides observability into the overall network topology. In a service mesh architecture, the mesh of proxies is called the data plane, and the controller responsible for configuring and managing the data plane proxies is called the control plane.

Why Do You Need a Service Mesh for Kubernetes?

While starting off, most DevOps teams only have a handful of services to deal with. As applications scale and the number of services increases, managing the network and security becomes complex.

Tedious Security Compliance

Applications deployed in multiple clusters from different cloud vendors talk to each other over the network. It is essential for such traffic to comply with certain standards to keep out intruders and to ensure secure communication. The problem is that security policies are typically cluster-local and do not work across cluster boundaries. This points to a need for a solution that can enforce consistent security policies across clusters.

Chaotic Network Management

DevOps engineers often need to control the traffic flow to services — to perform canary deployments, for example. They also want to test the resiliency and reliability of the system by injecting faults and implementing circuit breakers. Achieving such granular control over the network requires DevOps engineers to create a lot of configuration and scripting in Kubernetes and the cloud environment.

Lack of Visibility Into the Network

With applications distributed over a network and communication happening between them, it becomes hard for SREs to keep track of the health and performance of the network infrastructure. This severely impedes their ability to identify and troubleshoot network issues.

Implementing a service mesh solves the above problems by providing features that make managing applications deployed to Kubernetes painless.
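To make the sidecar model described above more tangible, here is a minimal, hedged sketch (not from the original article) of how automatic sidecar injection is typically enabled with Istio, one of the meshes covered below: labeling a namespace tells Istio's injection webhook to add the Envoy proxy container to every new Pod created in it. The namespace name here is just an example:

apiVersion: v1
kind: Namespace
metadata:
  name: orders                   # hypothetical application namespace
  labels:
    istio-injection: enabled     # Istio injects the Envoy sidecar into new Pods in this namespace

Existing Pods need to be restarted (for example, via a rolling restart of their Deployments) before they pick up the sidecar.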
Key Features of Service Mesh in Kubernetes

A service mesh acts as a centralized platform for networking, security, and observability for microservices deployed into Kubernetes.

Centralized Security

With a service mesh, security compliance is easier to achieve as it can be handled from a central plane instead of being configured per service. A service mesh platform can enforce consistent security policies that work across cluster boundaries. Service mesh provides granular authentication, authorization, and access control for applications in the mesh:

Authentication: mTLS implementation, JWT
Authorization: Policies can be set to allow, deny, or perform custom actions against an incoming request
Access control: RBAC policies that can be set on method, service, and namespace levels

Advanced Networking and Resilience Testing

Service mesh provides granular control over the traffic flow between services. DevOps engineers can split traffic between services or route it based on certain weights. Besides, service mesh provides the following features to test the resiliency of the infrastructure with little work:

Fault injection
Timeouts
Retries
Circuit breaking
Mirroring

Unified Observability

Implementing a service mesh helps SREs and Ops teams to have centralized visibility into the health and performance of applications within the mesh. Service mesh provides the following telemetry for observability and real-time visibility:

Metrics: To monitor performance and see latency, traffic, errors, and saturation
Distributed tracing: To understand requests’ lifecycles and analyze service dependencies and traffic flow
Access logs: To audit service behavior

Top Service Mesh Software To Consider for Kubernetes

One may find various service mesh software options such as Istio, Linkerd, HashiCorp Consul, Kong Kuma, Google Anthos (built on Istio), VMware Tanzu, etc., in the market. However, over 90% of users choose either Istio or Linkerd because of their strong and vibrant open-source ecosystems for innovation and support.

Istio

Istio is the most popular, CNCF-graduated open-source service mesh software available. It uses Envoy proxies as sidecars, while the control plane is used to manage and configure them (see Fig. B).

Fig B — Istio sidecar architecture

Istio provides networking, security, and observability features for applications at scale. Developers from Google, Microsoft, IBM, and others actively contribute to the Istio project.

Linkerd

Linkerd is a lightweight, open-source service mesh software developed by Buoyant. It provides the basic features of a service mesh and has a destination service, identity service, and proxy injector (see Fig. C).

Fig C — Linkerd architecture

More than 80% of the contributions to Linkerd come from its founding company, Buoyant, itself. (To see a detailed comparison between the two and choose one for your Kubernetes deployments, head to Istio vs Linkerd: The Best Service Mesh for 2023.)

Benefits of Service Mesh for Kubernetes

Below are some benefits enterprises would reap by implementing a service mesh in Kubernetes.

100% Network and Data Security

A service mesh helps in maintaining a zero-trust network where requests are constantly authenticated and authorized before processing. DevOps engineers can implement features such as mTLS, which works cluster-wide and across cluster boundaries (see Fig. D).

Fig D — Zero trust network implementation
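As a small, hedged illustration of such centrally enforced security (a sketch assuming a default Istio installation, not a step from this article), strict mTLS can be required mesh-wide with a single resource applied to Istio's root namespace:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # applying it to the Istio root namespace makes it mesh-wide
spec:
  mtls:
    mode: STRICT            # sidecars reject plaintext traffic between workloads

Individual namespaces or workloads can still override this with their own PeerAuthentication resources if a gradual rollout is needed.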
A zero-trust network helps in maintaining a secure infrastructure in the current dynamic threat landscape, which is filled with attacks like man-in-the-middle (MITM) and denial-of-service (DoS). (If you are interested in learning more, check out this article: Zero Trust Network for Microservices with Istio.)

80% Reduction in Change Failure Rate

Nowadays, enterprises release applications to a small set of live users before a complete rollout. This helps DevOps and SREs to analyze application performance, identify any bugs, and thus avoid potential downtime. Canary and blue/green deployments are two such deployment strategies. The fine-grained traffic controls — including splitting traffic based on weights (see Fig. F) — provided by service mesh make it easier for DevOps engineers to perform them.

Fig F — Canary deployment with Istio service mesh

99.99% Available and Resilient Infrastructure

The telemetry data provided by service mesh software helps SREs and Ops teams identify and respond to bugs and threats quickly. Most service mesh software integrates with monitoring tools like Prometheus, Grafana, Kiali, Jaeger, etc., and the dashboards they provide (see Fig. G) help operators visualize the health, performance, and behavior of the services.

Fig G — Kiali service graph

5X Improvement in Developer Experience

Most application developers do not enjoy configuring the network and security logic in their applications. Their focus tends to be on business logic and building features. Implementing a service mesh reduces developer toil because developers are left with only the application code: they can offload network and security configuration completely to the service mesh at the infrastructure level. The separation helps developers focus on their core responsibility, i.e., delivering the business logic.

Three Pillars for Successful Implementation of Service Mesh

Since service mesh is a radical concept, it can be overwhelming for enterprises to implement it and realize its value successfully. If you are an architect or CIO, you would want to consider the following three pillars for successful service mesh implementation.

1. Technology Support

It is important to evaluate service mesh software from the technology support perspective. If you are a mature DevOps organization using various open-source tools and open standards in your CI/CD process, ensure that the service mesh software integrates well with your CI/CD tools, whatever their versions. For example, if you are using Argo CD for GitOps deployment, or Prometheus for monitoring, then the service mesh software must be able to integrate with minimal intervention.

2. Enterprise Support

Open-source software adoption is on the rise. But support for the software will be a prime necessity for enterprises to make sure their IT is available for the business. Evaluate service mesh software that is backed by a large community (very good for support) and by third-party vendor ecosystems that can provide 24/7 support with fixed SLAs.

3. Training and Onboarding Support

Ensure there are adequate reading materials, documents, and videos available to supplement the learning of the service mesh's users; the investment would not make sense if internal employees such as DevOps engineers and SREs are not able to adopt it. Finally, a service mesh is not just software but an application operation pattern. Do not rush your project.
Rather, research and evaluate the service mesh that best suits your organizational requirements.

Service Mesh Is the Way Forward for Kubernetes Workloads

The goal of service mesh implementation is to make managing applications deployed in Kubernetes easier. Adopting it will become a necessity as services start to spill over from a single cluster into multiple clusters. With Kubernetes adoption on the rise, service mesh will eventually become a critical component in most organizations.
Spring Cloud is a versatile framework for building Java applications in various cloud environments. Today, we'll explore how to use two components of the framework - Spring Cloud Gateway and Discovery Service (aka Spring Cloud Netflix) - for easy routing of user requests between your Java microservices. We'll build two microservices, register them with a Discovery Service instance, and use the Cloud Gateway for routing requests to a specific microservice instance. The cool thing is that the Cloud Gateway will also be registered with the Discovery Service and will use the latter to resolve a microservice name into an actual connection endpoint. So, whether you prefer reading or watching, let’s walk through this practical example.

Creating Sample Microservices

Imagine we’re creating an online service for a pizza company. There are two basic capabilities the service needs to support - customers can order a pizza online and then track the order status. To achieve this, let's introduce two microservices - the Kitchen and the Tracker.

Kitchen Microservice

The Kitchen microservice allows customers to place pizza orders. Once an order is placed, it'll hit the kitchen, and the chef will start cooking. Let's create a basic implementation for the purpose of testing Spring Cloud Gateway with the Discovery Service. This service is a Spring Boot web application with a REST controller that simply acknowledges an order.

Java
@RestController
@RequestMapping("/kitchen")
public class KitchenController {

    @PostMapping("/order")
    public ResponseEntity<String> addNewOrder(@RequestParam("id") int id) {
        return ResponseEntity.ok("The order has been placed!");
    }
}

The service will be listening on port 8081, which is set in the application.properties file:

Properties files
server.port=8081

Once the microservice is started, you can use curl or HTTPie to test that the REST endpoint works. We’ll be using HTTPie throughout the article:

Shell
http POST localhost:8081/kitchen/order id==1

HTTP/1.1 200
Connection: keep-alive
Content-Length: 26
Content-Type: text/plain;charset=UTF-8
Date: Thu, 03 Aug 2023 18:45:26 GMT
Keep-Alive: timeout=60

The order has been placed!

Tracker Microservice

Customers use the second microservice, the Tracker, to check their order status. We'll go the extra mile with this service implementation by supporting several order statuses, including ordered, baking, and delivering. Our mock implementation will randomly select one of these statuses:

Java
@RestController
@RequestMapping("/tracker")
public class TrackerController {

    @GetMapping("/status")
    public ResponseEntity<String> getOrderStatus(@RequestParam("id") int orderId) {
        String[] status = { "Ordered", "Baking", "Delivering" };
        Random rand = new Random();
        return ResponseEntity.ok(status[rand.nextInt(status.length)]);
    }
}

The Tracker will be listening on port 8082, which is configured in the application.properties file:

Properties files
server.port=8082

Once the microservice is started, we can test it by sending the following GET request:

Shell
http GET localhost:8082/tracker/status id==1

HTTP/1.1 200
Connection: keep-alive
Content-Length: 10
Content-Type: text/plain;charset=UTF-8
Date: Thu, 03 Aug 2023 18:52:45 GMT
Keep-Alive: timeout=60

Delivering

Registering Microservices With Spring Cloud Discovery Service

Our next step is to register these two microservices with the Spring Cloud Discovery Service. But what exactly is a Discovery Service?

Discovery Service

The Discovery Service lets your microservices connect to each other using only their names.
For instance, if Tracker needs to connect to Kitchen, the Discovery Service gives Tracker the IP addresses of Kitchen's available instances. This list can change - you can add or remove Kitchen instances as needed, and the Discovery Service always keeps the updated list of active endpoints. There are several ways to start a Discovery Service server instance. One of the options is to use the Spring Initializr website to generate a Spring Boot project with the Eureka Server dependency. If you choose that method, the generated project will come with the following class that initiates a server instance of the Discovery Service:

Java
@SpringBootApplication
@EnableEurekaServer
public class DiscoveryServerApplication {

    public static void main(String[] args) {
        SpringApplication.run(DiscoveryServerApplication.class, args);
    }
}

By default, the server listens on port 8761. So, once we start the server, we can visit localhost:8761 to view the Discovery Service dashboard. Currently, the Discovery Service is running, but no microservices are registered with it yet. Now, it's time to register our Kitchen and Tracker microservices.

Update the Kitchen Microservice

To register the Kitchen service with the Discovery Service, we need to make the following changes:

1. Add the Discovery Service’s client library to the Kitchen’s pom.xml file:

XML
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>
</dependency>

2. Annotate the Kitchen’s application class with @EnableDiscoveryClient:

Java
@SpringBootApplication
@EnableDiscoveryClient
public class KitchenApplication {

    public static void main(String[] args) {
        SpringApplication.run(KitchenApplication.class, args);
    }
}

3. Update the application.properties file by adding these two parameters:

Properties files
# The microservice will be registered under this name
# with the Discovery Service
spring.application.name=kitchen-service

# Discovery Service address
eureka.client.service-url.defaultZone=http://localhost:8761/eureka

Update the Tracker Microservice

We need to follow the same steps to update the Tracker microservice. There's only one difference for Tracker: its name, which is provided via the spring.application.name property in the application.properties file:

Properties files
# The microservice will be registered under this name with the Discovery Service
spring.application.name=tracker-service

Register Microservices

Finally, restart the microservices to confirm their registration with the Discovery Service. As expected, both the Kitchen and Tracker microservices successfully register with the Discovery Service! Now, it's time to focus on the Spring Cloud Gateway.

Spring Cloud Gateway

Spring Cloud Gateway is used to resolve user requests and forward them to the appropriate microservices or API endpoints for further processing. You can generate a Spring Boot project with Cloud Gateway support using the same Spring Initializr website. Simply add the Gateway and Eureka Discovery Client libraries, then click the generate button. The Gateway will serve as a one-stop solution for directing requests to Kitchen and Tracker instances.
Its implementation is simple yet powerful:

Java
@SpringBootApplication
@EnableDiscoveryClient
public class ApiGatewayApplication {

    public static void main(String[] args) {
        SpringApplication.run(ApiGatewayApplication.class, args);
    }

    @Bean
    public RouteLocator routeLocator(RouteLocatorBuilder builder) {
        return builder.routes()
            .route("kitchen-route", r -> r.path("/kitchen/**").uri("lb://kitchen-service"))
            .route("tracker-route", r -> r.path("/tracker/**").uri("lb://tracker-service"))
            .build();
    }
}

The Gateway supports two routes:

The kitchen-route is for all requests beginning with the /kitchen/** path.
The tracker-route is for requests starting with the /tracker/** path.

The most interesting part of this route configuration is how we define the destination endpoint (the uri(...) part of the configuration). Each destination starts with lb:, followed by a microservice name. When the Gateway encounters such a destination URL, it will use the Discovery Service to resolve the microservice name into an IP address of a microservice instance and establish a connection to it. Furthermore, lb stands for load balancer, which means the Gateway will distribute requests evenly if the Discovery Service returns several instances of the same microservice. Finally, once we initiate the Gateway, it begins to monitor port 8080 for incoming traffic. We can then use the following HTTP requests to confirm that the Gateway is successfully routing our requests to the Kitchen and Tracker microservices!

Shell
http POST localhost:8080/kitchen/order id==2

HTTP/1.1 200 OK
Content-Length: 26
Content-Type: text/plain;charset=UTF-8
Date: Thu, 03 Aug 2023 20:11:26 GMT

The order has been placed!

http GET localhost:8080/tracker/status id==2

HTTP/1.1 200 OK
Content-Length: 6
Content-Type: text/plain;charset=UTF-8
Date: Thu, 03 Aug 2023 20:11:41 GMT

Baking
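If you prefer keeping routing rules out of Java code, the same two routes can also be declared through Spring Cloud Gateway's configuration properties, for example in an application.yml. This is just an equivalent sketch of the routes defined above, not an additional step in the tutorial:

YAML
spring:
  cloud:
    gateway:
      routes:
        - id: kitchen-route
          uri: lb://kitchen-service    # resolved via the Discovery Service, load-balanced
          predicates:
            - Path=/kitchen/**
        - id: tracker-route
          uri: lb://tracker-service
          predicates:
            - Path=/tracker/**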
Summary

With Spring Cloud, building robust, scalable Java applications for the cloud has never been easier. Say goodbye to the hassles of tracking IP addresses and instances, and say hello to intelligent, effortless request routing. Enjoy!

Wordle took the internet by storm after its release in late 2021. For many, it’s still a morning ritual that pairs seamlessly with a cup of coffee and the start of a work day. As a DevOps engineer, is there a better way to warm up your mind than puzzling out a Docker Compose file and then indulging in the world’s favorite word game? Well, the jury’s still out on that one, but this tutorial can let you see for yourself.

Why Write a Docker Compose File?

Even in an application with a single Dockerfile, a Docker Compose file can be a useful asset. Working from a Dockerfile often requires lengthy build and run commands, which can be migrated into a Compose file. This way, you aren't copying and pasting complex commands on every new build. Instead, your entire application builds and runs with just docker compose up. This is even more valuable when using an application with multiple Dockerfiles: you no longer need to individually build and run each Dockerfile. There are still countless apps that are Dockerized but lack Compose files. When working on Shipyard's Docker Compose Community Spotlight Series, I specifically looked for apps that came pre-packaged with Compose files. This was to emphasize some of the cool things you can do with Docker Compose, as well as to show how easy Compose makes app development. However, when it comes to writing a Compose file from scratch, it's easy to get intimidated by aspects such as container networking, volume mounts, or getting a service definition correct. If you're new to Docker Compose, there is nothing to worry about: most of your first Compose file will resemble your Docker build and run commands.

Using Compose To Run a Dockerized Wordle Clone

There’s an excellent open-source React-based Wordle clone on GitHub. It has approximately one hundred contributors, and over two thousand users have forked it to put their own spin on the modern classic web game. This repo comes equipped with a Dockerfile, allowing you to run it in a container on your local machine. Try out the demo here. It'll take us a matter of minutes to get it up and running with Docker Compose.

Step 1: Fork React-Wordle From GitHub

Start out by forking the react-wordle repo on GitHub and cloning it to your local machine. I created a branch based on main called add-docker-compose so I can make multiple commits without cluttering my main branch's git log. The repo provides the following Docker commands to build and run the image:

Shell
docker build -t reactle:dev -f docker/Dockerfile .
docker run -d -p 3000:3000 --name reactle-dev reactle:dev

We'll use these commands to populate our Docker Compose file in the next step.

Step 2: Craft a Compose File

We can get this repo deployed with the addition of a simple, single-service Docker Compose file. Open your text editor or IDE of choice and create a docker-compose.yaml file in your forked app’s root directory. First, let’s set the Compose version and define a service based off of our single Dockerfile, which we’ll call reactle:

YAML
version: '3.8'
services:
  reactle:

Now we’ll want to build from the existing Dockerfile. In this repo, it's stored in the docker directory, so we’ll include this path in our Compose definition. Since all files required for this app are stored immediately in the root directory, we'll set our build context to the app's root. I set the container’s port to 3000, which is standard for development.

YAML
version: '3.8'
services:
  reactle:
    build:
      context: .
      dockerfile: docker/Dockerfile
    ports:
      - '3000:3000'

This app’s resources live in a couple of directories and files, as specified by the Dockerfile. We can list their paths under the volumes label so the container can access them. Each volume is formatted as the path within the repo (./src) followed by a colon and then the corresponding mount point within the container (/app/src).

YAML
version: '3.8'
services:
  reactle:
    build:
      context: .
      dockerfile: docker/Dockerfile
    ports:
      - '3000:3000'
    volumes:
      - './src:/app/src'
      - './public:/app/public'
      - './package-lock.json:/app/package-lock.json'
      - './package.json:/app/package.json'

If we want to make this app Shipyard-compatible, we just need to add one more label to the Compose file:

YAML
version: '3.8'
services:
  reactle:
    build:
      context: .
      dockerfile: docker/Dockerfile
    labels:
      shipyard.route: '/'
    ports:
      - '3000:3000'
    volumes:
      - './src:/app/src'
      - './public:/app/public'
      - './package-lock.json:/app/package-lock.json'
      - './package.json:/app/package.json'

And there we have it: a complete Docker Compose file equipped to run our Wordle clone! I'll open a PR on the react-wordle repo with this new file.

Step 3: Running Our App

Now that we’ve done all the hard work, we can head on over to the terminal, navigate to the app’s root directory, and run the docker compose up command. Compose will provide a link to the running application, which we can access from the browser.

…And Enjoy!

Now you can harness the power of Docker Compose to manage your fully functional Wordle clone! The possibilities are endless — you can customize Wordle to your liking, contribute to the react-wordle repo, host your own Wordle variant online, and share links to your creations with friends and colleagues. For now, you can maybe just sit back, relax, and solve today’s Wordle.
Boris Zaikin
Lead Solution Architect,
CloudAstro GmbH
Ranga Karanam
Best Selling Instructor on Udemy with 1 MILLION Students,
in28Minutes.com
Samir Behara
Senior Cloud Infrastructure Architect,
AWS
Pratik Prakash
Master Software Engineer (SDE-IV),
Capital One