Continuous Delivery is a practice and methodology that helps you build and deploy your software faster so that it can be released to production systems at any time. It shortens the lifecycle times of various development and operations processes. Effectively applying the concepts of Continuous Integration (CI) and Continuous Deployment (CD) helps achieve the benefits of continuous delivery principles and enables faster software releases. In this article, we explore the challenges encountered by software teams implementing CI/CD and demonstrate how feature flags can help mitigate these risks.

Introduction to CI/CD

CI/CD ensures that development teams frequently integrate their code changes into the main product. CI involves frequent integration of code into a shared repository with automated testing to catch issues early, while CD extends this by automating deployments for reliable and frequent releases. However, teams face challenges such as complex tool integration, maintaining extensive automated tests, ensuring environment consistency, and overcoming cultural resistance.

Mitigating Continuous Delivery Challenges With Feature Flags

Technical Challenges

Complex Merging and Integration Issues

Challenge: Frequent changes to the code can cause merge conflicts, which makes it challenging to smoothly integrate different branches of the project.

Solution with feature flags: Feature flags allow new features to be integrated into the main branch while still being hidden from users. This approach reduces the need for long-lived branches and minimizes merge conflicts because code can be merged more frequently.

Testing Bottlenecks

Challenge: As the codebase expands, it becomes increasingly challenging to ensure thorough test coverage and keep automated test suites up to date.

Solution with feature flags: Feature flags let you test new features in a live production environment without exposing them to all users. This allows for more thorough real-world testing and gradual rollouts, reducing the pressure on automated test suites.

Environment Consistency

Challenge: Maintaining consistency across different deployment environments can be challenging, often resulting in configuration drift and potential issues during deployment.

Solution with feature flags: Feature flags can manage environment-specific configurations, ensuring that features behave consistently across environments by toggling them as needed.

Deployment Failures

Challenge: Managing failed deployments gracefully and implementing a rollback strategy is essential to keep the system stable.

Solution with feature flags: Feature flags provide a quick way to disable troublesome features without rolling back the entire deployment. This reduces downtime and enables quick recovery from deployment issues.

Tooling and Infrastructure

Challenge: Choosing and setting up the right CI/CD tools, as well as maintaining the CI/CD infrastructure, can be complicated and resource intensive.

Solution with feature flags: Feature flags enable gradual rollouts and testing in production, which reduces the reliance on complex CI/CD tooling and infrastructure.
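To make the "disable without rollback" idea from the Deployment Failures item concrete, here is a minimal sketch in Python. The in-memory dictionary stands in for a real feature flag service, and all names (flags, checkout_v2) are hypothetical:

Python
# Minimal sketch of a flag-gated code path. An in-memory store stands in
# for a real feature flag service; all names are illustrative.

flags = {"checkout_v2": False}  # toggled at runtime by the flag service

def is_enabled(flag_name: str) -> bool:
    # A real client would evaluate targeting rules on the flag service.
    return flags.get(flag_name, False)

def checkout_v1(cart):
    return {"total": sum(cart), "version": "v1"}

def checkout_v2(cart):
    return {"total": sum(cart), "version": "v2"}

def checkout(cart):
    if is_enabled("checkout_v2"):
        return checkout_v2(cart)  # new code path, deployed but dark
    return checkout_v1(cart)      # stable code path

# If v2 misbehaves in production, flipping flags["checkout_v2"] to False
# disables it instantly, with no rollback or redeployment.
print(checkout([10, 5]))

Because the new path ships disabled, deployment and release are decoupled: the same build can be promoted through every environment, and the flag controls who actually sees the feature.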
Organizational Challenges

Cultural Resistance

Challenge: Overcoming resistance to change and fostering a culture of continuous improvement can be difficult.

Solution with feature flags: Feature flags promote a culture of experimentation and continuous delivery by allowing teams to release features incrementally and gather feedback early, demonstrating the benefits of agile practices.

Skill Gaps

Challenge: Training team members on CI/CD best practices and keeping up with the latest technologies can be demanding.

Solution with feature flags: Feature flags offer gradual rollout and rollback options, acting as a safety net that lets teams slowly and safely adopt new practices and technologies.

Process-Related Challenges

Defining Effective Pipelines

Challenge: Designing and continuously optimizing efficient CI/CD pipelines takes sustained effort.

Solution with feature flags: Feature flags simplify pipeline design by decoupling deployment from release, leading to simpler, faster pipelines with fewer dependencies and less complexity.

Maintaining High Velocity

Challenge: Balancing the speed of delivery with quality and stability is difficult.

Solution with feature flags: Feature flags help deliver features quickly by allowing them to be deployed in a controlled manner, ensuring both high quality and stability while keeping up the pace.

Continuous Monitoring and Feedback

Monitoring and Observability

Challenge: Implementing effective monitoring and observability practices to quickly detect and resolve issues.

Solution with feature flags: Feature flags can be monitored and toggled based on their performance metrics and user feedback, allowing for quick response to issues and keeping the system reliable.

Feedback Loops

Challenge: Establishing rapid feedback loops from production to continuously improve.

Solution with feature flags: Feature flags allow for A/B testing and controlled rollouts, giving valuable feedback on new features and enabling continuous improvement based on real user data.

Best Practices for Using Feature Flags With CI/CD Pipelines

Integrating a centralized feature flag management system into CI/CD pipelines can significantly enhance deployment processes. Here are a few best practices:

- Choose a feature flag management system that integrates well with your CI/CD tools and workflows. It is beneficial if the system supports workflows, since the deployment process involves workflow management for change requests.
- Use consistent and descriptive names for feature flags to avoid confusion.
- Establish clear processes for creating, updating, and retiring feature flags.
- Use CI/CD pipeline scripts or APIs provided by the feature flag management system to automate the creation, modification, and deletion of feature flags.
- Introduce feature flags at the beginning of the development lifecycle.
- Use feature flags to perform canary releases and gradual rollouts, starting with a small subset of users and expanding gradually (see the sketch after this list).
- Track feature flag usage, performance, and impact on system metrics. Integrate feature flag data with monitoring and analytics tools to gain insights and make informed decisions.
- Implement role-based access control (RBAC) to restrict who can create, modify, or delete feature flags.
- Include feature flags in your automated testing processes.
- Configure feature flags differently for development, testing, staging, and production environments.
- Utilize secret-type support within the feature flag storage to securely store all sensitive configuration data used in the pipelines.
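As a hedged illustration of the canary-release practice above, the sketch below buckets users deterministically by hashing their ID, so raising the rollout percentage only ever adds users to the exposed group. The hashing scheme is an assumption for the example, not any particular vendor's algorithm:

Python
import hashlib

def in_rollout(user_id: str, flag_name: str, percentage: int) -> bool:
    # Hash the user and flag together so each flag gets independent
    # bucketing; the same user lands in the same bucket on every call.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # a stable bucket in [0, 100)
    return bucket < percentage

# Raising the percentage from 20 to 40 keeps the original 20% enrolled
# and adds the next 20%, which is exactly what a gradual rollout needs.
print(in_rollout("user-42", "checkout_v2", 20))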
Feature Toggle Management Tools

There are several feature flag management systems that can be integrated with CI/CD pipelines to enhance the deployment process. Here are some options:

- IBM Cloud App Configuration: A centralized feature management and configuration service available on IBM Cloud for use with web and mobile applications, microservices, and distributed environments; has native integration with IBM Cloud Continuous Delivery toolchains
- LaunchDarkly: A feature flag management tool that allows you to control the release of new features and changes using feature flags; integrates with popular CI/CD tools like Jenkins, CircleCI, and GitLab
- Unleash: An open-source feature flag management system that provides flexibility for custom integrations; works well with CI/CD tools such as Jenkins, GitHub Actions, and GitLab CI
- Optimizely: A feature flagging and experimentation platform that focuses on A/B testing and performance optimization; supports integrations with CI/CD tools such as Jenkins, CircleCI, and GitHub Actions
- FeatureHub: An open-source feature management service that can be integrated with CI/CD tools such as Jenkins and GitHub Actions

Conclusion

Feature flags have become a powerful tool for continuous delivery processes. By weaving feature flags into CI/CD pipelines, development teams gain greater control, flexibility, and safety in their deployments. When you embrace feature flags not just in development but throughout the deployment process, you pave the way for smoother releases, happier users, and a more dynamic approach to software development and delivery.
With AI, software development is experiencing a breakthrough phase driven by the continuous integration of state-of-the-art Large Language Models (LLMs) like GPT-4 and Claude Opus. These models extend beyond the role of traditional developer tools: they directly assist developers in translating verbal instructions into executable code across a variety of programming languages, which speeds up coding.

Code Generation

Enhancing Developer Productivity

LLMs understand context and generate best-practice code, making them very good at enhancing developer productivity. They work as an on-call assistant, offering insights and alternatives that may elude even experienced programmers. This role is especially valuable in large, complex projects where the integration of different software modules can introduce subtle, hard-to-detect bugs.

Training and Adaptation

LLMs will improve continuously through feedback loops from real-world use, as models are trained on the corrections and suggestions of developers. Continuous training brings models closer to specific industry needs, further entrenching them in the core of software development processes.

Debugging and Bug Fixing With AI

Innovative Tools for Enhanced Accuracy

LLM integration into debugging and bug fixing is a radical change. Tools like Meta's SapFix and Microsoft's InferFix automatically detect and fix errors, saving time and reducing downtime. Such systems are designed to plug neatly into existing CI/CD pipelines, providing real-time feedback without interrupting the flow of development. With the capability to scan millions of lines of code, these AI-enhanced tools significantly reduce error rates by catching bugs at an early stage. Proactive bug detection helps maintain the health of the codebase and ensures issues are resolved before they turn into major problems.

Customized Solutions

This flexibility is what enables LLMs to fit the needs of a given project. Whether matching different coding standards or particular programming languages, these models are versatile instruments in a developer's arsenal that can be trained to suit very granular needs.

Seamless CI/CD Integration

AI: The Catalyst for Reliable Deployments

LLMs are fast becoming a staple in CI/CD ecosystems and further improve the reliability of deployments. They automate code reviews and quality checks, ensuring that only stable versions of applications make it to deployment. This raises the pace of deployment while raising the quality of software products overall.

Continuous Learning and Improvement

The integration of LLMs into CI/CD processes is not a one-time setup but part of a continuous improvement strategy. These models learn with every deployment and become more efficient over time, reducing the chances of deployment failures.

Closing the Gap Between Dev and Ops

By producing more homogeneous code output and automating routine checks, LLMs bridge the traditional gap between development and operations teams. That synergy is central to modern DevOps practices, which aim to create a more collaborative and efficient environment.

Future Impact and Market Adoption of Large Language Models in Software Development

The future of software development is inherently tied to the advances made with LLMs. As they develop, they will change roles within software teams and eventually alter the processes, such as Agile and Scrum, that dominate today. The ability of LLMs to work as both a development and an abstraction tool promises increased productivity, allowing projects to be completed much faster and enabling companies to deliver software products sooner.

Market Adoption and Economic Implications

The potential economic impact of LLMs on software development is huge. If companies adopt these advances, the resulting productivity gains can translate into cost savings across development and maintenance. For instance, GitHub Copilot, integrated into the development environment, suggests code snippets and automates routine translation tasks, considerably reducing the time a developer spends on them. Moreover, with their ability to generate test cases and assist debugging, LLMs also reduce the resource requirements of these time-consuming but important processes.

Reshaping the Workforce

The nature of the tech workforce will also change as LLMs are integrated. As these models take on more routine and repetitive tasks, a software developer's work will shift toward creativity and problem-solving. Developers should re-skill to strengthen their competencies in machine learning, data science, and AI-driven tooling. As more coding is resolved through LLMs, the tasks in software development will expand to include more problem-solving, critical thinking, and strategic decision-making.

Conclusion

LLMs are no longer just tools; they are becoming an integral part of software development. Their impact on productivity, economic outcomes, and the nature of work in the tech industry is promising. Successful integration requires careful planning and continuous learning to adapt to these ever-evolving technologies.
My demo of OpenTelemetry Tracing features two Spring Boot components. One uses the Java agent, and I noticed a different behavior when I recently upgraded it from v1.x to v2.x. In the other one, I'm using Micrometer Tracing because I compile to GraalVM native, and it can't process Java agents. In this post, I want to compare these three approaches: Java agent v1, Java agent v2, and Micrometer Tracing.

The Base Application and Its Infrastructure

I'll use the same base application: a simple Spring Boot application, coded in Kotlin. It offers a single endpoint. The function behind the endpoint is named entry(). It calls another function named intermediate(). The latter uses a RestClient instance, the successor to RestTemplate, to make a call to the above endpoint. To avoid infinite looping, I pass a custom request header: if the entry() function finds it, it doesn't proceed further. It translates into the following code:

Kotlin
@SpringBootApplication
class Agent1xApplication

@RestController
class MicrometerController {

    private val logger = LoggerFactory.getLogger(MicrometerController::class.java)

    @GetMapping("/{message}")
    fun entry(@PathVariable message: String, @RequestHeader("X-done") done: String?) {
        logger.info("entry: $message")
        if (done == null) intermediate()
    }

    fun intermediate() {
        logger.info("intermediate")
        RestClient.builder()
            .baseUrl("http://localhost:8080/done")
            .build()
            .get()
            .header("X-done", "true")
            .retrieve()
            .toBodilessEntity()
    }
}

For every setup, I'll check two stages: the primary stage, with OpenTelemetry enabled, and a customization stage to create additional internal spans.

Micrometer Tracing

Micrometer Tracing stems from Micrometer, a "vendor-neutral application observability facade."

Micrometer Tracing provides a simple facade for the most popular tracer libraries, letting you instrument your JVM-based application code without vendor lock-in. It is designed to add little to no overhead to your tracing collection activity while maximizing the portability of your tracing effort. - Micrometer Tracing site

To start with Micrometer Tracing, one needs to add a few dependencies:

- Spring Boot Actuator, org.springframework.boot:spring-boot-starter-actuator
- Micrometer Tracing itself, io.micrometer:micrometer-tracing
- A "bridge" to the target tracing backend API; in my case, it's OpenTelemetry, hence io.micrometer:micrometer-tracing-bridge-otel
- A concrete exporter to the backend, io.opentelemetry:opentelemetry-exporter-otlp

We don't need a BOM because versions are already defined in the Spring Boot parent. Yet, we need two runtime configuration parameters: where the traces should be sent, and the component's name. They are governed by the MANAGEMENT_OTLP_TRACING_ENDPOINT and SPRING_APPLICATION_NAME variables.

YAML
services:
  jaeger:
    image: jaegertracing/all-in-one:1.55
    environment:
      - COLLECTOR_OTLP_ENABLED=true                                  #1
    ports:
      - "16686:16686"
  micrometer-tracing:
    build:
      dockerfile: Dockerfile-micrometer
    environment:
      MANAGEMENT_OTLP_TRACING_ENDPOINT: http://jaeger:4318/v1/traces #2
      SPRING_APPLICATION_NAME: micrometer-tracing                    #3

1. Enable the OpenTelemetry collector for Jaeger.
2. Full URL to the Jaeger OTLP HTTP endpoint.
3. Set the OpenTelemetry service name.

Here's the result:

Without any customization, Micrometer creates spans when receiving and sending HTTP requests. The framework needs to inject magic into the RestClient for sending.
We must let the former instantiate the latter for that:

Kotlin
class MicrometerTracingApplication {

    @Bean
    fun restClient(builder: RestClient.Builder) =
        builder.baseUrl("http://localhost:8080/done").build()
}

We can create manual spans in several ways, one of them via the OpenTelemetry API itself. However, that setup requires a lot of boilerplate code. The most straightforward way is Micrometer's Observation API. Its main benefit is a single API that manages both metrics and traces. Here's the updated code:

Kotlin
class MicrometerController(
    private val restClient: RestClient,
    private val registry: ObservationRegistry
) {

    @GetMapping("/{message}")
    fun entry(@PathVariable message: String, @RequestHeader("X-done") done: String?) {
        logger.info("entry: $message")
        val observation = Observation.start("entry", registry)
        if (done == null) intermediate(observation)
        observation.stop()
    }

    fun intermediate(parent: Observation) {
        logger.info("intermediate")
        val observation = Observation.createNotStarted("intermediate", registry)
            .parentObservation(parent)
            .start()
        restClient.get()
            .header("X-done", "true")
            .retrieve()
            .toBodilessEntity()
        observation.stop()
    }
}

The added observation calls reflect upon the generated traces:

OpenTelemetry Agent v1

An alternative to Micrometer Tracing is the generic OpenTelemetry Java agent. Its main benefit is that it impacts neither the code nor the developers; the agent is a pure runtime-scoped concern.

Shell
java -javaagent:opentelemetry-javaagent.jar agent-one-1.0-SNAPSHOT.jar

The agent abides by OpenTelemetry's configuration with environment variables:

YAML
services:
  agent-1x:
    build:
      dockerfile: Dockerfile-agent1
    environment:
      OTEL_EXPORTER_OTLP_ENDPOINT: http://jaeger:4317   #1
      OTEL_RESOURCE_ATTRIBUTES: service.name=agent-1x   #2
      OTEL_METRICS_EXPORTER: none                       #3
      OTEL_LOGS_EXPORTER: none                          #4
    ports:
      - "8081:8080"

1. Set the protocol, the domain, and the port. The library appends /v1/traces.
2. Set the OpenTelemetry service name.
3-4. Export neither the metrics nor the logs.

With no more configuration, we get the following traces:

The agent automatically tracks requests, both received and sent, as well as functions marked with Spring-related annotations. Traces are correctly nested inside each other, according to the call stack. To trace additional functions, we need to add a dependency to our codebase, io.opentelemetry.instrumentation:opentelemetry-instrumentation-annotations. We can now annotate previously untraced functions with the @WithSpan annotation. The value() part governs the trace's label, while kind translates into a span.kind attribute. If value is set to an empty string, which is the default, it outputs the function's name. For my purposes, the default values are good enough.

Kotlin
@WithSpan
fun intermediate() {
    logger.info("intermediate")
    RestClient.builder()
        .baseUrl("http://localhost:8080/done")
        .build()
        .get()
        .header("X-done", "true")
        .retrieve()
        .toBodilessEntity()
}

It yields the expected new intermediate() trace:

OpenTelemetry Agent v2

OpenTelemetry released a new major version of the agent in January of this year. I updated my demo with it. Traces are now only created when the app receives and sends requests. As with the previous version, we can add traces with the @WithSpan annotation. The only difference is that we must also annotate the entry() function; it's no longer traced by default.

Discussion

Spring became successful for two reasons: it simplified complex solutions, e.g., EJB 2, and provided an abstraction layer over competing libraries.
Micrometer Tracing started as an abstraction layer over Zipkin and Jaeger, and it made total sense at the time. This argument becomes moot with OpenTelemetry being supported by most libraries across programming languages and trace collectors. The Observation API is still a considerable benefit of Micrometer Tracing, as it offers a single API over metrics and traces. On the Java agent side, OpenTelemetry configuration is similar across all tech stacks and libraries: environment variables. I was a bit disappointed when I upgraded from v1 to v2, as the new agent is not Spring-aware: Spring-annotated functions are not traced by default. In the end, it's a wise decision: it's much better to be explicit about the spans you want than to remove some you don't want to see.

Thanks to Jonatan Ivanov for his help and his review. The complete source code for this post can be found on GitHub.

To Go Further

- OpenTelemetry Traces
- OpenTelemetry Java integration
- OpenTelemetry Java examples
- Distributed Tracing with Spring Boot 3 — Micrometer vs OpenTelemetry
- Observability With Spring Boot 3
As enterprises mature in their CI/CD journey, they tend to ship code faster, safely, and securely. One essential strategy DevOps teams apply is releasing code progressively to production, also known as canary deployment. Canary deployment is a bulletproof mechanism that safely releases application changes and provides flexibility for business experiments. It can be implemented using open-source software like Argo Rollouts and Flagger. However, advanced DevOps teams want granular control over their traffic and pod scaling while performing canary deployments, in order to reduce overall costs. Many enterprises achieve advanced traffic management of canary deployments at scale using the open-source Istio service mesh. We want to share our knowledge with the DevOps community through this blog.

Before we get started, let us discuss the canary architecture implemented by Argo Rollouts and Istio.

Recap of Canary Implementation Architecture With Argo Rollouts and Istio

If you use Istio service mesh, all of your meshed workloads have an Envoy proxy sidecar attached to the application container in the pod. You can have an API or Istio ingress gateway to receive incoming traffic from outside. In such a case, you can use Argo Rollouts to handle canary deployment. Argo Rollouts provides a CRD called Rollout to implement the canary deployment; it is similar to a Deployment object and is responsible for creating, scaling, and deleting ReplicaSets in Kubernetes.

The canary deployment strategy starts by redirecting a small amount of traffic (say, 5%) to the newly deployed app. Based on specific criteria, such as optimized resource utilization of the new canary pods, you gradually increase the traffic to 100%. The Istio sidecar handles the traffic for the baseline and canary as per the rules defined in the VirtualService resource. Since Argo Rollouts provides native integration with Istio, it overrides the VirtualService resource to increase the traffic to the canary pods.

Canary can be implemented using two methods: deploying new changes as a service or deploying new changes as a subset.

1. Deploying New Changes as a Service

In this method, we create a new service (called canary) and split the traffic from the Istio ingress gateway between the stable and canary services. Refer to the image below.

You can refer to the YAML file for a sample implementation of deploying a canary with multiple services here. We have created two services called rollouts-demo-stable and rollouts-demo-canary. Each service listens to HTTP traffic for the Argo Rollout resource called rollouts-demo. In the rollouts-demo YAML, we have specified the Istio virtual service resource and the logic to gradually increase the traffic weight from 20% to 40%, 60%, 80%, and eventually 100%.

2. Deploying New Changes as a Subset

In this method, you have one service but create a new Deployment subset (the canary version) pointing to the same service. Traffic can be split between the stable and canary deployment sets using Istio VirtualService and DestinationRule resources. Please note that it is this second method we discuss thoroughly in this blog.

Implementing Canary Using Istio and Argo Rollouts Without Changing the Deployment Resource

There is a misunderstanding among DevOps professionals that Argo Rollouts is a replacement for the Deployment resource, and that services considered for canary deployment have to refer to an Argo Rollout with the Deployment configuration rewritten. Well, that's not true.
The Argo Rollout resource provides a section called workloadRef where existing Deployments can be referenced without significant changes to the Deployment or Service YAML. If you use a Deployment resource for a service in Kubernetes, you can provide a reference in the Rollout CRD, after which Argo Rollouts will manage the ReplicaSet for that service. Refer to the image below.

We will use the same concept to deploy a canary version using the second method: deploying new changes as a subset.

Argo Rollouts Configuration for Deploying New Changes Using a Subset

Let's say you have a Kubernetes service called rollouts-demo-svc and a deployment resource called rollouts-demo-deployment (code below). You need to follow three steps to configure the canary deployment.

Code for service.yaml:

YAML
apiVersion: v1
kind: Service
metadata:
  name: rollouts-demo-svc
  namespace: istio-argo-rollouts
spec:
  ports:
  - port: 80
    targetPort: http
    protocol: TCP
    name: http
  selector:
    app: rollouts-demo

Code for deployment.yaml:

YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rollouts-demo-deployment
  namespace: istio-argo-rollouts
spec:
  replicas: 0 # this has to be made 0 once the Argo Rollout is active and functional
  selector:
    matchLabels:
      app: rollouts-demo
  template:
    metadata:
      labels:
        app: rollouts-demo
    spec:
      containers:
      - name: rollouts-demo
        image: argoproj/rollouts-demo:blue
        ports:
        - name: http
          containerPort: 8080
        resources:
          requests:
            memory: 32Mi
            cpu: 5m

Step 1: Set Up the Virtual Service and Destination Rule in Istio

Set up the virtual service by specifying the back-end destination for the HTTP traffic from the Istio gateway. In our virtual service, rollouts-demo-vs2, we set the back-end service to rollouts-demo-svc, but we created two subsets (stable and canary) for the respective deployment sets. We set the traffic weight so that 100% of the traffic goes to the stable version and 0% goes to the canary version. As Istio is responsible for the traffic split, we will see how Argo updates this VirtualService resource with the new traffic configuration specified in the canary specification.

YAML
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: rollouts-demo-vs2
  namespace: istio-argo-rollouts
spec:
  gateways:
  - istio-system/rollouts-demo-gateway
  hosts:
  - "*"
  http:
  - name: route-one
    route:
    - destination:
        host: rollouts-demo-svc
        port:
          number: 80
        subset: stable
      weight: 100
    - destination:
        host: rollouts-demo-svc
        port:
          number: 80
        subset: canary
      weight: 0

Now, we have to define the subsets in the destination rule. In the rollout-destrule resource below, we define the canary and stable subsets and refer to the Argo Rollout resource called rollouts-demo.

YAML
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: rollout-destrule
  namespace: istio-argo-rollouts
spec:
  host: rollouts-demo-svc
  subsets:
  - name: canary # referenced in canary.trafficRouting.istio.destinationRule.canarySubsetName
    labels:      # labels will be injected with the canary rollouts-pod-template-hash value
      app: rollouts-demo
  - name: stable # referenced in canary.trafficRouting.istio.destinationRule.stableSubsetName
    labels:      # labels will be injected with the stable rollouts-pod-template-hash value
      app: rollouts-demo

In the next step, we will set up the Argo Rollout resource.

Step 2: Set Up the Argo Rollout Resource

The rollout spec should cover two important items in the canary strategy: declaring the Istio virtual service and destination rule, and providing the traffic increment strategy.
You can learn more about the Argo Rollout spec. In our Argo Rollout resource, rollouts-demo, we have provided the deployment (rollouts-demo-deployment) in the workloadRef spec. In the canary spec, we refer to the virtual service (rollouts-demo-vs2) and destination rule (rollout-destrule) created in the earlier step. We have also specified the traffic rules to redirect 20% of the traffic to the canary pods and then pause for manual direction. We added this manual pause so that, in a production environment, the Ops team can verify whether all the vital metrics and KPIs of the canary pods, such as CPU, memory, latency, and throughput, are in an acceptable range. Once we manually promote the release, the canary pod traffic will increase to 40%. We then wait 10 seconds before increasing the traffic to 60%. The process continues until 100% of the traffic reaches the canary pods and the stable pods are deleted.

YAML
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
  namespace: istio-argo-rollouts
spec:
  replicas: 5
  strategy:
    canary:
      trafficRouting:
        istio:
          virtualService:
            name: rollouts-demo-vs2 # required
            routes:
            - route-one # optional if there is a single route in VirtualService, required otherwise
          destinationRule:
            name: rollout-destrule   # required
            canarySubsetName: canary # required
            stableSubsetName: stable # required
      steps:
      - setWeight: 20
      - pause: {}
      - setWeight: 40
      - pause: {duration: 10}
      - setWeight: 60
      - pause: {duration: 10}
      - setWeight: 80
      - pause: {duration: 10}
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollouts-demo
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rollouts-demo-deployment

Once you have deployed all the resources in steps 1 and 2 and accessed them through the Istio ingress IP from the browser, you will see an output like the one below. You can run the command below to understand how the pods are handled by Argo Rollouts.

Shell
kubectl get pods -n <<namespace>>

Validating the Canary Deployment

Let's say developers have made new changes and created a new image that is supposed to be tested. For our case, we will update the Deployment manifest (rollouts-demo-deployment) by modifying the image value from blue to red (refer to the image below).

YAML
spec:
  containers:
  - name: rollouts-demo
    image: argoproj/rollouts-demo:red

Once you deploy rollouts-demo-deployment, Argo Rollouts will understand that new changes have been introduced to the environment. It will then start creating new canary pods and allow 20% of the traffic. Refer to the image below.

Now, if you analyze the virtual service spec by running the following command, you will see that Argo has updated the traffic percentage to the canary from 0% to 20% (as per the Rollout spec):

Shell
kubectl get vs rollouts-demo-vs2 -n <<namespace>> -o yaml

Gradually, 100% of the traffic will be shifted to the new version, and the older/stable pods will be terminated.

In advanced cases, the DevOps team must control the scaling of canary pods. The idea is not to create all the pods per the replica count at each gradual shift of the canary, but to create the number of pods based on specific criteria. In those cases, we need a HorizontalPodAutoscaler (HPA) to handle the scaling of canary pods.

Scaling of Pods During Canary Deployment Using HPA

The Kubernetes HPA is used to increase or decrease pods based on load. The HPA can also be used to control the scaling of pods during canary deployment. The HorizontalPodAutoscaler overrides the Rollout's behavior for scaling of pods.
We have created and deployed the following HPA resource: hpa-rollout-example.

Note: The HPA will create a number of pods equal to the maximum of the minimum replicas in the HPA resource and the replica count in the Rollout (a short sketch at the end of this article makes the rule concrete). This means that if the number of pods mentioned in the HPA resource is 2 but the replica count in the Rollout resource is 5, then a total of 5 pods will be created. Similarly, if we update the replicas in the rollouts-demo resource to 1, then the number of pods created by the HPA will be 2. (We updated the replicas to 1 to test this scenario.)

In the HPA resource, we have referenced the Argo Rollout resource rollouts-demo. That means the HPA is responsible for creating two replicas at the start. If CPU utilization is more than 10%, more pods will be created, up to a maximum of six replicas.

YAML
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-rollout-example
  namespace: istio-argo-rollouts
spec:
  maxReplicas: 6
  minReplicas: 2
  scaleTargetRef:
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    name: rollouts-demo
  targetCPUUtilizationPercentage: 10

When we deployed a canary, only two replicas were created at first (instead of the five mentioned in the Rollout).

Validating Scaling of Pods by HPA by Increasing Synthetic Load

We can run the following command to generate load against the service:

Shell
kubectl run -i --tty load-generator-1 --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://<<service name>>.<<namespace>>; done;"

You can use the following command to observe the CPU utilization of the pods created by the HPA:

Shell
kubectl get hpa hpa-rollout-example -n <<namespace>> --watch

Once the load increases beyond 10%, in our case to 14% (refer to the image below), new pods are created. Many metrics, such as latency or throughput, can be used by the HPA as criteria for scaling the pods up or down.

Video

Below is a video by Ravi Verma, CTO of IMESH, giving a walkthrough of advanced traffic management in canary deployments for enterprises at scale using Istio and Argo Rollouts.

Final Thought

As the pace of releasing software increases with the maturity of the CI/CD process, new complications will emerge, and so will new requirements for the DevOps team to tackle these challenges. Similarly, when the DevOps team adopts the canary deployment strategy, new scale and traffic management challenges emerge in gaining granular control over the rapid release process and infrastructure cost.
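To close, here is the promised sketch of the HPA sizing rule as we observed it, using the numbers from this article. The function name is ours, not part of any Kubernetes API:

Python
# Sketch of the observed sizing rule: the HPA starts with
# max(HPA minReplicas, Rollout replicas) pods. Illustrative only.

def initial_pods(hpa_min_replicas: int, rollout_replicas: int) -> int:
    return max(hpa_min_replicas, rollout_replicas)

print(initial_pods(2, 5))  # 5 pods: the Rollout's replica count wins
print(initial_pods(2, 1))  # 2 pods: the HPA's minReplicas wins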
Are you ready to start your journey on the road to collecting telemetry data from your applications? Great observability begins with great instrumentation! In this series, you'll explore how to adopt OpenTelemetry (OTel) and how to instrument an application to collect tracing telemetry. You'll learn how to leverage out-of-the-box automatic instrumentation tools and understand when it's necessary to explore more advanced manual instrumentation for your applications. By the end of this series, you'll have an understanding of how telemetry travels from your applications to the OpenTelemetry Collector, and you'll be ready to bring OpenTelemetry to your future projects. Everything discussed here is supported by a hands-on, self-paced workshop authored by Paige Cruz.

In the previous article, we explored how to leverage programmatic instrumentation in our application, as developers would in their daily coding, using OTel libraries. In this article, we carry onwards, improving on the telemetry data that was previously displayed as console output. We are going to programmatically instrument and configure our application to direct all telemetry data to a Jaeger instance for visual insights. It is assumed that you followed the previous articles in setting up both OpenTelemetry and the example Python application project; if not, go back and see the previous articles, as that setup is not covered here.

Instrumenting With Jaeger

As mentioned previously, OTel does not provide a backend to store its collected telemetry data. Instead, it acts as a collector of telemetry data, forwarding it to our preferred backend system. For visualizing telemetry data, we need this backend storage, where we can query our data so that tooling such as Jaeger can visualize it in dashboards. Jaeger was built with all of this in mind, providing a very quick way to get started if you supply it with OpenTelemetry Protocol (OTLP) formatted data.

We are going to use the default Jaeger in-memory storage for our telemetry data by sending it to Jaeger and exploring it visually using the predefined UI dashboards. Carrying on from our previous setup, where we instrumented programmatically to output our span data to the console, we'll modify this to use the OTLPSpanExporter in our application code. This exporter can be configured to send our span data directly to a Jaeger instance with an OTLP endpoint.

Using the project we downloaded and installed in previous articles, we can open the file programmatic/Buildfile-prog and add the opentelemetry-exporter-otlp line shown below to install the OTLP exporter library:

FROM python:3.12-bullseye

WORKDIR /app

COPY requirements.txt requirements.txt

RUN pip install -r requirements.txt
RUN pip install opentelemetry-api \
    opentelemetry-exporter-otlp \
    opentelemetry-sdk \
    opentelemetry-instrumentation-flask \
    opentelemetry-instrumentation-jinja2 \
    opentelemetry-instrumentation-requests

COPY . .
CMD [ "flask", "run", "--host=0.0.0.0"]

Next, we need to adjust the application code found in programmatic/app.py to import the OTLPSpanExporter library and swap out the ConsoleSpanExporter, as shown below:

Python
import random
import re
import urllib3
import requests

from flask import Flask, render_template, request
from breeds import breeds

from opentelemetry.trace import set_tracer_provider
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.jinja2 import Jinja2Instrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor

provider = TracerProvider()
processor = SimpleSpanProcessor(OTLPSpanExporter())
provider.add_span_processor(processor)
set_tracer_provider(provider)
...

Next, we apply the Flask, Jinja2, and requests instrumentation libraries imported above, right after the tracer provider section we just created:

Python
...
app = Flask("hello-otel")
FlaskInstrumentor().instrument_app(app)
Jinja2Instrumentor().instrument()
RequestsInstrumentor().instrument()
...

Next, we can review the pod configuration file, which contains all the details needed to run multiple container images in a Kubernetes pod. We will be using the Jaeger all-in-one image, which is designed for quick local testing. It includes the Jaeger UI, jaeger-collector, jaeger-query, and jaeger-agent, with an in-memory storage component.

Save and close the file programmatic/app.py and build the new container image with the following command:

$ podman build -t hello-otel:prog -f programmatic/Buildfile-prog

Successfully tagged localhost/hello-otel:prog \
516c5299a32b68e7a4634ce15d1fd659eed2164ebe945ef1673f7a55630e22c8

Next, open the file programmatic/app_pod.yaml and review the Jaeger section. Note the two ports: 16686 and 4318. The first is for the Jaeger UI, and the second is for telemetry data sent via OTLP over HTTP:

...
- name: jaeger-all-in-one
  image: jaegertracing/all-in-one:1.56
  resources:
    limits:
      memory: "128Mi"
      cpu: "500m"
  ports:
  - containerPort: 16686
    hostPort: 16686
  - containerPort: 4318
  env:
  - name: COLLECTOR_OTLP_ENABLED
    value: "true"
  - name: OTEL_TRACES_EXPORTER
    value: "otlp"

Now let's run the entire configuration, putting our application and the Jaeger all-in-one containers in action in a pod with the following:

$ podman play kube programmatic/app_pod.yaml

Pod:
277dd50306eed445f4f43fc33111eedb31ed5804db1f60a6f0784a2333a54de0
Containers:
b9d5075e2051502b12510deddfb34498f32b2ae12554c5328fecd9725c7b1fe2
8505e8c8cfffaff1f473cfbbf5d9a312ca8247f32653cccf7d192305c1ca741a

Open a browser and view the Jaeger UI at http://localhost:16686, which should display the Gopher Detective as shown below:

Now we can generate telemetry data by accessing our application and making several requests to the http://localhost:8001/doggo endpoint, seeing something like this:

Refreshing the doggo application emits traces with spans from each instrumentation library below, providing a nice visual in the Jaeger UI:

- Flask spans representing requests to the app
- Requests spans for the external request to the Dog API
- Jinja2 spans for HTML template compilation

Back in our Jaeger UI, we can select hello-otel from the service dropdown menu and click on the find traces button on the bottom right. Confirm that you see traces returned for the operation /doggo, something like this:

Verify by clicking on a span name, the top one for example, and view the trace waterfall view:

This completes the tour of programmatic instrumentation, where we installed and configured the OpenTelemetry API and SDK programmatically in our application, successfully sent traces, and finally configured our application to view our traces in the Jaeger UI. These examples use code from a Python application that you can explore in the provided hands-on workshop.

What's Next?

This article completed our journey into programmatic instrumentation, where we instrumented our application as developers and viewed our traces in the Jaeger UI. In our next article, we'll be visually exploring, querying, and viewing our tracing data in Jaeger.
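One optional aside before moving on: the OTel Python SDK also lets you create spans by hand when the auto-instrumentation libraries don't cover a code path. Here is a minimal, self-contained sketch; the span and attribute names are invented for illustration, and ConsoleSpanExporter stands in so the snippet runs without a Jaeger instance (swap in OTLPSpanExporter() to ship spans as configured above):

Python
# Minimal manual-span sketch using the OpenTelemetry Python SDK.
# Span and attribute names are illustrative only.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("hello-otel.manual")

with tracer.start_as_current_span("fetch-breed") as span:
    span.set_attribute("doggo.breed", "corgi")  # custom attribute
    with tracer.start_as_current_span("render-template"):
        pass  # nested span; shows up as a child in the waterfall view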
Editor's Note: The following is an article written for and published in DZone's 2024 Trend Report, Enterprise Security: Reinforcing Enterprise Application Defense.

In today's cybersecurity landscape, securing the software supply chain has become increasingly crucial. The rise of complex software ecosystems and third-party dependencies has introduced new vulnerabilities and threats, making it imperative to adopt robust security measures. This article delves into the significance of a software bill of materials (SBOM) and DevSecOps practices for enhancing application security. We will cover key points such as the importance of software supply chain security, the role of SBOMs, the integration of DevSecOps, and practical steps to secure your software supply chain.

Understanding the Importance of Software Supply Chain Security

Software supply chain security encompasses the protection of all components and processes involved in the creation, deployment, and maintenance of software. This includes source code, libraries, development tools, and third-party services. As software systems grow more interconnected, the attack surface expands, making supply chain security a critical focus area.

The software supply chain is vulnerable to various threats, including:

- Malicious code injection – attackers embedding malicious code into software components
- Dependency hijacking – compromising third-party libraries and dependencies
- Code tampering – making unauthorized modifications to source code
- Credential theft – stealing credentials to access and manipulate development environments

To combat these threats, a comprehensive approach to software supply chain security entails:

- Continuous monitoring and assessment – regularly evaluating the security posture of all supply chain components
- Collaboration and transparency – fostering open communication between developers, security teams, and third-party vendors
- Proactive threat management – identifying and mitigating potential threats before they can cause damage

The Importance of an SBOM and Why It Matters for Supply Chain Security

An SBOM is a detailed inventory of all components, dependencies, and libraries used in a software application. It provides visibility into the software's composition, enabling organizations to:

- Identify vulnerabilities – By knowing exactly what components are in use, security teams can swiftly identify which parts of the software are affected by newly discovered vulnerabilities, significantly reducing the time required for remediation and mitigating potential risks (the sketch after this list illustrates the idea).
- Ensure compliance – Many regulations mandate transparency in software components to ensure security and integrity. An SBOM helps organizations adhere to these regulations by providing a clear record of all software components, demonstrating compliance, and avoiding potential legal and financial repercussions.
- Improve transparency – An SBOM allows all stakeholders, including developers, security teams, and customers, to understand the software's composition. This transparency fosters better communication, facilitates informed decision making, and builds confidence in the security and reliability of the software.
- Enhance supply chain security – Detailed insights into the software supply chain help organizations manage third-party risks more effectively. Having an SBOM allows for better assessment and monitoring of third-party components, reducing the likelihood of supply chain attacks and ensuring that all components meet security and quality standards.
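To illustrate the vulnerability-identification point above, here is a minimal, hedged sketch that scans a CycloneDX JSON SBOM for a known-bad package version. The file name and the advisory entry are invented for the example; only the components array layout follows the CycloneDX format:

Python
import json

# Hypothetical advisory: (package name, affected version).
KNOWN_BAD = {("log4j-core", "2.14.1")}

# CycloneDX JSON stores components as a top-level "components" array,
# each entry carrying at least a name and a version.
with open("sbom.cyclonedx.json") as f:
    sbom = json.load(f)

for component in sbom.get("components", []):
    key = (component.get("name"), component.get("version"))
    if key in KNOWN_BAD:
        print(f"affected component: {key[0]} {key[1]}")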
Table 1. SBOM benefits and challenges

Benefits | Challenges
Enhanced visibility of all software components | Creating and maintaining an accurate SBOM
Faster vulnerability identification and remediation | Integrating SBOM practices into existing workflows
Improved compliance with regulatory standards | Ensuring SBOM data accuracy and reliability across the entire software development lifecycle (SDLC)

Regulatory and Compliance Aspects Related to SBOMs

Regulatory bodies increasingly mandate the use of SBOMs to ensure software transparency and security. Compliance with standards such as the Cybersecurity Maturity Model Certification (CMMC) and Executive Order 14028 on "Improving the Nation's Cybersecurity" emphasizes the need for comprehensive SBOM practices to ensure detailed visibility into, and accountability for, software components. This enhances security by quickly identifying and mitigating vulnerabilities while ensuring compliance with regulatory requirements and maintaining supply chain integrity. SBOMs also facilitate rapid response to newly discovered threats, reducing the risk of malicious code introduction.

Creating and Managing SBOMs

Creating an SBOM involves generating a detailed inventory of all software components, dependencies, and libraries and maintaining it accurately throughout the SDLC to ensure security and compliance. General steps to create an SBOM include:

- Identify components – list all software components, including libraries, dependencies, and tools
- Document metadata – record version information, licenses, and source details for each component
- Automate SBOM generation – use automated tools to generate and update SBOMs
- Regular updates – continuously update the SBOM to reflect changes in the software

Several tools and technologies aid in managing SBOMs, such as:

- CycloneDX, a standard format for creating SBOMs
- OWASP Dependency-Check, which identifies known vulnerabilities in project dependencies
- Syft, which generates SBOMs for container images and filesystems

Best Practices for Maintaining and Updating SBOMs

Maintaining and updating an SBOM is crucial for ensuring the security and integrity of software applications. Let's review some best practices to follow.

Automate Updates

Automating the update process of SBOMs is essential to keeping them current and accurate. Automated tools can continuously monitor software components and dependencies, identifying any changes or updates needed to the SBOM. This practice reduces the risk of human error and ensures that the SBOM reflects the latest state of the software, which is critical for vulnerability management and compliance. A minimal automation sketch follows this section.

Implementation tips:
- Use automation tools like CycloneDX and Syft that integrate seamlessly with your existing development environment
- Schedule regular automated scans to detect updates or changes in software components
- Ensure that the automation process includes notification mechanisms to alert relevant teams of any significant changes

Practices to avoid:
- Relying solely on manual updates, which can lead to outdated and inaccurate SBOMs
- Overlooking the importance of tool configuration and updates to adapt to new security threats
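As a hedged sketch of that automation, the script below regenerates an SBOM with Syft on each run. It assumes the syft CLI is installed on PATH and that the cyclonedx-json output format is available in your version; verify the flags against your installation:

Python
import subprocess

# Regenerate the SBOM for the current directory with Syft, assuming the
# CLI is installed and supports CycloneDX JSON output; adjust the flags
# to match your installed version.
result = subprocess.run(
    ["syft", "dir:.", "-o", "cyclonedx-json"],
    capture_output=True, text=True, check=True,
)

# Write the fresh SBOM where the CI job can archive or diff it.
with open("sbom.cyclonedx.json", "w") as f:
    f.write(result.stdout)

print("SBOM regenerated")

Wired into a build trigger, a script like this keeps the SBOM in lockstep with every change to the dependency tree instead of relying on manual updates.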
Integrate Into CI/CD

Embedding SBOM generation into the continuous integration and continuous deployment (CI/CD) pipeline ensures that SBOMs are generated and updated automatically as part of the SDLC. This integration ensures that every software build includes an up-to-date SBOM, enabling developers to identify and address vulnerabilities early in the process.

Implementation tips:
- Define clear triggers within the CI/CD pipeline to generate or update SBOMs at specific stages, such as code commits or builds
- Use tools like Jenkins and GitLab CI that support SBOM generation and integrate with popular CI/CD platforms
- Train development teams on the importance of SBOMs and how to use them effectively within the CI/CD process

Practices to avoid:
- Neglecting the integration of SBOM generation into the CI/CD pipeline, which can lead to delays and missed vulnerabilities
- Failing to align SBOM practices with overall development workflows and objectives

Regular Audits

Conducting periodic audits of SBOMs is vital to verifying their accuracy and completeness. Regular audits help identify discrepancies or outdated information and ensure that the SBOM accurately reflects the software's current state. These audits should be scheduled based on the complexity and frequency of software updates.

Implementation tips:
- Establish a routine audit schedule, such as monthly or quarterly, depending on the project's needs
- Involve security experts in the auditing process to identify potential vulnerabilities and ensure compliance
- Use audit findings to refine and improve SBOM management practices

Practices to avoid:
- Skipping audits, which can lead to undetected security risks and compliance issues
- Conducting audits without a structured plan or framework, resulting in incomplete or ineffective assessments

DevSecOps and Its Role in Software Supply Chain Security

DevSecOps integrates security practices into the DevOps pipeline, ensuring that security is a shared responsibility throughout the SDLC. This approach enhances supply chain security by embedding security checks and processes into every stage of development.

Key Principles and Practices of DevSecOps

The implementation of key DevSecOps principles can bring several benefits and challenges to organizations adopting the practice.

Table 3. DevSecOps benefits and challenges

Benefits | Challenges
Identifies and addresses security issues early in the development process | Requires a shift in mindset toward prioritizing security
Streamlines security processes, reducing delays and improving efficiency | Integrating security tools into existing pipelines can be complex
Promotes a culture of shared responsibility for security | Ensuring SBOM data accuracy and reliability

Automation

Automation in DevSecOps involves integrating security tests and vulnerability scans into the development pipeline. By automating these processes, organizations can ensure consistent and efficient security checks, reducing human error and increasing the speed of detection and remediation of vulnerabilities. This is particularly important in software supply chain security, where timely identification of issues can prevent vulnerabilities from being propagated through dependencies. A minimal pipeline-gate sketch follows the Collaboration principle below.

Implementation tip: Use tools like Jenkins to automate security testing within your CI/CD pipeline.

Collaboration

Collaboration between development, security, and operations teams is essential in DevSecOps. This principle emphasizes breaking down silos and fostering open communication and cooperation among all stakeholders. Effective collaboration ensures that security considerations are integrated from the start, leading to more secure software development processes.

Implementation tip: Establish regular cross-team meetings and use collaboration tools to facilitate communication and knowledge sharing.
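Here is the promised pipeline-gate sketch: a hedged example that runs Grype and fails the build when high-severity findings appear. It assumes the grype CLI is installed and that its JSON output carries a matches array with severity fields; verify both against your installed version:

Python
import json
import subprocess
import sys

# Run Grype against the working tree; assumes the CLI is installed and
# that JSON output is available in your version.
result = subprocess.run(
    ["grype", "dir:.", "-o", "json"],
    capture_output=True, text=True, check=True,
)
report = json.loads(result.stdout)

# Collect findings at or above the severity we refuse to ship.
blocked = [
    m for m in report.get("matches", [])
    if m.get("vulnerability", {}).get("severity") in ("High", "Critical")
]

if blocked:
    print(f"{len(blocked)} high/critical findings; failing the build")
    sys.exit(1)  # a non-zero exit fails the CI stage
print("no blocking vulnerabilities found")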
Collaboration

Collaboration between development, security, and operations teams is essential in DevSecOps. This principle emphasizes breaking down silos and fostering open communication and cooperation among all stakeholders. Effective collaboration ensures that security considerations are integrated from the start, leading to more secure software development processes.

Implementation tip: Establish regular cross-team meetings and use collaboration tools to facilitate communication and knowledge sharing.

Continuous Improvement

Continuous improvement in DevSecOps involves regularly updating security practices based on feedback, new threats, and evolving technologies. This principle ensures that security measures remain effective and relevant so that they adapt to changes in the threat landscape and technological advancements.

Implementation tip: Use metrics and key performance indicators (KPIs) to evaluate the effectiveness of security practices and identify areas for improvement.

Shift-Left Security

Shift-left security involves integrating security early in the development process rather than addressing it at the end. This approach allows developers to identify and resolve security issues during the initial stages of development, reducing the cost and complexity of fixing vulnerabilities later.

Implementation tip: Conduct security training for developers and incorporate security testing tools into the development environment.

Application Security Testing in DevSecOps

Application security testing is crucial in DevSecOps to ensure that vulnerabilities are detected and addressed early. It enhances the overall security of applications by continuously monitoring and testing for potential threats. The following are different security testing methods that can be implemented:

- Static application security testing (SAST) analyzes source code for vulnerabilities.
- Dynamic application security testing (DAST) tests running applications for security issues.
- Interactive application security testing (IAST) combines elements of SAST and DAST for comprehensive testing.

Open-source tools and frameworks that facilitate application security testing include:

- SonarQube, a static code analysis tool
- OWASP ZAP, a dynamic application security testing tool
- Grype, a vulnerability scanner for container images and filesystems

Integrating Security Into CI/CD Pipelines

Integrating security into CI/CD pipelines is essential to ensure that security checks are consistently applied throughout the SDLC. By embedding security practices into the CI/CD workflow, teams can detect and address vulnerabilities early, enhancing the overall security posture of the application. Here are the key steps to achieve this:

- Incorporate security tests into CI/CD workflows
- Use automated tools to scan for vulnerabilities during builds
- Continuously monitor for security issues and respond promptly

Automating Security Checks and Vulnerability Scanning

Automation ensures that security practices are applied uniformly, reducing the risk of human error and of critical security vulnerabilities being overlooked. Automated security checks can quickly identify vulnerabilities, allowing for faster remediation and reducing the window of opportunity for attackers to exploit weaknesses. DevSecOps emphasizes the importance of building security into every stage of development, automating it wherever possible, rather than treating it as an afterthought. Open-source CI/CD tools like Jenkins, GitLab CI, and CircleCI can integrate security tests into the pipeline.

While automation offers significant benefits, there are scenarios where it may not be appropriate, such as:

- Highly specialized security assessments
- Context-sensitive analysis
- Initial setup and configuration
- False positives and negatives
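As an example of such an automated check, a build step could re-scan the SBOM produced earlier and break the build on serious findings. This is a minimal sketch, assuming Grype is installed and that the sbom.cdx.json file from the earlier example is available:

```shell
# Scan the SBOM for known vulnerabilities and return a non-zero
# exit code if anything of severity "high" or above is found
grype sbom:./sbom.cdx.json --fail-on high
```

Because the command exits non-zero on a match, most CI systems will fail the pipeline automatically, surfacing the issue before the artifact ships.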
Ensuring Continuous Security Throughout the SDLC

Implement continuous security practices to maintain a strong security posture throughout the SDLC, and regularly update security policies, tools, and practices to adapt to evolving threats. This proactive approach not only helps in detecting and mitigating vulnerabilities early but also ensures that security is integrated into every phase of development, from design to deployment. By fostering a culture of continuous security improvement, organizations can better protect their software assets and reduce the likelihood of breaches.

Practical Steps to Secure Your Software Supply Chain

Implementing robust security measures in your software supply chain is essential for protecting against vulnerabilities and ensuring the integrity of your software. Here are practical steps to achieve this:

Establishing a security-first culture:
☑ Implement training and awareness programs for developers and stakeholders
☑ Encourage collaboration between security and development teams
☑ Ensure leadership supports a security-first mindset

Implementing access controls and identity management:
☑ Implement least-privilege access controls to minimize potential attack vectors
☑ Secure identities and manage permissions using best practices for identity management

Auditing and monitoring the supply chain:
☑ Continuously audit and monitor the supply chain
☑ Utilize open-source tools and techniques for monitoring
☑ Establish processes for responding to detected vulnerabilities

Key Considerations for Successful Implementation

To successfully implement security practices within an organization, it's crucial to consider both scalability and flexibility as well as the effectiveness of the measures employed. These considerations ensure that security practices can grow with the organization and remain effective against evolving threats.

Ensuring scalability and flexibility:
☑ Design security practices that can scale with your organization
☑ Adapt to changing threat landscapes and technological advancements using flexible tools and frameworks that support diverse environments

Measuring effectiveness:
☑ Evaluate the effectiveness of security efforts using key metrics and KPIs
☑ Regularly review and assess security practices
☑ Use feedback to continuously improve security measures

Conclusion

Securing the software supply chain is crucial in today's interconnected world. By adopting SBOM and DevSecOps practices using open-source tools, organizations can enhance their application security and mitigate risks. Implementing these strategies requires a comprehensive approach, continuous improvement, and a security-first culture. For further learning and implementation, explore the resources below and stay up to date with the latest developments in cybersecurity.

Additional resources:

- "Modern DevSecOps: Benefits, Challenges, and Integrations To Achieve DevSecOps Excellence" by Akanksha Pathak
- "Building Resilient Cybersecurity Into Supply Chain Operations: A Technical Approach" by Akanksha Pathak
- "Demystifying SAST, DAST, IAST, and RASP: A Comparative Guide" by Apostolos Giannakidis
- Software Supply Chain Security: Core Practices to Secure the SDLC and Manage Risk by Justin Albano, DZone Refcard
- Getting Started With CI/CD Pipeline Security by Sudip Sengupta and Collin Chau, DZone Refcard
- Getting Started With DevSecOps by Caroline Wong, DZone Refcard

This is an excerpt from DZone's 2024 Trend Report, Enterprise Security: Reinforcing Enterprise Application Defense.
Cross-Origin Resource Sharing (CORS) is an essential security mechanism utilized by web browsers, allowing for regulated access to server resources from origins that differ in domain, protocol, or port. In the realm of APIs, especially when utilizing AWS API Gateway, configuring CORS is crucial to facilitate access for web applications originating from various domains while mitigating potential security risks. This article provides a comprehensive guide to configuring CORS on AWS API Gateway through CloudFormation. It emphasizes the significance of CORS, the development of authorization including bearer tokens, and the advantages of selecting optional methods in place of standard GET requests.

Why CORS Matters

In the development of APIs intended for access across various domains, CORS is essential in mitigating unauthorized access. By delineating the specific domains permitted to interact with your API, you can protect your resources from Cross-Site Request Forgery (CSRF) attacks while allowing valid cross-origin requests.

Benefits of CORS

- Security: CORS plays a crucial role in regulating which external domains can access your resources, thereby safeguarding your API against harmful cross-origin requests.
- Flexibility: CORS allows you to define varying levels of access (such as methods like GET, POST, DELETE, etc.) for different origins, offering adaptability based on your specific requirements.
- User experience: Implementing CORS enhances the user experience by allowing users to seamlessly access resources from multiple domains without encountering access-related problems.

Before we proceed with setting up CORS, we need to understand when to use the optional methods over GET. The table below compares the aspects of using GET versus optional methods (POST, PUT, OPTIONS) in API requests; a short example follows the table.

| Reason | GET | Optional Methods (POST, PUT, OPTIONS) |
|---|---|---|
| Security | GET requests are visible in the browser's address bar and can be cached, making them less secure for sensitive information. | Optional methods like POST and PUT are not visible in the address bar and are not cached, providing more security for sensitive data. |
| Flexibility | GET requests are limited to sending data via the URL, which restricts the complexity and size of data that can be sent. | Optional methods allow sending complex data structures in the request body, providing more flexibility. |
| Idempotency and safety | GET is idempotent and considered safe, meaning it does not modify the state of the resource. | POST and PUT are used for actions that modify data, and OPTIONS is used for checking available methods. |
| CORS preflight | GET requests are not typically used for CORS preflight checks. | OPTIONS requests are crucial for CORS preflight checks, ensuring that the actual request can be made. |
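To make the flexibility row above concrete, here is how the same data might travel in a GET versus a POST request. This is an illustrative sketch; the endpoint and payload are invented for the example:

```shell
# GET: data is encoded in the URL, visible in address bars, logs, and caches
curl "https://api.example.com/items?name=notebook"

# POST: data travels in the request body, allowing larger, structured payloads
curl -X POST "https://api.example.com/items" \
  -H "Content-Type: application/json" \
  -d '{"name": "notebook", "tags": ["stationery", "paper"]}'
```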
The next comparison covers the purposes and behavior of the POST and PUT methods:

| Aspect | POST | PUT |
|---|---|---|
| Purpose | Used to create a new resource. | Used to update an existing resource or create it if it doesn't exist. |
| Idempotency | Not idempotent; multiple identical requests may create multiple resources. | Idempotent; multiple identical requests will not change the outcome beyond the initial change. |
| Resource location | The server decides the resource's URI, typically returning it in the response. | The client specifies the resource's URI. |
| Data handling | Typically used when the client does not know the URI of the resource in advance. | Typically used when the client knows the URI of the resource and wants to update it. |
| Common use case | Creating new records, such as submitting a form to create a new user. | Updating existing records, such as editing user information. |
| Caching | Responses to POST requests are generally not cached. | Responses to PUT requests can be cached, as the request should result in the same outcome. |
| Response | Usually returns a status code of 201 (Created) with a Location header pointing to the newly created resource. | Usually returns a status code of 200 (OK) or 204 (No Content) if the update is successful. |

Setting Up CORS in AWS API Gateway Using CloudFormation

Configuring CORS in AWS API Gateway can be accomplished manually via the AWS Management Console; however, automating this process with CloudFormation enhances both scalability and consistency. Below is a detailed step-by-step guide.

1. Define the API Gateway in CloudFormation

Start by defining the API Gateway in your CloudFormation template:

```yaml
Resources:
  MyApi:
    Type: AWS::ApiGateway::RestApi
    Properties:
      Name: MyApi
```

2. Create Resources and Methods

Define the resources and methods for your API. For example, create a resource for /items and a GET method:

```yaml
ItemsResource:
  Type: AWS::ApiGateway::Resource
  Properties:
    ParentId: !GetAtt MyApi.RootResourceId
    PathPart: items
    RestApiId: !Ref MyApi

GetItemsMethod:
  Type: AWS::ApiGateway::Method
  Properties:
    AuthorizationType: NONE
    HttpMethod: GET
    ResourceId: !Ref ItemsResource
    RestApiId: !Ref MyApi
    Integration:
      Type: MOCK
      IntegrationResponses:
        - StatusCode: 200
    MethodResponses:
      - StatusCode: 200
```

3. Configure CORS

Next, configure CORS for your API by adding an OPTIONS method that returns the necessary headers:

```yaml
OptionsMethod:
  Type: AWS::ApiGateway::Method
  Properties:
    AuthorizationType: NONE
    HttpMethod: OPTIONS
    ResourceId: !Ref ItemsResource
    RestApiId: !Ref MyApi
    Integration:
      Type: MOCK
      RequestTemplates:
        application/json: '{"statusCode": 200}'
      IntegrationResponses:
        - StatusCode: 200
          SelectionPattern: '2..'
          ResponseParameters:
            method.response.header.Access-Control-Allow-Headers: "'Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token'"
            method.response.header.Access-Control-Allow-Methods: "'*'"
            method.response.header.Access-Control-Allow-Origin: "'*'"
    MethodResponses:
      - StatusCode: 200
        ResponseModels: { "application/json": "Empty" }
        ResponseParameters:
          method.response.header.Access-Control-Allow-Headers: false
          method.response.header.Access-Control-Allow-Methods: false
          method.response.header.Access-Control-Allow-Origin: false
```

Incorporating Authorization

Implementing authorization within your API methods guarantees that access to specific resources is restricted to authenticated and authorized users. AWS API Gateway offers various authorization options, including AWS Lambda authorizers, Cognito user pools, and IAM roles.

```yaml
MyAuthorizer:
  Type: AWS::ApiGateway::Authorizer
  Properties:
    Name: MyLambdaAuthorizer
    RestApiId: !Ref MyApi
    Type: TOKEN
    AuthorizerUri: arn:aws:apigateway:<region>:lambda:path/2015-03-31/functions/<lambda_arn>/invocations

GetItemsMethodWithAuth:
  Type: AWS::ApiGateway::Method
  Properties:
    AuthorizationType: CUSTOM
    AuthorizerId: !Ref MyAuthorizer
    HttpMethod: GET
    ResourceId: !Ref ItemsResource
    RestApiId: !Ref MyApi
    Integration:
      Type: AWS_PROXY
      IntegrationHttpMethod: POST
      Uri: !Sub arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${MyFunction.Arn}/invocations
    MethodResponses:
      - StatusCode: 200
```

After deployment, the method's integration request can be inspected in the AWS API Gateway console. The API Gateway documentation can be found here: Amazon API.
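To try the template end to end, you could deploy the stack and exercise the preflight from the command line. This is a hedged sketch, assuming the AWS CLI is configured and the API has been deployed to a stage; the file name, stack name, and URL placeholders are illustrative:

```shell
# Create or update the stack from the template above
aws cloudformation deploy \
  --template-file api-cors.yaml \
  --stack-name my-cors-api

# Simulate a browser preflight request; the response should include the
# Access-Control-Allow-* headers configured on the OPTIONS method
curl -i -X OPTIONS \
  -H "Origin: https://app.example.com" \
  -H "Access-Control-Request-Method: GET" \
  "https://<api-id>.execute-api.<region>.amazonaws.com/<stage>/items"
```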
Conclusion

Establishing CORS and integrating AWS API Gateway through CloudFormation offers an efficient and reproducible method for managing API access. By meticulously setting up CORS, you guarantee that your APIs remain secure and are accessible solely to permitted origins. Incorporating authorization adds a layer of security by limiting access to only those users who are authorized. Moreover, evaluating the advantages of utilizing optional methods instead of GET requests ensures that your API maintains both security and the flexibility necessary for managing intricate operations. The implementation of these configurations not only bolsters the security and performance of your API but also enhances the overall experience for end users, facilitating seamless cross-origin interactions and the appropriate management of sensitive information.
In our post about extensive-react-boilerplate updates, we mentioned that we migrated e2e testing from Cypress to Playwright. Now, let's delve a little deeper into this change. At the time of writing the automated tests, we had a small amount of functionality to cover and didn't face significant limitations when using Cypress. Yet, we decided to turn our attention to Playwright for several reasons. We wanted to explore the framework created by Microsoft and understand why it is gaining popularity. Additionally, similar to the case when we added MongoDB support, we received requests from the community and colleagues who wanted to start a project based on the boilerplate with Playwright tests. As we began the migration, the volume of tests was insignificant, so we decided to rewrite the tests manually in order to familiarize ourselves with the new framework in more detail.

Familiarizing Ourselves With a New Framework

First, let's discuss the documentation. We can confidently say that Cypress's documentation surpasses Playwright's. It is very detailed and contains numerous examples and tutorials, and there is an entire project on GitHub with examples for every action that can be performed on a typical website. Additionally, the Cypress community is larger than Playwright's. Although experienced developers may be content with the information provided in Playwright's documentation, less experienced developers may find learning Cypress more enjoyable.

Moving on to setting up the configuration file, we didn't find significant differences between the two frameworks; we only needed to configure the timeouts and the base URL. We also explored some new capabilities that Playwright offers in this regard, such as:

Setting timeouts for each test, including the test body and before/after hooks:

```typescript
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // ...
  timeout: 2 * 60 * 1000,
  // ...
});
```

Support for testing on WebKit, the browser engine behind Apple Safari, which Cypress lacks.

Playwright also has the ability to start a local development server with your project before running tests, which can be easily implemented using the webServer parameter:

```typescript
// playwright.config.ts (excerpt)
webServer: {
  command: process.env.CI ? "npm run build:e2e && npm run start" : "npm run dev",
  url: "http://127.0.0.1:3000",
  reuseExistingServer: !process.env.CI,
},
```

Next, we write our first test. The difference in syntax between the two frameworks is notable: Cypress uses chainable syntax and has its own implementation of asynchrony, while Playwright supports the ECMAScript 2015 (ES6) standard and works with the convenient async/await constructs for asynchronous functions.
Here is a Playwright code sample:

```typescript
import { test, expect } from "@playwright/test";

test("should be successful open page and check data is displayed", async ({
  page,
}) => {
  await page.getByTestId("profile-menu-item").click();
  await page.getByTestId("user-profile").click();
  await page.waitForURL(/\/profile/);
  await expect(page.getByTestId("user-name")).toHaveText(
    `${firstName} ${lastName}`
  );
  await expect(page.getByTestId("user-email")).toHaveText(email, {
    ignoreCase: true,
  });
  await page.getByTestId("edit-profile").click();
  await page.waitForURL(/\/profile\/edit/);
  await expect(page.getByTestId("first-name").locator("input")).toHaveValue(
    firstName
  );
  await expect(page.getByTestId("last-name").locator("input")).toHaveValue(
    lastName
  );
});
```

And here is Cypress:

```javascript
it("should be successful open page and check data is displayed", () => {
  cy.getBySel("PersonIcon").click();
  cy.getBySel("user-profile").click();
  cy.location("pathname").should("include", "/profile");
  cy.getBySel("user-name").should("contain.text", firstName + " " + lastName);
  cy.getBySel("user-email").should("contain.text", email);
  cy.getBySel("edit-profile").click();
  cy.location("pathname").should("include", "/profile/edit");
  cy.get('[data-testid="first-name"] input').should("contain.value", firstName);
  cy.get('[data-testid="last-name"] input').should("contain.value", lastName);
});
```

Framework Comparisons

When it comes to running the tests, we notice the architectural differences between the frameworks. Cypress executes commands inside the browser, which gives it easy access to important components such as the DOM, local storage, and window objects. Playwright, on the other hand, uses a client-server architecture and communicates with browsers via a WebSocket connection.

After rewriting all the tests, we ran them and observed that, by default, Playwright runs tests in parallel, providing this feature for free. In contrast, Cypress parallelizes tests only across different machines, and it is a paid feature. Running the same suite in both frameworks showed that Playwright completed the tests in 2.7 minutes, while Cypress took 3.82 minutes, a significant time difference in favor of Playwright.

Conclusion

Considering all the above points, one might wonder why we decided to change frameworks. Although we did not see significant benefits at that moment, we took into account the future of our project and potential projects that will be built on top of boilerplates from the bcboilerplates ecosystem. From this perspective, Playwright seemed more promising than Cypress due to its test parallelization, higher speed, the possibility of testing mobile applications, the ability to use programming languages other than JS and TS, and the backing of a major player like Microsoft.
In the realm of front-end development, ensuring that your application is thoroughly tested and maintains high quality is paramount. One of the strategies that can significantly enhance both the development and testing processes is the use of the data-testid attribute. This attribute, specifically designed for testing purposes, offers numerous advantages, particularly from a QA perspective.

Benefits of Using data-testid

Stable and Reliable Locators

Benefit

One of the primary challenges in automated testing is ensuring that test scripts remain stable as the UI evolves. Typically, selectors like classes and IDs are used to locate elements in the DOM, but these can change frequently as the design or structure of the UI is updated. data-testid provides a stable and reliable way to locate elements, as it is intended solely for testing purposes and is less likely to be altered.

Impact on Automation

Automated tests become more resilient and less prone to failure due to changes in the UI. This reduces the maintenance burden on the QA team, allowing them to focus on expanding test coverage rather than constantly updating selectors.

Clear Separation of Concerns

Benefit

data-testid ensures that testing selectors are decoupled from the visual and functional aspects of the UI. Unlike classes and IDs, which are tied to styling and functionality, data-testid is dedicated solely to testing, meaning that changes to the UI's look or behavior won't impact the test scripts.

Impact on Automation

This separation promotes a cleaner codebase and prevents tests from becoming fragile due to design changes. Developers can refactor UI components without worrying about breaking the test automation, as long as the data-testid values remain unchanged.

Encourages a Test-First Approach

Benefit

The use of data-testid encourages developers to think about testability from the outset. By including data-testid attributes during development, teams can ensure that their UI components are easily testable and that the testing process is considered throughout the development lifecycle.

Impact on Automation

This test-first approach can lead to more robust and comprehensive test coverage. When testability is a priority from the beginning, automated tests can be created more quickly and with greater confidence in their effectiveness.

How Can I Implement This Approach?

I've created a separate step-by-step guide to implement this approach, "Mastering Test Automation: How data-testid Can Revolutionize UI Testing."

Impact on Automation Development

Simplified Locator Strategy

By using data-testid attributes, test automation engineers can adopt a simplified and consistent locator strategy across the entire test suite. This reduces the complexity of writing and maintaining test scripts and minimizes the time spent dealing with flaky tests due to changing locators.

Reduced Test Maintenance

The stability provided by data-testid attributes means that automated tests require less frequent updates, even as the UI evolves. This leads to lower maintenance costs and allows the QA team to invest their time in creating new tests or enhancing existing ones.

Improved Collaboration Between Developers and QA

By using data-testid, developers and QA engineers can work more closely together. Developers can ensure that the elements they create are easily identifiable in tests, while QA engineers can provide feedback on which elements need data-testid attributes.
This collaboration fosters a more cohesive development process and helps ensure that the application is thoroughly tested.

Scalability of the Automation Suite

Consistent use of data-testid makes the automation suite more scalable. As the application grows, the test suite can expand with it, confident that the locators will remain stable and that tests will continue to provide reliable results.

Impact on Overall QA Process and Product Delivery

Implementing data-testid attributes in front-end development has a profound impact on the overall QA process and product delivery:

Increased Test Reliability

Automated tests that rely on data-testid attributes are less likely to break, leading to more reliable test results. This reliability ensures that the QA team can quickly identify and address issues, reducing the likelihood of bugs making it into production.

Faster Development and Testing Cycles

With data-testid, both development and testing processes become more efficient. Developers can refactor code without fear of breaking tests, and QA engineers can write tests more quickly and with greater confidence. This efficiency leads to faster development and testing cycles, allowing the team to deliver high-quality products more rapidly.

Reduced Technical Debt

The stability and maintainability provided by data-testid attributes help reduce technical debt related to testing. With less time spent on test maintenance and more time available for enhancing test coverage, the QA team can focus on preventing bugs rather than constantly fixing them.

Better Stakeholder Confidence

Reliable, consistent test results build confidence among stakeholders, including product managers, developers, and end users. Knowing that critical functionalities are thoroughly tested before release can provide peace of mind and support smoother product rollouts.

Potential for Misuse

While data-testid is a powerful tool, it should be used judiciously. Overusing data-testid attributes on every element can clutter the HTML and lead to unnecessary complexity. It's important to apply data-testid selectively, focusing on elements that are critical for testing, to avoid introducing unnecessary overhead.

Conclusion

Using data-testid attributes in front-end development is highly beneficial from a QA standpoint. It provides reliable locators, promotes best practices, and improves collaboration between development and QA teams. The impact on automation development is overwhelmingly positive, resulting in more robust, maintainable, and scalable automated test suites. However, it's essential to use this approach judiciously to avoid unnecessary overhead.

References

- playwright-locate-by-test-id
- Cypress-locate-by-test-id
- Selenium-locate-by-test-id
When we developers find bugs in our logs, it can sometimes be worse than a dragon fight! Let's start with the basics. This is the order of log severity, from most detailed to no detail at all:

- TRACE
- DEBUG
- INFO
- WARN
- ERROR
- FATAL
- OFF

The default log severity for your classes is INFO, so you don't need to change your configuration file (application.yaml):

```yaml
logging:
  level:
    root: INFO
```

Let's create a sample controller to test some of the severity levels:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api")
public class LoggingController {

    private static final Logger logger = LoggerFactory.getLogger(LoggingController.class);

    @GetMapping("/test")
    public String getTest() {
        testLogs();
        return "Ok";
    }

    public void testLogs() {
        System.out.println(" ==== LOGS ==== ");
        logger.error("This is an ERROR level log message!");
        logger.warn("This is a WARN level log message!");
        logger.info("This is an INFO level log message!");
        logger.debug("This is a DEBUG level log message!");
        logger.trace("This is a TRACE level log message!");
    }
}
```

We can test it with HTTPie or any other REST client:

```shell
$ http GET :8080/api/test
HTTP/1.1 200

Ok
```

Checking the Spring Boot logs, we will see something like this:

```
 ==== LOGS ====
2024-09-08T20:50:15.872-03:00 ERROR 77555 --- [nio-8080-exec-5] LoggingController : This is an ERROR level log message!
2024-09-08T20:50:15.872-03:00  WARN 77555 --- [nio-8080-exec-5] LoggingController : This is a WARN level log message!
2024-09-08T20:50:15.872-03:00  INFO 77555 --- [nio-8080-exec-5] LoggingController : This is an INFO level log message!
```

If we need to change all the com.boaglio classes to DEBUG, we add this to the application.yaml file and restart the application:

```yaml
logging:
  level:
    com.boaglio: DEBUG
```

Now, repeating the test, we will see a new DEBUG line:

```
 ==== LOGS ====
2024-09-08T20:56:35.082-03:00 ERROR 81780 --- [nio-8080-exec-1] LoggingController : This is an ERROR level log message!
2024-09-08T20:56:35.082-03:00  WARN 81780 --- [nio-8080-exec-1] LoggingController : This is a WARN level log message!
2024-09-08T20:56:35.083-03:00  INFO 81780 --- [nio-8080-exec-1] LoggingController : This is an INFO level log message!
2024-09-08T20:56:35.083-03:00 DEBUG 81780 --- [nio-8080-exec-1] LoggingController : This is a DEBUG level log message!
```

This is good, but sometimes we are running in production and need to change from INFO to TRACE, just for some quick research. This is possible with the LoggingSystem class. Let's add a POST API to our controller that changes all logs to TRACE:

```java
// Requires imports for org.springframework.boot.logging.LoggingSystem
// and org.springframework.boot.logging.LogLevel
@Autowired
private LoggingSystem loggingSystem;

@PostMapping("/trace")
public void setLogLevelTrace() {
    loggingSystem.setLogLevel("com.boaglio", LogLevel.TRACE);
    logger.info("TRACE active");
    testLogs();
}
```

We are using the LoggingSystem.setLogLevel method to change all logs in the com.boaglio package to TRACE. Let's call our POST API to enable TRACE:

```shell
$ http POST :8080/api/trace
HTTP/1.1 200
```

Now we can check that TRACE was finally enabled:
```
2024-09-08T21:04:03.791-03:00  INFO 82087 --- [nio-8080-exec-3] LoggingController : TRACE active
 ==== LOGS ====
2024-09-08T21:04:03.791-03:00 ERROR 82087 --- [nio-8080-exec-3] LoggingController : This is an ERROR level log message!
2024-09-08T21:04:03.791-03:00  WARN 82087 --- [nio-8080-exec-3] LoggingController : This is a WARN level log message!
2024-09-08T21:04:03.791-03:00  INFO 82087 --- [nio-8080-exec-3] LoggingController : This is an INFO level log message!
2024-09-08T21:04:03.791-03:00 DEBUG 82087 --- [nio-8080-exec-3] LoggingController : This is a DEBUG level log message!
2024-09-08T21:04:03.791-03:00 TRACE 82087 --- [nio-8080-exec-3] LoggingController : This is a TRACE level log message!
```

And a bonus tip: to enable DEBUG or TRACE just for the Spring Boot framework itself (which is great sometimes for understanding what is going on under the hood), we can simply add this to our application.yaml:

```yaml
debug: true
```

or

```yaml
trace: true
```

Let the game of trace begin!