Testing, Tools, and Frameworks Resources

DZone's Featured Testing, Tools, and Frameworks Resources

Integrating Lighthouse Test Automation Into Your CI/CD Pipeline

By Maria Bueno

Web performance can make or break your digital presence. While developers constantly push new features and updates, maintaining consistent quality across deployments remains a challenge. Lighthouse test automation has emerged as a powerful solution, transforming how development teams approach quality assurance and performance optimization.

Understanding Lighthouse Test Automation Fundamentals

Lighthouse test automation serves as the foundation for comprehensive performance testing. When integrated into continuous integration workflows, Google Lighthouse provides consistent, objective measurements of web application performance. This integration enables teams to catch performance regressions before they impact users.

Streamlining Quality Assurance With Automated Testing Tools

The implementation of automated testing tools through Lighthouse creates a robust testing environment. Development teams can establish performance budgets and automatically verify that new code meets established standards before deployment.

| Metric Category | What Lighthouse Tests | Impact on User Experience |
|-----------------|-----------------------|---------------------------|
| Performance | Loading speed, interactivity | Direct user satisfaction |
| Accessibility | WCAG compliance | Inclusive user experience |
| Best Practices | Security, browser compatibility | Technical reliability |
| SEO | Search engine optimization | Visibility and reach |
| PWA | Progressive web app readiness | Mobile experience |

Maximizing Efficiency Through Lighthouse Test Automation

Lighthouse test automation significantly reduces the manual effort required for performance testing. By automating these checks, teams can:

- Identify performance bottlenecks early in development
- Maintain consistent quality standards across deployments
- Generate detailed reports for stakeholder communication
- Track performance trends over time

Enhancing Development Workflows With Automated Testing Tools

The integration of automated testing tools transforms traditional development pipelines.
Teams can establish clear performance criteria and automatically prevent deployments that don't meet these standards, ensuring consistent quality across releases.

Optimizing Performance With Google Lighthouse Integration

Google Lighthouse provides comprehensive insights into various aspects of web application performance. When automated through CI/CD pipelines, these tests become an integral part of the development process.

Advanced Metrics Through Lighthouse Test Automation

The depth of analysis provided by Lighthouse test automation extends beyond basic performance metrics. Teams gain insights into:

- Critical rendering paths
- Resource optimization opportunities
- JavaScript execution timing
- Network utilization patterns
- Core Web Vitals compliance

Implementing Successful Lighthouse Test Automation Strategies

Effective implementation of Lighthouse test automation requires thoughtful planning and execution. Development teams must begin by setting realistic performance budgets that align with business objectives and user expectations. Establishing clear baseline metrics provides a foundation for measuring progress, while carefully defined failure thresholds help maintain quality standards without creating unnecessary deployment blockers. Teams should develop comprehensive response protocols for test failures, ensuring quick resolution of performance issues when they arise. This systematic approach to implementation ensures that automated testing becomes an asset rather than a bottleneck in the development process.

Measuring Success With Automated Testing Tools

Automated testing tools provide sophisticated metrics that enable teams to track progress comprehensively. Through detailed performance monitoring, teams can observe improvements in load times, user interaction metrics, and overall application responsiveness. These tools generate quantifiable data about accessibility compliance rates, demonstrating progress toward inclusive design goals.
Best practices adherence can be tracked over time, showing the maturation of development processes. SEO optimization levels provide insight into the application's visibility potential, while performance scores create a clear picture of user experience improvements. This data-driven approach to measuring success ensures that teams can demonstrate concrete value from their testing automation investments.

Future-Ready Testing With Lighthouse Test Automation

As web technologies evolve, Lighthouse test automation continues to adapt and provide relevant insights. The integration with CI/CD pipelines ensures that performance testing remains current with emerging web standards and best practices.

Scaling Quality Assurance Through Lighthouse Test Automation

The scalability of Lighthouse test automation makes it particularly valuable for growing applications. Teams can maintain consistent quality standards even as their applications become more complex and their user base grows.

Maximizing ROI Through Strategic Implementation

The return on investment from Lighthouse test automation becomes evident through:

- Reduced manual testing time
- Earlier detection of performance issues
- Improved user satisfaction metrics
- Enhanced search engine rankings
- Lower maintenance costs

Best Practices for Lighthouse Integration

To maximize the benefits of Google Lighthouse automation:

- Configure tests for both mobile and desktop environments
- Set appropriate thresholds based on business requirements
- Implement detailed logging and monitoring
- Establish clear remediation protocols

Integrating Lighthouse test automation into your CI/CD pipeline represents a strategic investment in your application's quality and performance. By leveraging these automated testing capabilities, development teams can maintain high standards while focusing on innovation and feature development. The key to success lies in thoughtful implementation, clear performance standards, and consistent monitoring of results.
With proper setup and maintenance, Lighthouse test automation becomes an invaluable tool for ensuring sustainable application quality and performance.
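A performance budget of the kind described in this article could be enforced in a pipeline step roughly as follows. This is only a sketch under stated assumptions: it presumes the Lighthouse CLI (`lighthouse`) and `jq` are installed on the build agent, and the function names, the report path, the example URL, and the 0.90 threshold are illustrative placeholders rather than values from the article.

```shell
#!/usr/bin/env bash
# Sketch of a CI performance gate. Function names, report path, and the
# default budget are illustrative assumptions, not recommended values.

# Succeeds when a score (0-1 scale) is at or above the budget.
meets_budget() {
  awk -v score="$1" -v budget="$2" 'BEGIN { exit !(score + 0 >= budget + 0) }'
}

# Runs Lighthouse headlessly and prints the performance category score
# read from the JSON report.
performance_score() {
  local url="$1" report="${2:-./lighthouse-report.json}"
  lighthouse "$url" --quiet --chrome-flags="--headless" \
    --output=json --output-path="$report" >/dev/null
  jq -r '.categories.performance.score' "$report"
}

# Returns non-zero when the measured score misses the budget, which
# fails the pipeline step that calls it.
gate() {
  local url="$1" budget="${2:-0.90}" score
  score="$(performance_score "$url")" || return 2
  if meets_budget "$score" "$budget"; then
    echo "PASS: performance score $score >= $budget"
  else
    echo "FAIL: performance score $score < $budget"
    return 1
  fi
}

# Example (hypothetical URL): gate "https://staging.example.com" 0.90
```

Keeping the comparison in its own function makes the threshold logic easy to reuse for the other categories (accessibility, best practices, SEO) by reading a different key from the report.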

The Power of Docker and Cucumber in Automation Testing

By Naga Harini Kodey

Automation testing is a must for almost every software development team. But when an automation suite grows to many scenarios, its running time tends to increase sharply, and instead of shortening the testing turnaround, the suite can end up slowing the team down. Hence the need to parallelize the suite. Parallelization brings its own difficulty: running a suite in parallel is not cheap, because it requires a larger infrastructure. One solution addresses both the cost and the running time: Docker, which provides the additional execution environments at little or almost no extra cost. In this article, we will discuss how to reduce the testing team’s total turnaround time by combining automation testing with Docker and Cucumber.

Why Cucumber?

There are many automation frameworks available today, and Cucumber stands out as one of the best. It allows automation tests to be written from a business-oriented perspective, concealing complex coding logic and presenting the tests as human-readable sentences. This approach ensures that even individuals without automation development expertise can easily understand the tests and grasp their expected outcomes.

Combining Docker and Cucumber

Docker is a containerization technology that offers a closed environment for applications to operate in. Among its many benefits, reliability stands out as a key advantage. Because each container is isolated, the application can access all the artifacts it needs from inside that closed environment, which makes the application more efficient and reliable.
Automation suites very often suffer from flaky script execution. With Docker, the automation scripts run in a closed environment, which gives the testing team consistent and reliable automation runs. The combination of Docker and Cucumber makes for a good automation suite, both in reliability and in ease of understanding. On top of that, Docker allows us to run the automation suite in parallel to reduce its running time.

Here are some of the benefits of using Docker and Cucumber together:

- Docker can run many containers at once, allowing the automation suite to run concurrently and reducing its total execution time.
- Cucumber facilitates the use of a human-readable language, enhancing the clarity and comprehension of the test scenario workflow.
- Docker offers scalability and flexibility, enabling developers to manage extensive automation suites efficiently.
- Docker and Cucumber streamline the long-term maintenance of automation suites, allowing for seamless updates to dependencies and scenarios.

The Architecture of the Cucumber and Docker-Based Automation Framework

In this article, we will use the WDIO (WebdriverIO) framework as the base automation framework, with the entire suite’s architecture built around it. To facilitate parallel execution in our automation suite, a Selenium Grid-like architecture can easily be developed using Docker. Here are some of the key aspects of the automation suite architecture:

1. Docker Selenium Grid Architecture

The Docker Selenium Grid architecture can easily be formed from the Docker images provided by Selenium. We can use the official Selenium Grid Hub and browser node Docker images to form the Selenium Grid architecture.

2. Docker Image of Automation Suite

To bring the efficiency and reliability of Docker to our automation suite, an image of our automation suite code can be created.
This Docker image can be created by writing a Dockerfile that describes the code and names a base Node.js image, on top of which the automation suite image is built.

3. Docker Compose File

The docker-compose file is one of the best features available for handling a containerized architecture. It is responsible for all the internal networking and the required ports, environment variables, and volume mounts. This file can also be used to scale the containers of any specific service up or down. In other words, it orchestrates the actual Selenium Grid architecture and lets us run multiple instances of our automation code, each connecting to one of the available browser nodes, just like a real Selenium Grid.

How to Create a Docker Image of the Automation Suite

A Docker image is built from a Dockerfile, a plain-text template that gives the machine a set of instructions. The Dockerfile contains multiple steps that bundle all the code needed to fulfill a specific task, so creating a Docker image of an automation suite is not a tedious task. Every Docker image requires a base image on top of which the whole image is built. In our use case, we will use a Node.js Alpine image as the base. Alpine images are lightweight variants of the full images; they reduce the image size and make the build and run process fast and memory efficient.

Here’s a sample Dockerfile for the automation code image:

    FROM node:20-alpine

    # Install Python 3 and update PATH
    RUN apk add --no-cache python3

    # Set the Python 3 binary as the default python binary
    RUN ln -s /usr/bin/python3 /usr/local/bin/python

    # Add Python 3 binary location to the PATH environment variable
    ENV PATH="/usr/local/bin:${PATH}"

    # Install build tools
    RUN apk add --no-cache make g++

    WORKDIR /cucmber-salad
    ADD . /cucmber-salad

    # Install all the required libraries using npm install
    RUN apk add openjdk8 curl jq && npm install

    # Use the Feature Name as Environment Variable
    ENV FEATURE=**/*.feature
    ENV ENVIRONMENT=staging
    ENV TAG=@Regression
    ENV CHROME_VERSION=109.0.5414.74
    ENV HOST=***.***.**.***

Docker-Compose File Setup

The docker-compose file is the main configuration file. It represents the overall architecture of the setup and helps organize the whole suite. A traditional docker-compose file consists of different services that connect and work together to form a network. In our case, the Selenium Hub, the browser nodes, and the automation code image reside in the docker-compose file. These images are responsible for constructing the Selenium Grid architecture and running the automation suite in parallel.

A sample of the docker-compose file:

    version: "3"
    services:
      hub1:
        image: seleniarm/hub:latest
        ports:
          - "4442:4442"
          - "4443:4443"
          - "5554:4444"
      chrome1:
        image: seleniarm/node-chromium:latest
        shm_size: '1gb'
        depends_on:
          - hub1
        environment:
          - SE_EVENT_BUS_HOST=hub1
          - SE_EVENT_BUS_PUBLISH_PORT=4442
          - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
          - HUB_HOST=hub1
          - SE_NODE_MAX_SESSIONS=2
          - VNC_NO_PASSWORD=1

With this docker-compose file, we can bring up the Selenium Grid architecture with one simple command:

    docker-compose up -d --scale chrome1=2 --scale chrome2=2 hub1 chrome1 hub2 chrome2

This command starts two hubs and their Chrome nodes, and each Chrome node can accommodate one automation image. The --scale flag starts multiple instances of a docker-compose service. Note that the sample file above shows only hub1 and chrome1; the command assumes that hub2 and chrome2 are defined in the same way.

Conclusion

By leveraging Docker, a Cucumber-based automation suite can significantly reduce its execution time, thereby reducing the overall turnaround time during the testing phase of the software development cycle. With this setup, the running automation suite instances and the Grid architecture look like this:

[Figure: running automation suite instances connected to the Selenium Grid architecture]
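The article does not say how scenarios are divided among the parallel automation containers. One simple possibility, offered here purely as an assumption, is to assign feature files round-robin and pass each runner its share through the FEATURE environment variable that the sample image defines. The function name and the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: round-robin assignment of feature files to parallel runners.
# This is an illustrative assumption, not the article's stated method.

# Prints the feature files assigned to one runner, one per line.
# Usage: shard_features <runner-index, starting at 0> <runner-count> <file>...
shard_features() {
  local index="$1" total="$2" i=0 file
  shift 2
  for file in "$@"; do
    if [ $(( i % total )) -eq "$index" ]; then
      echo "$file"
    fi
    i=$(( i + 1 ))
  done
}

# Example (hypothetical paths): list the features for runner 0 of 2.
#   shard_features 0 2 features/*.feature
```

Each runner container would then receive only its own list, so no scenario runs twice and the shares stay roughly equal in size.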

Thoughts On the Software Crisis

By Oleksandr Khrustalov

Optimizing Performance in Azure Cosmos DB: Best Practices and Tips

By Muhammad Imran Ansari

Setting Up a ScyllaDB Cluster on AWS Using Terraform

By Oleg Lipin

Load Testing Essentials for High-Traffic Applications

Today’s applications must simultaneously serve millions of users, so high performance under heavy load is a hard requirement. Marketing campaigns, seasonal spikes, or an episode of social media virality can push demand past projections and bring systems to a grinding halt. For that reason, performance monitoring and load testing have become an integral part of app development and deployment. Load testing mimics real application behavior under stress, so teams can make sure their apps are ready to scale up in times of demand and can remove bottlenecks before users are hurt by them.

The Critical Importance of Load Testing for High-Traffic Applications

As mentioned above, load testing simulates high application traffic to check performance in critical situations. E-commerce sites, financial services, and media streaming platforms are particularly sensitive to traffic spikes, so they must make good use of load testing to ensure system readiness. Without extensive load testing months in advance, there is no way of knowing whether a shopping app can handle a Black Friday event without giving shoppers a frustrating and stressful experience. But the purpose of load testing isn’t just to handle spikes in demand: it’s to identify performance bottlenecks and proactively work on APIs, databases, or server configurations so that they perform well in all types of scenarios, not just traffic spikes.

In my personal experience, load testing was instrumental in the introduction of a new service that stored customer payment card information for a large e-commerce retailer. Preliminary tests indicated that the service was nearly at the maximum load supported by the Network Load Balancer, a finding that helped us avoid slowdowns or outages from sudden surges in traffic, such as those that happen in peak shopping periods.
In the short term, we upgraded to a more powerful host type to absorb the increased load; for the long term, we devised a plan to scale the load balancer itself, which allowed us to distribute the traffic even better as the system scaled. This ensured smooth payment processing even during very high-demand events, such as flash sales or seasonal campaigns. The key learning was to plan for infrastructure limits in advance, not only once such limits are reached.

Understanding Various Types of Load Testing

Load testing methods differ and are directed at different goals. Baseline testing shows normal-load performance and provides a benchmark for all further comparisons. Stress testing pushes systems to their limits, exposing failure thresholds and verifying that failures are controlled and nondestructive. Spike testing simulates sudden surges in traffic, which is key for flash sales or major events, while soak or endurance testing reveals long-term issues like memory leaks by sustaining steady high loads.

As an example, spike tests can help online gaming platforms detect login service bottlenecks in advance of a major in-game event. Similarly, a streaming service anticipating a surge at the launch of a show can run spike tests to check the responsiveness of auto-scaling. In one such case, tests showed that while capacity was adequate, scaling lagged behind sudden demand. The team pre-warmed the system and tuned the auto-scaling policies to respond much more quickly. This ensured a seamless experience at launch, showing that raw capacity is not enough; responsiveness and proper scaling strategies are key to handling unpredictable traffic spikes.

Approaching Load Testing: Essential Steps

Just pounding the system with traffic is hardly the right approach to load testing. A more structured route yields genuinely useful information, and that is what results in real-world improvements.
Start by defining goals: do you want to improve response times, error rates, throughput, or resource usage? Well-defined goals help teams firm up test designs and decide which metrics are most useful to track. With clear goals, teams can construct realistic usage scenarios that imitate users’ habits. An e-commerce application, for example, might simulate users browsing, adding items to the cart, and then checking out to get a better feel for how it would behave in the real world.

Gradually adding load identifies the point beyond which performance degrades. By gradually adding requests or users, teams can find the exact points of degradation. The metrics monitored during testing generally include response times, error rates, CPU and memory usage, database query time, and network latency. For instance, video streaming services run soak tests for hours while monitoring memory usage and server resources over time. This kind of test reveals memory leaks or performance degradation that may not show up in shorter tests.

When launching a service to evaluate customer access for a streaming platform, we established a performance baseline to determine how much throughput a single host could handle before critical resources were overutilized. By simulating user interactions and gradually increasing load, we identified the maximum throughput threshold, which guided infrastructure planning and ensured cost-efficient scaling for high-traffic events.

Best Practices for Effective Load Testing

Following best practices ensures that load tests produce meaningful and actionable results. Testing in a production-like setting provides more accurate data, and integrating load tests into CI/CD pipelines confirms that each new release meets performance standards. Realistic data sets and traffic patterns, including peak periods, make the tests far more relevant.
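The gradual ramp-up described in the essential steps can be sketched as a small driver that raises the load in fixed steps and stops at the first level where the error rate exceeds a limit. This is a minimal sketch, not the author's tooling: the function names are hypothetical, and `run_step` stands for whatever command actually drives the load generator and prints the observed error rate in percent.

```shell
#!/usr/bin/env bash
# Sketch of a stepped ramp-up. All names here are illustrative assumptions.

# Prints the load levels start, start+step, ... up to max.
# The step must be a positive integer.
ramp_levels() {
  local start="$1" max="$2" step="$3" level
  level="$start"
  while [ "$level" -le "$max" ]; do
    echo "$level"
    level=$(( level + step ))
  done
}

# Walks the ramp and prints the last level whose error rate stayed within
# the limit (0 if even the first level failed).
# Usage: find_breaking_point <run_step-command> <max-error-%> <start> <max> <step>
find_breaking_point() {
  local run_step="$1" max_error="$2" last_ok=0 level rate
  shift 2
  for level in $(ramp_levels "$@"); do
    rate="$("$run_step" "$level")"
    if awk -v r="$rate" -v m="$max_error" 'BEGIN { exit !(r + 0 <= m + 0) }'; then
      last_ok="$level"
    else
      break
    fi
  done
  echo "$last_ok"
}

# Example (hypothetical command): find_breaking_point my_load_step 1.0 100 1000 100
```

The same loop works for other degradation signals, such as a latency percentile, as long as `run_step` prints a single number per level.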
Systems must degrade gracefully under load, holding core functions even if non-core components falter.

For example, an e-payment gateway embeds load testing in its CI/CD pipeline: any new feature automatically triggers load tests that simulate several thousand transactions to confirm that the code can sustain the expected workloads. A streaming platform likewise embeds spike, soak, and throughput tests, continuously monitoring metrics such as response times, memory usage, CPU utilization, and throughput with every change made. Continuous testing catches issues early. A new dependency might reduce throughput, prompting baseline updates. Unexpected problems, like excessive logging draining resources or a memory leak surfacing under prolonged load, are detected before deployment. This ongoing feedback loop helps distinguish minor adjustments from genuine regressions, ensuring scalability, stability, and reliability in production.

Choosing the Right Load Testing Tools and Frameworks

Choosing the right load testing tools and frameworks ensures thorough, effective testing and provides insightful feedback. The decision depends on the test objective, the architecture of the system, and operational requirements. Apache JMeter supports distributed testing of APIs and databases; Gatling can handle very large HTTP simulations, while k6 integrates nicely into CI/CD pipelines. Locust scripts user journeys in Python. BlazeMeter extends JMeter tests to large-scale cloud-based scenarios, while AWS Fault Injection Simulator (FIS) enables injecting controlled disruptions, like network throttling or instance termination, to assess resilience and recovery.

JMeter and k6 have been used in testing a customer access system for a streaming platform. This system faced heavy loads and spikes in traffic, and these tools helped quantify its capacity. Beyond handling peak traffic, FIS allowed the simulation of real-world failures.
For instance, latency spikes in upstream services indicated that more aggressive retry logic was required to handle delays much more quickly. Similarly, the simulation of sudden failures of EC2 instances highlighted areas where the auto-scaling policies needed changes for rapid recovery. This blend of traditional load tests and failure-injection scenarios helped the system stay reliable, responsive, and user-friendly under adverse conditions.

Overcoming the Common Challenges of Load Testing

From simulating realistic traffic to managing testing costs, load testing is fraught with challenges. Tests should represent real user behavior, and it is best to use production data and a production-like environment. For external dependencies, service virtualization or mock services can stand in for third-party APIs and introduce latency and failures without affecting the live system. Cloud-based solutions like BlazeMeter or k6 provide scalable, pay-as-you-go resources for large-scale tests.

In dynamically changing systems, such as a retail order processing platform, a dynamic, automated approach sustains effective load tests. Identify the key elements that the tests depend on, such as payment gateway APIs, database schemas, host types, and order processing logic. Detect changes to them via automated triggers that update and reconfigure the tests by shifting thresholds and configuration. Rather than discrete targets, such as "500 orders/second," tests use ranges, like "475–525 orders/second," allowing for natural variation.

This automated recalibration process streamlines updates when system changes occur. For example, a payment provider’s API update might increase checkout latency, prompting threshold adjustments. Integration with CI/CD pipelines ensures alerts are raised for host migrations or runtime upgrades, prompting a reevaluation of load test configurations.
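The range-based targets mentioned above can be sketched as a tolerance band derived from a baseline, so that recalibration only means recording a new baseline. This is a minimal sketch under assumptions: the function names and the 5% tolerance are illustrative, chosen so that a baseline of 500 orders/second reproduces the 475–525 range from the text.

```shell
#!/usr/bin/env bash
# Sketch of range-based thresholds. Names and the 5% figure are assumptions.

# Prints the lower and upper bound of a band around a baseline.
# Usage: tolerance_band <baseline> <tolerance-percent>
tolerance_band() {
  awk -v base="$1" -v pct="$2" 'BEGIN {
    delta = base * pct / 100
    printf "%g %g\n", base - delta, base + delta
  }'
}

# Succeeds when a measured value lies inside [low, high], inclusive.
within_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { exit !(v + 0 >= lo + 0 && v + 0 <= hi + 0) }'
}

# Example: check a measured throughput of 510 orders/second.
#   read -r low high <<< "$(tolerance_band 500 5)"
#   within_range 510 "$low" "$high" && echo "within target range"
```

When a system change shifts the baseline, for instance after a payment API update, only the first argument to `tolerance_band` needs to change; the check itself stays the same.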
When a host-type upgrade resulted in minor increases in checkout latency, the recalibration process identified garbage collection settings as the root cause and allowed for rapid optimizations. With dynamic benchmarks, automated detection, and proactive recalibration, the system remains fast, stable, and ready for peak traffic.

The Benefits of Continuous Load Testing

In dynamic environments where code updates are frequent and user behavior keeps changing, continuous load testing becomes very important for sustaining application performance. Integrating load testing into the development lifecycle ensures performance issues are caught early, before they impact users. Regular load testing shows teams exactly how the performance of an application is trending over time, especially in relation to new features, code adjustments, or changes in infrastructure. Continuous load testing also lets applications keep up with the shifting traffic trends and seasonal peaks that occur in all high-traffic applications.

Consider a financial service provider that integrates load testing into its CI/CD pipeline, ensuring that every time new features are released, the transaction-processing system still sustains the expected load. With this nonstop testing, the company stays reliable and resilient even within an ever-changing feature set.

Conclusion

Load testing ensures that high-traffic applications are resilient, scalable, and reliable under varied conditions. By emulating real-life traffic, it can accurately locate potential bottlenecks and thus enable performance optimization. In this way, the application is prepared for peak usage, delivers seamless experiences, and supports business growth. As applications keep evolving and user expectations keep rising, load testing ensures that performance is proactively sustained and enables businesses to cope with today’s digital demands.

By Abhishek Vajarekar

Kubernetes Ephemeral Containers: Enhancing Security and Streamlining Troubleshooting in Production Clusters

Ephemeral containers in Kubernetes are a powerful feature that allows operators to debug and troubleshoot running Pods by creating short-lived containers within the same Pod. This is particularly helpful for issues that cannot be replicated in a separate environment. By using ephemeral containers, you can attach a container to a running Pod and inspect the file system, network settings, or running processes without affecting the Pod’s primary containers.

What Are Ephemeral Containers?

Ephemeral containers are special containers that do not run as part of an application workload but are instead added to an existing Pod for the purpose of debugging. They share the same resources (network namespace, volumes, etc.) as the other containers in the Pod, making them ideal for real-time diagnosis. Once debugging is complete, you simply exit the ephemeral container; the Pod does not need to be recreated.

Key Points

- Short-lived: Ephemeral containers are meant only for debugging or troubleshooting.
- Non-disruptive: They do not impact existing application containers in the Pod.
- Resource-Sharing: They share resources like storage volumes and network namespaces with the Pod, making debugging more powerful.

Security Considerations With Ephemeral Containers

Ephemeral containers provide a safer debugging approach by limiting prolonged access to production Pods. You can enforce strict RBAC rules so only authorized users can add and run ephemeral containers, minimizing the window for potential threats. Because these containers are short-lived and are never restarted once their process exits, the attack surface is reduced, reinforcing overall cluster security.
Use Cases

- Troubleshooting Application Crashes: When you need to inspect logs or run debugging tools in a crashed or crashing container, ephemeral containers let you enter a running environment without altering the main container’s configuration.
- Network Debugging: You can install debugging tools (e.g., tcpdump, netstat) in the ephemeral container to diagnose network issues within the Pod’s network namespace.
- Live File System Checks: If you suspect file corruption or incorrect file paths, ephemeral containers let you check the file system in real time.

Prerequisites

- Kubernetes Version: Ephemeral containers require Kubernetes 1.23 or later, where the EphemeralContainers feature is enabled by default (it was beta in 1.23 and became generally available in 1.25). On older Kubernetes versions, you might need to enable the EphemeralContainers feature gate.
- kubectl: Make sure your local kubectl client is at the same version as your cluster’s control plane or newer.
- Sufficient RBAC Permissions: You need permission to use the kubectl debug command, which in practice means permission to patch the Pod’s ephemeralcontainers subresource (the ephemeral container is added via an update to the Pod’s specification).

Step-by-Step Guide: Using Ephemeral Containers

Below is a generalized process that will work on any Kubernetes environment, including EKS (Elastic Kubernetes Service on AWS), AKS (Azure Kubernetes Service), GKE (Google Kubernetes Engine), or on-premises clusters. We will focus on the kubectl debug command, which is the primary mechanism for adding ephemeral containers.

Verify Your Cluster’s Configuration

    kubectl version

- Ensure that your Server Version is at least 1.23.
- Confirm that your Client Version is also compatible.

If you are on a managed environment like EKS or AKS, check the cluster version from your cloud provider’s dashboard or CLI to ensure it’s 1.23 or later.

Identify the Pod You Want to Debug

List Pods in a specific namespace:

    kubectl get pods -n <your-namespace>

Pick the Pod name you need to troubleshoot, for example: my-app-pod-abc123.
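The version and permission prerequisites can also be checked from a script before a debugging session. The sketch below is one way to do it, with hypothetical function names. It assumes that `kubectl debug` adds the container by patching the Pod's `ephemeralcontainers` subresource, so that is the permission it asks about, and that the server's minor version has been obtained separately, for example with `kubectl version -o json`.

```shell
#!/usr/bin/env bash
# Sketch of prerequisite checks. Function names are illustrative assumptions.

# Succeeds when the cluster's minor version is at least the required one.
# Accepts the minor version as reported by the API server; some managed
# services append "+", e.g. "27+", which is stripped before comparing.
minor_at_least() {
  local minor="${1%%[!0-9]*}" required="$2"
  [ -n "$minor" ] && [ "$minor" -ge "$required" ]
}

# Asks the API server whether the current user may add ephemeral containers
# to Pods in the given namespace. Prints "yes" or "no".
can_debug() {
  kubectl auth can-i patch pods --subresource=ephemeralcontainers -n "$1"
}

# Example:
#   minor="$(kubectl version -o json | jq -r '.serverVersion.minor')"
#   minor_at_least "$minor" 23 && can_debug <your-namespace>
```

Running these checks first gives a clearer error than a failed `kubectl debug` call, since it separates "cluster too old" from "not permitted".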
Add an Ephemeral Container Using kubectl debug

Use the kubectl debug command to add an ephemeral container. For example, we’ll use a simple Ubuntu image:

    kubectl debug my-app-pod-abc123 -n <your-namespace> \
      --image=ubuntu \
      --target=my-container \
      --interactive=true \
      --tty=true

Here’s a breakdown of the flags:

- my-app-pod-abc123: The name of the existing Pod.
- --image=ubuntu: Container image to use for the ephemeral container.
- --target=my-container: (Optional) Specifies which container in the Pod you want to target for process namespace sharing. Typically, this is the main container in the Pod.
- --interactive=true and --tty=true: Allow you to get a shell session inside the ephemeral container.

Once you run the above, you will get a shell prompt in the ephemeral container inside the existing Pod. You can now run debugging commands like ls, ps, netstat, or install extra packages.

Confirm Ephemeral Container Creation

In another terminal, or after exiting the ephemeral container’s shell, run:

    kubectl get pod my-app-pod-abc123 -n <your-namespace> -o yaml

You should see the ephemeral container listed under spec.ephemeralContainers, with its state reported under status.ephemeralContainerStatuses.

Debug and Troubleshoot

From within the ephemeral container, you can:

- Check logs or app configuration.
- Use debugging tools like curl, wget, telnet to verify network connectivity.
- Inspect environment variables to confirm your application’s configuration.

    # Examples
    curl http://localhost:8080/health
    env | grep MY_APP_
    ps aux

Clean Up Ephemeral Containers

An ephemeral container cannot be removed from a Pod once it has been added. When you exit its shell, the container’s process ends and it is not restarted, but its entry remains in the Pod spec. Ephemeral containers are not meant to be long-lived in any case: once you delete the Pod or scale down your deployment, the ephemeral container is removed along with the Pod.
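Rather than scanning the full YAML output, the names of a Pod's ephemeral containers can be pulled out with a JSONPath query. The helper functions below are a sketch with hypothetical names; they rely only on `kubectl get` and the Pod's `spec.ephemeralContainers` field.

```shell
#!/usr/bin/env bash
# Sketch: list and check ephemeral containers on a Pod.
# Function names are illustrative assumptions.

# Prints the names of the ephemeral containers recorded in the Pod's spec,
# space-separated (empty output when none have been added).
ephemeral_container_names() {
  local pod="$1" namespace="$2"
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{.spec.ephemeralContainers[*].name}'
}

# Succeeds when the Pod has an ephemeral container with the given name.
has_ephemeral_container() {
  local name
  for name in $(ephemeral_container_names "$1" "$2"); do
    [ "$name" = "$3" ] && return 0
  done
  return 1
}

# Example: ephemeral_container_names my-app-pod-abc123 <your-namespace>
```

A check like this is also handy for auditing, since it shows which Pods have had debug containers attached.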
Specific Notes for Managed Services

Amazon EKS

- Ensure your EKS cluster is running Kubernetes 1.23 or higher.
- Confirm that the IAM permissions allow you to perform kubectl debug.
- If you’re using an older EKS version (1.22 or earlier), the EphemeralContainers feature gate is not enabled by default. Managed control planes generally do not let you toggle feature gates yourself, so upgrading the cluster is usually the practical route.

Azure AKS

- Use the Azure CLI (az aks upgrade) to upgrade your AKS cluster to a compatible version if necessary.
- Confirm your kubectl context is set to the AKS cluster:

    az aks get-credentials --resource-group <rg-name> --name <cluster-name>

Other Managed or On-Prem Clusters

- Check your cluster’s documentation or ask your provider to confirm ephemeral containers are enabled.
- Most modern on-prem solutions (OpenShift, Rancher, etc.) have ephemeral containers enabled by default from Kubernetes 1.23 onwards, but you may have to manually enable the feature gate if you are on an older version.

Best Practices

- Use Minimal Images: Choose lightweight images to reduce overhead (e.g., busybox, distroless debugging images).
- Restrict RBAC: Limit who can create ephemeral containers to minimize potential security risks.
- Log All Debug Sessions: Keep track of ephemeral container usage for auditing and compliance.
- Don’t Depend on Ephemeral Containers: They’re for debugging only. If you need a permanent sidecar or helper container, configure it in the Pod spec from the start.

Conclusion

Ephemeral containers are a versatile and powerful way to troubleshoot issues in real time without impacting the primary application containers. Whether you’re running Kubernetes on EKS, AKS, on-prem, or another managed solution, understanding and using ephemeral containers can significantly decrease your mean time to recovery (MTTR) and improve operational efficiency. They complement traditional troubleshooting methods and should be part of any platform team’s toolkit for diagnosing complex application issues.
By following the steps outlined above, you can confidently deploy ephemeral containers in your environment and streamline your debugging processes. Author’s Note: Drawn from real-world Kubernetes troubleshooting, this guide aims to help you debug Pods swiftly and without disruption in Production Environments.

By Sai Sandeep Ogety

CORE

How to Test PATCH Requests for API Testing With Playwright Java

Automated API testing offers multiple benefits, including a shorter testing lifecycle and faster feedback. It helps teams catch API defects early and deliver new features to the market quickly. Several tools and frameworks support API test automation, including Postman, Rest Assured, and SuperTest. The latest entry on this list is the Playwright framework, which offers both API and web automation testing.

In this tutorial blog, we will cover the following points:
- What is a PATCH API request?
- How do you test PATCH API requests in automation testing using Playwright Java?

Getting Started

It is recommended to check out the earlier tutorial blog for details on prerequisites, setup, and configuration.

Application Under Test

We will be using the free-to-use RESTful e-commerce APIs available on GitHub. This application can be set up using NodeJS or Docker. It offers multiple APIs for order management that allow creating, retrieving, updating, and deleting orders.

What Is a PATCH Request?

A PATCH request is used for partially updating a resource. It is similar to a PUT request; the difference is that PUT requires the whole resource to be sent in the request body, while with PATCH we send only the fields that need to be updated.

Another difference is that a PUT request is always idempotent: sending the same request repeatedly leaves the resource in the same state as sending it once. A PATCH request may not always be idempotent.

The following is an example of updating the order with a PATCH request using the RESTful e-commerce API. The same PATCH API will be used later in this blog to write the automation tests using Playwright Java.

PATCH (/partialUpdateOrder/{id})

This partially updates the order using its Order ID.
This API needs the id, i.e., the order_id, as the path parameter to look up the existing order to update. The partial details of the order must be supplied in JSON format in the request body. Since it is a PATCH request, we only need to send the fields that we want to update; all other details can be left out of the request. Additionally, as a security measure, a valid authentication token must be supplied with the PATCH request; otherwise the request will fail.

The PATCH request returns the updated order details in the response with Status Code 200. If the update fails, one of the following status codes is returned, depending on the cause:

Status Code | Criteria
404 | No order exists for the supplied order_id
400 | Token authentication fails, the request body is incorrect, or no request body is sent
403 | No authentication token is supplied with the request

How to Test PATCH APIs Using Playwright Java

Playwright offers the methods needed to perform API testing. Let's now write the API automation tests for PATCH API requests using Playwright Java. The PATCH API (/partialUpdateOrder/{id}) will be used for updating the order partially.

Test Scenario: Update Order Using PATCH
- Start the RESTful e-commerce service.
- Use POST requests to create some orders in the system.
- Update the product_name, product_amount, and qty of the order with order_id 1.
- Check that Status Code 200 is returned in the response.
- Check that the order details have been updated correctly.

Test Implementation

To update the order partially, we need to send a request body containing only the fields to update, along with the authorization token. This token ensures that a valid user of the application is updating the order.

1. Generate the Authentication Token

The token can be generated using the POST /auth API endpoint. This API endpoint needs the login credentials to be supplied in the request body.
The valid login credentials are as follows:

Field name | Value
Username | admin
Password | secretPass123

On passing these valid login credentials, the API returns the JWT token in the response with Status Code 200. We build the login request body with the getCredentials() method from the TokenBuilder class that is available in the testdata package.

Java
    public static TokenData getCredentials() {
        return TokenData.builder()
            .username("admin")
            .password("secretPass123")
            .build();
    }

This getCredentials() method returns a TokenData object containing the username and password fields.

Java
    @Getter
    @Builder
    public class TokenData {
        private String username;
        private String password;
    }

Once the token is generated, it can be used in the PATCH API request for partially updating the order.

2. Generate the Test Data for Updating Order

The next step in updating the order partially is to generate the request body with the required data. As discussed in the earlier POST request tutorial, we add a new method getPartialUpdatedOrder() to the existing OrderDataBuilder class, which generates the test data at runtime.

Java
    public static OrderData getPartialUpdatedOrder() {
        return OrderData.builder()
            .productName(FAKER.commerce().productName())
            .productAmount(FAKER.number().numberBetween(550, 560))
            .qty(FAKER.number().numberBetween(3, 4))
            .build();
    }

This method sets only three fields, product_name, product_amount, and qty, and uses them to generate a new JSON object that is passed as the request body to the PATCH API request.

3. Update the Order Using PATCH Request

We have come to the final stage, where we test the PATCH API request using Playwright Java. Let's create a new test method testShouldPartialUpdateTheOrderUsingPatch() in the existing HappyPathTests class.
Java
    @Test
    public void testShouldPartialUpdateTheOrderUsingPatch() {
        // Log in to obtain the JWT token
        final APIResponse authResponse = this.request.post("/auth",
            RequestOptions.create().setData(getCredentials()));
        final JSONObject authResponseObject = new JSONObject(authResponse.text());
        final String token = authResponseObject.get("token").toString();

        // Build the partial request body and send the PATCH request
        final OrderData partialUpdatedOrder = getPartialUpdatedOrder();
        final int orderId = 1;
        final APIResponse response = this.request.patch("/partialUpdateOrder/" + orderId,
            RequestOptions.create()
                .setHeader("Authorization", token)
                .setData(partialUpdatedOrder));

        final JSONObject updateOrderResponseObject = new JSONObject(response.text());
        final JSONObject orderObject = updateOrderResponseObject.getJSONObject("order");

        // Verify the status, the message, and the updated fields
        assertEquals(response.status(), 200);
        assertEquals(updateOrderResponseObject.get("message"), "Order updated successfully!");
        assertEquals(orderId, orderObject.get("id"));
        assertEquals(partialUpdatedOrder.getProductAmount(), orderObject.get("product_amount"));
        assertEquals(partialUpdatedOrder.getQty(), orderObject.get("qty"));
    }

This method first calls the Authorization API to generate the token. The response from the Authorization API is parsed into the authResponseObject variable, from which the value of the token field is extracted.

The request body for the PATCH API is generated and stored in the partialUpdatedOrder object, so the same object can be used later for validating the response.

Next, the token is set using the setHeader() method, and the request body object is sent using the setData() method. The PATCH API request is sent using Playwright's patch() method, which partially updates the order.
Response body:

JSON
    {
      "message": "Order updated successfully!",
      "order": {
        "id": 1,
        "user_id": "1",
        "product_id": "1",
        "product_name": "Samsung Galaxy S23",
        "product_amount": 5999,
        "qty": 1,
        "tax_amt": 5.99,
        "total_amt": 505.99
      }
    }

The response received from this PATCH API is stored in the response variable and is used to validate the result.

The last step is to perform the assertions. The response from the PATCH API is a JSON object, which is stored in the object named updateOrderResponseObject. The message field is available in the main response body, so it is verified by calling get() on updateOrderResponseObject, which returns the value of the message field.

The JSON object order received in the response is stored in the object named orderObject, which is used for checking the values of the order details. The partialUpdatedOrder object, which holds the request body we sent to partially update the order, supplies the expected values, and orderObject supplies the actual values for the assertions.

Test Execution

We will create a new testng.xml file (testng-restfulecommerce-partialupdateorder.xml) to execute the tests sequentially, i.e., first calling the POST API test to generate orders and then calling the PATCH API test to partially update the order.

XML
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd">
    <suite name="Restful ECommerce Test Suite">
      <test name="Testing Happy Path Scenarios of Creating and Updating Orders">
        <classes>
          <class name="io.github.mfaisalkhatri.api.restfulecommerce.HappyPathTests">
            <methods>
              <include name="testShouldCreateNewOrders"/>
              <include name="testShouldPartialUpdateTheOrderUsingPatch"/>
            </methods>
          </class>
        </classes>
      </test>
    </suite>

The following test execution screenshot from IntelliJ IDE shows that the tests were executed successfully, and a partial update of the order was successful.
Summary PATCH API requests allow updating the resource partially. It allows flexibility to update a particular resource as only the required fields can be easily updated using it. In this blog, we tested the PATCH API requests using Playwright Java for automation testing. Testing all the HTTP methods is equally important while performing API testing. We should perform isolated tests for each endpoint as well as end-to-end testing for all the APIs to make sure that all APIs of the application are well integrated with each other and run seamlessly.
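To make the PUT versus PATCH distinction discussed in this tutorial concrete, here is a minimal in-memory sketch. It does not call the RESTful e-commerce service or Playwright; the class name PutVsPatchSketch and the map-based order are illustrative assumptions, and the merge behavior shown is one common way a server may implement PATCH, not necessarily how this application does it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory model of a single order; a hypothetical stand-in, not the real service.
class PutVsPatchSketch {

    // PUT semantics: the stored resource is replaced by the request body as a whole,
    // so any field missing from the body is lost.
    static Map<String, Object> put(Map<String, Object> stored, Map<String, Object> body) {
        return new LinkedHashMap<>(body);
    }

    // PATCH semantics (merge style): only the fields present in the body change,
    // everything else keeps its stored value.
    static Map<String, Object> patch(Map<String, Object> stored, Map<String, Object> body) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(body);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> order = new LinkedHashMap<>();
        order.put("id", 1);
        order.put("product_name", "Samsung Galaxy S23");
        order.put("product_amount", 5999);
        order.put("qty", 1);

        // Request body that carries a single field
        Map<String, Object> partial = new LinkedHashMap<>();
        partial.put("qty", 3);

        System.out.println("After PATCH: " + patch(order, partial));
        System.out.println("After PUT:   " + put(order, partial));
    }
}
```

A merge-style PATCH like this one happens to be idempotent, since applying the same body twice gives the same result. A PATCH that expresses an operation such as "increase qty by 1" would not be, which is why PATCH as a method carries no idempotency guarantee.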

By Faisal Khatri

CORE

Dropwizard vs. Micronaut: Unpacking the Best Framework for Microservices

Microservices architecture has reshaped the way we design and build software, emphasizing scalability, maintainability, and agility. Two frameworks, Dropwizard and Micronaut, have gained prominence in the microservices ecosystem, each offering unique features to simplify and optimize development. In this article, we delve into a detailed comparison to help you determine which framework best suits your needs. Comparison Overview Dropwizard and Micronaut differ significantly in their design philosophies and capabilities: Dropwizard is a well-established Java framework that emphasizes simplicity and a "batteries-included" philosophy. It bundles popular libraries like Jetty, Jersey, and Jackson to create production-ready RESTful services quickly.Micronaut, a modern framework, targets cloud-native and serverless applications. It features compile-time dependency injection, AOT (Ahead-of-Time) compilation, and built-in support for reactive programming and serverless deployments. Advantages Dropwizard Mature and reliable: Dropwizard has been around for a long time and has a robust, well-documented ecosystem.Ease of use: With pre-integrated libraries, setting up a project is quick and straightforward.Metrics and monitoring: Dropwizard provides out-of-the-box support for monitoring and performance metrics using the Metrics library. Micronaut Performance: Micronaut’s AOT compilation and compile-time dependency injection reduce startup times and memory usage.Cloud-native features: It offers native integrations for AWS, Google Cloud, and other cloud providers, streamlining serverless deployments.Reactive programming: Micronaut has first-class support for non-blocking, event-driven architectures, improving scalability and responsiveness. 
Challenges

Dropwizard
- Memory consumption: Dropwizard applications can be more memory-intensive and have longer startup times, making them less ideal for serverless use cases.
- Limited reactive support: Reactive programming requires additional libraries and configurations, as it is not natively supported.

Micronaut
- Learning curve: Developers used to traditional frameworks like Spring may find Micronaut's approach unfamiliar at first.
- Younger ecosystem: Although rapidly evolving, Micronaut's ecosystem is newer and might not be as stable or extensive as that of Dropwizard.

Use Cases

Dropwizard Use Cases
- Building traditional REST APIs where rapid prototyping is crucial
- Applications requiring robust metrics and monitoring features
- Monolithic or microservices projects where memory usage is not a critical constraint

Micronaut Use Cases
- Cloud-native applications requiring minimal memory and fast startup times
- Serverless deployments on platforms like AWS Lambda, Google Cloud Functions, or Azure Functions
- Reactive microservices designed for scalability and low-latency responses

Practical Examples

Dropwizard Example

Setting up a basic RESTful service in Dropwizard:

Java
    // Imports shown for Dropwizard 2.x. From 3.x the core classes live under
    // io.dropwizard.core, and 4.x uses jakarta.ws.rs instead of javax.ws.rs.
    import io.dropwizard.Application;
    import io.dropwizard.Configuration;
    import io.dropwizard.setup.Environment;
    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;

    // HelloWorldApplication.java
    public class HelloWorldApplication extends Application<Configuration> {
        public static void main(String[] args) throws Exception {
            new HelloWorldApplication().run(args);
        }

        @Override
        public void run(Configuration configuration, Environment environment) {
            // Register the JAX-RS resource with the embedded Jersey container
            environment.jersey().register(new HelloWorldResource());
        }
    }

    // HelloWorldResource.java
    @Path("/hello")
    @Produces(MediaType.TEXT_PLAIN)
    public class HelloWorldResource {
        @GET
        public String sayHello() {
            return "Hello, Dropwizard!";
        }
    }

Micronaut Example

Creating a similar service in Micronaut:

Java
    import io.micronaut.http.annotation.*;

    @Controller("/hello")
    public class HelloWorldController {
        @Get
        public String sayHello() {
            return "Hello, Micronaut!";
        }
    }

Running an Application

Dropwizard
- Set up a Maven project and include the Dropwizard dependencies.
- Define configuration files for the application.
- Build a fat JAR and use the java -jar command to run the Dropwizard service, passing the server command and, if you have one, the configuration file: java -jar <your-app>.jar server <config.yml>.

Micronaut
- Use the Micronaut CLI to create a new project: mn create-app example.app.
- Configure any additional cloud or serverless settings if needed.
- Run the application with ./gradlew run or java -jar.

Best Practices

Dropwizard
- Monitor performance: Use the Metrics library to monitor the health and performance of your microservices.
- Keep it simple: Stick to Dropwizard's opinionated configurations for quicker development and maintenance.

Micronaut
- Optimize for cloud: Leverage Micronaut's cloud integrations for efficient deployment and scaling.
- Use GraalVM: Compile Micronaut applications to native images using GraalVM for even faster startup and lower memory usage.

Conclusion

Both Dropwizard and Micronaut are excellent frameworks for building microservices, but they cater to different needs. Dropwizard is a solid choice for teams seeking a well-integrated, production-ready solution with a mature ecosystem. Micronaut, with its cutting-edge features like AOT compilation and cloud-native support, is ideal for modern, scalable applications. Choosing the right framework depends on your project's specific requirements, including performance needs, deployment strategies, and team expertise. For traditional microservices with a need for reliability and simplicity, Dropwizard shines. For cloud-native and reactive architectures, Micronaut is the clear winner.

By Nilesh Jain

Using the Log Node in IBM App Connect Enterprise

In the world of IBM App Connect Enterprise (ACE), effective logging is crucial for monitoring and troubleshooting. With the introduction of the Log node, it's now easier than ever to log ExceptionList inserts directly into the activity log, which can be viewed from the WebUI. The Log node can be especially valuable, often replacing the Trace node in various scenarios. This article contains two sections: the first will guide you through the process of using the Log node to log these inserts, helping you streamline your debugging and monitoring processes. The second section explores some scenarios that provide Log node hints and tips around default values. Section 1: Logging ACE's ExceptionList Inserts (In the Activity Log With the Log Node) Introducing the Log Node The Log node is a recent addition to ACE (v12.0.11.0), originating from the Designer, which simplifies logging activities. By using the Log node, you can log custom messages directly into the activity log for easy monitoring and follow-up. Understanding the ExceptionList in ACE Since this article mentions and works with the ExceptionList, I’ll recap very quickly what the ExceptionList is. The ExceptionList in ACE is a structured (built-in) way of displaying captured exceptions within your message flows. It provides detailed information about errors, including the file, line, function, type, and additional context that can be invaluable during troubleshooting. As most of you know, the insert fields contain the most usable information so we will be focussing on those. Setting up the Demo Flow To demonstrate how to log ExceptionList inserts using the Log node, we'll set up a simple flow: 1. Create the Flow Add an HTTP Input node.Add an HTTP Request node.Add a Log node.Add an HTTP Reply node. 2. Connect the Nodes Connect the nodes as shown in the diagram to form a complete flow. Configuring the Log Node Next, we need to configure the Log node to capture and log the ExceptionList inserts: 1. 
Properties Go to the Map Inputs part of the properties.Configure the Log node to include the ExceptionList as a Map input. 2. Configure Open the Configure wizard for the Log Node. Set the basic values for the log message: “Log Level” and “Message detail”. Next, add a new property, give it a name, and set the type to “Array of strings”. Click on “Edit mappings” and click on the left button that looks like a menu with 3 items. Click on “ExceptionList”, scroll down, and select “Insert”. This gives you all the inserts of the ExceptionList. If that is what you want, great — we are done here. But if you only require the insert fields of the last two ExceptionList items, which tend to be the most interesting ones, you can select only those as well. It’s rather important to know that the ExceptionList ordering here is reversed compared to the message tree. So the last two ExceptionList items in the flow are the first two in this JSON representation. 3. Filter Click on Insert and go to “Edit expression”. Change the expression to “$mappingInput_ExceptionList[0..1].Insert”. Do not forget to hit “Save”! Sending a Message Through the Flow To test our configuration, we'll send a message through the flow using the Flow Exerciser: 1. Send Message Open the Flow Exerciser, create a new message, and send it through the HTTP Input node. 2. Monitor the Progress Observe the message flow and ensure the Log node captures and logs the ExceptionList details. To make sure the log message has been written, wait until you get a reply back. Viewing the ExceptionList in the Activity Log Once the message has been processed, you can view the logged ExceptionList inserts in the activity log through the WebUI: 1. Access the Activity Log Navigate to the activity log section in the WebUI to see the logged ExceptionList inserts: Integration Server > Application > Message flows > flow 2. 
Review Logged Details

The activity log should display detailed information about the exceptions captured during the flow execution. The log entry in detail: This is enough to tell me what the issue is.

Section 2: ACE Log Node Tips and Tricks, Default Values

Having explored how to handle and parse the ExceptionList, let's now examine some scenarios using the Log node.

Fallback or Default Values

Imagine you want to log a specific field from an incoming message that may or may not be present. Let's use the HTTP Authorization header as an example. If you configure the Log node to use this header as an input parameter, it will either display it or omit it entirely. The syntax to retrieve the Authorization header from your input message is:

    {{$mappingInput_InputRoot.HTTPInputHeader.Authorization}}

Apply this to your Log node: When your message contains the header, it appears in the Activity Log Tags. If the field is missing, the tag disappears. This behavior isn't always ideal and can complicate troubleshooting. Adding default values can help clarify the situation.

If you look through the available functions, you will find no default or coalesce function in JSONata. So how can you do it? In JavaScript, you would simply write the following:

    Authorization = $mappingInput_InputRoot.HTTPInputHeader.Authorization || "UNDEFINED"

But that doesn't work in JSONata. What you can do is either one of these:
- Use a ternary operator expression.
- Use sequence flattening.

Ternary Operator

A ternary operator is essentially an IF statement in expression form:

    condition ? ifTrue : ifFalse

Apply this to our JSONata example:

    {{$mappingInput_InputRoot.HTTPInputHeader.Authorization ? $mappingInput_InputRoot.HTTPInputHeader.Authorization : "FALLBACK"}}

What happens is that the first operand, the actual field, is cast to a Boolean, and that result is used to choose between the field value and the fallback value. If the field is an empty sequence, empty array, empty object, empty string, zero, or null, the expression will default to FALLBACK.

Sequence Flattening

Sequence flattening in JSONata is a useful feature for handling non-existent fields (i.e., an empty sequence). Consider this example:

    [$field1, $field2, "FALLBACK"][0]

The code returns the first value of a flattened sequence. If $field1 has a value, it is returned; otherwise, if $field2 has a value, $field2 is returned. If neither has a value, "FALLBACK" is returned. This functions similarly to a chained COALESCE in ESQL. Unlike the ternary form, a field that exists but holds an empty string or zero still counts as a value here and is returned as-is.

Here's how it applies to our example:

    {{[$mappingInput_InputRoot.HTTPInputHeader.Authorization, "UNDEFINED"][0]}}

Example

Here's how both options look in a test setup with HEADER_TO and Header_SF: You can see the fields directly from the header as expected: When the fields are unavailable, the output is:

These examples might not be the most realistic, but you could use them to determine the type of authentication used (e.g., basic, OAuth) or whether a specific message field is filled in. Now that you know how to do it, it's up to you to find the right use case to apply it to.

Conclusion

By using the Log node to capture and log ExceptionList inserts, you can significantly enhance your ability to monitor and troubleshoot message flows in IBM App Connect Enterprise. This approach ensures that all relevant error details are readily available in the activity log, making it easier to diagnose and resolve issues.

Acknowledgment to Dan Robinson and David Coles for their contribution to this article.

Resources
- IBM App Connect Documentation: Log node
- IBM App Connect Documentation: Activity logs
- JSONata: Sequences
- "The One Liner If Statement (Kinda): Ternary Operators Explained"
- IBM App Connect Documentation: Adding entries to the activity log by using a Log node
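The two fallback styles from Section 2 differ in one practical way: the ternary form falls back on any "falsy" value, while sequence flattening falls back only when the field is absent. The sketch below is a Java analogy of that difference, not JSONata and not an ACE API. The class and method names are hypothetical, a missing field is modelled as a Java null, and only the missing and empty-string cases of the ternary truthiness rules are covered.

```java
import java.util.Arrays;
import java.util.Objects;

// Java analogy for the two JSONata fallback styles; a missing field is modelled as null.
class FallbackSketch {

    // Ternary style: use the fallback whenever the value is missing or an empty string,
    // mirroring the Boolean cast applied to the condition.
    static String ternaryStyle(String value, String fallback) {
        boolean truthy = value != null && !value.isEmpty();
        return truthy ? value : fallback;
    }

    // Sequence-flattening style: missing values drop out of the sequence and the
    // first remaining candidate wins, even if it is an empty string.
    static String firstPresent(String... candidates) {
        return Arrays.stream(candidates)
                .filter(Objects::nonNull)
                .findFirst()
                .orElse(null);
    }

    public static void main(String[] args) {
        String missingHeader = null; // header not sent at all
        String emptyHeader = "";     // header sent, but empty

        System.out.println(ternaryStyle(missingHeader, "FALLBACK"));
        System.out.println(ternaryStyle(emptyHeader, "FALLBACK"));
        System.out.println(firstPresent(missingHeader, "UNDEFINED"));
        System.out.println("[" + firstPresent(emptyHeader, "UNDEFINED") + "]");
    }
}
```

If an empty Authorization header should be logged differently from a missing one, the sequence-flattening form keeps that distinction, whereas the ternary form reports both as the fallback value.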

By Matthias Blomme

How to Automate Blob Deletion in Azure Storage Using PowerShell

Azure storage accounts are a cornerstone for data storage solutions in the Azure ecosystem, supporting various workloads, from storing SQL backups to serving media files. Automating tasks like deleting outdated or redundant blobs from storage containers can reduce storage costs and keep containers tidy. This guide will walk you through using PowerShell to safely delete blobs from an Azure storage account. Whether you're managing SQL backups, application logs, or other unstructured data, this process can be applied to a wide range of scenarios where cleanup is a routine requirement.

New to Storage Account?

One of the core services within Microsoft Azure is the storage account service. Many services utilize storage accounts for storing data, such as Virtual Machine Disks, Diagnostics logs (especially application logs), SQL backups, and others. You can also use the Azure storage account service to store your own data, such as blobs or binary data. As per the Microsoft documentation, Azure blob storage allows you to store large amounts of unstructured object data. You can use blob storage to gather or expose media, content, or application data to users. Because all blob data is stored within containers, you must create a storage container before you can begin to upload data.

Step-by-Step

Step 1: Get the Prerequisite Inputs

In this example, I will delete a SQL database export (backed up or imported to storage) stored in bacpac format in the sql container.

PowerShell
    ## Prerequisite parameters
    $resourceGroupName = "rg-dgtl-strg-01"
    $storageAccountName = "sadgtlautomation01"
    $storageContainerName = "sql"
    $blobName = "core_2022110824.bacpac"

Step 2: Connect to Your Azure Subscription

The storage cmdlets used below (Get-AzStorageAccount, Get-AzStorageContainer, Remove-AzStorageBlob) belong to the Az PowerShell module, so the session must be signed in with Connect-AzAccount; a sign-in made with the Azure CLI (az login) is not shared with Az PowerShell. In scenarios where you need to automate Azure management tasks or run scripts non-interactively, you can authenticate using a service principal. A service principal is an identity created for your application or script to access Azure resources securely. Keep its secret out of the script itself, for example in environment variables or a secret store.

PowerShell
    ## Connect to your Azure subscription with a service principal.
    ## The environment variable names are placeholders; use whatever your pipeline provides.
    $securePassword = ConvertTo-SecureString $env:AZURE_CLIENT_SECRET -AsPlainText -Force
    $credential = New-Object System.Management.Automation.PSCredential ($env:AZURE_CLIENT_ID, $securePassword)
    Connect-AzAccount -ServicePrincipal -Credential $credential -Tenant $env:AZURE_TENANT_ID

Step 3: Check if the Container Exists in the Storage Account

When working with Azure Storage, you may need to verify that a container exists in a storage account before acting on it. The Get-AzStorageContainer cmdlet performs that check, but it needs a storage context, so first get the storage account and its context.

PowerShell
    ## Get the storage account that holds the container
    $storageAccount = Get-AzStorageAccount -ResourceGroupName $resourceGroupName -Name $storageAccountName
    ## Get the storage account context
    $context = $storageAccount.Context

Step 4: Ensure the Container Exists Before Deleting the Blob

We use the Remove-AzStorageBlob cmdlet to delete a blob from the Azure Storage container.

PowerShell
    ## Check if the storage container exists
    if (Get-AzStorageContainer -Name $storageContainerName -Context $context -ErrorAction SilentlyContinue)
    {
        Write-Host -ForegroundColor Green "$storageContainerName, the requested container exists; deleting blob"
        ## Remove the blob from the Azure Storage container
        Remove-AzStorageBlob -Container $storageContainerName -Context $context -Blob $blobName
        Write-Host -ForegroundColor Green "$blobName deleted"
    }
    else
    {
        Write-Host -ForegroundColor Magenta "$storageContainerName, the requested container does not exist"
    }

Here is the full code:

PowerShell
    ## Delete a blob from an Azure Storage container

    ## Input parameters
    $resourceGroupName = "rg-dgtl-strg-01"
    $storageAccountName = "sadgtlautomation01"
    $storageContainerName = "sql"
    $blobName = "core_2022110824.bacpac"

    ## Connect to your Azure subscription with a service principal.
    ## The environment variable names are placeholders; use whatever your pipeline provides.
    $securePassword = ConvertTo-SecureString $env:AZURE_CLIENT_SECRET -AsPlainText -Force
    $credential = New-Object System.Management.Automation.PSCredential ($env:AZURE_CLIENT_ID, $securePassword)
    Connect-AzAccount -ServicePrincipal -Credential $credential -Tenant $env:AZURE_TENANT_ID

    ## Function to delete the blob from the storage container
    Function DeleteBlobFromStorageContainer
    {
        ## Get the storage account that holds the container
        $storageAccount = Get-AzStorageAccount -ResourceGroupName $resourceGroupName -Name $storageAccountName
        ## Get the storage account context
        $context = $storageAccount.Context
        ## Check if the storage container exists
        if (Get-AzStorageContainer -Name $storageContainerName -Context $context -ErrorAction SilentlyContinue)
        {
            Write-Host -ForegroundColor Green "$storageContainerName, the requested container exists; deleting blob"
            ## Remove the blob from the Azure Storage container
            Remove-AzStorageBlob -Container $storageContainerName -Context $context -Blob $blobName
            Write-Host -ForegroundColor Green "$blobName deleted"
        }
        else
        {
            Write-Host -ForegroundColor Magenta "$storageContainerName, the requested container does not exist"
        }
    }

    ## Call the function
    DeleteBlobFromStorageContainer

Here is the output:

Conclusion

Automating blob deletion in Azure storage accounts using PowerShell is a practical approach for maintaining a clutter-free and efficient storage system. By following the steps outlined, you can integrate this process into your workflows, saving time and reducing manual effort. This method is not limited to SQL backup files. It can also be extended to managing other types of data stored in Azure Storage, such as application logs, diagnostic files, or media content. By checking that containers exist before acting on them and using the Az PowerShell cmdlets, you can manage your Azure resources in an automated, repeatable manner.

By thiyagu selvaraj

Strategies for Effectively Managing Terraform State

Terraform is a leading infrastructure-as-code tool developed by HashiCorp that has become a keystone of modern infrastructure management. By using a declarative approach, Terraform enables organizations to define, provision, and manage infrastructure that stretches across many cloud providers. One of the critical components at the core of Terraform's functionality is the state file. It acts like a database of the real-world resources managed by Terraform and their corresponding configurations.

The state file retains information about the current state of your infrastructure: resource IDs, attributes, and metadata. Terraform uses it to work out which changes a modified configuration requires. Without a state file, Terraform would not know what has been provisioned, how to apply incremental changes, or how to track the current state. The state file acts as the single source of truth for Terraform, which lets it create, update, and delete infrastructure predictably and consistently.

Why State Management Is Crucial

State management is one of the most important parts of using Terraform. Improper handling of state files can result in configuration drift, resource conflicts, and even accidental deletion of resources. Because the state file contains sensitive information about the infrastructure, it must be handled carefully and kept safe from unauthorized access and corruption.

Proper state management ensures that your infrastructure is reproduced identically across different environments, such as development, staging, and production. Keeping the state files correct and up to date enables Terraform to plan changes to your infrastructure correctly and thus avoid discrepancies between its intended and real states.

Another important role of state management is team collaboration.
In multi-user environments, such as when different team members are working on the same infrastructure, there needs to be a way to share and lock state files to avoid race conditions that might introduce conflicts or inconsistencies. That's where remote state backends come in: they store state files centrally so that the whole team works from the same state.

In Terraform, state management is one of the basic building blocks of the infrastructure-as-code approach. It ensures that your infrastructure is reliably, securely, and consistently managed across all environments, cloud accounts, and deployment regions. Understanding state files and how to manage them well allows organizations to get the most value from Terraform and avoid common pitfalls of infrastructure automation.

Understanding Terraform State

Terraform state is an integral part of how Terraform manages infrastructure. It is a file recording the present state of every infrastructure resource managed by Terraform. The file holds information about each resource, its attributes, and metadata, and acts as the single source of truth about the state of the infrastructure.

How Terraform Uses State Files

Terraform relies on the state file to map the infrastructure resources defined in your configuration files to the actual resources in the cloud or other platforms. This mapping allows Terraform to understand what resources are being managed, how they relate to one another, and how they should be updated or destroyed.

When you run terraform plan, Terraform compares the current state of resources, as stored in the state file, with the desired state specified in the configuration. This comparison helps Terraform identify what changes are needed to align the actual infrastructure with the intended configuration.
For instance, if you’ve added a new resource in the configuration, Terraform will detect that this resource doesn’t exist in the state file and will proceed to create it. In addition to mapping resources, the state file also tracks metadata, including resource dependencies and other vital information that might not be explicitly defined in your configuration. This metadata is essential for Terraform to manage complex infrastructures, ensuring that operations like resource creation or destruction are performed in the correct order to maintain dependencies and prevent conflicts. Moreover, the state file enhances Terraform’s performance. Instead of querying the cloud provider or infrastructure platform every time it needs to assess the infrastructure, Terraform uses the state file to quickly determine what the current state is. This efficiency is especially important in large-scale environments, where querying each resource could be time-consuming and costly. Understanding the role of the Terraform state file is crucial for successful infrastructure management, as it underpins Terraform’s ability to manage, track, and update infrastructure accurately. Common Challenges in Terraform State Management State File Corruption State file corruption is one of the major risks associated with Terraform and may further create high-severity problems in infrastructure management. Due to irreconcilable corruption in a state file, Terraform will lose track of existing resources; therefore, if not detected and handled correctly, it will result in either wrong changes in infrastructure or their complete deployment failure. This type of corruption could be due to a variety of factors, such as file system errors, manual editing, or improper shutdowns during state operations. Such corruption can have a deep impact, ranging from expensive downtime to misconfigurations. 
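One inexpensive guard against the corruption described above is to sanity-check a state file before trusting it. The following is a minimal Python sketch, not part of Terraform itself. It assumes the version 4 JSON state layout with top-level `version`, `serial`, `lineage`, and `resources` keys; the function name `check_state` and the sample inputs are made up for illustration.

```python
import json

# Top-level keys assumed to be present in a version 4 state file.
REQUIRED_KEYS = {"version", "serial", "lineage", "resources"}


def check_state(raw: str) -> list:
    """Return a list of problems found in the raw state text (empty if none)."""
    try:
        state = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Truncated writes or bad manual edits usually fail here.
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(state, dict):
        return ["top-level value is not a JSON object"]

    missing = sorted(REQUIRED_KEYS - set(state))
    problems = [f"missing key: {key}" for key in missing]
    if "resources" in state and not isinstance(state["resources"], list):
        problems.append("'resources' is not a list")
    return problems


good = '{"version": 4, "serial": 7, "lineage": "abc", "resources": []}'
truncated = '{"version": 4, "serial": 7, "line'

print(check_state(good))       # no problems expected
print(check_state(truncated))  # reports invalid JSON
```

A check like this only catches structural damage. It cannot tell whether the recorded attributes still match the real infrastructure, which is what a plan or a refresh is for.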
Concurrency Issues

Concurrency issues arise when several users or automation tools try to update the Terraform state file at the same time. Because the state file is such a key resource, Terraform expects only a single process to write to it at any given time. Without appropriate locking, concurrent operations can overwrite or even corrupt the state file, leading to inconsistencies in the infrastructure. This is a particular problem in collaborative environments, where many team members work on the same infrastructure.

State File Size and Performance

As infrastructure grows, so does the Terraform state file. A large state file can degrade performance, making operations like terraform plan and terraform apply slow and cumbersome, because Terraform must read, write, and update the entire state file during these operations. Large state files also complicate debugging and increase the risk of corruption. Proper state management strategies mitigate these issues and keep Terraform a reliable and scalable tool.

Best Practices for Managing Terraform State

Effective state management is important for the reliability, security, and performance of your infrastructure-as-code workflows. State files hold vital information about the current state of your infrastructure, so mismanaging them can lead to corruption, security vulnerabilities, and performance bottlenecks. The following best practices help mitigate these risks.

1. Use Remote State Storage

One of the best state management practices is to store state files (terraform.tfstate) in a remote backend. By default, Terraform stores the state file on the local disk of the machine where it runs. That may suffice for small projects or single-user environments, but it quickly becomes limiting for collaborative or production use.

Key benefits of remote state storage include:

- Better collaboration: Storing the state file remotely gives team members a single, safe place from which to access and modify the infrastructure. This is critical in collaborative workflows where many developers or DevOps engineers work on the same project.
- Improved security: Remote backends such as AWS S3, Azure Blob Storage, and Terraform Cloud offer built-in security features, including encryption at rest and in transit, access control, and audit logs. These safeguard sensitive data in the state file, such as resource identifiers, IP addresses, and in some cases even credentials.
- Data redundancy and durability: Remote storage services typically provide replication and high availability, and many support automatic backups or versioning, which protects against data loss from local hardware failure or accidental deletion.

To set up remote state, configure a backend that uses a cloud provider’s storage service. For instance, to use AWS S3:

    terraform {
      backend "s3" {
        bucket = "your-terraform-state-bucket"
        key    = "path/to/your/statefile"
        region = "us-west-2"
      }
    }

2. Enable State Locking

State locking places a lock on the state file so that concurrent operations cannot modify it at the same time. Without a lock, simultaneous writes can corrupt the state file or leave the infrastructure inconsistent. When locking is enabled, Terraform automatically acquires a lock for any operation that modifies state and releases it when the operation completes.
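To make the locking behavior concrete, here is a minimal Python sketch of the acquire, modify, release pattern. It is an in-memory illustration only: the `StateLock` class and its methods are hypothetical and do not mirror Terraform’s internals or any backend API. What it models is that acquiring the lock is an atomic "create only if absent" step, so a second writer is refused instead of being allowed to overwrite state.

```python
import threading
from contextlib import contextmanager
from typing import Optional


class LockHeldError(RuntimeError):
    """Raised when another operation already holds the state lock."""


class StateLock:
    """In-memory stand-in for a remote lock record (illustrative only)."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # makes the check-and-set atomic
        self._holder: Optional[str] = None

    def acquire(self, who: str) -> None:
        with self._guard:
            if self._holder is not None:
                raise LockHeldError(f"state is locked by {self._holder}")
            self._holder = who

    def release(self, who: str) -> None:
        with self._guard:
            if self._holder == who:  # only the holder may release
                self._holder = None

    @contextmanager
    def held_by(self, who: str):
        """Hold the lock for the duration of a state-modifying operation."""
        self.acquire(who)
        try:
            yield
        finally:
            self.release(who)  # released even if the operation fails


lock = StateLock()
with lock.held_by("alice"):
    try:
        lock.acquire("bob")  # a concurrent writer is refused
    except LockHeldError as err:
        print(err)

lock.acquire("bob")  # succeeds once alice's operation has finished
lock.release("bob")
```

Releasing in a `finally` block reflects the behavior described above: the lock is dropped when the operation completes, whether it succeeded or not.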
State locking is especially important in collaborative environments, where several team members may work on the infrastructure simultaneously. Without it, two users could change the state file at the same time, causing conflicts and problems with your infrastructure.

With AWS S3 as your backend, you can use a DynamoDB table for state locking by configuring it like this:

    terraform {
      backend "s3" {
        bucket         = "your-terraform-state-bucket"
        key            = "path/to/your/statefile"
        region         = "us-west-2"
        dynamodb_table = "terraform-lock-table"
      }
    }

This configuration ensures that Terraform uses a DynamoDB table to lock the state file during operations, preventing concurrent modifications. Recent Terraform releases also offer S3-native locking through the use_lockfile backend option, so check the backend documentation for the version you run.

3. Version Control for State Files

Versioning is a fundamental practice in managing any codebase, and it is just as relevant for Terraform state files. Keeping earlier versions of the state file lets you return to a previous state if an infrastructure update goes wrong.

Terraform does not version state files on its own, but you can get versioning by storing them in a remote backend that supports it. For example, AWS S3 lets you turn on versioning for the bucket that stores your state files. Once enabled, every change to the state file is kept as a separate version, and you can revert to an earlier one whenever you need to.

To enable versioning for an S3 bucket:

1. Open the S3 console.
2. In the relevant AWS account, select the bucket used for Terraform state storage.
3. Click “Properties.”
4. Under “Bucket Versioning,” click “Edit” and turn on versioning.

S3 then keeps a history of state changes, so previous states can be restored if a problem occurs.

4. State File Encryption

Because Terraform state files hold sensitive information about your infrastructure, they should be encrypted both at rest and in transit. Then, even if unauthorized people obtain the state file, they cannot read its contents without the appropriate decryption keys.

You can enable encryption when you store state in remote backends such as AWS S3, Azure Blob Storage, or Terraform Cloud. AWS S3, for instance, supports server-side encryption with Amazon S3-managed keys (SSE-S3), AWS Key Management Service keys (SSE-KMS), or customer-provided keys (SSE-C). With the S3 backend, setting encrypt = true requests server-side encryption for the state object, and you can use SSE-KMS for more granular control over the encryption keys:

    terraform {
      backend "s3" {
        bucket     = "your-terraform-state-bucket"
        key        = "path/to/your/statefile"
        region     = "us-west-2"
        encrypt    = true
        kms_key_id = "alias/your-kms-key"
      }
    }

This configuration encrypts the state file with a specific KMS key, providing additional security. Depending on your Terraform version, kms_key_id may need to be the key’s ARN rather than an alias.

5. Minimize State File Size

As your infrastructure grows, so does the Terraform state file. Large state files slow down Terraform operations, making commands like terraform plan and terraform apply take longer to execute. To keep the state file small and maintain performance, consider the following techniques:

- Use data sources: Instead of managing every resource directly in one configuration, use data sources to reference resources that are managed elsewhere. The configuration then reads those resources rather than owning their full lifecycle, which keeps its own state smaller.
- Minimize resource configurations: Avoid unnecessary or redundant resource configurations that add to the state file size. Regularly review and clean up obsolete resources or configurations that are no longer needed.
- Split large configurations: If your Terraform configuration manages a very large infrastructure, consider splitting it into multiple smaller configurations, each with its own state file. You can then manage different parts of your infrastructure independently, reducing the size of each state file and improving performance.

Implementing these best practices helps keep your infrastructure-as-code workflows reliable, secure, and scalable. Proper state management is a cornerstone of successful Terraform usage, helping you avoid common pitfalls and maintain a healthy, performant infrastructure.

Terraform State Management Strategies

Effective state management is critical when using Terraform, especially in complex infrastructure setups. Here are key strategies:

1. Managing State in Multi-Environment Setups

In multi-environment setups (e.g., development, staging, production), managing state can be challenging. A common practice is to use a separate state file for each environment, so that changes in one environment do not inadvertently affect another. You can achieve this by configuring separate backends for each environment or by using different state paths within a shared backend. For instance, in AWS S3, you can define a different key path for each environment:

    terraform {
      backend "s3" {
        bucket = "your-terraform-state-bucket"
        key    = "prod/terraform.tfstate" # Use "dev/" or "staging/" for other environments
        region = "us-west-2"
      }
    }

This setup isolates states, reducing the risk of cross-environment issues and allowing teams to work independently on different stages of the infrastructure lifecycle.

2. Handling Sensitive Data in State Files

Terraform state files may contain sensitive information, such as resource configurations, access credentials, and infrastructure secrets. Managing this data securely is vital to prevent unauthorized access. Key strategies include:

- Encryption: Always encrypt state files at rest and in transit. Remote backends like AWS S3, Azure Blob Storage, and Terraform Cloud offer encryption options that protect state data from unauthorized access.
- Sensitive data management: Avoid hardcoding sensitive data in Terraform configuration files. Instead, supply it through environment variables or a secure secret management system (e.g., HashiCorp Vault, AWS Secrets Manager), and mark variables as sensitive so that their values are redacted from CLI output and logs. Note that a sensitive value can still be written to the state file in plain text, which is another reason to encrypt state and restrict access to it.

    variable "db_password" {
      type      = string
      sensitive = true
    }

This configuration marks the variable as sensitive, preventing its value from being displayed in Terraform’s plan and apply output.

3. Using Workspaces for Multi-Tenant Environments

Terraform workspaces are a convenient way to manage state for different tenants or environments within a single backend. Workspaces let you keep multiple states for the same configuration directory, each representing a different environment or tenant.

- Create workspaces: You can create and switch between workspaces using the Terraform CLI:

    terraform workspace new dev
    terraform workspace select dev

- Organize by tenant or environment: Each workspace has its own isolated state, making it easier to manage multiple tenants or environments without risking cross-contamination of state data.
- Best practices: Keep workspace naming conventions clear and consistent. Use workspaces when the infrastructure is similar across environments or tenants. For significantly different infrastructures, separate Terraform configurations may be more appropriate.
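Because each workspace keeps its own state, it helps to see where those states end up in a shared backend. The sketch below is a small Python illustration of how the S3 backend lays out per-workspace state objects, assuming the documented default `workspace_key_prefix` of `env:`. The function name `state_object_key` and the sample key are invented for this example, and the exact layout should be confirmed against the backend documentation for your Terraform version.

```python
def state_object_key(key: str, workspace: str = "default",
                     workspace_key_prefix: str = "env:") -> str:
    """Return the object key under which a workspace's state would be stored.

    Assumption: the default workspace uses `key` unchanged, while any other
    workspace is stored at <workspace_key_prefix>/<workspace>/<key>.
    """
    if workspace == "default":
        return key
    return f"{workspace_key_prefix}/{workspace}/{key}"


for ws in ("default", "dev", "staging"):
    print(f"{ws:8} -> {state_object_key('network/terraform.tfstate', ws)}")
```

Seeing the paths side by side makes the isolation explicit: every workspace writes to a different object, so a plan in one workspace never reads another workspace’s state.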
Tools and Resources for Terraform State Management

Terraform CLI Commands

Managing state files well starts with knowing the relevant Terraform CLI commands. Some of the most important are:

- terraform state: Manages the state directly. Its subcommands let you list resources, move resources within or between states, and remove resources from state so that Terraform stops managing them.
- terraform refresh: Updates the state file to match the real-world infrastructure. In recent Terraform versions, this command is deprecated in favor of terraform apply -refresh-only, which shows the detected changes for review before writing them to state.
- terraform import: Brings existing infrastructure into the Terraform state, so that manually created resources can be managed by Terraform.

These commands help maintain consistency between the actual infrastructure and the state file, a critical aspect of Terraform state management.
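To show what a command such as `terraform state list` operates on, here is a hedged Python sketch that derives resource addresses from parsed state data. It assumes the version 4 state layout, in which each entry under `resources` carries `mode`, `type`, `name`, and an optional `module` field. It ignores per-instance index keys and is not a replacement for the CLI; the sample state is invented for illustration.

```python
def resource_addresses(state: dict) -> list:
    """Build addresses like `aws_instance.web` from a parsed state document."""
    addresses = []
    for res in state.get("resources", []):
        parts = []
        if res.get("module"):           # e.g. "module.network"
            parts.append(res["module"])
        if res.get("mode") == "data":   # data sources are prefixed with "data"
            parts.append("data")
        parts += [res["type"], res["name"]]
        addresses.append(".".join(parts))
    return sorted(addresses)


sample_state = {
    "version": 4,
    "resources": [
        {"mode": "managed", "type": "aws_s3_bucket", "name": "state"},
        {"mode": "data", "type": "aws_ami", "name": "ubuntu"},
        {"module": "module.network", "mode": "managed",
         "type": "aws_vpc", "name": "main"},
    ],
}

for address in resource_addresses(sample_state):
    print(address)
```

For real work, prefer the CLI commands above: they understand locking, remote backends, and state format changes that a script like this does not.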
Third-Party Tools

In addition to the native Terraform CLI, several other tools can enhance Terraform state management:

- Terraform Cloud: HashiCorp’s own hosted service for Terraform, with built-in remote state storage, state locking, and versioning. It is a solid option for teams.
- Atlantis: Automates Terraform plan and apply through integration with version control systems, which is especially useful when many developers work on the same infrastructure.
- Terragrunt: A thin wrapper for Terraform that provides extra tools for working with multiple Terraform modules, automating remote state configuration, keeping configurations DRY (Don’t Repeat Yourself), and managing locking.
- Atmosly: Supports Terraform pipelines, offering state management assistance and integration with Terraform workflows to streamline state handling and pipeline automation.

Combined with the native CLI commands, these tools give you a more complete toolkit for keeping state predictable and secure as your infrastructure grows.

Conclusion

Effective Terraform state management is important for integrity, security, and performance. This article covered best practices such as remote state storage, state locking, encryption, splitting state files in large deployments, and workspaces for multi-tenancy, all of which significantly reduce the risk of state file corruption and concurrency problems. Take a closer look at how you manage Terraform state today, and consider adopting the techniques and tools described here for better infrastructure management.

By Ankush Madaan

A General Overview of TCPCopy Architecture

In the field of server-based request replay, there are generally two main approaches: offline replay and real-time online replication. Researchers often focus on offline replay, with little exploration of real-time replication. Based on feedback from SIGCOMM reviewers, there seems to be minimal research on real-time request replication.

Real-time request replication generally takes one of two forms:

- Application-layer request replication
- Packet-level request replication

Traditional approaches often replicate requests at the application layer, as seen in tools like Oracle Database Replay. Although easier to implement, this approach has several drawbacks:

- Replicating requests from the application layer requires traversing the entire protocol stack, which can consume resources, such as valuable connection resources.
- Testing becomes coupled with the actual application, increasing the potential impact on online systems. Server-based replication, for instance, can cause request processing times to depend on the slowest request (e.g., max(actual request time, replicated request time)).
- Supporting high-stress replication is difficult and may severely impact online systems, according to feedback from some users.
- Network latency is challenging to control.

Packet-level request replication, however, can avoid traversing the entire protocol stack. The shortest path can capture and send packets directly from the data link layer or, alternatively, at the IP layer. As long as TCP is not involved, the impact on online systems is significantly reduced.

Taking a packet-based approach, server-side request replication is indeed a promising direction with significant potential. Unfortunately, the creator of tcpreplay only briefly explored this path with flowreplay before abandoning it. From what I’ve seen, this area remains under-researched (most studies instead focus on entire networks; even SIGCOMM reviewers haven't suggested similar research approaches).
Diving Into TCPCopy’s Architectural Evolution

The TCPCopy architecture has gone through three generations. The core principle remains the same: leveraging online packet data to simulate a TCP client stack and deceive the test application service on the target server. Since TCP interactions are inherently bidirectional, it is typically necessary to know the target server's response packets to construct a suitable request packet for the test application. Thus, regardless of the implementation, capturing response packets is essential unless the TCP protocol is drastically altered. The three architectures differ primarily in where response packets are intercepted.

The First Architecture

The earliest TCPCopy architecture was as follows:

Figure 1. Initial TCPCopy Architecture Diagram.

As shown above, tcpcopy captured request packets from the data link layer (via pcap) and sent packets from the IP layer. The target server's TCP stack received no interference from mechanisms like ip queue or nfqueue, and response packets would directly return to the online server (through routing settings). tcpcopy could capture these response packets at the data link layer, with packets typically being discarded at the IP layer (unless the client IP was the IP of the online server itself, in which case the packets would reach the TCP layer but be reset by TCP).

Special thanks to TCPCopy’s originator, Wang Bo, who pioneered this initial exploration. Designed and implemented in 2009, this original 300-line version supported the early development of NetEase’s ad delivery system, achieving zero deployment errors and resolving hundreds of issues pre-launch.

Returning to the architecture, this early version generally functioned only within the same network segment. For web applications, it was mostly limited to single-machine traffic, lacking the depth required to fully uncover potential issues or explore the broader capabilities of NetEase’s ad delivery system.
Summary of the First Architecture

Advantages

- Simple and direct.
- Suitable for smoke testing.
- Relatively realistic testing outcomes.

Disadvantages

- Higher impact on the online environment due to response packets returning to the online server (though still less than application-layer replication).
- Network segment limitations.
- For web applications, it is challenging to utilize multiple live flows, which limits its value for stress testing.
- Internal applications are heavily restricted because the client IP of requests cannot match the replicated online server’s IP address.

The Second Architecture

This architecture was initially designed by TCPCopy’s originator Wang Bo (designed in 2010 and handed over to me in June 2011). The general architecture is outlined below:

Figure 2. The Second TCPCopy Architecture Diagram.

As shown in the diagram, tcpcopy now captures packets from the IP layer and also sends packets from the IP layer. Unlike the first architecture, this design intercepts response packets at the target server, with the intercept program returning the necessary response packet information to tcpcopy. This approach enables distributed load testing, which greatly advanced TCPCopy’s evolution compared to the first architecture.

In theory, response packets could be captured at either the IP layer or the data link layer on the target server. Let’s examine these options:

- Capturing at the data link layer: If no routing is configured, the response packet would return to the actual client initiating the request, which would affect the client’s TCP module (frequent resets) and, under high load, could cause unnecessary interference to the switch, router, and even the entire network.
- Capturing at the IP layer: The netlink technology offers a solution to the above issues. Netlink is a communication method for interaction between user-space processes and the kernel. Specifically, we can use kernel modules such as ip_queue (for kernel versions below 3.5) or nfqueue (for kernel 3.5 or above) to capture response packets.

We chose the second method, which captures response packets at the IP layer. Once a response packet is passed to intercept, we can retrieve the essential response packet information (generally TCP/IP header information) and transmit it to tcpcopy. We can also use a verdict to instruct the kernel on handling these response packets. Without a whitelist setting, these response packets will be dropped at the IP layer, making them undetectable by tcpdump (which operates at the data link layer).

This design allows for the replication of traffic from multiple online servers onto a single target server. Within intercept, routing information is retained to determine which tcpcopy instance should receive the response packet information. However, intercept does consume resources on the target server, and ip_queue or nfqueue may not perform efficiently, particularly for high-stress tests or short-connection load testing, leading to significant challenges.

Summary of the Second Architecture

Advantages

- Supports replicating traffic from multiple online servers.
- Minimizes impact on online servers, typically only returning TCP/IP header information.

Disadvantages

- More complex than the first architecture.
- Performance limits are often tied to ip_queue or nfqueue.
- intercept lacks scalability, restricted by ip_queue and nfqueue’s inability to support multi-process response packet capture.
- intercept affects the final test results on the target server, especially under high-stress conditions.
- Incomplete testing on the target server (no coverage of data link layer egress).
- Less convenient for maintenance.

The Third Architecture

The following diagram illustrates the latest architecture, designed specifically for extreme testing. This setup offloads intercept from the target server and places it on a separate, dedicated assistant server (preferably an idle server on the same network segment). In this setup, response packets are captured at the data link layer instead of the IP layer, significantly reducing interference with the target machine (aside from routing configuration) and greatly enhancing the ability to capture response packets. Consequently, this architecture provides a more realistic testing environment.

Figure 3. The Third TCPCopy Architecture Diagram.

Detailed Overview

Routing information is configured on the target server, where the application to be tested routes the necessary response packets to the assistant server. On the assistant server, we capture the response packets at the data link layer, extract useful information, and return it to the corresponding tcpcopy instance.

To achieve high efficiency, this architecture recommends using pcap for packet capture, allowing filtering to be handled in the kernel space. Without pcap, filtering would only be possible in user space. Filtering can be configured on either the intercept or tcpcopy side (using the -F parameter, similar to tcpdump filters), enabling packet capture to be handled in a divide-and-conquer approach across multiple instances. This design improves scalability and is ideal for handling extremely high concurrency.

This architecture requires more machine resources and is more challenging to use, because it expects users to be familiar with TCP, routing, and pcap filters (similar to tcpdump filtering conditions).

It’s important to note that in certain scenarios, pcap packet capture may experience higher packet loss rates than raw socket capture. In those cases, it’s advisable to use pf_ring for support or switch to raw socket capture.
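As a rough illustration of the "useful information" that intercept extracts from a response packet, the Python sketch below pulls the basic TCP/IP header fields out of a raw IPv4 packet. It is not TCPCopy’s code; the function name `parse_tcp_ip_header` and the sample packet are assumptions made for this example, and IP options are handled only through the header-length field.

```python
import socket
import struct


def parse_tcp_ip_header(packet: bytes) -> dict:
    """Extract addressing, sequence, and flag fields from an IPv4/TCP packet."""
    ihl = (packet[0] & 0x0F) * 4             # IPv4 header length in bytes
    if packet[9] != socket.IPPROTO_TCP:      # byte 9 is the protocol field
        raise ValueError("not a TCP packet")
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])
    # TCP header start: ports, sequence number, ack number, offset/flags, window
    src_port, dst_port, seq, ack, off_flags, window = struct.unpack(
        "!HHIIHH", packet[ihl:ihl + 16])
    return {
        "src": (src_ip, src_port),
        "dst": (dst_ip, dst_port),
        "seq": seq,
        "ack": ack,
        "syn": bool(off_flags & 0x02),
        "ack_flag": bool(off_flags & 0x10),
        "window": window,
    }


# Build a sample SYN-ACK response (checksums left as zero for brevity).
ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64,
                        socket.IPPROTO_TCP, 0,
                        socket.inet_aton("10.0.0.2"),
                        socket.inet_aton("10.0.0.1"))
tcp_header = struct.pack("!HHIIHHHH", 80, 40000, 1000, 501,
                         (5 << 12) | 0x12, 65535, 0, 0)

print(parse_tcp_ip_header(ip_header + tcp_header))
```

Returning only these few header fields, instead of whole packets, is what keeps the traffic from the assistant server back to tcpcopy small.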
Summary of the Third Architecture

Advantages

- Provides a more realistic testing environment.
- Highly scalable.
- Suitable for high concurrency scenarios.
- Avoids the limitations of ip_queue and nfqueue.
- Virtually no performance impact on the target server.
- Easier maintenance on the target server running services.
- Will not crash alongside the service-running server in the event of a failure.

Disadvantages

- More challenging to operate.
- Requires additional machine resources.
- Demands more knowledge.
- The assistant server (running intercept) should ideally be on the same network segment as the target server to simplify deployment.

Conclusion

All three architectures have their merits. Currently, only the second and third architectures are open-source, and tcpcopy defaults to the third architecture.

Finally, to minimize or eliminate the impact on the online environment when replicating requests, consider using the following approach:

- Use a high-performance bypass mechanism (if using mirroring, modify the destination address of client data packets) to replicate request data packets to a separate system.
- In this separate system, apply the third architecture to capture requests via the pcap interface and then forward them to the test application on the target server.

By Bin Wang

Playwright and Chrome Browser Testing in Heroku

I’ve always loved watching my unit tests run (and pass). They’re fast, and passing tests gives me the assurance that my individual pieces behave like they’re supposed to. Conversely, I often struggled to prioritize end-to-end tests for the browser because writing and running them was gruelingly slow.

Fortunately, the tools for end-to-end in-browser testing have gotten much better and faster over the years. And with a headless browser setup, I can run my browser tests as part of my CI.

Recently, I came across this Heroku blog post about automating in-browser testing with headless Chrome within Heroku CI. Heroku has a buildpack that installs headless Chrome, which you can invoke for your tests in the CI pipeline. The example setup from the blog post was a React app tested with Puppeteer and Jest. That’s a great start. But what if I use Playwright instead of Puppeteer? Is it possible?

I decided to investigate. As it turns out, you can do this with Playwright, too! So, I captured the steps you would need to get Playwright tests running on the headless Chrome browser used in Heroku CI. In this post, I’ll walk you through the steps to get set up.

A Quick Word on Browser Automation for End-to-End Testing

End-to-end testing captures how users interact with your app in a browser, validating complete workflows. Playwright makes this process pretty seamless with testing in Chrome, Firefox, and Safari. Of course, running a full slate of browser tests in CI is pretty heavy, which is why headless mode helps. The Chrome for Testing buildpack from Heroku installs Chrome on a Heroku app so you can run your Playwright tests in Heroku CI with a lightweight setup.

Introduction to the Application for Testing

Since I was trying this out, I forked the GitHub repo that was originally referenced in the Heroku blog post. The application was a simple React app with a link, a text input, and a submit button.
There were three tests:

- Verify that the link works and redirects to the right location.
- Verify that the text input properly displays the user input.
- Verify that submitting the form updates the text displayed on the page.

Pretty simple. Now, I just needed to change the code to use Playwright instead of Puppeteer and Jest. Oh, and I also wanted to use pnpm instead of npm. Here’s a link to my forked GitHub repo.

Modify the Code to Use Playwright

Let’s walk through the steps I took to modify the code. I started with my forked repo, identical to the heroku-examples repo.

1. Use pnpm

I wanted to use pnpm instead of npm (personal preference). So, here’s what I did first:

Shell

    ~/project$ corepack enable pnpm
    ~/project$ corepack use pnpm@latest
    Installing pnpm@9.12.3 in the project…
    …
    Progress: resolved 1444, reused 1441, downloaded 2, added 1444, done
    …
    Done in 14.4s

    ~/project$ rm package-lock.json
    ~/project$ pnpm install # just to show everything's good
    Lockfile is up to date, resolution step is skipped
    Already up to date
    Done in 1.3s

2. Add Playwright to the Project

Next, I removed Puppeteer and Jest, and I added Playwright.

Shell

    ~/project$ pnpm remove \
      babel-jest jest jest-puppeteer @testing-library/jest-dom

    ~/project$ pnpm create playwright
    Getting started with writing end-to-end tests with Playwright:
    Initializing project in '.'
    ✔ Do you want to use TypeScript or JavaScript? · JavaScript
    ✔ Where to put your end-to-end tests? · tests
    ✔ Add a GitHub Actions workflow? (y/N) · false
    ✔ Install Playwright browsers (can be done manually via 'pnpm exec playwright install')? (Y/n) · false
    ✔ Install Playwright operating system dependencies (requires sudo / root - can be done manually via 'sudo pnpm exec playwright install-deps')? (y/N) · false

    Installing Playwright Test (pnpm add --save-dev @playwright/test)…
    …
    Installing Types (pnpm add --save-dev @types/node)…
    …
    Done in 2.7s
    Writing playwright.config.js.
    Writing tests/example.spec.js.
    Writing tests-examples/demo-todo-app.spec.js.
    Writing package.json.

I also removed the Jest configuration section from package.json.

3. Configure Playwright to Use Chromium Only

You can run your Playwright tests in Chromium, Firefox, and WebKit (Safari's engine). Since I was focused on Chrome, I commented out the other browsers in the projects section of the generated playwright.config.js file:

JavaScript

    /* Configure projects for major browsers */
    projects: [
      {
        name: 'chromium',
        use: { ...devices['Desktop Chrome'] },
      },

      // {
      //   name: 'firefox',
      //   use: { ...devices['Desktop Firefox'] },
      // },
      //
      // {
      //   name: 'webkit',
      //   use: { ...devices['Desktop Safari'] },
      // },
    ],
    …

4. Exchange the Puppeteer Test Code for Playwright Test Code

The original code had a Puppeteer test file at src/tests/puppeteer.test.js. I moved that file to tests/playwright.spec.js. Then, I updated the test to use Playwright’s conventions, which mapped over quite cleanly. The new test file looked like this:

JavaScript

    const ROOT_URL = 'http://localhost:8080';
    const { test, expect } = require('@playwright/test');

    const inputSelector = 'input[name="name"]';
    const submitButtonSelector = 'button[type="submit"]';
    const greetingSelector = 'h5#greeting';
    const name = 'John Doe';

    test.beforeEach(async ({ page }) => {
      await page.goto(ROOT_URL);
    });

    test.describe('Playwright link', () => {
      test('should navigate to Playwright documentation page', async ({ page }) => {
        await page.click('a[href="https://playwright.dev/"]');
        await expect(page.title()).resolves.toMatch('| Playwright');
      });
    });

    test.describe('Text input', () => {
      test('should display the entered text in the text input', async ({ page }) => {
        await page.fill(inputSelector, name);

        // Verify the input value
        const inputValue = await page.inputValue(inputSelector);
        expect(inputValue).toBe(name);
      });
    });

    test.describe('Form submission', () => {
      test('should display the "Hello, X" message after form submission', async ({ page }) => {
        const expectedGreeting = `Hello, ${name}.`;
        await page.fill(inputSelector, name);
        await page.click(submitButtonSelector);

        await page.waitForSelector(greetingSelector);
        const greetingText = await page.textContent(greetingSelector);
        expect(greetingText).toBe(expectedGreeting);
      });
    });

5. Remove start-server-and-test; Use Playwright’s webServer Instead

To test my React app, I needed to spin it up (at http://localhost:8080) in a separate process first, and then I could run my tests. This would be the case whether I used Puppeteer or Playwright. With Puppeteer, the Heroku example used the start-server-and-test package. However, you can configure Playwright to spin up the app before running tests. This is pretty convenient!

I removed start-server-and-test from my project.

Shell

    ~/project$ pnpm remove start-server-and-test

In playwright.config.js, I uncommented the webServer section at the bottom, modifying it to look like this:

JavaScript

    /* Run your local dev server before starting the tests */
    webServer: {
      command: 'pnpm start',
      url: 'http://127.0.0.1:8080',
      reuseExistingServer: !process.env.CI,
    },

Then, I removed the test:ci script from the original package.json file. Instead, my test script looked like this:

JSON

    "scripts": {
      …
      "test": "playwright test --project=chromium --reporter list"
    },

6. Install Playwright Browser on Local Machine

Playwright installs the latest browser binaries to use for its tests. So, on my local machine, I needed Playwright to install its version of Chromium.

Shell

    ~/project$ pnpm playwright install chromium
    Downloading Chromium 130.0.6723.31 (playwright build v1140) from https://playwright.azureedge.net/builds/chromium/1140/chromium-linux.zip
    164.5 MiB [====================] 100%

Note: The Chrome for Testing buildpack on Heroku installs the browser we’ll use for testing. We’ll set up our CI so that Playwright uses that browser instead of spending the time and resources installing its own.

7. Run Tests Locally

With that, I was all set.
It was time to try out my tests locally.

Shell

    ~/project$ pnpm test

    > playwright test --project=chromium --reporter list

    Running 3 tests using 3 workers

      ✓  1 [chromium] > playwright.spec.js:21:3 > Text input > should display the entered text in the text input (911ms)
      ✘  2 [chromium] > playwright.spec.js:14:3 > Playwright link > should navigate to Playwright documentation page (5.2s)
      ✓  3 [chromium] > playwright.spec.js:31:3 > Form submission > should display the "Hello, X" message after form submission (959ms)

    ...
    - waiting for locator('a[href="https://playwright.dev/"]')

      13 | test.describe('Playwright link', () => {
      14 |   test('should navigate to Playwright documentation page', async ({ page }) => {
    > 15 |     await page.click('a[href="https://playwright.dev/"]');
         |                ^
      16 |     await expect(page.title()).resolves.toMatch('| Playwright');
      17 |   });
      18 | });

Oh! That’s right. I modified my test to expect the link in the app to take me to Playwright’s documentation instead of Puppeteer’s. I needed to update src/App.js at line 19:

HTML

    <Link href="https://playwright.dev/" rel="noopener">
      Playwright Documentation
    </Link>

Now, it was time to run the tests again.

Shell

    ~/project$ pnpm test

    > playwright test --project=chromium --reporter list

    Running 3 tests using 3 workers

      ✓  1 [chromium] > playwright.spec.js:21:3 > Text input > should display the entered text in the text input (1.1s)
      ✓  2 [chromium] > playwright.spec.js:14:3 > Playwright link > should navigate to Playwright documentation page (1.1s)
      ✓  3 [chromium] > playwright.spec.js:31:3 > Form submission > should display the "Hello, X" message after form submission (1.1s)

      3 passed (5.7s)

The tests passed! Next, it was time to get us onto Heroku CI.

Deploy to Heroku to Use CI Pipeline

I followed the instructions in the Heroku blog post to set up my app in a Heroku CI pipeline.

1. Create a Heroku Pipeline

In Heroku, I created a new pipeline and connected it to my forked GitHub repo. Next, I added my app to staging.
Then, I went to the Tests tab and clicked Enable Heroku CI.

Finally, I modified the app.json file to remove the test script, which was set to call npm test:ci. I had already removed the test:ci script from my package.json file. The test script in package.json was now the one to use, and Heroku CI would look for that one by default.

My app.json file, which made sure to use the Chrome for Testing buildpack, looked like this:

JSON

    {
      "environments": {
        "test": {
          "buildpacks": [
            { "url": "heroku-community/chrome-for-testing" },
            { "url": "heroku/nodejs" }
          ]
        }
      }
    }

2. Initial Test Run

I pushed my code to GitHub, and this triggered a test run in Heroku CI. The test run failed, but I wasn’t worried. I knew there would be some Playwright configuration to do.

Digging around in the test log, I found this:

Shell

    Error: browserType.launch: Executable doesn't exist at /app/.cache/ms-playwright/chromium-1140/chrome-linux/chrome

Playwright was looking for the Chrome browser in its own cache folder. I could install it there with the playwright install chromium command as part of my CI test setup. But that would defeat the whole purpose of having the Chrome for Testing buildpack. Chrome was already installed; I just needed to point to it properly.

Looking back at my test setup log for Heroku, I found these lines:

Shell

    Installed Chrome dependencies for heroku-24
    Adding executables to PATH
      /app/.chrome-for-testing/chrome-linux64/chrome
      /app/.chrome-for-testing/chromedriver-linux64/chromedriver
    Installed Chrome for Testing STABLE version 130.0.6723.91

So, the browser I wanted to use was at /app/.chrome-for-testing/chrome-linux64/chrome. I would just need Playwright to look there for it.

3. Help Playwright Find the Installed Chrome Browser

Note: If you’re not interested in the nitty-gritty details here, you can skip this section and simply copy the full app.json lower down. This should give you what you need to get up and running with Playwright on Heroku CI.
In Playwright’s documentation, I found that you can set an environment variable that tells Playwright to use a custom location for all of its browser installs. That env variable is PLAYWRIGHT_BROWSERS_PATH. I decided to start there.

In app.json, I set an env variable like this:

JSON

    {
      "environments": {
        "test": {
          "env": {
            "PLAYWRIGHT_BROWSERS_PATH": "/app/.chrome-for-testing"
          },
    ...

I pushed my code to GitHub to see what would happen with my tests in CI. As expected, it failed again. However, the log error showed this:

Shell

    Error: browserType.launch: Executable doesn't exist at /app/.chrome-for-testing/chromium-1140/chrome-linux/chrome

That got me pretty close. I decided that I would do this:

- Create the folders needed for where Playwright expects the Chrome browser to be. That would be a command like:

Shell

    mkdir -p "$PLAYWRIGHT_BROWSERS_PATH/chromium-1140/chrome-linux"

- Create a symlink in this folder to point to the Chrome binary installed by the Heroku buildpack. That would look something like this:

Shell

    ln -s \
      $PLAYWRIGHT_BROWSERS_PATH/chrome-linux64/chrome \
      $PLAYWRIGHT_BROWSERS_PATH/chromium-1140/chrome-linux/chrome

However, I was concerned about whether this would be future-proof. Eventually, Playwright would use a new version of Chromium, and it wouldn’t look in a chromium-1140 folder anymore. How could I figure out where Playwright would look?

That’s when I discovered you can do a browser installation dry run.

Shell

    ~/project$ pnpm playwright install chromium --dry-run

    browser: chromium version 130.0.6723.31
      Install location:    /home/alvin/.cache/ms-playwright/chromium-1140
      Download url:        https://playwright.azureedge.net/builds/chromium/1140/chromium-linux.zip
      Download fallback 1: https://playwright-akamai.azureedge.net/builds/chromium/1140/chromium-linux.zip
      Download fallback 2: https://playwright-verizon.azureedge.net/builds/chromium/1140/chromium-linux.zip

That “Install location” line was crucial.
And, if we set PLAYWRIGHT_BROWSERS_PATH, here is what we would see:

Shell

    ~/project$ PLAYWRIGHT_BROWSERS_PATH=/app/.chrome-for-testing \
      pnpm playwright install chromium --dry-run

    browser: chromium version 130.0.6723.31
      Install location:    /app/.chrome-for-testing/chromium-1140
    ...

That’s what I wanted. With a little awk magic, I did this:

Shell

    ~/project$ CHROMIUM_PATH=$( \
        PLAYWRIGHT_BROWSERS_PATH=/app/.chrome-for-testing \
        pnpm playwright install --dry-run chromium \
        | awk '/Install location/ {print $3}'
      )

    ~/project$ echo $CHROMIUM_PATH
    /app/.chrome-for-testing/chromium-1140

With all that figured out, I simply needed to add a test-setup script to app.json. Because PLAYWRIGHT_BROWSERS_PATH is already set in env, my script would be a little simpler. This was my final app.json file:

JSON

    {
      "environments": {
        "test": {
          "env": {
            "PLAYWRIGHT_BROWSERS_PATH": "/app/.chrome-for-testing"
          },
          "buildpacks": [
            { "url": "heroku-community/chrome-for-testing" },
            { "url": "heroku/nodejs" }
          ],
          "scripts": {
            "test-setup": "CHROMIUM_PATH=$(pnpm playwright install --dry-run chromium | awk '/Install location/ {print $3}'); mkdir -p \"$CHROMIUM_PATH/chrome-linux\"; ln -s $PLAYWRIGHT_BROWSERS_PATH/chrome-linux64/chrome $CHROMIUM_PATH/chrome-linux/chrome"
          }
        }
      }
    }

I’ll briefly walk through what test-setup does:

- Accounting for PLAYWRIGHT_BROWSERS_PATH, it uses playwright install --dry-run with awk to determine the root folder where Playwright will look for the Chrome browser, and sets this as the value of the CHROMIUM_PATH variable.
- It creates a new folder (and any necessary parent folders) at CHROMIUM_PATH/chrome-linux, which is the actual folder where Playwright will look for the chrome binary.
- It creates a symlink named chrome in that folder, pointing to the Heroku buildpack installation of Chrome (/app/.chrome-for-testing/chrome-linux64/chrome).

4. Run Tests Again

With my updated app.json file, Playwright should be able to use the Chrome installation from the buildpack.
It was time to run the tests once again.

Success! The test-setup script ran as expected. Playwright was able to access the chrome binary and run the tests, which passed.

Conclusion

End-to-end testing for my web applications is becoming less cumbersome, so I’m prioritizing it more and more. In recent days, that has meant using Playwright more, too. It’s flexible and fast. And now that I’ve done the work (for me and for you!) to get it up and running with the Chrome for Testing buildpack in Heroku CI, I can start building up my browser automation test suites once again.

The code for this walkthrough is available in my GitHub repository.
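For readers who want to see the three steps of the test-setup script from app.json spelled out one at a time, here is a rough sketch of the same logic in Python. It is an illustration only and not part of the walkthrough's repository: the function names parse_install_location and link_chrome are hypothetical, the sample text is copied from the dry-run output shown earlier, and the demo works in a temporary folder rather than under /app.

```python
import os
import tempfile

# Sample dry-run output, copied from the walkthrough above.
SAMPLE_DRY_RUN = """\
browser: chromium version 130.0.6723.31
  Install location:    /app/.chrome-for-testing/chromium-1140
"""


def parse_install_location(dry_run_output):
    """Mimic awk '/Install location/ {print $3}': third whitespace field.

    Unlike awk, this returns only the first matching line.
    """
    for line in dry_run_output.splitlines():
        if "Install location" in line:
            return line.split()[2]
    raise ValueError("no 'Install location' line found")


def link_chrome(chromium_root, chrome_binary):
    """Create <chromium_root>/chrome-linux/chrome as a symlink to chrome_binary."""
    target_dir = os.path.join(chromium_root, "chrome-linux")
    os.makedirs(target_dir, exist_ok=True)  # same idea as mkdir -p
    link_path = os.path.join(target_dir, "chrome")
    if os.path.lexists(link_path):  # make reruns safe (hypothetical extra step)
        os.remove(link_path)
    os.symlink(chrome_binary, link_path)  # same idea as ln -s
    return link_path


if __name__ == "__main__":
    print(parse_install_location(SAMPLE_DRY_RUN))
    # Demo in a throwaway folder; an empty file stands in for the real binary.
    with tempfile.TemporaryDirectory() as tmp:
        fake_chrome = os.path.join(tmp, "chrome-linux64", "chrome")
        os.makedirs(os.path.dirname(fake_chrome))
        open(fake_chrome, "w").close()
        print(link_chrome(os.path.join(tmp, "chromium-1140"), fake_chrome))
```

The one difference from the shell one-liner is that this sketch removes an existing link before recreating it, so running it twice does not fail; the original ln -s would report an error if the link already existed.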

By Alvin Lee

AI Copilot Using AWS Multi-Agent Orchestrator

Personal AI copilots are emerging as real game changers. They have the potential to transform how we manage our daily tasks and responsibilities. Unlike basic chatbots, these intelligent assistants are sophisticated systems that understand the nuances of our personal lives, making our day-to-day activities much smoother and more efficient.

Building such an AI copilot can be complex without proper infrastructure. AWS Multi-Agent Orchestrator is a flexible and powerful framework for managing multiple AI agents. The orchestrator's classifier selects the appropriate agent based on the user input, the agent's characteristics, and the conversation history. The orchestrator also facilitates the storage of the conversation history per agent.

Source: AWS Multi-Agent Orchestrator

Building an AI Copilot for Various Tasks

Let us build an AI copilot that can check the calendar, suggest fitness routines, and read the news simultaneously. These are completely separate tasks that can have different contexts throughout the user's interactions with them. The full source code for this personal assistant can be found here.

Calendar Agent

This is a chain agent that internally uses ApiAgent to fetch calendar invitations using Calendly APIs. The response from the ApiAgent is passed to BedrockLLMAgent to summarize the invitations.

Python

    # First link in the chain: fetch busy times from the Calendly API.
    agent1 = ApiAgent(ApiAgentOptions(
        endpoint = f"https://api.calendly.com/user_busy_times?user={CALENDLY_USER_URI}&start_time={start_time}&end_time={end_time}",
        method = "GET",
        name = "Calendly Schedule Agent",
        description = "Specializes in Calendar scheduling",
        streaming=False,
        headers_callback=custom_headers_callback,
    ))

    # Second link: summarize the API response with an LLM on Bedrock.
    agent2 = BedrockLLMAgent(BedrockLLMAgentOptions(
        name="Calendar Summarization",
        streaming=True,
        description="You are an AI agent specialized in summarizing calendar events. Given a list of events, produce a concise summary"\
                    " highlighting key details such as event names, dates, times, and participants. Ensure the summary is clear, brief, and "\
                    "informative for quick understanding. Do not provide duplicate information or irrelevant details.",
        model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
        callbacks=ChainlitAgentCallbacks()
    ))

News Reader Agent

The news reader agent is also a chain agent, which internally uses ApiAgent to fetch the news using Gnews APIs. The response from the ApiAgent is passed to BedrockLLMAgent to summarize the news.

Python

    # First link in the chain: fetch articles from the Gnews search API.
    agent1 = ApiAgent(ApiAgentOptions(
        endpoint = f"https://gnews.io/api/v4/search?q=example&apikey={GNEWS_API_KEY}",
        method = "GET",
        name = "News Reader Agent",
        description = "Specializes in reading news from various sources",
        streaming=False
    ))

    # Second link: summarize the articles. Each string fragment ends with a
    # space so the concatenated prompt does not run sentences together.
    agent2 = BedrockLLMAgent(BedrockLLMAgentOptions(
        name="News Summarization Agent",
        streaming=True,
        description="You are a skilled journalist tasked with creating concise, engaging news summaries. "\
                    "Given the following text, produce a clear and informative summary that captures the key points, " \
                    "main actors, and significant details. Your summary should be objective, well-structured, "\
                    "and easily digestible for a general audience. Aim for clarity and brevity while maintaining the essence of the news story.",
        model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
        callbacks=ChainlitAgentCallbacks()
    ))

Fitness Agent

The fitness agent is a standalone agent that uses the LLM to suggest fitness routines, diet plans, and health tips.

Python

    fitness_agent = BedrockLLMAgent(BedrockLLMAgentOptions(
        name="Fitness Agent",
        streaming=True,
        description="Specializes in fitness, health, and wellness. It can provide workout routines, diet plans, and general health tips.",
        model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
        callbacks=ChainlitAgentCallbacks()
    ))

Running the App

Follow these instructions for the prerequisites.

1. Clone the repository:

Shell

    git clone https://github.com/ravilaudya/task-copilot.git
    cd task-copilot

2. Create a virtual environment:

Shell

    conda create -p venv python=3.12
    conda activate ./venv

3. Install the required dependencies:

Shell

    pip install -r requirements.txt

4. Run the app:

Shell

    chainlit run app.py --port 8888

Interacting With the Copilot

Once the app is running, a browser window will open where you can interact with the AI copilot. Some example questions you can ask include:

- "What is the latest news?"
- "How is my calendar this week?"
- "How can I lose fat?"

Conclusion

The AI copilot streamlines daily tasks and enhances productivity by providing quick and relevant information tailored to individual preferences. With the right infrastructure in place, building such an AI copilot becomes a manageable and rewarding project. Having such a copilot is like having a reliable partner by your side, making life a little bit easier.
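The calendar and news agents both follow the chain pattern described earlier: the first agent's output becomes the second agent's input. As a minimal, library-free sketch of that idea, the snippet below wires two plain functions together. It deliberately does not use the Multi-Agent Orchestrator API; run_chain, fetch_events, and summarize are hypothetical names, and the stubs return made-up data in place of the Calendly call and the Bedrock model.

```python
from typing import Callable, List

# A chain step takes text in and returns text out.
Step = Callable[[str], str]


def run_chain(steps: List[Step], user_input: str) -> str:
    """Run the steps in order; each step's output is the next step's input."""
    result = user_input
    for step in steps:
        result = step(result)
    return result


def fetch_events(_query: str) -> str:
    # Stand-in for the API-calling agent: returns canned, made-up events.
    return "standup 09:00; design review 14:00"


def summarize(raw: str) -> str:
    # Stand-in for the LLM summarizer: counts and lists items, no model call.
    events = [e.strip() for e in raw.split(";") if e.strip()]
    return f"{len(events)} events: " + ", ".join(events)


calendar_chain = [fetch_events, summarize]
print(run_chain(calendar_chain, "How is my calendar this week?"))
# prints: 2 events: standup 09:00, design review 14:00
```

Keeping each step as a text-in, text-out function is what makes the pattern composable: swapping the fetch step for a news source reuses the same summarizing step unchanged.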

By Ravi Laudya


The Latest Testing, Tools, and Frameworks Topics