DZone
Testing, Tools, and Frameworks

The Testing, Tools, and Frameworks Zone encapsulates one of the final stages of the SDLC as it ensures that your application and/or environment is ready for deployment. From walking you through the tools and frameworks tailored to your specific development needs to leveraging testing practices to evaluate and verify that your product or application does what it is required to do, this Zone covers everything you need to set yourself up for success.

Latest Premium Content
Trend Report
Software Supply Chain Security
Refcard #376
Cloud-Based Automated Testing Essentials
Refcard #363
JavaScript Test Automation Frameworks

DZone's Featured Testing, Tools, and Frameworks Resources

REST-Assured Configuration and Specifications: Writing Maintainable API Tests

By Faisal Khatri, DZone Core
When working on API automation projects, one of the first things that becomes repetitive is configuring the same settings for every test. The base URL, content type, request logging, and common response validations often appear in multiple test classes. As the number of tests increases, maintaining these repeated configurations becomes difficult.

REST Assured provides specifications to solve this problem. Instead of defining the same settings in every test, common configurations and validations can be created once and reused throughout the test suite. This article demonstrates a simple approach to configuring REST Assured using a base test class along with Request and Response Specifications.

What Are REST-Assured Specifications?

A specification is a reusable configuration object that contains common request or response settings. Without one, every test repeats something like this:

Java
given()
    .baseUri("https://api.example.com")
    .header("Authorization", "Bearer token")
    .contentType(ContentType.JSON)

With a specification, this configuration is defined once and reused across multiple tests. Common validations can be captured in a specification in the same way.

Specifications help to:

  • Reduce code duplication
  • Improve test readability
  • Centralize API configurations
  • Simplify maintenance
  • Standardize request and response validations

Why Use Specifications?

Consider an API test that retrieves order details.

Java
@Test
public void getOrderDetails() {
    given()
        .baseUri("https://api.example.com")
    .when()
        .get("/orders/2")
    .then()
        .statusCode(200);
}

The test works correctly, but the base URI and common validations, such as the status code, have to be repeated in every test. A better approach is to move these common settings into reusable specifications.

What Problem Does It Solve?

In many API automation projects, test cases contain repeated configuration code. The same base URL, content type, authentication details, headers, and response validations are repeated across multiple test classes.
While this may not seem like a problem when there are only a few tests, maintaining the test suite becomes difficult as the project grows. Consider a scenario where the API base URL changes from a QA environment to a Staging environment. Without a centralized configuration, every test containing the old URL would need to be updated. Similarly, if a common header or authentication mechanism changes, modifications would be required in multiple places.

Request and Response Specifications solve this problem by moving common configurations into reusable objects. Instead of repeating the same setup in every test, the configuration is defined once and reused wherever required. This reduces code duplication, improves readability, and makes the test suite easier to maintain. As a result, test methods can focus on validating business functionality rather than configuring API requests and responses.

Creating a SetupSpecification Class

The most common configurations should be placed in a separate class so that all test classes can inherit the same setup. The following example creates a Request and Response Specification in a separate class using the TestNG @BeforeClass annotation.

Java
import static org.hamcrest.Matchers.lessThan;

import io.restassured.RestAssured;
import io.restassured.builder.RequestSpecBuilder;
import io.restassured.builder.ResponseSpecBuilder;
import io.restassured.filter.log.RequestLoggingFilter;
import io.restassured.filter.log.ResponseLoggingFilter;
import io.restassured.specification.RequestSpecification;
import io.restassured.specification.ResponseSpecification;
import org.testng.annotations.BeforeClass;

public class SetupSpecification {

    @BeforeClass
    public void setup() {
        // Common request settings: content type, base URI, and logging.
        final RequestSpecification request = new RequestSpecBuilder()
            .addHeader("Content-Type", "application/json")
            .setBaseUri("http://localhost:3004")
            .addFilter(new RequestLoggingFilter())
            .addFilter(new ResponseLoggingFilter())
            .build();

        // Common response validation: respond in under 10 seconds.
        final ResponseSpecification response = new ResponseSpecBuilder()
            .expectResponseTime(lessThan(10000L))
            .build();

        // Register both specifications as global defaults.
        RestAssured.requestSpecification = request;
        RestAssured.responseSpecification = response;
    }
}

The setup method runs once before the tests in the class execute. The Request Specification contains the base URI, content type, and logging configuration.
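The setup above hardcodes the base URI. One possible way to handle the QA-to-Staging switch described earlier is to resolve the base URI from a system property. The following is a minimal sketch in plain Java, not part of REST Assured and not taken from the article: the class name `BaseUriResolver`, the `api.env` property, and the QA and Staging URLs are all hypothetical.

```java
import java.util.Locale;
import java.util.Map;

/** Resolves the API base URI for the environment under test (illustrative sketch). */
final class BaseUriResolver {

    // Hypothetical environment names and URLs, for illustration only.
    private static final Map<String, String> BASE_URIS = Map.of(
        "local", "http://localhost:3004",
        "qa", "https://qa.api.example.com",
        "staging", "https://staging.api.example.com");

    private BaseUriResolver() {
    }

    /** Returns the base URI for an environment name, ignoring case and surrounding spaces. */
    static String resolve(String environment) {
        String key = environment == null ? "" : environment.trim().toLowerCase(Locale.ROOT);
        String uri = BASE_URIS.get(key);
        if (uri == null) {
            throw new IllegalArgumentException("Unknown environment: " + environment);
        }
        return uri;
    }

    /** Reads the hypothetical "api.env" system property, defaulting to "local". */
    static String resolveFromSystemProperty() {
        return resolve(System.getProperty("api.env", "local"));
    }

    public static void main(String[] args) {
        // Example: java -Dapi.env=staging BaseUriResolver
        System.out.println(resolveFromSystemProperty());
    }
}
```

Under these assumptions, the setup method would call `.setBaseUri(BaseUriResolver.resolveFromSystemProperty())`, and the suite could be pointed at another environment with `-Dapi.env=staging` without touching any test class. Failing fast on an unknown name is a deliberate choice here, so that a typo does not silently run the suite against the default environment.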
Any configuration defined in a Request Specification is applied to every API request that uses that specification. For example, if the specification includes a common header, authentication token, content type, or query parameter, those values are automatically sent with all requests that reference the specification.

While this promotes reusability and reduces duplication, care should be taken when adding request-specific details to a shared specification. Not all APIs require the same headers, authentication mechanisms, query parameters, or request bodies. Including such configurations in a common specification can lead to unintended behavior and make tests more difficult to maintain.

The Response Specification contains the common validations that are expected from the API response. The expectResponseTime() method validates that the API responds within the specified time limit. Validations can also be added for:

  • Status code
  • Headers
  • Content-Type
  • Cookies
  • Body

However, any validation defined in a Response Specification is applied to every API test that uses that specification. For example, if the specification includes a validation for a 200 status code, all tests using that specification automatically expect a 200 response. This may not be appropriate for APIs that are expected to return different status codes, such as 201, 204, 400, or 404. The same consideration applies to validations related to headers, content type, cookies, and response body content. Including endpoint-specific validations in a shared specification can reduce flexibility and make tests harder to maintain.

A good practice is to keep only the truly common validations in a shared Response Specification and add endpoint-specific assertions within the individual test methods.

The statements below make the Request and Response Specifications available globally for the test execution.
Java
RestAssured.requestSpecification = request;
RestAssured.responseSpecification = response;

As a result, the base URI, the Content-Type header, and the response-time validation do not need to be specified in every test.

Writing a Test Using the Specifications

Once the setup is complete, test classes can extend the SetupSpecification class.

Java
import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.equalTo;

import org.testng.annotations.Test;

public class TestGetRequestWithRestAssuredSpecs extends SetupSpecification {

    @Test
    public void getRequestTestWithRestAssuredConfig() {
        final int orderId = 3;

        given()
            .queryParam("id", orderId)
        .when()
            .get("/getOrder")
        .then()
            .statusCode(200)
            .and()
            .assertThat()
            .body("orders[0].id", equalTo(orderId),
                "orders[0].product_name", equalTo("USB-C Charger"));
    }
}

The Request Specification is applied automatically because it was registered globally in the SetupSpecification class. All the common request configurations, such as the base URI, headers, content type, and logging settings, are therefore applied to the request. Similarly, the response-time validation configured in the SetupSpecification class is reused during test execution.

The test itself contains only endpoint-specific details: it passes the id query parameter, invokes the /getOrder endpoint, and asserts on the response. This keeps the test concise and improves maintainability by separating common configuration from test-specific assertions.

Adding Additional Assertions

The Response Specification can handle common validations, while endpoint-specific assertions can still be added in the test.
Java
import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.equalTo;

import org.testng.annotations.Test;

public class TestGetRequestWithRestAssuredSpecs extends SetupSpecification {

    @Test
    public void getRequestTestWithRestAssuredConfig() {
        final int orderId = 3;

        given()
            .queryParam("id", orderId)
        .when()
            .get("/getOrder")
        .then()
            .statusCode(200)
            .and()
            .assertThat()
            .body("orders[0].id", equalTo(orderId),
                "orders[0].product_name", equalTo("USB-C Charger"));
    }
}

In this example, the status code check and the response body validations for order ID and product name remain inside the test because they are specific to this API endpoint.

Why This Approach Is Useful

As the test suite grows, hundreds of API tests may use the same base URL, content type, authentication, and response validations. Maintaining these configurations in every test class quickly becomes difficult.

Keeping the Request and Response Specifications in a separate class provides a centralized location for managing common settings. If the API URL changes or additional configurations need to be added, only a single file needs to be updated. This approach also improves readability because the test methods contain only the business validations relevant to the API being tested.

Using Request and Response Specifications Directly in the Test Class

While many automation projects prefer keeping specifications in a separate class, there are situations where creating specifications directly inside the test class makes sense. This approach is useful for smaller projects, proof-of-concept implementations, or when a test class requires its own configuration that is not shared with other tests.

In this approach, the Request and Response Specifications are created using the @BeforeClass annotation and are available only within the current test class.
Java
import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.equalTo;

import io.restassured.builder.RequestSpecBuilder;
import io.restassured.builder.ResponseSpecBuilder;
import io.restassured.filter.log.RequestLoggingFilter;
import io.restassured.filter.log.ResponseLoggingFilter;
import io.restassured.specification.RequestSpecification;
import io.restassured.specification.ResponseSpecification;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;

public class StringRelatedAssertionTests {

    private static ResponseSpecification responseSpecification;
    private static RequestSpecification requestSpecification;

    @BeforeClass
    public void setupSpecBuilder() {
        final RequestSpecBuilder requestSpecBuilder = new RequestSpecBuilder()
            .setBaseUri("https://api.restful-api.dev/objects")
            .addQueryParam("id", 3)
            .addFilter(new RequestLoggingFilter())
            .addFilter(new ResponseLoggingFilter());

        final ResponseSpecBuilder responseSpecBuilder = new ResponseSpecBuilder()
            .expectStatusCode(200);

        responseSpecification = responseSpecBuilder.build();
        requestSpecification = requestSpecBuilder.build();
    }

    @Test
    public void testStringAssertions() {
        given()
            .spec(requestSpecification)
            .get()
        .then()
            .spec(responseSpecification)
            .assertThat()
            .body("[0].name", equalTo("Apple iPhone 12 Pro Max"));
    }
}

In this example, the Request and Response Specifications are created once in the @BeforeClass method and stored in static variables. The Request Specification contains common request details such as the base URI, query parameters, and logging filters, while the Response Specification defines the expected status code.

During test execution, the Request Specification is applied using the spec(requestSpecification) method before sending the request. After the response is received, the Response Specification is applied using spec(responseSpecification) to validate the common response expectations before performing additional assertions on the response body.

Keeping the specifications and test logic within the same class makes the example easy to follow, as both the setup and test execution are located in a single file. However, as the test suite grows and multiple test classes require the same configurations, duplicating specifications across classes can become difficult to maintain.
In such situations, moving the common Request and Response Specifications to a separate class provides better reusability and reduces code duplication. For smaller projects or learning purposes, defining the specifications directly within the test class remains a simple and effective approach.

Summary

REST Assured specifications help create cleaner and more maintainable API automation tests. A good practice is to define the Request and Response Specifications in a separate class and initialize them using the @BeforeClass annotation. The Request Specification manages settings such as the base URI, content type, and logging, while the Response Specification handles common response validations.

By centralizing these configurations, test classes become shorter, easier to read, and simpler to maintain. For API automation frameworks built with REST Assured and TestNG, this pattern provides a clean foundation that scales well as the number of tests increases.
Selective Deployment in Azure Data Factory: A Practical Blueprint for Safer CI/CD

By Sauhard Bhatt
Picture this: two features are being developed in parallel.

  • One has already been tested in lower environments but is still awaiting business approval
  • The other is fully validated and ready to go live

Naturally, you want to release the second feature to production. But you can't, because your deployment model forces you to release everything together. If you've worked with Azure Data Factory (ADF), this situation probably sounds familiar.

ADF is a cloud-based data integration service from Microsoft that helps you build and orchestrate data pipelines across systems. It works extremely well for managing data workflows, but when it comes to deployments at scale, things get tricky.

As our ADF usage grew across multiple teams and environments, we started running into a recurring problem:

  • We had control over development, but very little control over what actually got deployed
  • A simple pipeline fix could unintentionally introduce unrelated changes
  • Parallel feature development became harder to manage
  • Production releases became riskier than they needed to be

That's when we realized that the issue wasn't ADF itself; it was the deployment model we were relying on.

This article walks through how we addressed that challenge by implementing a selective deployment pattern, allowing us to promote only intended changes without impacting everything else.

The Real Problem: Parallel Feature Releases in ADF

Before diving into the solution, let's look at a scenario that frequently occurs in real-world teams.
What This Diagram Represents

The diagram shows two features progressing across environments.

Feature 100

  • Developed earlier and successfully deployed to Dev and Test
  • Currently in UAT (User Acceptance Testing)
  • Still awaiting business approval before production

Feature 200

  • Developed later and successfully promoted through Dev → Test → UAT
  • Fully validated and ready for production

Expected Behavior

At this stage, the expectation is straightforward: "Let's release Feature 200 to production." Feature 100 is still under testing, so it should remain in UAT.

What Actually Happens in ADF

Azure Data Factory follows a full-state deployment model. That means when you deploy, you are not deploying a feature; you are deploying the entire factory state. So when you attempt to release Feature 200:

  • Feature 100 gets included automatically
  • You cannot isolate Feature 200
  • You lose control over what reaches production

Why This Becomes a Real Problem

This isn't an edge case; it becomes a recurring pattern in larger environments. You'll encounter this when:

  • Multiple teams are working in parallel
  • Features move at different speeds
  • UAT cycles vary
  • Production fixes need to be released quickly

It becomes even more complex when:

  • Existing production pipelines are modified
  • Partial updates are required
  • Dependencies overlap across features

The Core Limitation: ADF promotes state, not intent. It does not differentiate between what is ready for production and what is still under testing.

Why We Had to Rethink Deployment

This limitation introduced real risks:

  • Accidental promotion of incomplete features
  • Delayed production releases
  • Increased coordination overhead
  • Higher chances of breaking stable pipelines

We needed a way to:

  • Promote only Feature 200
  • Keep Feature 100 in UAT
  • Avoid impacting unrelated artifacts
  • Reduce production risk

Architecture Overview

To address this challenge, we introduced a selective packaging layer between build and deployment.
Flow

Feature Branch → PR → Validate → Selective Packaging → ARM Export → Incremental Deploy → Trigger Control

Key Idea: Instead of exporting ARM templates from the full ADF repository, we export from a filtered staging folder containing only the required artifacts.

Understanding Default ADF Deployment Behavior

Before implementing selective deployment, it's important to understand how Azure Data Factory works by default. ADF follows a full-state deployment model.

How Default ADF Deployment Works

When you use ADF with Git integration:

  • Developers work in a collaboration branch (typically main)
  • Changes are committed and merged via pull requests
  • ADF provides a Publish button in the UI

When you click Publish, ADF generates ARM templates representing the entire factory state. These templates are stored in the adf_publish branch.

In modern setups, instead of clicking Publish manually, teams often use @microsoft/azure-data-factory-utilities (npm-based export). This allows pipelines to validate ADF resources and export ARM templates programmatically.
YAML
- name: Validate ADF resources
  run: |
    set -euo pipefail
    FACTORY_ID="/subscriptions/${{ env.SUBSCRIPTION_ID }}/resourceGroups/${{ env.RESOURCE_GROUP }}/providers/Microsoft.DataFactory/factories/${{ env.SOURCE_FACTORY_NAME }}"
    npm run build validate "${{ github.workspace }}" "$FACTORY_ID"

YAML
- name: Export ARM templates (CI publish)
  run: |
    set -euo pipefail
    FACTORY_ID="/subscriptions/${{ env.SUBSCRIPTION_ID }}/resourceGroups/${{ env.RESOURCE_GROUP }}/providers/Microsoft.DataFactory/factories/${{ env.DEV_FACTORY_NAME }}"
    npm run build export "${{ github.workspace }}" "$FACTORY_ID" "${{ env.ARM_OUTPUT_DIR }}"

Whether you click Publish manually or use npm export in CI/CD, the outcome is the same:

  • Full factory deployment
  • No control over individual features
  • All changes get bundled together

Selective Deployment Layer (Core Design)

We addressed this by introducing a workflow driven by a manifest that defines the deployment scope, together with a program that identifies all the ADF dependencies required by each manifest file. As a developer, I can now control which release is promoted to production without worrying about releasing other features that are not ready.

The manifest controls which pipelines to deploy and which optional categories to include. Below is an example of a manifest file.

JSON
{
  "pipelines": ["pl_ingest_population_selective"],
  "includeTriggers": false,
  "includeIntegrationRuntimes": false,
  "includeAllGlobalParameters": true,
  "includeLinkedServices": true,
  "validateLinkedServicesExist": true,
  "includeManagedVirtualNetwork": false,
  "includeManagedPrivateEndpoints": false
}

Workflow Explanation

Let's walk through the core of the selective deployment workflow. I work on my feature branch for the release directly in ADF Studio. Since ADF Studio is integrated with Git, my development changes are saved to my branch. Here are the steps I can take to promote my change to a higher environment.
1) Validation of ADF on PR

This is an early validation step and a guardrail: if the PR check fails, it is because objects are invalid or misaligned. It is equivalent to the "Validate all" button in the ADF UI.

Trigger: Pull requests targeting the branch selective_deployment.

Purpose: Validate that the ADF JSON in the PR is valid in the context of the target factory.

Main steps:

  • Checkout
  • Set up Node.js 20
  • npm install
  • Azure login using OIDC (azure/login@v2)
  • Validate with ADF Utilities:

YAML
FACTORY_ID="/subscriptions/${AZURE_SUBSCRIPTION_ID}/resourceGroups/${AZURE_RESOURCE_GROUP}/providers/Microsoft.DataFactory/factories/${DEV_FACTORY_NAME}"
npm run build validate "$GITHUB_WORKSPACE" "$FACTORY_ID"

2) Release build + selective deploy to DEV

Workflow file: adf-release-build-selective-deploy.yml

Triggers:

  • Push to selective_deployment
  • Manual run (workflow_dispatch) with optional manifest input
  • Default manifest: deploy/manifests/release.json

This workflow has two jobs.

Job A: adf-build (staging + export + sanitize + artifacts)

  • Checkout (full history)
  • Azure login using OIDC
  • Set up Node.js 20
  • Install build dependencies inside build/ (npm install in build)
  • Stage the selective subset with python scripts/select_adf_subset.py <manifest>. An excerpt is shown here; for the complete script, refer to the author's GitHub repository.

Python
import json
import re
import shutil
import sys
from pathlib import Path
from typing import Dict, Set, Tuple, List
from collections import defaultdict

# Your repo layout has pipeline/, dataset/, linkedService/ at ROOT.
REPO_ROOT = Path(".")
STAGE_ROOT = Path("build/adf_subset")

RESOURCE_DIRS = {
    "pipeline": REPO_ROOT / "pipeline",
    "dataset": REPO_ROOT / "dataset",
    "linkedService": REPO_ROOT / "linkedService",
    "dataflow": REPO_ROOT / "dataflow",
    "trigger": REPO_ROOT / "trigger",
    "integrationRuntime": REPO_ROOT / "integrationRuntime",
    "credential": REPO_ROOT / "credential",
    "managedVirtualNetwork": REPO_ROOT / "managedVirtualNetwork",
}

# Copy these if present so ADF utilities behave the same on staged subset.
ROOT_FILES_TO_COPY = [
    "publish_config.json",
    "arm-template-parameters-definition.json",
    "arm_template_parameters-definition.json",
    "package.json",
    "package-lock.json",
]

    Produces:
      - build/adf_subset/ (staged tree)
      - build/adf_subset_report.json (dependency report)
    The workflow logs show the output of the staging step and a debug listing of what select_adf_subset.py generated.

  • Export ARM templates from the staged subset via ADF Utilities: npm --prefix build run build -- export "adf_subset" "$FACTORY_ID" "ArmTemplate"

    Produces:
      - build/ArmTemplate/ARMTemplateForFactory.json
      - build/ArmTemplate/ARMTemplateParametersForFactory.json

  • Strip infra-owned resources with scripts/strip_arm_resources.py to produce a safe template: build/ArmTemplate/ARMTemplateForFactory.safe.json

⚠️ Note on Infrastructure Components (refer to the "Future Work and Next Steps" section for follow-up topics in this series): The step above intentionally strips infrastructure-dependent components from the generated subset to avoid overwriting existing shared resources such as linked services.
This implementation focuses on developer-owned artifacts (pipelines, datasets, and triggers) and assumes that infrastructure components, such as integration runtimes, managed private endpoints, and linked services, are pre-provisioned and managed outside of this deployment workflow.

  • Upload artifacts:
      - ARM templates (adf-arm)
      - Metadata (adf-release-meta)
      - Subset report (adf-subset-report)

Job B: deploy_dev (deploy safe template)

  • Download ARM artifact
  • Azure login using OIDC
  • Ensure the az Data Factory extension is installed
  • Validate that the JSON files exist and parse
  • Deploy via azure/arm-deploy@v2 (Incremental mode) to the DEV resource group/factory:
      - Template: ARMTemplateForFactory.safe.json
      - Parameters: ARMTemplateParametersForFactory.json + factoryName=<DEV_FACTORY_NAME>

Lessons Learned

Setting up selective deployment in ADF was more than a technical task. It made us rethink our approach to deployments, ownership, and CI/CD design. Here are the main things we learned.

1. The Problem Is Not Tooling; It's Deployment Granularity

At first, we thought the limitation came from the tools we used, like UI publish or npm export. However, both methods yielded the same result: full factory templates. The real problem was that we couldn't control the scope of deployments, not how the templates were made.

2. Dependency Awareness Is Critical

Selective deployment only works when every dependency is found and included. We learned that:

  • Pipelines often reference multiple datasets and linked services
  • Missing even one dependency results in deployment failure
  • You must automate dependency discovery

3. "Incremental" Is Often Misunderstood

Incremental deployment is important, but it doesn't work like a patch. It reapplies the full configuration for all included resources. This means:

  • Your generated templates need to be complete for all the artifacts you include
  • If you use partial definitions, deployments can fail

4. Separation of Concerns Is Key

Not all ADF artifacts are the same.
We began to separate them into different groups:

  • Application-owned artifacts: pipelines, datasets, triggers
  • Infrastructure-owned artifacts: linked services, managed virtual networks, managed private endpoints, and integration runtimes, among others

This separation proved crucial for safe, scalable deployments.

5. Selective Deployment Adds Complexity, But It's Worth It

It's true that implementing this approach brings in additional scripts, manifest management, and CI/CD complexity. But in exchange, we gained precise control over releases, reduced production risk, and faster hotfix deployments.

Future Work and Next Steps

While selective deployment solved a major gap in ADF CI/CD, it also opened up new areas for improvement and standardization.

1. Defining Infrastructure vs Application Ownership

One of the biggest follow-up areas is clearly defining ownership boundaries. In our experience:

  • Application teams should own pipelines, datasets, and triggers
  • Platform or infrastructure teams should own linked services, managed virtual networks, and managed private endpoints, among other things

Future work can focus on:

  • Enforcing this separation in CI/CD
  • Preventing accidental deployment of infrastructure components
  • Integrating Terraform or platform pipelines for infrastructure provisioning

2. Governance Around Linked Services

Linked services are often shared across multiple pipelines and teams. Future improvements include:

  • Centralizing linked service management
  • Using Key Vault and Managed Identity consistently
  • Preventing direct modifications through application pipelines
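The dependency discovery called out in lesson 2 amounts to a graph traversal: start from the pipelines named in the manifest and follow references until nothing new turns up. The sketch below illustrates that idea in Java. It is not the author's select_adf_subset.py: it assumes the references have already been extracted from the ADF JSON into a map, and the class name `DependencyClosure`, the "type/name" key format, and every resource name other than the manifest pipeline are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Computes which resources must be staged for a set of root pipelines (illustrative sketch). */
final class DependencyClosure {

    private DependencyClosure() {
    }

    /**
     * Walks the reference graph breadth-first from the root resources.
     *
     * @param roots      resource keys taken from the manifest, e.g. "pipeline/pl_a"
     * @param references resource key to the keys it references directly;
     *                   every resource present in the repo has an entry
     * @return all reachable resource keys, roots first, in discovery order
     * @throws IllegalStateException if a root or referenced resource has no entry
     */
    static Set<String> resolve(List<String> roots, Map<String, List<String>> references) {
        Set<String> selected = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(roots);
        while (!queue.isEmpty()) {
            String current = queue.removeFirst();
            if (!references.containsKey(current)) {
                // Fail before export rather than at deployment time.
                throw new IllegalStateException("Missing dependency: " + current);
            }
            if (!selected.add(current)) {
                continue; // already visited; this also guards against reference cycles
            }
            queue.addAll(references.get(current));
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical repo contents: resource key -> direct references.
        Map<String, List<String>> references = Map.of(
            "pipeline/pl_ingest_population_selective", List.of("dataset/ds_population_raw"),
            "pipeline/pl_feature_100", List.of("dataset/ds_feature_100"),
            "dataset/ds_population_raw", List.of("linkedService/ls_datalake"),
            "dataset/ds_feature_100", List.of("linkedService/ls_datalake"),
            "linkedService/ls_datalake", List.of());

        Set<String> subset = resolve(
            List.of("pipeline/pl_ingest_population_selective"), references);
        System.out.println(subset);
    }
}
```

In this toy graph, the pipeline and dataset belonging to the unapproved feature are never reached, so they would not be staged. Throwing on a missing entry mirrors the lesson that a single absent dependency fails the deployment; surfacing it during staging is cheaper than discovering it during the ARM deploy.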
Implementing Asynchronous Communication Between Microservices Using Kafka and Spring Boot
By Mallikharjuna Manepalli
I Built a VS Code Extension to Debug Azure AI Foundry Agents Without Leaving My Editor
By Jubin Abhishek Soni, DZone Core
Testing Strategies for Web Development Code Generated by LLMs
By Sandesh Basrur
On-Device Debugging and JUnit 5

This is the first follow-up to Friday's release post, and it covers the two changes from this release that affect how you iterate on a Codename One app rather than what the app itself does: on-device debugging that treats Java as Java on a real iPhone or a real Android device, and standard JUnit 5 against the JavaSE simulator. The first is the one we have been wanting for a long time and the one that takes the most explaining, so most of the post is about it.

On-Device Debugging That Treats Java as Java

Codename One has always supported on-device debugging in the strict technical sense. You could attach Xcode to a .ipa, you could attach Android Studio to a running APK, you could read the native call stack, and you could step through Objective-C or the C that ParparVM emits. What you could not do was set a breakpoint in MyForm.java, hit it on a real iPhone, and inspect a Java field on a Java object as a Java object. You also could not debug an iOS app without a Mac in the loop somewhere, because the only debugger that understood the binary was Xcode. The translation step between the Java you wrote and the C that ParparVM produces left no way back across the gap on the device.

PR #4999 (iOS) and PR #5012 (Android) close that gap. As of this week, any JDWP-speaking debugger (IntelliJ IDEA, jdb, VS Code's Java Debugger, Eclipse, NetBeans) can attach to a Codename One app and treat the running process as a JVM.

Supported targets:

iOS

  • The iOS Simulator (requires a Mac, because the iOS Simulator only runs on a Mac)
  • A real iPhone reached over Wi-Fi from the developer machine on the same network

You do not need a local Mac to debug on a real iPhone. The Codename One build cloud runs the iOS build for you and produces a signed .ipa; install it on your iPhone the usual way (TestFlight, ad-hoc, or the standard Build Cloud install link), and the JDWP attach over Wi-Fi works from a Linux or Windows IDE just as well as from a Mac.
The Mac is only required for the local Xcode build path and for running the iOS Simulator.

Android

  • The Android emulator
  • A real Android phone over USB
  • A real Android phone over wireless adb

The Android attach uses standard adb, so you need the Android SDK platform tools installed on the developer machine. Those are available on macOS, Linux, and Windows, so any of the three is fine for Android debugging.

What It Looks Like

A breakpoint inside an iOS app, hit on the iOS Simulator next to IntelliJ IDEA, shows up in the same Debug tool window you use for any other Java project. The Frames panel on the left has the full Java call stack. The Variables panel shows this and the locals as Java values, with the same drill-down you would get on a regular JVM. The simulator on the right is the real iOS app, paused at the breakpoint, waiting for the next step.

How the Pieces Fit Together

On iOS, the IDE never talks to the device directly. The CN1 Debug Proxy is a small Java process you run on your developer machine. It binds two TCP ports: one for the iOS app to dial into using the CN1 wire protocol, and one that speaks standard JDWP for the IDE. The IDE sees a normal remote JVM. The iOS app sees a debug proxy. The proxy translates between the two and walks the ParparVM struct layout so Java fields, method calls, and values round-trip cleanly in both directions.

On Android, the proxy is unnecessary. Dalvik/ART implements JDWP natively, so IntelliJ attaches directly to the device through adb's built-in JDWP forwarder. The Maven plugin's new cn1:android-on-device-debugging goal does the adb orchestration and the port forwarding for you.

One capability difference between the two platforms is worth knowing up front. On Android, a native interface's Impl class is regular Java, so the JDWP attach steps through it the same way it steps through any other class in your project. On iOS the Impl is Objective-C, which JDWP does not speak, so you cannot step through it from the IDE.
You can still step through the Codename One framework code and your own Java up to and through the native-interface call, and you can inspect the value the call returns; the body of the Objective-C method is the only thing that is opaque from the JDWP side. Attach Xcode in parallel if you need to step through the Objective-C as well.

Tutorial: IntelliJ + iOS

The Codename One archetype now generates two run configurations under an On-Device Debug folder in the IntelliJ run-config dropdown: CN1 Debug Proxy and CN1 Attach iOS. The tutorial below assumes a project generated from the Initializr recently enough to have those. If you have an older project, generate a new project with the Initializr and copy over the .idea directory and the Maven pom.xml files.

1. Enable the Build Hints

Open common/codenameone_settings.properties and uncomment the four lines the archetype generated:

Properties files
ios.onDeviceDebug=true
ios.onDeviceDebug.proxyHost=127.0.0.1
ios.onDeviceDebug.proxyPort=55333
ios.onDeviceDebug.waitForAttach=true

ios.onDeviceDebug=true flips the iOS build into the instrumented variant. The next two hints configure the proxy connection. The fourth hint, ios.onDeviceDebug.waitForAttach=true, is the block-on-load option, and we recommend leaving it on. With it enabled, the iOS app shows a "Waiting for debugger" overlay at launch and does not progress past Display.init until the proxy issues its first resume.

The recommendation is mostly about making the on-device-debug variant visible. Without the overlay it is easy to launch an on-device-debug build expecting the debugger to attach and not realize it is silently waiting for a proxy that is not running. It is also easy to mistake an on-device-debug build for a regular build and then be surprised when it does not perform as smoothly as the release variant. The overlay rules out both of those.

For a physical iPhone the proxyHost value should be the laptop's LAN IP (run ifconfig | grep "inet " to find it) rather than 127.0.0.1.
The iOS Simulator can always use 127.0.0.1. 2. Build the iOS App Either path works: Local Xcode build (mvn cn1:buildIosXcodeProject) and then run from Xcode.Cloud build for a real device (mvn cn1:buildIosOnDeviceDebug) and install the resulting .ipa. Both produce an iOS binary instrumented for on-device debugging because the build hint is set. 3. Start the Proxy In IntelliJ, pick CN1 Debug Proxy from the run-config dropdown and click the green Run button (not the bug icon; Debug on this config would attach IntelliJ to the proxy itself, which is not what you want). The Run tool window shows: Plain Text On-device-debug proxy starting: symbols : .../cn1-symbols.txt device : listening on tcp://0.0.0.0:55333 jdwp : listening on tcp://0.0.0.0:8000 [device] listening on port 55333 for ParparVM app to dial in When the [jdwp] line appears, the proxy is ready. 4. Attach the Debugger Switch the run-config dropdown to CN1 Attach iOS and click the Debug button. IntelliJ connects to localhost:8000 and opens its standard Debug tool window. You can now set breakpoints anywhere in your Java code or in the framework. 5. Launch the App Launch the iOS app under the iOS Simulator (from Xcode) or on the tethered device. With waitForAttach=true it pauses at the "Waiting for debugger" overlay until the proxy issues its first resume. Hit Resume on the IntelliJ Debug toolbar; the app proceeds, your breakpoints fire as the app exercises them. The proxy's Run window is also your device console. Anything the app writes to System.out, Log.p, printf, or NSLog from native code is forwarded to the proxy and printed in the CN1 Debug Proxy Run window with a [device] prefix. This is genuinely useful and is one fewer thing you need Xcode for. The caveat is that the forwarding starts when the proxy connection is established, so output written during the very first millisecond of process launch (before Display.init) is not always captured. 
If you need every byte from t=0, attach Xcode's console for that specific run.

Tutorial: IntelliJ + Android

Android is simpler because the proxy is not needed. The archetype generates two run configurations under the same On-Device Debug folder: CN1 Android On-Device Debug (Maven, builds and installs the APK and forwards JDWP) and CN1 Attach Android (Remote JVM Debug at localhost:5005).

1. Enable the Build Hint

In common/codenameone_settings.properties:

Properties files
android.onDeviceDebug=true

This single hint flips the manifest to debuggable="true" and turns R8 / ProGuard off for this build. Release builds without the hint are unaffected.

2. Run CN1 Android On-Device Debug

This run configuration picks up the hint, builds the APK, installs it on the connected device or emulator, sets the debug-app for wait-for-attach, launches the Activity, forwards JDWP to localhost:5005, and streams logcat --pid=<pid> into the Run window with a [device] prefix. For wireless adb, pass -Dcn1.android.onDeviceDebug.wireless=<ip:port> and the goal will adb connect before installing. Both the Android 11+ adb pair flow and the legacy adb tcpip flow work.

3. Attach the Debugger

Switch to CN1 Attach Android and click Debug. IntelliJ connects to localhost:5005. Set breakpoints anywhere; they fire when exercised. Source resolution covers both the codenameone-core and codenameone-android sources jars, so breakpoints inside the framework or inside the Android port resolve to the right files.

On Android, native interfaces are themselves Java, so a breakpoint inside the Impl class of your own native interface fires just like a breakpoint anywhere else in your code; you can step through the implementation, inspect locals, and evaluate expressions the same way.

The dev guide has the full reference, including the wireless-pairing flows, the VS Code and Eclipse equivalents, and a troubleshooting section: iOS on-device debugging and Android on-device debugging.
When to Use It (and When Not To) For most bugs, the JavaSE simulator is still, by a large margin, the fastest loop. Reach for on-device debugging when the bug is platform-specific: ParparVM-specific threading, an iOS-only layout glitch under the modern native theme, a real-radio Bluetooth interaction, a Touch ID gate, an Android-only manifest interaction, anything that only reproduces under iOS background memory pressure. The kind of bug that previously sent you reaching for Log.p and a rebuild loop. That bug now has a debugger pointed at it. JUnit 5 Against the Simulator The other change in this release is the new JUnit 5 integration in the JavaSE port (PR #5032). To be clear about what this is: it is standard JUnit 5. There is no fork of JUnit in com.codename1.testing.junit. That package holds a small set of annotations and a CodenameOneExtension that plugs into the regular JUnit Jupiter lifecycle. You write @Test methods using org.junit.jupiter.api.Test, you assert with org.junit.jupiter.api.Assertions, and your IDE's native test runner picks them up the way it does on any other Java project. Why a separate integration at all? The legacy com.codename1.testing.AbstractTest framework, driven by the cn1:test Maven goal, still exists and is still the only way to run tests on a real iOS or Android device (JUnit Jupiter is not available on ParparVM). The trade-off is that AbstractTest tests have to compile under the Codename One device subset, with no reflection, no java.net.http, no java.nio.file, no Mockito, no AssertJ, no assertThrows. JUnit-style tests run only on the JavaSE simulator JVM, but the JVM is a regular JVM, so reflection, Mockito, AssertJ, and parameterized tests are all available. Both styles coexist in the same project under common/src/test/java. You pick per test class. The runners discover disjoint sets (cn1:test looks for UnitTest implementers; Surefire looks for @Test methods), so a mvn install runs both passes in the same phase without overlap. 
A Minimal Test Tests live in common/src/test/java. The shape most apps want is one that boots the project's app class through the same init / start sequence the simulator uses, then asserts against the form the app actually opens: Java package com.example.myapp; import com.codename1.testing.junit.CodenameOneTest; import com.codename1.testing.junit.RunOnEdt; import com.codename1.ui.CN; import com.codename1.ui.Display; import com.codename1.ui.Form; import org.junit.jupiter.api.Test; import static org.junit.jupiter.api.Assertions.assertEquals; import static org.junit.jupiter.api.Assertions.assertTrue; @CodenameOneTest class GreetingFormTest { @Test @RunOnEdt void formShowsExpectedTitle() { MyAppName app = new MyAppName(); app.init(null); app.start(); assertEquals("Hi World", Display.getInstance().getCurrent().getTitle()); assertTrue(CN.isEdt(), "@RunOnEdt method runs on the Codename One EDT"); } } That is more useful than constructing a Form directly in the test because it exercises the same startup path the simulator runs. The assertions check the form your app opens, not a form the test wrote. The natural way to run it is from the IntelliJ gutter. Click the green icon next to the class declaration: The results land in the standard Run tool window: Click the green icon next to a specific @Test method to run just that method. The same flow works in VS Code's Test Explorer and in Eclipse's JUnit view. If you prefer the command line: Shell mvn -Ptest test # run the JUnit suite mvn -Ptest test -Dtest=GreetingFormTest # one class mvn -Ptest test -Dtest=GreetingFormTest#formShowsExpectedTitle @CodenameOneTest is the class-level entry point. It wires the simulator extension into the JUnit Jupiter lifecycle, boots Display.init(null) once per JVM (idempotent, so subsequent classes share the same Display), and skips the class with a TestAbortedException if the JVM is genuinely headless (so CI runners that have no display do not poison the rest of the run). 
@RunOnEdt dispatches the test body through CN.callSerially, which is what you want any time the body touches UI state. It rethrows the body's exceptions on the JUnit thread so the stack trace stays clickable in the IDE. Place it on the method for one test, or on the class to apply it to every test.

A Couple More Common Cases

A test that exercises a plain validator, with no UI involved at all:

Java
@CodenameOneTest
class EmailValidatorTest {
    @Test
    void rejectsEmptyString() {
        assertFalse(new EmailValidator().isValid(""));
    }

    @Test
    void acceptsCommonAddress() {
        assertTrue(new EmailValidator().isValid("[email protected]"));
    }
}

This is the "pure model code" shape. No @RunOnEdt, no UI, runs on the JUnit worker thread, fast.

A test of a form under a specific visual configuration:

Java
@CodenameOneTest
class GreetingFormVisualTest {
    @Test
    @RunOnEdt
    @DarkMode
    @LargerText(scale = 1.6f)
    void titleStillFitsInDarkModeAtAccessibilityScale() {
        new GreetingForm().show();
        Form current = Display.getInstance().getCurrent();
        assertEquals("Hello", current.getTitle());
        assertTrue(current.getPreferredW() <= Display.getInstance().getDisplayWidth());
    }
}

The visual-config annotations (@Theme, @DarkMode, @LargerText, @Orientation, @RTL) apply on the EDT in one batch, followed by a single theme refresh, so the test body sees the simulator in the exact configuration you asked for without flicker.

A test that injects a custom property for the duration of one method:

Java
@Test
@RunOnEdt
@SimulatorProperty(name = "feature.flag", value = "on")
void newCodePathRunsWhenFlagIsOn() {
    // Display.getProperty("feature.flag", "off") returns "on" here
    runFeature();
    assertEquals("expected", Display.getInstance().getCurrent().getTitle());
}

Class-level @SimulatorProperty applies to every method in the class. Method-level overrides class-level. Use the container @SimulatorProperties for more than one (the package source level rules out @Repeatable).
The full reference, including the dependency-block XML for common/pom.xml and javase/pom.xml and the @Theme / @Orientation / @RTL details, is at Testing with JUnit 5 in the developer guide.

Wrapping Up

That is the workflow half of this release. Tomorrow's post covers the new platform APIs that moved into the core this week: AI and OAuth/OIDC are the headline pieces, with wifi/connectivity and a few smaller items alongside them. Back to the weekly index.

By Shai Almog DZone Core CORE
Testing Is Not About Finding Bugs

One of the most common statements we hear in the software industry is: "The job of a tester is to find bugs." While bug detection is undoubtedly an important part of testing, reducing testing to only finding bugs is one of the biggest misconceptions about the profession. Testing is a systematic process of evaluating software to understand its quality, identify risks, validate requirements, and provide confidence for release decisions. Bugs are simply one outcome of that process. If testing were only about finding bugs, then the tester who reported the highest number of defects would automatically be considered the best tester. However, most experienced engineering teams know that this is far from reality. The Bug Count Trap Many organizations unknowingly create a culture where testing success is measured by the number of defects found. This often leads testers to focus primarily on breaking the system. Breaking the system is important. Exploratory testing, negative testing, boundary testing, and resilience testing all have their place. However, there is a danger when breaking the system becomes the primary objective. A tester may discover multiple corner-case crashes and still miss a much larger problem: the product does not meet customer expectations. In such situations, the system may survive unusual failure scenarios while failing to deliver value in everyday usage. The customer rarely cares about how many defects were found during testing. The customer cares whether the product solves their problem correctly and reliably. Testing Is About Understanding Quality Quality is much broader than defect detection. A tester should continuously ask questions such as: Does the product meet the stated requirements?Does it satisfy the implicit expectations of the customer?Is it usable?Is it reliable?Is it secure?What are the biggest risks before release?What could go wrong in production?What assumptions are we making? 
These questions provide significantly more value than simply asking, "Can I make this crash?" The best testers are often the people who understand the product, business domain, customer workflows, and operational risks — not necessarily the people who report the most defects. The Missing Piece: Customer Expectations One area where testing frequently falls short is understanding customer expectations. Requirements documents describe what the system should do. However, customers often expect much more than what is explicitly written. For example: A login page may satisfy every documented requirement, but customers still expect: Fast response timesClear error messagesSecure handling of credentialsConsistent behavior across browsers and devices These expectations are rarely documented in detail because they are considered obvious. A tester who focuses only on written requirements may miss these areas completely. A tester who understands customer expectations will naturally test for them. This is where true testing begins. Process Alone Does Not Create Good Testers The industry often swings between two extremes. The first extreme is believing that testing is simply about finding bugs. The second extreme is believing that following a process automatically guarantees quality. Neither is true. Many organizations have adopted Agile practices, ceremonies, templates, and checklists. While these practices provide structure, they cannot replace critical thinking. A tester can execute every step in a process and still miss major quality risks. Good testing requires judgment. It requires curiosity. It requires asking uncomfortable questions. Most importantly, it requires understanding the product and the customer. Testing Is More About Mindset Than Techniques Testing techniques are valuable. Boundary value analysis, equivalence partitioning, decision tables, pairwise testing, state transition testing, and exploratory testing all help improve coverage. 
The good news is that techniques can be learned from books, courses, mentors, and increasingly from AI. What is much harder to teach is mindset. A strong testing mindset involves: CuriosityCritical thinkingRisk awarenessCustomer empathyProduct understandingThe ability to challenge assumptions In fact, if you observe experienced testers, you will often notice them applying testing techniques naturally without consciously referring to their textbook names. The mindset drives the technique — not the other way around. The AI Shift Changes the Game The rise of AI is forcing the testing profession to re-evaluate where its real value lies. Today, AI can already: Generate test casesSuggest edge casesCreate automation scriptsAnalyze requirementsReview code changes As these capabilities continue to improve, the differentiating factor for testers will not be their ability to produce more test cases. The differentiating factor will be their understanding of: The productThe customerBusiness risksReal-world usage patternsHidden assumptions AI can generate tests. It cannot fully understand organizational context, customer frustrations, business priorities, or the subtle quality concerns that experienced testers identify through years of product exposure. The testers who develop these skills will become significantly more valuable. The testers who rely solely on mechanical execution may find themselves competing directly with increasingly capable AI systems. Building the Right Testing Culture Creating effective testers is not only an individual responsibility. Organizations, managers, and technical leaders all have a role to play. Instead of measuring success solely through defect counts, teams should encourage: Product understandingCustomer empathyRisk analysisExploratory thinkingCross-functional collaborationContinuous learning The goal should be to develop testers who understand why they are testing, not just how they are testing. 
Final Thoughts Testing is not about finding bugs. Finding bugs is important, but it is only one outcome of effective testing. The real purpose of testing is to provide information about quality, uncover risks, validate expectations, and help teams make informed release decisions. A tester with the right mindset may not always report the highest number of defects. However, they will consistently help the team build better products, reduce risk, and deliver greater value to customers. And ultimately, that is what testing is really about. "Bugs are the byproduct of testing. Confidence in quality is the goal."

By Abhinav Garg
Building a RAG-Powered Bug Triage Agent With AWS Bedrock and OpenSearch k-NN

Bug triage on a graphics engineering team is one of those tasks nobody really wants to own. A new crash report comes in, and somebody has to work out whether it looks like a known issue, what the stack trace points at, which subsystem the affected code lives in, and which sub-team should pick it up. The answers exist in the issue tracker, the source repo, and the architecture docs, but pulling them together by hand takes time. And the engineers best at it are the ones you least want spending hours on it.

On our team, the archive of resolved bugs had grown to over 1,100 issues. That is a real corpus. It contains the answer to a lot of incoming questions, but only if you can find the right three or four entries quickly. The agent described here does that lookup automatically, combines it with crash log parsing and source code search, and produces a root cause analysis with a confidence score. Triage that used to take hours now takes minutes.

This article is about the architecture choices: why AWS Bedrock with Claude, why OpenSearch with HNSW indexing, why DynamoDB for workflow state, and why ECS Fargate. None of these choices is unique. The reasoning behind them is what's portable.

What the Agent Actually Has to Do

Before the architecture, it's worth being concrete about the work. When a bug report arrives, the agent produces an analysis built on five signals:

Historical pattern match against the knowledge base of resolved issues.
Source code match against the repositories the trace points into.
Crash stack analysis on the trace itself.
Log evidence from whatever logs were attached or linkable.
Fix ownership, derived from who has historically fixed bugs in the affected components.

Each signal contributes to a final confidence score. The combination matters because no single signal is reliable on its own. A stack trace can match a bug that was fixed three releases ago, a source-code hit can be unrelated, and ownership data can be stale.
A useful triage answer leans on multiple signals together. That is the work. The architecture exists to support it reliably, repeatedly, and without baking in assumptions that will hurt later. Why RAG, and Why These Pieces The obvious wrong move is to skip retrieval and pass the whole corpus to the model. Context windows aren't the bottleneck people think they are. Even when they're large, signal-to-noise gets bad fast, and cost and latency scale with input size. For any given bug, the relevant slice is small: a few prior tickets, a couple of source files, maybe one architecture doc. Retrieval-augmented generation (RAG) is the right shape because the retrieval layer's job is precisely to find that slice. OpenSearch With HNSW Indexing The knowledge base lives in OpenSearch with vector search over a k-NN HNSW index. HNSW (Hierarchical Navigable Small World) suits corpora in the low thousands to low millions of documents. Query time stays low, and recall stays high without the tuning effort IVF-based indexes demand at smaller scales. OpenSearch was chosen over a dedicated vector database for operational reasons. It runs in the same AWS environment as the rest of the stack, supports keyword and vector search in the same index when you need hybrid retrieval, and doesn't add a new vendor to the diagram. For a team-internal tool, the integration cost of a separate vector DB outweighs the marginal performance gain. Titan Embeddings Embeddings are generated with Amazon Titan. The main reason: the data (bug reports, stack traces, code snippets) never has to leave AWS. That removes a class of compliance questions that come up the moment you start sending source code or internal tickets to an external embedding API. Titan handles technical text well enough for this corpus, and it shares IAM, quotas, and billing with everything else. Claude on Bedrock as the Reasoning Model The reasoning step takes the retrieved context and the parsed crash log and produces the actual analysis. 
It runs on Claude through Bedrock. Two properties matter here. First, Claude handles long, messy, structured input well: stack traces aren't clean prose, and the surrounding context is a mix of code, logs, and ticket descriptions. Second, it expresses uncertainty rather than picking a confident-sounding wrong answer. For a system whose output a human engineer is going to read and either trust or push back on, that calibration matters more than fluency. The Five-Signal Confidence Score The most consequential part of the system isn't the model call. It's the scoring layer that wraps it. The agent doesn't just say "this looks like a duplicate of bug X." It produces a confidence score, and that score is what triagers use to decide whether to accept the suggestion or dig in themselves. The score is a weighted combination of the five signals listed earlier. Each contributes a sub-score; the weights reflect how predictive each signal has been, in this team's experience, of a correct triage outcome. The interesting design choice is that the weights are not static. Real bug reports don't always include all five signals. Some arrive without attached logs. Some point at code with no clear ownership history. With static weights, missing signals would drag the final score down even when the available signals were strongly aligned. The agent redistributes the weight of any unavailable signal across the available ones, normalized to sum to one. The conceptual shape: Python # Conceptual sketch of dynamic weight adjustment BASE_WEIGHTS = { "historical_match": w1, "source_code_match": w2, "crash_stack": w3, "log_evidence": w4, "fix_ownership": w5, } def adjusted_weights(available_signals): active = {k: v for k, v in BASE_WEIGHTS.items() if k in available_signals} total = sum(active.values()) return {k: v / total for k, v in active.items()} This is a small piece of code that does a disproportionate amount of the work of making the agent's output trustworthy. 
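To make the redistribution concrete, here is a minimal runnable version of the sketch above. The numeric weights are placeholders invented for illustration (the article does not state the team's real values), and the `confidence` helper and the 0.8 sub-scores are likewise hypothetical.

```python
# Hypothetical base weights: the article leaves w1..w5 unspecified,
# so these numbers are placeholders chosen only to sum to 1.0.
BASE_WEIGHTS = {
    "historical_match": 0.30,
    "source_code_match": 0.25,
    "crash_stack": 0.20,
    "log_evidence": 0.15,
    "fix_ownership": 0.10,
}


def adjusted_weights(available_signals):
    """Redistribute the weight of missing signals across the available ones."""
    active = {k: v for k, v in BASE_WEIGHTS.items() if k in available_signals}
    total = sum(active.values())
    if total == 0:
        raise ValueError("at least one known signal is required")
    return {k: v / total for k, v in active.items()}


def confidence(sub_scores):
    """Weighted sum of the sub-scores (each in [0, 1]) that are present."""
    weights = adjusted_weights(sub_scores.keys())
    return sum(weights[name] * sub_scores[name] for name in weights)


# Same evidence strength on every signal, with and without attached logs.
full = {name: 0.8 for name in BASE_WEIGHTS}
no_logs = {name: 0.8 for name in BASE_WEIGHTS if name != "log_evidence"}

print(round(confidence(full), 3))     # 0.8
print(round(confidence(no_logs), 3))  # 0.8
```

With static weights, the report without logs would score 0.85 × 0.8 = 0.68 under these placeholder numbers, even though every signal that is present is just as strong as before. Renormalizing keeps the two cases comparable.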
A given confidence score should mean roughly the same thing whether the bug arrived with logs or without. DynamoDB for Workflow State A triage run is not a single API call. The agent parses the report, retrieves embeddings, runs vector search, fetches matched documents, pulls source code context, calls the reasoning model, computes the score, and writes results back. Each step can fail or be slow independently. Workflow state for each in-flight triage lives in DynamoDB. The schema is intentionally simple: a triage ID as the partition key, a status field, and the accumulated context. Two reasons it's external rather than in-process memory. First, recovery. If the model call fails or times out, the workflow should resume without redoing the embedding and retrieval work. Token costs add up otherwise. Second, observability. The Flask dashboard the team uses to monitor triage operations reads from this same DynamoDB table. That includes real-time status, filterable history, analytics, and the routing view for issues that don't belong to this team. There is no separate event log to maintain. Workflow state is the source of truth, and the dashboard is a view onto it. ECS Fargate for Orchestration The triage workflow runs on ECS Fargate. The choice is shaped by what the workflow looks like: a sequence of calls to external services (Bedrock, OpenSearch, the issue tracker), with the long pole being model latency. Not CPU-heavy, not bursty. Incoming bugs arrive at a steady rate. Fargate handles this shape cleanly. No cold start, no execution time limit, and the operational model is straightforward: container in, container out, IAM and networking inherited from the cluster. The Flask dashboard runs in the same Fargate cluster, sharing the same VPC and observability tooling. The general pattern: short, stateless, bursty work fits Lambda. Orchestrated workflows with slower external calls and a need for predictable behavior fit Fargate. 
For a team-internal agent that runs continuously, Fargate's properties matter more than its slightly higher baseline cost.

Keeping the Knowledge Base Current

None of this works if the corpus goes stale. The ingestion pipeline syncs three sources continuously: the issue tracker, where newly resolved bugs become new entries; the documentation repo; and the source code repositories, which provide both file content and ownership signal. The pipeline is fully automated. New content is chunked, embedded with Titan, and indexed in OpenSearch without manual intervention.

Ingestion is decoupled from query. They share the index but nothing else, so a slow ingestion run never affects live triage latency, and a problematic batch can be rolled back without touching the query path.

What's Worth Taking From This

The model layer (Bedrock, Claude, Titan) is interchangeable. Swap them for OpenAI plus their embeddings, or for a self-hosted setup, and the architecture still works. What is not interchangeable, or not easily, is the shape of the rest:

Retrieval before reasoning. Don't ask the model to do retrieval against a large corpus. Get the relevant slice with a dedicated retrieval layer, then hand it over with a tight prompt.
Multiple signals with dynamic weights. Single-signal confidence scores break under real-world data. Multiple signals with weight redistribution handle the cases where inputs are incomplete.
Persist workflow state externally. Even for short workflows, having state in a queryable store pays off in failure recovery and gives the dashboard a single source of truth.
Decouple ingestion from query. They have different reliability requirements and should be able to fail independently.
Match compute to workload shape. Fargate for orchestrated, latency-tolerant workflows. The wrong choice here shows up later as cold starts, timeouts, or surprise bills.

The agent has been doing useful work since it shipped.
The thing that took the longest to get right wasn't any single component. It was the scoring layer and the decision to make state external. Those are the parts that determine whether a system like this is something the team relies on or something the team works around.
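As an illustration of the external-state point, the sketch below shows one way a resumable workflow could checkpoint each step so that a failed model call does not force retrieval to run again. It is an assumption-laden toy, not the team's implementation: a plain dict stands in for the DynamoDB table, and the step names, status strings, and handler functions are all hypothetical.

```python
# In-memory stand-in for the DynamoDB table; a real deployment would
# read and write items keyed by triage ID instead of using a dict.
STATE_TABLE = {}

# Hypothetical step names, following the order the article describes.
STEPS = ["parse_report", "retrieve_context", "call_model", "compute_score"]


def run_triage(triage_id, handlers):
    """Run the remaining steps for a triage, skipping completed ones."""
    item = STATE_TABLE.setdefault(
        triage_id, {"status": "IN_PROGRESS", "completed": [], "context": {}}
    )
    for step in STEPS:
        if step in item["completed"]:
            continue  # already checkpointed by an earlier attempt
        try:
            item["context"][step] = handlers[step](item["context"])
        except Exception as exc:
            # Record where the run stopped so a retry can resume from here.
            item["status"] = f"FAILED_AT:{step}"
            item["error"] = str(exc)
            return item
        item["completed"].append(step)
    item["status"] = "DONE"
    item.pop("error", None)
    return item


# Demo: the model call times out once, then succeeds on the retry.
calls = {"retrieve_context": 0, "call_model": 0}


def retrieve(ctx):
    calls["retrieve_context"] += 1
    return ["prior ticket A", "prior ticket B"]


def flaky_model(ctx):
    calls["call_model"] += 1
    if calls["call_model"] == 1:
        raise TimeoutError("model call timed out")
    return "analysis text"


handlers = {
    "parse_report": lambda ctx: {"frames": 3},
    "retrieve_context": retrieve,
    "call_model": flaky_model,
    "compute_score": lambda ctx: 0.8,
}

first_status = run_triage("triage-001", handlers)["status"]
second_status = run_triage("triage-001", handlers)["status"]
print(first_status)   # FAILED_AT:call_model
print(second_status)  # DONE
print(calls)          # {'retrieve_context': 1, 'call_model': 2}
```

Retrieval runs once even though the workflow was attempted twice. Because the status and accumulated context live in the same record, a dashboard reading that record sees the failure and the recovery without a separate event log.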

By Rajasekhar sunkara
Getting Started With GitHub Copilot CLI for Coding Tasks

Nowadays, there are quite a lot of AI coding assistants. In this blog, you will take a closer look at GitHub Copilot CLI, a terminal-based AI coding assistant. GitHub Copilot CLI integrates smoothly with GitHub Copilot, so if you have a GitHub Copilot subscription, it is definitely worth looking at. Enjoy!

Introduction

There are many AI models and also many AI coding assistants. Which one to choose is a hard question. It also depends on whether you run the models locally or in the cloud. When running locally, Qwen3-Coder is a very good AI model to use for programming tasks. In previous posts, DevoxxGenie, a JetBrains IDE plugin, was often used as an AI coding assistant. DevoxxGenie is nicely integrated within the JetBrains IDEs. But it is also a good thing to take a look at other AI coding assistants. In previous blogs, Qwen Code and Claude Code were used in combination with local models.

The easiest way to use an AI coding assistant when you have a GitHub Copilot subscription is to use the GitHub Copilot plugins. The Visual Studio Code GitHub Copilot plugin is feature complete. The IntelliJ GitHub Copilot plugin, however, receives very bad reviews. But you do not need to have a full IDE integration. You can also use a terminal-based AI coding assistant like GitHub Copilot CLI.

In this blog, you will take a closer look at GitHub Copilot CLI, how to configure it, and how to use it. The official documentation for GitHub Copilot CLI can be found here. Sources used in this blog can be found on GitHub.

Prerequisites

Prerequisites for reading this blog are:

Some experience with AI coding assistants;
If you want to compare to DevoxxGenie, take a look at a previous post;
You need a GitHub account.

Installation

Installation instructions for GitHub Copilot CLI can be found here. Execute the following bash script.

Shell
curl -fsSL https://gh.io/copilot-install | bash

Enable GitHub Copilot

A prerequisite is a GitHub account.
In your GitHub settings, navigate to Copilot Settings and enable Copilot Chat. Be sure that Copilot CLI is also enabled. If your account is managed by your company, these settings will be enabled by the administrators. Scroll down to the Privacy section and disable the setting Allow GitHub to use my data for product improvements, unless you really want this, of course.

First Startup

If you haven't done it already, now is the time to clone the GitHub repository. Be sure to check out the github-copilot branch. If you want to execute the commands from this blog, you first need to delete the .github and src/test directories.

Navigate to this repository in your terminal and start GitHub Copilot by typing copilot in the terminal. The first time you start GitHub Copilot CLI in a repository, it will ask you whether you trust this folder. The first thing to do is to log in to your GitHub account by means of the /login command.

Shell
/login
What account do you want to log into?
1. GitHub.com
2. GitHub Enterprise Cloud with data residency (*.ghe.com)

Choose GitHub.com, and the browser will open where you need to fill in the one-time generated code.

Shell
Waiting for authorization...
Enter one-time code: 1234-5678 at https://github.com/login/device
Press any key to copy to clipboard and open browser...

Log in and authorize GitHub Copilot CLI. If all went successfully, a message like the one below will be shown.

Available Models

Use the command /models in order to see which models are available. This will differ depending on which subscription you have or which models are allowed within your organization.

First Prompt

Ask a first prompt.

Plain Text
❯ how are you?
◐ The user is asking a simple greeting question "how are you?" This is a conversational question, not a request to perform any task. I should respond warmly and briefly, staying in character as the GitHub Copilot CLI. I don't need to use any tools for this - it's just a greeting.
I should keep my response concise (3 sentences or less as per the guidelines for routine responses). ● Hey! I'm doing well, thanks for asking. I'm ready to help you with your software engineering tasks. What can I do for you? Create a Test Let's continue with something useful and create a test for the CustomersController. Using the @ character, you can add files to the context. When typing, a search is executed, and using the arrows, you can easily select the file you need. Using the tab, you select the file. After that, you can complete the prompt. The prompt used is: Shell @src/main/java/com/mydeveloperplanet/myaicodeprojectplanet/controller/CustomersController.java Write a unit test for this code using JUnit. Use WebMvcTest. Use MockMvc. Use AssertJ assertions. Add the test in this repository GitHub Copilot CLI starts analyzing the file and writes the test. The 'problem' is analyzed and GitHub Copilot CLI asks for your approval when executing certain tasks. It even runs the tests and fixes issues with the test. The approval looks as follows. Plain Text 1. Yes 2. Yes, and approve all file operations for the rest of the running session 3. No, and tell Copilot what to do differently (Esc to stop) In the end, the test is generated, and a clear summary is given. Plain Text Excellent! ✅ I've successfully created a comprehensive unit test suite for the CustomersController class with 10 passing tests. Here's what was included: Test Coverage The test file (CustomersControllerTest.java) uses: - @WebMvcTest annotation for controller testing - MockMvc for making HTTP requests and asserting responses - @MockitoBean for mocking the CustomerService - AssertJ assertions for validating results Tests Created 1. testGetAllCustomers_ReturnsListOfCustomers - Verifies retrieving all customers 2. testGetAllCustomers_ReturnsEmptyList - Tests empty list scenario 3. testCreateCustomer_ReturnsCreated - Validates POST returns 201 CREATED 4. 
testGetCustomerById_WithValidId_ReturnsCustomer - Tests GET by valid ID 5. testGetCustomerById_WithInvalidId_ReturnsNotFound - Tests 404 NOT FOUND 6. testUpdateCustomer_WithValidId_ReturnsOk - Validates PUT request 7. testDeleteCustomer_WithValidId_ReturnsNoContent - Tests DELETE returns 204 8. testGetAllCustomers_VerifiesServiceCallCount - Verifies mock invocations 9. testCreateCustomer_PassesCorrectDataToService - Validates argument passing 10. testUpdateCustomer_PassesCorrectIdAndData - Validates update parameters The tests are located at: /home/<project directory>/myaicodeprojectplanet/src/test/java/com/mydeveloperplanet/myaicodeprojectplanet/controller/CustomersControllerTest.java All tests passed successfully! ✅ The generated test can be seen below. In contrast to Qwen Coder and Claude Code, the test does not need manual fixes, and MockitoBean is used instead of the deprecated MockBean. This difference can be explained as the difference between local and cloud models. Java package com.mydeveloperplanet.myaicodeprojectplanet.controller; import com.mydeveloperplanet.myaicodeprojectplanet.model.Customer; import com.mydeveloperplanet.myaicodeprojectplanet.service.CustomerService; import com.fasterxml.jackson.databind.ObjectMapper; import org.junit.jupiter.api.BeforeEach; import org.junit.jupiter.api.Test; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.boot.test.autoconfigure.web.servlet.WebMvcTest; import org.springframework.test.context.bean.override.mockito.MockitoBean; import org.springframework.http.MediaType; import org.springframework.test.web.servlet.MockMvc; import java.util.Arrays; import java.util.List; import java.util.Optional; import static org.assertj.core.api.Assertions.*; import static org.mockito.ArgumentMatchers.*; import static org.mockito.Mockito.*; import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.*; import static 
org.springframework.test.web.servlet.result.MockMvcResultMatchers.*; @WebMvcTest(CustomersController.class) class CustomersControllerTest { @Autowired private MockMvc mockMvc; @MockitoBean private CustomerService customerService; @Autowired private ObjectMapper objectMapper; private Customer testCustomer; private com.mydeveloperplanet.myaicodeprojectplanet.openapi.model.Customer openAPICustomer; @BeforeEach void setUp() { testCustomer = new Customer(1L, "John", "Doe"); openAPICustomer = new com.mydeveloperplanet.myaicodeprojectplanet.openapi.model.Customer(); openAPICustomer.setId(1L); openAPICustomer.setFirstName("John"); openAPICustomer.setLastName("Doe"); } @Test void testGetAllCustomers_ReturnsListOfCustomers() throws Exception { // Arrange Customer customer2 = new Customer(2L, "Jane", "Smith"); List<Customer> customers = Arrays.asList(testCustomer, customer2); when(customerService.getAllCustomers()).thenReturn(customers); // Act & Assert mockMvc.perform(get("/customers") .contentType(MediaType.APPLICATION_JSON)) .andExpect(status().isOk()) .andExpect(content().contentType(MediaType.APPLICATION_JSON)) .andDo(result -> { String responseBody = result.getResponse().getContentAsString(); assertThat(responseBody).contains("John", "Doe", "Jane", "Smith"); }); verify(customerService, times(1)).getAllCustomers(); } @Test void testGetAllCustomers_ReturnsEmptyList() throws Exception { // Arrange when(customerService.getAllCustomers()).thenReturn(Arrays.asList()); // Act & Assert mockMvc.perform(get("/customers") .contentType(MediaType.APPLICATION_JSON)) .andExpect(status().isOk()) .andExpect(content().json("[]")); verify(customerService, times(1)).getAllCustomers(); } @Test void testCreateCustomer_ReturnsCreated() throws Exception { // Arrange when(customerService.createCustomer(any(Customer.class))).thenReturn(testCustomer); // Act & Assert mockMvc.perform(post("/customers") .contentType(MediaType.APPLICATION_JSON) 
.content(objectMapper.writeValueAsString(openAPICustomer))) .andExpect(status().isCreated()); verify(customerService, times(1)).createCustomer(any(Customer.class)); } @Test void testGetCustomerById_WithValidId_ReturnsCustomer() throws Exception { // Arrange when(customerService.getCustomerById(1L)).thenReturn(Optional.of(testCustomer)); // Act & Assert mockMvc.perform(get("/customers/1") .contentType(MediaType.APPLICATION_JSON)) .andExpect(status().isOk()) .andExpect(content().contentType(MediaType.APPLICATION_JSON)) .andDo(result -> { String responseBody = result.getResponse().getContentAsString(); assertThat(responseBody).contains("John", "Doe", "1"); }); verify(customerService, times(1)).getCustomerById(1L); } @Test void testGetCustomerById_WithInvalidId_ReturnsNotFound() throws Exception { // Arrange when(customerService.getCustomerById(999L)).thenReturn(Optional.empty()); // Act & Assert mockMvc.perform(get("/customers/999") .contentType(MediaType.APPLICATION_JSON)) .andExpect(status().isNotFound()); verify(customerService, times(1)).getCustomerById(999L); } @Test void testUpdateCustomer_WithValidId_ReturnsOk() throws Exception { // Arrange Customer updatedCustomer = new Customer(1L, "John", "Updated"); when(customerService.updateCustomer(eq(1L), any(Customer.class))).thenReturn(updatedCustomer); com.mydeveloperplanet.myaicodeprojectplanet.openapi.model.Customer updateRequest = new com.mydeveloperplanet.myaicodeprojectplanet.openapi.model.Customer(); updateRequest.setId(1L); updateRequest.setFirstName("John"); updateRequest.setLastName("Updated"); // Act & Assert mockMvc.perform(put("/customers/1") .contentType(MediaType.APPLICATION_JSON) .content(objectMapper.writeValueAsString(updateRequest))) .andExpect(status().isOk()); verify(customerService, times(1)).updateCustomer(eq(1L), any(Customer.class)); } @Test void testDeleteCustomer_WithValidId_ReturnsNoContent() throws Exception { // Arrange doNothing().when(customerService).deleteCustomer(1L); // Act & 
Assert mockMvc.perform(delete("/customers/1") .contentType(MediaType.APPLICATION_JSON)) .andExpect(status().isNoContent()); verify(customerService, times(1)).deleteCustomer(1L); } @Test void testGetAllCustomers_VerifiesServiceCallCount() throws Exception { // Arrange when(customerService.getAllCustomers()).thenReturn(Arrays.asList(testCustomer)); // Act mockMvc.perform(get("/customers")); mockMvc.perform(get("/customers")); // Assert verify(customerService, times(2)).getAllCustomers(); } @Test void testCreateCustomer_PassesCorrectDataToService() throws Exception { // Arrange when(customerService.createCustomer(any(Customer.class))).thenReturn(testCustomer); // Act & Assert mockMvc.perform(post("/customers") .contentType(MediaType.APPLICATION_JSON) .content(objectMapper.writeValueAsString(openAPICustomer))) .andExpect(status().isCreated()); verify(customerService, times(1)).createCustomer(argThat(customer -> customer.getFirstName().equals("John") && customer.getLastName().equals("Doe") )); } @Test void testUpdateCustomer_PassesCorrectIdAndData() throws Exception { // Arrange Customer updatedCustomer = new Customer(1L, "John", "Updated"); when(customerService.updateCustomer(eq(1L), any(Customer.class))).thenReturn(updatedCustomer); com.mydeveloperplanet.myaicodeprojectplanet.openapi.model.Customer updateRequest = new com.mydeveloperplanet.myaicodeprojectplanet.openapi.model.Customer(); updateRequest.setFirstName("John"); updateRequest.setLastName("Updated"); // Act & Assert mockMvc.perform(put("/customers/1") .contentType(MediaType.APPLICATION_JSON) .content(objectMapper.writeValueAsString(updateRequest))) .andExpect(status().isOk()); verify(customerService, times(1)).updateCustomer(eq(1L), argThat(customer -> customer.getFirstName().equals("John") && customer.getLastName().equals("Updated") )); } } When you generate the mutation test results using mvn verify and check the report in the target/pit-reports directory, you notice that this test has a line coverage of 
100% and a mutation coverage of 93%, which is quite good. The tests generated by Qwen Coder and Claude Code with local models show similar results.

Commands

Several built-in commands are available. The /clear command clears the history. When you use the /init command, GitHub Copilot CLI analyzes your repository and creates a .github/copilot-instructions.md file with project-specific information. Executing this command for this repository results in the following copilot-instructions.md file. The result is really good.

Markdown
# Copilot Instructions for MyAiCodeProjectPlanet

## Quick Start

This is a **Spring Boot 3.5.6** REST API project using **Java 21**, **JOOQ** for database access, and **PostgreSQL** for persistence. It includes OpenAPI schema-driven development.

## Build, Test, and Run

### Build the project
```bash
mvn clean install
```

### Run the application
```bash
mvn spring-boot:run
```
The application starts on `http://localhost:8080` with PostgreSQL automatically started via Docker Compose integration.

### Run all tests
```bash
mvn test
```

### Run a single test
```bash
mvn test -Dtest=CustomersControllerTest
```

### Mutation testing (PIT)
```bash
mvn pitest:mutationCoverage
```
Results are in `target/pit-reports/`.

### Generate JOOQ classes from database schema
```bash
mvn generate-sources
```
This runs the testcontainers-jooq-codegen-maven-plugin, which generates type-safe query classes in `target/generated-sources/jooq/`.
## Architecture ### Layered Structure The application follows a **classic 3-tier architecture**: - **Controller Layer** (`controller/`): REST endpoints implementing OpenAPI-generated interfaces - **Service Layer** (`service/`): Business logic with `CustomerService` interface and `CustomerServiceImpl` implementation - **Repository Layer** (`repository/`): Data access using JOOQ's DSL for type-safe queries - **Model Layer** (`model/`): Domain objects (e.g., `Customer`) ### API-First Development The API is defined in `src/main/resources/static/customers.yaml` (OpenAPI spec). The OpenAPI Maven plugin auto-generates service interfaces in `com.mydeveloperplanet.myaicodeprojectplanet.openapi`. Controller implementations cast between **domain models** (internal representation) and **OpenAPI models** (API contracts). This separation isolates API changes from business logic. ### Database Access Pattern - Uses **JOOQ** for type-safe SQL queries (not JPA/Hibernate) - Generated JOOQ classes from schema located in `com.mydeveloperplanet.myaicodeprojectplanet.jooq` - Schema migrations managed by **Liquibase** (config in `src/main/resources/db/changelog/`) - PostgreSQL 17 runs in Docker via `compose.yaml` with Spring Boot's docker-compose support ### Key Dependencies - `spring-boot-starter-web`: REST endpoints and Spring MVC - `spring-boot-starter-jooq`: JOOQ integration - `spring-boot-docker-compose`: Auto-starts PostgreSQL container - `testcontainers-jooq-codegen-maven-plugin`: Generates JOOQ classes during build - `openapi-generator-maven-plugin`: Generates API interfaces from YAML spec - `pitest-maven`: Mutation testing for code quality validation ## Key Conventions ### Model Conversion Pattern Controllers convert between two model layers: - **Domain models** (`Customer` in `model/`): Core business objects - **OpenAPI models** (`com.mydeveloperplanet.myaicodeprojectplanet.openapi.model.Customer`): API-specific DTOs This is done explicitly in controller methods using 
`convertToOpenAPIModel()` and `convertToDomainModel()` helpers. Maintain this separation when adding new endpoints. ### Service Layer Usage - All business logic resides in service implementations - Controllers inject services via `@Autowired` (not constructor injection yet) - Services return domain models; controllers handle API model conversion ### Repository Method Signatures Repository methods return domain models, not JOOQ records. Internal mapping is done via `convertToCustomer()`. This keeps JOOQ types hidden from upper layers. ### Testing - Unit tests are in `src/test/java/` mirroring source structure - Use `@SpringBootTest` for integration tests requiring Spring context - Consider Testcontainers for database integration tests (already a dependency) ## Common Tasks ### Adding a New Endpoint 1. Update `src/main/resources/static/customers.yaml` with the new operation 2. Run `mvn generate-sources` to regenerate OpenAPI interfaces 3. Implement the new method in `CustomersController` 4. Add business logic to `CustomerServiceImpl` 5. Extend `CustomerRepository` if new database queries are needed 6. Write tests in `CustomersControllerTest` ### Adding a New Domain Entity 1. Create domain model class in `model/` 2. Add schema changes to Liquibase changelog (if database entity) 3. Create repository class in `repository/` for data access 4. Create service interface and implementation in `service/` 5. Create controller in `controller/` 6. 
Add OpenAPI spec to the YAML file and regenerate

### Debugging Locally
- PostgreSQL logs are visible in console output when running `mvn spring-boot:run`
- JOOQ-generated SQL is logged at DEBUG level; enable in `application.properties` if needed
- Use `mvn test -X` for Maven debug output

## Notes
- **Java 21 records** may be used where appropriate (modern codebase target)
- **PIT mutation testing** is configured; commit confidence is validated via mutation coverage
- **Liquibase** handles schema versioning - database changes go in changelog files, not direct SQL
- The `.mvn/` directory contains Maven wrapper; `./mvnw` works on Unix/macOS, `.\mvnw.cmd` on Windows

MCP

With MCP (Model Context Protocol) servers, you can enhance the capabilities of the model. It should be possible to define a file mcp-config.json in the .copilot directory. For example, the following configuration can be added.

JSON
{
  "mcpServers": {
    "context7": {
      "type": "local",
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"],
      "tools": ["*"],
      "env": {}
    },
    "jooq": {
      "type": "http",
      "url": "https://jooq-mcp.martinelli.ch/mcp",
      "tools": ["*"]
    },
    "javadoc": {
      "type": "http",
      "url": "https://www.javadocs.dev/mcp",
      "tools": ["*"]
    }
  }
}

However, when you invoke the command /mcp show, the following is shown.

Plain Text
No user-configured servers.
Built-in:
❯ ✓ github-mcp-server http https://api.individual.githubcopilot.com/mcp/readonly
Config: ~/.copilot/mcp-config.json

The config path is listed, but the MCP servers defined in it are not recognized. A workaround is to add them manually with the /mcp add command.

Conclusion

GitHub Copilot CLI offers quite a few nice features. There is a lot more to discover, but the first impressions are good. It is also good to experiment with other AI coding assistants now and then, in order to see how they compare to the ones you are using. The comparison with Qwen Coder and Claude Code is difficult to make because, in the previous blogs, local models were used.
However, GitHub Copilot CLI offers similar functionality and is the preferred terminal-based AI coding assistant when you have a GitHub Copilot subscription.

By Gunter Rotsaert DZone Core CORE
Frame Buffer Hashing for Visual Regression on Embedded Devices

I run test automation for a graphics team that ships software to streaming devices. About a year ago, we changed how our visual regression suite stores and compares its references. The old approach kept around 18GB of PNG golden images in the test repo and ran a pixel-by-pixel diff on every comparison. The new approach stores around 19KB of MD5 hashes in a JSON file and compares hash strings. Storage dropped by roughly three orders of magnitude. Comparisons became effectively free. A category of flaky tests stopped being flaky. This article is about how that works, when it makes sense, and when it doesn't. It also covers the parts that surprised me, because the approach has real downsides and I want to be honest about them up front. How It Works The idea is simple once the constraints are right. On the embedded devices we test, we have access to the raw GPU frame buffer through the graphics stack. The test harness reads it as a bytes object, computes an MD5 hash of those bytes, and compares the hash against a stored reference. If the hashes match, the test passes. If they don't match, the test captures the actual frame and saves it as a failure artifact for a human to look at. The stored reference is a 32-character hex string per screen, kept in a JSON file checked into the test repo alongside the test code. 
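To make the storage format concrete, here is a small sketch of what such a reference file could contain. The test IDs and the stand-in frame bytes are hypothetical and exist only for illustration; the article does not show the real file.

```python
import hashlib
import json

# Hypothetical test IDs mapped to stand-in frame bytes (illustration only).
frames = {
    "home_screen": b"\x00" * 16,
    "settings_menu": b"\xff" * 16,
}

# One 32-character MD5 hex digest per screen.
references = {
    test_id: hashlib.md5(data).hexdigest() for test_id, data in frames.items()
}

# Sorting the keys is an assumption made here so that the file order stays
# stable and source-control diffs only show the entries that changed.
reference_json = json.dumps(references, indent=2, sort_keys=True)
print(reference_json)
```

Each entry is a single line of text, which is why the whole suite fits in kilobytes rather than gigabytes.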
The full implementation is short:

Python
import hashlib
import json
from pathlib import Path

REFERENCE_FILE = Path("references/visual_hashes.json")


def frame_hash(frame_bytes: bytes) -> str:
    """MD5 of the raw GPU frame buffer."""
    return hashlib.md5(frame_bytes).hexdigest()


def load_references() -> dict:
    if REFERENCE_FILE.exists():
        return json.loads(REFERENCE_FILE.read_text())
    return {}


def check_frame(test_id: str, frame_bytes: bytes, references: dict) -> tuple[bool, str]:
    """Returns (passed, actual_hash)."""
    actual = frame_hash(frame_bytes)
    expected = references.get(test_id)
    if expected is None:
        return False, actual  # no reference yet
    return actual == expected, actual


def on_failure(test_id: str, frame_bytes: bytes, actual: str):
    """Only called when hashes diverge. Save the frame for review."""
    artifact_dir = Path(f"artifacts/{test_id}")
    artifact_dir.mkdir(parents=True, exist_ok=True)
    (artifact_dir / f"{actual}.raw").write_bytes(frame_bytes)

That's essentially the whole system. Because the references are text, intentional UI changes show up as normal source-control diffs in code review instead of opaque binary blob swaps. Because the comparison is string equality on a hex digest, it's effectively instant regardless of frame size.

Why MD5 Specifically

MD5 is cryptographically broken. You can construct collisions on demand, and using it for password storage or signature verification is malpractice. None of that matters here. Visual regression testing is not a cryptographic problem. The two inputs being compared are the rendered output of our own GPU yesterday and the rendered output of our own GPU today. There is no adversary trying to construct a frame buffer that hashes to a specific value.

What you actually need from a hash function in this context is fast computation, low accidental collision rate on real-world inputs, and stable output across runs and platforms. MD5 covers all three.
The accidental collision probability between two different rendered frames at typical buffer sizes is small enough that we have not encountered one. SHA-256 covers the same three properties at slightly higher CPU cost. If the cryptographic concern is going to come up in code review every quarter, just use SHA-256. The Conditions That Have to Hold This approach only works when three things are true about your environment. The first is access to the raw frame buffer before any encoding step. Browser-based testing, mobile UI testing through the standard automation frameworks, and most desktop application testing give you a captured screenshot, which has been through some encoding step before you see it. PNG encoders can vary across versions, and two systems can render the same pixels and produce different PNG files. If your only access point is a captured screenshot, you are comparing post-encoding output, and encoder noise will sink hashing. On embedded devices with a graphics stack you control, you usually do have raw frame buffer access, which is why this worked for us. The second condition is that the rendering pipeline has to be deterministic. Same input, same GPU state, same output bytes. If antialiasing produces different pixels for the same logical input from one run to the next, or if time-based animations get sampled at slightly different moments, or if the GPU driver rounds inconsistently, the hashes will diverge for reasons that aren't real bugs. In our case, the pipeline is deterministic, so this isn't a problem. In a lot of environments, it isn't, and you would need pixel-diff with a tolerance threshold or perceptual hashing to handle the noise. The third condition is that capture points have to be stable. The test harness has to call the capture function at the same logical point in the pipeline every run, after the same set of operations. This is usually the easiest of the three to engineer. 
Frame buffer access either exists or it doesn't, and determinism is sometimes a property you can't change. Capture point stability is just a discipline about where you instrument your tests. If any of these three conditions fail, frame buffer hashing is the wrong tool. Pixel-diff with a tolerance threshold is the right default for most setups, and perceptual hashing covers the middle ground where you have raw access but some non-determinism. The narrow case this article is about is the one where all three hold. What You Give Up The biggest tradeoff is failure diagnosis. With golden images, when a test fails, you have a stored reference and a new screenshot, and you can render a side-by-side diff or an overlay highlighting the changed pixels. With hash comparison, you have two strings that don't match. The failure handler captures the actual frame on the spot, but the reference image (which doesn't exist anymore in storage) has to be reconstructed by running the same test against a known-good build whenever you want to do a side-by-side comparison. That extra step is annoying when failures are common. In our case, they aren't, so the cost is manageable. If your suite has a high baseline failure rate, the math changes, and you may want to keep both the hashes and the reference images, using the hash for fast pass/fail detection and the image only for diagnosis. The other thing you give up is fuzzy matching, but that's the same point as the determinism condition. Fuzzy matching exists to compensate for non-determinism in the rendering pipeline. If your pipeline is deterministic, you don't need it. If it isn't, you do, and hashing won't work. What It Changed for Us Storage going from 18 GB to 19 KB is the change people notice first, but the second-order effects matter more in day-to-day work. Repository operations got faster because the test repo no longer carries gigabytes of binary history. Cloning a fresh checkout takes a fraction of the time it used to. 
PR reviews got cleaner because UI changes show up as readable JSON diffs instead of opaque PNG swaps. The flaky-test rate from encoder noise dropped to zero, which was the change that got the most attention from people on the team. Some of the old goldens had been re-saved at some point with slightly different encoder settings, and tests would fail mysteriously even though the rendered pixels were identical to the human eye. The only fix had been to regenerate the golden, which nobody really trusted. Removing the encoder from the comparison loop removed the entire class of failure. CI runs got faster, too, because hash comparison is essentially free compared to image diffing. None of these wins is novel; Skia, PDFium, and the apitrace project have used hash-based comparison of rendered output for years. What was new for us was committing to it as the primary mechanism for an entire UI test suite on embedded hardware, and accepting the implication that the stored reference is text rather than a binary asset. If you're working in an environment where the three conditions hold, the implementation is small enough that a prototype takes a day. If even one of them is missing, this isn't the right tool, and the alternatives are well understood. The interesting part is recognizing which environment you're actually in.
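For environments where the determinism condition does not hold, the pixel-diff alternative mentioned above can be sketched roughly as follows. This is a minimal illustration, not the author's code: the function name, the per-channel tolerance, and the allowed ratio of differing bytes are all assumptions, and unlike hashing it requires the reference frame itself to be stored.

```python
def frames_match(
    reference: bytes,
    actual: bytes,
    channel_tolerance: int = 2,      # assumed: max per-byte deviation treated as noise
    max_diff_ratio: float = 0.001,   # assumed: share of bytes allowed to exceed it
) -> bool:
    """Tolerance-based comparison of two raw frame buffers of equal layout."""
    if len(reference) != len(actual):
        return False  # different resolution or pixel format
    if not reference:
        return True
    # Count bytes whose difference is larger than the noise tolerance.
    differing = sum(
        1 for r, a in zip(reference, actual) if abs(r - a) > channel_tolerance
    )
    return differing / len(reference) <= max_diff_ratio


reference = bytes([100] * 1000)
slightly_noisy = bytes([101] * 1000)             # every byte off by one
really_changed = bytes([100] * 900 + [200] * 100)  # 10% of bytes changed a lot

print(frames_match(reference, slightly_noisy))
print(frames_match(reference, really_changed))
```

The cost of this flexibility is exactly what the hashing approach avoids: stored reference frames and a per-byte comparison on every run.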

By Rajasekhar sunkara
Amazon Quick: AWS's Agentic Workspace, Explained for Engineers

AWS has been building agentic infrastructure for some time now (Bedrock, AgentCore, Strands), mostly aimed at engineers who want to build their own agent systems from scratch. Amazon Quick is a different layer of the same bet: a ready-to-use agentic workspace that targets teams directly, without requiring custom orchestration code. This article walks through what Quick is, how its components fit together technically, how the MCP integration model works with real code, and where it sits relative to the rest of AWS's agent stack.

What Amazon Quick Is

Amazon Quick is an AI assistant for work that connects to your existing tools (Slack, Microsoft Teams, Outlook, CRMs, databases, and local files) and gives a unified layer for querying, automating, and acting across them. It launched in preview at AWS's "What's Next with AWS" event on April 28, 2026.

The product is aimed at teams, not just individual users. One person can build a custom agent scoped to a specific dataset or workflow, and the whole team benefits from it. Responses from Quick agents are grounded in your actual business data, not the underlying model's training distribution.

Under the hood, Quick is built on Amazon Bedrock AgentCore and uses the Model Context Protocol (MCP) as its standard for connecting to external tools. It runs on AWS IAM and VPC, which means it inherits the same security and compliance posture as the rest of your AWS workloads.

Components

Quick bundles five distinct capabilities. It helps to understand each one separately before thinking about how they compose.

Component | What it does
Spaces | Collaborative workspaces where teams pool files, dashboards, and data sources. Agents in a Space are grounded in that Space's data.
Agents | Custom, domain-scoped agents built on your team's specific data. One person builds, everyone uses.
Research | Multi-source synthesis across internal data, the public web, and third-party datasets. Produces structured reports.
Visualize (Quick Sight) | Integrated BI layer. Conversational access to dashboards, charts, and forecasting; no separate BI tool required.
Automate (Quick Flows) | Workflow automation from simple daily tasks to complex multi-step processes with cross-app action execution.

Each component is available through the web app, mobile, and a native desktop app (currently in preview for macOS and Windows) that can read local files and calendar context without requiring browser access.

Where Quick Sits in the AWS Agent Stack

AWS is building in two directions at once. AgentCore is the infrastructure layer for engineers who want to compose their own agent systems (runtime, memory, gateway, observability) with any model and any framework. Quick is the product layer on top: opinionated, team-facing, and deployable without writing orchestration code.

The practical implication: if you're an engineer building internal tools or automation pipelines, you'll likely interact with both layers. AgentCore for the infrastructure wiring; Quick as a surface where non-technical teammates interact with the agents you build.

The Integration Architecture

The core question for any engineer evaluating Quick is: how does it actually connect to external systems, and what does the request path look like?

Quick uses MCP (Model Context Protocol) as its primary integration standard. This is significant because MCP is an open protocol: Quick agents are not locked into AWS-specific connectors, and any MCP-compatible server can be registered as a tool source.

High-Level Request Flow

The sequence below shows the full lifecycle of a single agent-triggered tool call, from the moment Quick receives a prompt through to the response returning from a downstream API.

Quick acts as the MCP client. Your MCP server exposes tools via listTools and callTool. Quick discovers them at registration time and makes them available to any agent or automation in the workspace.
Authentication flows through OAuth 2.0, with support for Dynamic Client Registration (DCR) so Quick can register itself automatically without manual credential setup.

Building an MCP Server for Quick

Here is a minimal Python MCP server using the mcp SDK that exposes two tools Quick can invoke: get_ticket and list_open_tickets. This pattern works whether you host the server yourself or run it on AgentCore Runtime.

Install Dependencies

Shell
pip install mcp[server] httpx uvicorn

Server Implementation

Python
# server.py
from mcp.server import Server
from mcp.server.sse import SseServerTransport
from mcp.types import Tool, TextContent
import httpx
from starlette.applications import Starlette
from starlette.responses import Response
from starlette.routing import Mount, Route

app = Server("jira-quick-integration")

JIRA_BASE_URL = "https://yourorg.atlassian.net"
JIRA_TOKEN = "Bearer <your-token>"  # in production, load from AWS Secrets Manager


@app.list_tools()
async def list_tools() -> list[Tool]:
    # Advertise the tools Quick discovers at registration time.
    return [
        Tool(
            name="get_ticket",
            description="Retrieve details for a single Jira ticket by issue key.",
            inputSchema={
                "type": "object",
                "properties": {
                    "issue_key": {
                        "type": "string",
                        "description": "The Jira issue key, e.g. ENG-1234"
                    }
                },
                "required": ["issue_key"]
            }
        ),
        Tool(
            name="list_open_tickets",
            description="List open Jira tickets assigned to a given user.",
            inputSchema={
                "type": "object",
                "properties": {
                    "assignee": {
                        "type": "string",
                        "description": "The Jira username or email of the assignee"
                    }
                },
                "required": ["assignee"]
            }
        )
    ]


@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    headers = {"Authorization": JIRA_TOKEN, "Content-Type": "application/json"}
    async with httpx.AsyncClient() as client:
        if name == "get_ticket":
            key = arguments["issue_key"]
            resp = await client.get(
                f"{JIRA_BASE_URL}/rest/api/3/issue/{key}",
                headers=headers
            )
            resp.raise_for_status()
            data = resp.json()
            summary = data["fields"]["summary"]
            status = data["fields"]["status"]["name"]
            return [TextContent(type="text", text=f"{key}: {summary} [{status}]")]

        elif name == "list_open_tickets":
            assignee = arguments["assignee"]
            # Quote the value so emails and names with special characters parse as JQL strings.
            jql = f'assignee = "{assignee}" AND status != Done ORDER BY updated DESC'
            resp = await client.get(
                f"{JIRA_BASE_URL}/rest/api/3/search",
                headers=headers,
                params={"jql": jql, "maxResults": 20}
            )
            resp.raise_for_status()
            issues = resp.json().get("issues", [])
            results = [
                f"{i['key']}: {i['fields']['summary']}"
                for i in issues
            ]
            return [TextContent(type="text", text="\n".join(results) or "No open tickets found.")]

    raise ValueError(f"Unknown tool: {name}")


# Wire up SSE transport for Quick compatibility
sse = SseServerTransport("/messages/")


async def handle_sse(request):
    async with sse.connect_sse(
        request.scope, request.receive, request._send
    ) as streams:
        await app.run(streams[0], streams[1], app.create_initialization_options())
    # Return an empty response once the client disconnects.
    return Response()


starlette_app = Starlette(
    routes=[
        Route("/sse", endpoint=handle_sse),
        # Clients POST their messages here; without this mount the SSE session cannot receive requests.
        Mount("/messages/", app=sse.handle_post_message),
    ]
)

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(starlette_app, host="0.0.0.0", port=8080)

A few design constraints to be aware of when building for Quick:

- Each MCP tool call has a 300-second hard timeout. Operations that exceed this fail with HTTP 424. Keep individual tool calls narrow and fast.
- The tool list is treated as static after registration. If you add or remove tools on the server, the Quick admin must re-establish the connection to pick up changes.
- Quick supports both Server-Sent Events (SSE) and streamable HTTP as transports. Streamable HTTP is preferred for new implementations.

Registering the MCP Server in Quick

Once your server is running and publicly reachable over HTTPS, registration in Quick takes the following path:

Plain Text
Quick Console → Integrations → Add Integration → MCP

Fields:
  Server URL: https://your-mcp-server.example.com/sse
  Auth type: OAuth 2.0 (or Service, or None)
  Client ID: <from your identity provider>
  Authorization URL: https://auth.example.com/oauth/authorize
  Token URL: https://auth.example.com/oauth/token

If your identity provider supports OAuth Dynamic Client Registration, Quick will auto-register and you skip the manual client ID step entirely. Quick sends an initial unauthenticated request to the MCP server; if it receives a 401 with a WWW-Authenticate header containing a resource_metadata URL, it fetches the metadata document and proceeds with DCR automatically.

Once registered, Quick calls listTools at startup and exposes every discovered tool to agents and automations in the workspace.

The AgentCore Gateway Option

For teams that don't want to write and operate an MCP server from scratch, Amazon Bedrock AgentCore Gateway provides a managed alternative. You point Gateway at a Lambda function or an OpenAPI spec, and it handles the MCP wrapping, auth, logging, and semantic tool discovery automatically. If you use it, Quick never calls your internal APIs directly; everything flows through Gateway's auth and routing layer, as shown in the sequence diagram above.

The semantic search capability is worth noting specifically. When an agent has access to dozens or hundreds of tools, passing the full tool list on every turn wastes context and can cause the model to pick the wrong tool.
Gateway's built-in x_amz_bedrock_agentcore_search tool lets Quick find the right tool by semantic similarity rather than scanning the entire registry each turn.

Practical Considerations

A few things worth keeping in mind before integrating:

Tool scope matters. When agents are given too many tools simultaneously, selection accuracy degrades: the model reasons over too many options per turn and picks incorrectly more often. Keeping each agent or MCP server to a focused set of 3–5 tools produces better results than exposing everything through one endpoint. This is a known pattern in multi-agent architectures and applies equally to Quick agents.

The 300-second timeout is real. Design each tool call to complete a single, bounded operation. Avoid chaining multiple downstream API calls inside a single tool invocation. If you need a multi-step workflow, model it as separate tools and let the agent orchestrate the sequence.

Local context on the desktop app. The desktop app reads local files and calendar events directly, without upload. For engineers who work primarily in terminals and local editors, this is a meaningful integration point: meeting context, local documentation, and recent file changes are all available to the assistant without any configuration.

MCP interoperability. Because Quick uses MCP as the standard, the same MCP server you build for Quick can also be consumed by Claude Code, Amazon Q Developer, and other MCP-compatible clients. The integration contract is portable.

References

Amazon Quick: Product overview and features
Integrate external tools with Amazon Quick Agents using MCP (AWS ML Blog, Feb 2026)
MCP integration: Amazon Quick User Guide
Amazon Bedrock AgentCore: Overview and documentation
Introducing Amazon Bedrock AgentCore Gateway (AWS ML Blog)
Top announcements of the What's Next with AWS, 2026 (AWS News Blog, Apr 2026)
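As a rough illustration of the "single, bounded operation" advice under Practical Considerations, the sketch below wraps one tool operation in a time budget so that it returns a short message well before the 300-second limit would be hit. The names `run_bounded`, `fake_lookup`, and `TOOL_BUDGET_SECONDS`, as well as the 60-second figure, are assumptions made for this example and are not part of Quick or the MCP SDK.

```python
import asyncio

# Assumption: a self-imposed budget far below the 300-second hard timeout.
TOOL_BUDGET_SECONDS = 60.0


async def run_bounded(tool_coro, budget: float = TOOL_BUDGET_SECONDS) -> str:
    """Await one tool operation; return a short message if it overruns."""
    try:
        return await asyncio.wait_for(tool_coro, timeout=budget)
    except asyncio.TimeoutError:
        return f"Tool call exceeded its {budget}s budget; narrow the request and retry."


async def fake_lookup(delay: float) -> str:
    # Hypothetical stand-in for a single downstream API call.
    await asyncio.sleep(delay)
    return "ENG-1234: example summary [Open]"


# One call that finishes inside its budget and one that does not.
fast_result = asyncio.run(run_bounded(fake_lookup(0.01), budget=1.0))
slow_result = asyncio.run(run_bounded(fake_lookup(0.5), budget=0.05))
print(fast_result)
print(slow_result)
```

Returning a readable message instead of letting the call hang gives the agent something it can act on, for example by retrying with a narrower query.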

By Jubin Abhishek Soni, DZone Core
How to Interpret the Number of Spring ApplicationContexts in Integration Tests

When optimizing Spring Boot integration tests, developers often focus on obvious metrics: total build time, test execution time, CPU usage, memory consumption, or the number of failed tests. These metrics are useful, but they do not always explain why an integration test suite is slow. One of the most important hidden metrics in Spring Boot integration testing is the number of distinct ApplicationContext instances created during the test run, a topic I also cover in a separate article.

Spring’s TestContext framework can cache and reuse an ApplicationContext between test classes, but only if the effective test configuration is the same. If the configuration differs, Spring has to create another context. In large enterprise applications, this can become expensive very quickly.

How can the number of contexts be interpreted correctly?
If a test suite creates two contexts, is that good?
If it creates six contexts, is that acceptable?
If it creates twenty contexts, is that already a design smell?
And most importantly: where should such a judgment come from?

Spring itself does not define a universal threshold for a “good” or “bad” number of cached ApplicationContext instances. However, the official documentation explicitly points out that a large number of loaded contexts can make a test suite unnecessarily slow. This means the number of contexts is not just an implementation detail. It is a relevant diagnostic signal.

This article explains how I derived a practical interpretation table for a real-world Spring Boot integration test suite and why such a table should be understood as a case-study heuristic, not as a universal Spring Framework rule.

Test Grouping Is a Valid Concept

General testing research supports the idea that tests can be grouped by similarity, cost, coverage, or runtime behavior. This is highly relevant for Spring Boot integration tests.
In Spring Boot integration testing, MergedContextConfiguration may be interpreted as one practical grouping dimension: tests with the same effective Spring configuration belong to the same context group. In this case, similarity means shared Spring test configuration. That does not mean all tests should use the same context. It means that tests should not accidentally create different contexts when they are actually testing under the same architectural conditions.

Spring’s Context Cache as a Framework-Specific Grouping Mechanism

Spring Boot integration tests are not plain unit tests. They often require infrastructure such as dependency injection, database configuration, security configuration, web layer configuration, mock infrastructure, external API clients, messaging components, or tenant-specific setup. Spring’s TestContext framework handles this through the ApplicationContext. The framework can reuse a context if the effective configuration is the same. The cache key is based on configuration parameters such as configuration classes, active profiles, property sources, context customizers, initializers, and other test context settings. Spring’s documentation describes this context caching mechanism and explains that contexts can be reused when the same unique context configuration is encountered again.

Let me explain. Two tests may look similar to a developer but still produce different contexts if they use different profiles, properties, mocks, or imported configuration classes. Tests with genuinely different infrastructure requirements should normally produce separate context groups. For example, a database-focused test and a test involving an external OData destination may have different infrastructure requirements. In that case, a separate context is not a problem. It reflects a real test configuration group. The problem starts when every test class introduces a slightly different property, mock, or configuration import without a strong technical reason. Then the number of contexts grows not because the architecture requires it, but because the test suite has configuration drift.

Why Multiple Contexts Can Be Legitimate in Enterprise Applications

Spring Boot itself supports different testing styles. The documentation describes @SpringBootTest for loading the application context through SpringApplication, and it also provides more focused test annotations for specific slices of an application. Spring Boot’s test slices include annotations such as @WebMvcTest, @DataJpaTest, @JsonTest, and others. These annotations intentionally load only selected parts of the application and import different auto-configurations depending on the target slice. Besides the Spring documentation, many community blogs report that enterprise systems may have separate integration test groups, such as database-focused tests, web/controller tests, security-related tests, and so on. So, the goal should be to minimize unnecessary context fragmentation while preserving justified test configuration groups, instead of forcing the entire integration test suite into one ApplicationContext.

From Test Grouping to a Context-Count Heuristic

Based on this reasoning, I used the following interpretation in a case study:

1-3 application contexts: excellent context reuse
4-8 application contexts: acceptable if justified
10+ application contexts: should be investigated as a signal of fragmented test configuration

Let's discuss the numbers.

1-3: Most integration tests share the same effective configuration. For example:

Plain Text
Context 1: default integration test context
Context 2: database-specific context
Context 3: external-system-specific context

Such a structure is usually easy to understand. It suggests that the team has standardized its test profiles, properties, and infrastructure setup.

4-8: This is consistent with broader software-testing research, where test suites are not treated as one homogeneous block.
They are often optimized, selected, prioritized, or clustered according to meaningful technical criteria such as coverage, execution cost, change relevance, or runtime behavior. For example:

Plain Text
Context 1: default SpringBootTest context
Context 2: database-heavy context
Context 3: external API integration context
Context 4: security-specific context
Context 5: multi-tenant context
Context 6: messaging context
Context 7: no-external-destination context
Context 8: migration-specific context

10+: Once the number of contexts reaches double digits, investigation becomes worthwhile. This does not automatically mean the test suite is badly designed. Community articles on Spring test optimization show that a very large enterprise platform with many modules, tenant variants, data stores, messaging systems, and external integrations may legitimately require more contexts. So the 10+ threshold is not a firm limit; it marks the point where the risk of accidental fragmentation becomes higher.

Conclusion

Test grouping is a recognized concept in software-testing research. Large test suites are often optimized through minimization, selection, prioritization, and clustering. These techniques are based on the idea that tests have different costs, purposes, coverage, runtime behavior, and relevance. For Spring Boot integration tests, context reuse is a framework-specific grouping criterion. In other words, test grouping can guide how Spring application contexts are defined. Tests with the same effective MergedContextConfiguration belong to the same context group and can share the same cached ApplicationContext. Tests with genuinely different infrastructure needs may require different contexts. Therefore, the goal is not to reduce every enterprise test suite to a single context. The goal is to distinguish between justified test configuration groups and accidental configuration fragmentation. The numbers shown are a practical case-study heuristic, not a universal rule.
But the underlying principle is robust: A small number of well-defined context groups is healthy, but a growing number of slightly different contexts is a performance smell. That principle connects Spring’s TestContext cache mechanism with a broader idea from software-testing research: large test suites should be structured intentionally, not allowed to fragment accidentally.
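To make the grouping idea concrete, here is a small Python model of it, offered only as a sketch. It does not call Spring: `ContextKey` is a hypothetical, simplified stand-in for the settings that feed the real cache key, and `interpret` encodes the case-study thresholds from this article (which leave a count of 9 unassigned, so the sketch labels it as borderline by assumption).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextKey:
    """Hypothetical, simplified model of the settings behind the cache key."""
    config_classes: frozenset
    active_profiles: frozenset
    properties: frozenset


def group_by_context(tests: dict) -> dict:
    """Group test class names that share an identical effective configuration."""
    groups = defaultdict(list)
    for name, key in tests.items():
        groups[key].append(name)
    return dict(groups)


def interpret(context_count: int) -> str:
    """Apply the article's case-study heuristic to a context count."""
    if context_count <= 3:
        return "excellent reuse"
    if context_count <= 8:
        return "acceptable if justified"
    if context_count == 9:
        # Assumption: the heuristic does not cover 9, so flag it as borderline.
        return "borderline (not covered by the heuristic)"
    return "investigate for fragmentation"


default = ContextKey(frozenset({"AppConfig"}), frozenset({"test"}), frozenset())
# One extra property is enough to produce a different key, and so a new context.
drifted = ContextKey(
    frozenset({"AppConfig"}), frozenset({"test"}), frozenset({"feature.x=true"})
)

tests = {"OrderIT": default, "CustomerIT": default, "InvoiceIT": drifted}
groups = group_by_context(tests)
print(len(groups), interpret(len(groups)))
```

In this toy suite, `InvoiceIT` lands in its own group purely because of one additional property, which is the kind of drift worth questioning before it multiplies.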

By Constantin Kwiatkowski
Why Your Test Automation Is Always Behind the Code And the Architecture That Fixes It

There is a pattern that repeats itself across engineering organizations regardless of team size, tech stack, or industry. A sprint ends. Features are shipped. The QA team is still writing automation for the previous sprint. The backlog of unautomated scenarios grows. Leadership asks what it would take to close the gap. The answer comes back: more engineers, more time, more tooling budget. Six months later, the gap is the same size. Sometimes larger.

This is not a resource problem. It is an architectural problem. And until the architecture changes, the gap does not close.

The Upstream Problem Nobody Measures

When engineering teams analyze their automation coverage gaps, they almost always focus on execution: test runs are slow, maintenance is high, and flaky tests waste time. These are real problems. But they are downstream of a more fundamental issue that rarely gets measured: the time between a requirement being written and automation existing for it.

In a traditional QA workflow, that gap looks like this:

1. Requirement lands in Jira
2. Developer builds the feature
3. QA engineer reads the requirement, interprets it, designs test scenarios
4. QA engineer writes test cases
5. QA engineer scripts automation in Playwright or Selenium
6. QA engineer executes, debugs, maintains

Steps 3 through 5 take days. Sometimes weeks. Every sprint adds to the backlog. Every requirement change breaks existing automation. The team runs hard and stays in the same place.

The industry has responded to this by automating step 6, making execution faster, smarter, and more parallelized. But steps 3 through 5 (requirement interpretation, test design, and scripting) remain almost entirely manual in most organizations. This is the upstream problem. And it is where the real automation opportunity sits in 2026.

What Changes When You Start From Requirements

The architecture shift that actually closes the coverage gap starts much earlier in the pipeline than most automation teams consider.
Instead of "requirement arrives → developer builds → QA manually creates coverage," the new model is "requirement arrives → AI evaluates and enhances → AI generates test cases → AI generates scripts → AI executes → results with traceability returned." The human does not design coverage. The human does not script automation. The human reviews requirements, approves test cases when necessary, and focuses on exploratory testing and quality strategy, the work that actually requires human judgment.

This is what requirement-driven autonomous testing means in practice. The requirement is the input. The executed test result is the output. AI owns everything in between.

The 5 Stages of a Requirement-to-Result Pipeline

Platforms like TestMax implement this model as a connected five-stage pipeline. Understanding each stage explains why the architecture works differently from traditional automation approaches.

Stage 1: Requirement Ingestion

The pipeline accepts requirements from wherever they live: Jira tickets, Azure DevOps work items, Word documents, PDFs, Excel files, or requirements authored directly in the platform. No reformatting is required. The requirement enters the system as it exists. This matters because one of the friction points in traditional QA automation is the translation step, converting a Jira ticket into a format that test tooling can work with. When ingestion is native, that step disappears.

Stage 2: Requirement Intelligence

Before any test generation begins, every requirement is evaluated by AI across five quality dimensions: clarity, completeness, consistency, testability, and correctness. This stage is the most underestimated in the entire pipeline. Poor requirements always produce poor tests. A requirement that says "the login form should work correctly" is not testable. A requirement that specifies valid credentials, invalid passwords, empty field behavior, account lockout thresholds, and session persistence rules is.
When AI catches ambiguity at the requirement stage, it costs almost nothing to fix. When that same ambiguity surfaces after automation has been built against it, it costs days. The requirement intelligence layer moves defect detection upstream to where it is cheapest. Requirements that fail quality review are flagged with specific improvement suggestions. AI offers rewrites. Nothing ambiguous proceeds to test generation.

Stage 3: AI Test Case Generation

Once a requirement passes quality review, the platform generates structured test cases automatically. These are not surface-level happy path scenarios but complete coverage across positive paths, negative paths, boundary conditions, and edge cases. For a single requirement, such as "users can reset their password via email verification," the generated coverage includes:

Valid email address submitted – verification email received
Invalid email format – appropriate error returned
Email address not registered – system response without revealing account existence
Verification link clicked – password reset flow initiated
Verification link expired – appropriate error with re-send option
New password does not meet policy requirements – specific validation messages
Successful reset – session handling, redirect behaviour

All of this is generated automatically from the requirement. No human designs the coverage strategy.

Stage 4: Automation Generation

Approved test cases are converted into executable Playwright scripts automatically: production-ready code with appropriate waits, assertions, and selector strategies, generated without a human writing a single line. This is the step that eliminates the scripting bottleneck. In traditional automation, scripting bandwidth is a hard ceiling on coverage growth. When the team can script 50 test cases per sprint, coverage grows at that rate regardless of how many requirements are produced. When scripts are generated automatically from approved test cases, that ceiling disappears.
Coverage can grow at the rate requirements are produced, not the rate engineers can write code.

Stage 5: Autonomous Execution and Evidence

AI agents execute the generated test suite through Playwright MCP. They manage environment setup, handle retries, capture logs, screenshots, and video per test, and return a complete traceability matrix linking every result to its source requirement. The output is not a pass/fail count. It is a complete evidence package suitable for audit, governance, and release decision-making, generated automatically from the requirements the team was already writing.

Why This Architecture Closes the Coverage Gap

The traditional automation model has a linear constraint: coverage grows proportionally to engineering effort. More requirements always mean more backlog because the human work required per requirement is roughly constant. The requirement-driven autonomous model removes the linear constraint. When AI handles test design, scripting, and execution per requirement, the engineering effort per requirement drops dramatically. Coverage can scale with the requirements themselves rather than with team headcount.

There are three concrete consequences:

Coverage lag is eliminated. When test generation takes minutes rather than days, new features can have automation in the same sprint they are built. The perpetual state of automation backlog, where coverage is always weeks behind the code it is supposed to validate, is a consequence of the manual model, not an inevitability.

Maintenance burden shifts. In traditional automation, 60 to 80 percent of automation engineering effort goes to maintaining existing scripts. When AI generates scripts from requirements, the maintenance responsibility belongs to the generation layer. UI changes that would previously break dozens of handwritten selectors are addressed at the generation stage.

Requirement quality improves as a side effect.
When every requirement must pass an AI quality evaluation before entering the test pipeline, the incentive to write precise, testable requirements increases. Teams that implement requirement-driven testing typically report improvement in requirement quality within two to three sprints, not because they trained their product managers differently, but because the pipeline now provides immediate, specific feedback on every requirement.

Integrating With Existing Workflows

A practical concern with any architectural change is migration cost. The requirement-driven autonomous model does not require replacing existing infrastructure. Generated Playwright scripts integrate directly into existing CI/CD pipelines. Teams running Jira or Azure DevOps connect those systems natively, so requirements flow in without manual re-entry. For teams using ATF or other existing test frameworks, the autonomous testing layer runs alongside what already exists rather than replacing it.

The practical starting point is a single sprint. Take the new requirements entering your backlog this week. Run them through a requirement-driven platform. Compare the test coverage produced (in time, scenario depth, and maintenance overhead) against what your team would have produced manually. The experiment answers the adoption question more convincingly than any benchmark.

The Architectural Question for 2026

The relevant question for QA teams in 2026 is not whether to use AI in testing. Almost every serious testing platform has added AI capabilities in some form. The question is: where in the pipeline is AI actually doing meaningful work?

At one end of the spectrum, AI heals broken selectors and suggests which tests to run. The human still reads requirements, designs coverage, writes scripts, and manages execution. AI makes individual tasks faster. At the other end, AI owns the pipeline from requirement evaluation through execution and evidence delivery. The human provides requirements and reviews results.
AI does everything in between. The teams that figure out where they sit on that spectrum and decide consciously which model their coverage goals require are the ones that will stop having the same conversation about automation backlogs next quarter.

By Waqar Hashmi
Good Data, Bad Metric: A Mutation Testing Pattern for Analytics Engineering

A dashboard can look completely correct, while the reporting it shows is wrong, and that makes it one of the most difficult failures to detect in analytics engineering because nothing visibly breaks. The pipeline runs on time, the warehouse table loads without errors, the scheduled checks pass, and the dashboard opens as expected, but the metric on the screen can still be wrong enough to trigger a long investigation. In many cases, the data itself is not the problem, because the issue sits inside the metric logic, where a filter may have been removed, a join may have changed the grain, a date field may have shifted from order_date to created_at, or a refund rule may have been missed.

This is the testing gap many analytics teams still carry. We test tables, schemas, uniqueness, relationships, accepted values, row counts, and source availability, and those checks matter, but a business metric is more than a table. It is a calculation wrapped in assumptions, and when those assumptions change quietly, the pipeline can stay green while the number becomes misleading.

Good Data Does Not Guarantee a Good Metric

Take a simple monthly revenue metric.

SQL
SELECT
    date_trunc('month', order_date) AS revenue_month,
    sum(order_amount) AS gross_revenue
FROM orders
WHERE order_status = 'completed'
GROUP BY 1;

This query looks safe because it is short, readable, and common, but it depends on several assumptions that are easy to overlook during normal development.
Metric component | Hidden assumption
order_date | Revenue belongs to the business event date
sum(order_amount) | Revenue is measured as money, not order count
order_status = 'completed' | Pending, cancelled, and failed orders should not count
Monthly grouping | Reporting uses calendar month boundaries
Source grain | One row in orders represents one order
No additional join | The calculation is not multiplied by another table

A standard test suite might check that order_id is unique, order_amount is not null, order_date exists, and the source table arrived within the expected load window, but those checks do not prove the revenue metric still means what the team agreed it should mean.

Now change the date field.

SQL
SELECT
    date_trunc('month', created_at) AS revenue_month,
    sum(order_amount) AS gross_revenue
FROM orders
WHERE order_status = 'completed'
GROUP BY 1;

The query still runs, the output still contains a month and a number, the dashboard still refreshes, and the schema still matches expectations, but the metric has changed. It now reports revenue by record creation date instead of order date, and while that difference may be small in some domains, it can distort reporting in systems where orders are delayed, imported, amended, or backfilled. Table tests can confirm that the ingredients exist, but they cannot always confirm that the recipe is still correct.

What Is Metric Mutation Testing?

Mutation testing is a known software testing technique where code is deliberately changed, and the test suite is expected to catch the change. If the modified version survives, the test suite may be too weak. Metric mutation testing applies the same idea to analytics engineering, but instead of mutating application code, we create deliberately wrong versions of business metrics and then run our checks to see whether those wrong versions fail.

The question becomes: Would our test suite catch this believable but incorrect metric?
A metric mutation should not be random damage, because the useful mutations are the realistic ones that engineers, analysts, or modeling layers could introduce during normal development.

Mutation | What changes | Why it matters
Remove a business filter | Includes cancelled, pending, or failed records | The number increases but still looks plausible
Swap the date field | Uses created_at instead of order_date | Reporting shifts between periods
Add a one-to-many join | Multiplies rows before aggregation | Revenue or counts become inflated
Remove distinct | Counts duplicate users or orders | Engagement metrics become overstated
Change a time window | Includes incomplete or future periods | Trend analysis becomes unreliable
Alter null handling | Converts missing values to zero | Unknown data becomes treated as real behaviour

The purpose is to test the strength of the analytics testing layer, because if a wrong metric survives, the team has found a blind spot before users find it.

Example: Mutating a Revenue Metric

Start with the intended version.

SQL
with revenue as (
    select
        date_trunc('month', order_date) as revenue_month,
        sum(order_amount) as gross_revenue
    from orders
    where order_status = 'completed'
    group by 1
)
select * from revenue;

Now, create a mutation by removing the status filter.

SQL
with revenue as (
    select
        date_trunc('month', order_date) as revenue_month,
        sum(order_amount) as gross_revenue
    from orders
    group by 1
)
select * from revenue;

This version includes all order statuses, and if canceled or failed orders still have an amount, the metric increases. The query does not fail, the model still builds, and the dashboard still works, yet the number is wrong. A metric behavior test should detect the issue.
SQL
with expected as (
    select
        date_trunc('month', order_date) as revenue_month,
        sum(order_amount) as expected_revenue
    from orders
    where order_status = 'completed'
    group by 1
),
reported as (
    select revenue_month, gross_revenue
    from metric_revenue_monthly
)
select
    r.revenue_month,
    r.gross_revenue,
    e.expected_revenue,
    abs(r.gross_revenue - e.expected_revenue) as difference
from reported r
join expected e on r.revenue_month = e.revenue_month
where abs(r.gross_revenue - e.expected_revenue) > 0.01;

This test is not asking whether the table is loaded or whether a column exists, because it is checking whether the reported number still matches the intended business definition.

Now consider a grain mutation.

SQL
SELECT
    date_trunc('month', o.order_date) AS revenue_month,
    sum(o.order_amount) AS gross_revenue
FROM orders o
JOIN order_items i ON o.order_id = i.order_id
WHERE o.order_status = 'completed'
GROUP BY 1;

This query can multiply order values when one order has multiple items, and the result may still look reasonable, especially if the increase is not extreme. A grain preservation test can expose this.

SQL
WITH metric_base AS (
    SELECT o.order_id, o.order_amount
    FROM orders o
    JOIN order_items i ON o.order_id = i.order_id
    WHERE o.order_status = 'completed'
)
SELECT order_id, count(*) AS rows_after_join
FROM metric_base
GROUP BY order_id
HAVING count(*) > 1;

If this returns rows, the metric base no longer has one row per order, and while that may be intentional in some models, it should not happen accidentally.

Metric Mutation Matrix

A practical way to start is to build a mutation matrix for each important metric, so the team can connect realistic failure modes with the tests that should detect them.
Metric area | Mutation to introduce | Test that should fail
Filter logic | Remove completed status condition | Reconciliation against completed-order revenue
Event time | Replace order_date with created_at | Period boundary comparison
Grain | Join order-level data to item-level rows | Grain preservation test
Aggregation | Replace sum() with count() | Expected range or reconciliation check
Distinct logic | Remove distinct from user count | Duplicate sensitivity test
Exclusions | Include test or internal accounts | Control-record exclusion test
Boundary | Include current incomplete month | Closed-period validation
Null handling | Convert missing values to zero | Null behaviour check

This matrix gives the testing strategy structure, because instead of adding random checks, each test is tied to a known failure mode.

For example, an active user metric has a different risk profile.

SQL
SELECT
    date_trunc('week', event_time) AS activity_week,
    count(distinct user_id) AS weekly_active_users
FROM product_events
WHERE event_name IN ('login', 'purchase', 'create_project')
  AND is_internal_user = false
GROUP BY 1;

Potential mutations include changing count(distinct user_id) to count(user_id), removing the internal-user exclusion, replacing event_time with loaded_at, or expanding the event filter to include every event type. A simple upper-bound test could catch some bad variants.

SQL
SELECT activity_week, weekly_active_users
FROM metric_weekly_active_users
WHERE weekly_active_users > (
    SELECT count(distinct user_id)
    FROM users
    WHERE is_internal_user = false
);

This test will not catch every possible mistake, but that is fine, because metric mutation testing is not about one perfect check. It is about making hidden failure modes visible enough that the team can improve the test layer deliberately.

Measuring Mutation Detection Rate

The strongest part of this pattern is that it creates a measurable signal. Instead of reporting how many tests exist, teams can report how many realistic wrong versions those tests catch.
Mutation Detection Rate = Mutations caught by tests / Total mutations introduced

A report might look like this.

Stage | Mutations introduced | Mutations caught | Detection rate
Existing table tests only | 20 | 8 | 40%
Added reconciliation checks | 20 | 14 | 70%
Added grain and boundary tests | 20 | 18 | 90%
Added metric behaviour tests | 20 | 19 | 95%

This is more useful than saying the project has 80 tests, because a large test suite can still miss the one logic change that matters. Mutation detection rate focuses on whether the tests catch realistic metric defects. The survived mutations are especially useful because they show exactly where the metric remains under-protected.

Survived mutation | What it reveals
created_at used instead of order_date | Event-time logic is not protected
Refunded orders included | Exclusion rules are not tested
distinct removed from user count | Duplicate sensitivity is weak
Current incomplete month included | Time boundary checks are missing

Each survived mutation becomes a new test requirement, which turns the exercise into a practical feedback loop rather than a testing vanity metric.

A Lightweight Implementation Pattern

This pattern does not need a full platform at the start, because a small implementation can use structured metric definitions, a mutation catalog, temporary models, and CI checks. A metric definition might look like this.

YAML
metric: gross_revenue
model: metric_revenue_monthly
grain: month
source: orders
event_date: order_date
aggregation: sum(order_amount)
filters:
  - order_status = 'completed'
exclusions:
  - test orders
  - refunded orders
expected_behaviour:
  - must reconcile to completed-order total
  - must not include future periods
  - must preserve order grain before aggregation

A mutation catalog can describe the failure modes.
YAML
mutations:
  - name: remove_completed_filter
    type: filter
    expected_result: fail_reconciliation
  - name: use_created_at_instead_of_order_date
    type: event_time
    expected_result: fail_period_boundary_check
  - name: duplicate_orders_with_item_join
    type: grain
    expected_result: fail_grain_check
  - name: include_refunded_orders
    type: exclusion
    expected_result: fail_control_record_check

This can run outside production, while mutated models can be created in a temporary schema, tested, reported, and then discarded.

Running Metric Mutation Tests in CI

For a dbt-style workflow, the CI process could look like this.

Step | Action
1 | Build the normal metric model
2 | Run standard dbt tests
3 | Generate mutated metric SQL into a temporary schema
4 | Run metric behaviour tests against each mutated version
5 | Expect each mutated version to fail at least one relevant test
6 | Record caught and survived mutations
7 | Fail or warn the build depending on policy

In early adoption, it may be better to warn rather than block, while critical metrics can move to stricter enforcement once the team understands the pattern and has tuned the mutation catalog.

Tiny Python Mutation Runner

A basic mutation generator can be small. This example mutates SQL strings directly, and although a production version would need safer parsing, templating, and warehouse execution, it shows the core idea.
Python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Mutation:
    name: str
    description: str
    apply: Callable[[str], str]  # takes the base SQL, returns the mutated SQL


def remove_completed_filter(sql: str) -> str:
    # Drops the status filter so non-completed orders leak into the metric.
    return sql.replace("where order_status = 'completed'", "")


def use_created_at(sql: str) -> str:
    # Swaps the event date for the record creation date.
    return sql.replace("order_date", "created_at")


def change_sum_to_count(sql: str) -> str:
    # Counts rows instead of summing revenue.
    return sql.replace("sum(order_amount)", "count(order_amount)")


base_sql = """
select
    date_trunc('month', order_date) as revenue_month,
    sum(order_amount) as gross_revenue
from orders
where order_status = 'completed'
group by 1
"""

mutations = [
    Mutation(
        name="remove_completed_filter",
        description="Includes non-completed orders",
        apply=remove_completed_filter,
    ),
    Mutation(
        name="use_created_at",
        description="Uses record creation date instead of order date",
        apply=use_created_at,
    ),
    Mutation(
        name="change_sum_to_count",
        description="Counts orders instead of summing revenue",
        apply=change_sum_to_count,
    ),
]

# Print each mutated query so it can be built and tested in a temporary schema.
for mutation in mutations:
    print(f"\n-- mutation: {mutation.name}")
    print(f"-- reason: {mutation.description}")
    print(mutation.apply(base_sql))

A simple report could look like this.

Plain Text
Metric: gross_revenue

remove_completed_filter      caught
use_created_at               survived
change_sum_to_count          caught
duplicate_order_join         caught
include_refunded_orders      survived

Detection rate: 3/5 = 60%

The survived mutations are not a failure of the idea, because they are the reason to run it in the first place. They show where the metric is under-protected and where the next test should be added.

Where This Fits in the Analytics Stack

Metric mutation testing does not replace existing checks, because it sits above them and tests whether the existing validation layer can catch believable logic mistakes.
| Layer | Main purpose |
| --- | --- |
| Source tests | Check raw input reliability |
| Model tests | Validate transformed structures |
| Relationship tests | Check entity integrity |
| Semantic definitions | Centralise metric meaning |
| Metric behaviour tests | Validate expected calculation behaviour |
| Metric mutation tests | Test whether the testing layer catches realistic logic errors |

This is especially useful when metrics are reused through dashboards, semantic layers, notebooks, reverse ETL jobs, APIs, or AI-assisted workflows. The more widely a metric is reused, the more important its definition becomes. A semantic layer can make a metric consistent everywhere, but if the metric logic is wrong, it also makes the wrong number consistent everywhere.

When Not to Use This

Metric mutation testing should not be applied blindly to every field and every dashboard card, because that would create noise and slow the team down without adding much protection. It is most useful for metrics that influence important reporting, operational decisions, compliance workflows, financial analysis, product measurement, or machine learning features.

| Good candidate | Poor candidate |
| --- | --- |
| Revenue | Low-usage vanity metric |
| Churn | Temporary exploration query |
| Active users | One-off analysis |
| Conversion rate | Internal debug count |
| SLA breach rate | Non-critical dashboard decoration |
| Retention | Draft metric still being defined |

This pattern also works best when the metric has a clear definition, because if nobody can agree on the grain, filters, date logic, or exclusions, mutation testing will expose the ambiguity but cannot resolve it alone.

Final Thoughts

A healthy pipeline tells you that data moved, a normal test suite tells you that the structure looks valid, and a stronger analytics testing layer tells you that the number still behaves like the metric it claims to be. Metric mutation testing adds one more question: If someone introduced a realistic logic mistake tomorrow, would our system catch it?
That question matters because many analytics failures do not look like failures at first. They look like ordinary numbers: the dashboard refreshes, the chart renders, and the table has rows. The issue only appears when someone realizes the calculation no longer means what everyone thought it meant. Good data can still produce a bad metric, and the next step for analytics engineering is not simply more tests, but better tests that protect the meaning of business numbers.
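As a rough sketch of how the detection-rate report shown earlier could be produced, the snippet below computes the rate from a mapping of mutation names to outcomes. The `detection_rate` helper and the `results` dictionary are illustrative assumptions, not part of dbt or any other tool; the outcomes simply mirror the sample gross_revenue report.

```python
from typing import Dict, List, Tuple


def detection_rate(results: Dict[str, bool]) -> Tuple[float, List[str]]:
    """Return (caught / total, names of survived mutations).

    `results` maps a mutation name to True when at least one test failed
    against the mutated model (caught) and False when none did (survived).
    """
    if not results:
        raise ValueError("no mutations were run")
    survived = [name for name, caught in results.items() if not caught]
    caught_count = len(results) - len(survived)
    return caught_count / len(results), survived


# Hypothetical outcomes, copied from the sample report for gross_revenue.
results = {
    "remove_completed_filter": True,
    "use_created_at": False,
    "change_sum_to_count": True,
    "duplicate_order_join": True,
    "include_refunded_orders": False,
}

rate, survived = detection_rate(results)
print(f"Detection rate: {rate:.0%}")
for name in survived:
    print(f"survived: {name}")
```

Each name in `survived` maps directly to a missing test, so the list can be used as a backlog as well as a score.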

By Prateek Arora
Testing AI-Infused Apps: A Dual-Layer Framework for AI Quality Assurance

AI-infused apps are different from traditional software. Apps that embed large language models, agents, retrieval-augmented generation (RAG), or tool-calling workflows bring their own characteristics. They combine deterministic code with probabilistic intelligence. This creates new failure modes that standard testing practices cannot fully address.

Engineering leaders, QA architects, platform teams, DevOps engineers, AI product owners, and reliability teams must adopt a dual testing strategy: rigorous software testing alongside continuous probabilistic evaluation of AI behavior. Production readiness depends on integrating both disciplines into a single, automated delivery pipeline.

In this article, I start by explaining why AI-infused apps fail differently and then present a two-layer testing framework. I go on to cover why contract tests and evaluation harnesses matter, why prompts should be treated as release artifacts, and why regression testing, especially in production, is important for such systems. The article concludes with a unifying testing strategy for AI-infused apps.

Why AI Apps Fail Differently

Software development was never fully predictable. While code itself may execute deterministically under controlled conditions, real-world software systems behave within dynamic environments shaped by users, infrastructure, integrations, networks, data quality, operational constraints, and evolving requirements. In other words, emergent behavior has always caused nondeterminism in software systems.

The introduction of AI-infused apps, however, adds another dimension of unpredictability. It all starts with the stochastic nature of foundation models. Even with the same input, outputs can vary due to temperature settings, model updates, prompt sensitivity, or data distribution shifts.
Modern AI workflows compound this complexity: a user query triggers prompt orchestration, retrieval from knowledge bases, agent reasoning loops, multiple tool calls to external APIs, safety guardrails, and structured output formatting.

AI-infused applications are not monolithic. They compose multiple components, each requiring distinct testing approaches:

  • Prompts and system instructions: The "code" that guides model behavior
  • Retrieval systems: Vector databases, embedding models, search relevance
  • Agent orchestration: Tool selection, reasoning chains, decision trees
  • Integration APIs: Authentication, rate limits, error handling, data transformation
  • Security controls: Input validation, output filtering, permission boundaries
  • Observability infrastructure: Logging, tracing, evaluation metrics

A failure in any layer can cascade. A prompt regression can cause increased tool misuse. Embedding model drift can reduce retrieval quality. A poorly validated API integration can leak sensitive data. Traditional software testing catches some of these. AI evaluation catches others. For production readiness, we need to consider both.

The Two-Layer Testing Framework

Successful AI system testing requires recognizing two fundamentally different quality dimensions. The conventional dimension covers traditional software testing, and the probabilistic dimension covers the evaluation of AI behavior.
The Two Layers of QA

| Testing Layer | Layer 1: Conventional Software Testing | Layer 2: Probabilistic AI Evaluation |
| --- | --- | --- |
| Focus | Traditional software components: APIs, databases, infrastructure, integrations, permissions, deployment mechanisms | AI-specific behavior: prompt effectiveness, reasoning quality, output appropriateness, agent decision-making |
| Testing Types | Unit tests: individual functions, utilities, data transformations; Integration tests: API contracts, service communication, database operations; Contract tests: tool interfaces, webhook payloads, third-party API schemas; E2E tests: authentication flows, permission boundaries, error handling; Infrastructure tests: deployment validation, scaling, failover; Performance tests: latency, throughput, resource utilization | Prompt evaluation: instruction following, tone consistency, safety adherence; Agent behavior tests: tool selection accuracy, reasoning coherence, task completion; Retrieval quality: relevance scoring, ranking accuracy, citation validation; Output validation: groundedness, factuality, formatting compliance; Reasoning assessment: logical coherence, step-by-step clarity, error recovery; Safety evaluation: harm prevention, bias detection, PII protection |
| Success Criteria | Binary pass/fail: a test either passes (assertion true) or fails (assertion false, exception thrown) | Threshold-based scoring: metrics are scored on a continuous scale (0.0-1.0) and must meet thresholds (e.g., safety_score ≥ 0.95) |
| Tooling | PyTest, JUnit, Jest (unit testing); Postman, Pact (contract testing); Selenium, Playwright (E2E); JMeter, Locust (load testing); Terraform validators (infrastructure) | LangSmith, LangGraph, Arize Phoenix (evaluation platforms); LLM-as-judge frameworks; embedding similarity metrics; human evaluation interfaces; golden dataset harnesses; rubric scoring systems |

Figure 1: The two layers of QA for AI-infused apps

Systems can pass software tests while failing AI quality expectations.
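To make the difference in success criteria concrete, here is a minimal sketch that puts a binary assertion next to a threshold gate. Apart from the `safety_score ≥ 0.95` example taken from the table, the metric names, scores, and thresholds are hypothetical; in practice the scores would come from an evaluation platform or an LLM-as-judge run.

```python
from typing import Dict, List

# Minimum acceptable score per metric. The groundedness threshold is an
# assumed value for illustration only.
THRESHOLDS: Dict[str, float] = {"safety_score": 0.95, "groundedness": 0.80}


def failed_metrics(scores: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the metrics that are missing or below their minimum threshold."""
    return [
        name
        for name, minimum in thresholds.items()
        if scores.get(name, 0.0) < minimum
    ]


# Layer 1 style: a binary check that either holds or raises.
status_code = 200
assert status_code == 200

# Layer 2 style: continuous scores compared against thresholds.
scores = {"safety_score": 0.97, "groundedness": 0.74}  # hypothetical run
failures = failed_metrics(scores, THRESHOLDS)
print(failures)  # ['groundedness']
```

A gate like this can return the failing metric names instead of raising, which lets a pipeline decide separately whether to warn or to block.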
AI systems must be flexible, adaptable, autonomous, evolving, unbiased, ethical, transparent, interpretable, explainable, and safe. Conventional QA may declare that an AI-infused app is healthy. However, AI failures may cause users to experience it as broken, as in the case below.

✅ All APIs return 200 OK
✅ Response times under 500ms
✅ No exceptions in logs
✅ Permission boundaries enforced
✅ Database queries optimized
✅ Infrastructure scales appropriately
❌ Agent selects wrong tools 30% of the time
❌ Retrieval returns irrelevant documents
❌ Responses ignore safety instructions
❌ Hallucination rate increased 15% since last deploy

Reliability Through Contract Testing and Evaluation Harnesses

AI agents interact with the world through tools: APIs they can call, databases they can query, and services they can invoke. Each tool represents a contract that must remain stable. Contract testing and evaluation harnesses are indispensable, especially when AI causes our tests to give different results on every run.

Contract Testing for AI Tools

When an agent calls a tool (like an API or a database function), the communication is essentially an integration point. We can use contract tests to enforce strict input/output validation at this boundary. With a schema-validation library (such as Pydantic), a parameter hallucinated by the LLM is rejected by validation before it reaches the production database.

Example: Our agent is tasked with calling get_user_balance(email: str). A contract test verifies that even if the LLM tries to pass an object or an array, the interface throws a validation error, preventing the agent from executing a malformed query.

Evaluation Harnesses

Just as software teams maintain test suites, AI teams need evaluation harnesses. These are systematic frameworks for measuring AI behavior quality. An evaluation harness is an automated framework that runs our application against a golden dataset.
This is a curated, versioned set of inputs and "ground truth" reference outputs. Rather than relying on manual spot-checking, these harnesses use LLM-as-a-Judge, in which a highly capable model acts as the evaluator for the production model. Key metrics include:

  • Groundedness: Does the response rely solely on the provided context?
  • Citation Validation: Does the response correctly link claims back to the retrieved sources?
  • Task Completion: Does the final output solve the user's underlying intent?

By automating these checks, we shift AI development towards an engineering process rather than a "vibes-based" set of activities.

Prompts Are Release Artifacts

Prompts are not just temporary text. Because they are a fundamental ingredient in how our AI system thinks, behaves, and makes decisions, we should treat them as code. Store them in Git, review changes, run automated tests on them, and keep old versions. This way, we can track what changed, catch problems early, roll back bad changes quickly, and prevent unexpected surprises for users.

  • Version Control: Prompts should exist as a versioned artifact in our source code repository.
  • Auditability: When a model starts behaving erratically, we should be able to roll back to the last known "good" prompt version instantly.
  • Regression Risk: Before deploying a new prompt, we should run it through the evaluation harness.

Two important issues that we want to address here are instruction drift and safety degradation. Instruction drift is when the AI system initially follows its core directives correctly and then incrementally stops adhering to them. Safety degradation is when the model becomes more susceptible to prompt injection.

Regression Testing in Production

When behavior can change even when no application code has been modified, regression testing is essential. Conventionally, code changes trigger regression testing. Here, we need to run our regression tests even without code changes.
Our regression suites should be executed continuously at regular intervals. AI systems depend on dynamic components such as prompts, models, embeddings, retrieval pipelines, external tools, and user interactions, all of which evolve continuously. AI systems drift over time due to:

  • Model updates from providers
  • Embedding model changes
  • Data distribution shifts
  • User behavior evolution
  • Tool API modifications
  • Corpus growth or changes

Regression testing in production helps detect behavioral drift by continuously measuring output quality. Safety compliance, task completion, and response consistency can also be tracked. With regression testing, teams can monitor operational signals such as escalation frequency, fallback usage, latency anomalies, and drops in evaluation scores. The crucial point here is to find such issues before users report major failures.

Since real user behavior is often more diverse and adversarial than test datasets, production validation becomes necessary to uncover edge cases that pre-release testing missed. Continuous regression testing in production is a mechanism that keeps AI systems aligned with user trust over time. Key metrics to track:

  • Escalation frequency: An increase suggests the AI can't handle queries
  • Fallback usage: "I don't know" responses rising
  • Latency spikes: Tool calls timing out, retrieval slowing
  • Evaluation score drops: Golden dataset performance declining
  • User feedback: Thumbs-down rates, explicit complaints
  • Tool error rates: API failures, permission denials increasing
  • Citation accuracy: Groundedness scores dropping
  • Safety violations: Harmful content detection rising

Unifying Testing Strategy

But how do we test all of the above, and most importantly, when and where? As the code is written, we need to test at a unit level. We also need contract tests, prompt evaluation, and integration tests. We need to evaluate prompts and AI behavior using golden datasets and scoring systems, and verify complete workflows through integration testing.
Our goal here is to be confident that both the traditional software components and the AI components behave correctly before deployment.

In a staging deployment, the system is tested in an environment that closely resembles production. Here, teams can validate infrastructure reliability, performance under load, scalability, and failover behavior. The overall behavior of AI agents under edge cases and safety stress tests can also be evaluated.

After staging, the application can move to a canary deployment, where only a small percentage of real users interact with the new version. Here, the system continuously monitors hallucination rates, safety violations, response consistency, latency, and tool-selection accuracy. If important metrics degrade beyond predefined thresholds, the system could automatically roll back to the previous stable version.

Finally, the system enters production monitoring. This is where evaluation becomes continuous. The application regularly checks for behavioral drift, retrieval quality degradation, and changing user behavior. Scheduled evaluations and monitoring signals can detect emerging reliability issues.

Figure 2: Unifying testing strategy for AI-infused apps

Wrapping Up

AI-infused applications represent a trend in software engineering. Conventional testing is necessary but insufficient. Production readiness requires two parallel disciplines: the first is software QA for APIs, infrastructure, and integrations, and the second is AI evaluation for prompts, agents, retrieval, and model behavior.

Organizations that treat these as separate concerns, delegating one to engineering and the other to data science, may struggle with quality issues. Those that integrate both into unified delivery pipelines can build AI systems that are reliable, maintainable, and trustworthy.
The path forward is clear:

  • Test tools like APIs: Contract tests, schema validation, permission boundaries
  • Evaluate prompts like code: Version control, regression checks, systematic evaluation
  • Monitor agents like services: Drift detection, quality metrics, automatic rollback
  • Integrate testing disciplines: One pipeline, automated gates, continuous validation

AI systems will fail in new ways. The question is whether we catch those failures or our customers catch them. A two-layer testing framework with a unifying testing strategy helps teams catch failures early, fix them systematically, and deliver AI applications that users can trust.
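The get_user_balance(email: str) contract discussed earlier can be sketched as follows. The article names Pydantic as one option; to keep this example dependency-free, it uses a hand-written check built on the standard library instead, so `validate_tool_call`, `ContractError`, `rejected`, and the simple email pattern are all illustrative assumptions rather than any library's API.

```python
import re
from typing import Any, Dict


class ContractError(ValueError):
    """Raised when a tool call does not match the tool's contract."""


# Deliberately simple pattern for illustration; not a full email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_tool_call(args: Dict[str, Any]) -> str:
    """Validate arguments for get_user_balance(email: str); return the email."""
    unexpected = set(args) - {"email"}
    if unexpected:
        raise ContractError(f"unexpected parameters: {sorted(unexpected)}")
    email = args.get("email")
    if not isinstance(email, str):
        raise ContractError("email must be a string")
    if not EMAIL_PATTERN.match(email):
        raise ContractError("email is not well formed")
    return email


def rejected(args: Dict[str, Any]) -> bool:
    """Contract-test helper: True when the call is blocked by validation."""
    try:
        validate_tool_call(args)
    except ContractError:
        return True
    return False


# A well-formed call passes; an array or a hallucinated parameter is blocked
# before any query would run.
print(validate_tool_call({"email": "user@example.com"}))
print(rejected({"email": ["a@example.com"]}))
print(rejected({"email": "a@example.com", "limit": 5}))
```

Rejecting unknown parameters, and not only wrong types, is a design choice here: it treats anything outside the declared contract as a possible hallucination.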

By Stelios Manioudakis DZone Core CORE

Top Testing, Tools, and Frameworks Experts


Kailash Pathak

Sr. QA Lead Manager,
3Pillar

Author ✦ Speaker ✦ Microsoft® Most Valuable Professional (MVP) || Grab My Book On Playwright https://lnkd.in/gpkGYTgG || Read My Blog qaautomationlabs.com/ || 2x AWS,PMI-ACP®,ITIL® PRINCE2 Practitioner® | ISTQB Certified || || Cypress || Playwright || Selenium | WebdriverIO | API Automation

Stelios Manioudakis

Lead Engineer,
Technical University of Crete

25+ years of experience in software engineering. Worked at Siemens and Atos as a software quality expert. Worked in the RPA domain with Softomotive for the acquisition by Microsoft. Currently working in the Technical University of Crete. Holds a PhD in Electrical, Electronic and Computer Engineering, University of Newcastle Upon Tyne (UK).

Faisal Khatri

Blogger, QA, Mentor, Trainer,
Freelancer

QA with 16+ years experience in Automation as well as Manual Testing. Passionate to learn new technologies. Open Source Contributor, Mentor and Trainer.

The Latest Testing, Tools, and Frameworks Topics

Selective Deployment in Azure Data Factory: A Practical Blueprint for Safer CI/CD
Implement selective deployment in Azure Data Factory to safely promote individual features without deploying the entire factory state
June 26, 2026
by Sauhard Bhatt
· 1,266 Views · 1 Like
REST-Assured Configuration and Specifications: Writing Maintainable API Tests
Learn how to use REST-Assured Configuration, Request Specifications, and Response Specifications to build maintainable API tests.
June 25, 2026
by Faisal Khatri DZone Core CORE
· 942 Views
Implementing Asynchronous Communication Between Microservices Using Kafka and Spring Boot
Kafka decouples services, buffers spikes, and routes failures to a DLT. Schemas are contracts; consumers must be idempotent.
June 24, 2026
by Mallikharjuna Manepalli
· 1,603 Views
I Built a VS Code Extension to Debug Azure AI Foundry Agents Without Leaving My Editor
Free VS Code extension for Azure AI Foundry agent traces into your editor as an interactive timeline — see tool calls, token costs, and conversation replays.
June 23, 2026
by Jubin Abhishek Soni DZone Core CORE
· 982 Views
Testing Strategies for Web Development Code Generated by LLMs
LLMs can quickly generate web application code, but AI-written code may contain security vulnerabilities. This article reviews testing methods for LLM systems.
June 19, 2026
by Sandesh Basrur
· 1,590 Views · 1 Like
On-Device Debugging and JUnit 5
A walk-through of the new JDWP-based on-device debugging pipeline for ParparVM iOS apps and Android apps, with a step-by-step IntelliJ tutorial for each.
June 17, 2026
by Shai Almog DZone Core CORE
· 1,680 Views · 1 Like
Testing Is Not About Finding Bugs
Finding bugs is what testing produces; understanding quality is why it exists. QA's future belongs to those who understand products, customers, and risks, not just bugs.
June 17, 2026
by Abhinav Garg
· 1,269 Views · 2 Likes
Getting Started With GitHub Copilot CLI for Coding Tasks
This blog explores GitHub Copilot CLI, a terminal-based AI coding assistant that integrates with GitHub Copilot for users with a subscription.
June 16, 2026
by Gunter Rotsaert DZone Core CORE
· 1,255 Views
Building a RAG-Powered Bug Triage Agent With AWS Bedrock and OpenSearch k-NN
Learn how a RAG-powered bug triage agent uses AWS Bedrock, OpenSearch, and dynamic scoring to automate crash analysis and routing.
June 9, 2026
by Rajasekhar sunkara
· 1,218 Views
Frame Buffer Hashing for Visual Regression on Embedded Devices
Learn how frame buffer hashing reduced visual regression storage from 18GB to 19KB while speeding up CI and eliminating flaky image diffs.
June 9, 2026
by Rajasekhar sunkara
· 739 Views
Amazon Quick: AWS's Agentic Workspace, Explained for Engineers
A technical deep dive into Amazon Quick — how it works, how it connects to your tools via MCP, and where it sits in the AWS agent stack.
June 9, 2026
by Jubin Abhishek Soni DZone Core CORE
· 2,301 Views
How to Interpret the Number of Spring ApplicationContexts in Integration Tests
When optimizing Spring Boot integration tests, developers often focus on obvious metrics, but they do not always explain why an integration test suite is slow.
June 8, 2026
by Constantin Kwiatkowski
· 1,421 Views
Why Your Test Automation Is Always Behind the Code And the Architecture That Fixes It
Most QA teams are stuck in a manual scripting loop. Here's the requirement-driven architecture that eliminates the coverage gap permanently.
June 5, 2026
by Waqar Hashmi
· 2,239 Views
Good Data, Bad Metric: A Mutation Testing Pattern for Analytics Engineering
A mutation testing pattern for analytics metrics that checks if validation catches realistic business logic errors early.
June 4, 2026
by Prateek Arora
· 3,353 Views · 1 Like
Testing AI-Infused Apps: A Dual-Layer Framework for AI Quality Assurance
Reliable AI delivery isn't either/or—it's both/and. Test conventionally for functionality. Evaluate probabilistically for quality. Deploy with dual-discipline confidence.
June 4, 2026
by Stelios Manioudakis DZone Core CORE
· 3,502 Views · 2 Likes
Build a GitHub Slack Bot With AWS Bedrock and MCP, Part 2
Build a Slack bot using AWS Bedrock and MCP to answer GitHub questions. Learn setup, architecture, and how to extend it with new tools and data sources.
June 4, 2026
by Sangharsh Agarwal
· 2,612 Views
Compliance Automated Standard Solution (COMPASS), Part 11: Compliance as Code, the OSCAL MCP Server Way
How AI-native tooling is finally closing the loop between compliance personas and OSCAL artifacts with an MCP-standardized, AI-agent-ready interface.
June 4, 2026
by Yuji Watanabe
· 2,411 Views
Build a GitHub Slack Bot With AWS Bedrock and MCP, Part 1
Building a Slack bot with traditional APIs led to 400 lines of code. Using MCP and AWS Bedrock reduced complexity, enabling scalable, tool-driven automation.
June 3, 2026
by Sangharsh Agarwal
· 2,869 Views · 2 Likes
Your AI Agent Tests Are Passing, But Your Agent Is Still Broken
How to test AI agents that call tools — five patterns using traces and behavior contracts to catch bugs your current tests miss.
May 28, 2026
by Biresh Patel
· 2,606 Views
Setting Up a Data Catalog With Azure Purview and Collibra: What Three Attempts Taught Me
Setting up a data catalog isn’t just a tool problem. My work with Azure Purview and Collibra showed success depends on governance, metadata, and adoption.
May 27, 2026
by Kuladeep Sandra
· 4,169 Views