Accelerating Debugging in Integration Testing: An Efficient Search-Based Workflow for Impact Localization
Parallelized 5-Point Adaptive Search boosts debugging in CI/CD by rapidly finding failing builds using parallelism and early test result analysis.
The Problem: Debugging at Scale
With frequent software releases, one of the challenges faced in software debugging is localizing potential impact-causing changes. However, testing every change one by one is impractical, especially when dealing with a large set of changes over time.
Throughout this article, I refer to a group of commits or changes as a "build." Each build has a number associated with it.
Although the system-level (full) test suite runs less frequently due to resource constraints, it can reveal what we call "regressions." A regression is an issue present in the current build but absent from, say, a customer-shipped version of the software. In other words, as software evolves, integration tests catch regressions associated with changes introduced during development.
In intensive work environments, unit tests are used much more frequently for quicker turnaround and qualification. As a result, subtle bugs associated with changes can slip through, only to be caught later in complete end-to-end testing across the entire software stack.
To isolate the offending change(s) so engineers can investigate further, tests must be run on each intermediate build after the last known passing version to find the first occurrence of the issue. However, this poses three key challenges:
- Resource Constraints: Running full test suites on every build is expensive in terms of compute and time. Since tests run on devices, device pool availability is also a constraint.
- Time Complexity: Testing every build until the failing one is found can take hours, days, or even weeks, depending on the available time and resources.
- Delayed Feedback: Engineers need a fairly quick turnaround to diagnose and resolve regressions during code convergence (the phase when issues are investigated rather than new features developed), but slow debugging cycles hinder development velocity.
In this article, I introduce an optimized parallelized 5-point adaptive search, a novel debugging workflow that significantly improves the localization of failure-inducing builds while minimizing test execution overhead. By leveraging multi-point search and parallel execution on devices used in testing, this method accelerates regression detection compared to binary search.
In terms of resources to execute tests, we have access to multiple machines or devices where the suite of tests can run. You can think of this as a pool of devices spanning across hardware configurations, used for a wider testing coverage.
Real-World Impact
Some observed results of using the approach are as follows:
- Improved turnaround: The search-based method reduced debugging time by 70-80%, often requiring just 3-5 range-search executions to pinpoint failures; previously, debugging regressions across 50-100 builds took multiple days.
- Quicker communication between testers and developers when localizing issues.
- Quicker actions: Engineers could quickly file actionable bug reports, inspecting the commits that might have led to test failures, thereby reducing overall development-cycle delays.
Applicability of the Approach
This approach is most useful for automation engineers assisting developers in a fast-moving development pipeline, where changes flow continuously into a release and issues caught in system-level tests must be traced back to when they were introduced.
The first question posed to test engineers is usually "Can you tell us the first occurrence of the issue?" or "When did this start happening?" This is not an easy question to answer, but it is possible to localize when the issue was introduced during the development phase.
Therefore, in my attempt to answer this question, the search method is explained in the next section.
Also, this approach works when tests were passing until they recently started failing, similar to the precondition of a binary search. If pass/fail results have been inconsistent (flaky), the method does not apply directly; it would first need refinements to narrow the search range before being employed.
Challenges in the Test Environment
- Test execution is asynchronous: Some tests start later because of the device setup time needed to bring a device up and make it ready for use.
- Limited device pool: Parallel test execution depends on available resources.
- Redundant test executions must be avoided: A test already run on a build should not be repeated.
Approach: Parallelized 5-Point Adaptive Search for Build Localization
Consider a situation in which an integration test most recently ran and failed on build Y, but was passing on an earlier stable build (for example, a customer-shipped build) called C. This means that some software change introduced between C and Y caused the failure we now observe.
Step 1: Initial Build Selection
- Identify C (last known passing build) and Y (first known failing build).
- Select 5 evenly spaced builds between C and Y: X1, X2, X3 (= Xmid, the middle of these builds), X4, X5.
- Kick off test runs on these 5 builds in parallel (subject to device availability). The constant 5 is an arbitrary choice; it depends on the resources available in the device pool.
- Store these builds in a cache to avoid redundant executions in future iterations on the same builds.
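As a concrete sketch, the probe selection in Step 1 could look like the following. This is a minimal illustration; the helper name `pick_probe_builds` and the assumption that build numbers are consecutive integers are mine, not part of any specific pipeline:

```python
def pick_probe_builds(last_pass: int, first_fail: int, k: int = 5) -> list[int]:
    """Pick up to k evenly spaced build numbers strictly between last_pass
    and first_fail (assumes build numbers are consecutive integers)."""
    candidates = list(range(last_pass + 1, first_fail))
    n = len(candidates)
    if n <= k:
        return candidates  # few builds left: just test them all
    # Evenly spaced interior indices, leaving room at both ends of the range.
    return [candidates[(i + 1) * n // (k + 1)] for i in range(k)]
```

For example, with last passing build 0 and first failing build 101, this picks builds 17, 34, 51, 67, and 84, splitting the range into roughly equal segments.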
Step 2: Dynamic Execution Tracking and Early Stopping
- Monitor these executions in real time, tracking two critical builds:
  - Earliest Failing Build (EFB): the earliest build among the five whose tests have failed.
  - Latest Passing Build (LPB): the latest build among the five whose tests have passed.
- Set a timeout threshold (T seconds): after T seconds, take the earliest failing and latest passing builds discovered so far and proceed to the next stage below. If none of the runs has completed yet, keep waiting and polling; as soon as either the EFB or the LPB is known, we can proceed.
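The tracking-with-timeout logic above can be sketched like this. It is illustrative only: `run_test` is a stand-in stub that simulates a regression landing at a hypothetical build 42, in place of a real device test run:

```python
import concurrent.futures as cf

FIRST_BAD = 42  # hypothetical build where the regression landed

def run_test(build: int) -> bool:
    """Stub for a real device test run: passes on builds before the regression."""
    return build < FIRST_BAD

def probe(builds: list[int], timeout_s: float = 5.0):
    """Run tests on `builds` in parallel; return the latest passing build (LPB)
    and earliest failing build (EFB) observed before the timeout expires."""
    lpb, efb = None, None
    with cf.ThreadPoolExecutor(max_workers=len(builds)) as pool:
        futures = {pool.submit(run_test, b): b for b in builds}
        try:
            for fut in cf.as_completed(futures, timeout=timeout_s):
                build = futures[fut]
                if fut.result():
                    lpb = build if lpb is None else max(lpb, build)
                else:
                    efb = build if efb is None else min(efb, build)
        except cf.TimeoutError:
            pass  # deadline hit: proceed with the partial results gathered so far
    return lpb, efb
```

On the example probes 17, 34, 51, 67, and 84, this reports build 34 as the LPB and build 51 as the EFB, which is exactly the input the next refinement step needs.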
Step 3: Refining the Search Range Until Convergence
- If an EFB is found, we shift the search to the left, before the EFB, because our objective is to find the first build on which the test fails. We also record the observed EFB on each iteration as we repeat the process within the remaining range; the ultimate goal is the earliest of all observed EFBs.
- If instead an LPB is found, we shift the search to the right, after the LPB, because tests have been passing up to the LPB. So we eliminate everything up to and including that build and repeat the process within the remaining range.
- If fewer than five builds remain, we can simply test all remaining builds.
- Repeat the process; on convergence, we will have found the first build with the failing test, and the build just before it is the last build with the passing test.
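Putting the three steps together, the whole refinement loop might be sketched as follows. This is a sequential simulation of what would be parallel device runs; the consecutive build numbering, the `run_test` stub, and the regression at build 42 are all illustrative assumptions:

```python
def locate_first_failure(last_pass: int, first_fail: int, k: int = 5) -> int:
    """Narrow (last_pass, first_fail) until the first failing build is found."""
    cache: dict[int, bool] = {}  # build -> pass/fail; avoids redundant reruns

    def run_test(build: int) -> bool:
        # Stub for a real device run; simulated regression lands at build 42.
        return build < 42

    def test(build: int) -> bool:
        if build not in cache:
            cache[build] = run_test(build)
        return cache[build]

    while first_fail - last_pass > 1:
        span = first_fail - last_pass - 1  # builds still in question
        if span <= k:
            probes = list(range(last_pass + 1, first_fail))
        else:
            step = (span + 1) // (k + 1)
            probes = [last_pass + (i + 1) * step for i in range(k)]
        # In production, these probes would run in parallel on the device pool.
        for b in probes:
            if test(b):
                last_pass = max(last_pass, b)    # new latest passing build
            else:
                first_fail = min(first_fail, b)  # new earliest failing build
    return first_fail
```

With 100 candidate builds and the regression at build 42, the loop converges in three iterations: probes 16-80 narrow the range to (32, 48), a second pass narrows it to (40, 42), and a final single probe confirms build 42 as the first failure.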
Algorithmic Analysis: Best and Worst Case Performance
| Search Method | Search Space Reduction per Step | Best Case Complexity | Worst Case Complexity |
|---|---|---|---|
| Sequential Search | 1 build at a time | O(N) | O(N) |
| Binary Search | N/2 per step | O(log2 N) | O(log2 N) |
| 5-Point Parallel Adaptive Search | N/5 per step (best case), N/2 (worst case) | O(log5 N) | O(log2 N) |
Best Case Scenario: O(log5 N)
If the first failure (EFB) or last pass (LPB) is detected at the start or end of the selected builds, the search range shrinks faster than with binary search, achieving O(log5 N). Although this is only a constant-factor improvement in asymptotic terms, it is a useful speedup when parallelism and multiple device resources are available.
Worst Case Scenario: O(log2 N)
If failures appear near the midpoint, the method behaves like a binary search, halving the search space each step. Even in this case, it performs no worse than binary search.
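To make the constant-factor difference concrete, here is a quick comparison of best-case iteration counts for a hypothetical backlog of 1,000 builds:

```python
import math

N = 1000  # hypothetical number of builds between last pass and first fail
binary_steps = math.ceil(math.log2(N))        # halve the range each step
five_point_steps = math.ceil(math.log(N, 5))  # best-case 5x reduction per step
print(binary_steps, five_point_steps)  # prints "10 5"
```

Each 5-point iteration runs five tests instead of one, so the wall-clock savings depend on the device pool absorbing those runs in parallel.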
Conclusion
The Parallelized 5-Point Adaptive Search method improves debugging efficiency by:
- Utilizing parallel test execution across multiple builds.
- Reducing search space faster than binary search.
- Implementing early stopping based on real-time test results.
- Handling variable test start times and resource constraints.
This method is particularly useful for large-scale CI/CD test automation leveraging devices for testing, and in OS-level integration testing, where finding the first failing build quickly saves critical debugging time and engineering effort. It helps to investigate issues quickly and apply fixes in the pipeline.