A tester runs a test and it fails. She reports the error with an accurate description of how to reproduce it. However, the same test passes on the developer’s machine. The cause may be a different environment, a synchronization problem in multithreaded code, or a mistake by the developer, such as accidentally modifying the input. Unfortunately, the developer usually closes the report as NR (non-reproducible) until the tester can make it reproducible.
However, there is a solution to this and similar problems. The key is using standards. A standard is something against which other things are compared. A well-known example is the international standard metre, a standard for measuring length. Another good example is regression testing: testers keep expected outputs as standards, and whenever a code change breaks existing behavior, running the regression tests reveals the problem.
An execution can also be a standard. If a test fails for the tester but passes on the developer’s machine, record the execution trace of both runs; the passing execution can then be treated as the standard. The traces of a failing and a passing execution must differ, or else the results would be the same. The difference points toward the problem: by studying it you can find the defect, or at least come up with a much better hypothesis.
The first step is to record both execution traces; the second is to compare them. A simple comparison is based on coverage, i.e. the order of the execution steps is ignored. For any one statement there are three possible outcomes:
1. Both traces contain the statement
2. Only the first trace contains the statement
3. Only the second trace contains the statement
All three cases can be colored in the same source code, which makes them easy to overview and understand. Bug hunting starts from the differences, by investigating the reason for each. It is reasonable to start with the difference that occurred earliest in the execution: if the defect causes this difference, then the bug is located closest to it.
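The classification above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Jidebug is implemented; the function names and the statement identifiers such as `list.py:20` are hypothetical, and each trace is assumed to be a list of statement identifiers in execution order.

```python
def classify_statements(first_trace, second_trace):
    """Coverage-level comparison: execution order is ignored."""
    first, second = set(first_trace), set(second_trace)
    return {
        "both": first & second,         # case 1: both traces contain it
        "only_first": first - second,   # case 2: only the first trace
        "only_second": second - first,  # case 3: only the second trace
    }


def earliest_unique(trace, other_trace):
    """Return (step, statement) for the earliest executed statement of
    `trace` that never occurs in `other_trace`, or None if there is none."""
    other = set(other_trace)
    for step, statement in enumerate(trace):
        if statement not in other:
            return step, statement
    return None


# Hypothetical traces: the failing run and the passing run take
# different branches in the middle of the execution.
failing_trace = ["list.py:10", "list.py:11", "list.py:20", "list.py:21", "list.py:40"]
passing_trace = ["list.py:10", "list.py:11", "list.py:30", "list.py:31", "list.py:40"]

classes = classify_statements(failing_trace, passing_trace)
first_diff = earliest_unique(failing_trace, passing_trace)

print(sorted(classes["only_first"]))   # ['list.py:20', 'list.py:21']
print(sorted(classes["only_second"]))  # ['list.py:30', 'list.py:31']
print(first_diff)                      # (2, 'list.py:20')
```

In this toy case the investigation would start at `list.py:20`, the earliest statement executed only by the failing run.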
Fault types the method works for
This method can also be applied to cases other than non-reproducible bugs, where the inputs are identical. Execution comparison needs two execution traces: the standard, which passes, and the one that fails. The other requirement is a pair of inputs for which the executions should behave identically but do not. For example, a file should be written to disk for any acceptable file name. If one file name results in a failure while another does not, the comparison method can be applied. In practice, non-reproducible bugs make up about 5-14% of all bugs according to case studies. The share of other bugs that this method can reveal is currently unknown, but based on our and our users’ experience it is probably higher.
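To make the two-input case concrete, here is a minimal sketch that records which lines of one function run for a passing and a failing input, using Python's standard `sys.settrace` hook. The helper `record_lines` and the toy function `normalize_file_name`, including its deliberately planted defect, are assumptions made up for illustration; a real tool would record far more than one function.

```python
import sys


def record_lines(func, *args):
    """Run func(*args) and return (line offsets executed inside func, outcome).
    Offsets are relative to the function's `def` line."""
    code = func.__code__
    executed = []

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace any other function
        if event == "line":
            executed.append(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        outcome = func(*args)
    except Exception as exc:  # keep the trace even if the run fails
        outcome = exc
    finally:
        sys.settrace(previous)
    return executed, outcome


def normalize_file_name(name):
    if "-" in name:
        name = name.split("-")[0]  # planted defect: drops text after a hyphen
    return name + ".txt"


passing_lines, passing_result = record_lines(normalize_file_name, "report")
failing_lines, failing_result = record_lines(normalize_file_name, "year-end")

# Statements covered only by the failing run are the place to start looking.
only_failing = set(failing_lines) - set(passing_lines)
print(passing_result, failing_result)  # report.txt year.txt
print(sorted(only_failing))            # [2]
```

Offset 2 is the line with the planted defect: it is the only statement the failing run covers and the passing run does not.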
The significant advantage of the method is that it can find faults that traditional methods cannot find, or can find only after days of work. It is clearly an active debugging method (read http://java.dzone.com/articles/debugging-step-step-active-and), since it significantly reduces the code to be investigated. The disadvantage is that long execution paths may eliminate the differences, because over enough steps both runs may end up covering the same statements. The solution is either to use executions that are as short as possible or to take execution order into account.
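The following sketch shows why execution order can matter. Two hypothetical traces cover exactly the same statements, so a coverage-level comparison sees no difference, while a step-by-step comparison still finds the point where they diverge. The function name `first_divergence` and the statement labels are illustrative assumptions.

```python
from itertools import zip_longest


def first_divergence(passing_trace, failing_trace):
    """Order-aware comparison: return (step, passing_stmt, failing_stmt) for
    the first step at which the traces differ, or None if they are identical.
    A trace that ends early is padded with None."""
    pairs = zip_longest(passing_trace, failing_trace)
    for step, (expected, actual) in enumerate(pairs):
        if expected != actual:
            return step, expected, actual
    return None


# Hypothetical traces with identical coverage but different order.
passing_trace = ["open", "write", "flush", "close"]
failing_trace = ["open", "flush", "write", "close"]

same_coverage = set(passing_trace) == set(failing_trace)
divergence = first_divergence(passing_trace, failing_trace)

print(same_coverage)  # True
print(divergence)     # (1, 'write', 'flush')
```

The trade-off is that an order-aware comparison is stricter: harmless reordering, for example between threads, also shows up as a divergence and has to be filtered out by the person reading the result.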
Let us consider an example taken from the Code Complete book by Steve McConnell. The specification and the names are modified a bit.
The program lists the employees and their salaries in alphabetical order. There were four employees, and we have just added a fifth, whose name is Gabriel Green-Scott. Then we list the result as below:
|Green-Scott, Gabriel ||$7950|
|McConnell, May ||$7200 |
|Perry, Kelly ||$8800|
|Scott, Sally ||$8500|
We can see that the result is faulty, since Gabriel Green-Scott is erroneously listed after Adam Hill. However, if we generate the list again, the result is correct. Thus we have two executions: one fails and the other passes, but both should behave identically. This is a perfect candidate for Comparison Debugging. Studying the differences, we realize that for some reason two different sorting algorithms were executed. In contrast to Steve McConnell’s original solution, we do not even need to recognize the underlying problem, namely the faulty sorting of hyphenated names.
Of course, the bug could also have been found with traditional debugging or with the hypothesis described in Code Complete, but this solution is much easier and faster.
Our tool, Jidebug, is based on coverage-level execution comparison. In many cases the tool has helped detect very tricky errors, most of which were reproducible. Our experience is that in several cases this approach is sufficient. To learn more, visit our web site: jidebug.com, or try Jidebug for free: jidebug.com/download