Technically speaking, you can’t be sure you are fixing the problem unless you can run through the same steps, see the problem happen yourself, fix it, and then run through the same steps and make sure that the problem went away. If you can’t reproduce it, then you are only guessing at what’s wrong, and that means you are only guessing that your fix is going to work.
But let’s face it – it’s not always practical or even possible to reproduce a problem. Lots of bug reports don’t include enough information for you to understand what the hell the problem actually was, never mind what was going on when the problem occurred – especially bug reports from the field. Rahul Premraj and Thomas Zimmermann found in The Art of Collecting Bug Reports (from the book Making Software), that the two most important factors in determining whether a bug report will get fixed or not are:
- Is the description well-written, can the programmer understand what was wrong or why the customer thought something was wrong?
- Does it include steps to reproduce the problem, even basic information about what they were doing when the problem happened?
There are other cases where you have enough information, but don’t have the tools or expertise to reproduce a problem – for example, when a pen tester has found a security bug using specialist tools that you don’t have or don’t understand how to use.
Sometimes you can fix a problem without being able to see it happen in front of you, come up with a theory on your own, trusting your gut – especially if this is code that you recently worked on. But reproducing the problem first gives you the confidence that you aren’t wasting your time and that you actually fixed the right issue. Trying to reproduce the problem should almost always be your first step.
What’s involved in reproducing a bug?
What you want to do is to find, as quickly as possible, a simple test that consistently shows the problem, so that you can then run a set of experiments, trace through the code, isolate what’s wrong, and prove that it went away after you fixed the code.
The best explanation that I’ve found of how to reproduce a bug is in Debug It! where Paul Butcher patiently explains the pre-conditions (identifying the differences between your test environment and the customer’s environment, and trying to control as many of them as possible), and then how to walk backwards from the error to recreate the conditions required to make the problem happen again. Butcher is confident that if you take a methodical approach, you will (almost) always be able to reproduce the problem successfully.
In Why Programs Fail: A guide to Systematic Debugging, Andreas Zeller, a German Comp Sci professor, explains that it’s not enough just to make the problem happen again. Your goal is to come up with the simplest set of circumstances that will trigger the problem – the smallest set of data and dependencies, the simplest and most efficient test(s) with the fewest variables, the shortest path to making the problem happen. You need to understand what is not relevant to the problem, what’s just noise that adds to the cost and time of debugging and testing – and get rid of it. You do this using binary techniques to slice up the input data set, narrowing in on the data and other variables that you actually need, repeating this until the problem starts to become clear.
Code Complete’s chapter on Debugging is another good guide on how to reproduce a problem following a set of iterative steps, and how to narrow in on the simplest and most useful set of test conditions required to make the problem happen; as well as common places to look for bugs: checking for code that has been changed recently, code that has a history of other bugs, code that is difficult to understand (if you find it hard to understand, there’s a good chance that the programmers who worked on it before you did too).
One of the most efficient ways to reproduce a problem, especially in server code, is by automatically replaying the events that led up to the problem. To do this you’ll need to capture a time-sequenced record of what happened, usually from an audit log, and a driver to read and play the events against the system. And for this to work properly, the behavior of the system needs to be deterministic – given the same set of inputs in the same sequence, the same results will occur each time. Otherwise you’ll have to replay the logs over and over and hope for the right set of circumstances to occur again.
On one system that I worked on, the back-end engine was a deterministic state machine designed specifically to support replay. All of the data and events, including configuration and control data and timer events, were recorded in an inbound event log that we could replay. There were no random factors or unpredictable external events – the behavior of the system could always be recreated exactly by replaying the log, making it easy to reproduce bugs from the field. It was a beautiful thing, but most code isn’t designed to support replay in this way.
Recent research in virtual machine technology has led to the development of replay tools to snapshot and replay events in a virtual machine. VMWare Workstation, for example, included a cool replay debugging facility for C/C++ programmers which was “guaranteed to have instruction-by-instruction identical behavior each time.” Unfortunately, this was an expensive thing to make work, and it was dropped in version 8, at the end of last year.
Fuzzing and Randomness
If the problem is non-deterministic, or you can't come up with the right set of inputs, one approach to try is to simulate random data inputs and watch to see what happens - hoping to happen on a set of input variables that will trigger the problem. This is called fuzzing. Fuzzing is a brute force testing technique that is used to uncover data validation weaknesses that can cause reliability and security problems. It's effective at finding bugs, but it’s a terribly inefficient way to reproduce a specific problem.
First you need to setup something to fuzz the inputs (this is easy if a program is reading from a file, or a web form – there are fuzzing tools to help with this – but a hassle if you need to write your own smart protocol fuzzer to test against internal APIs). Then you need time to run through all of the tests (with mutation fuzzing, you may need to run tens of thousands or hundreds of thousands of tests to get enough interesting combinations) and more time to sift through and review all of the test results and understand any problems that are found.
Through fuzzing you will get new information about the system to help you identity problem areas in the code, and maybe find new bugs, but you may not end up any closer to fixing the problem that you started on.
Reproducing problems, especially when you are working from a bad bug report (“the system was running fine all day, then it crashed… the error said something about a null pointer I think?”) can be a serious time sink. But what if you can’t reproduce the problem at all? Let’s look at that next…