Learning From Your Bugs
Learning From Your Bugs
How often do you find a category of bug crops up again and again yet you still can't seem to learn from your mistakes? “Errors are rich in information.” Here is a proven methodology to help you consciously attempt to learn from your bugs.
Join the DZone community and get the full member experience.Join For Free
Maintain Application Performance with real-time monitoring and instrumentation for any application. Learn More!
Bugs are great learning opportunities. So how do we make sure we learn as much as possible from the bugs we fix? A method I have used for more than 13 years now is to write down a short description of the bug, the fix, and the lessons I learned.
Way back in 2002, I came across a blog post (that I unfortunately can’t find again) that described this method. I have used it ever since, and I believe it has helped me become a better software developer.
Every time I fix a particularly tricky or interesting bug, I take a few minutes to write down some facts about it. Here is an example of a typical entry:
Date: 2004-08-17 Symptom: Infinite loop when decoding Q.931 message Cause: When an unknown element id is found in a Q.931 message, we try to skip it by reading the length, and advancing the pos pointer that many bytes. However, in this case the length was zero, causing us to try to skip the same element id over and over. How found: This happened during parsing of a setup message taken from an Ethereal trace from Nortel. Their message was 1016 bytes long (it included a lot of fast start elements), but our MSG_MAX_LEN was 1000. Normally we then receive a truncated message from common/Communication.cxx, but now, when fed directly in to be parsed, memory past the end of the array was accessed, and it happened to be zero, exposing this problem. To find it, I just added a few print outs in the q931 parsing code. But it was lucky that the data happened to be zero. Fix: If the length given is zero, set it to one. This way we always move forward. Fixed in file(s): callh/q931_msg.cxx Caused by me: Yes Time taken to resolve bug: 1 hour Lessons: Trusted the data received in an incoming message. It's not just giving huge numbers that can cause problems. Indicating a length of zero could be just as bad.
How It Works
I have a plain text file called bugs.txt. At the top of the file is a template with all the headings, but no information filled in. When I add a new entry, I copy the template section, and paste it in just below the template. Then I go through and fill in the information. Most of the fields shown in the example above should be self-explanatory.
It’s important to note that this is not a bug tracker. And I don’t add all bugs I fix. For example, if I just forgot to add a statement in the code, and as soon as I try it I realize what the problem is, I don’t add an entry. It is only when the bug, or the fix, or the debugging of it was particularly interesting that I add a new entry. Usually, I caused the bug. But occasionally, especially when hunting down a difficult bug over many days, I will add an entry even if I didn’t cause it.
Once a bug is fixed, my first reflex is to breathe a sigh of relief and move on. However, I try to write the entry immediately after it has been fixed. That’s when all the details are still fresh in my mind. Waiting makes it much harder to remember exactly what happened (or I forget to write an entry at all).
So far, I have 194 entries, which comes to about one new entry a month on average. The most important section is Lessons. This requires some introspection. What made this bug special? I have found that the lessons usually come in three different areas:
Coding. What mistakes did I make in the code? Did I forget an else-part? Was there a system call that failed, but the response wasn’t checked? How can I adjust my coding to avoid these problems in the future?
Testing. Sometimes it is clear from the bug that it should have been caught in testing. If so, testing at which point – unit, functional, system? What test case was missing?
Debugging. How could I have sped up finding this bug? Did I have the right tools? Did I assume too much? Do I need better logging in the code?
In his book Antifragile, Nassim Nicholas Taleb writes: “Errors are rich in information”. I agree completely. Bugs help us understand the system better, and they indicate how we can improve our coding, testing, and debugging techniques. So I think it is only natural to try to learn as much as possible from them.
By writing down an entry in the bugs file for each interesting bug, I find that I learn much more easily. There is something with the act of writing that makes me think more deeply about what happened. Also, once it is written down, I can go back afterwards and check what happened. Once in a while I will also browse through the file, reading only the lessons section, to reinforce what I thought was the most valuable lesson from each bug.
I have been adding entries to my bugs file for 13 years now. That’s a long time. But I am still doing it, because it helps me improve as a developer. Try it, and see if it works for you too!
Published at DZone with permission of Henrik Warne , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.