Bug handling in DevOps can be the make-or-break factor. Because of this, I have gathered some of the best tips and practices for treating your bugs.
1. Triage Everything
Investigate every bug to identify the root cause - there might be other bugs underneath.
Fixing bugs fast is essential for healthy automation, especially as part of today's mature DevOps processes. That said, it is also important to make sure no other issue is hiding beneath that bug. Sometimes an immediate fix of a small, low-priority bug exposes a bigger issue. In other cases, developers ignore the low-priority bug without knowing there is a critical issue beneath it. As with triage in the ER, only a full examination reveals chronic issues; otherwise you risk being misled by specific symptoms.
2. Bugs are Actually EVERYONE's Responsibility to Report and Fix Properly
A true DevOps process is all about people, processes, and tools. Accordingly, all parties should be engaged and involved in order to build a reliable defect management process.
I often hear about the classic conflicts, where an enterprise QA team claims that Dev is checking in buggy code and asks questions like, "Why do we need to report new bugs if we still have so many unfixed bugs?"
On the other hand, Dev teams may claim that bug reports are not comprehensive because they lack important context, and then ask for the bugs to be reproduced and investigated all over again. I recently got some feedback from a Dev executive who said she asked to delete 4000+ bugs from the corporate Jira instance, as everyone had lost track and no one was really resolving them.
Of course, the impact affects the entire organization. If a healthy process does not take place, trust in defect management may be lost, and the aim of a high-quality product may suffer.
3. Don't Tell Stories About a Bug: It's Either a Bug or a Story...
A "Zero-bug" approach is too dramatic, yet if there's a bug that is not urgent enough for an immediate fix, prioritize it as a user story in your backlog.
Different events have been created over the years as part of agile methodologies to go over bugs: sprint planning, triage/defect review meetings, daily/weekly prioritization meetings, etc.
The purpose of many of these events is to categorize bugs by different levels or metrics: P1/P2/P3, important vs. urgent, business vs. tech impact, prioritization based on the estimated time to resolve, etc.
Overanalyzing these questions distracts you from the actual discussion. Don't waste time: either fix bugs as soon as you find them, or transform them into user stories for Dev to deal with in the near future as part of the backlog priorities.
There are always bugs that can be referred to as "code red" bugs (and they usually take everyone's attention). But only a few of them truly require this level of attention. The rest create noise if not treated properly (just like in the story about the boy who cried wolf).
4. Open Bug? - Ignoring It Again and Again Is Not the Right Approach
Clean noise out of your CI: if a test fails and you've submitted a bug for this failure, either fix it immediately or take the test out of your execution cycle until the bug is fixed.
As the number of your test executions grows, you cannot keep remembering which bugs have not been fixed yet and which ones are actually new. The greater the number of unfixed bugs, the harder it is to tell whether a bug has already been reported.
Automation at scale will either require you to invest significant time in this unnecessary discovery process (time you could have better spent on more important bug-related issues), or you'll just give up due to the huge effort required to maintain it. Either way, you should expect some bugs to slip through and cause duplicated work as they are reported again. The long trail of actions caused by this duplication creates even more waste: more triaging, more investigation, more sorting of existing bugs from new ones, and so on.
So either fix immediately or comment out your test until the reported bug is fixed. The best way to handle this is to integrate either your test framework or your reporting environment with your defect management system, for full traceability of bugs and their status.
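As a rough illustration of taking a test out of the execution cycle while its bug is open, here is a minimal Python sketch built on the standard `unittest` module. The `quarantined_until_fixed` decorator, the `BUG_STATUS` dictionary, and the bug IDs are all hypothetical names invented for this example; in a real setup the statuses would come from your defect management system rather than a hard-coded dictionary.

```python
import functools
import unittest

# Hypothetical snapshot of bug statuses. In practice this would be fetched
# from the defect management system instead of being hard-coded.
BUG_STATUS = {"BUG-101": "open", "BUG-102": "fixed"}


def quarantined_until_fixed(bug_id, statuses=BUG_STATUS):
    """Skip the decorated test for as long as the linked bug is not fixed.

    Unknown bug IDs are treated as unfixed, so the test stays quarantined.
    """
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            if statuses.get(bug_id) != "fixed":
                raise unittest.SkipTest(f"quarantined: {bug_id} is not fixed yet")
            return test_func(*args, **kwargs)
        return wrapper
    return decorator


class CheckoutTests(unittest.TestCase):
    @quarantined_until_fixed("BUG-101")
    def test_discount_is_applied(self):
        # Known failure, already reported as BUG-101.
        self.fail("known failure tracked by BUG-101")

    @quarantined_until_fixed("BUG-102")
    def test_cart_total(self):
        # BUG-102 is fixed, so this test runs normally again.
        self.assertEqual(sum([10, 15]), 25)


def run_suite():
    """Run the example tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CheckoutTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Compared with commenting the test out, a skip that names the bug ID keeps the link between the test and its bug visible in every report, and the test comes back on its own once the bug is marked as fixed.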
5. Bugs Cannot Be Lost in Translation
Establish a data-driven discussion in order to shorten the fix process. Partial descriptions and missing data will cause delays, as Dev will be required to walk through the same process again.
"Build #296 is on fire" would not be the best way to describe the bug you've found, as it wouldn't serve as a good segue for developers to start their debugging work. Remember to state the exact environment conditions of the executed test so that Dev can debug under the same conditions. We see many cases of a loop between Dev and QA, where a tester submits a bug and a developer claims he cannot reproduce it and just closes the case. Until the peers share data, they don't realize that the developer cannot reproduce on an emulator with the latest iOS version something that was found on an iOS device with an older version of iOS.
Creating a fast feedback loop that relies on cross-team collaboration is one of today's DevOps essentials and a key to success. Aligning everyone around the same test execution report is the recommended way to do it. When submitting a new bug, make sure it is linked to a report: this is the best evidence of what actually happened, and the best "meeting point."
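To make the environment point concrete, the following Python sketch shows one way a bug report could carry its environment and be compared against the environment a developer debugs in. The `Environment` and `BugReport` classes, the `environment_mismatches` helper, the version numbers, and the report URL are all assumptions made up for illustration, not part of any specific tool.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Environment:
    platform: str     # e.g. "iOS"
    os_version: str   # OS version the test ran on
    device_type: str  # "real device" or "emulator"
    build: str        # CI build identifier


@dataclass
class BugReport:
    title: str
    found_in: Environment
    report_url: str   # link to the shared test execution report


def environment_mismatches(report, debug_env):
    """Return {field: (found_in_value, debug_value)} for each differing field."""
    found = asdict(report.found_in)
    debug = asdict(debug_env)
    return {
        name: (found[name], debug[name])
        for name in found
        if found[name] != debug[name]
    }


# Hypothetical example: the bug was found on a device with an older iOS,
# while the developer tries to reproduce it on an emulator with a newer one.
report = BugReport(
    title="Login button unresponsive",
    found_in=Environment("iOS", "15.7", "real device", "296"),
    report_url="https://reports.example.com/builds/296",
)
developer_env = Environment("iOS", "17.0", "emulator", "296")
mismatches = environment_mismatches(report, developer_env)
```

Here `mismatches` lists the OS version and device type as differing, which is exactly the information that tends to be missing when a bug is closed as "cannot reproduce."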
6. Use the Same Metrics to Measure Bug Resolution
The bug-fixing process must be aligned with the release cadence in CI. Pushing bug fixing to the right (e.g. to later stages of your SDLC) may create bottlenecks that will affect your quality or your release.
Different teams analyze bugs in different ways: total number of bugs, number of bugs per sprint/release (density), bug lifespan, etc.
Setting aside the urgency or complexity of each case, it is clear that every bug-based metric relies on a different calculation or policy. It is also fair to assume that the number of unfixed bugs grows over time, so you should expect an increasing challenge in maintaining an accurate bug status.
In a continuous integration, or even continuous deployment, reality, these assumptions or 'snapshots' of the bug status may change in a matter of minutes or hours. Merging this data into your release decision process may create an inaccurate baseline and put your quality at risk (as bugs may slip through or stall the development process).
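The sketch below illustrates, in Python, how two of the metrics mentioned above (open bug count and bug lifespan) can be computed from the same records, and how a snapshot of the open count shifts within hours. The bug records, timestamps, and function names are hypothetical sample data chosen for this example only.

```python
from datetime import datetime
from statistics import mean

# Hypothetical bug records; "closed" is None while a bug is still unfixed.
BUGS = [
    {"id": "BUG-1", "opened": datetime(2024, 1, 1, 9), "closed": datetime(2024, 1, 3, 9)},
    {"id": "BUG-2", "opened": datetime(2024, 1, 2, 9), "closed": None},
    {"id": "BUG-3", "opened": datetime(2024, 1, 2, 12), "closed": datetime(2024, 1, 2, 18)},
]


def open_bug_count(bugs, at):
    """Count bugs opened on or before `at` that were not yet closed at `at`."""
    return sum(
        1
        for bug in bugs
        if bug["opened"] <= at and (bug["closed"] is None or bug["closed"] > at)
    )


def mean_lifespan_days(bugs):
    """Average open-to-close time in days, over closed bugs only."""
    lifespans = [
        (bug["closed"] - bug["opened"]).total_seconds() / 86400
        for bug in bugs
        if bug["closed"] is not None
    ]
    return mean(lifespans) if lifespans else None


# Three snapshots taken on the same day give three different answers.
morning = open_bug_count(BUGS, datetime(2024, 1, 2, 10))
midday = open_bug_count(BUGS, datetime(2024, 1, 2, 13))
evening = open_bug_count(BUGS, datetime(2024, 1, 2, 19))
```

If every team derives its numbers from one shared definition like this, a release decision can at least state which snapshot time it was based on.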
7. Inject Intelligence Into Your Defect Management Process
Wisely correlate bugs with tests, features, and environments so that you continuously improve your entire portfolio.
DevOps methodology requires executives to identify risks through different metrics such as teams, features, stages, and business-related items. The ability to tag, correlate, and act upon bugs as they relate to these attributes will help optimize the entire workflow and support a maturity leap forward in DevOps practices. As an example, a specific team that has more bugs in a specific stage than a different team raises a few flags that, if highlighted and addressed, can serve as an alignment factor for the entire R&D. Another example might be a specific feature that is more error-prone on specific OS versions than on others. Such insights should drive more test coverage and attention to that piece.
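As a small illustration of this kind of correlation, the Python sketch below groups tagged bugs by any combination of attributes. The tag names (`team`, `stage`, `feature`, `os_version`), the sample bugs, and the `count_by` helper are assumptions invented for this example; real data would come from your defect management system.

```python
from collections import Counter

# Hypothetical bugs, each tagged with the attributes discussed above.
BUGS = [
    {"id": "BUG-1", "team": "payments", "stage": "integration",
     "feature": "checkout", "os_version": "iOS 15"},
    {"id": "BUG-2", "team": "payments", "stage": "integration",
     "feature": "checkout", "os_version": "iOS 15"},
    {"id": "BUG-3", "team": "search", "stage": "integration",
     "feature": "filters", "os_version": "iOS 17"},
    {"id": "BUG-4", "team": "payments", "stage": "production",
     "feature": "checkout", "os_version": "iOS 17"},
]


def count_by(bugs, *attributes):
    """Count bugs per combination of the given tag names."""
    return Counter(tuple(bug[name] for name in attributes) for bug in bugs)


# Which team has the most bugs in which stage?
by_team_and_stage = count_by(BUGS, "team", "stage")

# Which feature is most error-prone on which OS version?
by_feature_and_os = count_by(BUGS, "feature", "os_version")
hotspot = by_feature_and_os.most_common(1)[0]
```

With this sample data, the payments team in the integration stage and the checkout feature on iOS 15 stand out, which is the sort of signal that could justify extra test coverage.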
DevOps is all about the right balance of people, processes, and tools. Problems will always pop up between these components. With that in mind, problem-solving (bugs are a kind of problem, right?) is fundamental to a healthy agile process.
The process that you build in order to treat bugs as part of automation at scale should be designed with the following ideas in mind:
- Provide the right visibility of bugs (when identified).
- Capture enough information to fix them fast.
- Triaging is essential to determine where to put focus in CI/CD.
- Communicate quality findings with regard to bugs across teams for full alignment.
- Measure the bug-fixing process using unified metrics.
- Trust - a key factor in establishing a sustainable process between teams, which eventually leads to a stage where everyone takes responsibility for bugs.