Is it appropriate to fail builds based on metrics? If so, when? If not, why?
My quick answer to this question was something along the lines of, "It may be appropriate for your context, but I personally choose not to do this with my team." As it turned out, this raised a few eyebrows, including those of my fellow panelists. Please allow me to summarize my thoughts on this matter.
Let me start off by saying that I believe metrics are incredibly important. One of my favorite software development quotes is related to metrics, spoken by Tom DeMarco:
"You can't control what you can't measure... Anything you don't measure at all is out of control."
There are an awful lot of things that we want to manage on software projects: complexity, quality, velocity, etc. If we want to manage these things, we have to gather data. Unfortunately, as software projects grow, gathering all of this data gets incredibly difficult. Fortunately, we have the aforementioned tools to help.
With all of that said, it's incredibly important that we're careful about how we decide what to measure. We need to ensure that measuring and controlling a particular variable is going to achieve our desired outcome. Let's consider the all too common "lines of code" (LOC) metric. It's still incredibly prevalent to hear executives, managers and other "non-technical types" bragging about the LOC in their projects. Worse than that, developers' performance evaluations are often based in part on the LOC that they write. The obvious goal is to find an objective measure of developer productivity; since developers "write code," let's see how much code they write! Unfortunately, the enlightened among us know that more often than not, the right thing to do is to reduce the LOC in a project. Any additional line of code increments the complexity of software, decrements its availability, and introduces additional potential points of failure. But if satisfying my addiction to food, clothing and shelter is dependent on writing as many LOC as possible, I'm not going to be too concerned about reducing complexity and increasing maintainability. Thus, the whole house of cards falls down. If you're considering a metric for your project, carefully analyze whether or not it will help you achieve your goals for the project.
Not only that, the team must be absolutely, positively, crystal clear about the purpose of the metric. Our natural tendency is to "game the metric." Think about the extremely popular code coverage metric, and consider the rule that "code coverage must not go down for any check-in." If I'm programming in Java (or a similar language), this can present a problem. If I write a few JavaBeans with the requisite getters and setters and check those in without writing tests, my code coverage will go down. Do I really want developers writing tests for getters and setters? And think about code coverage itself. For the most part, the tools simply measure whether or not code is executed by a test; they can't provide any feedback about whether or not the test is meaningful. If all I have to do is make sure code is covered, I can write meaningless unit tests with zero assertions and game the metric. Is this the professional thing to do? No. Does it happen? Yes.
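To make the "zero assertions" trick concrete, here is a minimal sketch in plain Java (no test framework, so it runs on its own). The `PriceCalculator` class and its bug are hypothetical, invented purely for illustration. Both "tests" execute the same line, so a coverage tool would typically count that line as covered either way, but only one of them can ever notice the bug.

```java
public class CoverageGamingSketch {

    // Hypothetical class under test, with a deliberate bug.
    static class PriceCalculator {
        // Meant to apply a percentage discount, but adds it instead of subtracting.
        static int discounted(int priceCents, int percent) {
            return priceCents + priceCents * percent / 100;
        }
    }

    // Executes the code but checks nothing. It always "passes",
    // and the line inside discounted() still counts as covered.
    public static boolean assertionFreeTest() {
        PriceCalculator.discounted(1000, 10);
        return true;
    }

    // Executes the same code and checks the result, so it catches the bug.
    public static boolean meaningfulTest() {
        return PriceCalculator.discounted(1000, 10) == 900;
    }

    public static void main(String[] args) {
        System.out.println("assertion-free test passed: " + assertionFreeTest());
        System.out.println("meaningful test passed: " + meaningfulTest());
    }
}
```

The coverage number is identical for both tests; only the second one tells me anything about whether the code works.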
For these two reasons, I prefer to keep metrics set aside as passive feedback rather than active feedback. Our team reviews these fairly regularly and we have discussions during retrospectives and other meetings to examine what our metrics are telling us and take action where necessary. The converse to this is the situation where I can't deploy to production without a passing build (since my deployment process is completely automated and based on my CI server - more on this in a later article), and writing a meaningless unit test to up the coverage will allow me to go home. I don't want to put my developers in that spot. Call me "not agile" if you want, but I feel like we're being effective in meeting our project goals.
The counterargument to this idea is that we want to keep the feedback/response cycle very tight. If I make a change that causes negative movement in a metric, I need to know that as quickly as possible so that I can respond effectively. If I don't find out for a week, I won't remember what I was doing at that point in time and it will be harder for me to respond. I fully agree with these statements. We do want to keep this feedback loop as tight as possible. So, in your context, if you have a metric that is well characterized, will help you achieve your goals, and is very well understood by the entire team, by all means - include it in your build failure decision tree. However, I think it's equally valid to visit these in a more passive way during a weekly retrospective.
At any rate, I'm much more interested in the overall trend for my projects than in point-in-time numbers. Tools like Sonar with its "Time Machine" feature make this much easier to see. If I see that my code coverage has been steadily dropping for the past week, I'm going to raise a flag. But if I see code coverage drop for one build and then steadily continue to rise, who really cares? My team is doing the right thing already.
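As a rough illustration of "trend over point in time," here is a small Java sketch. It is not how Sonar computes anything; the `CoverageTrend` class, the `isSteadilyDropping` method, and the idea of a fixed window of consecutive builds are all my own assumptions for the example. It flags a problem only when coverage fell in every one of the last few builds, and stays quiet about a single dip.

```java
import java.util.Arrays;
import java.util.List;

public class CoverageTrend {

    // Returns true only if coverage fell in each of the last `window` builds.
    // coverageByBuild is ordered oldest to newest, as percentages.
    public static boolean isSteadilyDropping(List<Double> coverageByBuild, int window) {
        // We need one extra build to compare the first build in the window against.
        if (window < 1 || coverageByBuild.size() < window + 1) {
            return false;
        }
        int start = coverageByBuild.size() - window;
        for (int i = start; i < coverageByBuild.size(); i++) {
            // Any build that held steady or improved breaks the downward trend.
            if (coverageByBuild.get(i) >= coverageByBuild.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Double> steadyDrop = Arrays.asList(82.0, 81.5, 80.9, 80.1);
        List<Double> oneDip = Arrays.asList(80.0, 78.5, 79.0, 80.2);

        System.out.println("steady drop flagged: " + isSteadilyDropping(steadyDrop, 3));
        System.out.println("single dip flagged: " + isSteadilyDropping(oneDip, 3));
    }
}
```

A check like this could feed a retrospective discussion rather than a build failure; the window size is a judgment call for each team.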
Whatever you decide to do, first, think: is this going to help my team achieve its goals? Second, communicate: make sure everyone understands the metric and why it's there. If you properly handle these two items, I think you're more than qualified to make the "build failure" decision on your own.