A couple of weeks ago we looked at how to do a quick "health check" of an agile team. We saw that a great deal can be learned just by attending one of their daily stand-ups and by inspecting the state of their Scrum and Kanban boards. Of course these are nothing more than cursory examinations, and serious ailments can lie behind an apparently robust façade of agile practice. If you have reason to believe that a team is dysfunctional, you might have to dig deeper than the superficial evidence suggested by its apparent morphology.
In my experience the next thing to examine is the team's "Definition of Done". This is the standard to which all work is put before it can be considered to be complete. Each team is collectively responsible for its own Definition of Done. It's up to them to make sure that it is adequate, and that it is applied by all members to all of the work they do. Without such professional oversight there can be no assurance that any deliverable will truly be fit for release. A spiffy stand-up and a cracker of a Kanban board might suggest rude team health, but they are no guarantee that the Definition of Done is solid, or that it isn't being undercut by someone along the way.
Technical debt and rework are the main symptoms to look for. The consequences of backsliding on a Definition of Done might not become apparent until long after the events that caused it. By then, that rework or debt can be difficult to trace to the specific behaviors of those who cheated the system. You see, unfortunately a Definition of Done is a bit like personal hygiene. If there is no oversight, everyone can pretend that they are following the rules for the sake of the team, even though the presence of E. Coli on the office keyboards will tell its own tale about compliance. Everyone knows that it has to be coming from somewhere, but won't admit to their own liability or involvement, perhaps not even to themselves. Just as team vomiting will follow one member's dubious hand-washing practices, a short-changed Definition of Done will lead to rework by the team or the creation of technical debt. This is why team ownership and enforcement of what "Done" means is so important. An effective Definition of Done has to be founded on a healthy balance between due diligence and professional trust.
What does this mean for agile development?
You can think of a Definition of Done as the key defensive bulwark in software development epidemiology. If you balance the right level of team oversight with the right level of trust, severe outbreaks of technical debt or rework will become rare. High levels of oversight may be needed to start with, since team members might not have bought in to the idea of "done" yet. Once people become conditioned to do the right thing and see themselves as stakeholders in the team and its success, the balance can swing more towards trust. People become reluctant to renege on a team investment if they can see that it adds value for everyone including them. What's more, a Definition of Done improves the more it is respected, and becomes better respected the more it improves.
In terms of agile best practice a Definition of Done will be used to determine whether or not User Story implementations are release-ready. However, each team can implement many User Stories over the course of a sprint, and making sure that all of these stories meet the Definition of Done can be challenging. Teams that are new to agile methods often have more modest ambitions. For example, their Definition of Done may only extend as far as delivery into a pre-production environment. Of course, anything less than "fully release ready" incurs the risk of technical debt and the need to pay it back later. Yet like a sloppy approach to hand-washing, it has to be admitted that something is better than nothing at all. Applying a Definition of Done consistently to even a sub-optimal standard will at least permit the delivery of each User Story to a known level of quality. It might not be great but at least it's there, and it's something that can be built upon and improved.
The Lessons of Lean-Kanban
Lean-Kanban methodologies have an instructive relationship with the Definition of Done. In these approaches the optimization of the value stream is of great significance. Naturally though, if a value stream is to be optimized it must first be understood. This means breaking the stream down into multiple discrete steps that can be studied for bottlenecks and any other occurrences of waste. For example, "Work in Progress" can be broken down into finer-grained stations such as "In Development", "Peer Review", "QA Test", "Knowledge Transfer", and "In Deployment". Team members will be cross-trained and will move freely across those stations in order to expedite as smooth a workflow as possible.
Now here's where it gets interesting. If a Lean-Kanban operation has multiple well-defined stations, the case for having a Definition of Done can seem rather harder to make. After all, by the time a User Story gets to "Done", you already know that it has gone through the key steps you care about in the development process. What value can a Definition of Done really add in such a situation? Doesn't it just become waste itself?
To find the answer, we need to look back to the manufacturing roots of Lean-Kanban. In a car plant for example, the steps of construction are exceptionally well defined and team members can move freely over several dozen stations. Some of those stations will be for the chassis, others for the interior, others for the engine block and electrics and so on. Yet despite this the Definition of Done will be an absolute corker, and much of the process of verification will be automated. Each station might even have its own Definition of Done so inspection can occur as close as possible to the point of action and potential remedy. The total number of checks that happen before each vehicle leaves the factory will be exhaustive. Why is this thought to be necessary? Because the manufacturers know perfectly well that the verification of "done" adds value. Merely having well-defined stations is no guarantee that everything will be done well. Quality is built in and validated by inspection. One thing's for sure: no-one in IT should accuse car manufacturers of having a weak understanding of what "done" means.
The Definition of Done versus Acceptance Criteria
However, software projects have a wild-card to deal with that car manufacturers don't have to worry about. Unlike the car doors and carburettors that roll down an assembly line, each User Story is different and has to be treated as a special case. To deal with this, each User Story has Acceptance Criteria that are agreed by the team members and the Product Owner as part of a Sprint Planning Session.
Acceptance Criteria have to be quite specific to particular User Stories, because each story can be unique. The Definition of Done, on the other hand, applies to all of the User Stories being worked on by a team. The associated conditions must be invariant. For example, if all work has to be peer reviewed and subjected to QA testing prior to release, then those criteria would be enumerated in the Definition of Done rather than being repeated in each User Story's Acceptance Criteria. If the definition is enforced properly, a developer could never claim that a User Story was “Done” if it hadn't both been reviewed and passed QA testing.
Writing a Definition of Done
The Scrum Guide describes a Definition of Done as a "shared understanding of what it means for work to be complete". No process is suggested for writing a Definition of Done, nor in fact is there any suggestion that one should be written down at all. However, a documented definition may go some way towards providing that shared understanding. Here's how you can set about eliciting one:
- Review Acceptance Criteria:
- Gather the Acceptance Criteria for work completed so far
- Look for common criteria that can be abstracted out and applied across work in general
- Use these common criteria as the basis for a Definition of Done
- Assess Technical Debt
- Identify any rework that needs to be done
- Identify the reasons why it wasn't done properly the first time
- Identify what measures can be put in place to stop similar rework from occurring
- Add these measures to the Definition of Done (DoD)
- Continually update the DoD
- In each Sprint Review, identify which (if any) work was rejected or which caused rework to be done, then
- In each Sprint Retrospective, challenge the DoD for relevance and completeness
Example of a Definition of Done in Acceptance Criteria Format
- Given that a user story has required a code change
- When BDD and unit tests have been written for the story and the code change has been reviewed and the code change has been approved by a peer and all BDD and unit tests have been run and no BDD or unit tests have broken (green bar) and the code change has been committed to the repository and QA testing has passed satisfactorily and the Product Owner has approved the change
- Then the user story will be deployed to production and it will be considered Done.
- Given that a user story has required the authoring of documentation
- When the documentation has been reviewed and approved by a peer and the documentation has been approved by the Product Owner
- Then a new version of the documentation will be committed and the user story will be considered Done.
Example of a Definition of Done in List Format
- BDD tests written and pass
- Unit tests written and pass
- Code peer reviewed & approved
- Code committed to repository
- QA testing done
- Product Owner signed off
- Documentation has been peer reviewed & approved
- Documentation approved by Product Owner
- Version number updated
Definitions of Done for IT Infrastructure Support
We've seen that having a good Definition of Done is important, although in IT we also need Acceptance Criteria that address the particulars of each User Story. When used in combination they can approach the levels of rigor that have been shown to be possible in manufacturing. Those working in software development can adopt a similar commitment to quality. Now we need to turn our attention to another function within the IT department...Infrastructure Support.
Infrastructure support teams are increasingly expected to work in an agile way. As part of an enterprise transformation that does not seem unreasonable. After all, the rest of the organization is highly dependent upon them. Their scope includes such things as deploying new workstations and laptops (possibly across entire sites), installing networks, performing miscellaneous diagnostics and repairs, and maintaining and upgrading local server resources. Clearly they will also need Definitions of Done and Acceptance Criteria if they are to be stakeholders in a joined-up agile enterprise.
The question is, how on earth can a meaningful Definition of Done be abstracted across wildly different physical tasks? How can a Definition of Done cover everything from a phone installation to a printer driver upgrade or a memory enhancement, or a firewall configuration to a keyboard replacement?
The answer is to focus on the value chain that is represented by each user story. Work is not "released" as such, but rather it is handed over to someone who can derive benefit from it (i.e. the person occupying the user story role). This is the key to understanding "done" in an infrastructure context. If you can identify the parties who derive value, and demonstrably pass that value on to them, then your work is done. Here's an example of a Definition of Done that might be used to close out a support ticket:
- The receiver of the work has been identified
- Handover instructions have been completed and given to the receiver
- The receiver has been notified of the intention to close the ticket, and has not raised an objection
- A security assessment has been conducted and approved
- The absence of any reference to a Product Owner. This is because infrastructure teams have to support multiple products, and prioritization of work is traditionally handled not through any sense of ownership of those products, but rather through help-desk triage. It's certainly possible for work to be represented by Product Owners, but it would have to be ownership of the business support function rather than ownership of the actual products being supported. The need to identify and work with the actual receivers of value is still there.
- The "acceptance by default" position. Receivers typically have little incentive to sign work off as being complete. On the contrary, their incentive is to defer acceptance as long as possible, for potential use as a "banker" in case a requirement for additional unforeseen work transpires. They might hope to ride this new work on an existing ticket instead of having to raise a new one. Receivers can be expected to care about their own support needs, not about the big picture of enterprise delivery. If a Product Owner can be identified - even if it is just the most likely owner of the business support function - then this situation can be improved. A Product Owner can apply leverage for appropriate and timely sign-off, such as by not accepting new work from certain parties while their approval (or justified rejection) of prior work remains outstanding. The elicitation of solid Acceptance Criteria can help the Product Owner immensely.
- Security implications need to be given careful consideration. The reworking of organizational infrastructure offers great potential for security to be compromised. Approval from Information Security should be obtained for all work, either directly or through authorized agents. One approach is for each team to have a designated "security champion" who provides this function.
Teams that appear to be healthy can still incur rework and technical debt. A poor understanding of what "done" means often underlies such problems, and this should be one of the first things to be looked at if problems are suspected. Having a meaningful Definition of Done encourages a team's sense of ownership of their own process, and helps instil self-discipline into its members to follow agile best practices. The application of this standard requires finding the right balance between team oversight and trust.