Paul Duvall on Continuous Integration
Paul Duvall, co-author of the book on continuous integration, discusses CI in this interview with Javalobby, along with his book, which won the Jolt Award. He covers the road ahead for continuous integration and what he envisions in the next version of CI, and also gives us some tips and tricks. Continuous Integration: Improving Software Quality and Reducing Risk, by Paul Duvall, Steve Matyas, and Andrew Glover, is a great book that received 5 stars in all categories in our Javalobby review.
Meera. Paul, first please tell us who you are and what you do?
Paul. I'm the lead author of Continuous Integration: Improving Software Quality and Reducing Risk and the CTO of Stelligent Incorporated. Stelligent is an agile consultancy that helps large organizations deliver production-ready software every day through the effective application of Agile practices. Andy Glover, who is Stelligent's president, is a co-author of Continuous Integration. I also author a monthly series for IBM developerWorks called "Automation for the people". This is not just something I write about, it's something I do. I work with clients every day, helping them implement practices such as CI and TDD.
Meera. Congratulations on your Jolt Award for the CI book. Tell us more about the book, and also what Continuous Integration is all about.
Paul. Thanks! It's a great honor to have won the Jolt Award. To think that I'm on the same list of Jolt winners as Steve McConnell and others is quite a thrill. I'm fortunate to have worked with a great team of contributors and editors that helped make it a success.
Continuous Integration is the practice of integrating software whenever a change is applied to a version control repository. If each developer on a team is checking in their code daily, this will lead to many integrations per day. Teams use an automated build to perform these integrations and test the software. As it's typically implemented, from a separate machine, an automated Continuous Integration server, such as Hudson, polls the version control repository looking for changes. Once the CI server detects a source file change, it runs the automated build. When the integration build is complete, the CI server sends feedback to team members - in the form of an email or through other feedback mechanisms. The premise behind CI is that you reduce the time between when a fault is introduced into the code base and when it's detected - and fixed, reducing headaches, costs, and time.
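As a rough illustration of the cycle described above (poll the repository, build on change, send feedback), here is a minimal sketch of a single polling pass. It is not how Hudson or any particular CI server is implemented; the function name `poll_once` and the callables passed to it are hypothetical stand-ins for a real version control client, build script and notification channel.

```python
from typing import Callable, Optional


def poll_once(
    latest_revision: Callable[[], str],
    last_built: Optional[str],
    run_build: Callable[[str], bool],
    notify: Callable[[str], None],
) -> Optional[str]:
    """Run one polling cycle and return the revision now considered built."""
    revision = latest_revision()
    if revision == last_built:
        # Nothing new was committed since the last build, so do nothing.
        return last_built
    # A change was detected: run the automated build and report the result.
    succeeded = run_build(revision)
    status = "SUCCESS" if succeeded else "FAILURE"
    notify(f"Build of revision {revision}: {status}")
    return revision


# Stand-in callables (assumptions for illustration only).
messages = []
built = poll_once(
    latest_revision=lambda: "r42",   # pretend the repository is at r42
    last_built="r41",                # the last build covered r41
    run_build=lambda rev: False,     # pretend the build failed
    notify=messages.append,          # collect feedback instead of emailing
)
```

A real server would call something like this on a timer from a dedicated machine and send the message by email or another feedback mechanism.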
Continuous Integration, the book, is organized around a set of over 40 practices related to Continuous Integration. We have 70 examples related to CI and build practices using different tools and languages. The book covers what goes into an automated build, such as compilation, packaging, database integration, automated tests and inspections, reporting and deployment. We go over the many ways of automating development processes and approaches to increasing the visibility of software defects or potential problems.
Meera. Is there a second version of the book in the making?
Paul. I expect to start working on my next book later this year (my editor doesn't know this yet, so don't tell anybody ;)). It's not a second edition of Continuous Integration, but I do plan to share some new ideas and experiences from the past several years delivering solutions to clients. I've actually got ideas for several books! Ideas aren't my problem, but time is :-)
Meera. Is it correct to say that most companies are using CI now? If they are not, how do you convince them to do so?
Paul. Based on my experience and some studies I've read, the answer is no: most companies are not practicing CI now. Further, some companies claim to be using CI because they're running a CI server that polls an SCM repository for every change and then compiles the code. As it turns out, many of them are not running full integration builds that include automated database integration, tests, inspections and deployment as part of their CI process.
As for convincing them to use CI, if I'm talking to a manager, I usually talk in terms of the cost savings of delivering software faster and learning about problems earlier. Most experienced development teams know the pain caused by "integration hell". However, they won't always know the types of features that can be part of an automated build and CI system -- particularly database integration, more exhaustive testing and automated deployments. In my experience, the best way to convince development teams is to show them what a CI system can produce, such as the emails notifying team members of a build failure or a broken deployment. Sometimes it's a matter of setting up a CI server on a developer's workstation and getting them to see the results of a simple compilation based on code that is being checked into the version control repository.
Meera. Do CI and TDD go hand in hand? Is there any benefit at all in having CI if you don't have any tests at all?
Paul. CI without automated tests is what I like to call "Continuous Compilation". It's a good start, but it should not end there. In my opinion, a team that is running an integration build without automated tests is not doing CI. Also, you won't get the full benefit of CI if you're running only simple "happy path" unit tests; you also need to be running component, functional, load and performance, security tests and so on. Besides TDD, there's more that can be added to CI so that you get true production-ready software, often. As I mentioned before, this includes processes like database integration/upgrades, automated code inspections (coding standards, duplication, complexity, dependencies, code coverage), generating documentation, and installing/deploying to a target environment (web container, etc.). However, there needs to be a balance between being better informed and getting quick feedback through faster builds. This is where techniques like parallelization (across machines) and build pipelines (one build after the other) can help.
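The build pipeline idea mentioned above (one build after the other) can be sketched as a list of stages that run in order and stop at the first failure, so the fast stage gives quick feedback and the slower stages run only when it passes. This is a simplified illustration under assumed names; `run_pipeline` and the stage names are hypothetical, and each stage here is a stand-in callable rather than a real build.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True when the stage passes.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order, stopping at the first failure."""
    report = []
    for name, step in stages:
        ok = step()
        report.append(f"{name}: {'passed' if ok else 'failed'}")
        if not ok:
            # Later (usually slower) stages are skipped once one fails.
            break
    return report


# Stand-in stages, ordered from fastest to slowest (assumptions).
stages = [
    ("compile and unit tests", lambda: True),
    ("component tests", lambda: False),
    ("load and performance tests", lambda: True),
]
report = run_pipeline(stages)
```

Here the third stage never runs because the second one fails, which is the trade-off between fast feedback and fuller information that the answer describes.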
Meera. What would you like to see the most in the next generation of CI?
Paul. What I'd most like to see improved in the practice of CI is making it easy for development teams to move toward delivering production-ready software every day. To me, this means that the software is ready to use in a production environment. To achieve this, we need to add more automated processes to our builds, while still receiving rapid feedback.
The other related area I'd like to see change is with build languages. Build languages should be more like other programming languages rather than most of the XML-based languages that are widely used, such as Ant. I'm not sure how it happened, but others decided to copy Ant's approach (think NAnt and MSBuild). In addition to providing programming constructs like looping and conditionals, a build language should also provide a dependency-based tasking mechanism (e.g. the depends attribute in an Ant target). Gant and rake provide this type of support. That said, there's also quite a bit that can be provided without scripting anything at all. I'd like to see less custom scripting to get basic features of CI - perhaps a plug-in for things like X10 devices, Ambient Orb, breaking builds based on configurable parameters (such as cyclomatic complexity). It's still a reality today that developers must script most of these features for their build/CI systems.
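To make the idea of a dependency-based tasking mechanism inside an ordinary programming language concrete, here is a minimal sketch. It is not Gant's or rake's actual API; the `Build` class, its `task` decorator and the task names are hypothetical, and the task bodies only record that they ran.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple


class Build:
    """A tiny task registry where each task can depend on other tasks."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Tuple[Tuple[str, ...], Callable[[], None]]] = {}
        self.log: List[str] = []

    def task(self, name: str, depends: Sequence[str] = ()):
        """Register the decorated function as a task with dependencies."""
        def register(func: Callable[[], None]) -> Callable[[], None]:
            self._tasks[name] = (tuple(depends), func)
            return func
        return register

    def run(self, name: str, _done: Optional[Set[str]] = None,
            _stack: Tuple[str, ...] = ()) -> None:
        """Run a task after its dependencies, each at most once."""
        done = set() if _done is None else _done
        if name in done:
            return
        if name in _stack:
            chain = " -> ".join(_stack + (name,))
            raise ValueError(f"circular dependency: {chain}")
        depends, action = self._tasks[name]
        for dep in depends:
            self.run(dep, done, _stack + (name,))
        action()
        done.add(name)


build = Build()


@build.task("compile")
def compile_sources() -> None:
    build.log.append("compile")


@build.task("test", depends=["compile"])
def run_tests() -> None:
    build.log.append("test")


@build.task("package", depends=["compile", "test"])
def package() -> None:
    build.log.append("package")


build.run("package")
```

Because the tasks are plain functions, loops and conditionals are available inside them without any special build-language syntax, while the `depends` argument plays the role of the depends attribute on an Ant target.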
Meera. For a company which is trying to incorporate CI, with so many CI tools available, what best advice can you give them? How should they choose their CI tool?
Paul. If you're asking about a CI server, it really doesn't matter which tool you start out with as long as you can get up and running with the server quickly. Because of this, these days I like Hudson because I can download the WAR file and have it running in my web container in no time at all, and it's easy to manage using its web-based administration interface. But there are many freely available servers that will poll your SCM repository, run the automated build and provide feedback. In choosing a tool, you'll want to know which SCM repositories the tool supports and whether it supports your particular build tool (e.g. Ant, MSBuild or rake). The good thing is that if you somehow make the wrong choice, you can get most CI servers up and running pretty quickly.
Meera. I read on Javalobby that you have been experimenting with using voice commands to control a build server. Can you tell us more about that?
Paul. Yeah, I like to experiment with different approaches in automation. Every time I see something that could be automated, I like to see how far I can take it. For instance, on one of the current projects I'm working on, development teams perform on-demand builds for different environments (such as DEV or QA) - see http://integratebutton.com/blog/2007/12/16/jott-to-build-use-voice-commands-to-build-software/. I thought, what if I'm driving through infamous Washington DC traffic and I don't want to wait until I get into work to kick off the build?
So, I hacked together a solution with several different freely available tools which allow me to execute an on-demand build with a simple voice command. I call an 800 number and say something like "Build QA". I did this, mostly, because I thought it was fun and I figured other people might come up with better approaches to my hack. I've also been experimenting with creating build thresholds to fail the build based on configurable parameters. For instance, breaking the build if the code duplication or cyclomatic complexity gets too high. Or, breaking the build when developers violate an architectural layering rule. I'm a proponent of a proactive build process, so these types of build thresholds can provide more rapid feedback to the right people so that the problem can be fixed sooner.
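A build threshold of the kind described above can be sketched as a comparison between measured values and configurable limits. This is only an illustration under stated assumptions: the metric names, the limits and the `threshold_violations` function are hypothetical, and the measurements are hard-coded here where a real build would read them from a code inspection tool's report.

```python
from typing import Dict, List


def threshold_violations(
    metrics: Dict[str, float], limits: Dict[str, float]
) -> List[str]:
    """Return one message per metric that is missing or over its limit."""
    violations = []
    for name, limit in sorted(limits.items()):
        value = metrics.get(name)
        if value is None:
            violations.append(f"{name}: no measurement reported")
        elif value > limit:
            violations.append(f"{name}: {value} exceeds limit {limit}")
    return violations


# Configurable limits and sample measurements (assumed values).
limits = {"max_cyclomatic_complexity": 10, "duplicated_lines_percent": 5.0}
metrics = {"max_cyclomatic_complexity": 14, "duplicated_lines_percent": 3.2}

problems = threshold_violations(metrics, limits)
# A build script would typically exit with this code to break the build.
exit_code = 1 if problems else 0
for problem in problems:
    print("BUILD THRESHOLD VIOLATED:", problem)
```

Keeping the limits in a dictionary (or a configuration file) is what makes the threshold configurable per project without changing the check itself.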
Meera. Any tips and tricks for our readers on how to use CI effectively?
Paul. My IBM developerWorks "Automation for the people" series covers a lot of tools and techniques for improving your build and CI processes. Here are some key practices I recommend development teams follow when using CI:
• Be sure you can create working "demoable" software by typing a single command from the command line
• Every developer commits all of their checked out code at least once a day
• Use a separate machine for running CI builds
• Fix broken builds immediately
• Incorporate database integration, automated tests and inspections, deployment and installation processes into automated builds
• Do not use environment variables in build scripts unless you've exhausted all other options, as it limits the capability of running the same build in multiple environments without manual tweaking
• Be sure that feedback is visible to the entire development team (think X10 devices, big monitors, playing sounds when the build breaks and so on)