A Maturity Model for Build Automation
How does your engineering organization build and deliver products to its customers? Similar to the well-known capability maturity model, the maturity level of a build automation system falls into one of the following: chaotic, repeatable, defined, managed, or optimized.
Let’s take a look at the differences in these levels for a popular project, PhantomJS. At the start of the project, it was tricky to build PhantomJS unless you were a seasoned C++ developer. But over time, more things were automated, and eventually engineers without C++ backgrounds could run the build as well. At some point, a Vagrant-based setup was introduced and building deployable binaries became trivial. The virtualized build workflow is both predictable and consistent.
The first level, chaotic, is familiar to all new hires in a growing organization. You arrive in the new office and on that first day, an IT guy hands you a laptop. Now it is up to you to figure out all the necessary bits and pieces to start becoming productive. Commonly it takes several days to set up your environment – that’s several days before you can get any work done. Of course, it is still a disaster if the build process itself can be initiated in one step.
This process is painful and eventually someone will step up and write documentation on how to do it. Sometimes it is a grassroots, organic activity in the best interest of all. Effectively, this makes the process much more repeatable; the chance of going down the wrong path is reduced.
Just like any other type of document, build setup documentation can be out of sync without people realizing it. A new module may be added last week, which suddenly implies a new dependency. An important configuration file has changed and therefore simply following the outdated wiki leads to a mysterious failure.
To overcome this problem, consistency must be mandated by a defined process. In many cases, this is as simple as a standardized bootstrap script which resolves and pulls the dependency automatically (
npm install, etc). Any differences in the initial environment are normalized by that script. You do not need to remember all the yum-based setup when running CentOS and recall the apt-get counterparts when handling an Ubuntu box. At this level, product delivery via continuous deployment and integration becomes possible. No human interaction is necessary to prepare the build machine to start producing the artifacts.
In this wonderful and heterogenous environment it is unfortunately challenging to track delivery consistency. Upgrading the OS can trigger a completely different build. A test which fails on a RHEL-based is not reproducible on the engineer’s MacBook. We are lucky that virtualization (VirtualBox) or containment (Docker) can be leveraged to ensure a managed build environment. There is no need to install, provision, and babysit a virtualized build machine (even on Windows, thanks to PowerShell and especially Chocolatey). Anyone in the world can get a brand-new computer running a fresh operating system, get the bootstrap script, and start kicking the build right away.
There are two more benefits of this managed automation level. Firstly, a multi-platform application is easier to build since the process of creating and provisioning the virtual machine happens automatically. Secondly, it enables every engineer to check the testing/staging environment in isolation, i.e. without changing their own development setup. Point of fact, tools like Vagrant are quickly becoming popular because they give engineers and devops such power.
The last level is the continuous optimizing state. As the name implies, this step refers to ongoing workflow refinements. For example, this could involve speeding up the overall build process which is pretty important in a large software project. Other types of polishes concern the environment itself, whether creating the virtual machine from the .ISO image (Packer) or distributing the build jobs to a cloud-based array of EC2/GCE boxes.
My experience with automated build refinement may be described like this:
- Chaotic: hunt the build dependencies by hand
- Repeatable: follow the step-by-step instructions from a wiki page
- Defined: have the environment differences normalized by a bootstrapping script
- Managed: use Vagrant to create and provision a consistent virtualized environment
- Optimizing: let Packer prepare the VM from scratch
How is your personal experience through these different maturity levels?
Note: Special thanks to Ann Robson for reviewing and proofreading the draft of this blog post.