Build Agent: Template vs Provisioning
For an automated build system, a typical configuration involves the separation between the build server and the build agents (some systems call it master-slave or coordinator-runner). Such a configuration allows any addition or removal of new build agents, perhaps to improve the build performance, without causing too much disruption. When it is time to spawn a new build agent, there are at least two possible techniques: recreate it from the template or provision it from scratch.
Except for various corner cases, build agents nowadays often run in a virtualized environment. This makes it easy to install, upgrade, and manage the agent. An important benefit of having it virtualized is the ability to take the snapshot of the state of the build agent. When there is a problem, it is possible to revert it to the last good known snapshot. In addition, that snapshot could serve as the agent template. If there is a need to have more build agents, maybe because the build jobs are getting larger and larger, then a new agent can be created by cloning it from the template.
With today’s technologies, a template-based build agent is not difficult to handle. Vagrant permits a simplified workflow for managing virtual machines with VirtualBox, VMware, etc. Continuous integration system like TeamCity and Bamboo has a built-in support for Amazon EC2, a new instance from a specified AMI can be started and stopped automatically. And of course, running a new Linux system in a container is a child’s play with Docker.
This template-based approach, while convenient, has a major drawback. If the software to be built has an updated set of dependencies (patched libraries, different compiler, new OS), then all the build agents become outdated. It is of course enough to create a fresh agent from scratch based on the new dependencies and spawning a bunch of new agents from this template. Yet, this process is often not automated and error-prone, an accident waiting to happen.
In the previous blog post A Maturity Model for Build Automation, I did already outline the loose mapping of capability maturity model to the state of common automated build system. With this, it is easy to see how we can level up the above template-based approach. Instead of relying on a predefined configuration, a build agent should be able to create a working environment for the build from a provisioning script. The litmus test is rather simple: given a fresh virtual machine, the build agent must figure out all the dependencies, find out what’s missing and solve it, and then be in a state where it is ready to take any build job.
Again, today’s technologies make such those provisioning actions as easy as 1-2-3. We already have all kinds of powerful configuration management tools (CFEngine, Chef, Puppet, Ansible, etc). In many cases, relying on the OS package managers (apt-get, rpm, yum, Chocolatey, etc) or even framework-specific packaging solution (pip, npm, gem, etc) is more than enough. There is hardly any excuse not to adopt this provisioning solution.
Last but not least, it’s possible to combine these two to form a hybrid, optimized approach. Given a plain vanilla machine, the provisioning script can always upgrade it to the intended state. That should also still hold even if the machine is already polluted with some old dependencies. This opens an opportunity for doing both. In the context of Docker, it means that the base image needs to be refreshed with all the dependencies, e.g. different compiler and system libraries. At this point, the existing agents can still continue to function, perhaps installing missing stuff as necessary. However, once the base image is fully upgraded, the agent container can be rebuilt and it will bypass any redundant installation.
Care to share which approach do you use/experience/prefer?