Decentralized DevOps: Master-less Puppet and supply_drop
Here at Braintree, we are big fans of Puppet for system administration. In our ever-changing infrastructure, Puppet allows us to quickly provision and re-provision servers in multiple environments. We are also big fans of keeping our infrastructure lean and simple. Each piece of infrastructure that we must maintain comes with a cost. We have taken advantage of an under-appreciated feature of Puppet that allows us to manage our servers in a completely decentralized manner.
Benefits of going master-less
- Fine-grained control: We pride ourselves on our ability to keep our site up. By using Puppet without a master, we have tight control over how configuration is applied to a server.
- Parallelization: With a centralized Puppet server, that server maintains the single canonical version of the Puppet configuration. This causes contention when multiple people are attempting to make changes at the same time. Without a master, the source control repository holds the canonical version, so people can easily work in parallel as long as they are not trying to Puppet the same server.
- No single point of failure: Running a Puppet master is yet another service to maintain and make highly available. Having one less service to apply our typical HA rigor to is a big win.
The nuts and bolts
In order to facilitate our master-less setup, we wrote a small gem called supply_drop. It's a set of Capistrano tasks that let you provision servers with Puppet. It tries to be small, simple, and stay out of your way. Two tasks do the bulk of the work: cap puppet:noop and cap puppet:apply. The noop shows you the set of changes that are about to be applied. As you would guess, the apply task makes those changes on the box. supply_drop uses rsync to push the current Puppet configuration from your machine out to the server, which makes it very fast.
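Getting started is mostly a matter of requiring the gem from your Capfile and telling Capistrano which hosts to target. Here is a minimal sketch; the hostname is a placeholder and your exact settings may differ:

# Capfile -- minimal supply_drop setup (hostname below is hypothetical)
require 'rubygems'
require 'supply_drop'

# hosts to provision; the generated tasks in this post put them in the :server role
role :server, 'app1.sand'

With that in place, cap puppet:noop and cap puppet:apply work as described above.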
We use a setup similar to cap multistage to manage the scope of what boxes we apply changes to. We dynamically create tasks for each server and environment that we have. Here is an example of what that looks like:

def tasks_for_datacenter(datacenter, servers)
  task datacenter do
    role :server, *servers
  end

  servers.each do |server|
    task server do
      role :server, server
    end
  end
end

tasks_for_datacenter :sandbox, %w(app1.sand db.sand)
tasks_for_datacenter :production, %w(app1.prod app2.prod db.prod)
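Since the helper just turns a list of hostnames into tasks, bringing another environment under management is a one-line call; the staging names here are made up for illustration:

tasks_for_datacenter :staging, %w(app1.stage db.stage)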
These tasks allow us to apply changes to a single server or the entire datacenter. We can also use shell expansions to easily make changes to a subset of the servers in a given environment. Some examples:
- cap app1.prod puppet:noop shows the changes on app1
- cap sandbox puppet:apply applies the current changes to all of Sandbox
- cap app{1,2}.prod puppet:apply applies the changes to app1.prod and app2.prod
The workflow
We are always looking for ways to improve our workflow, but are generally happy with the one we have now. Our goals are:
- Easily allow multiple people to make changes to an environment
- Have explicit promotion from QA to Sandbox to Production
- Store every change in source control without making the logs overly noisy
We use a separate git branch for each environment (QA -> master, Sandbox -> sandbox, Production -> production). This allows us to promote changes by merging branches. Furthermore, if we need to do a quick fix in Production, we can use that branch and not affect Sandbox or QA until we merge. I'll walk through our workflow by adding a new Nagios alert for Apache as an example.
Write the Nagios script and push it out to a web server in QA. Repeat until you are happy with it.
cap web01.qa puppet:noop
cap web01.qa puppet:apply
Push the script out to all the web servers in the datacenter.
cap web0{1..n}.qa puppet:noop
cap web0{1..n}.qa puppet:apply
Change the central Nagios server's configuration to know about the new alert.
cap monitor.qa puppet:noop  # Some scary changes
Oh snap. There is a diff we don't know about. Someone else is adding a Nagios alert. No worries, they've already checked in. We grab their changes and try again.
git stash
git pull origin master
git stash pop
cap monitor.qa puppet:noop
cap monitor.qa puppet:apply
The alert works. Now we noop the entire environment to check for other pending changes, and commit our change. If the noop shows that we would be removing other people's changes in addition to applying our own, we talk to those devs and let them know what we did and that they will need to pull from git.
git commit -am "new nagios check"
Now we want to push the change to Sandbox. Since we are using git, we can either merge all the changes from master or cherry-pick our single commit if other changes are not yet ready.
git checkout sandbox
git merge master
Then apply the changes to Sandbox. We can do this on a server-by-server basis or in one fell swoop.
cap sandbox puppet:noop
cap sandbox puppet:apply
Repeat for Production. Declare victory.
In conclusion
supply_drop and Puppet give us fine-grained control over how we apply changes to any server in any one of our datacenters. Pairing them with git and a sensible workflow gives you auditable and repeatable configuration management.
Source: http://www.braintreepayments.com/devblog/decentralize-your-devops-with-masterless-puppet-and-supply-drop