One important issue that comes up when undertaking a configuration management effort is how to design “the schema” for configuration management data. Obviously there’s no one-size-fits-all answer here. But there are a couple of general and complementary approaches you need to know about if you’re working on this. In this post we’re going to look at those.
Some background: configuration and state management
First, let’s take a 50,000-foot view of a fairly generic management architecture for an operational environment:
There are three major components in the diagram above:
- an environment we want to manage
- an infrastructure for managing the environment’s configuration: farm and instance definitions (including cardinality), middleware deployments, app deployments, etc.
- an infrastructure for managing the environment’s state: availability, performance, etc.
Configuration management has a couple of important responsibilities. One is that it has to offer a way to realize desired configurations in the managed environment. For example, it would provide machinery for provisioning, deployment and rollbacks. Another responsibility is to maintain that configuration over time until somebody pushes a new desired configuration through. Both responsibilities are blueprint-driven: a blueprint describes our expectation, and there are mechanisms in place to establish and maintain the configuration against the blueprint.
State management has a similar maintenance responsibility: it has to know what counts as healthy functioning (usually defined by SLAs), and it has to manage state by detecting, diagnosing and remediating excursions. In real life this is usually a combination of automation and manual work. Monitoring is usually automated, whereas it’s pretty common to see tool-assisted manual effort on the diagnosis and remediation side. But automation or no, the job is to maintain healthy state.
Note that configuration management includes an as-is configuration repository, which describes the configuration that’s actually in the environment. Configuration management uses this to find deltas between actual and desired configuration. State management uses it largely for diagnostic purposes, like tracing state issues down through a chain of dependencies.
(A brief aside: In the diagram we have the deployment engine populating the as-is config repo, but that’s only one way to do it, and anyway it’s incomplete. Sometimes there’s an automated discovery process that finds servers and devices in the environment and records them somewhere. Other times, the instance provisioning process installs monitoring agents that phone home to a central server, which effectively becomes an as-is config repo. There can also be security agents on the machine that check files against known checksums and record alerts in a database when files change. These approaches aren’t mutually exclusive, which incidentally can make it hard to get a good read on as-is configuration. But I digress.)
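To make the delta-finding idea concrete, here’s a minimal Python sketch. It assumes, purely for illustration, that both desired and as-is configuration can be flattened into dictionaries keyed by hypothetical setting paths like `web/instance_count`; real config repositories are usually much richer than this.

```python
from typing import Any, Dict, List, Tuple

# A flattened configuration: hypothetical setting path -> value.
Config = Dict[str, Any]

def config_deltas(desired: Config, actual: Config) -> List[Tuple[str, str, Any, Any]]:
    """Return (kind, key, desired_value, actual_value) tuples describing drift."""
    deltas = []
    for key in sorted(desired.keys() | actual.keys()):
        if key not in actual:
            deltas.append(("missing", key, desired[key], None))
        elif key not in desired:
            deltas.append(("unexpected", key, None, actual[key]))
        elif desired[key] != actual[key]:
            deltas.append(("changed", key, desired[key], actual[key]))
    return deltas

desired = {"web/instance_count": 4, "web/image": "ubuntu-12.04"}
actual = {"web/instance_count": 3, "web/image": "ubuntu-12.04", "web/debug": True}
for delta in config_deltas(desired, actual):
    print(delta)
```

Whether a given delta triggers automated remediation or just a report is a policy decision that sits outside this sketch.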
Schema design for configuration management
We said above that configuration management has to establish and maintain desired configurations in the managed environment. This isn’t simply about limiting configuration drift. We want to make it impossible (or at least hard) for wrong configurations to appear in the environment in the first place. For example, we probably never want to see a server farm with three Ubuntu 12.04 instances and one Ubuntu 11.10 instance. But we want to eliminate drift too.
One powerful technique falls out of the blueprint-driven approach.
Recall that in this design, the blueprint states intent, and the deployment engine makes it real. So if we don’t want to see bogus configurations appear in the environment, one approach is to make them impossible to represent with our blueprints in the first place. And if we don’t want to see bogus configurations passing our periodic audits (e.g., agents in the environment that watch for changes), we can once again make such configurations impossible to express with our blueprints: since an audit checks the environment against the blueprint, a configuration the blueprint can’t express can never pass.
To see how this works, consider the following schema for describing a server farm.
This is a pretty natural way to see the world, and hence a natural way to design a schema. The farm has a bunch of instances. Each instance has a type, an image (OS + burnt-in packages) and some number of add-on packages.
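One way to sketch this first schema in Python is with dataclasses. The class and field names here are hypothetical, not taken from any particular tool. Because each instance carries its own type, image and packages, nothing stops a farm from mixing images.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Instance:
    # Each instance records its own type, image and add-on packages.
    instance_type: str
    image: str  # OS + burnt-in packages
    packages: List[str] = field(default_factory=list)

@dataclass
class Farm:
    name: str
    instances: List[Instance] = field(default_factory=list)

    def images(self) -> set:
        """Distinct images in use across the farm's instances."""
        return {i.image for i in self.instances}

# Nothing in this schema prevents a mixed farm.
farm = Farm("web", [Instance("m1.large", "ubuntu-12.04") for _ in range(3)]
                   + [Instance("m1.large", "ubuntu-11.10")])
print(sorted(farm.images()))  # ['ubuntu-11.10', 'ubuntu-12.04']
```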
But it’s not the only way to see the world. Consider the following alternative:
In this second schema, the farm has an “instance build”, which is just a definition that combines an instance type, image and set of packages together in a single group. The instances still have types, images and packages, but only indirectly through the farm and build entities.
The second schema is superior for blueprinting because it makes certain unwanted configurations impossible to express, and hence impossible to propagate to the managed environment:
- It’s impossible in the second schema to describe a farm with three Ubuntu 12.04 instances and one Ubuntu 11.10 instance, because every instance in the farm gets the farm’s single build. The first schema, on the other hand, allows this wrong configuration.
- If you want a catalog of standard (or at least defined) builds, the second schema allows you to enforce its use. Of course, if you don’t want a catalog, then you can just drop the instance build entity from the schema and associate its child entities directly with the farm.
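Here’s a corresponding sketch of the second schema, again with hypothetical names. The farm references a single `InstanceBuild`, and instances are derived from the farm rather than described individually, so a mixed-image farm simply can’t be written down. The optional `catalog` argument illustrates the catalog bullet: when a catalog is supplied, a farm can only be built from a build that’s in it. Putting that check in a factory function is just one assumption about where validation might live.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

@dataclass(frozen=True)
class InstanceBuild:
    # One definition combining instance type, image and packages.
    name: str
    instance_type: str
    image: str
    packages: Tuple[str, ...] = ()

@dataclass(frozen=True)
class Instance:
    # Instances get type/image/packages only indirectly, via the build.
    farm_name: str
    index: int
    build: InstanceBuild

@dataclass(frozen=True)
class FarmBlueprint:
    name: str
    build: InstanceBuild
    cardinality: int

    def instances(self) -> List[Instance]:
        """Expand the blueprint; every instance shares the farm's one build."""
        return [Instance(self.name, i, self.build) for i in range(self.cardinality)]

def make_farm(name: str, build: InstanceBuild, cardinality: int,
              catalog: Optional[FrozenSet[InstanceBuild]] = None) -> FarmBlueprint:
    """Create a farm blueprint, enforcing the build catalog if one is given."""
    if catalog is not None and build not in catalog:
        raise ValueError(f"build {build.name!r} is not in the catalog")
    return FarmBlueprint(name, build, cardinality)

precise = InstanceBuild("web-precise", "m1.large", "ubuntu-12.04", ("nginx",))
oneiric = InstanceBuild("web-oneiric", "m1.large", "ubuntu-11.10", ("nginx",))
catalog = frozenset({precise})

farm = make_farm("web", precise, 4, catalog)
print({inst.build.image for inst in farm.instances()})  # {'ubuntu-12.04'}

try:
    make_farm("web", oneiric, 4, catalog)
except ValueError as err:
    print(err)  # build 'web-oneiric' is not in the catalog
```

Dropping the catalog is just a matter of passing `catalog=None`, which mirrors the option of removing the instance build entity from the schema entirely.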
The first schema is superior for representing as-is configuration. In the real world, bad configurations occur (e.g., through half-completed deployments, rogue sysadmins, security vulnerabilities, human error, etc.). With the blueprinting schema, we’re trying to make it impossible to express bad configurations. So by definition it’s not going to be up to the task of representing the bad configurations that actually occur.
The takeaway is that there’s not just one configuration management schema to design. There are two: a blueprinting schema for expressing desired configuration, and an as-is schema for expressing actual configuration.
We want to embed key domain constraints in the blueprinting schema to the extent that it’s feasible to do so without over-complicating the schema. (In practice we want to identify the high-risk misconfigurations and focus on handling those.) The approach is more abstract: we say things like “it’s really the farm that has an instance build, not the instance”.
In the as-is schema we want to be more unconstrained and concrete so as to represent the wide variety of incorrect configurations that can actually happen.
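To tie the two schemas together, here’s a minimal audit sketch, with hypothetical names throughout. The as-is side uses the unconstrained per-instance shape of the first schema, so it can record whatever was actually found (including a stray 11.10 instance), while the blueprint side uses the constrained, farm-level build of the second. The audit reports each observed instance that doesn’t match the farm’s build, plus any cardinality mismatch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass(frozen=True)
class InstanceBuild:
    # Blueprint side: one build per farm (second schema).
    instance_type: str
    image: str
    packages: Tuple[str, ...] = ()

@dataclass(frozen=True)
class FarmBlueprint:
    name: str
    build: InstanceBuild
    cardinality: int

@dataclass
class ObservedInstance:
    # As-is side: each instance records what was actually found (first schema).
    host: str
    instance_type: str
    image: str
    packages: List[str] = field(default_factory=list)

def audit(blueprint: FarmBlueprint, observed: List[ObservedInstance]) -> List[str]:
    """Return human-readable findings where as-is deviates from the blueprint."""
    findings = []
    if len(observed) != blueprint.cardinality:
        findings.append(f"expected {blueprint.cardinality} instances, found {len(observed)}")
    want = blueprint.build
    for inst in observed:
        if inst.instance_type != want.instance_type:
            findings.append(f"{inst.host}: type {inst.instance_type}, expected {want.instance_type}")
        if inst.image != want.image:
            findings.append(f"{inst.host}: image {inst.image}, expected {want.image}")
        if sorted(inst.packages) != sorted(want.packages):
            findings.append(f"{inst.host}: packages {sorted(inst.packages)}, expected {sorted(want.packages)}")
    return findings

blueprint = FarmBlueprint("web", InstanceBuild("m1.large", "ubuntu-12.04", ("nginx",)), 4)
observed = [ObservedInstance(f"web-{n}", "m1.large", "ubuntu-12.04", ["nginx"]) for n in range(3)]
observed.append(ObservedInstance("web-3", "m1.large", "ubuntu-11.10", ["nginx"]))
for finding in audit(blueprint, observed):
    print(finding)  # web-3: image ubuntu-11.10, expected ubuntu-12.04
```

`FarmBlueprint` has no way to express the mixed farm, but a list of `ObservedInstance` records can, which is exactly why the two schemas need to differ.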