Component Deployment: Why Virtualization Changes Your Assumptions
The lifecycle of software involves many groups besides the software development team, and it's often part of the software architect's role to own these relationships. One of the teams they (should) have frequent contact with is the systems team, who look after the physical hardware. Even if you've moved to a completely cloud-based architecture you still need to work with the provider.
Until recently you could be pretty sure that there was an almost one-to-one mapping between your deployment diagrams and your network and systems diagrams. For example, your deployment node called "web server" would sit on a physical server called something like webserv-01 (with maybe a few other numbers for load balancing) and your database would sit on a physical server called 'database' replicated to database-dr. The network diagram would contain a few extra boxes for items like routers but otherwise the structure was almost identical.
These days if we hand a list of required nodes to a systems team they will still return us a list of machines we can deploy onto but there is a good chance that we now have a set of virtual machines (and virtual routers/connections). This is generally a very good thing, as we'll probably get the resources very quickly and be able to modify the capabilities at a later date, but virtualisation has changed the assumptions we can make about the physical deployment.
Let me give you a simple example. (Note that these diagrams are just slightly modified examples taken from http://www.gliffy.com ). This is a very simple setup with some very basic load balancing/HA.
We hand this over to the systems team who hand us back something that looks like this:
We have a pretty much one-to-one mapping between deployment nodes and 'machines'. They've helpfully included the intended IP addresses so we can write our deployment scripts. Pretty easy, huh?
Most systems teams these days don't give access to an OS sitting directly on hardware as it's difficult to monitor, audit and back up. Virtualisation is often used as standard. This not only allows underused physical hardware to be shared but also means that virtual machines can be shifted if they need more resources, gives fine-grained control over resource allocation, and so on. The benefits are immense.
Therefore you may be allocated instances on shared hardware - after all, they can easily shift them to more powerful hardware if required. So how might the physical hardware actually look?
I'm being slightly facetious here (this is an unlikely rack setup) but the point is that you have no idea how many pieces of physical hardware your virtual machines are on. Virtualization vendors like VMware can give you not only virtual machines but also virtual firewalls, switches and so on. You can easily squeeze a couple of web servers, application servers, a DB, a virtual firewall and switches onto a single high-end server. Externally, however, they will look like individual servers and infrastructure.
Is this an issue? Well we've designed redundant components and we have redundant virtual machines but we probably assumed redundant physical machines as well. A hardware failure could take down everything at the same time - much to our surprise.
How likely is it that our allocated virtual machines could be like this? Probably more likely than you'd think. The systems team may be supporting hundreds of virtual instances across dozens of application groups. The allocation of virtual machines may be automated without any sanity checking for physical implications. The systems team may just allocate on the next available piece of hardware, so the entire application infrastructure may be on one or two machines.
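To make the "next available piece of hardware" scenario concrete, here is a minimal sketch of a first-fit allocator. Everything in it is an assumption for illustration: the host names, the slot counts, the second web server name and the `first_fit` function itself are hypothetical and don't reflect how any particular virtualisation product schedules machines.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A hypothetical physical machine with a fixed number of VM slots."""
    name: str
    free_slots: int
    vms: list = field(default_factory=list)


def first_fit(vm_names, hosts):
    """Place each VM on the first host with spare capacity.

    Nothing here knows that two VMs are meant to be redundant copies
    of each other, which is exactly the problem.
    """
    placement = {}
    for vm in vm_names:
        for host in hosts:
            if host.free_slots > 0:
                host.free_slots -= 1
                host.vms.append(vm)
                placement[vm] = host.name
                break
        else:
            raise RuntimeError(f"no capacity left for {vm}")
    return placement


# Two large hosts, four "redundant" virtual machines.
hosts = [Host("host-a", 8), Host("host-b", 8)]
vms = ["webserv-01", "webserv-02", "database", "database-dr"]
placement = first_fit(vms, hosts)

for vm, host in placement.items():
    print(f"{vm} -> {host}")
```

With these made-up numbers every virtual machine, including both halves of each redundant pair, ends up on host-a, while host-b sits empty.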
In this case we'd probably just request that no layer (web, application or database) is deployed entirely on a single piece of physical hardware, but we'd need a much deeper analysis on a more complex system. I think it's best to assume that a systems team *will* deploy on a single item of hardware and then work out the minimum constraints for the physical deployment. I'd suggest working closely with the systems team to understand the actual deployment.
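If the systems team can tell you which physical host each virtual machine lives on, the "no layer on a single piece of hardware" rule is easy to check automatically. The following is only a sketch under that assumption: the tier names, host names, `webserv-02` and the `tiers_on_single_host` helper are all hypothetical, and the two dictionaries would need to come from whatever inventory your systems team actually keeps.

```python
from collections import defaultdict


def tiers_on_single_host(vm_tier, vm_host):
    """Return the tiers whose virtual machines all share one physical host.

    vm_tier maps VM name -> logical tier (e.g. "web", "db").
    vm_host maps VM name -> physical host it currently runs on.
    """
    hosts_per_tier = defaultdict(set)
    for vm, tier in vm_tier.items():
        hosts_per_tier[tier].add(vm_host[vm])
    # A tier spread over fewer than two hosts has no physical redundancy.
    return sorted(tier for tier, hosts in hosts_per_tier.items() if len(hosts) < 2)


vm_tier = {
    "webserv-01": "web",
    "webserv-02": "web",
    "database": "db",
    "database-dr": "db",
}
vm_host = {
    "webserv-01": "host-a",
    "webserv-02": "host-b",
    "database": "host-a",
    "database-dr": "host-a",  # the DR copy shares hardware with the primary
}

at_risk = tiers_on_single_host(vm_tier, vm_host)
print(at_risk)  # ['db']
```

A check like this only covers the simple per-layer rule. A more complex system would also need to consider shared storage, virtual switches and anything else that several layers depend on.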
I'd love to hear your approaches to this issue!