In June of 2012, we announced that we would be adding Infrastructure as a Service (IaaS) features to the Windows Azure Platform. While many believe that Platform as a Service (PaaS) is still the ultimate “sweet spot” with regards to cost/benefit ratios, the reality is that PaaS adoption is… well… challenging. After 25+ years of buying, installing, configuring, and maintaining hardware, nearly everyone in the IT industry tends to think of terms of servers, both physical and virtual. So the idea of having applications and data float around within a datacenter and not tied to specific locations is just alien for many. This created a barrier to the adoption of PaaS, a barrier that we are hoping our IaaS services will help bridge (not sure about “bridging barriers” as a metaphor since I always visualize barriers as those concrete fence things on the side of highway construction sites for but we’ll just go with it).
Unfortunately, there’s still a lot of confusion about what our IaaS solution is and how to work with this. Over the last few months, I’ve run into this several times with partners so I wanted to pull together some of my learnings into a single blog post. As much for my own personal reference as for me to be able to easily share it with all of you.
So I’d like to start by explaining a few terms as they are used within the Windows Azure Platform…
Cloud Service – This is a collection of virtual machines (either PaaS role instances or IaaS virtual machines) representing an isolation boundary that contains computational workloads. A Cloud Service can contain either PaaS compute instances, or IaaS Virtual Machines, but not both.
Availability Set – For PaaS solutions, the Windows Azure Fabric already knows to distribute the same workload across different physical hardware within the datacenter. But for IaaS, I need to tell it to do this with the specific virtual machines I’m creating. We do this by placing the virtual machines into an availability set.
Virtual Network – Because addressability to the PaaS or IaaS instances within Cloud Services is limited to only those ports that you declare (by configuring endpoints), it’s sometimes helpful to have a way to create bridges between those boundaries or even between them and on-premises networks. This is where Windows Azure Virtual Networks come into play.
The reason these items are important is that in Windows Azure you’re going to use them to define your solution. Each piece represents a way to group, or arrange resources and how they can be addressed.
You control the infrastructure, mostly…
Platform as a Service, or PaaS, handles a lot for you (no surprise as that’s part of the value proposition). But in Infrastructure as a Service, IaaS, you take on some of that responsibility. The problem is that we are used to taking care of traditional datacenter deployments and either a) don’t understand what IaaS still does for us and b) just aren’t sure how this cloud stuff is supposed to be used. So we, through no fault of our own try to do things the way we always have. And who could really blame us?
So let’s start with what Windows Azure IaaS still does for you. It obviously handles the physical hardware and hypervisor management. This includes provisioning the locations for our Virtual Machines, getting them deployed, and of course moving them around the data center in the case of a hardware failure or host OS (the server that’s hosting our virtual machine) upgrades. The Azure Fabric, our secret sauce as it were, also controls basic datacenter firewall configuration (what ports are exposed to the internet), load balancing, and addressability/visibility isolation (that Cloud Service thing I was talking about). This covers everything right up to the virtual machine itself. But that’s not where it stops. To help secure Windows Azure, we control how all the virtual machines talk to our network. This means that the Azure Fabric also has control of the virtual NIC that is installed into your VM’s!
Now the reason this is important is that there are some things you’d normally try to do if you were creating a network in a traditional datacenter. Like possibly providing fixed IP’s to the servers so you can easily do name resolution. Fixed IPs in a cloud environment is generally a bad idea. Especially so if that cloud is built on the concept of having the flexibility to move stuff around the datacenter for you if it needs too. And if this happens in Windows Azure, it’s pretty much assured that the virtual NIC will get torn down and rebuilt and in the process lose any customizations you made to it. This is also a frequent cause for folks losing the ability to connect to their VMs (something that’s usually fixable by re-sizing/kicking the VM via the management portal). It also highlights one key, but not often thought of feature that Windows Azure provides for you, server name resolution.
Virtual Machine Name Resolution
The link I just dropped does a pretty good job of explaining what’s available to you with Windows Azure. You can either let Windows Azure do it for you and leverage the names you provided for the virtual machines when you created them, or you can use Virtual Networking to bring your own DNS. Both work well, so it’s really a matter of selecting the right option. The primary constraint is that the Windows Azure provided name resolution will only work for virtual machines (be they IaaS machines or PaaS role instances) hosted in Windows Azure. If you need to provide name resolution between cloud and on-premises, you’re going to want to likely use your own DNS server.
The key here again is to not hardcode IP address. Pick the appropriate solution and let it do the work for you.
Load Balanced Servers
The next big task is how to load balance virtual machines in IaaS. For the most part, this isn’t really any different than how you’d do it for PaaS Cloud Services, create the VM, and “attach” it to an existing Virtual Machine (this places both virtual machines within the same cloud service). Then, as long as both machines are watching the same ports, the traffic will be balanced between the two by the Windows Azure Fabric.
If you’re using the portal to create the VM, you’ll need to make sure you use the “create from gallery” option and not quick create. Then as you progress through the wizard, you’ll hit the step where it asks you if you want to join the new virtual machine to an existing virtual machine or leave it as standalone.
Now once they are both part of the same cloud service, we simply edit the available endpoints. In the management portal, you’ll select a Virtual Machine, and either add or edit the endpoint using the tools menu across the bottom. Then you set the endpoint attributes manually (if it’s a new endpoint that’s not already load balanced), or choose to load balance it with a previously defined endpoint. Easy-peasy. J
Now that we have load balanced endpoints, the next step is to make sure that if one of our load balanced virtual machines goes offline (say a host OS upgrade or hardware failure), that the service doesn’t become entirely unavailable. In Windows Azure Cloud Services, the Fabric would automatically distribute the running instances across multiple fault domains. To put it simply, fault domains try to help ensure that workloads are spread across multiple pieces of hardware, this way if there is a hardware failure on a ‘rack’, it won’t take down both machines. When working with IaaS, we still have this as an option but we need to tell the Azure Fabric that we want to take advantage of this by placing our virtual machines into an Availability Set so the Azure Fabric knows it should distribute them.
You configure a virtual machine that’s already deployed to join it to an Availability Set, or we can assign a new one to a set when we create/deploy it (providing we’re not using Quick Create which you hopefully aren’t anyways because you can’t place a quick create VM into an existing cloud service). Both options work equally well and we can create multiple Availability Sets within a Cloud Service.
So you might ask, this is all find and dandy if the virtual machines are deployed as part of a single cloud service. But I can’t combine PaaS and IaaS into a single cloud service, and I also can’t do direct machine addressing if the machine I’m connecting to exists in another cloud service, or even on-premises. So how do I fix that? The answer is Windows Azure Virtual Networks.
In Windows Azure, the Cloud Service is an isolation boundary, fronted by a gatekeeper layer that serves as a combination load balancer and NAT. The items inside the cloud service can address each other directly and any communication that comes in from outside of the cloud service boundary has to come through the gatekeeper. Think of the cloud service as a private network branch. This is good because it provides a certain level of network security, but bad in that we now have challenges if we’re trying to communication across the boundary.
Virtual Network allows you to join resources across cloud service boundaries, or by leveraging an on-premises VPN gateway to join cloud services and on-premises services. Acting as a bridge across the isolation boundaries and enabling direct addressability (providing there’s appropriate domain resolution) without the need to publically expose the individual servers/instances to the internet.
Bringing it all together
So if we bring this all together, we now have a way to create complex solutions that mix and match different compute resources (we cannot currently join things like Service Bus, Azure Storage, etc… via Virtual Network). One such example might be the following diagram…
A single Windows Azure Virtual Network that combines an on-premises server, a PaaS Cloud Service, and both singular and load balanced virtual machines. Now I can’t really speculate on where this could go next, but I think we have a fairly solid starting point for some exciting scenarios. And if we do for IaaS what we’ve done for the PaaS offering over the last few years… continuing to improve the tooling, expanding the feature set, and generally just make things more awesome, I think there’s a very bright future here.
But enough chest thumping/flag waving. Like many topics here, I created this to help me better understand these capabilities and hopefully some of you may benefit from it as well. If not, I’ll at least share with you a few links I found handy: