OpenStack enterprise readiness received a lot of attention back at the OpenStack summit in Vancouver earlier this summer, which is particularly interesting as it shows that OpenStack has reached a certain level of maturity. But does it mean that enterprise IT departments have the tools they need to operate and maintain an OpenStack installation? Can legacy enterprise applications be moved to the private cloud?
A notable difference between a cloud app and a traditional enterprise business application is that the former is engineered with the dynamic nature of the cloud in mind. It keeps itself aware of resources being added and removed, and will not try to interact with a service that is no longer present. This is different from a typical enterprise business application, where configuration is often static and depends on highly available (HA) services that do not change. The reliability of the underlying cloud infrastructure is thus a cornerstone for enabling HA setups.
High Availability Across the Cloud
The word going around at the summit was that it is definitely possible to run a resilient OpenStack cloud, but it is not easy. There is no simple way to set up OpenStack in a high availability configuration. Many decisions have to be made when setting up such a system, starting at the hardware, then going up the stack to the configuration of OpenStack itself.
High availability in OpenStack is actually twofold: on one side there is tenant land, with network, storage, compute, etc., and on the other the OpenStack control plane, through which all cloud resources are controlled. A lot has happened in this domain in the last couple of releases, and much of the functionality required for a highly available setup on both sides has landed. One of the more important additions is, in my opinion, on the networking side, where improvements have been made to the virtual routing service.
Keepalived, which internally uses VRRP, is running on each network node
In older releases it was difficult to cope with a Neutron network node outage, which would lead to downtime of the virtual routers it was controlling. Consequently, this would leave the virtual machines that it serviced isolated. The network node remained a single point of failure up until the Juno release, which solved the problem by leveraging VRRP. In an HA setup, two networking nodes are typically set up together to provide the virtual routing service. One of the nodes serves as master and syncs its TCP session state to the other networking node, which acts as a backup. In case of an outage of the master, the virtual routing service is taken over by the backup network node.
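The master/backup takeover described above can be illustrated with a toy model. This is only a sketch of the general VRRP idea (the live node with the highest priority holds the virtual router), not Neutron's or Keepalived's actual implementation; the node names and priority values are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NetworkNode:
    # Hypothetical fields for illustration; not a Neutron data structure.
    name: str
    priority: int
    alive: bool = True


def elect_master(nodes: List[NetworkNode]) -> Optional[NetworkNode]:
    """Return the live node with the highest priority, or None if all are down."""
    candidates = [node for node in nodes if node.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda node: node.priority)


# Two network nodes serving the same virtual router.
nodes = [NetworkNode("net-1", priority=100), NetworkNode("net-2", priority=50)]
master_before = elect_master(nodes)

# Simulate an outage of the current master: the backup takes over.
nodes[0].alive = False
master_after = elect_master(nodes)

print(master_before.name, "->", master_after.name)
```

In a real deployment the nodes learn about each other's state through VRRP advertisements rather than a shared list, so failover also depends on the advertisement interval.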
On the compute node side, live migration of instances is possible, which facilitates node maintenance with minimal interruption. However, the compute node resources must be sized properly for this to be feasible; otherwise, evacuating a majority of the machines would take too long. From an operations perspective, live migration is definitely a requirement for keeping the underlying OS up to date. Things are, however, far from where we would like them to be when it comes to operations tooling. Upgrading is definitely not a straightforward process, and it is not only about keeping OpenStack up to date, but also about keeping the underlying OS updated, both with urgent security patches and planned updates.
The harder part is upgrading the control plane. This is a complicated process that essentially requires an HA design in order to keep downtime of the service APIs to a minimum. Upgrading involves interrupting the service that is being updated, so an outage will occur in a non-HA setting. As it stands now, the tooling around these tasks is not good. There is no simple “upgrade” button, and there are many tools that need to be mastered in order to make an upgrade happen. Without a devops approach leveraging a configuration management layer, it is practically impossible.
The problems around upgrades were addressed by several sessions at the summit. Every single one I attended underlined how important planning (including a rollback plan, in case of problems) and automation are in this process. There are definitely shortcomings in this area, but judging from the amount of attention and discussion the topic received at the summit, I think we can expect a lot of improvement over the next releases.
As for the hardware, component selection is important, and the wrong choices can bring trouble. This is not to say that every server has to be replaced to run OpenStack. It runs basically wherever Linux runs, but making sure that the kernel modules are mature for all the server components is always a good idea. Obviously, the choice of certain components matters more than others. The choice of NIC can have a big impact when deploying to production, much more so than which CPU is chosen. Hardware vendors have started to release reference architectures for OpenStack which come with tested hardware configurations. If you are starting from scratch, these can be of great interest as a kickstart.
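The sizing concern for live migration can be made concrete with some back-of-the-envelope arithmetic. The sketch below is a rough model under stated assumptions: it only looks at RAM, ignores whether individual instances actually fit on a given host, and treats the transfer time as a lower bound that ignores memory pages being re-sent during migration. All node names and numbers are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical inventory: node name -> (total RAM in GB, RAM used by instances in GB)
Inventory = Dict[str, Tuple[float, float]]


def can_evacuate(nodes: Inventory, target: str) -> bool:
    """True if the other nodes have enough spare RAM to absorb the target's instances."""
    needed = nodes[target][1]
    spare = sum(total - used for name, (total, used) in nodes.items() if name != target)
    return spare >= needed


def min_transfer_seconds(used_gb: float, link_gbit_per_s: float) -> float:
    """Lower bound on the time to copy the instances' RAM over the migration link."""
    return used_gb * 8 / link_gbit_per_s


cluster: Inventory = {
    "compute-1": (128.0, 96.0),
    "compute-2": (128.0, 64.0),
    "compute-3": (128.0, 80.0),
}

print(can_evacuate(cluster, "compute-1"))   # spare RAM elsewhere is 64 + 48 GB
print(min_transfer_seconds(96.0, 10.0))     # 96 GB over a 10 Gbit/s link
```

If the cluster runs close to full, the first check fails and the node cannot be drained for maintenance at all, which is the situation proper sizing is meant to avoid.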
Lowering the Bar
A project which is making OpenStack even more palatable for the enterprise is Manila. Many sessions focused on the shared filesystem, and we also had questions about Manila support at our booth at the summit.
Manila allows applications running in OpenStack to rely on a cloud service to supply file shares. Such needs would earlier have had to be managed by compute nodes that exported shares, forcing the application developer or ops engineers to handle maintenance as well as failure scenarios of the share server. Manila removes these concerns, and provides a fileshare service that is a first class OpenStack service, just like Cinder for block devices.
This also makes scaling an application that depends not only on an object store or database, but on file assets as well, easier. Just fire up more instances and mount the shared filesystem exposed by Manila. This is perhaps even more interesting for legacy applications which don’t use an object store at all, and could be a differentiating factor when moving to the cloud.
As Manila is not yet ready for release, there is a lack of documentation, and many important blueprints are still being worked on. Most notable is the lack of mount automation, which was one of the summit design session topics. There are many ways to tackle this problem, and with the proliferation of containers in the OpenStack ecosystem, there is not just one “right” way of doing this. The “container way” will most likely involve the share already being mounted on the host and then bind-mounted before pivoting into the container filesystem. This is quite different from launching a VM, which will need to discover and mount the share by itself on boot. Several techniques have been discussed around this, one being the introduction of an agent that discovers and mounts the shares on boot.
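To make the difference between the two approaches concrete, here is a minimal sketch of the mount commands each one boils down to. It is not Manila code and not the agent discussed at the summit: the helper names, the export location and the paths are hypothetical, and how an agent would discover the export location is deliberately left out. The commands are only built as argument lists, not executed.

```python
from typing import List


def build_share_mount(export_location: str, mount_point: str, fs_type: str = "nfs") -> List[str]:
    """Command a VM-side agent could run on boot to mount a share (hypothetical helper)."""
    if not export_location:
        raise ValueError("export location must not be empty")
    if not mount_point.startswith("/"):
        raise ValueError("mount point must be an absolute path")
    return ["mount", "-t", fs_type, export_location, mount_point]


def build_bind_mount(host_path: str, container_path: str) -> List[str]:
    """Command to expose a share already mounted on the host inside a container root."""
    if not (host_path.startswith("/") and container_path.startswith("/")):
        raise ValueError("both paths must be absolute")
    return ["mount", "--bind", host_path, container_path]


# VM case: the guest mounts the share itself.
vm_cmd = build_share_mount("10.0.0.5:/shares/demo", "/mnt/assets")

# Container case: the host already has the share; it is bind-mounted into the
# container's root filesystem before pivoting into it.
ct_cmd = build_bind_mount("/mnt/assets", "/var/lib/containers/app/rootfs/mnt/assets")

print(" ".join(vm_cmd))
print(" ".join(ct_cmd))
```

An actual agent would pass such a list to something like `subprocess.run(cmd, check=True)` with root privileges, and would need to handle retries while the network comes up.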
Another interesting point discussed at the design summit was whether consistency groups could make sense for Manila. An obvious use case for cloud applications is hard to find. The classic case for this type of functionality is when you have a database spreading its data across volumes, and you need to create a consistent backup of all the data. This is probably a feature that enterprise grade applications have more use for, and could be an enabler for such applications to make it to the cloud.
With higher-level cloud services like Manila, the bar for moving enterprise applications to the cloud is lowered. Fewer VMs need to be provisioned for auxiliary services, which allows IT staff to spend less time looking after specific service VMs.