It’s an immutable fact of life for any IT manager: Sometime, somewhere, somehow, something will go wrong. It doesn’t matter whether you or anyone else was responsible. All that matters is that you do everything reasonable to prevent failures and are ready to respond quickly when prevention is no longer an option.
Since the first data center opened, managers have relied on one technique more than any other to avoid system crashes: overprovisioning. The only way to accommodate unpredictable spikes in demand is to build in a cushion that provides the processor power, storage space, and network bandwidth required at peak demand. As a result, actual storage capacity usage may be as low as 33 percent in some organizations, according to Storage Bits’ Robin Harris.
Striking the optimal balance when provisioning VMs is complicated by the natural tendency to respond to what TechTarget’s Stephen J. Bigelow calls “inadvertent resource starvation” by overcompensating. As you might expect, this is exactly the wrong way to optimize resources. The appropriate response to VM slowdowns is to test workloads to calculate the resource levels they need, both before the workloads are deployed and continually thereafter.
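One way to turn continuous testing into a resource level is to size each VM from its observed usage rather than from a guess. The sketch below is only an illustration: the 95th-percentile cutoff, the 20 percent headroom, and the sample values are all assumptions, not figures from Bigelow or any vendor.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend_allocation(samples, pct=95, headroom=0.2):
    """Suggest an allocation: the chosen percentile of observed use plus headroom.

    Both defaults are assumed starting points, not established thresholds.
    """
    return percentile(samples, pct) * (1 + headroom)


# Hypothetical memory usage samples for one VM, in GB.
memory_used_gb = [3.1, 3.4, 2.9, 3.8, 4.2, 3.3, 3.0, 5.1, 3.6, 3.2]
print(f"Suggested allocation: {recommend_allocation(memory_used_gb):.2f} GB")
```

Rerunning the calculation as new samples arrive keeps the allocation tied to what the workload actually uses.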
Breaking VM Provisioning Into Its Constituent Parts
The natural starting point for any provisioning strategy is processors. Every vCPU you allocate to a VM has to be scheduled, so it waits for a physical CPU before it can process the VM’s instructions and data. The share of time a vCPU spends waiting, known as ready time, can reach 20 percent as vCPUs are queued until processors are available.
Two ways to give a VM more ready access to physical CPUs are increasing its CPU shares priority and setting CPU reservations for it. Workload balancing lets you reduce the number of vCPUs on a server by moving slow-running VMs to servers with more available resources.
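Ready time is often reported as a raw number of milliseconds per sampling interval, so it helps to convert it to a percentage before comparing VMs. The sketch below uses the commonly cited conversion for vSphere-style summation counters; the 20-second interval, the per-vCPU averaging, the 5 percent alert level, and the VM names are assumptions for illustration.

```python
def cpu_ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert a CPU ready summation (ms) into a per-vCPU percentage.

    interval_s is the sampling interval of the counter; 20 seconds is an
    assumed default. Dividing by vcpus averages the wait across vCPUs.
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms / (interval_s * 1000 * vcpus) * 100


# Hypothetical samples: VM name -> (ready time in ms, number of vCPUs).
samples = {"web-01": (4000, 1), "db-01": (1600, 4)}

ALERT_PERCENT = 5  # assumed rule-of-thumb threshold, tune for your environment

for name, (ready_ms, vcpus) in samples.items():
    pct = cpu_ready_percent(ready_ms, vcpus=vcpus)
    status = "investigate" if pct >= ALERT_PERCENT else "ok"
    print(f"{name}: {pct:.1f}% ready ({status})")
```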
Likewise, when you allocate more memory to a VM than it and its applications need, there’s no easy way for the hypervisor to recoup that lost memory. To avoid excessive disk swapping, the hypervisor may use memory ballooning or other aggressive memory-reclamation methods to recover idle memory.
The temptation is to compensate by overprovisioning memory to the VM. Storage invites the same mistake. To prevent it, analyze the logical unit number (LUN) volumes assigned to VMs to determine how much of their capacity is actually used. With thin provisioning, the actual physical disk capacity can be a fraction of the specified logical volume size. You’ll save money by thin provisioning a 100GB LUN with just 10GB allocated, for example, and then adding physical storage as the physical volume fills up.
With thin provisioning, a VM configured with 40GB of storage will have only 20GB of that total allocated from the underlying VMFS volume. Source: GOVMLab
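The thin-provisioning arithmetic is easy to model. The sketch below is a toy bookkeeping model, not a storage API: the class name, the 80 percent expansion threshold, and the 10GB growth step are hypothetical choices. It reuses the 100GB LUN with 10GB allocated from the example above.

```python
from dataclasses import dataclass


@dataclass
class ThinVolume:
    logical_gb: float    # size promised to the VM
    allocated_gb: float  # physical capacity currently backing the volume
    used_gb: float       # data actually written

    def utilization(self):
        """Fraction of the allocated physical capacity that is in use."""
        return self.used_gb / self.allocated_gb

    def needs_expansion(self, threshold=0.8):
        """True once usage reaches the (assumed) expansion threshold."""
        return self.utilization() >= threshold

    def expand(self, step_gb):
        """Add physical capacity, never exceeding the logical size."""
        self.allocated_gb = min(self.logical_gb, self.allocated_gb + step_gb)


def overcommit_ratio(volumes, physical_pool_gb):
    """Total logical capacity promised per GB of physical capacity owned."""
    return sum(v.logical_gb for v in volumes) / physical_pool_gb


lun = ThinVolume(logical_gb=100, allocated_gb=10, used_gb=8.5)
if lun.needs_expansion():
    lun.expand(step_gb=10)
print(f"Allocated: {lun.allocated_gb:.0f} GB, utilization: {lun.utilization():.1%}")
```

Tracking the overcommit ratio across all volumes shows how much logical capacity rides on the physical pool, which is the number to watch before the pool itself runs out.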
Containers Can Make the Overprovisioning Problem Even Worse
Traditional virtualization models place a hypervisor atop the main OS, where it supports multiple “guest” OSes, each with its own app instances. By contrast, containers allow more efficient virtualization: the Docker Engine runs on the host OS, and virtualized apps run in their own instances above the host. Docker’s simplified architecture allows more containers to fit on the same server, and it lets containers be spun up in seconds rather than minutes.
But there’s a price to pay. Containers create more virtual servers, which are spun up and down in an instant. This draws more power, which generates more heat. More heat means the load on cooling systems is increased. The result could be flash “heat floods” and overprovisioning of infrastructure. IT managers need to be more aware of server loads and performance hiccups in container-centric environments.
How the Cloud Turns Overprovisioning on Its Head
In organizations of all types and sizes, the same discussion is taking place: “Which data resources do we keep in-house, and which do we relocate to the cloud?” The question belies the complexity involved in managing today’s multi-cloud and hybrid-cloud environments. TechTarget’s Alan R. Earls points out how provisioning cloud services turns the traditional resource-allocation model “on its head.”
For in-house systems, the danger is sizing capacity too low, causing apps to crash for lack of processor power or storage space. In the cloud, the danger is sizing capacity too high, which leads to overpayments and cancels out a primary reason for moving to the cloud in the first place: efficiency. One problem organizations often fail to consider is that legacy apps typically can’t take advantage of cloud elasticity. Unless the application load is uniform, running such an app in the cloud may be more expensive than keeping it in-house.
An application that scales only vertically within a server or instance as its load changes will leave managers with only one option: reboot the app onto a larger or smaller instance. The resulting interruption incurs its own costs, increased user dissatisfaction in particular. Overprovisioning is often an attempt to avoid having to reboot these apps.
By contrast, apps that scale horizontally allow more than one instance to be deployed to accommodate the change in load demand. Gartner researcher J. Craig Lowery points out that horizontal scaling allows IT to “identify the optimum building block instance size, with cost being a part of that assessment, and change the number of instances running as the load varies.”
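Choosing a building-block size comes down to arithmetic once you know how much load each instance size can carry. The sketch below compares three hypothetical instance sizes against a made-up load profile; the capacities, prices, and load figures are invented for illustration and do not reflect any provider’s pricing.

```python
import math


def instances_needed(load_rps, capacity_rps):
    """Instances required to carry the load, keeping at least one running."""
    return max(1, math.ceil(load_rps / capacity_rps))


def hourly_cost(load_rps, capacity_rps, price_per_hour):
    """Cost of one hour at the given load for one building-block size."""
    return instances_needed(load_rps, capacity_rps) * price_per_hour


# Hypothetical building blocks: name -> (requests/s per instance, $ per hour).
blocks = {"small": (100, 0.05), "medium": (220, 0.11), "large": (500, 0.20)}

# Hypothetical load profile, one requests/s figure per hour.
hourly_load = [80, 150, 430, 900, 610, 120]


def total_cost(block_name):
    """Total cost of serving the whole load profile with one block size."""
    capacity, price = blocks[block_name]
    return sum(hourly_cost(load, capacity, price) for load in hourly_load)


best = min(blocks, key=total_cost)
for name in blocks:
    print(f"{name}: ${total_cost(name):.2f}")
print(f"Cheapest building block for this profile: {best}")
```

With a spiky profile like this one, smaller blocks tend to win because they track the load more closely; a flatter profile can favor larger ones.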
In Search of the Auto-Scaling Ideal
The Holy Grail of cloud provisioning is auto-scaling, a concept that is central to cloud-native software design. There is no shortage of instrumentation and diagnostic tools that afford deep dives into cloud utilization. The challenge for DevOps teams is to turn the knowledge they gain from such tools into a strategy for redesigning or reconfiguring apps so they support auto-scaling and other cloud cost-saving features.
In this example, an auto-scaling group in AWS ensures that at least one instance is always available, that the optimal instance capacity is available most of the time, and that a maximum capacity is available to accommodate the worst-case scenario. Source: Auth0
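The minimum, desired, and maximum capacities in such a group can be thought of as a clamp around whatever the scaling rule asks for. The sketch below uses a simple proportional rule to pick the desired count; it is an assumed stand-in for illustration, not the algorithm AWS uses, and the metric values and limits are hypothetical.

```python
import math


def desired_capacity(current, metric, target, min_size, max_size):
    """Scale the instance count in proportion to metric/target,
    then clamp the result to the group's [min_size, max_size] range."""
    wanted = math.ceil(current * metric / target)
    return max(min_size, min(max_size, wanted))


# Hypothetical group: 1 instance minimum, 10 maximum, 60% CPU target.
MIN_SIZE, MAX_SIZE, TARGET_CPU = 1, 10, 60

for current, cpu in [(4, 90), (4, 5), (8, 120)]:
    new_size = desired_capacity(current, cpu, TARGET_CPU, MIN_SIZE, MAX_SIZE)
    print(f"{current} instances at {cpu}% CPU -> {new_size} instances")
```

The floor keeps the app reachable during quiet periods, while the ceiling caps spending when demand exceeds the worst case you planned for.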