I recently found myself intrigued by an article by Jon William Toigo on Tech Target titled - Software-defined infrastructure or how storage becomes software. In his article, Toigo poses the question: Could a software-defined infrastructure, with software-based controls and policies, be the answer to managing and allocating storage? While I am sure Jon would agree we're all tired of the buzz words around "software-defined-anything", the fact of the matter is that we all use it anyway for lack of a better description of what's occurring today.
Storage specifically is one of those areas I always say is like organizing your room: everyone has their own way of doing it. You want your dresser to be a certain height. You want you mattress pointing a certain direction and the light shining through your window on your desk at just the right time of day. The truth is, storage is the under-pinning of virtualization that everyone wants to architect and manage the way they want to, and no one is going to tell them otherwise... UNTIL - wait for it - ...software can manage this FOR people.
Where is the robot (I wish the Roomba folks would hurry up with this, but I digress;) that keeps your room exactly the way you want it to be? The industry seems constantly to toss this idea around, but no one really seems to know where to find the solution. The truth is that no matter what quirks people have with their storage, there is one goal everyone has in common: Ensuring applications have ability to consume the storage resources they need while preserving priority and business logic; as well as increasing efficiency without introducing risk. This is the promise of every storage vendor on the planet trying to sell their latest auto-tiering, de-dupe, compression solution and all the wonderful bling to trick out their man cave.
The problem, however, is that virtualization obscures the lines with storage and it becomes intractably complex to manage the macro-level supply and demand of resources occurring across the stack. Traditional management tools - storage- and virtualization-related - are running into the limitations of their stats-based, linear approach to managing diverse environments.
As an illustration, let's use a real-life example. Let's assume we have a NetApp environment supporting VMWare. The NetApp consist of 4 Aggregates spread across two filers.
- Aggr 1 SATA and Aggr 2 SATA are on Controller A, and each Aggr comprises of 2 Volumes in VMWare
- Aggr 3 SAS and Aggr 4 SAS are on Controller B, and each Aggr comprises of 2 Volumes in VMWare
Aggr1 sees a spike in IOPS driven by 3 virtual machines demanding IOPS on its 2 Volumes. This results in Aggr1 utilizing 93% of its available IOPS capacity due to several high consumers, or "Bully" VMs (to use a traditional storage vendor's language). To compound the issue, the high utilization on disk has now manifested itself up the stack and begins to impact the ability of other workloads to access the storage resources they require on Aggr1.
In this example, a traditional software management system for the storage platform will alert an administrator that the Aggr1 has exceeded a tolerable utilization on IOPS and that it is time for an administrator to act. Similarly, the virtualization vendor (in this case VMWare), will generate alarms related to the virtualized components layered on top of the storage platform.
The administrator must then siphon through the charts and graphs in their storage vendor's tool, or their virtualization management system, with the end goal being some sort of resource allocation decision to intelligently allocate storage resources to the applications that need them while avoiding quality of service disruption at the expense of low-priority applications.
More likely the administrator needs to access both of these interfaces to try and accomplish this. In this example, the resource decision might be to move the volume to separate aggregate on Controller A (when it reality this won't do much due to performance constraints underneath), move the volume itself to faster disk associated with a separate storage controller, or move the virtual machine to a volume hosted on Controller B.
Now the second part of this equation (and arguably more difficult to get right) is: How does the administrator ensure that the domino they decide to push over doesn't create another resource constraint within the environment? Fundamentally, traditional storage vendor software offerings and virtualization management tools are incapable of understanding the impact and outcome of any prospective resolution because they simply do not analyze the interdependencies of this decision across both (virtualized) compute and storage components. The best case for operations is a head start on the troubleshooting process after quality of service has already been impacted or is in the process of being degraded.
Are Things Even at Human Scale Anymore?
In order to truly accomplish a software-defined storage system, there needs to be a new type of management system capable of connecting these two obscure worlds for the purpose of intelligent decision making and resource allocation.
Toigo paints this gap perfectly when he states, "Our storage needs to be managed and allocated by intelligent humans, with software-based controls and policies serving as a more efficient extension of our ability to translate business needs into automation support."
Following this logic, this new system must go above and beyond looking at application issues in isolation to determine how to properly allocate the infrastructure's entire supply of finite storage resources to every virtualized workload and application - at scale. Inevitably, this means looking across all application resource demands concurrently and then determining how to service each application's request for the best cost/benefit to the overall platform by allocating the supply of storage resource intelligently and in the most efficient way. Ideally, this will be done prescriptively - before quality of service is degraded.
The second phase of this brave new world will involve incorporating business logic that allows the software-driven control plane to consider business constraints alongside of capacity and performance metrics in real time. If tier 1 applications need to have priority for faster disk over low-priority applications, then the system should be set it and forget it. If tier 3 applications must be confined to bronze or slow storage, then the constraint should carry over dynamically for any workload matching this criterion that is provisioned across the lifecycle of the environment.
If 20% overhead needs to be maintained across tier-1 storage resources, then software should be intelligent enough to control utilization below this level, instead of notifying administrators once they have crossed it and forcing them to bring the infrastructure back from the brink.
The reality is that everyone has their own idea how to best trick out their room - in this case, their precious storage. Administrators will never truly be comfortable with putting their storage architecture on auto-pilot until they can rest assured that their policies are maintained while assuring application performance. Any system developed to tackle this brave new world, must be able to solve both of these goals simultaneously - a challenge that Toigo argues is beyond human capacity to do so at scale.