At some point in your life, you've bumped into the phrase "past performance is not indicative of future results." It appears in the prospectus of every financial product and in nearly every piece of stock advice. Not since the Mayans or Nostradamus has there been such a drive in our industry to seek out predictive systems. Predictive systems may appear prophetic, but there is a better way to solve these performance problems.
Many predictive analysis tools are widely used in areas such as predicting which products you will buy (e.g., Amazon and Target's online stores) or personal network mapping (e.g., LinkedIn, Facebook, Twitter). Successful tools like those employed by Amazon and LinkedIn have algorithms that can reasonably match past behavior to predict future behavior, and they succeed for a few specific reasons.
Successful Behavioral Algorithms
Let's look at buying patterns to see some examples. If you bought The Power of Habit by Charles Duhigg, you would probably enjoy Smarter, Faster, Better, by the same author and in the same genre. If you bought a book about road cycling and also bought a book about marathons and a book about swimming, there is a reasonable chance that you will be a good target for books about triathlons. You can see how your patterns alone are measurable, and how accuracy increases as adjacent patterns from other buyers who bought things together are folded in.
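The triathlon example can be sketched as simple co-occurrence counting. This is a minimal illustration, not Amazon's actual algorithm; the baskets and topic names are hypothetical:

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories; each inner list is one buyer's basket.
baskets = [
    ["road cycling", "marathons", "swimming"],
    ["road cycling", "marathons", "triathlons"],
    ["marathons", "swimming", "triathlons"],
    ["road cycling", "swimming", "triathlons"],
]

# Count how often each pair of topics was bought together.
co_counts = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        co_counts[(a, b)] += 1

def recommend(owned, top_n=1):
    """Score topics the buyer doesn't own by co-occurrence with topics they do."""
    scores = Counter()
    for (a, b), n in co_counts.items():
        if a in owned and b not in owned:
            scores[b] += n
        elif b in owned and a not in owned:
            scores[a] += n
    return [item for item, _ in scores.most_common(top_n)]

print(recommend({"road cycling", "marathons", "swimming"}))  # → ['triathlons']
```

The key property is the one described above: the recommendation for one buyer gets its strength from patterns across many other buyers.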
The LinkedIn PYMK (People You May Know) feature is driven by another algorithm that looks at people you may already know, based on details such as which schools you attended during certain years and which companies you worked at during particular periods. If you connect with someone from that group, you may also know someone they are already connected with. Statistically meaningful results can be pulled from this, and for the first large batch of connections in your personal network, you will be both matching these patterns and creating new ones as you expand your connections further.
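At its core, this kind of suggestion can be approximated by ranking second-degree contacts by how many mutual connections they share with you. Here is a minimal sketch with a hypothetical connection graph; it is not LinkedIn's actual implementation:

```python
from collections import Counter

# Hypothetical connection graph: each member's set of direct connections.
connections = {
    "ana": {"ben", "cho", "dev"},
    "ben": {"ana", "cho", "eli"},
    "cho": {"ana", "ben", "eli"},
    "dev": {"ana"},
    "eli": {"ben", "cho"},
}

def people_you_may_know(member):
    """Rank second-degree contacts by their number of mutual connections."""
    direct = connections[member]
    scores = Counter()
    for friend in direct:
        for fof in connections[friend]:
            if fof != member and fof not in direct:
                scores[fof] += 1  # one more shared connection
    return scores.most_common()

print(people_you_may_know("ana"))  # eli is suggested via two mutuals (ben, cho)
```

A suggestion backed by two mutual connections ranks above one backed by a single mutual, which mirrors how the confidence of the pattern grows with the overlap.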
Successful behavioral algorithms also depend greatly on the size of the systems from which they draw input. If LinkedIn had only 100,000 people in the system, spread across various cities around the world and various age ranges, the statistical chance of finding connections beyond the direct ones you seek out becomes slim. Pattern-based algorithms require patterns. Makes sense, right?
While you may happen to observe some patterns in small samples, it's dangerous to make predictions based on them. This is the fallacy that Daniel Kahneman and Amos Tversky documented, which we know as the law of small numbers.
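A quick simulation makes the danger concrete: small samples of a perfectly fair coin routinely produce rates that look like a strong pattern, while large samples of the very same process do not. This is just an illustrative sketch:

```python
import random

random.seed(7)

def max_deviation(sample_size, trials=2_000):
    """Largest deviation from 50% heads seen across many fair-coin samples."""
    worst = 0.0
    for _ in range(trials):
        heads = sum(random.random() < 0.5 for _ in range(sample_size))
        worst = max(worst, abs(heads / sample_size - 0.5))
    return worst

# Samples of 10 flips often swing 40+ percentage points from the true rate;
# samples of 1,000 flips stay within a few percent of it.
print(max_deviation(10))
print(max_deviation(1000))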
Application Performance and the Failed Approach of Predictive Systems
The other day, I walked by a park that had its sprinkler systems jetting water across the green open area. It makes sense that at 8 AM on a warm day, watering the lawn would be beneficial. Most would agree with that prediction. Doing so at 3 PM would be a poor choice, because there is a chance people would not enjoy being sprayed, so that's a reasonable variable to account for. The problem with the sprinkler scenario is that it was bucketing down rain at the very same time the system was watering the lawn.
Most will correctly argue that this is not a predictive system: it is based on a schedule, plus an assumption that it would not be raining. What this scenario does illustrate is that predictive products targeting systems performance must account for real-time changes in the environment.
The proof is in the fact that predictive systems in virtualization and cloud must be paired with reactive systems. If predictive systems were truly successful, they would eliminate the need for systems that take actions nobody predicted. The truth of the matter is that predictive algorithms in application performance are so deeply entrenched in assumptions that they don't even attempt to look beyond the very near future when predicting change. The hope of these systems is to be statistically right often enough to be seen as trustworthy.
Unless your applications and infrastructure have very distinct, cyclic patterns that repeat day after day at precisely the same time, the chances of success with predictive systems rapidly approach zero. Even accounting for patterns like backups, what about the difference in when each server gets backed up during the overnight window, or the fact that you run different backups daily versus weekly versus monthly, and so on? It sounds predictable until you look back over the data and realize the variances are pronounced enough to spot even with simple human analysis.
Would You Trust Predictive Systems for Infrastructure Management?
If we are to trust patterns to predict the future in volatile environments, here is an exercise to see how much you'd really trust them. You most likely take the same path to work every day. You drive the same roads, take the same exits, leave at the same time, and arrive at about the same time, right? You may even change lanes around the same time before the exits because you have a pattern you like to follow to be ready to get across a couple of lanes and exit safely.
Let's say, based on predictive analysis, you change lanes at the same spot every day, five days a week. Tomorrow, when you reach the point where you normally change lanes, would you close your eyes and blindly change lanes just because you change lanes at that spot every time? Of course not! That would be crazy. The reason you wouldn't do that (please, please tell me you wouldn't) is that the real-time environment around you is entirely unpredictable. There are different cars around you, and so many other variables, that it would be pure guesswork and assumption to say you actually changed lanes at the same time in the exact same spot every day. Your experience may seem to align with the prediction, but it just isn't the case.
Performance Predictions Are Really Guessing
The inaccuracy of predictive systems, and their inability to provably show success, is obvious at both ends of the spectrum. Either your infrastructure is small enough that it does not produce statistically meaningful outcomes that can be predicted, or it is large and volatile enough that the patterns become immeasurable with any degree of accuracy (see the traffic example above). Systems that claim to predict the next hour of your application usage are really just guessing and hoping to be more right than wrong — not really something you would want making decisions about changing your infrastructure. They are no more accurate than knowing whether you'll roll a 3 on a die.
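The die analogy can even be simulated: give a naive "predictive system" a long history of fair die rolls and let it always guess the face it has seen most often. Its hit rate converges to plain chance. This is an illustrative sketch, not a model of any specific product:

```python
import random
from collections import Counter

random.seed(42)

rolls = [random.randint(1, 6) for _ in range(60_000)]

# A naive "predictive system": always guess the most common face seen so far.
hits = 0
seen = Counter()
for actual in rolls:
    guess = seen.most_common(1)[0][0] if seen else 1
    hits += guess == actual
    seen[actual] += 1

print(f"hit rate: {hits / len(rolls):.3f}")  # hovers around 1/6 ≈ 0.167
```

All the pattern-hunting in the world buys nothing here, because the process being "predicted" is driven by real-time variables the history simply doesn't contain.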
In the end, I would call on you to think about the gap between prediction and guessing. When it comes to system performance, it is more guessing than anything, and this is why the systems need to be real-time. If you grew up on Sesame Street, you'll remember that just because Ernie had a banana in his ear, it didn't mean the banana was the real reason there weren't elephants in the room. So, when you're told about a predictive system that solves for application performance versus a real-time platform, remember that it really is like Ernie: hoping to avoid an elephant in the room.