Understanding AI Ops: Part 2
How can you make use of AI/ML if you are not a dev or data scientist? I explore the emerging area known as AIOps and how it applies to cloud operations.
Welcome, everyone, to the second part of my introduction to AIOps. In Part 1, I talked about the challenges that exist today with digital transformation and how more and more automation is being used in software build and delivery. I also talked about the growing pressure on IT Operations to react quickly and to deal with infrastructure, software, configuration, connectivity, security, multi-cloud; the list goes on and on.
Today, I am going to consider AIOps from the vantage point of the tools used for Operations Management and how they leverage AI/ML to solve common problems in the IT management domain. I have also included a link to part two of the video I highlighted in my last blog, which imagines a future that AIOps could enable for IT Operations teams, allowing those teams to be true enablers of business outcomes.
As mentioned in Part 1, AIOps really should be much more than another label to give IT Operations tools. It’s a change in how we do things. That’s really important. Having said that, tools can be a great place to start this journey.
As we explore tools, I’m going to stay in my comfort zone and draw on my knowledge of VMware technologies. Specifically, I am considering VMware solutions that are part of the VMware vRealize and VMware Tanzu families of products. These tools will be the basis for my examples of how AI/ML is already impacting IT Operations across various use cases.
An AI/ML-enabled log management solution uses ML to intelligently group logs, making it faster to find the right log for the problem you’re working on. It uses ML to analyze massive amounts of log data from across your clouds and data centers and, based on whatever logs you forward to it, gives you near real-time monitoring and alerts.
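To make the idea of grouping logs concrete, here is a minimal Python sketch. It is not how any VMware product works: instead of ML, it uses a simple heuristic in the spirit of log-template mining, masking tokens that vary between messages (IP addresses, hex values, numbers) so that similar messages collapse into one group. The mask patterns and sample messages are assumptions for illustration.

```python
import re
from collections import defaultdict

# Illustrative patterns for tokens that vary between otherwise identical
# messages (assumed, not taken from any product). Order matters: IPs are
# masked before plain numbers so an address becomes one <IP> token.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "<NUM>"),
]


def template_of(message: str) -> str:
    """Replace variable tokens so similar messages share one template."""
    for pattern, placeholder in _MASKS:
        message = pattern.sub(placeholder, message)
    return message


def group_logs(messages):
    """Group raw log messages by their derived template."""
    groups = defaultdict(list)
    for msg in messages:
        groups[template_of(msg)].append(msg)
    return dict(groups)


# Hypothetical sample messages.
logs = [
    "Connection from 10.0.0.5 closed after 120 ms",
    "Connection from 10.0.0.7 closed after 98 ms",
    "Disk /dev/sda1 usage at 91 percent",
]
for template, members in group_logs(logs).items():
    print(len(members), template)
```

Instead of scrolling through every line, you look at a handful of templates and their counts, which is the kind of head start the grouping gives you.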
The solution also creates structure from unstructured data. If you know AI/ML, you know that the structure of the data you’re analyzing can be really important. The solution brings together information about an application from all perspectives (app logs, network logs, config, messages, performance data, etc.), giving you a big head start in troubleshooting because you can start with a holistic view of your app.
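As a minimal sketch of what “creating structure” can mean, assume a hypothetical log line format made of a timestamp, a level, a source in brackets, and an `app=<name>` tag. Each line is parsed into named fields, and records from different sources (app, network, config) are merged into one time-ordered view of a single application. Real products handle many formats; the format and field names here are made up.

```python
import re
from datetime import datetime

# Hypothetical format: "<ISO timestamp> <LEVEL> [<source>] app=<name> <free text>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+\[(?P<source>[^\]]+)\]\s+"
    r"app=(?P<app>\S+)\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Turn one unstructured line into a dict of fields, or None if it doesn't match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.fromisoformat(record["ts"])
    return record


def app_timeline(lines, app):
    """Merge records from every source for one app, ordered by time."""
    records = [r for r in map(parse_line, lines) if r and r["app"] == app]
    return sorted(records, key=lambda r: r["ts"])


lines = [
    "2024-05-01T10:00:05 ERROR [app] app=checkout payment call timed out",
    "2024-05-01T10:00:03 WARN [network] app=checkout packet loss 4% on eth0",
    "2024-05-01T10:00:04 INFO [config] app=search cache size changed",
    "not a structured line",
]
for record in app_timeline(lines, "checkout"):
    print(record["ts"], record["source"], record["msg"])
```

Once lines are records, the network warning that preceded the checkout error sits right next to it, which is the “holistic view” in miniature.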
Whereas Log Management looks at the logs that come out of apps, infrastructure, clouds, etc., your Operations Management solution starts by looking at already defined metrics: for example, how much memory is being used, how many Kubernetes clusters you are running, or how those clusters are performing. Unlike Log Management, it’s all about things you can measure.
Your AI/ML-enabled Operations Management solution then applies analytics and machine learning to the data collected from these metrics across infrastructure and applications, from servers and VMs to the apps themselves. The goal is to automatically spot and react to issues in real time, including performance optimization opportunities, capacity constraints, and anomalies.
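One simple way to spot anomalies in a metric stream, shown here as a hedged sketch rather than the algorithm any particular product uses, is a rolling z-score: compare each new sample with the mean and standard deviation of the preceding window and flag it when it deviates by more than a threshold. The window size, threshold, and memory-usage samples are illustrative assumptions.

```python
from statistics import mean, pstdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the previous `window`
    samples by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # A perfectly flat history: treat any change as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical memory-usage samples (percent) for one VM.
memory_pct = [61, 62, 60, 61, 63, 62, 61, 95, 62, 61]
print(detect_anomalies(memory_pct))
```

Only the spike to 95 percent is flagged. A fixed threshold such as “alert above 90 percent” would also catch this case, but the rolling baseline adapts per metric without anyone having to choose a limit for each one, which is closer to what the ML-driven tools aim for.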
A solution focused on performance optimization builds on top of your operational metrics platform and provides the capabilities needed to automatically optimize your infrastructure operations. Through data collection and reinforcement learning techniques, this solution continuously optimizes your infrastructure against the KPIs you’ve given it (for example, performance). It does this while factoring in the completely dynamic nature of traditional and modern applications.
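Reinforcement learning can sound abstract, so here is a toy sketch of its simplest form, an epsilon-greedy multi-armed bandit, choosing among configuration options to maximize a KPI. It is not the method used by any VMware product. The option names, the KPI values, and the `simulated_kpi` function, which stands in for measuring real infrastructure, are all hypothetical.

```python
import random

# Hypothetical "true" KPI score for each configuration option. A real
# system would not know these; it only observes noisy measurements.
TRUE_KPI = {"small": 0.5, "medium": 0.8, "large": 0.6}


def simulated_kpi(option, rng):
    """Stand-in for measuring the KPI after applying `option` (assumed)."""
    return TRUE_KPI[option] + rng.gauss(0, 0.05)


def epsilon_greedy(options, measure_kpi, steps=1000, epsilon=0.1, seed=0):
    """Learn which option yields the best KPI by trial and feedback."""
    rng = random.Random(seed)
    counts = {o: 0 for o in options}
    estimates = {o: 0.0 for o in options}
    for _ in range(steps):
        if rng.random() < epsilon:
            choice = rng.choice(options)  # explore a random option
        else:
            choice = max(options, key=lambda o: estimates[o])  # exploit the best so far
        reward = measure_kpi(choice, rng)
        counts[choice] += 1
        # Incremental update of the running average KPI for this option.
        estimates[choice] += (reward - estimates[choice]) / counts[choice]
    best = max(options, key=lambda o: estimates[o])
    return best, estimates, counts


best, estimates, counts = epsilon_greedy(["small", "medium", "large"], simulated_kpi)
print(best, counts)
```

Most of the time the loop exploits the option with the best average KPI so far, and with probability epsilon it tries another option, which lets it keep adapting if the environment changes. Production systems model far richer state than a fixed list of options, but the feedback loop of act, measure, and update is the same idea.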
Combine these three AI/ML-enabled solutions and you can achieve a self-tuning, self-healing, self-driving data center and/or cloud.
Whereas an Operations Management solution is all about looking at metrics for specific, known things, such as compute power or the number of database queries, an Observability solution does some of this but is more focused on letting you build your own metrics and analyze them with different queries and AI/ML.
A developer can place a hook (a piece of code) into their application that sends a metric out. For example, a games developer might want to measure how many times players use a particular weapon in their game. That metric can be built into the code so that the application reports whenever someone uses a specific weapon. We can then apply anomaly detection to tell us when the typical use of that weapon changes. Maybe we see from this data that people have stopped using this weapon so much that it is time to design a new one. In this case, the metric gives us useful business-level information.
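The weapon example can be sketched in a few lines of Python. `WeaponUsageMetrics` is a hypothetical in-app hook that counts uses per reporting interval; in a real application you would emit the metric through your observability platform’s SDK rather than keep it in memory. `usage_dropped` is an equally simple stand-in for anomaly detection: it flags a weapon when its latest interval falls below half of its recent average. The threshold, baseline length, and weapon names are assumptions.

```python
from collections import Counter


class WeaponUsageMetrics:
    """Hypothetical in-app hook: counts weapon uses per reporting interval."""

    def __init__(self):
        self._current = Counter()
        self.history = []  # one Counter per closed interval

    def record_use(self, weapon):
        # Called from game code each time a player uses `weapon`.
        self._current[weapon] += 1

    def close_interval(self):
        # In a real app this is where the counts would be sent out.
        self.history.append(self._current)
        self._current = Counter()


def usage_dropped(history, weapon, baseline_intervals=3, min_ratio=0.5):
    """True if the latest interval is below `min_ratio` of the average
    of the preceding `baseline_intervals` intervals."""
    if len(history) < baseline_intervals + 1:
        return False  # not enough data for a baseline yet
    baseline = [h[weapon] for h in history[-baseline_intervals - 1:-1]]
    average = sum(baseline) / baseline_intervals
    latest = history[-1][weapon]
    return average > 0 and latest < min_ratio * average


metrics = WeaponUsageMetrics()
for uses in [100, 110, 95, 30]:  # simulated uses of one weapon per interval
    for _ in range(uses):
        metrics.record_use("laser")
    metrics.close_interval()

print(usage_dropped(metrics.history, "laser"))
```

The drop from roughly 100 uses per interval to 30 is flagged; whether that means the weapon needs replacing is exactly the business question the data prompts.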
Hopefully, you can see that AIOps suddenly starts to become an enabler of innovation. But you need to know what you’re looking for. This leads me to the future.
I think the “AI” in AIOps really should mean Data Scientist. Data Scientists are the ones building these algorithms and methods to understand and navigate data in an intelligent and useful way.
As DevOps blurs the lines between Developers and Operations, AIOps will start to blur the lines between Data Scientists and IT Operations. The data will be there, and it will finally be easily and quickly accessible through the types of tools I’ve just talked about.
Automation is already happening. Things are starting to be automatically self-tuned and self-healed.
In the future, we might start to see AIOps teams building complex algorithms to improve how the platform runs, similar to what we already see in VMware’s vRealize AI Cloud.
If we consider observability, we know that we can already build monitoring hooks into apps that can generate a massive amount of data from all our modern apps. That data can then be queried very easily using the available tools. The only limit to what can be done with data when we apply AI/ML techniques to it will be your imagination.
Take this scenario. Amazon might look at how long you hover over the buy button on its website for the new thing you really want (but probably don’t need!). It could use that information to decide how much more advertising to target at you to push you over the edge into buying it.
The original question I asked when I released Part 1 was “How can we start to make use of AI/ML if we are not a developer or a data scientist?” As we’ve seen, the more we embrace complexity and leverage AI/ML to find problems or opportunities in IT operations, the more IT Operations can become an enabler that helps our companies move faster and be more agile. We can also use this technology to be more innovative, coming up with new questions for the AI that could be transformational for our company.
You’re not going to have to be a data scientist to do this. Just a decade ago, traditional networking engineers started learning Python programming because they knew DevOps was coming. It is probably time to brush up on your math, though, because the better you understand the data and how to manipulate it with the equations used in algorithms, the more effective you will be with AIOps in the months and years to come.
AIOps is here, so let’s leverage this new technology to move away from reactive IT operations and long conference calls of pointing fingers at each other. Let's move things up the stack, solving business problems and driving new revenue streams. That is where the value is. This, much more than just a single AIOps tool, is what AIOps as a concept can really give us.
Published at DZone with permission of Tobias Lilley. See the original article here.