Understanding AI Ops: Part 1
How can you make use of AI/ML if you are not a dev or data scientist? I explore the emerging area known as AIOps and how it applies to cloud operations.
Think about this question: how can we start to make use of AI/ML if we are not developers or data scientists? How about applying these capabilities to the discipline of IT or Cloud Operations? In this two-part blog, I am going to explore the problem and the opportunity that exist in the emerging area known as AIOps, some of the tools that can help, and what I think the future looks like in this area.
My role in VMware is all about Cloud and Cloud Management. My conversations are usually with people who care about building, running, or managing applications in public and private clouds and care about everything that's required to do so. That led me to think about how AI is going to affect these people and what opportunities it creates for different roles, specifically in the IT and Cloud Ops space.
So, do we need another "X" Ops? We have many Ops-related terms in the Cloud and Data Center world today; everything from DevOps, to GitOps, to SecOps, to DevSecOps. It's all about Ops these days. Ops is the hub! It's mission control! But what about AIOps?
Well, just as operations responsibilities have started to "shift left" to developers, becoming "DevOps," AIOps is currently in that strange place where people talk about it as a tool or set of tools. We thought like this when "DevOps" came out some 12 years ago. We were very silly back then and tended to think of DevOps as a tool. That tool was called Jenkins, Ansible, or sometimes (and yes, I have heard this) we thought DevOps was "just what Kubernetes does."
Thankfully, DevOps is pretty well understood now. We are way beyond that kind of thinking, and most of the people working in app dev or IT Ops get it. DevOps is about breaking down silos and changing the way Developers and Operations teams work, so that they collaborate more closely to produce better software.
Now I'll stop talking about DevOps and move on to AIOps. But as we transition here, the model of DevOps is the kind of thinking I'd like you to have as we explore AIOps. That's because DevOps gives us a roadmap that we can expect AIOps to follow. But where DevOps took many years to get there, my prediction is that AIOps will come to fruition much faster.
Origin of AIOps
Gartner coined the term AIOps and they currently define it as: "AIOps combines big data and machine learning to automate IT operations processes, including event correlation, anomaly detection and causality determination."
So then, reading Gartner's definition, you can expect AIOps to be used as a snazzy new buzzword or label for any monitoring tool that uses an algorithm or machine learning in some form, no matter how trivial the use case.
There are many of these tools out there already but there is also fantastic software that leverages AI/ML technology to dramatically improve IT Operations. These tools should definitely be leveraged in any modern data center or cloud. I'm going to talk more about some of these tools later so bear with me.
But let's remember, AIOps isn't a set of tools, it's a whole new way of approaching IT Operations.
So What's the Problem that Necessitates AIOps?
Let's start with the problem or opportunity that exists today. Applications are driving more complexity. That's pretty much the base for absolutely any IT problem today. Things are just more complex today than they were in the pre-cloud era.
Digital Transformation has been happening for years now. In fact, we've been talking about it since the 90s. Then, it meant a company having a website. Now, the website IS the company. Businesses are embedding technology into their products or are building new digital products.
Digital Transformation has meant that the way we work has needed to change to catch up. It has led to DevOps, Kubernetes, Microservices, Functions as a Service, CI/CD with automated pipelines. All of these amazing people, processes, and technology changes.
And all of these things happened because the old way wasn't working. Methodologies were too slow, and silos meant many hand-offs between different operations teams. Essentially people were moving too slowly for the technology.
Now we are in a world where processes are becoming so heavily automated that we can push new code for a new app into production every 10 seconds (or much less for some). The humans do the creative piece, figuring out what the app does and what it looks like. Humans also write the code for the app.
Everything else required to get that code into production can all be automated with CI/CD. All the testing, staging, user acceptance, etc. Everything can be automated. This should be your goal as a company if it isn't already.
Automation really is the key. But, if you do things faster, you MUST have the right foundation to keep it going. In that respect, the burden is now falling onto operations teams.
Everything is moving faster than the eye can see. IT Ops or Cloud Ops admins might not have to look after specific apps as much anymore, since DevOps begins to cover this space; but they now have to look after Kubernetes, they have to surface newer things to developers like Lambda, they have completely different DB types to monitor, and they have different clouds to manage with different APIs and different functionality. In short, they have more of everything to deal with.
Today, nobody wants them to become the bottleneck. The days of making an app and throwing it over the fence for Ops to look after are over. DevOps has really allowed Operations teams to give some of the burdens of looking after apps to the developers themselves.
But there's much more we can do to not just remove the bottleneck from Ops further, but to actually have Ops become an enabler for innovation.
As mentioned earlier, AIOps combines big data and machine learning to automate IT operations processes, including event correlation, anomaly detection, and causality determination. So, what does this mean? I'd like to break down each piece.
Let's start with Big Data. This is a vast amount of structured or unstructured data. This is where all of our useful information is waiting to be analyzed. If you have ever tried to run a query on a large database, you can bet it took a long time to run. Think of that scale but much, much larger. So large that traditional tools just can't query it.
Machine Learning. Essentially machine learning is an overarching AI concept where a machine can assess success from the output and then change how it processes data, without human intervention, to provide a better output next time, hence learning.
Then we combine big data and machine learning with automation to automate IT Operations processes. This piece is pretty self-explanatory, but the examples include event correlation, anomaly detection, and causality determination.
Event Correlation: Bringing together events from multiple different places, for example networking devices, virtualization platforms, and cloud services. The system then filters and tidies up the data, removing any duplication, etc. Then it looks at relationships and patterns between all of these events, getting ready for root cause analysis. There is no human with 20 different tools open for each of the devices involved, trying to correlate events manually.
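To make event correlation a little more concrete, here is a minimal Python sketch of the deduplicate-then-group idea described above. It is only an illustration under stated assumptions: the Event fields, the correlate function, the 60-second window, and the sample events are all hypothetical, and real AIOps platforms use far richer relationship and pattern analysis than simple time proximity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape, invented for this illustration.
    timestamp: int  # seconds since an arbitrary reference point
    source: str     # e.g. "firewall", "hypervisor", "cloud"
    message: str


def correlate(events, window_seconds=60):
    """Remove duplicate events, then group events that occur close together in time."""
    # Frozen dataclasses are hashable, so a set drops exact duplicates.
    unique = sorted(set(events), key=lambda e: (e.timestamp, e.source, e.message))
    groups = []
    for event in unique:
        # Extend the current group if this event is within the window of the
        # previous event; otherwise start a new group.
        if groups and event.timestamp - groups[-1][-1].timestamp <= window_seconds:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups


events = [
    Event(100, "firewall", "firmware upgrade started"),
    Event(100, "firewall", "firmware upgrade started"),  # duplicate report
    Event(130, "hypervisor", "VM network latency high"),
    Event(150, "cloud", "login failures rising"),
    Event(900, "hypervisor", "scheduled snapshot complete"),
]

groups = correlate(events)
for group in groups:
    print([f"{e.source}: {e.message}" for e in group])
```

With this sample data, the firewall, hypervisor, and cloud events land in one group, which is the kind of bundle a root cause analysis step would then examine.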
Anomaly Detection: Noticing when something is different. For example, my app usually has around 1000 users logged in at this time on a Friday, but today I can see that it has 100 users logged in. That is an anomaly and could be a good indicator that there's a problem with the platform.
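As a rough sketch of the Friday login example, the Python below flags a value as anomalous when it sits far from the historical average. The is_anomaly function, the threshold of 3 standard deviations, and the sample numbers are assumptions made for illustration; production tools typically account for seasonality and trends rather than relying on a single static baseline.

```python
from statistics import mean, stdev


def is_anomaly(history, current, threshold=3.0):
    """Return True if `current` is more than `threshold` standard deviations from the mean of `history`."""
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # All historical values are identical, so any difference is unusual.
        return current != baseline
    return abs(current - baseline) / spread > threshold


# Hypothetical login counts for the same time slot on previous Fridays.
friday_logins = [980, 1010, 1005, 995, 1020, 990]

print(is_anomaly(friday_logins, 100))   # far below the usual ~1000 users
print(is_anomaly(friday_logins, 1002))  # within the normal range
```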
But Why Is This Happening?
Causality Determination: Once we've correlated events and looked for anomalies, we can start to use this to find the root of a problem. Maybe the number of users logged into the app drops every time a particular firewall is upgraded, for example. Ultimately, the use of AIOps changes this whole process from hours of work to seconds. And we've not even touched on how to automate fixing the issues that get identified!
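The firewall example can be sketched as a simple comparison: how often do logins drop shortly after an upgrade, versus the rest of the time? The Python below is a hedged illustration only. The drop_rates function, the 30-second window, the 500-user cutoff, and the sample data are all hypothetical, and a strong co-occurrence like this is a hint about where to look for the root cause, not proof of causation.

```python
def drop_rates(samples, upgrade_times, window=30, drop_below=500):
    """Compare how often logins drop with and without a recent firewall upgrade.

    samples: list of (timestamp, logged_in_users) pairs
    upgrade_times: timestamps at which the firewall was upgraded
    Returns (drop rate shortly after an upgrade, drop rate otherwise).
    """
    def recently_upgraded(t):
        # True if any upgrade happened within `window` seconds before time t.
        return any(0 <= t - u <= window for u in upgrade_times)

    def rate(flags):
        # Fraction of samples showing a drop (0.0 if there are no samples).
        return sum(flags) / len(flags) if flags else 0.0

    after_upgrade = [users < drop_below for t, users in samples if recently_upgraded(t)]
    otherwise = [users < drop_below for t, users in samples if not recently_upgraded(t)]
    return rate(after_upgrade), rate(otherwise)


# Hypothetical measurements: (seconds, logged-in users).
samples = [(0, 1000), (60, 990), (120, 150), (180, 1010), (240, 120), (300, 1005)]
upgrade_times = [110, 230]

with_upgrade, without_upgrade = drop_rates(samples, upgrade_times)
print(f"drop rate after upgrade: {with_upgrade:.0%}, otherwise: {without_upgrade:.0%}")
```

In this made-up data, every sample taken shortly after an upgrade shows a drop and no other sample does, so the firewall upgrade becomes the first candidate to investigate.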
So, as I said earlier, "AIOps combines this big data and machine learning to automate IT operations processes, event correlation, anomaly detection, and causality determination." Make sense? Hopefully, you can see that this type of approach is going to be a massive enabler for innovation.
That's it for today but please check out part 2 of this blog, where I will look into some of the AIOps tools that exist today and what the future could look like for AIOps. I'll then wrap up by summing up and answering the question "How can we start to make use of AI/ML if we're not a developer or data scientist?"
Published at DZone with permission of Tobias Lilley. See the original article here.