Using AIOps for DevOps Workflows
Learn how monitoring DevOps workflows with AIOps allows you to set up smart alerts and triggers, and how it's benefitting real-world companies.
Every DevOps support team has to deal with large amounts of monitoring data and logs in order to take care of their cloud infrastructure. AIOps (AI for IT operations) applies artificial intelligence to make sense of that data.
We have previously explained what AIOps is and the benefits it brings to DevOps workflows. Multiple DevOps tools are used at various stages of software delivery: iterating on the code, versioning it, building, testing, pushing to production, and monitoring the performance of the finished product. Many parameters must also be taken into consideration while monitoring these stages of the software development lifecycle, from CPU/RAM usage to disk volume and bandwidth usage to the number of app sessions.
Even when all of the vital parameters are monitored through convenient dashboards, either via your cloud provider's monitoring solution or through a custom-configured Prometheus + Grafana setup, it is quite difficult to keep track of all system parameters at once across multiple dashboards.
One viable solution is to enable smart alerts that fire once any of the monitored parameters begins to exceed a certain threshold. For example, suppose we know from historical data that our system freezes once the number of simultaneous app connections exceeds 50,000. We can set up a trigger that raises a smart alert once that number reaches 40,000, so the admins have some time to react, for instance by launching another instance to share the load.
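The early-warning idea above can be sketched as a simple check that fires at a fixed fraction of the known failure point. This is a minimal illustration, not IT Svit's implementation: the 50,000 and 40,000 figures come from the example, while the `Alert` type, `check_connections` function, and the 80% ratio expressed as `WARNING_PERCENT` are hypothetical names chosen here. In a Prometheus setup, the same condition would normally be written as an alerting rule rather than in application code.

```python
from dataclasses import dataclass
from typing import Optional

# Values from the example: the system freezes at ~50,000 simultaneous
# connections, and we want a warning at 80% of that (40,000).
FAILURE_POINT = 50_000
WARNING_PERCENT = 80  # hypothetical parameter name


@dataclass
class Alert:
    """Hypothetical alert record passed on to admins or automation."""
    metric: str
    value: int
    threshold: int
    message: str


def check_connections(active_connections: int,
                      failure_point: int = FAILURE_POINT,
                      warning_percent: int = WARNING_PERCENT) -> Optional[Alert]:
    """Return an Alert once connections reach the early-warning threshold."""
    # Integer arithmetic keeps the threshold exact (50_000 * 80 // 100 == 40_000).
    threshold = failure_point * warning_percent // 100
    if active_connections >= threshold:
        headroom = failure_point - active_connections
        return Alert(
            metric="active_connections",
            value=active_connections,
            threshold=threshold,
            message=f"{active_connections} connections, {headroom} left before the known failure point",
        )
    return None
```

With these defaults, `check_connections(38_000)` returns `None`, while `check_connections(41_500)` returns an `Alert` whose threshold is 40,000, leaving the admins a 8,500-connection margin to react.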
This was a huge step forward, and this responsive approach to system monitoring dramatically lowers the chance of significant failures. However, these alerts still require a great deal of manual response, which is far from the optimal use of the DevOps support team's time and resources. In addition, when monitoring complex distributed systems, the number of alerts received can be enormous. The need for algorithms that respond to triggers automatically is therefore obvious, and this is where AIOps becomes feasible and useful. In fact, using AI in daily DevOps operations is one of the cutting-edge IT trends of 2018.
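One simple way to picture "automated algorithms of responding to triggers" is a runbook that maps each alert type to a remediation action, collapses duplicate alerts, and escalates anything it does not recognize. The sketch below assumes alerts arrive as plain dictionaries; the alert names, the `RUNBOOK` table, and the `add_instance`/`rotate_logs` actions are all hypothetical placeholders that return strings instead of calling a real cloud API.

```python
from typing import Callable, Dict, List, Set, Tuple


# Hypothetical remediation actions. A real system would call the cloud
# provider's API or an orchestration tool here instead of returning text.
def add_instance(alert: dict) -> str:
    return f"scale out: {alert['metric']}={alert['value']}"


def rotate_logs(alert: dict) -> str:
    return f"clean up disk on {alert['host']}"


# A minimal "runbook": alert type -> automated response.
RUNBOOK: Dict[str, Callable[[dict], str]] = {
    "high_connections": add_instance,
    "disk_almost_full": rotate_logs,
}


def handle_alerts(alerts: List[dict]) -> List[str]:
    """Run the automated response for each alert, escalating unknown ones."""
    actions: List[str] = []
    seen: Set[Tuple[str, str]] = set()
    for alert in alerts:
        key = (alert["name"], alert.get("host", ""))
        if key in seen:  # collapse duplicate alerts from the same source
            continue
        seen.add(key)
        handler = RUNBOOK.get(alert["name"])
        if handler is None:
            # No automated response known: hand it to a human.
            actions.append(f"escalate to on-call: {alert['name']}")
        else:
            actions.append(handler(alert))
    return actions
```

Deduplication matters here because, as noted above, a distributed system can emit the same alert from many sources; only unrecognized alert types ever reach the on-call engineer.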
AIOps in IT Svit Projects: Training and Using the ML Models
Some time ago, the IT Svit remote DevOps team developed an internal Machine Learning (ML) model for predictive and prescriptive analytics of machine data. We were pleased to learn that AWS had taken the same direction, as they reported on their progress at the AWS Summit London 2018 in May. This confirmed that we were on the right track.
We began by identifying the points of interest in the processed data, such as:
- CPU usage
- RAM usage
- Disk usage
- Network usage: traffic / requests per second
- Geolocation of requests
- Time of day
- IP address / pool of IPs used
- Internet provider of the request
- User agent
- Payload, etc.
We even added the calendars of US, EU, Ukrainian, and Russian state and religious holidays.
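To train a model on the parameters listed above, each monitoring window has to be turned into a flat numeric feature vector, including calendar features such as hour of day and a holiday flag. The sketch below shows one common way to do this; it is an assumption, not IT Svit's actual pipeline. The `WindowStats` fields, the two-entry `HOLIDAYS` set, and the one-hot encoding of country and user agent are illustrative choices, and parameters such as internet provider or payload are omitted for brevity.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, List, Set

# Hypothetical holiday calendar. The real one would merge US, EU, Ukrainian
# and Russian state and religious holidays; these two dates are examples only.
HOLIDAYS: Set[date] = {
    date(2018, 1, 1),    # New Year's Day
    date(2018, 12, 25),  # Christmas (Western calendar)
}


@dataclass
class WindowStats:
    """Aggregated metrics for one monitoring window (field names are illustrative)."""
    start: datetime
    cpu_percent: float
    ram_percent: float
    disk_percent: float
    requests_per_second: float
    unique_ips: int
    top_country: str
    top_user_agent: str


def to_features(w: WindowStats, countries: List[str], agents: List[str]) -> Dict[str, float]:
    """Turn one window into a flat numeric feature dict for an ML model."""
    features = {
        "cpu": w.cpu_percent,
        "ram": w.ram_percent,
        "disk": w.disk_percent,
        "rps": w.requests_per_second,
        "unique_ips": float(w.unique_ips),
        "hour": float(w.start.hour),
        "weekday": float(w.start.weekday()),  # Monday == 0
        "is_holiday": 1.0 if w.start.date() in HOLIDAYS else 0.0,
    }
    # One-hot encode categorical values such as geolocation and user agent.
    for c in countries:
        features[f"country={c}"] = 1.0 if w.top_country == c else 0.0
    for a in agents:
        features[f"agent={a}"] = 1.0 if w.top_user_agent == a else 0.0
    return features
```

Encoding the calendar explicitly lets a model learn that, say, a traffic dip on a public holiday is "normal" rather than an anomaly.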
Storing and processing all of this data allowed our Big Data architects to train several ML models that identified "normal usage" patterns and highlighted correlations between various parameters. This, in turn, let us develop responses to the cases that break the pattern. For example, if the CPU usage threshold is set to 50% and the system spikes to 70% due to a growing number of active connections, it automatically spins up additional Amazon EC2 instances running the app until CPU usage drops back to 50%. Once the spike ends and usage falls below 45%, the additional instances are shut down to save money.
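The CPU example describes a scaling rule with two thresholds: scale out above 50%, scale in below 45%. The gap between them (hysteresis) keeps the fleet from flapping when usage hovers near one value. Below is a minimal sketch of that decision logic only, assuming one instance is added or removed per evaluation cycle and that load splits evenly across instances; `ScalingPolicy`, `next_instance_count`, and `simulate` are hypothetical names, and this is not the AWS Auto Scaling API. The extra "projected CPU" check on scale-in is our own addition: without it, removing an instance can push the remaining ones straight back over 50% and trigger another scale-out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScalingPolicy:
    scale_out_above: float = 50.0  # CPU % that triggers adding an instance
    scale_in_below: float = 45.0   # CPU % that allows removing one
    min_instances: int = 1
    max_instances: int = 10


def next_instance_count(avg_cpu: float, current: int, p: ScalingPolicy) -> int:
    """Return the instance count for the next evaluation cycle."""
    if avg_cpu > p.scale_out_above and current < p.max_instances:
        return current + 1
    if avg_cpu < p.scale_in_below and current > p.min_instances:
        # Assumes load redistributes evenly: only scale in if the remaining
        # instances would stay at or under the scale-out threshold.
        projected = avg_cpu * current / (current - 1)
        if projected <= p.scale_out_above:
            return current - 1
    return current


def simulate(loads: List[float], instances: int, p: ScalingPolicy) -> List[int]:
    """Replay total load (sum of per-instance CPU %) and record instance counts."""
    history = []
    for load in loads:
        avg = load / instances
        instances = next_instance_count(avg, instances, p)
        history.append(instances)
    return history
```

For instance, `simulate([140, 140, 80, 80, 80], 2, ScalingPolicy())` starts at 70% average CPU on two instances, adds a third, holds while usage sits between 45% and 50%, then drops back to two instances once the spike ends and stays there.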
This is the simplest use case; we have far more complex response scenarios in place. Thanks to this service, we have significantly decreased the number of incidents in production, shortened issue resolution time, and automated many routine actions. Essentially, our system knows what needs to be done by the time the DevOps engineer needs it done. This is a great improvement over the post-crash firefighting many businesses still have to deal with.
Final Thoughts on How IT Svit Uses AIOps in Our Projects
This is a very simplified representation of a lengthy and complicated process, one that was extremely beneficial for IT Svit customers but that they would not have been able to pull off on their own. We succeeded only because we have both a highly skilled infrastructure support team specializing in DevOps-as-a-Service and an experienced Big Data team, well versed in choosing and training the right ML models.
This approach to providing DevOps services is one of the reasons the international business rating agency Clutch lists IT Svit as one of the leaders of the IT outsourcing market in Ukraine and among the top 10 Managed Services Providers worldwide. All of the 5-star IT Svit customer reviews on Clutch mention the speed and efficiency of the cloud management services we provide, and now you know why.
Published at DZone with permission of Vladimir Fedak, DZone MVB. See the original article here.