AI/ML Use Cases in Application Management
In this article, take a look at some AI and ML use cases in application management and see solutions.
Artificial Intelligence for IT Operations (AIOps) is the convergence of AI and traditional application and infrastructure management. As in other domains, AI is going to have a significant impact on operations management. When the power of AI is applied to operations, it will redefine the way applications and their supporting infrastructure are managed.
Multiple applications running simultaneously generate a lot of data, from the network layer all the way up to the latency of an API call as seen by an end user. Users expect a smooth application experience without the slightest performance disruption.
Data acquired from these disparate layers of the stack becomes a rich source from which to infer insights. The complexity of operations has led to the creation of algorithmic IT operations (AIOps) platforms. These platforms use AI and ML to gain insights from monitoring data and drive automated remediation by augmenting human decisions.
Important Use Cases and Solutions
Anomaly Detection
Application metrics track things like response time, requests per minute, and error rates over time, and identify trends in their behavior. In addition, infrastructure metrics such as CPU utilization, memory utilization, and load averages are captured to understand how the infrastructure layer supports the application under different load conditions. As application complexity grows, it becomes difficult to detect exceptions to an expected pattern. If unnoticed, these anomalies can cause potential outages.
Patterns of change can be analyzed and discovered at different scopes, including the application level, service level, transaction level, and external dependencies. First, determine what constitutes normal system behavior; then discern departures from it. AIOps can accurately highlight these outliers and pinpoint their sources, which supports better root cause analysis (RCA) in real time and can prevent potential outages and infrastructure disruptions.
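As a toy illustration of this baseline-then-detect idea, the sketch below flags metric samples that fall more than three standard deviations outside a rolling window of recent behavior. The window size, threshold multiplier, and sample data are illustrative assumptions, not from the article:

```python
from statistics import mean, stdev

def detect_anomalies(samples, window=20, k=3.0):
    """Flag points deviating more than k standard deviations
    from the rolling mean of the preceding window."""
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(samples[i] - mu) > k * sigma:
            anomalies.append(i)
    return anomalies

# Steady ~100 ms response times with one latency spike at index 30.
metrics = [100 + (i % 5) for i in range(30)] + [450] + [101, 102, 103]
print(detect_anomalies(metrics))  # → [30]
```

In a real AIOps pipeline the baseline would be learned per metric and per time-of-day, but the core idea of scoring each sample against recent normal behavior is the same.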
Business Transaction Tracking
Business transactions range from simple synchronous message exchanges between point-to-point application connections to more complex asynchronous communications. To track the transaction flow, a sophisticated tracking and monitoring solution is required. Long-running, multi-step asynchronous transactions transit IT infrastructure, spanning multiple technologies, tiers, etc.
Complex transactions often morph and split, thus defying standard tracking and analysis via tagging or statistical sampling techniques. Transactions are stitched together by examining method calls and individual message payload contents, correlating them, and presenting intuitive visualizations of any pending or existing breaches in expected behavior and performance.
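A minimal sketch of this stitching idea, using hypothetical correlation IDs, step names, and an SLA value: group events by correlation ID, then report transactions that are incomplete or that breach the expected duration:

```python
from collections import defaultdict

# Hypothetical log events: (correlation_id, step, timestamp_ms).
events = [
    ("txn-1", "order-received", 0),
    ("txn-1", "payment-authorized", 120),
    ("txn-1", "order-shipped", 900),
    ("txn-2", "order-received", 50),
    ("txn-2", "payment-authorized", 200),
    # txn-2 never reaches "order-shipped": a pending transaction.
]

EXPECTED_STEPS = {"order-received", "payment-authorized", "order-shipped"}
SLA_MS = 1000  # assumed end-to-end duration budget

def stitch(events):
    """Group events by correlation ID; report incomplete or slow transactions."""
    txns = defaultdict(list)
    for cid, step, ts in events:
        txns[cid].append((step, ts))
    report = {}
    for cid, steps in txns.items():
        seen = {s for s, _ in steps}
        duration = max(t for _, t in steps) - min(t for _, t in steps)
        if seen != EXPECTED_STEPS:
            report[cid] = "pending: missing " + ", ".join(sorted(EXPECTED_STEPS - seen))
        elif duration > SLA_MS:
            report[cid] = f"breach: took {duration} ms"
        else:
            report[cid] = "ok"
    return report

print(stitch(events))
# → {'txn-1': 'ok', 'txn-2': 'pending: missing order-shipped'}
```

Production tracers correlate on propagated trace context or payload contents rather than a pre-agreed ID, but the grouping-and-comparison step is the same shape.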
Software Defect Detection
Managing software quality is a big concern in the software development lifecycle. Almost any software contains at least some minor defects after being released, so it is important to identify and fix defects before they reach the production environment. Any defect found in production incurs significant cost. Locating bugs is considered the most time-consuming and challenging activity in this context, where the available resources are limited. There is therefore a need for fully or semi-automated software engineering techniques to augment the manual debugging process: if a developer obtains hints about where bugs might be localized, debugging becomes more efficient.
Various graph mining algorithms and techniques come in handy for localizing software defects. These techniques rely on detecting discriminative subgraphs between failing and passing traces, and may not be applicable when the fault does not appear as a rare code pattern. Other approaches focus on selecting potentially faulty program components (statements or predicates) and then ranking these components according to their degree of suspiciousness, given the context of execution traces based on control flow graphs.
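One well-known suspiciousness-ranking scheme of this kind is the Tarantula formula from spectrum-based fault localization, a simpler relative of the graph-mining approaches described above. The sketch below ranks statements by how strongly their execution correlates with failing tests, using toy coverage data:

```python
def tarantula(coverage, outcomes):
    """Rank statements by Tarantula suspiciousness.

    coverage[i] is the set of statement ids executed by test i;
    outcomes[i] is True if test i passed."""
    total_pass = sum(outcomes)
    total_fail = len(outcomes) - total_pass
    scores = {}
    for s in set().union(*coverage):
        p = sum(1 for cov, ok in zip(coverage, outcomes) if ok and s in cov)
        f = sum(1 for cov, ok in zip(coverage, outcomes) if not ok and s in cov)
        fail_ratio = f / total_fail if total_fail else 0.0
        pass_ratio = p / total_pass if total_pass else 0.0
        total = fail_ratio + pass_ratio
        scores[s] = fail_ratio / total if total else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy spectra: statement 3 is executed only by the failing test.
coverage = [{1, 2, 3}, {1, 2}, {1}]
outcomes = [False, True, True]  # the first test fails
print(tarantula(coverage, outcomes))  # statement 3 ranks first
```

Statements executed mostly by failing runs score near 1.0 and are inspected first, which is exactly the "hint about where bugs might be localized" the text describes.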
Dynamic Performance Baselining
Performance baselining determines how application and infrastructure components perform under different load conditions, namely normal, operational, quasi, stressed, spike, breakpoint, etc. The baselines are sets of rules or thresholds within which individual metrics are expected to vary between upper and lower limits. Traditionally, these associations are modeled by running machine-learning algorithms on performance data collected over a defined time interval, and the resulting models are deployed in real time to notify of any performance deviations. This method is well suited to components that evolve slowly, but it nullifies the point of "relevance" under modern development methodologies.
The influence of hyper-converged infrastructure management, domain-driven application development, the surge of distributed computing, and polyglot programming and persistence has changed the way software components are developed and deployed. Frequent changes to software components must be deployed continuously on top of dynamically scaled-up/scaled-down underlying infrastructure. This paradigm shift forces the model-building exercise to use near-real-time data to stay relevant to the latest changes in application and infrastructure components. These models need to consume real-time feeds to learn new rules and evolve continuously.
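A continuously evolving baseline of this kind can be sketched with an exponentially weighted moving average, which forgets old behavior as new samples arrive. The smoothing factor, threshold multiplier, and CPU readings below are illustrative assumptions:

```python
class OnlineBaseline:
    """Exponentially weighted baseline that adapts as the metric stream evolves.

    Thresholds are mean +/- k * deviation, recomputed on every sample, so a
    redeployed or rescaled component shifts the baseline automatically."""

    def __init__(self, alpha=0.05, k=3.0):
        self.alpha, self.k = alpha, k
        self.mean = None
        self.var = 0.0

    def update(self, x):
        """Return True if x breaches the current baseline, then fold x in."""
        if self.mean is None:
            self.mean = x  # first sample seeds the baseline
            return False
        deviation = x - self.mean
        breached = self.var > 0 and abs(deviation) > self.k * self.var ** 0.5
        # EWMA update: recent samples dominate, old behavior is forgotten.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return breached

baseline = OnlineBaseline()
readings = [70, 74, 68, 72, 69, 72, 71, 70, 120]  # CPU %, spike at the end
flags = [baseline.update(r) for r in readings]
print(flags)  # only the final spike breaches the baseline
```

Because the mean and variance are updated on every sample, no periodic retraining batch is needed; the model stays current with whatever the latest deployment looks like.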
Intelligent Alerting
Intelligent alerting in APM means detecting the abnormal dynamically. For alerts to be intelligent, the tool needs to be configurable to understand the nature of your application and its behavior so that it can detect anomalies. It was once common to define static thresholds, such as raising an alert if a service call takes more than three seconds to return. However, it is very tedious to identify the important metrics to monitor and their thresholds for different application usage patterns; hence the need for an intelligent way to baseline the normal behavior of an application and notify on abnormal behavior.
As algorithmic techniques evolved, alerting became smart enough to perform rudimentary statistical analysis and allowed for alerts based on prediction, using measures like standard deviations, percentiles, and forecasting. Today, tools are smart enough to understand the behavior of your application and establish a baseline, allow you to define the strategy to use when analyzing requests against your baseline, and intelligently alert when there is a real problem that you need to review.
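A percentile-based threshold of this kind can be sketched as follows: learn the 99th percentile of a metric from baseline data (synthetic here) instead of hand-picking a static limit, then alert only when live samples exceed it:

```python
import random

random.seed(7)
# One week of per-minute response times (ms) under normal load (synthetic).
baseline = [random.gauss(200, 25) for _ in range(7 * 24 * 60)]

def percentile(values, pct):
    """Nearest-rank percentile of a sample."""
    ordered = sorted(values)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# The threshold is learned from observed behavior, not hand-picked.
threshold = percentile(baseline, 99)

def check(latency_ms):
    return "ALERT" if latency_ms > threshold else "ok"

print(check(220), check(400))  # → ok ALERT
```

A static "alert above three seconds" rule would miss a service whose normal latency is 50 ms degrading to 500 ms; a percentile learned per service catches that case automatically.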