AI-based Analytics Heralding the New Era of ITOA
Long a favorite of science fiction plots, Artificial Intelligence (AI) is finally paying dividends after decades of research. CB Insights reports that funding in the AI sector grew almost sevenfold in five years, from $45M in 2010 to $310M in 2015. From voice and facial recognition to retail to medical treatment recommendations, AI is being applied in more ways than can be listed here. One more application deserves particular attention: IT operations.
AI is bringing about radical transformation, and so are today’s hybrid cloud environments. These IT infrastructures are increasingly dynamic and agile, but at the same time extraordinarily complex. Humans can no longer sift through the variety, volume, and velocity of the Big Data streaming out of these infrastructures in real time, which makes AI, and machine learning in particular, a powerful and necessary tool for automating analysis and decision making. By bridging the gap between Big Data and the people who must act on it, and by capturing human domain knowledge, machine learning can supply the operational intelligence that significantly relieves teams of the burden of making informed decisions in near real time. Industry analysts agree: Gartner named machine learning among the top 10 strategic technologies for 2016, noting that “The explosion of data sources and complexity of information makes manual classification and analysis unfeasible and uneconomical.”
Sadly, though, manual analysis is often still the norm in the IT operations environments of companies around the world. Domain experts, typically IT administrators and IT operators in TechOps or Site Reliability Engineers (SREs) in DevOps, must manually gather this disparate information and apply their expertise in an attempt to make informed decisions. While these professionals are great at what they do, analyzing so much data from multiple tools by hand leaves the door wide open to human error. Analytics based on machine learning, by contrast, are quickly becoming a necessity for ensuring the availability, reliability, performance, and security of applications in today’s digital, virtualized, and hybrid-cloud network environments.
Until recently, a collection of disparate monitoring tools was all IT operations teams had to tell them about their network, their virtual and physical infrastructure, and application performance. Each of these tools provides a piece of the puzzle, but only a narrow view of the IT infrastructure, and monitoring is only one half of the tool chain. The other half consists of service desk tools that handle ticketing and change management. More often than not, it is humans, armed with their domain expertise, who bridge the gap between yesterday’s siloed monitoring tools and service desk applications.
The Analytics That Matter Today
What TechOps and DevOps environments need today are intelligent, informed decisions based on real-time analysis of the Big Data arising from the entire application infrastructure stack. The following analytics are key for IT operations:
Topology analytics: comprehension of the temporal, peer-to-peer, and hierarchical relationships between hybrid cloud elements. Every IT administrator or SRE should be aware of the topology. This type of analysis should be able to self-learn how objects are interrelated and how their performance affects one another. Learning those relationships, and keeping that understanding current so that trouble can be spotted in time, is extremely important in both TechOps and DevOps environments.
Predictive analytics: these help operators find early indicators, which provide insight into looming problems that may eventually lead to performance degradation and outages. Predictive analytics are also good at surfacing anomalies early, so teams can better plan for what’s ahead.
Behavior analytics: understanding the behavior profile of every metric, how each metric contributes to the behavior of its object, and how object behaviors relate to one another across the hybrid cloud environment. This is a multi-dimensional problem, and learning and adapting to “normal” behavior is extremely important.
Anomaly detection: understanding when there is a real anomaly and, more importantly, when there is not is critical to avoid generating false alarms. Best-of-breed machine learning algorithms should be able to weigh contextual, historical, and sudden changes in the behavior of objects to detect anomalies. This is the foundation of what is typically referred to as diagnostic analytics.
Root-cause analysis: by isolating the origin and impact of an incident, root-cause analysis fast-tracks resolution and can substantially reduce mean time to repair (MTTR).
Remediation recommendations: intelligent, actionable recommendations for remediating an incident. These recommendations should capture the tribal knowledge the organization has gathered over the years and industry best practices, and may even be crowd-sourced to capture state-of-the-art knowledge. These analytics provide the opportunity to finally close the loop in automated IT Operations Management.
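To make the predictive analytics item more concrete, here is a minimal sketch of one very simple early indicator: fitting a least-squares linear trend to recent samples of a metric and estimating how long until it crosses a threshold. The function name `hours_until_threshold`, the disk-usage metric, and the 90% threshold are hypothetical illustrations, and linear extrapolation is a deliberately basic stand-in, not the machine learning models the article has in mind.

```python
from typing import List, Optional, Tuple


def hours_until_threshold(samples: List[Tuple[float, float]], threshold: float) -> Optional[float]:
    """Hypothetical helper: estimate hours from the last sample until a
    least-squares linear trend reaches `threshold`.

    `samples` is a list of (hour, value) pairs. Returns None when there are
    too few points or the trend is flat or moving away from the threshold.
    """
    n = len(samples)
    if n < 2:
        return None
    xs = [t for t, _ in samples]
    ys = [v for _, v in samples]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        return None  # all samples at the same time: no trend to fit
    # Ordinary least-squares slope and intercept.
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    if slope <= 0:
        return None  # flat or falling: no early warning for an upper threshold
    crossing = (threshold - intercept) / slope
    # Zero means the trend line has already reached the threshold.
    return max(0.0, crossing - max(xs))


# Hypothetical disk usage (%) sampled hourly, rising by about 2 points per hour.
disk_usage = [(0, 60), (1, 62), (2, 64), (3, 66), (4, 68), (5, 70)]
eta = hours_until_threshold(disk_usage, threshold=90)
print(eta)  # 10.0
if eta is not None and eta < 24:
    print("Early warning: disk expected to reach 90% within a day")
```

The point is the shape of the output rather than the model: an estimated time-to-trouble that operators can act on before an outage, which a more capable forecaster could produce in the same form.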
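The behavior-profile and anomaly-detection items can be illustrated with a minimal sketch, assuming that “normal” for a metric depends on the hour of day (a simple form of context) and that a value is anomalous when it lies more than a chosen number of standard deviations from that hour’s history. The class `MetricBaseline`, its `threshold` and `min_samples` parameters, and the sample CPU values are hypothetical; a z-score test is a simple statistical substitute for the best-of-breed algorithms described above.

```python
from statistics import mean, pstdev
from typing import Dict, List


class MetricBaseline:
    """Hypothetical per-metric behavior profile keyed by hour of day."""

    def __init__(self, threshold: float = 3.0, min_samples: int = 5) -> None:
        self.threshold = threshold      # z-score above which a value is anomalous
        self.min_samples = min_samples  # history required before judging an hour
        self.history: Dict[int, List[float]] = {}

    def observe(self, hour: int, value: float) -> None:
        """Record a value as part of the 'normal' profile for that hour."""
        self.history.setdefault(hour, []).append(value)

    def is_anomaly(self, hour: int, value: float) -> bool:
        samples = self.history.get(hour, [])
        if len(samples) < self.min_samples:
            return False  # too little context: stay quiet rather than raise a false alarm
        mu = mean(samples)
        sigma = pstdev(samples)
        if sigma == 0:
            return value != mu  # perfectly stable history: any change is unusual
        return abs(value - mu) / sigma > self.threshold


baseline = MetricBaseline()
for v in [40, 42, 38, 41, 39]:
    baseline.observe(9, v)  # busy morning hour, CPU %
for v in [5, 6, 4, 5, 5]:
    baseline.observe(3, v)  # quiet night hour, CPU %

print(baseline.is_anomaly(9, 40))  # False: 40% CPU is normal at 09:00
print(baseline.is_anomaly(3, 40))  # True: the same value is unusual at 03:00
```

The same reading is normal in one context and anomalous in another, which is why a single static threshold tends to produce false alarms. Declining to judge an hour with too little history is one way to favor fewer false alarms, at the cost of missing anomalies early on.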
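The topology and root-cause items can be sketched with a dependency graph. This minimal example assumes the topology is already known (rather than self-learned) and uses a plain heuristic: among the anomalous elements, the likely root causes are those whose own dependencies all look healthy, and the impact of an element is everything that depends on it, directly or transitively. The element names in `DEPENDS_ON` and both helper functions are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical topology: each element maps to the elements it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "web-frontend": ["checkout-api", "catalog-api"],
    "checkout-api": ["orders-db", "payment-gateway"],
    "catalog-api": ["catalog-db"],
    "orders-db": ["storage-volume"],
    "catalog-db": ["storage-volume"],
    "payment-gateway": [],
    "storage-volume": [],
}


def root_cause_candidates(depends_on: Dict[str, List[str]], anomalous: Set[str]) -> List[str]:
    """Anomalous elements none of whose dependencies are anomalous."""
    return sorted(
        node for node in anomalous
        if not any(dep in anomalous for dep in depends_on.get(node, []))
    )


def impacted_by(depends_on: Dict[str, List[str]], origin: str) -> List[str]:
    """Every element that depends on `origin`, directly or transitively."""
    # Invert the edges so we can walk from a dependency up to its dependents.
    dependents: Dict[str, List[str]] = {}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(node)
    seen: Set[str] = set()
    queue = deque([origin])
    while queue:  # breadth-first search over the inverted graph
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)


# Suppose monitoring flags these elements at roughly the same time.
alarms = {"web-frontend", "checkout-api", "catalog-api", "orders-db", "catalog-db", "storage-volume"}
print(root_cause_candidates(DEPENDS_ON, alarms))  # ['storage-volume']
print(impacted_by(DEPENDS_ON, "storage-volume"))
# ['catalog-api', 'catalog-db', 'checkout-api', 'orders-db', 'web-frontend']
```

Six alarms collapse to a single candidate, and its impact set explains why the databases, APIs, and frontend are all alarming at once. Real tools would also weigh timing and likelihood rather than relying on this all-or-nothing rule.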
It’s been a while since mere humans could keep up with everything that’s going on, responding to incidents and then resolving them after they have spun out of control. AI, in contrast, provides technologies that automate many of these tasks so that incidents can be handled before they escalate. Automating IT operational tasks, preventing outages in the first place, and getting to the root cause quickly and automatically together form the next frontier in remediating these issues.
Automation is now critical for properly reviewing monitoring data and identifying incidents. This is where AI comes in. Its applications seem limited only by the imagination, and fortunately it is being applied to the tasks that DevOps and TechOps teams can no longer manage on their own. AI-based analytics enable real-time decision making based on intelligent insights, taking a load off IT pros’ plates and letting everyone rest easier.
Opinions expressed by DZone contributors are their own.