Truth and Proof: Building Trust in Machines Through AIOps
How can humans learn to trust self-healing machines? See how teams can build trust in machines through “truth and proof,” evidence-driven AIOps tools.
IT systems are only getting more complex, and teams face greater pressure to solve issues faster and demonstrate value consistently. Issues that dev teams could once handle on their own now sprout up too fast and too often for direct human intervention. Artificial intelligence for IT Operations (AIOps) tools exist today to deliver automated monitoring and solution development, “no humans required,” significantly easing dev teams’ many burdens.
Adopting AIOps should be simple enough, then. But one of the tougher sticking points has been trust. Can humans trust a machine to identify root causes of issues and create accurate and effective solutions? The stakes are high — if a machine gets it wrong, the burden on human teams compounds quickly.
The sheer volume of issues and the data they generate necessitates automated solutions. But building trust is a slow process requiring an incremental approach. Once trust is established, human teams are relieved of significant toil and system issues are resolved more efficiently. Where can teams start to learn more about leveraging AIOps tools and begin building trust?
Understand Your AIOps Tool
A powerful benefit of an AIOps tool is its ability to be proactive and self-heal issues. Properly integrated tools can seek out and solve problems automatically. They go beyond surface-level problems, too: AIOps can eliminate root causes and, by extension, the manual toil associated with the issues those root causes produce. Does this mean fully humanless automation? Probably not. Experience tells us that sometimes the “fix” can be worse than the fault. The goal should be to attempt rapid root cause analysis, offer the most probable fixes (automated if practical), and let the user authorize the action.
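The idea of offering the most probable fixes and letting the user authorize the action can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's API: the `ProposedFix` type, the `confidence` and `low_risk` fields, and the `auto_threshold` value are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ProposedFix:
    """Hypothetical record of one candidate remediation."""
    description: str
    confidence: float  # assumed score in [0.0, 1.0] that the fix resolves the root cause
    low_risk: bool     # assumed flag: True if the action is easy to roll back


def rank_fixes(fixes: List[ProposedFix]) -> List[ProposedFix]:
    """Order candidate fixes from most to least probable."""
    return sorted(fixes, key=lambda f: f.confidence, reverse=True)


def choose_fix(
    fixes: List[ProposedFix],
    approve: Callable[[ProposedFix], bool],
    auto_threshold: float = 0.95,  # illustrative cutoff, not a recommended value
) -> Optional[ProposedFix]:
    """Return the fix to apply, or None if nothing was authorized."""
    for fix in rank_fixes(fixes):
        # Automate only when practical: high confidence and easy to undo.
        if fix.low_risk and fix.confidence >= auto_threshold:
            return fix
        # Otherwise a human must authorize the action.
        if approve(fix):
            return fix
    return None
```

Here `approve` stands in for whatever prompt, chat message, or ticket workflow a team uses to ask a person for sign-off. Passing it in as a callback keeps the human decision explicit rather than buried in the tool.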
But not all AIOps tools are created equal. GigaOm’s recent Radar for AIOps Solutions report investigated AIOps tools and uncovered numerous approaches to critical functions like observability, self-healing, and integrations with other systems. Even the use of the term AI varied: some vendors claimed “AI” when their systems operated on rules-based heuristics that needed substantial human supervision, while others touted actual automated neural capabilities.
The variation makes a lack of trust understandable. After all, do you really know what you’re getting in your AIOps tool? Is it sophisticated enough to truly self-heal root causes?
Rule- and model-driven systems rely on problems conforming to the set list of rules programmers built into the system. When systems change, either the rules need changing or teams fall behind as new unforeseen problems arise. The root causes of issues get tougher to find and fix, and human teams spend more time hunting down problems or updating data sets instead of innovating and operating proactively. They gain little from automation.
But an evidence-driven approach can keep pace with rapid change. The system is based on evidence of what is actually happening (the truth) rather than on what the rules assume is happening. Empirical evidence and data generated by testing then offer proof that a solution works. An AIOps tool employing an evidence-driven approach is best equipped to help teams solve root causes and take advantage of benefits like self-healing.
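As a rough illustration of “proof,” a team might compare an observed metric before and after a fix instead of assuming the fix worked. The sketch below assumes error-rate samples are already being collected. The function name `fix_is_proven` and the 50% `min_improvement` default are hypothetical, and a real tool would likely apply a more rigorous statistical test.

```python
from statistics import mean
from typing import Sequence


def fix_is_proven(
    before: Sequence[float],
    after: Sequence[float],
    min_improvement: float = 0.5,  # illustrative: require a 50% relative drop
) -> bool:
    """Return True if the mean error rate fell by at least min_improvement."""
    if not before or not after:
        return False  # no evidence, no proof
    baseline = mean(before)
    if baseline == 0:
        return False  # nothing was measurably wrong to begin with
    relative_drop = (baseline - mean(after)) / baseline
    return relative_drop >= min_improvement
```

The point of the design is that the verdict comes from measurements taken after the change, not from a rule asserting that the change should have helped.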
Trusting Your AIOps Solution
Building trust through “truth and proof,” with an empirical, data-driven, incremental approach, is much like how humans build trust with one another. We take small steps, looking for confirmation before continuing. Self-healing machines are no different: by defining increments toward improvement and monitoring the outcomes to confirm AIOps is fixing problems accurately, human teams build trust with machines.
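One way to picture this incremental approach is a simple ledger that widens the tool's autonomy only after a run of verified successes and narrows it after a failure. The level names, the `TrustLedger` class, and the `promote_after` count below are assumptions made for illustration, not features of any particular AIOps product.

```python
from dataclasses import dataclass

# Hypothetical autonomy levels, from least to most trusted.
LEVELS = ["observe", "recommend", "act_with_approval", "act_autonomously"]


@dataclass
class TrustLedger:
    promote_after: int = 5  # illustrative: verified successes needed per step up
    level: int = 0          # index into LEVELS
    streak: int = 0         # consecutive verified successes at the current level

    def record(self, fix_verified: bool) -> str:
        """Record one outcome and return the resulting autonomy level."""
        if fix_verified:
            self.streak += 1
            if self.streak >= self.promote_after and self.level < len(LEVELS) - 1:
                self.level += 1   # take one small step forward
                self.streak = 0
        else:
            self.level = max(0, self.level - 1)  # step back after a bad fix
            self.streak = 0
        return LEVELS[self.level]
```

For example, with `TrustLedger(promote_after=2)`, two verified fixes move the tool from "observe" to "recommend", and a single failed fix moves it back.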
Trust-building begins by implementing AIOps and integrating it with information repositories. Tools should support native and third-party orchestration to actively manage workflow, so connect the tool to your CI/CD pipeline or DevOps toolchain. While teams should let their AIOps tool identify and act upon root causes via the incremental approach, they should also closely monitor the outputs and results the tool produces. AIOps can update ITSM ticketing or CMDBs whenever the system logs changes. It can also collect and interpret data on its own performance, giving humans the context they need to judge whether a solution succeeded.
AIOps tools are not intended to eliminate humans entirely, but to augment and assist. This “human-in-the-middle” approach keeps the most important component of operations (the human) in control while eliminating the mundane toil and facilitating scale.
Deeper observability lets teams see the steps an AIOps tool took to solve a problem and the value realized afterward. Trust naturally develops as AIOps logs more successes. And success extends beyond issue remediation. As DevOps takes on more principles of value stream management (VSM), the streamlined, automated workflow AIOps creates can help teams optimize the flow of value and how that value translates into the customer experience. AIOps can apply context to quantitative and qualitative data to discern proactive solutions that reduce toil for internal teams and maintain superior, responsive CX.
The best way to build trust in AIOps is to start using it and realizing the value it can provide teams and customers. Self-healing systems, paired with a careful, incremental approach to rollout and monitoring, can significantly reduce human workloads and create more opportunities for innovation.