A Roadmap to AIOps — Part 2
Part two takes a look at the mid and late stages in the roadmap to AIOps.
This article is a continuation of Part 1.
Mid-Stage: Implement Existing Analytics Workflows
It is likely that when you begin your AIOps journey, you will already have certain analytics in place. I do not mean the analytics that are embedded in your IT tools. I mean the offline, mostly manual analytics that you do, whether on a regular schedule or ad hoc, to identify areas for process improvement, reduce costs, improve performance, etc.
These manual efforts are precisely what your AIOps solution should address and automate in its first iteration. Once the data you use to do these investigations is flowing into your data platform, you should seek to recreate and automate the analyses. The initial value you will generate is the reduction of manual effort spent on analysis, but you should also immediately be able to increase the frequency and perhaps the scope (data points, systems, permutations, etc.) of the analysis.
Remember that AIOps is intended to put you into a position of doing real-time analysis on data sets beyond human scale. The easiest way to move in this direction while simultaneously realizing immediate value is to reduce the time/effort and increase the speed/frequency with which you do analyses that are already part of your operational process.
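As a minimal sketch of what recreating a manual analysis might look like, assume a spreadsheet exercise that computes mean time to resolve per service. The record layout (`service`, `opened`, `resolved`) and the sample data are hypothetical and do not come from any particular tool. Once the data is flowing into the platform, the same calculation becomes a function that can run as often as new data arrives:

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical incident records; in practice these would be read from the data platform.
INCIDENTS = [
    {"service": "checkout", "opened": "2024-01-01T10:00", "resolved": "2024-01-01T11:00"},
    {"service": "checkout", "opened": "2024-01-02T09:00", "resolved": "2024-01-02T12:00"},
    {"service": "search", "opened": "2024-01-03T08:00", "resolved": "2024-01-03T08:30"},
]


def mean_time_to_resolve(incidents):
    """Return the mean resolution time in minutes, keyed by service."""
    durations = defaultdict(list)
    for incident in incidents:
        opened = datetime.fromisoformat(incident["opened"])
        resolved = datetime.fromisoformat(incident["resolved"])
        durations[incident["service"]].append((resolved - opened).total_seconds() / 60)
    return {service: mean(values) for service, values in durations.items()}


if __name__ == "__main__":
    print(mean_time_to_resolve(INCIDENTS))
```

Because the analysis is now a function rather than a manual exercise, widening its scope (more services, more fields, shorter intervals) is a matter of feeding it more data rather than spending more analyst time.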
Mid-Stage: Begin Implementation of Automation
Ah, automation. Everyone knows its value. Everyone knows they need it (or at least could use it). Few organizations put it into practice. Fewer still approach it as a practice with discipline. There used to be a mantra in performance management — "Monitor all the things!" The mantra in the digital era is "Automate all the things!"
It should be sufficient to say that in a digital enterprise, data grows and moves at speeds beyond human scale. To address this, you need to turn to machines to perform analysis and execute automation. There are, however, other process factors that impact the desperate need for IT operations to automate. Prominent among them is the rise of the developer and DevOps, more specifically “continuous” integration and delivery (CI/CD).
Let’s clarify something first: you automate tasks and you orchestrate processes. Task automation in IT Operations typically has been and remains segregated by tools. Your service desk has some automation, you have automated patching for your servers, and you may automate some remediations from your monitoring tools. Orchestration across these tools is more difficult to achieve and rarely fully accomplished.
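The distinction can be sketched in a few lines, with heavy caveats: the three task functions below are hypothetical stand-ins for automation that lives inside separate tools (a service desk, a remediation script), and `orchestrate` is an illustrative helper, not any product's API. Each task is automated on its own; orchestration is what runs them as one process and carries shared context between them.

```python
from typing import Callable, Dict, List


# Each function stands in for a task automated inside a single tool (all hypothetical).
def open_ticket(context: Dict) -> Dict:
    context["ticket"] = f"TICKET-{context['alert_id']}"
    return context


def restart_service(context: Dict) -> Dict:
    context["restarted"] = context["service"]
    return context


def close_ticket(context: Dict) -> Dict:
    context["ticket_status"] = "closed"
    return context


def orchestrate(steps: List[Callable[[Dict], Dict]], context: Dict) -> Dict:
    """Run automated tasks in order, sharing one context across tools."""
    context["completed"] = []
    for step in steps:
        context = step(context)
        context["completed"].append(step.__name__)
    return context


if __name__ == "__main__":
    outcome = orchestrate(
        [open_ticket, restart_service, close_ticket],
        {"alert_id": 42, "service": "checkout"},
    )
    print(outcome)
```

The hard part in practice is less the loop than the shared context: each tool has to accept and emit information in a form the others can use.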
DevOps is essentially the automation of development tasks and their orchestration — to eliminate the bottlenecks caused by phased review processes in waterfall developments, segregated test and compliance activities and operational, pre-production interlocks. What this means for IT is that DevOps application teams creating the innovative cloud services impacting the business are now moving at lightning speed compared to the traditional application teams of the past.
For IT Operations to keep up, they must not only "automate all the things," they must orchestrate them and also plug into the CI/CD toolchain. If you don’t know when things move from test to staging to production; if you don’t know who owns the code or what impact it has on production; if you can’t measure and identify developer backlog/productivity on business services, you can’t effectively manage your environment.
That is the situation that modern IT Ops finds itself in. They need to match the speed and agility of the DevOps teams spread throughout their organization while simultaneously gaining visibility into those teams’ activities and folding it into their own value chain. This begins by automating and orchestrating the things they already do, across siloed tools, and finding ways to connect, share information, and communicate with the DevOps teams in their enterprises.
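One way to picture plugging into the CI/CD toolchain is to treat deployments as events that operations can query. The sketch below assumes the toolchain can emit records with a service, stage, owner, and timestamp; those field names, the sample data, and the two-hour window are all assumptions made for illustration. Given an incident, it answers two of the questions raised above: what recently moved to production, and who owns it.

```python
from datetime import datetime, timedelta

# Hypothetical deployment events, as a CI/CD toolchain might emit them.
DEPLOYMENTS = [
    {"service": "checkout", "stage": "production", "owner": "team-payments",
     "at": datetime(2024, 1, 5, 9, 0)},
    {"service": "checkout", "stage": "staging", "owner": "team-payments",
     "at": datetime(2024, 1, 5, 9, 30)},
    {"service": "search", "stage": "production", "owner": "team-discovery",
     "at": datetime(2024, 1, 5, 9, 45)},
    {"service": "checkout", "stage": "production", "owner": "team-payments",
     "at": datetime(2024, 1, 4, 9, 0)},
]


def recent_production_deploys(deployments, service, incident_time,
                              window=timedelta(hours=2)):
    """Return production deployments of `service` made within `window`
    before `incident_time`, newest first."""
    matches = [
        d for d in deployments
        if d["service"] == service
        and d["stage"] == "production"
        and timedelta(0) <= incident_time - d["at"] <= window
    ]
    return sorted(matches, key=lambda d: d["at"], reverse=True)


if __name__ == "__main__":
    for deploy in recent_production_deploys(
        DEPLOYMENTS, "checkout", datetime(2024, 1, 5, 10, 0)
    ):
        print(deploy["owner"], deploy["at"])
```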
Late Stage: Develop New Analytics Workflows
Above, I talked about implementing existing, manual analytics workflows into your AIOps solution to automate and scale them. Once this is accomplished, you should have the bandwidth to:
- Assess the value of those workflows
- Modify and improve those workflows
- Develop new workflows based on the existing ones or to address gaps
Part of the problem with the "brute-force spreadsheet" approach to doing analysis with disparate data sets is that the energy and focus it requires oftentimes exhausts the capacity for the practitioner to assess the value of what is being delivered. Reports have been promised, meetings are scheduled, and expectations have been set. Unless a leader calls for a re-evaluation of the approach, rarely is the process questioned.
Once the existing process has been automated in the AIOps platform, the practitioner can step back and evaluate whether the necessary information is being analyzed, insights are being gained, and results are actionable. Having done so, they can make improvements using the AIOps platform (which should be an order of magnitude easier than doing so in the spreadsheets) and evaluate the impact of those changes.
Simultaneously, they can determine where information or insight gaps exist and envision higher levels of analysis that leverage the outcomes of existing workflows. Again, the promise of AIOps is not only the ability to execute what heretofore wasn’t practically feasible, but to do so at a scale and speed that opens up previously unrealized analytics opportunities.
Late Stage: Adapt Organization to New Skill Sets
It should be obvious by now that if the AIOps platform is taking the analysis and response activities off the plate of the IT Ops practitioner, the role of that practitioner will evolve. You will move from needing someone who uses domain knowledge to address issues tactically to needing someone who can put that knowledge to use training the system.
This is not a simple semantic distinction. The ability to know when something is wrong, determine how to tell a system to alert about that fact and then fix it is fundamentally different from the ability to understand how systems are operating well or poorly, how the system is reading and reacting, and then adjust the system accordingly (or give appropriate guidance thereto).
IT Ops will move from a "practitioner" to an "auditor" role. This doesn’t require in-depth, data-science level understanding of machine analytics. It does require understanding how systems are processing data and whether the desired business outcomes are being achieved. Of all of the changes AIOps will bring to IT Operations, I believe this will be the most disruptive.
IT Operations has long had a bunker, hero mentality, particularly with monitoring teams. Giving up control to a machine will be one of the most difficult transitions those who have been steeped in the practice for decades will experience. Many will not succeed. This is an inevitable result of market trends as they exist now. The move to business beyond human scale will have significant consequences for the humans who have been used to managing it.
Organizations will have to cultivate this new skill in their existing — reduced — workforce or bring in talent that either has the skill or can adapt to the change. This will be challenging in two ways: the scarcity of such skills and the fact that the market may take a while to respond with the education, certification, and practical opportunities necessary to build a robust AIOps labor force. It will take time for these changes to have a noticeable impact and it may be that only the highest-performing organizations understand and realize it, but it will happen and will be a tectonic shift in the discipline of IT Operations.
Late Stage: Customize Analytic Techniques
The last activity I will discuss is both the most speculative and the most contentious. It is the question of whether IT Operations organizations will need to develop a mature data science practice. Some analysts believe they will. I disagree. I believe in the segregation between domain and data science knowledge.
I have two preceding paradigms in mind: the scientist-analyst and the developer-analyst. Scientists have long been executing complex, data-intensive analyses. With the rise of machine computation, scientists had to develop, at least, the ability to craft the mathematical algorithms that they wanted to run on their data sets. At first, when computational resources were shared, scientists built their own analyses to be run on systems maintained by computer experts. The languages, parameters, and constraints were dictated by the systems and scientists had to work within them.
In that paradigm, scientists developed specialized knowledge that allowed them to leverage the computational systems. Once computational resources and analytic languages became less expensive, more powerful, and more accessible, scientists had to develop not only the domain knowledge in their fields but also data science and computational knowledge sufficient to execute their desired analyses on contemporary computing platforms.
They were able to do this because:
- Their programs were research, not commerce and hence weren’t subject to market or business pressures (at least not immediately like IT)
- They were self-selected for the education, drive, and acumen to learn and master both types of knowledge (Ph.D.)
- They were afforded the time in an academic setting to acquire the skills and knowledge necessary
- Failure to do so would be fatal to their careers, given the labor competition
Let us contrast this with the developer-analyst. Currently, the market stands in critical need of data science practitioners who can also implement their data science knowledge in code. In spite of the ubiquity of data science jobs and data science education (both formal and informal), the market is bereft of people who have M.A. or Ph.D. level knowledge of a field such as statistical modeling and are at least adequate Python or R programmers.
This may change, but I do not foresee that happening soon, if ever. It is simply the case that it is too hard for most people to learn the math required and too easy to make very good money with just the coding to incent them to take on more than that. And even if they did, they would still need the domain knowledge required for a particular industry or problem area.
Asking IT Operations practitioners to know math, IT, and coding to manage infrastructure, applications and services is, I think, too much. In my vision of the future, IT Operations would be the stewards of semi-intelligent, semi-autonomous systems with deep knowledge of the domain, sufficient knowledge of the math to understand what the system is doing and no knowledge (or no need for knowledge) of coding.
In this paradigm, AIOps vendors provide systems that offer multiple analytics options from which practitioners select combinations that best fit their environments. Ideally, this would require less knowledge of the math than of the business outcomes. Also ideally, the AIOps platforms would provide regression analysis that would suggest "best-fit" options from which practitioners could make informed decisions.
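To make the idea of "best-fit" suggestions concrete, here is a deliberately small sketch. It swaps in a simple backtest rather than regression analysis proper: two candidate techniques (a naive last-value forecast and a moving average, both chosen only for illustration) are scored against historical data by mean absolute error, and the lowest-error option is surfaced. All function names are hypothetical; a real platform would offer far richer techniques, but the practitioner's decision would look similar.

```python
from statistics import mean


def naive_forecast(history):
    """Predict that the next value equals the last observed value."""
    return history[-1]


def moving_average_forecast(history, window=3):
    """Predict the mean of the last `window` observations."""
    return mean(history[-window:])


def backtest(forecast, series, warmup=3):
    """Mean absolute error of one-step-ahead forecasts over the series."""
    errors = [
        abs(forecast(series[:i]) - series[i])
        for i in range(warmup, len(series))
    ]
    return mean(errors)


def best_fit(candidates, series):
    """Return (name, error) for the candidate with the lowest backtest error."""
    scores = {name: backtest(fn, series) for name, fn in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


CANDIDATES = {
    "naive": naive_forecast,
    "moving_average": moving_average_forecast,
}

if __name__ == "__main__":
    steadily_growing = [1, 2, 3, 4, 5, 6]
    oscillating = [10, 0, 10, 0, 10, 0, 10, 0]
    print(best_fit(CANDIDATES, steadily_growing))
    print(best_fit(CANDIDATES, oscillating))
```

The practitioner here needs to understand what the error score means for the business outcome, not how to implement either technique.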
This is how I see new and customized analytics coming out of AIOps. Some organizations may have the wherewithal and will to field teams of people with domain, data science, and programmatic implementation expertise. For revenue-generating activities, this may make sense. I don’t see a future where such an approach will be feasible for IT Operations.
I have offered 9 steps for an AIOps roadmap.
1. Identify current use cases
2. Agree on a system of record
3. Determine success criteria and begin tracking them
4. Assess current and future state data models
5. Implement existing analytics workflows
6. Begin implementation of automation
7. Develop new analytics workflows
8. Adapt organization to new skill sets
9. Customize analytic techniques
#1 and #3 are table stakes for IT operations in its current state and so certainly for AIOps. If you don’t know what you are currently trying to accomplish and/or you can’t measure it, you can’t hope to manage it even with existing tools. #2 can be done with existing tools or it may be an assessment that current tools are unsatisfactory. If the latter, building out requirements for how different organizations will share a view of what is happening is the logical response. These are all early-stage activities on your AIOps roadmap.
#4 is a requirement for any activities that follow. As I mentioned in that section, understanding your current and future data needs is paramount to a successful AIOps implementation. It can be done piecemeal, but it must be done. #5 depends on #4. #6 depends on #5 for the analytics portion of the AIOps process but automation of tasks and orchestration between tools can and should be pursued at whatever stage of maturity an IT organization finds itself.
#s 7, 8, and 9 are more intertwined and likely to evolve organically in tandem, taking different courses in different organizations. It may be impossible to forecast or plan for their eventualities at early (or even mid) stages, but the highest-performing organizations will include them in their strategic horizons.
To paraphrase Peter Drucker, the future has “already happened.” The only IT organizations that aren’t thinking about how AIOps leveraging machine learning, big data, and analytics will radically alter the way they function are those that haven’t realized it yet. And they are the ones likely to miss the almost limitless opportunities that digital transformation presents.
Published at DZone with permission of Seth Paskin , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.