The Demise of the Dashboard: Embracing AI for IT Monitoring
This article explains why we should embrace AI for IT monitoring instead of using dashboards.
When it comes to IT monitoring, dashboards have seen better days. While it’s true that they provide important, up-to-the-minute views of key organizational metrics, the apparent detail they provide also obscures some important blind spots.
How Your Organization Can Outgrow Dashboards
Dashboards typically show only historical data, usually the time series as it was received over the last few hours, days, or weeks. Data analytics can provide real-time forecasts and even decision support, neither of which traditional dashboards offer.
In addition, dashboards display only a limited number of metrics. Because you chose these metrics, you may think they’re important. But are they really aligned with your business model?
For instance, a territory manager might decide to keep a dashboard that displays EBITDA, daily website visitors and support calls. While these are all important metrics to follow, do they match what a territory manager is actually responsible for? Are there metrics they could be following but aren’t?
As far as usability is concerned, it’s far better to do away with dashboards. Right now, you’re selecting a limited number of metrics that you believe are important. It’s better to instead create a system that constantly monitors and alerts on anomalies and generates forecasts across your entire landscape of metrics. This requires much more horsepower than traditional dashboards. It’s a level of scale that can only truly be attained by artificial intelligence.
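To make the idea of monitoring every metric concrete, here is a minimal sketch in Python. It uses a trailing-window z-score, which is a simple statistical stand-in for the AI-based detection discussed in this article, not a description of any particular product. The function names, the window size and the threshold are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(series, window=5, threshold=3.0):
    """Return the indices of points that sit more than `threshold`
    standard deviations away from the mean of the preceding `window` points.

    Illustrative only: `window` and `threshold` are assumed values.
    """
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history: treat any change at all as anomalous.
            if series[i] != mu:
                anomalies.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


def scan_all(metrics, **kwargs):
    """Run the detector over every metric instead of a hand-picked subset.

    `metrics` maps a metric name to its list of observed values.
    Only metrics with at least one anomaly appear in the result.
    """
    results = {}
    for name, series in metrics.items():
        hits = find_anomalies(series, **kwargs)
        if hits:
            results[name] = hits
    return results


if __name__ == "__main__":
    metrics = {
        "cpu": [10, 11, 10, 12, 11, 10, 50, 11],
        "visitors": [100, 101, 99, 100, 102, 101, 100, 99],
    }
    print(scan_all(metrics))
```

The point of `scan_all` is that nobody chooses which metrics deserve attention in advance. Every series is checked, and only the ones that misbehave are surfaced.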
What Benefits Does AI-Based Monitoring Have Over Dashboards?
The main problem with dashboards is that they’re a simplistic implementation of big data. Your company generates a myriad of metrics, and the most that dashboards can do is skim the surface of their depth. They have limited capabilities to extrapolate or alert on anomalies.
Dashboards were among the best ways to interpret volumes of big data in real time, at least back when they were first used. Since then, however, advances in artificial intelligence, together with the commoditization of IT hardware, the widespread availability of compute power in the cloud and the increasing refinement of GPU technology, have made AI-based monitoring a far better choice.
AI gives users the ability to monitor millions of metrics across thousands of applications. Even at this scale, AI provides granular alerting capabilities, notifying administrators about anomalies as soon as they occur, with few false positives or false negatives. AI is now easy to implement and use, and it provides more benefits at a lower cost than a fully staffed Network Operations Center (NOC) using dashboards.
Your Organization Powered by AI
AI can pay for itself by quickly helping organizations recapture revenue that would otherwise have been lost, preventing things like fraud, data breaches, and unplanned outages.
For example, credit card companies have a huge problem with fraud. The issue is trending down, but fraudsters still managed to milk credit card customers for around $6.4 billion in 2018. Much of this activity involves stealing a customer’s identity, creating a fake account under their name and then grooming the account — making small regular debits and payments to increase the credit limit. Once the credit limit is deemed high enough, the fraudster borrows to the hilt and then vanishes.
In this case, fraud hurts both parties. The victim takes a huge hit to their credit score that will take them a long time to fix. The credit card company lends money that it probably won’t get back.
Using AI, you’d be able to see that the pattern of fraudulent transactions on a fake account looks different from that of a normal customer. Fraud may involve small transactions paid off quickly, whereas real consumers make larger transactions and pay them off slowly. Using these and other factors, you could begin to employ artificial intelligence to recognize fraudulent accounts and alert on them for further investigation.
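The two signals described above, transaction size and repayment speed, can be sketched as features of an account. The Python example below uses a hand-written rule purely to show the shape of the idea. A real system would learn its decision boundary from labeled data rather than rely on fixed cut-offs. The `Transaction` type, the function name `looks_groomed` and every threshold here are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Transaction:
    amount: float        # size of the purchase
    days_to_repay: int   # how long the balance stayed unpaid


def looks_groomed(transactions, max_avg_amount=50.0, max_avg_days=7.0,
                  min_count=5):
    """Flag an account whose history is dominated by small purchases that
    are paid off quickly, the grooming pattern described in the text.

    All thresholds are assumed values for illustration, not real
    fraud-detection parameters.
    """
    if len(transactions) < min_count:
        # Too little history to say anything either way.
        return False
    avg_amount = mean([t.amount for t in transactions])
    avg_days = mean([t.days_to_repay for t in transactions])
    return avg_amount <= max_avg_amount and avg_days <= max_avg_days


if __name__ == "__main__":
    suspicious = [Transaction(20.0, 2) for _ in range(6)]
    ordinary = [Transaction(400.0, 45) for _ in range(6)]
    print(looks_groomed(suspicious), looks_groomed(ordinary))
```

A flag from a check like this would not close an account on its own. As the text says, it would raise an alert for further investigation.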
Many companies now utilize massive NOCs in order to detect and mitigate fraud, data breaches and unplanned outages. Even a small NOC requires at least six people to run — two people per shift, three shifts per day — and one technician costs around $60,000 per year. So, staffing the smallest possible NOC will cost at least $360,000 per year — not including the price of equipment and benefits.
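The staffing arithmetic above is easy to reproduce and to adapt to your own numbers. This short Python sketch uses the figures quoted in the text as defaults. The function name `noc_staff_cost` is made up for illustration, and equipment and benefits are left out, as they are in the estimate above.

```python
def noc_staff_cost(people_per_shift=2, shifts_per_day=3, salary=60_000):
    """Return (headcount, yearly salary cost) for a round-the-clock NOC.

    Defaults are the figures quoted in the article. Equipment and
    benefits are deliberately excluded.
    """
    headcount = people_per_shift * shifts_per_day
    return headcount, headcount * salary


if __name__ == "__main__":
    headcount, cost = noc_staff_cost()
    print(f"{headcount} technicians, ${cost:,} per year")
    # prints "6 technicians, $360,000 per year"
```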
What’s more, there is no way that your dashboard-equipped NOC will be able to fully prevent fraud, security breaches or unplanned downtime. The limitations of dashboards make it impossible for NOC team members to find every anomaly. Therefore, the cost of the NOC should be adjusted to include all the incidents that it cannot prevent.
Once you add AI, you can detect and alert on all incidents and anomalies at any time. AI is more accurate than manual monitoring and can cover every metric. As a result, you can reduce your NOC staff and devote their time to fixing the errors that AI finds, as opposed to having them spend most of their time staring at screens.
Implementing AI Monitoring
Building an AI monitoring system entails designing an architecture that can pull data from your applications, arrange that data into patterns, send those patterns to an anomaly detection engine and then push any resulting anomaly alerts into the hands of your technicians, or even fix issues automatically.
Collectors will instrument your applications, allowing you to turn their signals into time series and push them to an anomaly detection and correlation engine. Rule engines will automatically act on anomalies, either pushing alerts to technicians or sending them along to workflow engines. These, in turn, are able to perform automatic resolution actions, restarting machines and following known procedures to turn anomalies into non-events.
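The flow described above, from collected samples through anomaly detection and a rule engine to either an alert or an automatic action, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a reference design. The metric names, the `RUNBOOK` mapping and the z-score detector are all hypothetical stand-ins for real collectors, detection engines and workflow engines.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Anomaly:
    metric: str
    value: float


def detect(metric, history, value, threshold=3.0):
    """Detection step: flag a value far from the metric's recent history.

    A z-score check stands in for a real anomaly detection engine.
    """
    if len(history) < 2:
        return None  # not enough data to estimate spread
    sigma = stdev(history)
    if sigma == 0:
        return Anomaly(metric, value) if value != history[0] else None
    if abs(value - mean(history)) / sigma > threshold:
        return Anomaly(metric, value)
    return None


# Hypothetical rule table: metrics with a known, automatable procedure.
RUNBOOK = {"web01.latency_ms": "restart web01"}


def route(anomaly, runbook):
    """Rule engine step: hand the anomaly to a workflow if a known
    procedure exists, otherwise alert a technician."""
    action = runbook.get(anomaly.metric)
    if action is not None:
        return ("workflow", action)
    return ("alert", f"{anomaly.metric} anomalous at {anomaly.value}")


def process(samples, histories, runbook):
    """Push each collected (metric, value) sample through detection and
    routing, returning the resulting actions in order."""
    outcomes = []
    for metric, value in samples:
        history = histories.setdefault(metric, [])
        anomaly = detect(metric, history, value)
        if anomaly is not None:
            outcomes.append(route(anomaly, runbook))
        else:
            # Only normal values extend the baseline, so one spike
            # does not distort later checks.
            history.append(value)
    return outcomes


if __name__ == "__main__":
    histories = {
        "web01.latency_ms": [100, 102, 98, 101, 99],
        "db01.connections": [40, 41, 39, 40, 42],
    }
    samples = [
        ("web01.latency_ms", 900),
        ("db01.connections", 400),
        ("db01.connections", 41),
    ]
    for outcome in process(samples, histories, RUNBOOK):
        print(outcome)
```

In this sketch the latency spike is resolved by a known procedure, while the connection spike has no runbook entry and is escalated to a person. That split mirrors the two paths out of the rule engine described above.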
Armed with a comprehensive anomaly detection architecture, your employees will be able to spend less time looking at screens and more time proactively maintaining and improving your platform. As a result, you’ll be able to save money, redeploy your workers more productively and keep your users and customers happy.