Hybrid IT Monitoring Planning and Design
If you're moving toward a Hybrid IT model, a three-stage planning process focusing on definition, preparation, and setting baselines can help ease monitoring woes.
IT is becoming increasingly commoditized, and enterprises now look to a Hybrid IT model as a key enabler for driving the business: reducing costs, improving time to market, and helping them become agile and innovative in competitive markets. Low-level metrics such as CPU utilization, memory consumption, and network bandwidth are undeniably relevant for hybrid IT monitoring, but the major objective of monitoring should be how well an IT service or application performs the task it was designed for. This includes the performance of the underlying infrastructure as well. This article summarizes the stages of Hybrid IT monitoring execution.
Stage 1: Define
During the definition phase, the monitoring team determines what is required to monitor the health of each service. The team works with the development and operations teams to identify needs and dependencies, and breaks the service down into steps to ensure accurate monitoring. The core focus is to understand the functional, non-functional, and operational aspects essential to the IT service. This information is used to create a health model, which defines whether a system is healthy (operating within normal conditions) or degraded. This model becomes the basis for the system events and instrumentation on which monitoring and automated recovery are built. If needed, a set of basic key performance indicators (KPIs) is created for all IT services to be governed.
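As a rough illustration of what a health model might look like in code, the sketch below treats a service as healthy only when every KPI reading falls within its normal range, and as degraded otherwise. The class names, the KPI names, and the ranges are all hypothetical and chosen only for this example; a real health model would be derived from the needs and dependencies identified with the development and operations teams.

```python
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"


@dataclass(frozen=True)
class Kpi:
    """One key performance indicator with its normal operating range."""
    name: str
    low: float   # lowest value still considered normal
    high: float  # highest value still considered normal

    def is_normal(self, value: float) -> bool:
        return self.low <= value <= self.high


def evaluate_service(kpis: list[Kpi], readings: dict[str, float]) -> Health:
    """Return DEGRADED if any KPI reading is missing or outside its range."""
    for kpi in kpis:
        value = readings.get(kpi.name)
        if value is None or not kpi.is_normal(value):
            return Health.DEGRADED
    return Health.HEALTHY


# Hypothetical KPIs for an imaginary checkout service.
checkout_kpis = [
    Kpi("response_time_ms", 0, 800),
    Kpi("error_rate_pct", 0, 1.0),
]

normal = {"response_time_ms": 350, "error_rate_pct": 0.2}
slow = {"response_time_ms": 1200, "error_rate_pct": 0.2}

print(evaluate_service(checkout_kpis, normal).value)  # healthy
print(evaluate_service(checkout_kpis, slow).value)    # degraded
```

Treating a missing reading as degraded is a deliberate choice in this sketch: a KPI that stops reporting is usually a monitoring problem worth surfacing rather than ignoring.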
Stage 2: Prepare
The major intent of this phase is to understand the Configuration Items (CIs) that make up the service and their relationships with each other, the failure scenarios and the approaches for detecting failures, the event messages, and the other critical dependent CIs (OS, hardware, network, etc.) needed to better identify a problem. This includes:
- Alert and event definitions for all CIs and their relationships to other CIs.
- A service model that defines all CIs for the application and their relationship to other CIs.
- A complete health model that provides an error description and troubleshooting hints for every type of CI alert.
- A definition of availability for each CI via the health model.
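The service model and CI relationships listed above can be sketched as a small dependency graph. In this minimal example, each CI records the CIs it depends on, and a helper narrows a set of failing CIs down to the ones whose own dependencies are all still up, which is one simple way to use relationships for problem identification. The `ConfigurationItem` class, the `probable_root_causes` function, and the CI names are assumptions made for illustration, not part of any particular monitoring product.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigurationItem:
    name: str
    kind: str  # e.g. "application", "os", "hardware", "network"
    depends_on: list[str] = field(default_factory=list)


def probable_root_causes(
    model: dict[str, ConfigurationItem], failing: set[str]
) -> set[str]:
    """Return failing CIs none of whose direct dependencies are failing."""
    return {
        name
        for name in failing
        if not any(dep in failing for dep in model[name].depends_on)
    }


# Hypothetical service model for an imaginary web application.
service_model = {
    ci.name: ci
    for ci in [
        ConfigurationItem("web-app", "application", ["app-os", "core-switch"]),
        ConfigurationItem("app-os", "os", ["app-host"]),
        ConfigurationItem("app-host", "hardware"),
        ConfigurationItem("core-switch", "network"),
    ]
}

# The application, its OS, and its host all raise alerts at once;
# only the host has no failing dependency underneath it.
print(probable_root_causes(service_model, {"web-app", "app-os", "app-host"}))
# {'app-host'}
```

A real service model would typically live in a CMDB rather than in code, but the same idea applies: alerts on dependent CIs can be correlated to the lowest failing CI instead of being handled one by one.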
The monitoring and development teams should agree on standards such as CI definitions, logging format design, performance counters, synthetic transactions, and reporting.
The diagram below depicts an illustrative monitoring event hierarchy.
Stage 3: Baseline and Implement
Suitable processes are designed to enable effective monitoring, reporting, and alerting of IT services in a multi-vendor environment. This is done by deploying the relevant tools and procedures across all elements of the IT services that those processes need in order to operate effectively.
Monitoring thresholds and parameters are set up based on industry standards, and best practices should be followed across the entire stack (infrastructure and application). Alerting is set up for informational, warning, and critical problems based on different threshold settings.
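A minimal sketch of threshold-based alerting with the three severity levels mentioned above might look like the following. The `ThresholdPolicy` class and the CPU threshold values (70, 85, and 95 percent) are assumptions picked purely for illustration; they are not industry-standard figures, and real values should come from your own baselines.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    INFORMATIONAL = 1
    WARNING = 2
    CRITICAL = 3


@dataclass(frozen=True)
class ThresholdPolicy:
    """Three ascending thresholds for one metric (values are illustrative)."""
    metric: str
    informational: float
    warning: float
    critical: float

    def classify(self, value: float) -> Optional[Severity]:
        """Return the highest severity whose threshold is reached, or None."""
        if value >= self.critical:
            return Severity.CRITICAL
        if value >= self.warning:
            return Severity.WARNING
        if value >= self.informational:
            return Severity.INFORMATIONAL
        return None


# Hypothetical CPU policy for this example only.
cpu_policy = ThresholdPolicy("cpu_utilization_pct", 70.0, 85.0, 95.0)

for reading in (42.0, 72.5, 90.0, 99.0):
    severity = cpu_policy.classify(reading)
    label = severity.name if severity else "no alert"
    print(f"{cpu_policy.metric}={reading}: {label}")
```

Keeping the thresholds in a small policy object makes it easier to apply the same structure across the stack while tuning the numbers per metric.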
Furthermore, the required workflows are defined to deploy the customized policies, profiles, and rules across the stack in the IT infrastructure, and proper reporting mechanisms (daily, weekly, monthly) are defined to track alerts and maintain the SLA. Alerting is set up to capture events based on component availability, reliability, performance, capacity management, error logs, and exceptions, with a constant focus on improving the maturity level of this monitoring.
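To make the SLA reporting idea concrete, here is a small sketch that computes availability for a reporting period from recorded downtime and compares it with a target. The function names, the 30-day month, and the 99.9% target are assumptions for this example; an actual SLA would define its own measurement window and exclusions.

```python
def availability_pct(period_minutes: float, downtime_minutes: float) -> float:
    """Percentage of the reporting period during which the service was up."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    if not 0 <= downtime_minutes <= period_minutes:
        raise ValueError("downtime_minutes must be between 0 and period_minutes")
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def meets_sla(
    period_minutes: float, downtime_minutes: float, target_pct: float
) -> bool:
    """True when measured availability reaches the SLA target."""
    return availability_pct(period_minutes, downtime_minutes) >= target_pct


MONTH_MINUTES = 30 * 24 * 60  # 43,200 minutes in a 30-day month (assumed)
SLA_TARGET_PCT = 99.9         # hypothetical target

for downtime in (20, 50):
    pct = availability_pct(MONTH_MINUTES, downtime)
    status = "met" if meets_sla(MONTH_MINUTES, downtime, SLA_TARGET_PCT) else "breached"
    print(f"downtime={downtime} min: availability={pct:.3f}% (SLA {status})")
```

With these assumed numbers, a 99.9% target over a 30-day month allows about 43.2 minutes of downtime, so 20 minutes meets the target and 50 minutes breaches it.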