Database Monitoring: Key Metrics and Considerations
This is an article from DZone's 2023 Database Systems Trend Report.
Hearing the vague statement, "We have a problem with the database," is a nightmare for any database manager or administrator. Sometimes it's true, sometimes it's not, and what exactly is the issue? Is there really a database problem? Or is it a problem with networking, an application, a user, or another possible scenario? If it is a database, what is wrong with it?
Figure 1: DBMS usage
Databases are a crucial part of modern businesses, and there are a variety of vendors and types to consider. Databases can be hosted in a data center, in the cloud, or in both for hybrid deployments. The data stored in a database can serve websites, applications, analytical platforms, and more.
As a database administrator or manager, you want to be aware of the health and trends of your databases. Database monitoring is as crucial as databases themselves. How good is your data if you can't guarantee its availability and accuracy?
Database Monitoring Considerations
Database engines and databases are systems hosted on a complex IT infrastructure that consists of a variety of components: servers, networking, storage, cables, etc. Database monitoring should therefore be approached holistically, taking into account every infrastructure component as well as the database itself.
Figure 2: Database monitoring clover
Let's talk more about database monitoring. As seen in Figure 2, I group monitoring into four pillars: availability, performance, activity, and compliance. These pillars are broad, interconnected, and overlapping. You could add a fifth "clover leaf" for security monitoring, but I fold that aspect into activity and compliance monitoring, just as I fold capacity planning into availability monitoring.
Let's look deeper into monitoring concepts. While availability monitoring seems like a good starting topic, I will deliberately start with performance since performance issues may render a database unavailable and because availability monitoring is "monitoring 101" for any system.
Performance Monitoring
Performance monitoring is the process of capturing, analyzing, and alerting on performance metrics from the hardware, OS, network, and database layers. It can help avoid unplanned downtime, improve the user experience, and help administrators manage their environments efficiently.
Native Database Monitoring
Most, if not all, enterprise-grade database systems come with a set of tools that allow database professionals to examine internal and/or external database conditions and operational status. These are system-specific, technical tools that require subject matter expert (SME) knowledge. In most cases, they provide point-in-time performance data with limited or no historical value. Some vendors provide additional tools to simplify performance data collection and analysis.
With the expansion of cloud-based offerings (PaaS or IaaS), I've noticed improvements in monitoring data collection and in the available analytics and reporting options. However, native performance monitoring remains a set of tools for a database SME.
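Many native statistics views expose cumulative counters, so a single read is only a point-in-time value; PostgreSQL's pg_stat_database view, for example, reports a cumulative xact_commit count. Building history means sampling periodically and computing deltas. The sketch below assumes the snapshots have already been collected into dictionaries; Snapshot, counter_rates, and the counter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One point-in-time read of cumulative counters (hypothetical structure)."""
    taken_at: float            # seconds since the epoch
    counters: dict[str, int]   # counter name -> cumulative value


def counter_rates(prev: Snapshot, curr: Snapshot) -> dict[str, float]:
    """Convert two cumulative snapshots into per-second rates."""
    elapsed = curr.taken_at - prev.taken_at
    if elapsed <= 0:
        raise ValueError("snapshots must be in chronological order")
    rates = {}
    for name, value in curr.counters.items():
        if name not in prev.counters:
            continue  # counter appeared between samples; no baseline yet
        delta = value - prev.counters[name]
        if delta < 0:
            continue  # counter was reset (e.g., by a restart); skip this interval
        rates[name] = delta / elapsed
    return rates


# Two samples taken 60 seconds apart (made-up values).
before = Snapshot(0.0, {"commits": 10_000, "rollbacks": 40})
after = Snapshot(60.0, {"commits": 13_000, "rollbacks": 46})
print(counter_rates(before, after))  # {'commits': 50.0, 'rollbacks': 0.1}
```

Storing these rates over time is what gives native point-in-time data its missing historical value.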
Enterprise Monitoring Systems
Enterprise monitoring systems (EMSs) offer a centralized approach to keeping IT systems under systematic review. Such systems can monitor most IT infrastructure components, consolidating the monitored systems into a single set of dashboards. Several vendors offer comprehensive database monitoring systems that cover some or all of your monitoring needs. Such solutions can cover multiple database engines, or they can be specific to a particular engine or monitoring aspect. For instance, if you only need to monitor SQL Server instances and are interested in query performance, then you need a monitoring system that identifies bottlenecks and contention.
Now consider environments with thousands of database instances, on premises and in the cloud, scattered across multiple data centers around the globe. Monitoring complexity grows with the number of monitored devices, the diversity of database types, and the geographic spread of your data centers and of the data you monitor. It is imperative to have a single, global view of all database systems and the ability to identify issues, preferably before they impact your users.
EMSs are designed to help organizations align database monitoring with IT infrastructure monitoring, and most solutions include an out-of-the-box set of dashboards, reports, graphs, alerts, useful tips, and health history and trend analytics. They also ship with preset thresholds for performance counters/metrics, based on industry guidance, that should be adjusted to your specific conditions.
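Query contention often shows up as blocking chains. In SQL Server, for instance, sys.dm_exec_requests reports a blocking_session_id for each request (0 when the request is not blocked). A minimal sketch, assuming those (session_id, blocking_session_id) pairs have already been fetched into a list, that finds the head blockers: sessions that block others while not being blocked themselves. The function name is hypothetical.

```python
from collections import defaultdict


def head_blockers(rows: list[tuple[int, int]]) -> dict[int, int]:
    """Return {head_blocker_session: sessions waiting behind it, directly or transitively}.

    rows: (session_id, blocking_session_id) pairs; 0 means "not blocked".
    """
    blocked_by = {sid: blocker for sid, blocker in rows if blocker != 0}
    waiting: dict[int, int] = defaultdict(int)
    for sid in blocked_by:
        seen = {sid}
        current = blocked_by[sid]
        # Walk up the chain until we reach a session that is not itself blocked.
        while current in blocked_by and current not in seen:
            seen.add(current)
            current = blocked_by[current]
        if current in seen:
            continue  # cycle (a deadlock); engines typically detect and resolve these
        waiting[current] += 1
    return dict(waiting)


rows = [(51, 0), (52, 51), (53, 52), (54, 51), (60, 0)]
print(head_blockers(rows))  # {51: 3}
```

Alerting on head blockers rather than on every blocked session keeps one root cause from producing a flood of alerts.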
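A global view usually starts as a roll-up of per-instance status. A minimal sketch, assuming a hypothetical per-instance record with a region, an engine, and a status already reported by collectors; InstanceStatus, rollup, and worst_first are illustrative names, not an EMS API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceStatus:
    name: str
    region: str   # data center or cloud region
    engine: str   # e.g., "postgresql" or "sqlserver"
    status: str   # "ok", "warning", or "critical", as reported by collectors


SEVERITY = {"critical": 0, "warning": 1, "ok": 2}


def rollup(instances: list[InstanceStatus]) -> dict[tuple[str, str], Counter]:
    """Count statuses per (region, engine) so one view covers the whole estate."""
    summary: dict[tuple[str, str], Counter] = {}
    for inst in instances:
        summary.setdefault((inst.region, inst.engine), Counter())[inst.status] += 1
    return summary


def worst_first(instances: list[InstanceStatus]) -> list[InstanceStatus]:
    """Order instances so critical ones are reviewed before warnings and healthy ones."""
    return sorted(instances, key=lambda i: (SEVERITY[i.status], i.region, i.name))
```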
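Adjusting preset thresholds usually means layering per-instance overrides on top of the defaults. A minimal sketch; the metric names and numbers below are hypothetical placeholders, not recommendations from any vendor.

```python
DEFAULT_THRESHOLDS = {
    # metric: (warning, critical); hypothetical defaults
    "cpu_percent": (80.0, 95.0),
    "disk_latency_ms": (20.0, 50.0),
}


def effective_thresholds(
    overrides: dict[str, tuple[float, float]],
) -> dict[str, tuple[float, float]]:
    """Merge instance-specific overrides over the defaults."""
    merged = dict(DEFAULT_THRESHOLDS)
    merged.update(overrides)
    return merged


def evaluate(
    metrics: dict[str, float], thresholds: dict[str, tuple[float, float]]
) -> dict[str, str]:
    """Classify each reported metric as ok, warning, or critical."""
    result = {}
    for name, value in metrics.items():
        if name not in thresholds:
            continue  # no threshold defined; collect the metric but do not alert
        warning, critical = thresholds[name]
        if value >= critical:
            result[name] = "critical"
        elif value >= warning:
            result[name] = "warning"
        else:
            result[name] = "ok"
    return result


# A batch server that routinely runs hot gets a higher CPU threshold.
batch = effective_thresholds({"cpu_percent": (90.0, 98.0)})
print(evaluate({"cpu_percent": 85.0, "disk_latency_ms": 25.0}, batch))
# {'cpu_percent': 'ok', 'disk_latency_ms': 'warning'}
```

Keeping defaults and overrides separate makes it easy to see which instances deviate from the baseline.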
Manageability and Administrative Overhead
Native database monitoring is usually handled by a database administrator (DBA) team. If it needs to be automated, expanded, or otherwise modified, the DBA and/or development teams handle that. Even in a large enterprise environment, DBAs can manage this efficiently at a basic level for internal, DBA-specific use cases.
Bringing in a third-party system (like an EMS) requires management. Suppose a vendor has installed and configured monitoring for your company. That partnership can continue, or internal personnel can take over EMS management (with appropriate training). There is no "wrong" approach; the right choice depends solely on your company's operating model.
Data Access and Audit Compliance Monitoring
Your databases must be secure! Unauthorized access to sensitive data can be as harmful as data loss. Data breaches and data misuse, whether intentional or not, bring publicity no company wants. That brings us to audit compliance and data access monitoring. There are many laws and regulations around data compliance. Some are common across industries, some are industry-specific, and some are country-specific. For instance, SOX compliance is required for companies publicly traded in the US (several other countries have adopted similar rules), and US healthcare organizations must follow HIPAA regulations.
Database management teams must implement a set of policies, procedures, and processes to enforce the laws and regulations applicable to their company. Audit reporting can be a tedious and cumbersome process, but it can and should be automated. Implementing audit compliance and data access monitoring also improves your database audit reporting, since both rely on virtually the same data set.
What do we need to monitor to comply with various laws and regulations? These are normally mandatory:
- Access changes and access attempts
- Modifications to settings and/or objects
- Data access and modifications
- Database backups
Who should be monitored? Usually, access to make changes to a database or data is strictly controlled:
- Privileged accounts – usually DBAs; ideally, they shouldn't be able to access data, but their job does not always allow that, so their activity must be monitored
- Service accounts – either database or application service accounts with rights to modify objects or data
- "Power" accounts – users with rights to modify database objects or data
- "Lower" accounts – accounts with read-only activity
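Because audit reporting and access monitoring draw on the same data, a report covering the "what" and "who" lists above can be generated directly from captured audit events. A minimal sketch, assuming events have already been exported from the engine's native audit facility into simple records; the account names, the account-to-category mapping, and the event type names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical mapping of accounts to the categories listed above.
ACCOUNT_CATEGORY = {
    "dba_admin": "privileged",
    "app_svc": "service",
    "etl_power": "power",
    "report_reader": "lower",
}

# Hypothetical event types mapped to the mandatory monitoring areas.
EVENT_AREA = {
    "LOGIN_FAILED": "access",
    "GRANT": "access",
    "ALTER_TABLE": "settings/objects",
    "UPDATE": "data",
    "SELECT": "data",
    "BACKUP": "backups",
}


@dataclass(frozen=True)
class AuditEvent:
    account: str
    event_type: str


def audit_report(events: list[AuditEvent]) -> Counter:
    """Count events per (account category, monitoring area) for a periodic report."""
    report: Counter = Counter()
    for e in events:
        category = ACCOUNT_CATEGORY.get(e.account, "unknown")  # unmapped accounts stand out
        area = EVENT_AREA.get(e.event_type, "other")
        report[(category, area)] += 1
    return report
```

An "unknown" account category in the report is itself worth investigating, since every account that can touch the database should belong to one of the listed groups.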
As with performance monitoring, most database engines provide a set of auditing tools and mechanisms. Another option is third-party compliance software, which uses database-native auditing, logs, and tracing to capture compliance-related data. It provides audit data storage and, most importantly, a set of compliance reports and dashboards for adhering to a variety of compliance policies. Compliance complexity depends directly on the regulations that apply to your company and on the diversity and size of your database ecosystem.
While we monitor access and compliance, we want to ensure that our data is not being misused. Adequate measures should be in place for when unauthorized access or abnormal data usage is detected. Some audit compliance monitoring systems can also block abnormal activity.
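One simple way to flag abnormal data usage is to compare an account's current activity with its own history. A minimal sketch using a median-based baseline; the metric (for example, rows read per hour), the factor of 5, and the minimum history length are assumptions for illustration, and commercial tools typically use more sophisticated models.

```python
from statistics import median


def is_abnormal(
    history: list[int], current: int, factor: float = 5.0, min_history: int = 7
) -> bool:
    """Flag `current` if it exceeds `factor` times the account's median activity.

    history: past values of a per-account metric, e.g. rows read per hour (hypothetical).
    """
    if len(history) < min_history:
        return False  # not enough history for a meaningful baseline
    baseline = median(history)
    if baseline == 0:
        return current > 0  # any activity from a normally idle account stands out
    return current > factor * baseline
```

The median is used instead of the mean so that a single past spike does not inflate the baseline.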
Data Corruption and Threats
Database data corruption is a serious issue that could lead to a permanent loss of valuable data. Commonly, data corruption is caused by hardware failures, but it can also result from database bugs or even bad code. Modern database engines have built-in capabilities to detect, and sometimes prevent, data corruption. Corruption generates an appropriate error code that should be monitored and highlighted. Checking database integrity should be part of the periodic maintenance process.
Other threats include intentional or unintentional data modification and ransomware. While DBAs can detect data corruption and malicious data modification, ransomware threats fall outside the monitoring scope of database professionals. It is imperative to have a bulletproof backup strategy to recover from these threats.
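One common detection mechanism is to store a checksum with each data page when it is written and verify it when the page is read back; SQL Server's CHECKSUM page verification and PostgreSQL's data checksums work on this principle. A minimal sketch of the idea using CRC-32 from Python's zlib; the page size and layout are hypothetical, and real engines use their own page formats and checksum algorithms.

```python
import struct
import zlib

PAGE_SIZE = 8192                # hypothetical page size, including the header
HEADER = struct.Struct("<I")    # 4-byte little-endian checksum at the start of the page


def write_page(payload: bytes) -> bytes:
    """Pad the payload to a full page and prefix it with its CRC-32."""
    body = payload.ljust(PAGE_SIZE - HEADER.size, b"\x00")
    if len(body) > PAGE_SIZE - HEADER.size:
        raise ValueError("payload too large for one page")
    return HEADER.pack(zlib.crc32(body)) + body


def verify_page(page: bytes) -> bool:
    """Recompute the checksum on read; a mismatch indicates corruption."""
    (stored,) = HEADER.unpack_from(page, 0)
    return stored == zlib.crc32(page[HEADER.size:])


page = write_page(b"customer rows")
damaged = page[:100] + b"\xff" + page[101:]  # simulate a flipped byte on disk
print(verify_page(page), verify_page(damaged))  # True False
```

In a real engine, a failed verification is what surfaces as the corruption error code that monitoring should catch.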
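Backups also need monitoring. A minimal sketch that flags databases whose most recent successful backup is missing or older than a recovery point objective (RPO); the RPO value and the source of the timestamps (for example, backup history tables or backup tool logs) are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_backups(
    last_backup: dict[str, Optional[datetime]],
    rpo: timedelta,
    now: Optional[datetime] = None,
) -> list[str]:
    """Return databases with no backup, or a last backup older than the RPO."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for db, taken_at in sorted(last_backup.items()):
        if taken_at is None or now - taken_at > rpo:
            stale.append(db)
    return stale
```

A fresh backup timestamp only shows that a backup ran; periodic test restores are what prove it can be used for recovery.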
Key Database Performance Metrics
Database performance metrics are extremely important data points that measure the health of database systems and help database professionals support them efficiently. Some metrics are specific to a database type or vendor; I will refer to those generically as "internal counters."
Availability
The first step in monitoring is to determine if a device or resource is available. There is a thin line between system and database availability. A database could be up and running, but clients may not be able to access it. With that said, we need to monitor the following metrics:
- Network status – Can you reach the database over the network? If yes, what is the latency? While network status may not commonly fall into the direct responsibility of a DBA, database components have configuration parameters that might be responsible for a loss of connectivity.
- Server up/down
- Storage availability
- Service up/down – another shared area between database and OS support teams
- Whether the database is online or offline
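The first network question (can you reach the database, and with what latency?) can be approximated with a TCP connection check against the database listener port. A minimal sketch; it only proves that the port accepts connections, so it does not replace a real login-and-query check, and tcp_check is a hypothetical name.

```python
import socket
import time
from typing import Optional


def tcp_check(host: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Return TCP connect latency in milliseconds, or None if the port is unreachable."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; close it immediately
    except OSError:
        return None  # refused, timed out, or name resolution failed
    return (time.perf_counter() - start) * 1000.0
```

For example, tcp_check("db01.example.internal", 5432) would probe a PostgreSQL listener on its default port (1433 is SQL Server's default); the host name is a placeholder. Running such a check from the application's network segment, not just from the database server itself, helps separate "the database is down" from "clients cannot reach it."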
CPU, Memory, Storage, and Database Internal Metrics
The next important set of server components, whose problems can escalate into availability issues, consists of CPU, memory, and storage. The following four performance areas are tightly interconnected and affect each other:
- Lack of available memory
- High CPU utilization
- Storage latency or throughput bottleneck
- A set of database internal counters that can provide more context for utilization issues
For instance, a lack of memory may force a database engine to read data from and write data to storage more frequently, creating contention on the I/O system. Sustained 100% CPU utilization can cause an entire database server to stop responding. Numerous database internal counters can help database professionals analyze usage trends and identify an appropriate action to mitigate potential impact.
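The memory-to-I/O interaction can be made visible with a buffer cache hit ratio: the share of page requests served from memory rather than from storage. In PostgreSQL, for example, pg_stat_database exposes blks_hit and blks_read; both are cumulative, so compute the ratio over deltas between samples. A minimal sketch; the 0.95 ratio and 20 ms latency cutoffs in the combined rule are arbitrary placeholders, not tuning advice.

```python
def cache_hit_ratio(hits: int, disk_reads: int) -> float:
    """Fraction of page requests served from memory rather than storage."""
    total = hits + disk_reads
    if total == 0:
        return 1.0  # no reads in the interval, so nothing missed the cache
    return hits / total


def memory_pressure_suspected(
    hits: int,
    disk_reads: int,
    disk_latency_ms: float,
    min_ratio: float = 0.95,
    max_latency_ms: float = 20.0,
) -> bool:
    """Hypothetical rule: a low hit ratio plus slow storage suggests memory is driving I/O."""
    return cache_hit_ratio(hits, disk_reads) < min_ratio and disk_latency_ms > max_latency_ms
```

Combining two signals reflects the point above: neither a low hit ratio nor slow storage alone says which component is the root cause.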
Observability
Database observability is based on metrics, traces, and logs, which is the data we should already be collecting based on the discussion above. Many factors can affect system and application availability and the customer experience; database performance metrics cover just one set of possible failure points.
Supporting the infrastructure underneath a database engine is complex. To monitor a database successfully, we need a clear picture of the entire ecosystem and the state of its components. Relevant performance data collected from various components can be a tremendous help in identifying and addressing issues before they escalate.
The entire database monitoring concept is data driven, and it is our responsibility to make it work for us. Monitoring data needs to tell us a story that every consumer can understand. With database observability, this story can be transparent and provide a clear view of your database estate.
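Telling that story often starts with lining up events from different layers on one timeline. A minimal sketch that merges per-component event lists (each already sorted by time) and keeps only what happened in a window around an incident; the Event structure, sources, and five-minute window are hypothetical.

```python
import heapq
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    at: datetime
    source: str    # e.g., "network", "os", "database", "app"
    message: str


def incident_timeline(
    streams: Iterable[list[Event]],
    incident_at: datetime,
    window: timedelta = timedelta(minutes=5),
) -> list[Event]:
    """Merge time-sorted event lists into one timeline around the incident."""
    start, end = incident_at - window, incident_at + window
    merged = heapq.merge(*streams, key=lambda e: e.at)  # lazy k-way merge by timestamp
    return [e for e in merged if start <= e.at <= end]
```

A merged view like this lets, say, an OS-level disk latency spike and a database lock-wait spike be read as one story instead of two unrelated alerts.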
Balanced Monitoring
As you can gather from this article, there are many points of failure in any database environment. While database monitoring is the responsibility of database professionals, keeping the entire IT ecosystem operational is a collaborative effort across multiple teams. So what is considered "too much" monitoring, and when is it not enough? I will use DBAs' favorite phrase: it depends.
- Assess your environment – A configuration management database (CMDB) helps here. If you don't have one, create a full inventory of your databases and corresponding applications, with as many details as possible: database sizes, number of users, maintenance schedules, utilization times, and so on.
- Assess your critical systems – Outline your critical systems and relevant databases. These will most likely need maximum monitoring: availability, performance, activity, and compliance.
- Assess your budget – IT operations budgets are often tight. You may or may not have funds to purchase a "we-monitor-everything" system, in which case certain monitoring aspects would have to be developed internally.
- Find a middle ground – Your approach to database monitoring is unique to your company's requirements. Collecting monitoring data that has no practical or actionable use is not efficient. Defining actionable KPIs for your database monitoring is key to finding a balance: monitor what your team can use to ensure system availability, stability, and satisfied customers.
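The assessment steps above can be turned into an explicit rule that assigns each inventoried database a monitoring scope across the four pillars. A minimal sketch; the inventory fields and the rule itself are hypothetical, and the value lies in making the decision explicit and reviewable rather than in these particular conditions.

```python
from dataclasses import dataclass

PILLARS = ("availability", "performance", "activity", "compliance")


@dataclass(frozen=True)
class DatabaseRecord:
    name: str
    critical: bool     # supports a critical business system
    regulated: bool    # subject to audit/compliance requirements (e.g., SOX, HIPAA)
    production: bool


def monitoring_scope(db: DatabaseRecord) -> tuple[str, ...]:
    """Choose which monitoring pillars apply to a database."""
    if db.critical:
        return PILLARS  # maximum monitoring for critical systems
    scope = ["availability"]  # every database gets at least availability monitoring
    if db.production:
        scope.append("performance")
    if db.regulated:
        scope += ["activity", "compliance"]
    return tuple(scope)
```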
Remember: Successful database monitoring is data-driven, proactive, continuous, actionable, and collaborative.