
Galera Warning “Last Inactive Check”

In this post, we’ll discuss the Galera warning “last inactive check” and what it means.

By Manjot Singh · Jun. 06, 16 · Tutorial
Problem

I’ve been working with Percona XtraDB Cluster quite a bit recently and have been investigating various warnings. I came across this one today:

[Warning] WSREP: last inactive check more than PT1.5S ago (PT1.51811S), skipping check

This warning is related to the evs.inactive_check_period option. This option controls the poll period for the group communication response time. If a node is delayed, it is added to a delay list and it can lead to the cluster evicting the node.
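As a rough illustration, the two durations in the warning can be pulled apart to see how far past the limit the check ran. This is a minimal sketch: the function name `parse_inactive_check_warning` is hypothetical, and the pattern assumes the seconds-only `PT…S` form shown in the log line above.

```python
import re

# Matches the two durations in the warning, e.g. PT1.5S and PT1.51811S.
# Assumption: both are printed in the seconds-only form seen in the sample line.
WARNING_RE = re.compile(
    r"last inactive check more than PT([0-9.]+)S ago \(PT([0-9.]+)S\)"
)


def parse_inactive_check_warning(line):
    """Return (threshold_seconds, elapsed_seconds), or None if the line does not match."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# The warning quoted in this post.
line = (
    "[Warning] WSREP: last inactive check more than PT1.5S ago "
    "(PT1.51811S), skipping check"
)
threshold, elapsed = parse_inactive_check_warning(line)
print(f"threshold={threshold}s elapsed={elapsed}s overshoot={elapsed - threshold:.5f}s")
```

In this example the check ran only about 18 milliseconds later than the limit that triggers the warning.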

Possible Cause

While some troubleshooting tips seem to associate the warning with VMware snapshots, that isn’t the case here, as we see the warning on a physical machine.

I checked for backups or desynced nodes, and this also wasn’t the case. The warning was not accompanied by any errors or other information, so there was nothing critical happening.

In the troubleshooting link above, Galera developers said:

This can be seen on bare metal as well—with poorly configured mysqld, O/S, or simply being overloaded. All it means is that this thread could not get CPU time for 7.1 seconds. You can imagine that access to resources in virtual machines is even harder (especially I/O) than on bare metal, so you will see this in virtual machines more often.

This is not a Galera specific issue (it just reports being stuck, other mysqld threads are equally stuck) so there is no configuration options for that. You simply must make sure that your system and mysqld are properly configured, that there is enough RAM (buffer pool not over provisioned), that there is swap, that there are proper I/O drivers installed on guest and so on.

Basically, Galera runs in virtual machines as well as the virtual machines approximates bare metal.

It could also be an indication of unstable network or just higher average network latency than expected by the default configuration. In addition to checking network, do check I/O, swap and memory when you do see this warning.

Our graphs and counters otherwise look healthy. If this is the case, this is most likely nothing to worry about.

It is also a good idea to ensure your nodes are desynced before backup. Look for spikes in your workload. A further option to check for is that swappiness is set to 1 on modern kernels.

If all of this looks good, ensure the servers are all talking to the same NTP server, have the same time zone and the times and dates are in sync. While this warning could be a sign of an overloaded system, if everything else looks good this warning isn’t something to worry about.

Source

The warning comes from evs_proto.cpp in the Galera code:

    if (last_inactive_check_ + inactive_check_period_*3 < now)
    {
        log_warn << "last inactive check more than " << inactive_check_period_*3
                 << " ago (" << (now - last_inactive_check_)
                 << "), skipping check";
        last_inactive_check_ = now;
        return;
    }

The default for inactive_check_period is one second according to the Galera documentation, so the threshold in this check is three seconds. If more than three check periods have passed since the last check, the routine logs the warning, resets the timestamp and returns early, skipping the rest of the routine (the part that adds the node to the delay list and does some other logic). The reason it does this is that it doesn’t want to rely on stale counters before making decisions. The message is really just letting you know that.

In Percona XtraDB Cluster, this setting defaults to 0.5s, which is why the warning above reports a threshold of PT1.5S (3 × 0.5s). The warning could simply mean that your inactive_check_period is too low, while the delay is not high enough to add the node to the delay list. So you could consider increasing evs.inactive_check_period to resolve the warnings. (Apparently in Galera, it may also now be 0.5s but documentation is stale.)
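The guard in the excerpt can be modelled in a few lines to see how the period changes the outcome. This is only a sketch of the comparison shown in the excerpt, not of the rest of the Galera routine; the function name `should_skip_check` is hypothetical, and times are plain seconds rather than Galera's own date and period types.

```python
def should_skip_check(last_check, now, period):
    """Mirror the guard from the excerpt: skip when more than 3 * period has elapsed.

    All arguments are in seconds. Returns True when the warning would be logged
    and the rest of the routine skipped.
    """
    threshold = 3 * period
    return last_check + threshold < now


# The gap reported in the warning quoted earlier in this post.
gap = 1.51811

for period in (0.5, 1.0, 1.5):
    skipped = should_skip_check(last_check=0.0, now=gap, period=period)
    print(f"period={period}s threshold={3 * period}s skipped={skipped}")
```

With a 0.5-second period the 1.51811-second gap is just over the 1.5-second threshold, so the check is skipped and the warning is logged. With a period of 1 or 1.5 seconds the same gap is well inside the threshold.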

Possible Solution

To find a sane value my colleague David Bennett came up with this command line, which gives you an idea of when your check warnings are happening:

$ cat mysqld.log | grep 'last inactive check more than' | perl -ne 'm/\(PT(.*)S\)/; print $1."\n"' | sort -n | uniq -c
 1 1.55228
 1 1.5523
 1 1.55257
 1 1.55345
 1 1.55363
 1 1.5543
 1 1.55436
 1 1.55483
 1 1.5552
 1 1.55582
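
All of the gaps here are around 1.55 seconds, just over the 1.5-second threshold reported in the warning. Therefore, in this case, it may be a good idea to set evs.inactive_check_period to 1 or 1.5 seconds to make the warnings go away.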
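The same analysis can be sketched in Python, together with a lower bound for the period derived from the three-periods rule in the Galera excerpt. The helper names are hypothetical, the sample lines are rebuilt from a few of the values listed above rather than copied from a real log, and the option string is only the `name=value` text you would place inside `wsrep_provider_options`.

```python
import re
from collections import Counter

# Captures the elapsed time in parentheses, e.g. 1.55228 from "(PT1.55228S)".
# Assumption: durations are printed in the seconds-only form seen in this post.
ELAPSED_RE = re.compile(
    r"last inactive check more than PT[0-9.]+S ago \(PT([0-9.]+)S\)"
)


def elapsed_values(lines):
    """Return the elapsed seconds from every matching warning line."""
    return [float(m.group(1)) for line in lines if (m := ELAPSED_RE.search(line))]


def smallest_quiet_period(values):
    """Smallest period P such that 3 * P covers every observed gap."""
    return max(values) / 3


def provider_option(period_seconds):
    """Format the setting in the PT...S duration form used by the warning."""
    return f"evs.inactive_check_period=PT{period_seconds:g}S"


# Synthetic sample lines built from values in the listing above.
template = "[Warning] WSREP: last inactive check more than PT1.5S ago (PT{}S), skipping check"
sample_lines = [template.format(v) for v in ("1.55228", "1.5523", "1.55582")]

values = elapsed_values(sample_lines)
for value, count in sorted(Counter(values).items()):
    print(count, value)

print("lower bound for the period:", round(smallest_quiet_period(values), 3), "seconds")
print(provider_option(1))
```

The lower bound is only what the observed gaps require; choosing 1 or 1.5 seconds, as suggested above, leaves headroom for slightly longer stalls.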

Conclusion

Each node in the cluster keeps its own local copy of how it sees the topology of the entire cluster. check_inactive is a node event that is triggered once every inactive_check_period to help the node update its view of the whole cluster and ensure it is accurate. Service messages can be broadcast to the cluster to inform nodes of changes to the topology. For example, if a cluster node is going down, it broadcasts a service message telling each node in the cluster to remove it. The action is queued, but the actual view of the cluster is only updated by check_inactive. This is why check_inactive adds nodes to its local lists of inactive, suspect and delayed nodes.

If a node thinks it might be looking at stale data, it doesn’t make these decisions and waits until the next time for a fresh queue. Unfortunately, if inactive_check_period is too low, it will keep giving you warnings.

Published at DZone with permission of Manjot Singh, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
