DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
View Events Video Library
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Integrating PostgreSQL Databases with ANF: Join this workshop to learn how to create a PostgreSQL server using Instaclustr’s managed service

Mobile Database Essentials: Assess data needs, storage requirements, and more when leveraging databases for cloud and edge applications.

Monitoring and Observability for LLMs: Datadog and Google Cloud discuss how to achieve optimal AI model performance.

Automated Testing: The latest on architecture, TDD, and the benefits of AI and low-code tools.

Related

  • AWS Multi-Region Resiliency Aurora MySQL Global DB With Headless Clusters
  • Apache Ranger and AWS EMR Automated Installation and Integration Series (5): Windows AD + Open-Source Ranger
  • Apache Ranger and AWS EMR Automated Installation and Integration Series (4): OpenLDAP + Open-Source Ranger
  • Apache Ranger and AWS EMR Automated Installation and Integration Series (3): Windows AD + EMR-Native Ranger

Trending

  • How To Optimize Feature Sets With Genetic Algorithms
  • An Introduction to Build Servers and Continuous Integration
  • Creating a Deep vs. Shallow Copy of an Object in Java
  • Extracting Maximum Value From Logs
  1. DZone
  2. Software Design and Architecture
  3. Cloud Architecture
  4. Diagnosing SST Errors with Percona XtraDB Cluster for MySQL

Diagnosing SST Errors with Percona XtraDB Cluster for MySQL

Peter Zaitsev user avatar by
Peter Zaitsev
·
Apr. 27, 15 · Interview
Like (0)
Save
Tweet
Share
10.86K Views

Join the DZone community and get the full member experience.

Join For Free
[This article was written by Stephane Combaudon]
State Snapshot Transfer (SST) is used in Percona XtraDB Cluster (PXC) when a new node joins the cluster or to resync a failed node if Incremental State Transfer (IST) is no longer available. SST is triggered automatically but there is no magic: If it is not configured properly, it will not work and new nodes will never be able to join the cluster. Let’s have a look at a few classic issues.

Port for SST is not open

The donor and the joiner communicate on port 4444, and if the port is closed on one side, SST will always fail.

You will see in the error log of the donor that SST is started:

[...]
141223 16:08:48 [Note] WSREP: Node 2 (node1) requested state transfer from '*any*'. Selected 0 (node3)(SYNCED) as donor.
141223 16:08:48 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 6)
141223 16:08:48 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
141223 16:08:48 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.101:4444/xtrabackup_sst' --auth 'sstuser:s3cret' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '04c085a1-89ca-11e4-b1b6-6b692803109b:6''
[...]


But then nothing happens, and some time later you will see a bunch of errors:

[...]
2014/12/23 16:09:52 socat[2965] E connect(3, AF=2 192.168.234.101:4444, 16): Connection timed out
WSREP_SST: [ERROR] Error while getting data from donor node:  exit codes: 0 1 (20141223 16:09:52.057)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20141223 16:09:52.064)
WSREP_SST: [INFO] Cleaning up temporary directories (20141223 16:09:52.068)
141223 16:09:52 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.101:4444/xtrabackup_sst' --auth 'sstuser:s3cret' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '04c085a1-89ca-11e4-b1b6-6b692803109b:6'
[...]

On the joiner side, you will see a similar sequence: SST is started, then hangs and is finally aborted:

[...]
141223 16:08:48 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 6)
141223 16:08:48 [Note] WSREP: Requesting state transfer: success, donor: 0
141223 16:08:49 [Note] WSREP: (f9560d0d, 'tcp://0.0.0.0:4567') turning message relay requesting off
141223 16:09:52 [Warning] WSREP: 0 (node3): State transfer to 2 (node1) failed: -32 (Broken pipe)
141223 16:09:52 [ERROR] WSREP: gcs/src/gcs_group.cpp:long int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():717: Will never receive state. Need to abort.

The solution is of course to make sure that the ports are open on both sides.

SST is not correctly configured

Sometimes you will see an error like this on the donor:

141223 21:03:15 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.102:4444/xtrabackup_sst' --auth 'sstuser:s3cretzzz' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid 'e63f38f2-8ae6-11e4-a383-46557c71f368:0''
[...]
WSREP_SST: [ERROR] innobackupex finished with error: 1.  Check /var/lib/mysql//innobackup.backup.log (20141223 21:03:26.973)

And if you look at innobackup.backup.log:

41223 21:03:26  innobackupex: Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock' as 'sstuser'  (using password: YES).
innobackupex: got a fatal error with the following stacktrace: at /usr//bin/innobackupex line 2995
	main::mysql_connect('abort_on_error', 1) called at /usr//bin/innobackupex line 1530
innobackupex: Error: Failed to connect to MySQL server: DBI connect(';mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock','sstuser',...) failed: Access denied for user 'sstuser'@'localhost' (using password: YES) at /usr//bin/innobackupex line 2979

What happened?

The default SST method is xtrabackup-v2 and for it to work, you need to specify a username/password in the my.cnf file:

[mysqld]
wsrep_sst_auth=sstuser:s3cret

And you also need to create the corresponding MySQL user:

mysql> GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost' IDENTIFIED BY 's3cret';

So you should check that the user has been correctly created in MySQL and that wsrep_sst_auth is correctly set.

Galera versions do not match

Here is another set of errors you may see in the error log of the donor:

141223 21:14:27 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error)
	 at gcomm/src/gcomm/datagram.hpp:unserialize():101
141223 21:14:30 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error)
	 at gcomm/src/gcomm/datagram.hpp:unserialize():101
141223 21:14:33 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error)
	 at gcomm/src/gcomm/datagram.hpp:unserialize():101

Here the issue is that you try to connect a node using Galera 2.x and a node running Galera 3.x. This can happen if you try to use a PXC 5.5 node and a PXC 5.6 node.

The right solution is probably to understand why you ended up with such inconsistent versions and make sure all nodes are using the same Percona XtraDB Cluster version and Galera version.

But if you know what you are doing, you can also instruct the node using Galera 3.x that it will communicate with Galera 2.x nodes by specifying in the my.cnf file:

[mysqld]
wsrep_provider_options="socket.checksum=1"

Conclusion

SST errors can have multiple reasons for occurring, and the best way to diagnose the issue is to have a look at the error log of the donor and the joiner. Galera is in general quite verbose so you can follow the progress of SST on both nodes and see where it fails. Then it is mostly about being able to interpret the error messages.



cluster MySQL XtraDB

Published at DZone with permission of Peter Zaitsev, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • AWS Multi-Region Resiliency Aurora MySQL Global DB With Headless Clusters
  • Apache Ranger and AWS EMR Automated Installation and Integration Series (5): Windows AD + Open-Source Ranger
  • Apache Ranger and AWS EMR Automated Installation and Integration Series (4): OpenLDAP + Open-Source Ranger
  • Apache Ranger and AWS EMR Automated Installation and Integration Series (3): Windows AD + EMR-Native Ranger

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends: