Galera Flow Control in Percona XtraDB Cluster for MySQL

By Peter Zaitsev · May 04, 2013
This post comes from Jay Janssen at the MySQL Performance Blog.

Last week at Percona Live, I delivered a six-hour tutorial about Percona XtraDB Cluster (PXC) for MySQL. I actually had more material than I covered (by design), but one thing I regret we didn't get to was flow control. So, I thought I'd write a post about it, because flow control is important to understand.

What Is Flow Control?

One of the things people often don't expect when switching to Galera is the existence of a replication feedback mechanism, unlike anything you find in standard asynchronous MySQL replication. It is my belief that a lack of understanding of this system, or even of the fact that it exists, leads to unnecessary frustration with Galera and to cluster “stalls” that are preventable.

This feedback, called flow control, allows any node in the cluster to instruct the group when it needs replication to pause and when it is ready for replication to continue. This prevents any node in the synchronous replication group from getting too far behind the others in applying replication.

This may sound counter-intuitive at first: how could synchronous replication get behind? As I've mentioned before, Galera's replication is synchronous to the point of ensuring transactions are copied to all nodes and global ordering is established, but applying and committing are asynchronous on all but the node where the transaction was originally run.

It's important to realize that Galera prevents conflicts with transactions that have been certified but not yet applied, so multi-node writing will not lead to inconsistencies, but that is beyond the scope of this post.

Tuning Flow Control

Flow control is triggered when a synced node's receive queue (visible via the wsrep_local_recv_queue global status variable) exceeds a specific threshold. Donor/Desynced nodes do not apply flow control, though they may enter states where the receive queue grows substantially. Therefore, care should be taken that applications avoid using Donor/Desynced nodes, particularly when using a blocking SST method like rsync or mysqldump.

So, flow control kicks in when the receive queue gets too big, but how big is that? And when is flow control relaxed? There are a few relevant settings here, and they are all configured via the wsrep_provider_options global variable.
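
To see what a node is currently running with, you can dump the provider options string directly; the gcs.fc_* settings discussed below are embedded in the long semicolon-separated value it returns:

mysql> SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'\G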

gcs.fc_limit

This setting controls when flow control engages. Simply speaking, if wsrep_local_recv_queue exceeds this size on a given node, a pausing flow control message will be sent. However, it's a bit trickier than that because of fc_master_slave (see below).

The fc_limit defaults to 16 transactions. Effectively, this is as far as a given node can fall behind in committing transactions from the cluster.
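
You can check how deep that queue actually is on a given node at any time:

mysql> -- number of writesets received but not yet applied on this node
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';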

gcs.fc_master_slave

The fc_limit is adjusted dynamically if you have fc_master_slave disabled (which it is by default). In this mode the effective fc_limit grows with the number of nodes in the cluster: the more nodes, the larger the calculated fc_limit. The theory is that the larger (and presumably busier, with writes coming from more nodes) the cluster gets, the more leeway each node is given to fall a bit further behind in applying.
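
As a rough illustration, assuming the adjustment scales with the square root of the cluster size (an assumption on my part about the exact formula), a three-node cluster with the default fc_limit of 16 would tolerate a queue of roughly 28 transactions before pausing, which lines up with the queue depth of about 29 visible in the myq_status output later in this post:

mysql> -- assumed scaling: effective limit ≈ gcs.fc_limit * sqrt(number of nodes)
mysql> SELECT ROUND(16 * SQRT(3)) AS approx_effective_fc_limit;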

If you only write to a single node in PXC, then it is recommended that you disable this feature by setting fc_master_slave=yes. Despite its name, this setting does nothing more than control whether the fc_limit is dynamically resized; it contains no other magic that helps single-node writing in PXC perform better.

gcs.fc_factor

If fc_limit controls when flow control is enabled, then fc_factor addresses when it is released. The factor is a number between 0.0 and 1.0, which is multiplied by the current fc_limit (adjusted by the above calculation if fc_master_slave=no). This yields the number of transactions the receive queue must fall below before the node sends another flow control message giving the cluster permission to resume replication.

This setting traditionally defaulted to 0.5, meaning the queue had to fall below 50% of the fc_limit before replication was resumed. A large fc_limit in that case could mean a long wait before flow control is relaxed. However, the default was recently changed to 1.0 so that replication resumes as soon as possible.
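
To make the release threshold concrete: with a hypothetical fc_limit of 500 (and fc_master_slave=yes, so no dynamic adjustment), the old and new defaults work out as follows:

mysql> -- fc_factor=0.5: queue must drop below 250 before replication resumes
mysql> -- fc_factor=1.0: replication resumes as soon as the queue drops below 500
mysql> SELECT 500 * 0.5 AS resume_below_old_default, 500 * 1.0 AS resume_below_new_default;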

An example configuration tuning flow control in a master/slave cluster might be:

mysql> SET GLOBAL wsrep_provider_options="gcs.fc_limit=500; gcs.fc_master_slave=yes; gcs.fc_factor=1.0";
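
SET GLOBAL only affects the running node; to make the change survive a restart you would also add the same string to each node's my.cnf. Note that wsrep_provider_options is a single string, so any other provider options you rely on must be kept in the same line:

[mysqld]
# flow control tuning for a single-writer (master/slave) cluster
wsrep_provider_options="gcs.fc_limit=500; gcs.fc_master_slave=yes; gcs.fc_factor=1.0"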

Working With Flow Control

What Happens During Flow Control

Simply speaking: flow control makes replication stop, and therefore makes writes (which are synchronous) stop, on all nodes until flow control is relaxed.

In normal operation, we would expect a large receive queue to be the result of some brief performance issue on a given node, or perhaps of a large transaction briefly stalling an applier thread.

However, it is possible to halt queue applying on any node simply by running FLUSH TABLES WITH READ LOCK, or perhaps LOCK TABLE, in which case flow control will kick in as soon as the fc_limit is exceeded. Therefore, care must be taken that your application or some maintenance operation (like a backup) doesn't inadvertently trigger flow control on your cluster.
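
If you want to see this for yourself on a test cluster, it is easy to reproduce on any node that is receiving replicated writes:

mysql> -- blocks the applier; the recv queue grows until the fc_limit is hit
mysql> FLUSH TABLES WITH READ LOCK;
mysql> -- writes on the other nodes now stall; release the lock to resume:
mysql> UNLOCK TABLES;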

The Cost of Increasing the fc_limit

Keeping the fc_limit small serves two purposes:

  1. It limits the amount of delay any node in the cluster can accumulate in applying cluster transactions. This keeps reads more up to date without needing to use wsrep_causal_reads (see the example below).
  2. It minimizes the expense of certification by keeping the window between newly committed transactions and the oldest unapplied transaction small. The larger the queue, the more costly certification gets.

On a master/slave cluster, therefore, it is reasonable to increase the fc_limit, because the only lagging nodes will be the slaves and no writes are coming from them. With multi-node writing, however, larger queues make certification more expensive and therefore more time-consuming.
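
If you do run with a large fc_limit and a particular query must not return stale data, the wsrep_causal_reads variable mentioned above can be enabled just for the session that needs it (a sketch; in later releases this variable was superseded by wsrep_sync_wait):

mysql> -- wait until this node has applied everything committed cluster-wide
mysql> -- before executing the reads in this session
mysql> SET SESSION wsrep_causal_reads=ON;
mysql> -- run the reads that must see the latest writes here, then:
mysql> SET SESSION wsrep_causal_reads=OFF;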

How to Tell If Flow Control Is Happening and Where It Is Coming From

There are two global status variables you can check to see whether flow control is happening:

  • wsrep_flow_control_paused – the fraction of time (out of 1.0) since the last SHOW GLOBAL STATUS during which flow control was in effect, regardless of which node caused it. Generally speaking, anything above 0.0 is to be avoided.
  • wsrep_flow_control_sent – the number of flow control messages sent by the local node to the cluster. This can be used to discover which node is causing flow control.

I would strongly recommend monitoring and graphing wsrep_flow_control_sent so you can tell if and when flow control is happening and which node (or nodes) is causing it.
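
A quick ad-hoc check of all the flow control counters on a node looks like this:

mysql> -- wsrep_flow_control_paused, _sent, and _recv in one shot
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_flow_control%';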

Using myq_gadgets, I can easily see flow control if I execute FLUSH TABLES WITH READ LOCK on node3:

[root@node3 ~]# myq_status wsrep
wsrep    cluster        node           queue   ops     bytes     flow        conflct
    time  name p cnf  #  name  cmt sta  up  dn  up  dn   up   dn pau snt dst lcf bfa
09:22:17 myclu p   3  3 node3 sync t/t   0   0   0   9    0  13k 0.0   0 101   0   0
09:22:18 myclu p   3  3 node3 sync t/t   0   0   0  18    0  28k 0.0   0 108   0   0
09:22:19 myclu p   3  3 node3 sync t/t   0   4   0   3    0 4.3k 0.0   0 109   0   0
09:22:20 myclu p   3  3 node3 sync t/t   0  18   0   0    0    0 0.0   0 109   0   0
09:22:21 myclu p   3  3 node3 sync t/t   0  27   0   0    0    0 0.0   0 109   0   0
09:22:22 myclu p   3  3 node3 sync t/t   0  29   0   0    0    0 0.9   1 109   0   0
09:22:23 myclu p   3  3 node3 sync t/t   0  29   0   0    0    0 1.0   0 109   0   0
09:22:24 myclu p   3  3 node3 sync t/t   0  29   0   0    0    0 1.0   0 109   0   0
09:22:25 myclu p   3  3 node3 sync t/t   0  29   0   0    0    0 1.0   0 109   0   0
09:22:26 myclu p   3  3 node3 sync t/t   0  29   0   0    0    0 1.0   0 109   0   0
09:22:27 myclu p   3  3 node3 sync t/t   0  29   0   0    0    0 1.0   0 109   0   0
09:22:20 myclu p   3  3 node3 sync t/t   0  29   0   0    0    0 1.0   0 109   0   0
09:22:21 myclu p   3  3 node3 sync t/t   0  29   0   0    0    0 1.0   0 109   0   0
09:22:22 myclu p   3  3 node3 sync t/t   0  29   0   0    0    0 1.0   0 109   0   0

Notice that node3's queue fills up, it sends one flow control message (to pause), and then flow control is in a pause state 100% of the time. We can tell flow control came from this node because ‘flow snt’ shows a message sent as soon as flow control engages.

Flow Control and State Transfer Donation

Donor nodes should not cause flow control because they are moved from the Synced state to the Donor/Desynced state. Donors in that state will continue to apply replication as they are able, but they can build up a large replication queue, without triggering flow control, if they are blocked by the underlying SST method, e.g., by FLUSH TABLES WITH READ LOCK.
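
If you suspect a node is serving traffic while in that state, you can check it directly:

mysql> -- reports Synced, Donor/Desynced, Joined, etc. for this node
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';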






Published at DZone with permission of Peter Zaitsev, DZone MVB. See the original article here.
