Increasing fc_limit Can Affect SELECT Latency
It is not a good idea to increase the fc_limit beyond some value — that is, unless you simply don’t care about data freshness.
In this blog post, we’ll look at how increasing the fc_limit can affect SELECT latency.
Introduction
Recent Percona XtraDB Cluster optimizations have exposed fc_limit contention. It was always there, but it was never visible because the Commit Monitor contention was more significant. As happens with any optimization, once we solve the bigger contention issues, smaller ones start popping up. We have seen this pattern in InnoDB, and Percona XtraDB Cluster is no exception. In fact, it is a good sign because it tells us that we are on the right track.
If you haven’t yet checked the performance blogs, then please visit here and here.
What Is fc_limit?
Percona XtraDB Cluster has the concept of Flow Control. If any member of the cluster (not garbd) is unable to apply write-sets as fast as they are replicated to it, then a queue builds up. If this queue crosses some threshold (dictated by gcs.fc_limit), then flow control kicks in. Flow control causes members of the cluster to temporarily halt or slow down so that the slower node can catch up.
The user can, of course, disable this by setting wsrep_desync=1 on the slower node, but make sure you understand the effect of doing so. Unless you have a good reason, you should avoid setting it.
mysql> show status like 'wsrep_flow_control_interval';
+-----------------------------+------------+
| Variable_name               | Value      |
+-----------------------------+------------+
| wsrep_flow_control_interval | [ 16, 16 ] |
+-----------------------------+------------+
1 row in set (0.01 sec)
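To make the threshold behaviour concrete, here is a minimal toy model in Python. It is only a sketch and not how Galera implements flow control: the function name, the tick-based rates, and the rule that writers stop completely while the queue is above the limit are all assumptions made for illustration.

```python
def simulate_queue(incoming_per_tick, applied_per_tick, fc_limit, ticks):
    """Toy model of a replica's queue under flow control (illustrative only).

    Each tick the replica applies up to `applied_per_tick` write-sets.
    New write-sets arrive only while the queue is at or below `fc_limit`;
    above it, flow control is assumed to pause the writers entirely.
    Returns (queue depth after each tick, number of paused ticks).
    """
    queue = 0
    depths = []
    paused_ticks = 0
    for _ in range(ticks):
        if queue > fc_limit:
            paused_ticks += 1  # flow control active: no new write-sets this tick
        else:
            queue += incoming_per_tick
        queue = max(0, queue - applied_per_tick)
        depths.append(queue)
    return depths, paused_ticks


# Hypothetical rates: the writer produces 10 write-sets per tick,
# the replica applies 6 per tick.
small_depths, small_paused = simulate_queue(10, 6, fc_limit=16, ticks=20)
large_depths, large_paused = simulate_queue(10, 6, fc_limit=1600, ticks=20)

print("fc_limit=16:  ", max(small_depths), "max queued,", small_paused, "paused ticks")
print("fc_limit=1600:", max(large_depths), "max queued,", large_paused, "paused ticks")
```

With the small limit, the queue stays short and the writers are paused from time to time. With the large limit, the writers are never paused in this run and the queue just keeps growing.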
Increasing fc_limit
Until recently, the default fc_limit was 16 (starting with Percona XtraDB Cluster 5.7.17-29.20, the default is 100). This worked until now because Percona XtraDB Cluster failed to scale and rarely hit the limit of 16. With the new optimizations, Percona XtraDB Cluster nodes can process more write-sets in a given time period, and thereby can replicate more write-sets (anywhere from three to ten times as many). Of course, the replicating/slave nodes are also performing at a higher speed. But depending on the number of slave threads, it is easy to start hitting this limit.
So, what is the solution?
Increase fc_limit from 16 to something really big. Say, 1,600.
Is this correct?
Yes and no.
Why yes?
- If you don’t care about the freshness of data on the replicated nodes, then increasing the limit to a higher value is not an issue. For example, setting it to 10K means that the replicating node can be holding up to 10K write-sets that it has yet to apply, and a SELECT fired during this time will not see the changes from those 10K write-sets.
- But if you insist on having fresh data, then Percona XtraDB Cluster has a solution for this (set wsrep_sync_wait=7).
- Setting wsrep_sync_wait places the SELECT request in a queue that is serviced only after the replicated write-sets that existed at the point when the SELECT was fired are done with. If the queue has 8K write-sets, then the SELECT is placed at the 8K+1 position. As the queue progresses, the SELECT gets serviced only when all those 8K write-sets are done. This dramatically increases SELECT latency and can set off every monitoring alarm.
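The queue-position argument above can be turned into a rough back-of-the-envelope estimate. The Python sketch below assumes a constant apply rate on the replica and ignores the SELECT’s own execution time. The function name and the rate of 4,000 write-sets per second are hypothetical and do not come from the experiment in this post.

```python
def estimated_sync_wait_seconds(queued_write_sets, apply_rate_per_sec):
    """Estimate how long a SELECT waits for the queue ahead of it to drain.

    Assumes a constant apply rate and ignores the SELECT's own run time.
    """
    if apply_rate_per_sec <= 0:
        raise ValueError("apply_rate_per_sec must be positive")
    return queued_write_sets / apply_rate_per_sec


# Hypothetical apply rate; measure your own replica before relying on it.
APPLY_RATE = 4000  # write-sets per second (assumed, not from the article)

for depth in (16, 1600, 8000):
    wait = estimated_sync_wait_seconds(depth, APPLY_RATE)
    print(f"{depth:>5} queued write-sets -> ~{wait:.3f} s before the SELECT runs")
```

Under these assumptions, the wait grows linearly with the number of write-sets queued ahead of the SELECT, so the maximum queue depth allowed by fc_limit bounds the worst-case wait.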
Why no?
- For the reason mentioned above, we feel it is not a good idea to increase the fc_limit beyond some value unless you don’t care about data freshness and, in turn, don’t need to set wsrep_sync_wait.
- We did a small experiment with the latest Percona XtraDB Cluster release to understand the effects.
- Started a 2-node cluster.
- Fired a 64-thread workload on node-1 of the cluster.
- node-2 acted as the replicating slave without any active workload.
- Set wsrep_sync_wait=7 on node-2 to ensure data freshness.
Using default fc_limit (= 16)
-----------------------------
mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k)      |
+-------------+
| 22499552612 |
+-------------+
1 row in set (0.03 sec)
Increasing it from 16 -> 1600
-----------------------------
mysql> set global wsrep_provider_options="gcs.fc_limit=1600";
Query OK, 0 rows affected (0.00 sec)
mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k)      |
+-------------+
| 22499552612 |
+-------------+
1 row in set (0.46 sec)
That is a whopping 15x increase in SELECT latency.
Increasing it even further (1600 -> 25000)
-------------------------------------------
mysql> set global wsrep_provider_options="gcs.fc_limit=25000";
Query OK, 0 rows affected (0.00 sec)
mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k)      |
+-------------+
| 22499552612 |
+-------------+
1 row in set (7.07 sec)
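For comparison, the three timings reported above can be put side by side with a few lines of Python. The latencies are copied from the mysql client output. The per-write-set figure additionally assumes that the queue was filled up to fc_limit when each SELECT was issued, which the runs do not confirm, so treat it as a rough indication only.

```python
# Latencies (seconds) reported by the mysql client in the runs above,
# keyed by the fc_limit in effect.
measured = {16: 0.03, 1600: 0.46, 25000: 7.07}

baseline = measured[16]
slowdown = {limit: latency / baseline for limit, latency in measured.items()}

# Extra wait per queued write-set, ASSUMING the queue was full at fc_limit
# when the SELECT was issued (an assumption; the runs do not state this).
extra_ms_per_write_set = {
    limit: (latency - baseline) / limit * 1000
    for limit, latency in measured.items()
    if limit != 16
}

for limit in measured:
    print(f"fc_limit={limit:>5}: {slowdown[limit]:.0f}x the baseline latency")
for limit, ms in extra_ms_per_write_set.items():
    print(f"fc_limit={limit:>5}: ~{ms:.2f} ms of extra wait per queued write-set")
```

Under that assumption, both larger settings work out to a little under 0.3 ms of extra wait per queued write-set, which would be consistent with latency growing roughly in proportion to the queue depth.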
Note: wsrep_sync_wait=7 will enforce the check for all DMLs (INSERT/UPDATE/DELETE/SELECT). We highlighted the SELECT example, as that is the more concerning case at first glance. But latency for other DMLs also increases for the same reasons as mentioned above.
Conclusion
Let’s conclude with the following observation: Avoid increasing fc_limit to an insanely high value, as it can affect SELECT latency (if you are running a SELECT session with wsrep_sync_wait=7 for data freshness).
Published at DZone with permission of Krunal Bauskar, DZone MVB. See the original article here.