All You Need to Know About GCache (Galera-Cache)
Using Percona XtraDB Cluster? Here's what you need to know about GCache, including how it's managed and how to configure it for your workload.
Why Do We Need GCache?
Percona XtraDB Cluster is a multi-master topology, where a transaction executed on one node is replicated to the other node(s) of the cluster. The transaction's write-set is copied from the group channel to Galera-Cache, followed by the apply action.
The cache could be discarded immediately once the transaction is applied, but retaining it helps the node act as a DONOR, serving cached write-sets to a newly booted node.
So in short, GCache acts as a temporary storage for replicated transactions.
How Is GCache Managed?
Naturally, the first choice for caching these write-sets is a memory-allocated pool, which is governed by gcache.mem_size. However, this option is deprecated and buggy, and shouldn't be used.
Next on the list is on-disk files. Galera has two types of on-disk files to manage write-sets:
- RingBuffer File:
- A circular file, pre-created when the server starts. As the name suggests, it is reused in a circular-queue fashion. Its size is preconfigured and can't be changed dynamically, so selecting a proper size for this file is important.
- The user can set the size of this file using gcache.size. (There are multiple blogs about how to estimate the size of the Galera Cache, which is generally tied to expected downtime. If properly planned, a rejoining node will find all of its missing write-sets in the cache, avoiding the need for SST.)
- Write-sets are appended to this file and, when needed, the file is recycled for use.
- On-demand page store:
- If a transaction write-set is too large to fit in the RingBuffer file (actually, too large to fit in half of the RingBuffer file), then an independent page (a physical on-disk file) is allocated to cache the write-set.
- Again there are two types of pages:
- Page with standard size: As defined by gcache.page_size (default=128M).
- Page with non-standard size: If the transaction is too large to fit into a standard page, then a non-standard page is created for it. Let's say gcache.page_size=1M and the transaction write-set is 1.5M; then a separate page (an on-disk file) of 1.5M will be created for that transaction.
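As a rough illustration of the sizing guidance above, gcache.size is often estimated from the write-set throughput and the longest downtime IST should cover. A minimal sketch, assuming you can measure the average replicated byte rate (e.g. by sampling the wsrep_replicated_bytes status counter); the helper name and safety factor are illustrative, not part of Galera:

```python
# Hedged sketch: estimate a gcache.size value large enough that a node
# rejoining after `downtime_seconds` still finds its missing write-sets
# in the RingBuffer file, avoiding a full SST.
def estimate_gcache_size(bytes_per_second: float,
                         downtime_seconds: float,
                         safety_factor: float = 1.5) -> int:
    """Return a suggested gcache.size in bytes, with some headroom."""
    return int(bytes_per_second * downtime_seconds * safety_factor)

# Example: ~2 MiB/s of replicated write-sets, one-hour maintenance window
suggested = estimate_gcache_size(2 * 1024**2, 3600)
print(suggested)  # bytes; roughly 10.5 GiB with the 1.5x safety factor
```

The safety factor is a judgment call; write rates are rarely uniform, so size for peak traffic rather than the average if downtime may coincide with heavy load.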
How long are on-demand pages retained? This is controlled by the following two variables:
- gcache.keep_pages_size defines the total size of allocated pages to keep. For example, if gcache.keep_pages_size=10M, then N pages adding up to 10M can be retained. If the N pages add up to more than 10M, pages are removed from the start of the queue until the total size falls below the threshold. A size of 0 means don't retain any pages.
- gcache.keep_pages_count (PXC-specific)
- Before pages are actually removed, a second check is done based on the page count. Let's say gcache.keep_pages_count=N+M; then even though the N pages add up to 10M, they will be retained because the page-count threshold has not yet been hit. (The exception is non-standard pages at the start of the queue.)
So in short, both conditions must be satisfied before a page is removed. The recommendation is to use whichever condition is applicable in the user's environment.
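For reference, both retention thresholds are passed to the Galera provider through wsrep_provider_options in my.cnf. A minimal sketch; the values below are illustrative, not recommendations:

```ini
# my.cnf -- illustrative values only
[mysqld]
wsrep_provider_options="gcache.keep_pages_size=10M;gcache.keep_pages_count=5"
```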
Where Are GCache Files Located?
The default location is the data directory, but this can be changed by setting gcache.dir. Given the temporary nature of the files and their iterative read/write cycle, it may be wise to place them on a disk with faster I/O. Also, the default name of the file is gcache.cache; this is configurable by setting gcache.name.
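As a sketch, relocating and renaming the cache file might look like this in my.cnf (the /mnt/fast-ssd path is a placeholder for whatever fast storage you have):

```ini
# my.cnf -- place GCache files on a faster disk (path is a placeholder)
[mysqld]
wsrep_provider_options="gcache.dir=/mnt/fast-ssd/galera;gcache.name=gcache.cache;gcache.size=1G"
```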
What if a Node Is DESYNCED and PAUSED?
If a node is desynced, it will continue to receive write-sets and apply them, so there is no major change in GCache handling.
If the node is desynced and paused, it can't apply write-sets and must keep caching them. This, of course, affects the desynced/paused node, which will continue to create on-demand page stores. And since one of the cluster nodes can't proceed, it will not emit a "last committed" message; in turn, the other nodes in the cluster (which could otherwise purge these entries) will continue to retain the write-sets, even though they themselves are not desynced and paused.
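For context, a node is typically desynced and paused from the SQL layer. A sketch of the usual sequence (the pattern backup tools follow); adapt it to your own maintenance procedure:

```sql
-- Desync the node: it keeps receiving and applying write-sets,
-- but no longer participates in cluster flow control.
SET GLOBAL wsrep_desync = ON;

-- Pause applying via a global read lock: from this point the node
-- can only cache incoming write-sets, growing its on-demand page store.
FLUSH TABLES WITH READ LOCK;

-- ... perform maintenance or take a backup ...

UNLOCK TABLES;
SET GLOBAL wsrep_desync = OFF;
```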
Published at DZone with permission of Krunal Bauskar, DZone MVB. See the original article here.