
MariaDB ColumnStore Data Redundancy: A Look Under the Hood


Learn how to use MariaDB ColumnStore data redundancy in order to have highly available storage and automated PM failover when using local disk storage.



In this post, we take a close look at MariaDB ColumnStore data redundancy, a new feature of MariaDB AX. This feature enables you to have highly available storage and automated PM failover when using local disk storage.

MariaDB ColumnStore data redundancy leverages GlusterFS, an open-source distributed file system maintained by Red Hat that provides continued access to data and scales to very large data sets. To enable data redundancy, you must install and enable GlusterFS before running postConfigure. For more information on this topic, refer to Preparing for MariaDB ColumnStore Installation: 1.1.X. Failover is configured automatically by MariaDB ColumnStore, so that if a physical server experiences a service interruption, data is still accessible from another PM node.
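As a sketch, the preparation step on each PM node might look like the following. The package and service names are assumptions for a RHEL/CentOS-style system; consult the preparation guide linked above for your distribution and ColumnStore version:

```shell
# Run on every PM node BEFORE postConfigure.
# Package/service names assume a RHEL/CentOS system; adjust for your distro.
sudo yum install -y glusterfs glusterfs-server glusterfs-fuse

# Start the gluster management daemon and enable it at boot.
sudo systemctl enable glusterd
sudo systemctl start glusterd

# Sanity check: the gluster CLI should respond on each PM.
gluster --version
```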

During postConfigure, you are prompted to enter the number of redundant copies for each dbroot:

Enter Number of Copies [2-N] (2) >

N = Number of PMs. (An actual number is displayed by postConfigure.)

On a multi-node install with internal storage, the DBRoots are tied directly to a single PM.

On a multi-node install with data redundancy, replicated GlusterFS volumes are created for each DBRoot. To users on the outside, this appears to be the same as above. Under the hood, a DBRoot is now a gluster volume, where a gluster volume is a collection of gluster bricks that map to directories on the local file system located here:

/usr/local/mariadb/columnstore/gluster/brick[n] (default)

The gluster directory contains the subdirectories brick1 through brick[n], where n is the number of copies configured.

Note: Bricks are numbered sequentially on each PM, as they are created by MariaDB ColumnStore and are not related to each other or to DBRoot IDs.
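For example, on a PM in a two-copy configuration, listing the default gluster directory would show one brick subdirectory per configured copy (path per the default install location above):

```shell
# Inspect the local brick directories backing the replicated volumes.
# With copies = 2, each PM holds two bricks.
ls /usr/local/mariadb/columnstore/gluster
# brick1  brick2
```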

Visually:


A three-PM installation with data redundancy copies = 2

In mcsadmin getStorageConfig, this is displayed in text form like this:

Data Redundant Configuration

Copies Per DBroot = 2
DBRoot #1 has copies on PMs = 1 2 
DBRoot #2 has copies on PMs = 2 3 
DBRoot #3 has copies on PMs = 1 3 
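The same mapping can be confirmed from GlusterFS itself with gluster volume info; each DBRoot is a replicated volume with one brick per copy. The brick numbers in the sample output are illustrative, since brick numbering is per-PM and unrelated to DBRoot IDs:

```shell
# Query GlusterFS directly for the volume backing DBRoot1.
sudo gluster volume info dbroot1
# Typical fields for a two-copy configuration:
#   Type: Replicate
#   Number of Bricks: 1 x 2 = 2
#   Brick1: PM1:/usr/local/mariadb/columnstore/gluster/brick1
#   Brick2: PM2:/usr/local/mariadb/columnstore/gluster/brick1
```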

The number of copies can be increased up to the number of PMs. For a three-PM system, that would look like this:


A three-PM installation with data redundancy copies = 3

It is important to note that as the number of copies increases, so does the network traffic required to distribute redundant data between PMs. Keep the number of copies as low as your data redundancy requirements allow. Alternatively, if your hardware allows, a dedicated network can be configured during installation with postConfigure to offload gluster network traffic.

MariaDB ColumnStore assigns a DBRoot to a PM by using GlusterFS to mount the DBRoot volume onto its associated data directory, which is then used as normal:

PM1:

mount -t glusterfs PM1:/dbroot1 /usr/local/mariadb/columnstore/data1

PM2:

mount -t glusterfs PM2:/dbroot2 /usr/local/mariadb/columnstore/data2

PM3:

mount -t glusterfs PM3:/dbroot3 /usr/local/mariadb/columnstore/data3

At this point, when a change is made to any files in a data(n) directory, it is copied to the connected brick. Only the assigned bricks are mounted as the logical DBRoots. The unassigned bricks are standby copies waiting for a failover event.

Three-PM data redundancy copies = 2
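You can verify which DBRoot a PM currently serves by listing its gluster mounts; only the assigned DBRoot appears, while the standby bricks remain unmounted (the sample output assumes the three-PM layout above):

```shell
# GlusterFS client mounts show up as type fuse.glusterfs.
mount -t fuse.glusterfs
# On PM1, only the assigned DBRoot is mounted, e.g.:
#   PM1:/dbroot1 on /usr/local/mariadb/columnstore/data1 type fuse.glusterfs (rw,...)
```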

A failover occurs when a service interruption is detected from a PM. In a normal local disk installation, data stored on the dbroot for that module would be inaccessible. With data redundancy, a small interruption occurs while the DBRoot is reassigned to the secondary brick.

In our example system, PM #3 has lost power. PM #1 would be assigned DBRoot3 along with DBRoot1 since it has been maintaining the replica brick for DBroot3. PM #2 will see no change.

Three-PM data redundancy copies = 2 and failure of PM #3

When PM #3 returns, GlusterFS syncs the data changes for DBRoot3 and DBRoot2 across the bricks of their volumes. PM #3 returns to operational status, and DBRoot3 is unmounted from PM #1 and returned to PM #3.

Three-PM data redundancy copies = 2 and PM #3 recovered
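Replica resynchronization is handled by GlusterFS self-heal; as a sketch, its progress for a given DBRoot volume can be watched with:

```shell
# After PM #3 rejoins, self-heal copies the changes it missed.
sudo gluster volume heal dbroot3 info
# The replicas are back in sync once every brick reports:
#   Number of entries: 0
```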

This is only a simple example meant to illustrate how MariaDB ColumnStore with data redundancy leverages GlusterFS to provide a simple and effective way to keep your data accessible through service interruptions.

We are excited to offer data redundancy as part of MariaDB ColumnStore 1.1, which is available for download as part of MariaDB AX, an enterprise open-source solution for modern data analytics and data warehousing.



Published at DZone with permission of

