Over a million developers have joined DZone.

Streaming Data From MariaDB Server Into MariaDB ColumnStore via MariaDB MaxScale

DZone's Guide to

Streaming Data From MariaDB Server Into MariaDB ColumnStore via MariaDB MaxScale

Read this article in order to learn how to stream data from MariaDB server into MariaDB Columnstore via MariaDB MaxScale.

· Database Zone ·
Free Resource

MariaDB TX, proven in production and driven by the community, is a complete database solution for any and every enterprise — a modern database for modern applications.

In this blog post, we look at how to configure Change Data Capture (CDC) from the MariaDB Server to MariaDB ColumnStore via MariaDB MaxScale. Our goal in this blog post is to have our analytical ColumnStore instance reflect the changes that happen on our operational MariaDB Server.

MariaDB MaxScale Configuration

We start by creating a MaxScale configuration with a binlogrouter and avrorouter instances. The former acts as a replication slave and fetches binary logs and the latter processes the binary logs into CDC records.





Copy the contents of this file into the `maxscale.cnf` file.

The docker-compose.yml File

The next step is to clone the MaxScale repository and to create the docker-compose file.

To clone the MaxScale repository, execute the following command.

git clone https://github.com/mariadb-corporation/MaxScale.git --branch=2.2 --depth=1

After the command completes, create the "docker-compose.yml" file with the following contents in the same directory where you cloned MaxScale.

version: '2'
        image: mariadb:10.2
        container_name: master
        command: mysqld --log-bin=mariadb-bin --binlog-format=ROW --server-id=1
            - "3306:3306"

        build: ./MaxScale/docker/
        container_name: maxscale
            - ./maxscale.cnf:/etc/maxscale.cnf.d/maxscale.cnf
            - "3307:3306"
            - "4001:4001"

        image: mariadb/columnstore_singlenode:latest
        container_name: mcs
            - "3308:3306"

        image: centos:7
        container_name: adapter
        command: /bin/sleep 0xffffffff

This file contains a MariaDB Server that acts as the master server, a MaxScale instance in a CDC configuration and a single-node ColumnStore container. We also use a plain CentOS 7 container where we install the adapter.

To start the cluster, run the following commands.

docker-compose build
docker-compose up -d


The next step is to copy the ColumnStore configuration file from the "mcs" container and modify it to use the container hostname instead of the loopback address. To do this, execute the following commands.

docker cp mcs:/usr/local/mariadb/columnstore/etc/Columnstore.xml .
sed -i 's/' Columnstore.xml
docker cp Columnstore.xml adapter:/etc/Columnstore.xml

After we have copied the configuration file into the "adapter" container, we are ready to install the adapter.

Installing Adapter

To access the container, execute "docker-compose exec adapter bash." This will launch a new shell where the following commands will be executed.

yum -y install epel-release
yum -y install https://downloads.mariadb.com/Data-Adapters/mariadb-columnstore-api/1.1.3/centos/x86_64/7/mariadb-columnstore-api-1.1.3-1-x86_64-centos7.rpm
curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | bash
yum -y install https://downloads.mariadb.com/Data-Adapters/mariadb-streaming-data-adapters/cdc-data-adapter/1.1.3/centos-7/mariadb-columnstore-maxscale-cdc-adapters-1.1.3-1-x86_64-centos7.rpm

After the adapter is installed, exit the shell.

Next, we can start preparing the data on the master server and configure the replication between it and MaxScale.

Preparing Data and Configuring Replication

We connect to the MariaDB Server running on the "master" container with the following command.

mysql -uroot -h -P 3306

Once connected, executing the following SQL. This will prepare the server, create a table and insert some dummy data into the table. It also modified the data to emulate changes in the database.

CREATE USER 'maxuser'@'%' IDENTIFIED BY 'maxpwd';
GRANT ALL ON *.* TO 'maxuser'@'%';
USE test;
INSERT INTO t1 VALUES (1), (2), (3);
UPDATE t1 SET id = 4 WHERE id = 2;

Once we have created some data, we configure the replication between MaxScale and the master server. To do this, execute the following command.

mysql -umaxuser -pmaxpwd -h -P 3307 -e "CHANGE MASTER TO MASTER_HOST='master', MASTER_PORT=3306, MASTER_USER='maxuser', MASTER_PASSWORD='maxpwd', MASTER_LOG_FILE='mariadb-bin.000001', MASTER_LOG_POS=4; START SLAVE"

MaxScale will start to replicate events from the master server and process them into CDC records.

Create CDC User

To use the CDC system in MaxScale, we have to create a user for it. Execute the following command to create a user.

docker-compose exec maxscale maxctrl call command cdc add_user avro-router cdcuser cdcpassword

Starting the Adapter

We again execute the commands inside the adapter container. To access the container, execute "docker-compose exec adapter bash."

Once inside the container, we can try to start the adapter. Given that the table "test.t1" does not exist on ColumnStore, the adapter will give us an error when we try to start it:

[root@d444d5c5b820 /]$ mxs_adapter -c /etc/Columnstore.xml -u cdcuser -p cdcpassword -h maxscale -P 4001 test t1
Table not found, create with:

    CREATE TABLE test.t1 (domain int, event_number int, event_type varchar(50), id int, sequence int, server_id int, timestamp int) ENGINE=ColumnStore;

To create the table on ColumnStore, we have to exit the container. Once out of the container, we connect to the ColumnStore container and create the table described in the error message with the following command.

mysql -uroot -h -P 3308 -e "CREATE TABLE test.t1 (domain int, event_number int, event_type varchar(50), id int, sequence int, server_id int, timestamp int) ENGINE=ColumnStore;"

Once the table is created, we go back into the adapter container with "docker-compose exec adapter bash" and try to start it again.

[root@d444d5c5b820 /]$ mxs_adapter -c /etc/Columnstore.xml -u cdcuser -p cdcpassword -h maxscale -P 4001 test t1 4 rows and 1 transactions inserted in 0.210798 seconds. GTID = 0-1-6 2 rows and 1 transactions inserted in 0.164197 seconds. GTID = 0-1-7

This time we see that it processed a total of six rows of data. We can now connect to the
ColumnStore on another terminal and see what the table contains.

[markusjm@localhost blog]$ mysql -uroot -h -P 3308 -e "SELECT * FROM test.t1"
| domain | event_number | event_type    | id   | sequence | server_id | timestamp  |
|      0 |            1 | insert        |    1 |        5 |         1 | 1523948280 |
|      0 |            2 | insert        |    2 |        5 |         1 | 1523948280 |
|      0 |            3 | insert        |    3 |        5 |         1 | 1523948280 |
|      0 |            1 | update_before |    2 |        6 |         1 | 1523948280 |
|      0 |            2 | update_after  |    4 |        6 |         1 | 1523948280 |
|      0 |            1 | delete        |    3 |        7 |         1 | 1523948281 |

The changes we did on the master MariaDB Server have been propagated to ColumnStore. To understand what the values are, we can map the SQL statements to the rows in the table.

The first SQL statement is "INSERT INTO t1 VALUES (1), (2), (3);", which inserts three values into the table. We see that the first three rows in the resultset are of type "insert" and the values match what we inserted.

The next SQL statement is "UPDATE t1 SET id = 4 WHERE id = 2;", which only touches one row. Although it modifies only one row in the database, it generated two rows in ColumnStore. This happened because the MaxScale CDC system stores both the before and after images of the modified row. This allows easy comparisons between new and old values.

The final SQL statement was "DELETE FROM t1 WHERE id = 3;", which deleted one row. This statement was converted to a delete entry with the data that was deleted (row with `id` of 3). This allows deleted data to be retained for analytical and auditing purposes without actually storing it on the master database.

MariaDB AX is an open source database for modern analytics: distributed, columnar and easy to use.

database ,mariadb ,columnstore ,maxscale ,streaming data

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}