Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Data Streaming With MariaDB

DZone's Guide to

Data Streaming With MariaDB

Data streaming is one of the more popular techniques for big data professionals. Read on to learn how to use data streaming with this popular database.

· Big Data Zone ·
Free Resource

MariaDB TX, proven in production and driven by the community, is a complete database solution for any and every enterprise — a modern database for modern applications.

Big Data vs. Fast Data

While Big Data is being used across the globe by companies to solve their analytical problems, sometimes it becomes a hassle to extract data from a bunch of data sources, do the necessary transformation, and then eventually load it into an analytical platform such as Hadoop or something else.

This obviously takes time and effort, instead of a bunch of ETL jobs, MariaDB provides a data streaming solution directly from OLTP MariaDB TX to OLAP MariaDB AX.

Fast real-time data streaming is achieved with the help of a CDC (Change Data Capture) framework that streams data from MariaDB to MariaDB ColumnStore using MariaDB MaxScale 2.2.x as a bin-log router. MaxScale makes use of the Binary Logs from a MariaDB TX server and steams the data directly to MariaDB AX (MariaDB ColumnStore) for analytics.

This is achieved with the MaxScale CDC Connector, CDC Adapter, and the ColumnStore API package. This sounds a bit complex, but, in reality, it's quite simple.

Here is a quick look at how the data streaming setup works:

MariaDB-server → MaxScale → MaxScale-CDC-Connector → MaxScale-CDC-Adapter → ColumnStore-API → ColumnStore Database

CDC Architecture.png

The Setup

Requirements

Here is a quick look at the setup that we are going to be working on, I am using Oracle Virtual Machines with CentOS 7.

  • 1x CentOS 7 VM for MariaDB TX 3.0.
  • 1x CentOS 7 VM for MaxScale as a Replication Router.
  • 1x CentOS 7 VM for CDC Adapter + Connector and ColumnStore API.
  • 1x CentOS 7 VM for ColumnStore. We will be using a single node "combined" ColumnStore setup for simplicity. Refer to the MariaDB Columnstore Installation Guide page for detailed distributed setup instruction using CentOS 7 VMs.
  • Our Setup will look like this:
    • MariaDB (192.168.56.101)
      • MaxScale (192.168.56.102)
        • CDC Server (192.168.56.103)
          • ColumnStore (192.168.56.104)

Note: We will assume that you already have a MariaDB (source of our data streaming) and ColumnStore (data's final destination) readily available for use.

Preparing the VM OS

There are a few important things that are required before we start the installations.

Note: the following steps must be performed and validated on all the VMs.

Disable SELinux

For the purposes of testing, we want SELinux disabled. Make sure that your SELinux configuration, in the file /etc/selinux/config, looks something like this on all the nodes:

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

The change here is the SELinux setting of course.

After a reboot of the server, check if the SELinux has actually been disabled, use either of the two commands (sestatus getenforce) to confirm.

[root@localhost ~] sestatus
SELinux status:                 disabled

[root@localhost ~] getenforce
Disabled

Disable Firewalld

Firewalld is a standard service that is disabled using the systemctl command on the REHL 7/CETOS 7. After disabling double check using systemctl status firewalld on all the nodes.

[root@localhost ~] systemctl stop firewalld

[root@localhost ~] systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.

[root@localhost ~] systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

Setup Up the MariaDB Account

The mariadb user and group are required to be created on all the nodes. sudo privilege for the mariadb user is also mandatory.

Following this, all steps will be done using mariadb user with the help of sudo unless specified differently.

[root@localhost ~] groupadd mysql
[root@localhost ~] useradd -g mysql mysql

[root@localhost ~] echo "mysql ALL=(ALL) ALL" >> /etc/sudoers

[root@localhost ~] passwd mysql
Changing password for user mysql.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Enable Networking on the VirtualBox VMs

This is important so that the VMs could use Static IP addresses and also enable them to use Internet access of the host operating system.

Note: Make sure that NAT and Host-Only Adapters are both enabled for all the VMs.

Network setting within the VMs will need to be modified as follows for NAT:

[root@localhost ~] cat /etc/sysconfig/network-scripts/ifcfg-enp0s3
TYPE=Ethernet
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=enp0s3
DNS1=8.8.8.8
DNS2=8.8.4.4
UUID=89bfb7a3-17c3-49f9-aaa6-4d66c47ea6fb
DEVICE=enp0s3
ONBOOT=yes
PEERDNS=yes
PEERROUTES=yes

For a Host-Only Adapter, edit the IP address accordingly for each VM:

[root@localhost ~] cat /etc/sysconfig/network-scripts/ifcfg-enp0s8
TYPE=Ethernet
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=enp0s8
PREFIX=24
IPADDR=192.168.56.101
UUID=21a845ad-8cae-4a2c-b563-164a1f8a30cf
DEVICE=enp0s8
ONBOOT=yes

Note: Make sure the above two files (ifcfg-enp0s3m, ifcfg-enp0s8) are changed on each Node/VM accordingly and have their own distinct Static IP defined in the file (ifcfg-enp0s8).

Set Hostnames on each VM for a better setup, this step is not mandatory but a good practice.

MariaDB VM:

[root@localhost ~] hostnamectl set-hostname mariadb101

MaxScale VM:

[root@localhost ~] hostnamectl set-hostname maxscale102

CDC VM:

[root@localhost ~] hostnamectl set-hostname cdc103

Install MariaDB TX 3.0

Setup the Repository

Login to the MariaDB Server (192.168.56.101).

  • Setup MariaDB repository by using the Repository Setup page or use the curl interface to do this automatically for your OS.
    • or curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | bash 

Note: Not all the Linux distros are supported by the curl tool.

  • Once the repository has been set up, simply use "yum install."
    • yum -y install MariaDB-server 
      • Take care with the Unix's case sensitivity.

We will be using the curl script to set up the MariaDB repositories. On production environments, internet access is normally not available, but in that case you can download the RPMs externally and transfer the files to the servers using your favorite secure file transfer tools.

Installation

We will use mariadb user to install MariaDB on the MariaDB server.

[root@mariadb101 ~] su - mysql
[mysql@mariadb101 ~]$ sudo curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | bash
[info] Repository file successfully written to /etc/yum.repos.d/mariadb.repo.
[info] Adding trusted package signing keys...
[info] Succeessfully added trusted package signing keys.

[mysql@mariadb101 ~]$ sudo yum install MariaDB-server
mariadb-main                                                             | 2.9 kB  00:00:00
mariadb-maxscale                                                         | 2.4 kB  00:00:00
mariadb-tools                                                            | 2.9 kB  00:00:00
(1/3): mariadb-maxscale/7/x86_64/primary_db                              | 6.3 kB  00:00:01
(2/3): mariadb-tools/7/x86_64/primary_db                                 | 9.6 kB  00:00:01
(3/3): mariadb-main/7/x86_64/primary_db                                  |  48 kB  00:00:01
Resolving Dependencies

================================================================================================
 Package                        Arch          Version                      Repository      Size
================================================================================================
Installing:
 MariaDB-compat                 x86_64        10.3.7-1.el7.centos          mariadb-main   2.8 M
     replacing  mariadb-libs.x86_64 1:5.5.56-2.el7
 MariaDB-server                 x86_64        10.3.7-1.el7.centos          mariadb-main   123 M
Installing for dependencies:
 MariaDB-client                 x86_64        10.3.7-1.el7.centos          mariadb-main    53 M
 MariaDB-common                 x86_64        10.3.7-1.el7.centos          mariadb-main   157 k
 galera                         x86_64        25.3.23-1.rhel7.el7.centos   mariadb-main   8.0 M

Transaction Summary
================================================================================================
Install  2 Packages (+40 Dependent packages)

Total download size: 200 M
Is this ok [y/d/N]: Y
...
...
Complete!

Enter "Y" when prompted, it will download and install a MariaDB server and Client along with all its dependencies.

Install MariaDB MaxScale 2.2.x

Setup the Repository

Login to the MaxScale server (102.168.56.102) and Use "curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | bash"

Just like with MariaDB, we will use mariadb user to set up the MariaDB repositories and install MaxScale using yum

[root@maxscale102 ~] curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | bash
[info] Repository file successfully written to /etc/yum.repos.d/mariadb.repo.
[info] Adding trusted package signing keys...
[info] Succeessfully added trusted package signing keys.

[root@maxscale102 ~] su - mysql
Last login: Fri Jun 22 14:39:53 EDT 2018 on pts/0

[mysql@maxscale102 ~]$ sudo yum install maxscale
mariadb-main                                                             | 2.9 kB  00:00:00
mariadb-maxscale                                                         | 2.4 kB  00:00:00
mariadb-tools                                                            | 2.9 kB  00:00:00
(1/3): mariadb-maxscale/7/x86_64/primary_db                              | 6.3 kB  00:00:01
(2/3): mariadb-tools/7/x86_64/primary_db                                 | 9.6 kB  00:00:01
(3/3): mariadb-main/7/x86_64/primary_db                                  |  48 kB  00:00:02
Resolving Dependencies
Dependencies Resolved

================================================================================================
 Package               Arch                Version                Repository               Size
================================================================================================
Installing:
 maxscale              x86_64              2.2.9-1                mariadb-maxscale         18 M
Installing for dependencies:
 gnutls                x86_64              3.3.26-9.el7           base                    677 k
 nettle                x86_64              2.7.1-8.el7            base                    327 k
 trousers              x86_64              0.3.14-2.el7           base                    289 k

Transaction Summary
================================================================================================
Install  1 Package (+3 Dependent packages)

Total download size: 19 M
Installed size: 70 M
Is this ok [y/d/N]: Y
...
...
Complete!

[mysql@maxscale102 ~]$ sudo systemctl enable maxscale
Created symlink from /etc/systemd/system/multi-user.target.wants/maxscale.service to /usr/lib/systemd/system/maxscale.service.
[mysql@maxscale102 ~]$ systemctl status maxscale
● maxscale.service - MariaDB MaxScale Database Proxy
   Loaded: loaded (/usr/lib/systemd/system/maxscale.service; enabled; vendor preset: disabled)
   Active: inactive (dead)

Install CDC Server

Setup the Repository

To login to the CDC Server (102.168.56.103) we will need to set up two extra repositories that are currently not automatically added by the curl script. CDC Adapter, however, will automatically be taken care of by it.

Using the above URLs, install the API and Adapter directly from the path. The connector, however, should be automatically downloaded based on the repository setup.

  • Sequence of Installation:
    • install maxscale-cdc-connector 
    • install epel-release 
      • install libuv 
    • install columnstore-api 
    • install cdc-adapters 

Install MaxScale CDC Connector

[root@cdc103 ~] yum install maxscale-cdc-connector
Resolving Dependencies
Dependencies Resolved

================================================================================================
 Package                          Arch            Version        Repository                Size
================================================================================================
Installing:
 maxscale-cdc-connector           x86_64          2.2.9-1        mariadb-maxscale         214 k

Transaction Summary
================================================================================================
Install  1 Package

Total download size: 214 k
Installed size: 1.4 M
Is this ok [y/d/N]: y
...
...
Complete!

Install Extra Packages for Enterprise Linux epel

[root@cdc103 ~] yum install epel-release
Resolving Dependencies
Dependencies Resolved

============================================================================================== 
Package                   Arch                Version              Repository           Size 
==============================================================================================
Installing:
 epel-release              noarch              7-11                 extras               15 k

Transaction Summary
==============================================================================================
Install  1 Package

Total download size: 15 k
Installed size: 24 k
Is this ok [y/d/N]: y
...
...
Complete!

[root@cdc103 ~] yum install libuv
Resolving Dependencies
Dependencies Resolved

============================================================================================== 
Package               Arch                Version                    Repository         Size 
==============================================================================================
Installing:
 libuv                 x86_64              1:1.19.2-1.el7             epel              121 k

Transaction Summary
==============================================================================================
Install  1 Package

Total download size: 121 k
Installed size: 308 k
Is this ok [y/d/N]: y
Downloading packages:
...
...
Complete!

Install ColumnStore API Package

You can Install the API directly from the source or download the RPM using "wget" and then install it locally.

[root@cdc103 ~] yum install https://downloads.mariadb.com/MariaDB/mariadb-columnstore-api/latest/yum/centos/7/x86_64/mariadb-columnstore-api-1.1.5-1-x86_64-centos7.rpm
Resolving Dependencies
Dependencies Resolved

=============================================================================================
 Package                 Arch   Version   Repository                                    Size
=============================================================================================
Installing:
 mariadb-columnstore-api x86_64 1.1.5-1   /mariadb-columnstore-api-1.1.5-1-x86_64      5.4 M

Transaction Summary
=============================================================================================
Install  1 Package
...
...
Complete!

Install CDC Adapter

You can Install the CDC Adapter directly from the source or download the RPM using "wget" and then install it locally.

[root@cdc103 ~] yum install https://downloads.mariadb.com/MariaDB/data-adapters/mariadb-streaming-data-adapters/latest/yum/centos/7/x86_64/mariadb-columnstore-maxscale-cdc-adapters-1.1.5-1-x86_64-centos7.rpm
Resolving Dependencies
Dependencies Resolved

================================================================================================
 Package                                 Arch   Version Repository                         Size
================================================================================================
Installing:
 mariadb-columnstore-data-adapters       x86_64 1.1.5-1 /mariadb-columnstore-maxscale
                                                                                           77 k
Installing for dependencies:
 maxscale-cdc-connector                  x86_64 2.2.9-1 mariadb-maxscale                  214 k

Transaction Summary
================================================================================================
Install  1 Package (+1 Dependent package)

Total size: 291 k
Total download size: 214 k
Installed size: 1.5 M
Is this ok [y/d/N]: y
...
...
Complete!

Setup Communication Between CDC and ColumnStore

Now that the CDC Adapter, API, and Connectors have been installed, we can start to set up communication between the CDC and ColumnStore servers.

In this exercise, we are using a single instance ColumnStore, in case of a distributed install, we will be working with ColumnStore UM1 Node.

ColumnStore Configuration

Connect to CDC server using ssh client and pull the /home/mysql/mariadb/columnstore/etc/Columnstore.xml from ColumnStore server using scp, rsync or any other method available for file transfer between the servers. Columnstore.xml should be downloaded to /etc and owned by root user.

The file can be downloaded from ColumnStore or any node will do as the Columnstore.xml is automatically synchronized between all the nodes. Use mariadb as the remote user to connect to ColumnStore.

As this is the first time connecting to ColumnStore from CDC server, Linux will ask for yes/noand for mariadb password.

Once the file is downloaded, ensure that it has 644 permission so that everyone can read it.

[root@cdc103 ~] scp mysql@192.168.56.104:/home/mysql/mariadb/columnstore/etc/Columnstore.xml /etc
The authenticity of host '192.168.56.104 (192.168.56.104)' can't be established.
ECDSA key fingerprint is SHA256:Jle/edRpKz9ysV8xp1K9TlIGvbg8Sb1p+GbDob3Id0g.
ECDSA key fingerprint is MD5:a1:ce:9d:58:80:c6:ed:5a:95:7b:33:82:68:cb:0f:40.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.56.104' (ECDSA) to the list of known hosts.
mysql@192.168.56.104's password:
Columnstore.xml                                                         100%   21KB  13.3MB/s   00:00
[root@cdc103 ~]
[root@cdc103 ~] ls -rlt /etc/Columnstore.xml
-rw-r--r-- 1 root root 21089 Jul  1 05:09 /etc/Columnstore.xml
[root@cdc103 ~]

Search and replace 127.0.0.1 with ColumnStore's IP address in the newly downloaded Columnstore.xml file using root user as the file is under /etc/ folder.

[root@cdc103 etc] sed -i 's/127.0.0.1/192.168.56.104/g' /etc/Columnstore.xml
[root@cdc103 etc]

Setup Master MariaDB Replication Master

Now that the CDC node is ready, it's time to setup MariaDB Server as the replication Master for MaxScale.

MaxScale will be working as a Replication Slave for this data streaming setup.

Login to the MariaDB Master server using the root user and add the following contents to the /etc/my.cnf.d/server.cnf file under the [mysqld] section and restart the MariaDB process.

[mysqld]
server_id=1
log-bin          = mariadb-bin
binlog-format    = ROW
gtid_strict_mode = 1
log_error
log-slave-updates
[root@mariadb101 ~] systemctl restart mariadb

The server_id will be used when configuring MaxScale as a replication slave node.

Setup Replication User

After MariaDB restarts, login to MariaDB using the root user and setup a replication user,maxuser,and grant REPLICATION SLAVE privilege to it.

The second step is to create a new database and a test table with some data that we are going to use for testing data streaming to ColumnStore.

[mysql@mariadb101 ~] mysql -uroot -p
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 10
Server version: 10.3.7-MariaDB-log MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> RESET MASTER;
Query OK, 0 rows affected (0.004 sec)

MariaDB [(none)]> CREATE USER 'maxuser'@'%' IDENTIFIED BY 'maxpwd';
Query OK, 0 rows affected (0.000 sec)

MariaDB [(none)]> GRANT REPLICATION SLAVE ON *.* TO 'maxuser'@'%';
Query OK, 0 rows affected (0.000 sec)

MariaDB [(none)]> CREATE DATABASE cdc_test;
Query OK, 1 row affected (0.000 sec)

MariaDB [(none)]> USE cdc_test;
Database changed

MariaDB [cdc_test]> CREATE TABLE cdc_tab(id serial, col varchar(100));
Query OK, 0 rows affected (0.010 sec)

MariaDB [cdc_test]> INSERT INTO cdc_tab(col) values ('Row 1'), ('Row 2'), ('Row 3');
Query OK, 3 rows affected (0.004 sec)
Records: 3  Duplicates: 0  Warnings: 0

MariaDB [cdc_test]> UPDATE cdc_tab SET col = 'Updated Row 2' WHERE id = 2;
Query OK, 1 row affected (0.004 sec)
Rows matched: 1  Changed: 1  Warnings: 0

MariaDB [cdc_test]> INSERT INTO cdc_tab(col) values ('Row 4'), ('Row 5'), ('Row 6');
Query OK, 3 rows affected (0.004 sec)
Records: 3  Duplicates: 0  Warnings: 0

MariaDB [cdc_test]> DELETE FROM cdc_tab where id=5;
Query OK, 1 row affected (0.005 sec)

MariaDB [cdc_test]>

Events that are captured in the binary logs of the master MariaDB Server:

  • INSERT 3 Records to cdc_tab.
  • UPDATE col for a row in cdc_tab.
  • INSERT 3 Records to cdc_tab.
  • DELETE a row from cdc_tab.

Setup MaxScale

Log on to the MaxScale server, edit the /etc/maxscale.cnf file, remove everything under the [maxscale] threads=auto section and add the following configuration to it. This will set MaxScale for data streaming to ColumnStore using Avro Listener.

Take note of the server_id in the [replication-router] section, this points to the master MariaDB server's ID that we defined earlier while setting up MariaDB as a master.

#Replication configuration that points to a particular server_id and binlog
[replication-router]
type=service
router=binlogrouter
user=maxuser
passwd=maxpwd
server_id=2
master_id=1
binlogdir=/var/lib/maxscale
mariadb10-compatibility=1
filestem=mariadb-bin

#Replication listener that will listen to the Master DB using the port #6603
[replication-listener]
type=listener
service=replication-router
protocol=MySQLClient
port=6603

#Avro service that will generate JSON files form the Bin-Logs that were received from the MasterDB and store them in the <avrodir> using the replication-router
[avro-router]
type=service
router=avrorouter
source=replication-router
avrodir=/var/lib/maxscale

#Avro listener that is used by the Avro router on a specific port to be used by CDC
[avro-listener]
type=listener
service=avro-router
protocol=cdc
port=4001

[replication-router]

  • This section defines a bin-log router from Master MariaDB to MaxScale Slave using the replication user maxuser, which was created earlier.

[replication-listener]

  • This is the mysql client listener service. Any server with a MariaDB client can connect to MaxScale on the port 6603 specified in this section. We will connect to MaxScale later to set it up as a REPLICATION SLAVE using this port.

[avro-router]

  • This is the router service that routes the bin-log data into AVRO (JSON) files. CDC will use these AVRO files to generate bulk loading scripts for the ColumnStore database.

[avro-listener]

  • Avro Listener uses the avro-router service. This listener is used by the CDC Adapter to get the AVRO files and stream them to ColumnStore.

Once /etc/maxscale.cnf has been modified on the MaxScale server, restart the MaxScale service.

[root@maxscale102 ~] systemctl restart maxscale

Add CDC User/Password for Avro-Router Service in MaxScale

[mysql@maxscale102 ~]$ maxctrl call command cdc add_user avro-router cdcuser cdcpassword

This user will be the user that MaxScale uses to generate Avro schema files to be transferred and loaded into ColumnStore by the MaxScale CDC Adapter.

Setup MaxScale as a REPLICATION SLAVE

Install the MariaDB client on the MaxScale Server. We will use this client to connect to a MaxScale MariaDB listener and set up MaxScale as a replication SLAVE for MariaDB TX.

[mysql@maxscale102 ~]$ sudo yum -y install MariaDB-client
Resolving Dependencies
Dependencies Resolved

==============================================================================================
Package                         Arch            Version                Repository       Size 
==============================================================================================
Installing:
 MariaDB-client                  x86_64          10.3.7-1.el7.centos    mariadb-main     53 M  
 MariaDB-compat                  x86_64          10.3.7-1.el7.centos    mariadb-main    2.8 M
     replacing  mariadb-libs.x86_64 1:5.5.56-2.el7
Installing for dependencies:
 MariaDB-common                  x86_64          10.3.7-1.el7.centos    mariadb-main    157 k 
Running transaction
  Installing : MariaDB-common-10.3.7-1.el7.centos.x86_64                                 1/31
  Installing : MariaDB-compat-10.3.7-1.el7.centos.x86_64                                 2/31
==============================================================================================

Installed:
  MariaDB-client.x86_64 0:10.3.7-1.el7.centos         MariaDB-compat.x86_64 0:10.3.7-1.el7.centos

Dependency Installed:
  MariaDB-common.x86_64 0:10.3.7-1.el7.centos         perl.x86_64 4:5.16.3-292.el7
  perl-Carp.noarch 0:1.26-244.el7                     perl-Encode.x86_64 0:2.51-7.el7
  perl-Exporter.noarch 0:5.68-3.el7                   perl-File-Path.noarch 0:2.09-2.el7
  perl-File-Temp.noarch 0:0.23.01-3.el7               perl-Filter.x86_64 0:1.49-3.el7
  perl-Getopt-Long.noarch 0:2.40-3.el7                perl-HTTP-Tiny.noarch 0:0.033-3.el7
  perl-PathTools.x86_64 0:3.40-5.el7                  perl-Pod-Escapes.noarch 1:1.04-292.el7
  perl-Pod-Perldoc.noarch 0:3.20-4.el7                perl-Pod-Simple.noarch 1:3.28-4.el7
  perl-Pod-Usage.noarch 0:1.63-3.el7                  perl-Scalar-List-Utils.x86_64 0:1.27-248.el7
  perl-Socket.x86_64 0:2.010-4.el7                    perl-Storable.x86_64 0:2.45-3.el7
  perl-Text-ParseWords.noarch 0:3.29-4.el7            perl-Time-HiRes.x86_64 4:1.9725-3.el7
  perl-Time-Local.noarch 0:1.2300-2.el7               perl-constant.noarch 0:1.27-2.el7
  perl-libs.x86_64 4:5.16.3-292.el7                   perl-macros.x86_64 4:5.16.3-292.el7
  perl-parent.noarch 1:0.225-244.el7                  perl-podlators.noarch 0:2.5.1-3.el7
  perl-threads.x86_64 0:1.87-4.el7                    perl-threads-shared.x86_64 0:1.43-6.el7

Replaced:
  mariadb-libs.x86_64 1:5.5.56-2.el7

Complete!

Now we can connect to the MaxScale MariaDB service using the installed client and setup MaxScale as a REPLICATION SLAVE; use the proper Master MariaDB IP in the CHANGE MASTER TO MASTER_HOSTfollowed by START SLAVE and SHOW SLAVE STATUS to ensure the slave is running without any issues.

This setup is just like setting up a MariaDB Master/Slave replication, the only difference, in this case, is that we are setting up MaxScale as a bin-log router slave. Take note that we have used -h 192.168.56.102 argument to pass in MaxScale's IP address and -P 6603 argument to pass in the [replication-listener] port from the /etc/maxscale.cnf file. That port is for themysqlclientservice, which is why we are able to connect to MaxScale's MariaDB interface.

[mysql@maxscale102 ~]$ mysql -h 192.168.56.102 -u maxuser -p maxpwd -P 6603
Enter password:
ERROR 1049 (42000): Unknown database 'maxpwd'
[mysql@maxscale102 ~]$ mysql -h192.168.56.102 -umaxuser -pmaxpwd -P6603
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 10.2.12 2.2.9-maxscale

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]> CHANGE MASTER TO MASTER_HOST='192.168.56.101', MASTER_PORT=3306, MASTER_USER='maxuser', MASTER_PASSWORD='maxpwd', MASTER_LOG_FILE='mariadb-bin.000001', MASTER_LOG_POS=4;
Query OK, 0 rows affected (0.001 sec)

MySQL [(none)]> START SLAVE;
Query OK, 0 rows affected (0.201 sec)

MySQL [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Binlog Dump
                  Master_Host: 192.168.56.101
                  Master_User: maxuser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mariadb-bin.000001
          Read_Master_Log_Pos: 2047
               Relay_Log_File: mariadb-bin.000001
                Relay_Log_Pos: 2047
        Relay_Master_Log_File: mariadb-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 2047
              Relay_Log_Space: 2047
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 5a2189b0-7d1a-11e8-b04f-080027ad63ed
             Master_Info_File: /var/lib/maxscale/master.ini
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave running
           Master_Retry_Count: 1000
                  Master_Bind:
      Last_IO_Error_TimeStamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: No
                  Gtid_IO_Pos:
1 row in set (0.000 sec)

MySQL [(none)]>

Start Data Streaming

With all the settings in place, we can now start data streaming from MariaDB TX to ColumnStore in MariaDB AX using the CDC and Avro data.

Log in to the CDC Server and execute mxs_adapter to start streaming.

[mysql@cdc103 ~]$ mxs_adapter -c /etc/Columnstore.xml -u cdcuser -p cdcpassword -h 192.168.56.102 -P 4001 cdc_test cdc_tab

Let's review this command and its arguments:

  • -c indicates the path of the Columnstore.xml configuration file.
  • -u / -p indicates the CDC User and Password that were created earlier.
  • -h is the IP address of the MaxScale server.
  • -P is the port on which the MaxScale [avro-listener] service is listening.
  • cdc_test is the database name on the source (MariaDB) and target (ColumnStore).
  • cdc_tab is the table name on the source (MariaDB) and target (ColumnStore).

The target database/table should be already created before starting mxs_adapter, if not, mxs_adapter will provide a script to create the table in ColumnStore automatically. Let's start the service and see what happens.

[mysql@cdc103 ~]$ mxs_adapter -c /etc/Columnstore.xml -u cdcuser -p cdcpassword -h 192.168.56.102 -P 4001 cdc_test cdc_tab
Table not found, create with:

    CREATE TABLE cdc_test.cdc_tab (col varchar(100), domain int, event_number int, event_type varchar(50), id serial, sequence int, server_id int, timestamp int) ENGINE=ColumnStore;

[mysql@cdc103 ~]$

As expected, we get the error message "Table not found." So, next copy the CREATE TABLE script and execute it on the ColumnStore node using the mcsmysql interface.

[root@um1 ~] mcsmysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 43
Server version: 10.2.15-MariaDB-log Columnstore 1.1.5-1

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> CREATE DATABASE cdc_test;
Query OK, 1 row affected (0.00 sec)

MariaDB [(none)]> CREATE TABLE cdc_test.cdc_tab (col varchar(100), domain int, event_number int, event_type varchar(50), id serial, sequence int, server_id int, timestamp int) ENGINE=ColumnStore;
ERROR 1069 (42000): Too many keys specified; max 0 keys allowed
MariaDB [(none)]>

There is another problem! Our source table had a serial column which MariaDB treats as a Primary Key. Since ColumnStore does not support Primary Keys/Indexes, we will need to change the CREATE TABLE script before executing it in the ColumnStore DB.

Simply change serial to bigint unsigned .

Let's log back into ColumnStore mcsmysql and recreate the table.

MariaDB [(none)]> DROP TABLE cdc_test.cdc_tab;
Query OK, 0 rows affected (0.02 sec)

MariaDB [(none)]> CREATE TABLE cdc_test.cdc_tab (col varchar(100), domain int, event_number int, event_type varchar(50), id bigint unsigned, sequence int, server_id int, timestamp int) ENGINE=ColumnStore;
Query OK, 0 rows affected (0.55 sec)

MariaDB [(none)]>

Another thing to note here is that the table structure in ColumnStore is quite different from the source. There are a few additional columns. This table is an event table that captures all the INSERT, UPDATE, and DELETE events. We can use this data to identify the latest row for each specific ID and run some analytics.

One more thing to notice here is that at the moment, one mxs_adapter can only handle one table at a time. This is to prevent streaming the entire MariaDB databases to ColumnStore. Instead, we could create aggregate tables on MariaDB TX and stream that data to ColumnStore for a better analytics use case.

Let's start the mxs_adapter once more and see if it can stream the data from our source MariaDB TX to target MariaDB AX.

[mysql@cdc103 ~]$ mxs_adapter -c /etc/Columnstore.xml -u cdcuser -p cdcpassword -h 192.168.56.102 -P 4001 cdc_test cdc_tab
4 rows and 1 transactions inserted in 3.3203 seconds. GTID = 0-1-6
2 rows and 1 transactions inserted in 0.792351 seconds. GTID = 0-1-7
3 rows and 1 transactions inserted in 0.604614 seconds. GTID = 0-1-8

Success! The three events are captured, INSERT, UPDATE and DELETE and their GTIDs. Let's log in to ColumnStore and review the data in the cdc_tab table.

MariaDB [cdc_test]> SHOW TABLES;
+--------------------+
| Tables_in_cdc_test |
+--------------------+
| cdc_tab            |
+--------------------+
1 row in set (0.00 sec)

MariaDB [cdc_test]> SELECT * FROM cdc_tab;
+---------------+--------+--------------+---------------+------+----------+-----------+------------+
| col           | domain | event_number | event_type    | id   | sequence | server_id | timestamp  |
+---------------+--------+--------------+---------------+------+----------+-----------+------------+
| Row 1         |      0 |            1 | insert        |    1 |        5 |         1 | 1530439436 |
| Row 2         |      0 |            2 | insert        |    2 |        5 |         1 | 1530439436 |
| Row 3         |      0 |            3 | insert        |    3 |        5 |         1 | 1530439436 |
| Row 2         |      0 |            1 | update_before |    2 |        6 |         1 | 1530439675 |
| Updated Row 2 |      0 |            2 | update_after  |    2 |        6 |         1 | 1530439675 |
| Row 4         |      0 |            1 | insert        |    4 |        7 |         1 | 1530439697 |
| Row 5         |      0 |            2 | insert        |    5 |        7 |         1 | 1530439697 |
| Row 6         |      0 |            3 | insert        |    6 |        7 |         1 | 1530439697 |
| Row 5         |      0 |            1 | delete        |    5 |        8 |         1 | 1530439714 |
+---------------+--------+--------------+---------------+------+----------+-----------+------------+
9 rows in set (0.17 sec)

MariaDB [cdc_test]>

We can see the INSERT, UPDATE (Before and After) and a DELETE. The SEQUENCE column indicates the sequence in which these events were triggered and the timestamp column indicates the time when these events took place.

Avro Files

While setting up MaxScale, we specified avrodir as /var/lib/mysql. This can be any different location.

[avro-router]
type=service
router=avrorouter
source=replication-router
avrodir=/var/lib/maxscale

Let's see the contents of the avrodir folder. We can see two files generated .avsc contains the data structure while .avro contains the actual data from the bin logs.

[root@maxscale102 maxscale] pwd
/var/lib/maxscale

[root@maxscale102 maxscale] ls -rlt
total 40
drwxr-xr-x 2 maxscale maxscale    6 Jun 22 14:44 maxscale.cnf.d
-rw-r--r-- 1 maxscale maxscale   54 Jun 22 14:44 maxadmin-users
-rw-r--r-- 1 maxscale maxscale   84 Jun 22 14:44 passwd
drwxr--r-- 2 maxscale maxscale   25 Jul  1 04:25 MariaDB-Monitor
drwxr-xr-x 2 maxscale maxscale    6 Jul  1 06:34 data1288
drwxr-xr-x 2 maxscale maxscale   22 Jul  1 06:46 avro-router
-rw------- 1 maxscale maxscale  183 Jul  1 06:50 master.ini
drwx------ 2 maxscale maxscale  199 Jul  1 06:50 cache
-rw-r--r-- 1 maxscale maxscale 4096 Jul  1 06:50 gtid_maps.db
-rw-r--r-- 1 maxscale maxscale 2047 Jul  1 06:50 mariadb-bin.000001
-rw-r--r-- 1 maxscale maxscale  572 Jul  1 06:50 cdc_test.cdc_tab.000001.avsc
-rw-r--r-- 1 maxscale maxscale  797 Jul  1 06:50 cdc_test.cdc_tab.000001.avro
-rw-r--r-- 1 maxscale maxscale 5120 Jul  1 06:50 avro.index
-rw-r--r-- 1 maxscale maxscale   69 Jul  1 07:29 avro-conversion.ini
[root@maxscale102 maxscale]

In case of some issue with the data streaming, one can try to restart the MaxScale service. If there's still a failure, as a last resort one can delete *.avsc, *.avro, avro.index, and avro-conversion.ini and restart the MaxScale service. It should be able to recover.

Since the mxs_adapter is running,  when new data is inserted in the cdc_test.cdc_tab table, it will automatically be streamed into ColumnStore.

Conclusion

Using this setup, we can stream data directly from OLTP MariaDB TX not only to MariaDB AX but also to other sources that can take Avro/JSON data. There is a Kafka Adapter already available in the Data Adapters download page.

References

Thanks!

MariaDB AX is an open source database for modern analytics: distributed, columnar and easy to use.

Topics:
big data ,data streaming ,mariadb ,data science

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}