Data Streaming With MariaDB
Data streaming is one of the more popular techniques for big data professionals. Read on to learn how to use data streaming with this popular database.
Join the DZone community and get the full member experience.
Join For FreeBig Data vs. Fast Data
While Big Data is being used across the globe by companies to solve their analytical problems, sometimes it becomes a hassle to extract data from a bunch of data sources, do the necessary transformation, and then eventually load it into an analytical platform such as Hadoop or something else.
This obviously takes time and effort, instead of a bunch of ETL jobs, MariaDB provides a data streaming solution directly from OLTP MariaDB TX to OLAP MariaDB AX.
Fast real-time data streaming is achieved with the help of a CDC (Change Data Capture) framework that streams data from MariaDB to MariaDB ColumnStore using MariaDB MaxScale 2.2.x as a bin-log router. MaxScale makes use of the Binary Logs from a MariaDB TX server and steams the data directly to MariaDB AX (MariaDB ColumnStore) for analytics.
This is achieved with the MaxScale CDC Connector, CDC Adapter, and the ColumnStore API package. This sounds a bit complex, but, in reality, it's quite simple.
Here is a quick look at how the data streaming setup works:
MariaDB-server → MaxScale → MaxScale-CDC-Connector → MaxScale-CDC-Adapter → ColumnStore-API → ColumnStore Database
The Setup
Requirements
Here is a quick look at the setup that we are going to be working on, I am using Oracle Virtual Machines with CentOS 7.
- 1x CentOS 7 VM for MariaDB TX 3.0.
- 1x CentOS 7 VM for MaxScale as a Replication Router.
- 1x CentOS 7 VM for CDC Adapter + Connector and ColumnStore API.
- 1x CentOS 7 VM for ColumnStore. We will be using a single node "combined" ColumnStore setup for simplicity. Refer to the MariaDB Columnstore Installation Guide page for detailed distributed setup instruction using CentOS 7 VMs.
- Our Setup will look like this:
- MariaDB (192.168.56.101)
- MaxScale (192.168.56.102)
- CDC Server (192.168.56.103)
- ColumnStore (192.168.56.104)
- CDC Server (192.168.56.103)
- MaxScale (192.168.56.102)
- MariaDB (192.168.56.101)
Note: We will assume that you already have a MariaDB (source of our data streaming) and ColumnStore (data's final destination) readily available for use.
Preparing the VM OS
There are a few important things that are required before we start the installations.
Note: the following steps must be performed and validated on all the VMs.
Disable SELinux
For the purposes of testing, we want SELinux disabled. Make sure that your SELinux configuration, in the file /etc/selinux/config
, looks something like this on all the nodes:
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
The change here is the SELinux setting of course.
After a reboot of the server, check if the SELinux has actually been disabled, use either of the two commands (sestatus
/ getenforce
) to confirm.
[root@localhost ~] sestatus
SELinux status: disabled
[root@localhost ~] getenforce
Disabled
Disable Firewalld
Firewalld is a standard service that is disabled using the systemctl command on the REHL 7/CETOS 7. After disabling double check using systemctl status firewalld
on all the nodes.
[root@localhost ~] systemctl stop firewalld
[root@localhost ~] systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@localhost ~] systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: man:firewalld(1)
Setup Up the MariaDB Account
The mariadb
user and group are required to be created on all the nodes. sudo
privilege for the mariadb
user is also mandatory.
Following this, all steps will be done using mariadb
user with the help of sudo
unless specified differently.
[root@localhost ~] groupadd mysql
[root@localhost ~] useradd -g mysql mysql
[root@localhost ~] echo "mysql ALL=(ALL) ALL" >> /etc/sudoers
[root@localhost ~] passwd mysql
Changing password for user mysql.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Enable Networking on the VirtualBox VMs
This is important so that the VMs could use Static IP addresses and also enable them to use Internet access of the host operating system.
Note: Make sure that NAT and Host-Only Adapters are both enabled for all the VMs.
Network setting within the VMs will need to be modified as follows for NAT:
[root@localhost ~] cat /etc/sysconfig/network-scripts/ifcfg-enp0s3
TYPE=Ethernet
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=enp0s3
DNS1=8.8.8.8
DNS2=8.8.4.4
UUID=89bfb7a3-17c3-49f9-aaa6-4d66c47ea6fb
DEVICE=enp0s3
ONBOOT=yes
PEERDNS=yes
PEERROUTES=yes
For a Host-Only Adapter, edit the IP address accordingly for each VM:
[root@localhost ~] cat /etc/sysconfig/network-scripts/ifcfg-enp0s8
TYPE=Ethernet
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=enp0s8
PREFIX=24
IPADDR=192.168.56.101
UUID=21a845ad-8cae-4a2c-b563-164a1f8a30cf
DEVICE=enp0s8
ONBOOT=yes
Note: Make sure the above two files (ifcfg-enp0s3m
, ifcfg-enp0s8
) are changed on each Node/VM accordingly and have their own distinct Static IP defined in the file (ifcfg-enp0s8
).
Set Hostnames on each VM for a better setup, this step is not mandatory but a good practice.
MariaDB VM:
[root@localhost ~] hostnamectl set-hostname mariadb101
MaxScale VM:
[root@localhost ~] hostnamectl set-hostname maxscale102
CDC VM:
[root@localhost ~] hostnamectl set-hostname cdc103
Install MariaDB TX 3.0
Setup the Repository
Login to the MariaDB Server (192.168.56.101).
- Setup MariaDB repository by using the Repository Setup page or use the curl interface to do this automatically for your OS.
- or
curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | bash
- or
Note: Not all the Linux distros are supported by the curl tool.
- Once the repository has been set up, simply use "
yum install
."yum -y install MariaDB-server
- Take care with the Unix's case sensitivity.
We will be using the curl
script to set up the MariaDB repositories. On production environments, internet access is normally not available, but in that case you can download the RPMs externally and transfer the files to the servers using your favorite secure file transfer tools.
Installation
We will use mariadb
user to install MariaDB on the MariaDB server.
[root@mariadb101 ~] su - mysql
[mysql@mariadb101 ~]$ sudo curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | bash
[info] Repository file successfully written to /etc/yum.repos.d/mariadb.repo.
[info] Adding trusted package signing keys...
[info] Succeessfully added trusted package signing keys.
[mysql@mariadb101 ~]$ sudo yum install MariaDB-server
mariadb-main | 2.9 kB 00:00:00
mariadb-maxscale | 2.4 kB 00:00:00
mariadb-tools | 2.9 kB 00:00:00
(1/3): mariadb-maxscale/7/x86_64/primary_db | 6.3 kB 00:00:01
(2/3): mariadb-tools/7/x86_64/primary_db | 9.6 kB 00:00:01
(3/3): mariadb-main/7/x86_64/primary_db | 48 kB 00:00:01
Resolving Dependencies
================================================================================================
Package Arch Version Repository Size
================================================================================================
Installing:
MariaDB-compat x86_64 10.3.7-1.el7.centos mariadb-main 2.8 M
replacing mariadb-libs.x86_64 1:5.5.56-2.el7
MariaDB-server x86_64 10.3.7-1.el7.centos mariadb-main 123 M
Installing for dependencies:
MariaDB-client x86_64 10.3.7-1.el7.centos mariadb-main 53 M
MariaDB-common x86_64 10.3.7-1.el7.centos mariadb-main 157 k
galera x86_64 25.3.23-1.rhel7.el7.centos mariadb-main 8.0 M
Transaction Summary
================================================================================================
Install 2 Packages (+40 Dependent packages)
Total download size: 200 M
Is this ok [y/d/N]: Y
...
...
Complete!
Enter "Y" when prompted, it will download and install a MariaDB server and Client along with all its dependencies.
Install MariaDB MaxScale 2.2.x
Setup the Repository
Login to the MaxScale server (102.168.56.102) and Use "curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | bash"
Just like with MariaDB, we will use mariadb
user to set up the MariaDB repositories and install MaxScale using yum
[root@maxscale102 ~] curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | bash
[info] Repository file successfully written to /etc/yum.repos.d/mariadb.repo.
[info] Adding trusted package signing keys...
[info] Succeessfully added trusted package signing keys.
[root@maxscale102 ~] su - mysql
Last login: Fri Jun 22 14:39:53 EDT 2018 on pts/0
[mysql@maxscale102 ~]$ sudo yum install maxscale
mariadb-main | 2.9 kB 00:00:00
mariadb-maxscale | 2.4 kB 00:00:00
mariadb-tools | 2.9 kB 00:00:00
(1/3): mariadb-maxscale/7/x86_64/primary_db | 6.3 kB 00:00:01
(2/3): mariadb-tools/7/x86_64/primary_db | 9.6 kB 00:00:01
(3/3): mariadb-main/7/x86_64/primary_db | 48 kB 00:00:02
Resolving Dependencies
Dependencies Resolved
================================================================================================
Package Arch Version Repository Size
================================================================================================
Installing:
maxscale x86_64 2.2.9-1 mariadb-maxscale 18 M
Installing for dependencies:
gnutls x86_64 3.3.26-9.el7 base 677 k
nettle x86_64 2.7.1-8.el7 base 327 k
trousers x86_64 0.3.14-2.el7 base 289 k
Transaction Summary
================================================================================================
Install 1 Package (+3 Dependent packages)
Total download size: 19 M
Installed size: 70 M
Is this ok [y/d/N]: Y
...
...
Complete!
[mysql@maxscale102 ~]$ sudo systemctl enable maxscale
Created symlink from /etc/systemd/system/multi-user.target.wants/maxscale.service to /usr/lib/systemd/system/maxscale.service.
[mysql@maxscale102 ~]$ systemctl status maxscale
● maxscale.service - MariaDB MaxScale Database Proxy
Loaded: loaded (/usr/lib/systemd/system/maxscale.service; enabled; vendor preset: disabled)
Active: inactive (dead)
Install CDC Server
Setup the Repository
To login to the CDC Server (102.168.56.103) we will need to set up two extra repositories that are currently not automatically added by the curl
script. CDC Adapter, however, will automatically be taken care of by it.
- Refer to the following links to the direct path to the two RPMs:
Using the above URLs, install the API and Adapter directly from the path. The connector, however, should be automatically downloaded based on the repository setup.
- Sequence of Installation:
install maxscale-cdc-connector
install epel-release
install libuv
install columnstore-api
install cdc-adapters
Install MaxScale CDC Connector
[root@cdc103 ~] yum install maxscale-cdc-connector
Resolving Dependencies
Dependencies Resolved
================================================================================================
Package Arch Version Repository Size
================================================================================================
Installing:
maxscale-cdc-connector x86_64 2.2.9-1 mariadb-maxscale 214 k
Transaction Summary
================================================================================================
Install 1 Package
Total download size: 214 k
Installed size: 1.4 M
Is this ok [y/d/N]: y
...
...
Complete!
Install Extra Packages for Enterprise Linux epel
[root@cdc103 ~] yum install epel-release
Resolving Dependencies
Dependencies Resolved
==============================================================================================
Package Arch Version Repository Size
==============================================================================================
Installing:
epel-release noarch 7-11 extras 15 k
Transaction Summary
==============================================================================================
Install 1 Package
Total download size: 15 k
Installed size: 24 k
Is this ok [y/d/N]: y
...
...
Complete!
[root@cdc103 ~] yum install libuv
Resolving Dependencies
Dependencies Resolved
==============================================================================================
Package Arch Version Repository Size
==============================================================================================
Installing:
libuv x86_64 1:1.19.2-1.el7 epel 121 k
Transaction Summary
==============================================================================================
Install 1 Package
Total download size: 121 k
Installed size: 308 k
Is this ok [y/d/N]: y
Downloading packages:
...
...
Complete!
Install ColumnStore API Package
You can Install the API directly from the source or download the RPM using "wget
" and then install it locally.
[root@cdc103 ~] yum install https://downloads.mariadb.com/MariaDB/mariadb-columnstore-api/latest/yum/centos/7/x86_64/mariadb-columnstore-api-1.1.5-1-x86_64-centos7.rpm
Resolving Dependencies
Dependencies Resolved
=============================================================================================
Package Arch Version Repository Size
=============================================================================================
Installing:
mariadb-columnstore-api x86_64 1.1.5-1 /mariadb-columnstore-api-1.1.5-1-x86_64 5.4 M
Transaction Summary
=============================================================================================
Install 1 Package
...
...
Complete!
Install CDC Adapter
You can Install the CDC Adapter directly from the source or download the RPM using "wget
" and then install it locally.
[root@cdc103 ~] yum install https://downloads.mariadb.com/MariaDB/data-adapters/mariadb-streaming-data-adapters/latest/yum/centos/7/x86_64/mariadb-columnstore-maxscale-cdc-adapters-1.1.5-1-x86_64-centos7.rpm
Resolving Dependencies
Dependencies Resolved
================================================================================================
Package Arch Version Repository Size
================================================================================================
Installing:
mariadb-columnstore-data-adapters x86_64 1.1.5-1 /mariadb-columnstore-maxscale
77 k
Installing for dependencies:
maxscale-cdc-connector x86_64 2.2.9-1 mariadb-maxscale 214 k
Transaction Summary
================================================================================================
Install 1 Package (+1 Dependent package)
Total size: 291 k
Total download size: 214 k
Installed size: 1.5 M
Is this ok [y/d/N]: y
...
...
Complete!
Setup Communication Between CDC and ColumnStore
Now that the CDC Adapter, API, and Connectors have been installed, we can start to set up communication between the CDC and ColumnStore servers.
In this exercise, we are using a single instance ColumnStore, in case of a distributed install, we will be working with ColumnStore UM1 Node.
ColumnStore Configuration
Connect to CDC server using ssh client and pull the /home/mysql/mariadb/columnstore/etc/Columnstore.xml
from ColumnStore server using scp
, rsync
or any other method available for file transfer between the servers. Columnstore.xml
should be downloaded to /etc
and owned by root
user.
The file can be downloaded from ColumnStore or any node will do as the Columnstore.xml
is automatically synchronized between all the nodes. Use mariadb
as the remote user to connect to ColumnStore.
As this is the first time connecting to ColumnStore from CDC server, Linux will ask for yes/no
and for mariadb
password.
Once the file is downloaded, ensure that it has 644
permission so that everyone can read it.
[root@cdc103 ~] scp mysql@192.168.56.104:/home/mysql/mariadb/columnstore/etc/Columnstore.xml /etc
The authenticity of host '192.168.56.104 (192.168.56.104)' can't be established.
ECDSA key fingerprint is SHA256:Jle/edRpKz9ysV8xp1K9TlIGvbg8Sb1p+GbDob3Id0g.
ECDSA key fingerprint is MD5:a1:ce:9d:58:80:c6:ed:5a:95:7b:33:82:68:cb:0f:40.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.56.104' (ECDSA) to the list of known hosts.
mysql@192.168.56.104's password:
Columnstore.xml 100% 21KB 13.3MB/s 00:00
[root@cdc103 ~]
[root@cdc103 ~] ls -rlt /etc/Columnstore.xml
-rw-r--r-- 1 root root 21089 Jul 1 05:09 /etc/Columnstore.xml
[root@cdc103 ~]
Search and replace 127.0.0.1
with ColumnStore's IP address in the newly downloaded Columnstore.xml
file using root
user as the file is under /etc/
folder.
[root@cdc103 etc] sed -i 's/127.0.0.1/192.168.56.104/g' /etc/Columnstore.xml
[root@cdc103 etc]
Setup Master MariaDB Replication Master
Now that the CDC node is ready, it's time to setup MariaDB Server as the replication Master for MaxScale.
MaxScale will be working as a Replication Slave for this data streaming setup.
Login to the MariaDB Master server using the root
user and add the following contents to the /etc/my.cnf.d/server.cnf
file under the [mysqld]
section and restart the MariaDB process.
[mysqld]
server_id=1
log-bin = mariadb-bin
binlog-format = ROW
gtid_strict_mode = 1
log_error
log-slave-updates
[root@mariadb101 ~] systemctl restart mariadb
The server_id
will be used when configuring MaxScale as a replication slave node.
Setup Replication User
After MariaDB restarts, login to MariaDB using the root
user and setup a replication user,maxuser
,and grant REPLICATION SLAVE
privilege to it.
The second step is to create a new database and a test table with some data that we are going to use for testing data streaming to ColumnStore.
[mysql@mariadb101 ~] mysql -uroot -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 10
Server version: 10.3.7-MariaDB-log MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> RESET MASTER;
Query OK, 0 rows affected (0.004 sec)
MariaDB [(none)]> CREATE USER 'maxuser'@'%' IDENTIFIED BY 'maxpwd';
Query OK, 0 rows affected (0.000 sec)
MariaDB [(none)]> GRANT REPLICATION SLAVE ON *.* TO 'maxuser'@'%';
Query OK, 0 rows affected (0.000 sec)
MariaDB [(none)]> CREATE DATABASE cdc_test;
Query OK, 1 row affected (0.000 sec)
MariaDB [(none)]> USE cdc_test;
Database changed
MariaDB [cdc_test]> CREATE TABLE cdc_tab(id serial, col varchar(100));
Query OK, 0 rows affected (0.010 sec)
MariaDB [cdc_test]> INSERT INTO cdc_tab(col) values ('Row 1'), ('Row 2'), ('Row 3');
Query OK, 3 rows affected (0.004 sec)
Records: 3 Duplicates: 0 Warnings: 0
MariaDB [cdc_test]> UPDATE cdc_tab SET col = 'Updated Row 2' WHERE id = 2;
Query OK, 1 row affected (0.004 sec)
Rows matched: 1 Changed: 1 Warnings: 0
MariaDB [cdc_test]> INSERT INTO cdc_tab(col) values ('Row 4'), ('Row 5'), ('Row 6');
Query OK, 3 rows affected (0.004 sec)
Records: 3 Duplicates: 0 Warnings: 0
MariaDB [cdc_test]> DELETE FROM cdc_tab where id=5;
Query OK, 1 row affected (0.005 sec)
MariaDB [cdc_test]>
Events that are captured in the binary logs of the master MariaDB Server:
- INSERT 3 Records to cdc_tab.
- UPDATE
col
for a row in cdc_tab. - INSERT 3 Records to cdc_tab.
- DELETE a row from cdc_tab.
Setup MaxScale
Log on to the MaxScale server, edit the /etc/maxscale.cnf
file, remove everything under the [maxscale] threads=auto
section and add the following configuration to it. This will set MaxScale for data streaming to ColumnStore using Avro Listener.
Take note of the server_id
in the [replication-router] section, this points to the master MariaDB server's ID that we defined earlier while setting up MariaDB as a master.
#Replication configuration that points to a particular server_id and binlog
[replication-router]
type=service
router=binlogrouter
user=maxuser
passwd=maxpwd
server_id=2
master_id=1
binlogdir=/var/lib/maxscale
mariadb10-compatibility=1
filestem=mariadb-bin
#Replication listener that will listen to the Master DB using the port #6603
[replication-listener]
type=listener
service=replication-router
protocol=MySQLClient
port=6603
#Avro service that will generate JSON files form the Bin-Logs that were received from the MasterDB and store them in the <avrodir> using the replication-router
[avro-router]
type=service
router=avrorouter
source=replication-router
avrodir=/var/lib/maxscale
#Avro listener that is used by the Avro router on a specific port to be used by CDC
[avro-listener]
type=listener
service=avro-router
protocol=cdc
port=4001
[replication-router]
- This section defines a bin-log router from Master MariaDB to MaxScale Slave using the replication user
maxuser
, which was created earlier.
[replication-listener]
- This is the
mysql
client listener service. Any server with a MariaDB client can connect to MaxScale on the port6603
specified in this section. We will connect to MaxScale later to set it up as a REPLICATION SLAVE using this port.
[avro-router]
- This is the router service that routes the bin-log data into AVRO (JSON) files. CDC will use these AVRO files to generate bulk loading scripts for the ColumnStore database.
[avro-listener]
- Avro Listener uses the avro-router service. This listener is used by the CDC Adapter to get the AVRO files and stream them to ColumnStore.
Once /etc/maxscale.cnf
has been modified on the MaxScale server, restart the MaxScale service.
[root@maxscale102 ~] systemctl restart maxscale
Add CDC User/Password for Avro-Router Service in MaxScale
[mysql@maxscale102 ~]$ maxctrl call command cdc add_user avro-router cdcuser cdcpassword
This user will be the user that MaxScale uses to generate Avro schema files to be transferred and loaded into ColumnStore by the MaxScale CDC Adapter.
Setup MaxScale as a REPLICATION SLAVE
Install the MariaDB client on the MaxScale Server. We will use this client to connect to a MaxScale MariaDB listener and set up MaxScale as a replication SLAVE for MariaDB TX.
[mysql@maxscale102 ~]$ sudo yum -y install MariaDB-client
Resolving Dependencies
Dependencies Resolved
==============================================================================================
Package Arch Version Repository Size
==============================================================================================
Installing:
MariaDB-client x86_64 10.3.7-1.el7.centos mariadb-main 53 M
MariaDB-compat x86_64 10.3.7-1.el7.centos mariadb-main 2.8 M
replacing mariadb-libs.x86_64 1:5.5.56-2.el7
Installing for dependencies:
MariaDB-common x86_64 10.3.7-1.el7.centos mariadb-main 157 k
Running transaction
Installing : MariaDB-common-10.3.7-1.el7.centos.x86_64 1/31
Installing : MariaDB-compat-10.3.7-1.el7.centos.x86_64 2/31
==============================================================================================
Installed:
MariaDB-client.x86_64 0:10.3.7-1.el7.centos MariaDB-compat.x86_64 0:10.3.7-1.el7.centos
Dependency Installed:
MariaDB-common.x86_64 0:10.3.7-1.el7.centos perl.x86_64 4:5.16.3-292.el7
perl-Carp.noarch 0:1.26-244.el7 perl-Encode.x86_64 0:2.51-7.el7
perl-Exporter.noarch 0:5.68-3.el7 perl-File-Path.noarch 0:2.09-2.el7
perl-File-Temp.noarch 0:0.23.01-3.el7 perl-Filter.x86_64 0:1.49-3.el7
perl-Getopt-Long.noarch 0:2.40-3.el7 perl-HTTP-Tiny.noarch 0:0.033-3.el7
perl-PathTools.x86_64 0:3.40-5.el7 perl-Pod-Escapes.noarch 1:1.04-292.el7
perl-Pod-Perldoc.noarch 0:3.20-4.el7 perl-Pod-Simple.noarch 1:3.28-4.el7
perl-Pod-Usage.noarch 0:1.63-3.el7 perl-Scalar-List-Utils.x86_64 0:1.27-248.el7
perl-Socket.x86_64 0:2.010-4.el7 perl-Storable.x86_64 0:2.45-3.el7
perl-Text-ParseWords.noarch 0:3.29-4.el7 perl-Time-HiRes.x86_64 4:1.9725-3.el7
perl-Time-Local.noarch 0:1.2300-2.el7 perl-constant.noarch 0:1.27-2.el7
perl-libs.x86_64 4:5.16.3-292.el7 perl-macros.x86_64 4:5.16.3-292.el7
perl-parent.noarch 1:0.225-244.el7 perl-podlators.noarch 0:2.5.1-3.el7
perl-threads.x86_64 0:1.87-4.el7 perl-threads-shared.x86_64 0:1.43-6.el7
Replaced:
mariadb-libs.x86_64 1:5.5.56-2.el7
Complete!
Now we can connect to the MaxScale MariaDB service using the installed client and setup MaxScale as a REPLICATION SLAVE
; use the proper Master MariaDB IP in the CHANGE MASTER TO MASTER_HOST
followed by START SLAVE
and SHOW SLAVE STATUS
to ensure the slave is running without any issues.
This setup is just like setting up a MariaDB Master/Slave replication, the only difference, in this case, is that we are setting up MaxScale as a bin-log
router slave. Take note that we have used -h 192.168.56.102
argument to pass in MaxScale's IP address and -P 6603
argument to pass in the [replication-listener]
port from the /etc/maxscale.cnf
file. That port is for themysqlclient
service, which is why we are able to connect to MaxScale's MariaDB interface.
[mysql@maxscale102 ~]$ mysql -h 192.168.56.102 -u maxuser -p maxpwd -P 6603
Enter password:
ERROR 1049 (42000): Unknown database 'maxpwd'
[mysql@maxscale102 ~]$ mysql -h192.168.56.102 -umaxuser -pmaxpwd -P6603
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 10.2.12 2.2.9-maxscale
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)]> CHANGE MASTER TO MASTER_HOST='192.168.56.101', MASTER_PORT=3306, MASTER_USER='maxuser', MASTER_PASSWORD='maxpwd', MASTER_LOG_FILE='mariadb-bin.000001', MASTER_LOG_POS=4;
Query OK, 0 rows affected (0.001 sec)
MySQL [(none)]> START SLAVE;
Query OK, 0 rows affected (0.201 sec)
MySQL [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Binlog Dump
Master_Host: 192.168.56.101
Master_User: maxuser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mariadb-bin.000001
Read_Master_Log_Pos: 2047
Relay_Log_File: mariadb-bin.000001
Relay_Log_Pos: 2047
Relay_Master_Log_File: mariadb-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 2047
Relay_Log_Space: 2047
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: 5a2189b0-7d1a-11e8-b04f-080027ad63ed
Master_Info_File: /var/lib/maxscale/master.ini
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave running
Master_Retry_Count: 1000
Master_Bind:
Last_IO_Error_TimeStamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
1 row in set (0.000 sec)
MySQL [(none)]>
Start Data Streaming
With all the settings in place, we can now start data streaming from MariaDB TX to ColumnStore in MariaDB AX using the CDC and Avro data.
Log in to the CDC Server and execute mxs_adapter
to start streaming.
[mysql@cdc103 ~]$ mxs_adapter -c /etc/Columnstore.xml -u cdcuser -p cdcpassword -h 192.168.56.102 -P 4001 cdc_test cdc_tab
Let's review this command and its arguments:
- -c indicates the path of the Columnstore.xml configuration file.
- -u / -p indicates the CDC User and Password that were created earlier.
- -h is the IP address of the MaxScale server.
- -P is the port on which the MaxScale [avro-listener] service is listening.
- cdc_test is the database name on the source (MariaDB) and target (ColumnStore).
- cdc_tab is the table name on the source (MariaDB) and target (ColumnStore).
The target database/table should be already created before starting mxs_adapter, if not, mxs_adapter will provide a script to create the table in ColumnStore
automatically. Let's start the service and see what happens.
[mysql@cdc103 ~]$ mxs_adapter -c /etc/Columnstore.xml -u cdcuser -p cdcpassword -h 192.168.56.102 -P 4001 cdc_test cdc_tab
Table not found, create with:
CREATE TABLE cdc_test.cdc_tab (col varchar(100), domain int, event_number int, event_type varchar(50), id serial, sequence int, server_id int, timestamp int) ENGINE=ColumnStore;
[mysql@cdc103 ~]$
As expected, we get the error message "Table not found
." So, next copy the CREATE TABLE
script and execute it on the ColumnStore node using the mcsmysql
interface.
[root@um1 ~] mcsmysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 43
Server version: 10.2.15-MariaDB-log Columnstore 1.1.5-1
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> CREATE DATABASE cdc_test;
Query OK, 1 row affected (0.00 sec)
MariaDB [(none)]> CREATE TABLE cdc_test.cdc_tab (col varchar(100), domain int, event_number int, event_type varchar(50), id serial, sequence int, server_id int, timestamp int) ENGINE=ColumnStore;
ERROR 1069 (42000): Too many keys specified; max 0 keys allowed
MariaDB [(none)]>
There is another problem! Our source table had a serial
column which MariaDB treats as a Primary Key. Since ColumnStore
does not support Primary Keys/Indexes, we will need to change the CREATE TABLE
script before executing it in the ColumnStore DB.
Simply change serial to bigint unsigned .
Let's log back into ColumnStore mcsmysql and recreate the table.
MariaDB [(none)]> DROP TABLE cdc_test.cdc_tab;
Query OK, 0 rows affected (0.02 sec)
MariaDB [(none)]> CREATE TABLE cdc_test.cdc_tab (col varchar(100), domain int, event_number int, event_type varchar(50), id bigint unsigned, sequence int, server_id int, timestamp int) ENGINE=ColumnStore;
Query OK, 0 rows affected (0.55 sec)
MariaDB [(none)]>
Another thing to note here is that the table structure in ColumnStore
is quite different from the source. There are a few additional columns. This table is an event table that captures all the INSERT, UPDATE, and DELETE events. We can use this data to identify the latest row for each specific ID and run some analytics.
One more thing to notice here is that at the moment, one mxs_adapter
can only handle one table at a time. This is to prevent streaming the entire MariaDB databases to ColumnStore
. Instead, we could create aggregate tables on MariaDB TX and stream that data to ColumnStore
for a better analytics use case.
Let's start the mxs_adapter once more and see if it can stream the data from our source MariaDB TX to target MariaDB AX.
[mysql@cdc103 ~]$ mxs_adapter -c /etc/Columnstore.xml -u cdcuser -p cdcpassword -h 192.168.56.102 -P 4001 cdc_test cdc_tab
4 rows and 1 transactions inserted in 3.3203 seconds. GTID = 0-1-6
2 rows and 1 transactions inserted in 0.792351 seconds. GTID = 0-1-7
3 rows and 1 transactions inserted in 0.604614 seconds. GTID = 0-1-8
Success! The three events are captured, INSERT, UPDATE and DELETE and their GTIDs. Let's log in to ColumnStore
and review the data in the cdc_tab
table.
MariaDB [cdc_test]> SHOW TABLES;
+--------------------+
| Tables_in_cdc_test |
+--------------------+
| cdc_tab |
+--------------------+
1 row in set (0.00 sec)
MariaDB [cdc_test]> SELECT * FROM cdc_tab;
+---------------+--------+--------------+---------------+------+----------+-----------+------------+
| col | domain | event_number | event_type | id | sequence | server_id | timestamp |
+---------------+--------+--------------+---------------+------+----------+-----------+------------+
| Row 1 | 0 | 1 | insert | 1 | 5 | 1 | 1530439436 |
| Row 2 | 0 | 2 | insert | 2 | 5 | 1 | 1530439436 |
| Row 3 | 0 | 3 | insert | 3 | 5 | 1 | 1530439436 |
| Row 2 | 0 | 1 | update_before | 2 | 6 | 1 | 1530439675 |
| Updated Row 2 | 0 | 2 | update_after | 2 | 6 | 1 | 1530439675 |
| Row 4 | 0 | 1 | insert | 4 | 7 | 1 | 1530439697 |
| Row 5 | 0 | 2 | insert | 5 | 7 | 1 | 1530439697 |
| Row 6 | 0 | 3 | insert | 6 | 7 | 1 | 1530439697 |
| Row 5 | 0 | 1 | delete | 5 | 8 | 1 | 1530439714 |
+---------------+--------+--------------+---------------+------+----------+-----------+------------+
9 rows in set (0.17 sec)
MariaDB [cdc_test]>
We can see the INSERT, UPDATE (Before and After) and a DELETE. The SEQUENCE column indicates the sequence in which these events were triggered and the timestamp column indicates the time when these events took place.
Avro Files
While setting up MaxScale, we specified avrodir
as /var/lib/mysql
. This can be any different location.
[avro-router]
type=service
router=avrorouter
source=replication-router
avrodir=/var/lib/maxscale
Let's see the contents of the avrodir
folder. We can see two files generated .avsc
contains the data structure while .avro
contains the actual data from the bin logs.
[root@maxscale102 maxscale] pwd
/var/lib/maxscale
[root@maxscale102 maxscale] ls -rlt
total 40
drwxr-xr-x 2 maxscale maxscale 6 Jun 22 14:44 maxscale.cnf.d
-rw-r--r-- 1 maxscale maxscale 54 Jun 22 14:44 maxadmin-users
-rw-r--r-- 1 maxscale maxscale 84 Jun 22 14:44 passwd
drwxr--r-- 2 maxscale maxscale 25 Jul 1 04:25 MariaDB-Monitor
drwxr-xr-x 2 maxscale maxscale 6 Jul 1 06:34 data1288
drwxr-xr-x 2 maxscale maxscale 22 Jul 1 06:46 avro-router
-rw------- 1 maxscale maxscale 183 Jul 1 06:50 master.ini
drwx------ 2 maxscale maxscale 199 Jul 1 06:50 cache
-rw-r--r-- 1 maxscale maxscale 4096 Jul 1 06:50 gtid_maps.db
-rw-r--r-- 1 maxscale maxscale 2047 Jul 1 06:50 mariadb-bin.000001
-rw-r--r-- 1 maxscale maxscale 572 Jul 1 06:50 cdc_test.cdc_tab.000001.avsc
-rw-r--r-- 1 maxscale maxscale 797 Jul 1 06:50 cdc_test.cdc_tab.000001.avro
-rw-r--r-- 1 maxscale maxscale 5120 Jul 1 06:50 avro.index
-rw-r--r-- 1 maxscale maxscale 69 Jul 1 07:29 avro-conversion.ini
[root@maxscale102 maxscale]
In case of some issue with the data streaming, one can try to restart the MaxScale service. If there's still a failure, as a last resort one can delete *.avsc
, *.avro
, avro.index
, and avro-conversion.ini
and restart the MaxScale service. It should be able to recover.
Since the mxs_adapter
is running, when new data is inserted in the cdc_test.cdc_tab
table, it will automatically be streamed into ColumnStore
.
Conclusion
Using this setup, we can stream data directly from OLTP MariaDB TX not only to MariaDB AX but also to other sources that can take Avro/JSON data. There is a Kafka Adapter already available in the Data Adapters download page.
References
- MariaDB TX download (includes MariaDB MaxScale)
- MariaDB AX download (includes MariaDB ColumnStore)
- ColumnStore Distributed Installation Guide Using a Tarball
- CDC Connectors
- CDC Adapter
Thanks!
Published at DZone with permission of Faisal Saeed, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments