How to Spin Up an HTAP Database in 5 Minutes With TiDB + TiSpark
Let's look at how to spin up a standard TiDB cluster using Docker Compose on your local computer, so you can get a taste of its hybrid power.
TiDB is an open-source distributed Hybrid Transactional and Analytical Processing (HTAP) database built by PingCAP. It lets companies run real-time analytics on live transactional data in the same data warehouse: no more ETL, no more T+1, no more delays. More than 200 companies are now using TiDB in production. Its 2.0 version was launched in late April 2018 (read about it in this post).
In this 5-minute tutorial, we will show you how to spin up a standard TiDB cluster using Docker Compose on your local computer, so you can get a taste of its hybrid power, before using it for work or your own project in production. A standard TiDB cluster includes TiDB (MySQL compatible stateless SQL layer), TiKV (a distributed transactional key-value store where the data is stored), and TiSpark (an Apache Spark plug-in that powers complex analytical queries within the TiDB ecosystem).
Ready? Let’s get started!
Setting Up
Before we start deploying TiDB, we’ll need a few things first: wget, Git, Docker, and a MySQL client. If you don’t have them installed already, here are the instructions to get them.
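Before installing anything, it can help to see which of these tools are already present. The following is a minimal sketch in POSIX shell, not part of the TiDB tooling; the function name `missing_tools` is a hypothetical helper introduced only for illustration.

```shell
#!/bin/sh
# Print the name of every tool passed as an argument that cannot be found on PATH.
# `missing_tools` is a hypothetical helper name, not something TiDB ships.
missing_tools() {
    for tool in "$@"; do
        # `command -v` is the POSIX way to ask whether a command resolves.
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}
```

Calling `missing_tools wget git docker mysql` prints one line per tool that still needs to be installed, and prints nothing when all four are available.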
Setting Up macOS
- To install brew, go here.
- To install wget, use the command below in your Terminal:
brew install wget --with-libressl
- To install Git, use the command below in your Terminal:
brew install git
- Install Docker: https://www.docker.com/community-edition.
- Install a MySQL client:
brew install mysql-client
Setting Up Linux
- To install wget, Git, and MySQL, use the command below in your Terminal:
  - For CentOS/Fedora:
sudo yum install wget git mysql
  - For Ubuntu/Debian:
sudo apt install wget git mysql-client
- To install Docker, go here.
After Docker is installed, use the following command to start it and add the current user to the Docker user group:
sudo systemctl start docker # start docker daemon
You need to log out and back in for this to take effect. Then use the following command to verify that Docker is running normally:
docker info
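The step that adds the current user to the Docker group is described above, but no command for it is shown. The following is a minimal sketch, assuming the standard Linux `id` and `usermod` utilities; it is a common approach, not necessarily the author's exact command, and `has_group` is a hypothetical helper name.

```shell
#!/bin/sh
# Return 0 when the group name in $2 appears in the space-separated list in $1.
# `has_group` is a hypothetical helper introduced for this sketch.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumed usage (needs sudo, so it is left as a comment rather than run here):
#   has_group "$(id -nG)" docker || sudo usermod -aG docker "$(whoami)"
```

The check only changes group membership when it is missing, so it is safe to repeat. A new login session is still needed before the change takes effect.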
Spin Up a TiDB Cluster
Now that Docker is set up, let’s deploy TiDB!
- Clone TiDB Docker Compose onto your laptop:
git clone https://github.com/pingcap/tidb-docker-compose
- Change your directory to tidb-docker-compose:
cd tidb-docker-compose
- Optionally, you can use docker-compose pull to get the latest Docker images.
- Deploy TiDB on your laptop:
docker-compose up -d
You can see messages in your terminal launching the default components of a TiDB cluster: 1 TiDB instance, 3 TiKV instances, 3 Placement Driver (PD) instances, Prometheus, Grafana, 2 TiSpark instances (one master, one slave), and a TiDB-Vision instance.
Congratulations! You have just deployed a TiDB cluster on your laptop!
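The containers can take a little while to accept connections after docker-compose returns. The following is a minimal sketch of a retry loop in POSIX shell that could be used to wait for TiDB; the helper name `retry` and the attempt and delay values in the usage line are assumptions chosen for illustration.

```shell
#!/bin/sh
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
# `retry` is a hypothetical helper, not part of Docker or TiDB.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # The command's own output is discarded; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Skip the sleep once the last attempt has been used.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Assumed usage, probing the TiDB port used later in this tutorial:
#   retry 30 2 mysql -h 127.0.0.1 -P 4000 -u root -e 'SELECT 1'
```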
To check if your deployment is successful:
- Go to http://localhost:3000 to launch Grafana with default user/password: admin/admin.
  - Go to Home and click on the pull-down menu to see dashboards of different TiDB components: TiDB, TiKV, PD, entire cluster.
  - You will see a dashboard full of panels and stats on your current TiDB cluster. Feel free to play around in Grafana, e.g. TiDB-Cluster-TiKV or TiDB-Cluster-PD.
Figure: Grafana display of TiKV metrics
- Now go to TiDB-vision at http://localhost:8010 (TiDB-vision is a cluster visualization tool to see data transfer and load-balancing inside your cluster).
  - You can see a ring of 3 TiKV nodes. TiKV applies the Raft consensus protocol to provide strong consistency and high availability. Light grey blocks are empty spaces, dark grey blocks are Raft followers, and dark green blocks are Raft leaders. If you see flashing green bands, they represent communications between TiKV nodes.
Figure: TiDB-vision
Test TiDB Compatibility With MySQL
As we mentioned, TiDB is MySQL compatible. You can use TiDB as a MySQL slave with instant horizontal scalability. That’s how many innovative tech companies, like Mobike, use TiDB.
To test out this MySQL compatibility:
- Keep the tidb-docker-compose running, and launch a new Terminal tab or window.
- Add MySQL to the path (if you haven’t already):
export PATH=${PATH}:/usr/local/mysql/bin
- Launch a MySQL client that connects to TiDB:
mysql -h 127.0.0.1 -P 4000 -u root
Result: You will see the following message, which shows that your MySQL client is indeed connected to TiDB:
Server version: 5.7.10-TiDB-v2.0.0-rc.4-31
Note: TiDB version number may be different.
Figure: The Compatibility of TiDB with MySQL
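The version string is what distinguishes TiDB from a regular MySQL server: it carries a `-TiDB-` marker followed by the TiDB release. The following is a minimal sketch of how a script might check for that marker, assuming the server reports a string in the same format as the banner shown above; `tidb_version` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the TiDB release embedded in a MySQL-style version string.
# Returns 1 and prints nothing when the "-TiDB-" marker is absent.
# `tidb_version` is a hypothetical helper introduced for this sketch.
tidb_version() {
    case "$1" in
        # Strip everything up to and including the first "-TiDB-".
        *-TiDB-*) printf '%s\n' "${1#*-TiDB-}" ;;
        *) return 1 ;;
    esac
}

# Assumed usage against the local cluster:
#   tidb_version "$(mysql -h 127.0.0.1 -P 4000 -u root -N -e 'SELECT VERSION()')"
```

For the banner shown above, `tidb_version "5.7.10-TiDB-v2.0.0-rc.4-31"` prints `v2.0.0-rc.4-31`.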
Let’s Get Some Data!
Now we will grab some sample data that we can play around with.
- Open a new Terminal tab or window and download the tispark-sample-data.tar.gz file:
wget http://download.pingcap.org/tispark-sample-data.tar.gz
- Unzip the sample file:
tar zxvf tispark-sample-data.tar.gz
- Load the sample test data from the sample data folder into TiDB through the MySQL client:
mysql --local-infile=1 -u root -h 127.0.0.1 -P 4000 < tispark-sample-data/dss.ddl
This will take a few seconds.
- Go back to your MySQL client window or tab, and see what’s in there:
SHOW DATABASES;
Result: You can see the TPCH_001 database on the list. That’s the sample data we just ported over.
Now let’s go into TPCH_001:
USE TPCH_001;
SHOW TABLES;
Result: You can see all the tables in TPCH_001, like NATION, ORDERS, etc.
- Let’s see what’s in the NATION table:
SELECT * FROM NATION;
Result: You’ll see a list of countries with some keys and comments.
Launch TiSpark
Now let’s launch TiSpark, the last missing piece of our hybrid database puzzle.
- In the same window where you downloaded TiSpark sample data (or open a new tab), go back to the tidb-docker-compose directory.
- Launch Spark within TiDB with the following command:
docker-compose exec tispark-master /opt/spark-2.1.1-bin-hadoop2.7/bin/spark-shell
This will take a few minutes. Result: Now you can Spark!
- Use the following three commands, one by one, to bind TiSpark to this Spark instance and map to the database TPCH_001, the same sample data that is available through our MySQL client:
import org.apache.spark.sql.TiContext
val ti = new TiContext(spark)
ti.tidbMapDatabase("TPCH_001")
- Now, let’s see what’s in the NATION table (should be the same as what we saw on our MySQL client):
spark.sql("select * from nation").show(30);
Let’s Get Hybrid!
Now, let’s go back to the MySQL tab or window, make some changes to our tables, and see if the changes show up on the TiSpark side.
- In the MySQL client, try this UPDATE:
UPDATE NATION SET N_NATIONKEY=444 WHERE N_NAME="CANADA";
- Then see if the update worked:
SELECT * FROM NATION;
- Now go to the TiSpark Terminal window, and see if you can see the same update:
spark.sql("select * from nation").show(30);
Result: The UPDATE you made on the MySQL side shows up immediately in TiSpark!
You can see that both the MySQL and TiSpark clients return the same results – fresh data for you to do analytics on right away. Voila!
Summary
With this simple deployment of TiDB on your local machine, you now have a functioning Hybrid Transactional and Analytical Processing (HTAP) database. You can continue to make changes to the data in your MySQL client (simulating transactional workloads) and analyze the data with those changes in TiSpark (simulating real-time analytics).
Of course, launching TiDB on your local machine is purely for experimental purposes. If you are interested in trying out TiDB for your production environment, send us a note or reach out on our website. We’d be happy to help you!
Published at DZone with permission of Jin Queeny. See the original article here.