DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • How to Build a Full-Stack App With Next.js, Prisma, Postgres, and Fastify
  • How Doris Connects to Multiple Databases for Federated Analytics and Data Migration
  • Moving PeopleSoft ERP Data Between Databases With Data Mover Scripts
  • Snowflake and dbt Integration to Enable Advanced Analytics

Trending

  • Introduction to Tactical DDD With Java: Steps to Build Semantic Code
  • Offline-First Patch Management for 10,000 Edge Nodes: A Practical Architecture That Scales
  • The Agentic Agile Office: Streamlining Enterprise Agile With Autonomous AI Agents
  • How to Write for DZone Publications: Trend Reports and Refcards
  1. DZone
  2. Data Engineering
  3. Databases
  4. Build a Real-time Analytics Application With TiDB Cloud

Build a Real-time Analytics Application With TiDB Cloud

This tutorial will teach you how to build a prototype real-time analytics dashboard that runs on a TiDB Cloud PoC cluster.

By 
Phoebe He user avatar
Phoebe He
·
Chenhao Huang user avatar
Chenhao Huang
·
Aug. 24, 22 · Tutorial
Likes (1)
Comment
Save
Tweet
Share
5.1K Views

Join the DZone community and get the full member experience.

Join For Free

Disclaimer:

  • PingExpress_DemoCorp is a dummy company. It does NOT reflect or imply any real company.
  • This tutorial is for demonstration purposes only. Do NOT use any material (including but not limited to code and commands) from this tutorial in production environments.

PingExpress_DemoCorp is a supply chain management company in the United States. With more people shopping online due to the pandemic,  PingExpress_DemoCorp’s business has scaled rapidly. They’re delivering tens of billions of packages a year.

With so many deliveries, a key part of their success is accurate and efficient package tracking. Business managers need to know where packages are so they can identify potential traffic blocks and rearrange delivery routes. Customers need accurate delivery dates so they can plan ahead. Therefore, real-time tracking, status updates, and a detailed dashboard are very important to PingExpress_DemoCorp.

In this tutorial, you will build a prototype for PingExpress_DemoCorp’s real-time analytics dashboard that runs on a TiDB Cloud Proof-of-Concept (PoC) cluster.

PingExpress_DemoCorp divides a package’s lifecycle into four stages:

  • “1_pkg_in”: The package is collected by the company.
  • “2_ori”: The package arrives at the processing center of the origin state.
  • “3_des”: The package reaches the processing center of the destination state.
  • “4_pkg_out”: The package is delivered to the end user.

Before You Begin

You should have the following software and packages installed:

  • Python (v. 3+)
  • MySQL connector for Python
  • SQLAlchemy
  • sqlalchemy-tidb
  • Metabase

Note: It is recommended to use pip3 to install packages, such as SQLAlchemy. We also suggest NOT to use the Mac application version for Metabase. It is gradually being phased out. You may use the jar version instead.

Create a TiDB PoC Cluster

  1. Log in to your TiDB Cloud account.
    If you do not have a TiDB Cloud account, click “Try TiDB Cloud for Free” above to sign up for an account.
  2. Click Create a Cluster to go to the plan selection page.
  3. Click apply for a PoC Cluster. It usually takes 24 ~ 48 hours to approve your request.
  4. On the Create a Cluster page, set up Cluster Name and Root Password. (In this tutorial, we will call our cluster PingExpressDB).
  5. Select the Region as AP-Southeast-1 (Singapore).
  6. Click Next.
  7. On the second page of the set, one TiDB node, three TiKV nodes, and one TiFlash node.
  8. Click Next.
  9. On the third page, review your settings and click Create Cluster.

Your TiDB Cloud cluster will be created in approximately 5 to 10 minutes.

Connect to TiDB Cloud

  1. Go to TiDBcloud and sign in to your cluster.
  2. On the TiDB Cloud console, in the upper right of the pane, click Connect.
    The Connect to TiDB dialog displays.
  3. Create the traffic filter for the cluster.
    1. Click Allow Access from Anywhere in Step 1.
    2. Click Create Filter.
      Note: For production environments, do not enable Allow Access from Anywhere. Note the information between the -h and -P parameters; you’ll need this for a later step. For example, mysql -u root -h tidb.xxx.xxxxx.us-west-prod.aws.tidbcloud.com -P 4000 -p.
  4. Click the Web SQL Shell tab.
  5. Click Open SQL Shell and enter the password for the cluster.

You are now able to write SQL commands.

Import the Sample Data

  1. Create a database.
     
    CREATE DATABASE PingExpressDB;


  2. Create a user of the database. The user name is 'PingExpress_client', and you should set your own password to replace '<pwd>', such as '123'.
     
    CREATE USER 'PingExpress_client' IDENTIFIED BY '<pwd>';


  3. Grant all privileges to the user you just created.
    GRANT ALL PRIVILEGES ON PingExpressDB.* TO 'PingExpress_client';
  4. Navigate to the TiDB Cloud Clusters page and find your cluster.
  5. In the upper right corner of the pane, click Import.
    The Data Import Task page is displayed.
  6. Enter the following information, and click Importto import the sample data:
    • Data Source Type: Select AWS S3
    • Bucket URL: s3://pingexpress-na/backup_dumpling2/
    • Bucket Region: Asia Pacific (Singapore)
    • Data Format: Select TiDB Dumpling
    • Setup Credentials: arn:aws:iam::577523860935:role/pingexpress-na-1g
    • Target Database:
      • Username: root.
      • Password: Enter your root password.
    • DB/Tables Filter: Leave this field blank.
      The data import process takes about 5 minutes. When the data import progress bar shows Success, you have successfully imported the sample data and the database schema in your database.
      Warning: Do not manipulate the data until the importing process finishes. Otherwise, the importing process fails.
  7. Check the imported data.
    1. Open Web SQL Shell and enter the cluster password. (Hint: On the TiDB Cloud console, in the upper right of the pane, click Connect, then click the Web SQL Shell tab.)
    2. Switch to PingExpressDB.
       
      USE PingExpressDB;


    3. Check the number of records
       
      SELECT COUNT(*) FROM packages;


      The result should be 13000000.

  8. Create TiFlash replicas for the table packages.
     
    ALTER TABLE packages SET TiFlash REPLICA 1;


    Wait for a few minutes and then run the following query to check whether the TiFlash node is ready.

     
    SELECT * FROM information_schema.TIFLASH_REPLICA;


When the TiFlash node is ready, the values of the “AVAILABLE” and “PROGRESS” columns turn to 1.


Connect to TiDB Cloud via Metabase

In this section, we walk you through the process of connecting to TiDB Cloud via Metabase. 

  1. Start the Metabase application and click Let’s get started.
  2. Select your preferred language and click Next.
  3. Enter your information and account details in Step 2 and click Next.
  4. In the Add your data step, click the Database type drop down menu and select MySQL.
  5. Specify the following settings:
    • Name: PingExpressDB
    • Host: <your_tidb_cloud_ip_address_obtained_previously>
    • Port: 4000
    • Database name: PingExpressDB
    • Username: root
    • Password: <password_of_your_tidb_cluster>
  6. Scroll down to the bottom and click Next.
  7. For the Usage data preference step, change your preference if needed and click Next.
  8. Click Take me to Metabase.

Build Your Dashboards in Metabase

This section shows you how to build a Metabase dashboard. The dashboard contains five figures (or “questions” in Metabase terminology). We include the instructions to build the whole dashboard. However, building the first three of them is enough for the purpose of this tutorial. The last two figures are optional. The following figure shows the real-time Metabase dashboard for PingExpress_DemoCorp.

the real-time Metabase dashboard for PingExpress_DemoCorp.

  1. Create a dashboard.
    1. In the top right corner of the dashboard, click the + sign and then choose New Dashboard.
    2. Enter the name PingExpress_dashboard.
    3. Click Create.
  2. Add a question.
    1. In the top right corner, click Ask a question on the top right corner, and then choose Native query.
    2. Select PingExpressDB as the database.
    3. Display the total number of packages delivered. Enter the following query and click the right side of the screen to run it:
       
      SELECT COUNT(*) FROM packages WHERE transaction_kind="4_pkg_out";


  3. Save the question.
    1. In the upper right corner, click Save.
    2. Enter the name Total packages delivered.
    3. When being asked if you would like to add this question to the dashboard, click Yes please! and choose PingExpress_dashboard.
      The result will now appear on the dashboard.
    4. Click Save.
  4. Repeat steps 2 and 3 for the second question, “Number of packages on the way.” This is the query to use:
     
    SELECT COUNT(*) FROM packages WHERE transaction_kind != "4_pkg_out";


  5. Visualize the Number of packages in process in each state.
    1. Repeat step 3 and use the following query instead.
       
      SELECT start_state, COUNT(package_id) FROM packages  WHERE transaction_kind = "1_pkg_in" GROUP BY start_state UNION SELECT destination_state, COUNT(package_id) FROM packages  WHERE transaction_kind = "3_des" GROUP BY destination_state


    2. After getting the result, click the Visualization button and choose Map. For the map options:
      • Map Type: Region map
      • Region Map: United States
        Leave everything else as default.
    3. Repeat step 4 and add this question to the dashboard.
      add this question to the dashboard.
  6. (Optional) Repeat the previous steps to add two more queries:
    • Number of packages in each stage (pie chart):
       
      SELECT transaction_kind, count(*)  FROM packages  WHERE transaction_kind != "4_pkg_out" GROUP BY transaction_kind


    • Number of new packages per day (line chart):
       
      SELECT DATE(start_time), count(*)  FROM packages  WHERE transaction_kind = "1_pkg_in" AND start_time > DATE(NOW()) - INTERVAL 30 DAY GROUP BY DATE(start_time)


Execute the Program

  1. Clone the program to your local machine or any virtual machines provided by your cloud vendor.
     
    git clone https://github.com/pingcap/education.git


  2. Go to the directory for PingExpress_DemoCorp.
     
    cd PingExpress_DemoCorp


  3. Update the connection string in the main method of PingExpress_client.py. The connection string is in the format:
     
    tidb://<client_name>:<pwd>@xxx.xxxx.xxxx.xxxx.prod.aws.tidbcloud.com:4000/<DB_name>


    For example, if the client name is PingExpress_client,  the password is 123, and the database name is PingExpressDB, the connection string is:

     
    tidb://PingExpress_client:[email protected]:4000/PingExpressDB


    You can find the host address for TiDB Cloud in Connect to TiDB – Standard Connection – Step 2 in the TiDB Console, in the format: xxx.xxxx.xxxx.xxxx.prod.aws.tidbcloud.com.

  4. Update the timestamp to match the current date:
     
    python3 PingExpress_client.py --update_timestamp


    It may take 5 ~ 10 minutes.

  5. Execute the program:
     
    python3 PingExpress_client.py --execute --clients 50 --duration 100


    The command assumes there will be 50 clients, and the program runs for 200 seconds.

  6. Observe the results. When the program starts running, you can navigate to the Metabase dashboard and observe the dashboard being updated in real time.
Analytics Database TiDB application Build (game engine) Cloud Dashboard (Mac OS) sql

Published at DZone with permission of Phoebe He. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • How to Build a Full-Stack App With Next.js, Prisma, Postgres, and Fastify
  • How Doris Connects to Multiple Databases for Federated Analytics and Data Migration
  • Moving PeopleSoft ERP Data Between Databases With Data Mover Scripts
  • Snowflake and dbt Integration to Enable Advanced Analytics

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook