Over a million developers have joined DZone.

Pipelining Versus Parallel Query Execution With MySQL 5.7 X Plugin

This post is a look at how the MySQL 5.7 X Plugin performs when querying both in pipeline and in parallel.

· Database Zone

Build fast, scale big with MongoDB Atlas, a hosted service for the leading NoSQL database. Try it now! Brought to you in partnership with MongoDB.

Pipelining versus Parallel Query Execution

In this blog post, we'll look at pipelining versus parallel query execution when using X Plugin for MySQL 5.7.

In my previous blog post, I showed how to use X Plugin for MySQL 5.7 for parallel query execution. The tricks I used to make it work:

  • Partitioning by hash
  • Open N connections to MySQL, where N = number of CPU cores

I had to do it manually (as well as to sort the result at the end) as X Plugin only supports "pipelining" (which only saves the round trip time) and does not "multiplex" connections to MySQL (MySQL does not use multiple CPU cores for a single query).

TL;DR Version

In this (long) post I'm playing with MySQL 5.7 X Plugin / X Protocol and document store. Here is the summary:

  1. X Plugin does not "multiplex" connections/sessions to MySQL. Similar to the original protocol, one connection to X Plugin will result in one session open to MySQL.
  2. An X Plugin query (if the library supports it) returns immediately and does not wait until the query is finished (async call). MySQL works like a queue.
  3. X Plugin does not have any additional server-level durability settings. Unless you check or wait for the acknowledgement (which is asynchronous) from the server, the data might or might not be written into MySQL ("fire and forget").

At the same time, X Protocol can be helpful if:

  • We want to implement an asynchronous client (i.e., we do not want to block the network communication such as downloading or API calls) when the MySQL table is locked.
  • We want to use MySQL as a queue and save the round-trip time.

Benchmark Results

I've done a couple of tests comparing the results between "pipelining" versus "parallelizing" versus a single query. Here are the results:

  1. Parallel queries with NodeJS:
  2. $time node async_wikistats.js
    ...
    All done! Total: 17753
    ...
    real 0m30.668s
    user 0m0.256s
    sys 0m0.028s
    $ time node async_wikistats_pipeline.js
    ...
    All done! Total: 17753
    ...
    real 5m39.666s
    user 0m0.212s
    sys 0m0.024s


    In the pipeline with NojeJS, I’m reusing the same connection (and do not open a new one for each thread).

  3. Direct query – partitioned table:

  4. Direct query – non-partitioned table.

Advantages of Pipelines With X Plugin 

Although pipelining with X Plugin does not significantly increase query response time (it can reduce the total latency), it might be helpful in some cases. For example, let’s say we are downloading something from the Internet and need to save the progress of the download as well as the metadata for the document. In this example, I use youtube-dl to search and download the metadata about YouTube videos, then save the metadata JSON into MySQL 5.7 Document Store. Here is the code:

var mysqlx = require('mysqlx');
# This is the same as running $ youtube-dl -j -i ytsearch100:"mysql 5.7"
const spawn = require('child_process').spawn;
const yt = spawn('youtube-dl', ['-j', '-i', 'ytsearch100:"mysql 5.7"'], {maxBuffer: 1024 * 1024 * 128});

var mySession =
mysqlx.getSession({
    host: 'localhost',
    port: 33060,
    dbUser: 'root',
    dbPassword: '<your password>'
    });

yt.stdout.on('data', (data) => {
    try {
    dataObj = JSON.parse(data);
    console.log(dataObj.fulltitle);
    mySession.then(session => {
        session.getSchema("yt").getCollection("youtube").add(  dataObj  )
        .execute(function (row) {
        }).catch(err => {
            console.log(err);
            })
        .then( function (notices) { console.log("Wrote to MySQL: " + JSON.stringify(notices))  });
        }).catch(function (err) {
            console.log(err);
            process.exit();
            });
        } catch (e) {
            console.log(" --- Can't parse json" + e );
        }
    });
yt.stderr.on('data', (data) => {
    console.log("Error receiving data");
    });
yt.on('close', (code) => {
  console.log(`child process exited with code ${code}`);
  mySession.then(session => {session.close() } );
    });


  • session 1: $ node yt.js > test_lock_table.log
  • session 2:
mysql> lock table youtube read; select sleep(10); unlock tables;
Query OK, 0 rows affected (0.00 sec)
+-----------+
| sleep(10) |
+-----------+
|         0 |
+-----------+
1 row in set (10.01 sec)
Query OK, 0 rows affected (0.00 sec)


Results:

...
Upgrade MySQL Server from 5.5 to 5.7
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["d4d62a8a-fbfa-05ab-2110-2fd5cf6d"]}}
OSC15 - Georgi Kodinov - Secure Deployment Changes Coming in MySQL 5.7
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["8ac1cdb9-1499-544c-da2a-5db1ccf5"]}}
MySQL 5.7: Create JSON string using mysql
FreeBSD 10.3 - Instalación de MySQL 5.7 desde Código Fuente - Source Code
Webinar replay: How To Upgrade to MySQL 5.7 - The Best Practices - part 1
How to install MySQL Server on Mac OS X Yosemite - ltamTube
Webinar replay: How To Upgrade to MySQL 5.7 - The Best Practices - part 4
COMO INSTALAR MYSQL VERSION 5.7.13
MySQL and JSON
MySQL 5.7: Merge JSON data using MySQL
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["a11ff369-6f23-11e9-187b-e3713e6e"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["06143a61-4add-79da-0e1d-c2b52cf6"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["1eb94ef4-db63-cb75-767e-e1555549"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["e25f15b5-8c19-9531-ed69-7b46807a"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["02b5a4c9-6a21-f263-90d5-cd761906"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["e0bef958-10af-b181-81cd-5debaaa0"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["f48fa635-fa63-7481-0668-addabbac"]}}
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["557fa5c5-3c8a-fe01-c17c-549c557e"]}}
MySQL 5.7 Install and Configuration on Ubuntu 14.04
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["456b11d8-ba03-0aec-8e06-9517c6e1"]}}
MySQL WorkBench 6.3 installation on Ubuntu 14.04
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["0b651987-9b23-b5e0-f8f7-49b8ba5c"]}}
Going through era of IoT with MySQL 5.7 - FOSSASIA 2016
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["e133746c-836c-a7e0-3893-292a7429"]}}
MySQL 5.7: MySQL JSON operator example
... => wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["4d13830d-7b30-5b31-d068-c7305e0a"]}}


This can be very helpful when running a "download" process, and the network is a bottleneck. In a traditional synchronous query execution, when we lock a table the application gets blocked (including the network communication). With NodeJS and X Plugin, the download part will proceed with MySQL acting as a queue.

Pipeline Durability

How "durable" this pipeline, you might ask. In other words, what will happen if I will kill the connection? To test it out, I have (once again) locked the table (but now before starting the Node.js), killed the connection and finally unlocked the table. Here are the results:

Session 1:
----------
mysql> truncate table youtube_new;
Query OK, 0 rows affected (0.25 sec)
mysql> lock table youtube_new read;
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from youtube_new;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
Session 2:
----------
(when table is locked)
$ node yt1.js
11 03  MyISAM
Switching to InnoDB from MyISAM
tablas InnoDB a MyISAM
MongoDB vs MyISAM (MariaDB/MySQL)
MySQL Tutorial 35 - Foreign Key Constraints for the InnoDB Storage Engine
phpmyadmin foreign keys myisam innodb
Convert or change database manual from Myisam to Innodb
... >100 other results omited ...
^C
Session 1:
----------
mysql> select count(*) from youtube_new;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
     Id: 4916
   User: root
   Host: localhost:33221
     db: NULL
Command: Query
   Time: 28
  State: Waiting for table metadata lock
   Info: PLUGIN: INSERT INTO `iot`.`youtube_new` (doc) VALUES ('{"upload_date":"20140319","protocol":"
mysql> unlock table;
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from youtube_new;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)
mysql>  select json_unquote(doc->'$.title') from youtube_new;
+---------------------------------+
| json_unquote(doc->'$.title')    |
+---------------------------------+
| 11 03  MyISAM                   |
| Switching to InnoDB from MyISAM |
+---------------------------------+
2 rows in set (0.00 sec)


Please note: In the above, there isn’t a single acknowledgement from the MySQL server. When code receives a response from MySQL it prints "Wrote to MySQL: {"_state":{"rows_affected":1,"doc_ids":["…"]}}". Also, note that when the connection was killed the MySQL process is still there, waiting on the table lock.

What is interesting here is is that only two rows have been inserted into the document store. Is there a "history length" here or some other buffer that we can increase? I've asked Jan Kneschke, one of the authors of the X Protocol, and the answers were:

  • Q: Is there any history length or any buffer and can we tune it?
    • A: There is no “history” or “buffer” at all, it is all at the connector level.
  • Q: Then why were 2 rows finally inserted?
    • To answer this question I’ve collected tcpdump to port 33060 (X Protocol), see below

This is very important information! Keep in mind that the asynchronous pipeline has no durability settings: if the application fails and there are some pending writes, those writes can be lost (or could be written).

To fully understand how the protocol works, I’ve captured tcpdump (Jan Kneschke helped me to analyze it):

tcpdump -i lo -s0 -w tests/node-js-pipelining.pcap "tcp port 33060"


This is what is happening:

  • When I hit CTRL+C, Node closes the connection. As the table is still locked, MySQL can't write to it and will not send the result of the insert back.
  • When the table is unlocked, it starts the first statement despite the fact that the connection has been closed. It then acknowledges the first insert and starts the second one.
  • However, at this point the script (client) has already closed the connection and the final packet (write done, here is the id) gets denied. The X Plugin then finds out that the client closed the connection and stops executing the pipeline.

Actually, this is very similar to how the original MySQL protocol worked. If we kill the script/application, it doesn’t automatically kill the MySQL connection (unless you hit CTRL+C in the MySQL client, sends the kill signal) and the connection waits for the table to get unlocked. When the table is unlocked, it inserts the first statement from a file.

Session 1
---------
mysql> select * from t_sql;
Empty set (0.00 sec)
mysql> lock table t_sql read;
Query OK, 0 rows affected (0.00 sec)
Session 2:
----------
$ mysql iot < t.sql
$ kill -9 ...
[3]   Killed                  mysql iot < t.sql
Session 1:
----------
mysql> show processlist;
+------+------+-----------------+------+---------+---------+---------------------------------+-----------------------------------------------+
| Id   | User | Host            | db   | Command | Time    | State                           | Info                                          |
+------+------+-----------------+------+---------+---------+---------------------------------+-----------------------------------------------+
| 4913 | root | localhost       | iot  | Query   |      41 | Waiting for table metadata lock | insert into t_sql  values('{"test_field":0}') |
+------+------+-----------------+------+---------+---------+---------------------------------+-----------------------------------------------+
4 rows in set (0.00 sec)
mysql> unlock tables;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from t_sql;
+-------------------+
| doc               |
+-------------------+
| {"test_field": 0} |
+-------------------+
1 row in set (0.00 sec)

Enforcing Unique Checks

If I restart my script, it finds the same videos again. We will probably need to enforce the consistency of our data. By default the plugin generates the unique key (_id) for the document, so it prevents inserting the duplicates.

Another way to enforce the unique checks is to create a unique key for youtube id. Here is the updated table structure:

CREATE TABLE `youtube` (
    `doc` json DEFAULT NULL,
    `youtube_id` varchar(11) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.id'))) STORED NOT NULL,
    UNIQUE KEY `youtube_id` (`youtube_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4


I’ve changed the default "_id" column to the YouTube's unique ID. Now when I restart the script it shows:

MySQL 5.7: Merge JSON data using MySQL
{ [Error: Document contains a field value that is not unique but required to be]
  info:
     { severity: 0,
         code: 5116,
         msg: 'Document contains a field value that is not unique but required to be',
         sql_state: 'HY000' } }
... => wrote to MySQL: undefined


Conclusion

Although X Plugin pipelining does not necessarily significantly increase query response (it might save the roundtrip time) it can be helpful for some applications.We might not want to block the network communication (i.e., downloading or API calls) when the MySQL table is locked, for example. At the same time, unless you check/wait for the acknowledgement from the server, the data might or might not be written into MySQL.

Bonus: data analysis

Now we can see what we have downloaded. There are a number of interesting fields in the result:

"is_live": null,
"license": "Standard YouTube License",
"duration": 2965,
"end_time": null,
"playlist": ""mysql 5.7"",
"protocol": "https",
"uploader": "YUI Library",
"_filename": "Douglas Crockford - The JSON Saga--C-JoyNuQJs.mp4",
"age_limit": 0,
"alt_title": null,
"extractor": "youtube",
"format_id": "18",
"fulltitle": "Douglas Crockford: The JSON Saga",
"n_entries": 571,
"subtitles": {},
"thumbnail": "https://i.ytimg.com/vi/-C-JoyNuQJs/hqdefault.jpg",
"categories": ["Science & Technology"],
"display_id": "-C-JoyNuQJs",
"like_count": 251,
"player_url": null,
"resolution": "640x360",
"start_time": null,
"thumbnails": [{
"id": "0",
"url": "https://i.ytimg.com/vi/-C-JoyNuQJs/hqdefault.jpg"
}],
"view_count": 36538,
"annotations": null,
"description": "Yahoo! JavaScript architect Douglas Crockford tells the story of how JSON was discovered and how it became a major standard for describing data.",
"format_note": "medium",
"playlist_id": ""mysql 5.7"",
"upload_date": "20110828",
"uploader_id": "yuilibrary",
"webpage_url": "https://www.youtube.com/watch?v=-C-JoyNuQJs",
"uploader_url": "http://www.youtube.com/user/yuilibrary",
"dislike_count": 5,
"extractor_key": "Youtube",
"average_rating": 4.921875,
"playlist_index": 223,
"playlist_title": null,
"automatic_captions": {},
"requested_subtitles": null,
"webpage_url_basename": "-C-JoyNuQJs"


We can see the most popular videos. To do that I’ve added one more virtual field on view_count, and created an index on it:

CREATE TABLE `youtube` (
    `doc` json DEFAULT NULL,
    `youtube_id` varchar(11) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.id'))) STORED NOT NULL,
    `view_count` int(11) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$.view_count'))) VIRTUAL,
    UNIQUE KEY `youtube_id` (`youtube_id`),
    KEY `view_count` (`view_count`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4


Now it's easier than ever to get started with MongoDB, the database that allows startups and enterprises alike to rapidly build planet-scale apps. Introducing MongoDB Atlas, the official hosted service for the database on AWS. Try it now! Brought to you in partnership with MongoDB.

Topics:
protocol ,parallel ,mysql ,query ,table

Published at DZone with permission of Alexander Rubin, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}