Increasing Slow Query Performance with the Parallel Query Execution

By Peter Zaitsev · Apr. 25, 15

[This article was written by Alexander Rubin]

MySQL and scaling up (using more powerful hardware) has always been a hot topic. Originally MySQL did not scale well with multiple CPUs; there were times when InnoDB performed worse with more CPU cores than with fewer. MySQL 5.6 scales significantly better; however, one big limitation remains: a single SQL query will ultimately use only one CPU core (no parallelism). Here is what I mean by that: let's say we have a complex query which needs to scan millions of rows and may need to create a temporary table. MySQL will not be able to scan the table in multiple threads (even with partitioning), so the single query will not be faster on a more powerful server. Worse, a server with more but slower CPU cores will show poorer performance than a server with fewer (but faster) cores.

To address this issue we can use parallel query execution. Vadim wrote about PHP asynchronous calls for MySQL. Another way to increase parallelism is the "sharding" approach, for example with Shard Query. I decided to test out parallel (asynchronous) query execution on a relatively large table: the US Flights Ontime performance database, which Vadim originally used in the old post Analyzing air traffic performance. Let's see how this can help us increase the performance of complex query reports.

Parallel Query Example

To illustrate the parallel query execution with MySQL I’ve created the following table:

CREATE TABLE `ontime` (
`YearD` year(4) NOT NULL,
`Quarter` tinyint(4) DEFAULT NULL,
`MonthD` tinyint(4) DEFAULT NULL,
`DayofMonth` tinyint(4) DEFAULT NULL,
`DayOfWeek` tinyint(4) DEFAULT NULL,
`FlightDate` date DEFAULT NULL,
`UniqueCarrier` char(7) DEFAULT NULL,
`AirlineID` int(11) DEFAULT NULL,
`Carrier` char(2) DEFAULT NULL,
`TailNum` varchar(50) DEFAULT NULL,
`FlightNum` varchar(10) DEFAULT NULL,
`OriginAirportID` int(11) DEFAULT NULL,
`OriginAirportSeqID` int(11) DEFAULT NULL,
`OriginCityMarketID` int(11) DEFAULT NULL,
`Origin` char(5) DEFAULT NULL,
`OriginCityName` varchar(100) DEFAULT NULL,
`OriginState` char(2) DEFAULT NULL,
`OriginStateFips` varchar(10) DEFAULT NULL,
`OriginStateName` varchar(100) DEFAULT NULL,
`OriginWac` int(11) DEFAULT NULL,
`DestAirportID` int(11) DEFAULT NULL,
`DestAirportSeqID` int(11) DEFAULT NULL,
`DestCityMarketID` int(11) DEFAULT NULL,
`Dest` char(5) DEFAULT NULL,
-- ... (a number of fields removed)
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
KEY `YearD` (`YearD`),
KEY `Carrier` (`Carrier`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

I then loaded 26 years of data into it. The table is 56GB with ~152M rows.
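
The original post doesn't show the load step. As a rough sketch, something like the following could bulk-load one file per year (the ontime_YYYY.csv file names are hypothetical, and a real load would need an explicit column list, since the table's id column is auto-generated rather than present in the files):

#!/bin/bash
# Rough sketch: bulk-load one CSV of ontime data per year
# (hypothetical file names; add an explicit column list for a real load)
for y in {1988..2013}; do
  mysql --local-infile=1 ontime -e \
    "LOAD DATA LOCAL INFILE 'ontime_$y.csv' INTO TABLE ontime
     FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' IGNORE 1 LINES"
done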

Software: Percona Server 5.6.15-63.0. Hardware: Supermicro X8DTG-D, 48GB of RAM, 24x Intel(R) Xeon(R) CPU L5639 @ 2.13GHz, 1x SSD drive (250GB).

So we have 24 relatively slow CPU cores.

Simple Query

Now we can run some queries. The first query is very simple: count all flights per year (in the US):

select yeard, count(*) from ontime group by yeard

As we have the index on YearD, the query will use the index:

mysql> explain select yeard, count(*) from ontime group by yeard\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ontime
type: index
possible_keys: YearD,comb1
key: YearD
key_len: 1
ref: NULL
rows: 148046200
Extra: Using index
1 row in set (0.00 sec)

The query is simple; however, it will have to scan 150M rows. Here are the results of the query (cached):

mysql> select yeard, count(*) from ontime group by yeard;
+-------+----------+
| yeard | count(*) |
+-------+----------+
| 1988  | 5202096  |
| 1989  | 5041200  |
| 1990  | 5270893  |
| 1991  | 5076925  |
| 1992  | 5092157  |
| 1993  | 5070501  |
| 1994  | 5180048  |
| 1995  | 5327435  |
| 1996  | 5351983  |
| 1997  | 5411843  |
| 1998  | 5384721  |
| 1999  | 5527884  |
| 2000  | 5683047  |
| 2001  | 5967780  |
| 2002  | 5271359  |
| 2003  | 6488540  |
| 2004  | 7129270  |
| 2005  | 7140596  |
| 2006  | 7141922  |
| 2007  | 7455458  |
| 2008  | 7009726  |
| 2009  | 6450285  |
| 2010  | 6450117  |
| 2011  | 6085281  |
| 2012  | 6096762  |
| 2013  | 5349447  |
+-------+----------+
26 rows in set (54.10 sec)

The query took 54 seconds and utilized only 1 CPU core. However, this query is perfect for running in parallel. We can run 26 parallel queries, each counting its own year. I used the following shell script to run the queries in the background:

#!/bin/bash
date
mkdir -p par_sql1   # directory for the per-year logs
for y in {1988..2013}
do
  # one COUNT per year, each run by a background mysql client
  sql="select yeard, count(*) from ontime where yeard=$y"
  mysql -vvv ontime -e "$sql" &>par_sql1/$y.log &
done
wait   # block until all 26 background queries finish
date

Here are the results:

par_sql1/1988.log:1 row in set (3.70 sec)
par_sql1/1989.log:1 row in set (4.08 sec)
par_sql1/1990.log:1 row in set (4.59 sec)
par_sql1/1991.log:1 row in set (4.26 sec)
par_sql1/1992.log:1 row in set (4.54 sec)
par_sql1/1993.log:1 row in set (2.78 sec)
par_sql1/1994.log:1 row in set (3.41 sec)
par_sql1/1995.log:1 row in set (4.87 sec)
par_sql1/1996.log:1 row in set (4.41 sec)
par_sql1/1997.log:1 row in set (3.69 sec)
par_sql1/1998.log:1 row in set (3.56 sec)
par_sql1/1999.log:1 row in set (4.47 sec)
par_sql1/2000.log:1 row in set (4.71 sec)
par_sql1/2001.log:1 row in set (4.81 sec)
par_sql1/2002.log:1 row in set (4.19 sec)
par_sql1/2003.log:1 row in set (4.04 sec)
par_sql1/2004.log:1 row in set (5.12 sec)
par_sql1/2005.log:1 row in set (5.10 sec)
par_sql1/2006.log:1 row in set (4.93 sec)
par_sql1/2007.log:1 row in set (5.29 sec)
par_sql1/2008.log:1 row in set (5.59 sec)
par_sql1/2009.log:1 row in set (4.44 sec)
par_sql1/2010.log:1 row in set (4.91 sec)
par_sql1/2011.log:1 row in set (5.08 sec)
par_sql1/2012.log:1 row in set (4.85 sec)
par_sql1/2013.log:1 row in set (4.56 sec)
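
The total wall-clock time is bounded by the slowest year, so the whole report now takes roughly 5-6 seconds instead of 54 (about 10x faster). Since each log holds a single year's count, stitching the full report back together is trivial; here is a sketch, assuming the mysql -vvv table-formatted output produced by the script above:

#!/bin/bash
# Sketch: merge the per-year counts back into one report, sorted by year
# (with -t'|', field 2 of the table-formatted log lines is yeard)
grep -h '^|' par_sql1/*.log | grep -v yeard | sort -t'|' -k2 -n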

Complex Query 

Now we can try a more complex query. Let's imagine we want to find out which airlines had the most delays for flights inside the continental US on business days from 1988 to 2009 (I was deliberately trying to come up with a complex query with multiple conditions in the WHERE clause):

select
   min(yeard), max(yeard), Carrier, count(*) as cnt,
   sum(ArrDelayMinutes>30) as flights_delayed,
   round(sum(ArrDelayMinutes>30)/count(*),2) as rate
FROM ontime
WHERE
   DayOfWeek not in (6,7) and OriginState not in ('AK', 'HI', 'PR', 'VI')
   and DestState not in ('AK', 'HI', 'PR', 'VI')
   and flightdate < '2010-01-01'
GROUP by carrier
HAVING cnt > 100000 and max(yeard) > 1990
ORDER by rate DESC

As the query has GROUP BY and ORDER BY plus multiple ranges in the WHERE clause, it will have to create a temporary table:

id: 1
select_type: SIMPLE
table: ontime
type: index
possible_keys: comb1
key: comb1
key_len: 9
ref: NULL
rows: 148046200
Extra: Using where; Using temporary; Using filesort

(For this query I created the combined index KEY comb1 (Carrier, YearD, ArrDelayMinutes) to increase performance.)
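
The original doesn't show the exact DDL; adding such an index would look roughly like this (note that on a 56GB table the ALTER itself takes a while):

# Sketch: add the combined index used by the complex query
mysql ontime -e "ALTER TABLE ontime ADD KEY comb1 (Carrier, YearD, ArrDelayMinutes)"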

The query runs in ~15 minutes:

+------------+------------+---------+----------+-----------------+------+
| min(yeard) | max(yeard) | Carrier | cnt      | flights_delayed | rate |
+------------+------------+---------+----------+-----------------+------+
|       2003 |       2009 | EV      |  1454777 |          237698 | 0.16 |
|       2006 |       2009 | XE      |  1016010 |          152431 | 0.15 |
|       2006 |       2009 | YV      |   740608 |          110389 | 0.15 |
|       2003 |       2009 | B6      |   683874 |          103677 | 0.15 |
|       2003 |       2009 | FL      |  1082489 |          158748 | 0.15 |
|       2003 |       2005 | DH      |   501056 |           69833 | 0.14 |
|       2001 |       2009 | MQ      |  3238137 |          448037 | 0.14 |
|       2003 |       2006 | RU      |  1007248 |          126733 | 0.13 |
|       2004 |       2009 | OH      |  1195868 |          160071 | 0.13 |
|       2003 |       2006 | TZ      |   136735 |           16496 | 0.12 |
|       1988 |       2009 | UA      |  9593284 |         1197053 | 0.12 |
|       1988 |       2009 | AA      | 10600509 |         1185343 | 0.11 |
|       1988 |       2001 | TW      |  2659963 |          280741 | 0.11 |
|       1988 |       2009 | CO      |  6029149 |          673863 | 0.11 |
|       2007 |       2009 | 9E      |   577244 |           59440 | 0.10 |
|       1988 |       2009 | DL      | 11869471 |         1156267 | 0.10 |
|       1988 |       2009 | NW      |  7601727 |          725460 | 0.10 |
|       1988 |       2009 | AS      |  1506003 |          146920 | 0.10 |
|       2003 |       2009 | OO      |  2654259 |          257069 | 0.10 |
|       1988 |       2009 | US      | 10276941 |          991016 | 0.10 |
|       1988 |       1991 | PA      |   206841 |           19465 | 0.09 |
|       1988 |       2005 | HP      |  2607603 |          235675 | 0.09 |
|       1988 |       2009 | WN      | 12722174 |         1107840 | 0.09 |
|       2005 |       2009 | F9      |   307569 |           28679 | 0.09 |
+------------+------------+---------+----------+-----------------+------+
24 rows in set (15 min 56.40 sec)

Now we can split this query and run 31 queries (one for each of the 31 distinct airlines in this table) in parallel. I used the following script:

#!/bin/bash
date
mkdir -p par_sql_complex   # directory for the per-carrier logs
# one background query per carrier; the shared WHERE clause stays the same
for c in '9E' 'AA' 'AL' 'AQ' 'AS' 'B6' 'CO' 'DH' 'DL' 'EA' 'EV' 'F9' 'FL' 'HA' 'HP' 'ML' 'MQ' 'NW' 'OH' 'OO' 'PA' 'PI' 'PS' 'RU' 'TW' 'TZ' 'UA' 'US' 'WN' 'XE' 'YV'
do
   sql="select min(yeard), max(yeard), Carrier, count(*) as cnt, sum(ArrDelayMinutes>30) as flights_delayed, round(sum(ArrDelayMinutes>30)/count(*),2) as rate FROM ontime WHERE DayOfWeek not in (6,7) and OriginState not in ('AK', 'HI', 'PR', 'VI') and DestState not in ('AK', 'HI', 'PR', 'VI') and flightdate < '2010-01-01' and carrier = '$c'"
   mysql -uroot -vvv ontime -e "$sql" &>par_sql_complex/$c.log &
done
wait   # block until all 31 background queries finish
date

In this case we also avoid creating a temporary table, as we have an index which starts with Carrier.
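
To verify, one can EXPLAIN a single-carrier variant of the query; with the comb1 index in place, the Extra column should no longer show "Using temporary" (a sketch, output omitted; 'AA' is just one example carrier):

# Sketch: check the plan for one carrier's query
mysql ontime -e "explain select min(yeard), max(yeard), Carrier, count(*) as cnt,
  sum(ArrDelayMinutes>30) as flights_delayed
  FROM ontime
  WHERE DayOfWeek not in (6,7)
    and OriginState not in ('AK','HI','PR','VI')
    and DestState not in ('AK','HI','PR','VI')
    and flightdate < '2010-01-01' and carrier = 'AA'\G"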

Results: total time is 5 min 47 seconds (3x faster)

Start: 15:41:02 EST 2013
  End: 15:46:49 EST 2013

Per query statistics:

par_sql_complex/9E.log:1 row in set (44.47 sec)
par_sql_complex/AA.log:1 row in set (5 min 41.13 sec)
par_sql_complex/AL.log:1 row in set (15.81 sec)
par_sql_complex/AQ.log:1 row in set (14.52 sec)
par_sql_complex/AS.log:1 row in set (2 min 43.01 sec)
par_sql_complex/B6.log:1 row in set (1 min 26.06 sec)
par_sql_complex/CO.log:1 row in set (3 min 58.07 sec)
par_sql_complex/DH.log:1 row in set (31.30 sec)
par_sql_complex/DL.log:1 row in set (5 min 47.07 sec)
par_sql_complex/EA.log:1 row in set (28.58 sec)
par_sql_complex/EV.log:1 row in set (2 min 6.87 sec)
par_sql_complex/F9.log:1 row in set (46.18 sec)
par_sql_complex/FL.log:1 row in set (1 min 30.83 sec)
par_sql_complex/HA.log:1 row in set (39.42 sec)
par_sql_complex/HP.log:1 row in set (2 min 45.57 sec)
par_sql_complex/ML.log:1 row in set (4.64 sec)
par_sql_complex/MQ.log:1 row in set (2 min 22.55 sec)
par_sql_complex/NW.log:1 row in set (4 min 26.67 sec)
par_sql_complex/OH.log:1 row in set (1 min 9.67 sec)
par_sql_complex/OO.log:1 row in set (2 min 14.97 sec)
par_sql_complex/PA.log:1 row in set (17.62 sec)
par_sql_complex/PI.log:1 row in set (14.52 sec)
par_sql_complex/PS.log:1 row in set (3.46 sec)
par_sql_complex/RU.log:1 row in set (40.14 sec)
par_sql_complex/TW.log:1 row in set (2 min 32.32 sec)
par_sql_complex/TZ.log:1 row in set (14.16 sec)
par_sql_complex/UA.log:1 row in set (4 min 55.18 sec)
par_sql_complex/US.log:1 row in set (4 min 38.08 sec)
par_sql_complex/WN.log:1 row in set (4 min 56.12 sec)
par_sql_complex/XE.log:1 row in set (24.21 sec)
par_sql_complex/YV.log:1 row in set (20.82 sec)

As we can see, the large airlines (like AA, UA, US, DL, etc.) took most of the time. In this case the load is not distributed as evenly as in the previous example; however, by running the queries in parallel we got a ~3x better response time on this server.

CPU utilization:

Cpu3 : 22.0%us, 1.2%sy, 0.0%ni, 74.4%id, 2.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 16.0%us, 0.0%sy, 0.0%ni, 84.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 39.0%us, 1.2%sy, 0.0%ni, 56.1%id, 3.7%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 33.3%us, 0.0%sy, 0.0%ni, 51.9%id, 13.6%wa, 0.0%hi, 1.2%si, 0.0%st
Cpu7 : 33.3%us, 1.2%sy, 0.0%ni, 48.8%id, 16.7%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu8 : 24.7%us, 0.0%sy, 0.0%ni, 60.5%id, 14.8%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu9 : 24.4%us, 0.0%sy, 0.0%ni, 56.1%id, 19.5%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu10 : 40.7%us, 0.0%sy, 0.0%ni, 56.8%id, 2.5%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 19.5%us, 1.2%sy, 0.0%ni, 65.9%id, 12.2%wa, 0.0%hi, 1.2%si, 0.0%st
Cpu12 : 40.2%us, 1.2%sy, 0.0%ni, 56.1%id, 2.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu13 : 82.7%us, 0.0%sy, 0.0%ni, 17.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu14 : 55.4%us, 0.0%sy, 0.0%ni, 43.4%id, 1.2%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu15 : 86.6%us, 0.0%sy, 0.0%ni, 13.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu16 : 61.0%us, 1.2%sy, 0.0%ni, 37.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu17 : 29.3%us, 1.2%sy, 0.0%ni, 69.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu18 : 18.8%us, 0.0%sy, 0.0%ni, 52.5%id, 28.8%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu19 : 14.3%us, 1.2%sy, 0.0%ni, 57.1%id, 27.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu20 : 12.3%us, 0.0%sy, 0.0%ni, 59.3%id, 28.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu21 : 10.7%us, 0.0%sy, 0.0%ni, 76.2%id, 11.9%wa, 0.0%hi, 1.2%si, 0.0%st
Cpu22 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu23 : 10.8%us, 2.4%sy, 0.0%ni, 71.1%id, 15.7%wa, 0.0%hi, 0.0%si, 0.0%st

Note that in the case of ORDER BY we will need to sort the combined results manually; however, sorting 10-100 rows is fast.
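
Here is a sketch of that manual merge, assuming the par_sql_complex/*.log files produced above. Note that the HAVING conditions from the original query must also be re-applied here, since each per-carrier query returns a single unfiltered row:

#!/bin/bash
# Sketch: merge per-carrier rows, re-apply HAVING, and sort by rate descending.
# With -F'|' the table-formatted fields are: $2=min(yeard), $3=max(yeard),
# $4=Carrier, $5=cnt, $6=flights_delayed, $7=rate
grep -h '^|' par_sql_complex/*.log \
  | grep -v Carrier \
  | awk -F'|' '$5 > 100000 && $3 > 1990' \
  | sort -t'|' -k7 -rn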

Conclusion

Splitting a complex report into multiple queries and running them in parallel (asynchronously) can increase performance (3x to 10x in the above examples) and make better use of modern hardware. It is also possible to split the queries between multiple MySQL servers (e.g., MySQL slave servers) to further increase scalability (this will require more coding).
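
As a closing illustration, here is a minimal sketch of that multi-server variant for the per-year report (the replica host names are hypothetical, and each replica is assumed to hold a full copy of the ontime table):

#!/bin/bash
# Sketch: round-robin the per-year queries across several MySQL replicas
hosts=(replica1 replica2 replica3)   # hypothetical replica host names
mkdir -p par_sql1
i=0
for y in {1988..2013}; do
  h=${hosts[i % ${#hosts[@]}]}
  mysql -h "$h" -vvv ontime -e \
    "select yeard, count(*) from ontime where yeard=$y" &>par_sql1/$y.log &
  i=$((i + 1))
done
wait   # all queries must finish before merging the logs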


Published at DZone with permission of Peter Zaitsev, DZone MVB. See the original article here.
