DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
The Latest "Software Integration: The Intersection of APIs, Microservices, and Cloud-Based Systems" Trend Report
Get the report
  1. DZone
  2. Data Engineering
  3. Data
  4. Speed Up GROUP BY Queries With Subselects in MySQL

Speed Up GROUP BY Queries With Subselects in MySQL

OFten we try to avoid subselects since they force the use of a temporary table. Learn to speed up GROUP BY queries in MySQL using subselects in this tutorial.

Peter Zaitsev user avatar by
Peter Zaitsev
·
Jun. 17, 15 · Interview
Like (0)
Save
Tweet
Share
5.58K Views

Join the DZone community and get the full member experience.

Join For Free

[This article was written by David Ducos]

We usually try to avoid subselects because sometimes they force the use of a temporary table and limits the use of indexes. But, when is good to use a subselect?

This example was tested over table a (1310723 rows), b, c and d ( 5 rows each) and with MySQL version 5.5 and 5.6.

Let’s suppose we have a query like this:

select a.name,sum(a.count) aSum,avg(a.position) aAVG,b.col1,c.col2,d.col3
from
a join
b on (a.bid = b.id) join
c on (a.cid = c.id) join
d on (a.did = d.id)
group by a.name,b.id,c.id,d.id

What will MySQL do? First it will take the entire data set – this means that will go through each row scanning the value of  “bid,” “cid” and “did” and then apply the join to each table. At this point it has the complete data set and then it will start to cluster it, executing the sum and the average functions.

Let’s analyze it step by step:

  1. Scan each row of  table a which has 1310720 rows.
  2. Join each row of table a with b, c and d – this means that each of the 1310720 rows will be joined, making the temporary table bigger.
  3. Execute the group by which will scan again the 1310720 rows and creating the result data set.

What can we do to optimize this query? We can’t avoid the group by over the 1.3M rows, but we are able to avoid the join over 1.3M of rows. How? We need all of the information from table a for the “group by” but we don’t need to execute all the joins before clustering them. Let’s rewrite the query:

select a.name,aSum,aAVG,b.col1,c.col2,d.col3
from
( select name,sum(count) aSum ,avg(position) aAVG,bid,cid,did
  from a
  group by name,bid,cid,did) a join
b on (a.bid = b.id) join
c on (a.cid = c.id) join
d on (a.did = d.id)

We see from the above query that we are doing the “group by” only over table a, the result data set of that subquery is just 20 rows. But what about the query response time? The first query took 2.3 sec avg and the optimized query took 1.8 sec average, half a second faster.

What about adding a covering index? The index that we can add will be:

alter table a add index (name,bid,cid,did,count,position);

The explain plan of both queries shows that it is using just the index to resolve the query.

Now, the response time of the original query is 1.9 sec which is near the time of the optimized query. However, the response time of the optimized query now is 0.7 sec, nearly 3x faster. The cons of adding this index is that we are indexing the whole table and it shows that the index length is near 80% of the data length.

If the original query had “where” conditions, it will depend over which field. Let’s suppose add c.col2=3:
select a.name,sum(a.count) aSum,avg(a.position) aAVG,b.col1,c.col2,d.col3
from
a join
b on (a.bid = b.id) join
c on (a.cid = c.id) join
d on (a.did = d.id)
where c.col2=3
group by a.name,b.id,c.id,d.id




Now, in the new query, the subquery will change. Table c and the “where” clause must be added to the subquery:
select a.name,aSum,aAVG,b.col1,a.col2,d.col3
from
( select a.name,sum(count) aSum ,avg(position) aAVG,bid,cid,did,c.col2
 from a join
 c on (a.cid = c.id)
 where c.col2=3
 group by name,bid,cid,did) a join
b on (a.bid = b.id) join
d on (a.did = d.id)

But the differences in times are not as big (original query 1.1 sec and new query 0.9). Why? because the original query will have less data to group by. Adding c.col2=3 to the original query, the amount of data to group by is reduced from 1.3M to 262k. Indeed, if you add more “where” conditions on different tables, the dataset to sort will be smaller and the speed-up will decrease.

Conclusion: We usually add the GROUP BY at the end of queries, and that is ok because the syntax forces us to do it. However we can use a subquery to group only the data that we need and then perform the joins over other tables. This could speed up some of our GROUP BY queries.

Database MySQL Data set Data (computing)

Published at DZone with permission of Peter Zaitsev, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Spring Boot, Quarkus, or Micronaut?
  • Introduction to Containerization
  • mTLS Everywere
  • Stop Using Spring Profiles Per Environment

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: