DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
  1. DZone
  2. Data Engineering
  3. Databases
  4. Containing Select Columns in Aggregate Functions to Improve Performance in MS SQL

Containing Select Columns in Aggregate Functions to Improve Performance in MS SQL

Dramatically improve performance in MS SQL by adding select columns to avoid working with joins and aggregate functions in the same query.

Noura Elarabi user avatar by
Noura Elarabi
·
Oct. 01, 18 · Tutorial
Like (2)
Save
Tweet
Share
8.37K Views

Join the DZone community and get the full member experience.

Join For Free

Recently I have been working on some MS SQL statements when I noticed a trick that improves performance by almost double. And that most people don’t know.

When working with joins and aggregate functions in the same query, we frequently face this error when we don’t add the select columns to the group by clause, even though the query will return one row by logic

For example, the column "Customer.Fname" is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

select Cx.Fname,C.Name as ContactName,C.Email,C.Phone, O.OrderDate,Sum (L.Price) as TotalOrderPrice
from Customer Cx
join Orders O on Cx.Id = O.CXID
join OrderItem OI on OI.OrderID = O.Id
join LineItems L on OI.LineItemID = L.Id
left join Contacts C on O.ContactId = C.Id
where OrderDate between'1-1-2017' and '1-31-2017'
group by O.Id

The common solution is to add all the selected columns to the group by clause

select Cx.Fname,C.Name as ContactName,C.Email,C.Phone, O.OrderDate,Sum (L.Price) as TotalOrderPrice
from Customer Cx
join Orders O on Cx.Id = O.CXID
join OrderItem OI on OI.OrderID = O.Id
join LineItems L on OI.LineItemID = L.Id
left join Contacts C on O.ContactId = C.Id
where OrderDate between'1-1-2017' and '1-31-2017'
group by O.Id,Cx.Fname,C.Name,C.Email,C.Phone, O.OrderDate

This leads to the following execution plan:

Image title

As you can see, the execution plan includes a sort that consumes 38% of the overall cost.

This sort is caused be the aggregate function, even though logically the outcome would be one row per order.

A good solution for that would be to use aggregate functions on the select columns that you know will not have more than one value

For example:

select max(Cx.Fname),max(C.Name) as ContactName,max(C.Email),max(C.Phone), max(O.OrderDate),Sum (L.Price) as TotalOrderPrice
from Customer Cx
join Orders O on Cx.Id = O.CXID
join OrderItem OI on OI.OrderID = O.Id
join LineItems L on OI.LineItemID = L.Id
left join Contacts C on O.ContactId = C.Id
where OrderDate between'1-1-2017' and '1-31-2017'
group by O.Id

Now the execution plan will be like so:

Image title

As you can see, the sort step is gone.

Now, if you run the batch, you will see that the cost of the code goes from 62% to 38%.

Microsoft SQL Server Database sql

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Why Open Source Is Much More Than Just a Free Tier
  • SAST: How Code Analysis Tools Look for Security Flaws
  • Memory Debugging: A Deep Level of Insight
  • Microservices Discovery With Eureka

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: