Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Containing Select Columns in Aggregate Functions to Improve Performance in MS SQL

DZone's Guide to

Containing Select Columns in Aggregate Functions to Improve Performance in MS SQL

Dramatically improve performance in MS SQL by adding select columns to avoid working with joins and aggregate functions in the same query.

· Performance Zone ·
Free Resource

Recently I have been working on some MS SQL statements when I noticed a trick that improves performance by almost double. And that most people don’t know.

When working with joins and aggregate functions in the same query, we frequently face this error when we don’t add the select columns to the group by clause, even though the query will return one row by logic

For example, the column "Customer.Fname" is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

select Cx.Fname,C.Name as ContactName,C.Email,C.Phone, O.OrderDate,Sum (L.Price) as TotalOrderPrice
from Customer Cx
join Orders O on Cx.Id = O.CXID
join OrderItem OI on OI.OrderID = O.Id
join LineItems L on OI.LineItemID = L.Id
left join Contacts C on O.ContactId = C.Id
where OrderDate between'1-1-2017' and '1-31-2017'
group by O.Id

The common solution is to add all the selected columns to the group by clause

select Cx.Fname,C.Name as ContactName,C.Email,C.Phone, O.OrderDate,Sum (L.Price) as TotalOrderPrice
from Customer Cx
join Orders O on Cx.Id = O.CXID
join OrderItem OI on OI.OrderID = O.Id
join LineItems L on OI.LineItemID = L.Id
left join Contacts C on O.ContactId = C.Id
where OrderDate between'1-1-2017' and '1-31-2017'
group by O.Id,Cx.Fname,C.Name,C.Email,C.Phone, O.OrderDate

This leads to the following execution plan:

Image title

As you can see, the execution plan includes a sort that consumes 38% of the overall cost.

This sort is caused be the aggregate function, even though logically the outcome would be one row per order.

A good solution for that would be to use aggregate functions on the select columns that you know will not have more than one value

For example:

select max(Cx.Fname),max(C.Name) as ContactName,max(C.Email),max(C.Phone), max(O.OrderDate),Sum (L.Price) as TotalOrderPrice
from Customer Cx
join Orders O on Cx.Id = O.CXID
join OrderItem OI on OI.OrderID = O.Id
join LineItems L on OI.LineItemID = L.Id
left join Contacts C on O.ContactId = C.Id
where OrderDate between'1-1-2017' and '1-31-2017'
group by O.Id

Now the execution plan will be like so:

Image title

As you can see, the sort step is gone.

Now, if you run the batch, you will see that the cost of the code goes from 62% to 38%.

Topics:
ms sql ,performance ,tutorial ,database ,sql ,performance optimization

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}