Containing Select Columns in Aggregate Functions to Improve Performance in MS SQL
Dramatically improve performance in MS SQL by adding select columns to avoid working with joins and aggregate functions in the same query.
Join the DZone community and get the full member experience.
Join For FreeRecently I have been working on some MS SQL statements when I noticed a trick that improves performance by almost double. And that most people don’t know.
When working with joins and aggregate functions in the same query, we frequently face this error when we don’t add the select columns to the group by clause, even though the query will return one row by logic
For example, the column "Customer.Fname" is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
select Cx.Fname,C.Name as ContactName,C.Email,C.Phone, O.OrderDate,Sum (L.Price) as TotalOrderPrice
from Customer Cx
join Orders O on Cx.Id = O.CXID
join OrderItem OI on OI.OrderID = O.Id
join LineItems L on OI.LineItemID = L.Id
left join Contacts C on O.ContactId = C.Id
where OrderDate between'1-1-2017' and '1-31-2017'
group by O.Id
The common solution is to add all the selected columns to the group by clause
select Cx.Fname,C.Name as ContactName,C.Email,C.Phone, O.OrderDate,Sum (L.Price) as TotalOrderPrice
from Customer Cx
join Orders O on Cx.Id = O.CXID
join OrderItem OI on OI.OrderID = O.Id
join LineItems L on OI.LineItemID = L.Id
left join Contacts C on O.ContactId = C.Id
where OrderDate between'1-1-2017' and '1-31-2017'
group by O.Id,Cx.Fname,C.Name,C.Email,C.Phone, O.OrderDate
This leads to the following execution plan:
As you can see, the execution plan includes a sort that consumes 38% of the overall cost.
This sort is caused be the aggregate function, even though logically the outcome would be one row per order.
A good solution for that would be to use aggregate functions on the select columns that you know will not have more than one value
For example:
select max(Cx.Fname),max(C.Name) as ContactName,max(C.Email),max(C.Phone), max(O.OrderDate),Sum (L.Price) as TotalOrderPrice
from Customer Cx
join Orders O on Cx.Id = O.CXID
join OrderItem OI on OI.OrderID = O.Id
join LineItems L on OI.LineItemID = L.Id
left join Contacts C on O.ContactId = C.Id
where OrderDate between'1-1-2017' and '1-31-2017'
group by O.Id
Now the execution plan will be like so:
As you can see, the sort step is gone.
Now, if you run the batch, you will see that the cost of the code goes from 62% to 38%.
Opinions expressed by DZone contributors are their own.
Comments