CASE Statement in GROUP BY

Sometimes tackling a problem in a single query isn't the best approach. We have a look at a scenario where a CASE statement is needed in a GROUP BY clause. What's the best way to handle this?

By Grant Fritchey · Updated Jun. 29, 2016 · Tutorial

Set-based operations mean you should put everything into a single statement, right?

Well, not really. People seem to think that having two queries is really bad, so when faced with logical gaps, they just cram the missing logic into the query they already have. This is partly because SQL Server and T-SQL let you do it, and partly because it looks like a logical extension of code reuse to arrive at a query structure that supports multiple logic chains. However, let's explore what happens when you do this in one particular situation: a CASE statement in a GROUP BY clause.

You see this a lot because a given set of data may be needed in slightly different contexts by different groups within the company. Like many of my example queries, this could be better written. Like many of my example queries, it mirrors what I see in the wild (and for those following along at home, I'm now using the WideWorldImporters database for my tests):

CREATE PROCEDURE dbo.InvoiceGrouping (@x INT)
AS
SELECT  SUM(il.UnitPrice),
        COUNT(i.ContactPersonID),
        COUNT(i.AccountsPersonID),
        COUNT(i.SalespersonPersonID)
FROM    Sales.Invoices AS i
JOIN    Sales.InvoiceLines AS il
        ON il.InvoiceID = i.InvoiceID
GROUP BY CASE WHEN @x = 7 THEN i.ContactPersonID
              WHEN @x = 15 THEN i.AccountsPersonID
              ELSE i.SalespersonPersonID
         END;
GO
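
If you want to follow along, you can exercise each branch of the CASE expression like this (the value 42 is just an arbitrary stand-in for "anything other than 7 or 15"):

EXEC dbo.InvoiceGrouping @x = 7;    -- groups by ContactPersonID
EXEC dbo.InvoiceGrouping @x = 15;   -- groups by AccountsPersonID
EXEC dbo.InvoiceGrouping @x = 42;   -- any other value: groups by SalespersonPersonID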


Running this for any of the values above, 7, 15, or anything else, you'll get the same execution plan, regardless of the column used in the GROUP BY. However, parameter sniffing is still something of a factor. When you group this data by SalespersonPersonID, you only get 10 rows back, and that will be shown as the estimated number of rows returned if some value other than 7 or 15 is used as the parameter. Whichever value you pass, though, this is always the plan:

[Execution plan for the generic, CASE-based grouping query]

We can eliminate parameter sniffing from the equation, if we want to, by modifying the query like this:

CREATE PROCEDURE dbo.InvoiceGrouping_NoSniff (@x INT)
AS
DECLARE @x2 INT;
SET @x2 = @x;

SELECT  SUM(il.UnitPrice),
        COUNT(i.ContactPersonID),
        COUNT(i.AccountsPersonID),
        COUNT(i.SalespersonPersonID)
FROM    Sales.Invoices AS i
JOIN    Sales.InvoiceLines AS il
        ON il.InvoiceID = i.InvoiceID
GROUP BY CASE WHEN @x2 = 7 THEN i.ContactPersonID
              WHEN @x2 = 15 THEN i.AccountsPersonID
              ELSE i.SalespersonPersonID
         END;
GO


However, except for some deviation in the estimated number of rows (since the optimizer is now averaging over the rows returned), the execution plan is the same.
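
As an aside, another common way to take parameter sniffing out of the picture, one the original tests don't use, is a statement-level RECOMPILE hint. A minimal sketch (the procedure name here is mine):

CREATE PROCEDURE dbo.InvoiceGrouping_Recompile (@x INT)
AS
SELECT  SUM(il.UnitPrice),
        COUNT(i.ContactPersonID),
        COUNT(i.AccountsPersonID),
        COUNT(i.SalespersonPersonID)
FROM    Sales.Invoices AS i
JOIN    Sales.InvoiceLines AS il
        ON il.InvoiceID = i.InvoiceID
GROUP BY CASE WHEN @x = 7 THEN i.ContactPersonID
              WHEN @x = 15 THEN i.AccountsPersonID
              ELSE i.SalespersonPersonID
         END
OPTION (RECOMPILE); -- compile a fresh plan for each parameter value
GO

That buys accurate estimates at the cost of a compile on every execution, and it still doesn't address the underlying problem with the CASE in the GROUP BY.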

What's the big deal, right? Well, let's break the code down into three different procedures:

CREATE PROCEDURE dbo.InvoiceGrouping_Contact
AS
SELECT  SUM(il.UnitPrice),
        COUNT(i.ContactPersonID),
        COUNT(i.AccountsPersonID),
        COUNT(i.SalespersonPersonID)
FROM    Sales.Invoices AS i
JOIN    Sales.InvoiceLines AS il
        ON il.InvoiceID = i.InvoiceID
GROUP BY i.ContactPersonID;
GO

CREATE PROCEDURE dbo.InvoiceGrouping_Sales
AS
SELECT  SUM(il.UnitPrice),
        COUNT(i.ContactPersonID),
        COUNT(i.AccountsPersonID),
        COUNT(i.SalespersonPersonID)
FROM    Sales.Invoices AS i
JOIN    Sales.InvoiceLines AS il
        ON il.InvoiceID = i.InvoiceID
GROUP BY i.SalespersonPersonID;
GO

CREATE PROCEDURE dbo.InvoiceGrouping_Account
AS
SELECT  SUM(il.UnitPrice),
        COUNT(i.ContactPersonID),
        COUNT(i.AccountsPersonID),
        COUNT(i.SalespersonPersonID)
FROM    Sales.Invoices AS i
JOIN    Sales.InvoiceLines AS il
        ON il.InvoiceID = i.InvoiceID
GROUP BY i.AccountsPersonID;
GO


Interestingly enough, these three queries produce a nearly identical execution plan. The one big difference is that the Compute Scalar operator used to generate a value for the Hash Match aggregate is no longer in the plan:

[Execution plan for one of the targeted grouping queries]

It's the same basic set of structures, scans against both tables, to arrive at the data. The cost estimates between the two plans are very different, though, with the targeted queries having a much lower estimated cost.

Performance-wise, interestingly enough, the first query, returning only the 10 rows, ran in 157ms on average, while the query grouping directly on SalespersonPersonID averaged about 190ms. The reads tell a slightly different story, though: 17,428 on the generic query versus 5,721 on the specific query. So maybe a server under load would see a significant performance difference. However, let's deal with what we have in front of us and say that, at least for these tests, the catch-all GROUP BY query performs well.
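
If you want to capture similar duration and read numbers yourself, one simple way (not necessarily how these measurements were taken) is:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

EXEC dbo.InvoiceGrouping @x = 42;   -- the generic, CASE-based query
EXEC dbo.InvoiceGrouping_Sales;     -- the targeted query

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

The Messages tab in Management Studio will then report the elapsed time and logical reads for each execution.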

Now let's change the paradigm slightly. Let's add an index:

CREATE INDEX TestingGroupBy ON Sales.Invoices (SalespersonPersonID);


Frankly, this isn't a very useful index. However, after adding it, the execution plan for the InvoiceGrouping_Sales query changes: instead of scanning the table, it's now scanning the index. Despite recompiles and attempts to force it using hints, the original InvoiceGrouping query will not use this index. The duration of the InvoiceGrouping_Sales query drops to 140ms on average, and the reads drop a little further, to 5,021. That makes the targeted query roughly 11% faster than the generic one, which is a win.
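
If you're testing this at home, you can remove the experimental index once you're done:

DROP INDEX TestingGroupBy ON Sales.Invoices;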

This is a pretty simplified example, though. Making the CASE statement more complex won't improve performance or help the optimizer make better choices. Instead of trying to cram multiple different logical groupings into a single query, a better approach is to create the three targeted procedures, as I did above, and turn the original InvoiceGrouping procedure into a wrapper that chooses which of the individual procedures to call, as sketched below. That way, if you add indexes in support of each of the different possible groupings, you actually realize a positive outcome in performance.
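
A minimal sketch of what that wrapper might look like (the procedure name is mine; the branch mapping follows the original CASE expression):

CREATE PROCEDURE dbo.InvoiceGrouping_Wrapper (@x INT)
AS
-- Route each parameter value to its own procedure so that each
-- grouping gets its own execution plan and can use its own indexes.
IF @x = 7
    EXEC dbo.InvoiceGrouping_Contact;
ELSE IF @x = 15
    EXEC dbo.InvoiceGrouping_Account;
ELSE
    EXEC dbo.InvoiceGrouping_Sales;
GO

Because each inner procedure compiles its own plan, parameter sniffing on @x stops mattering, and an index like TestingGroupBy actually gets used by the branch that benefits from it.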

Want to talk more about execution plans and query tuning? In August, I'll be doing an all-day pre-con at the SQLServerGeeks Annual Summit in Bangalore, India.

I'm also going to be in Oslo, Norway, for a pre-con before SQLSaturday Oslo in September.


Published at DZone with permission of Grant Fritchey, DZone MVB. See the original article here.

