DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
  1. DZone
  2. Data Engineering
  3. Databases
  4. Adaptive Joins and Join Hints

Adaptive Joins and Join Hints

Let's take a look at how join hints affect adaptive joins in SQL Server. Also explore the differences caused by a join hint.

Grant Fritchey user avatar by
Grant Fritchey
·
Oct. 02, 18 · Tutorial
Like (2)
Save
Tweet
Share
4.92K Views

Join the DZone community and get the full member experience.

Join For Free

At a recent all-day seminar on query performance tuning I was asked a question that I didn't know the answer to: "How do join hints affect adaptive joins?"

I don't know. Let's find out together.

Adaptive Joins

Here's a query that we can run against AdventureWorks:

SELECT p.Name,
COUNT(th.ProductID) AS CountProductID,
SUM(th.Quantity) AS SumQuantity,
AVG(th.ActualCost) AS AvgActualCost
FROM Production.TransactionHistory AS th
JOIN Production.Product AS p
ON p.ProductID = th.ProductID
GROUP BY th.ProductID,
p.Name;

Without a columnstore index in SQL Server 2017, the execution plan looks like this:

Let's introduce a columnstore index:

CREATE NONCLUSTERED COLUMNSTORE INDEX ix_csTest
ON Production.TransactionHistory
(
ProductID,
Quantity,
ActualCost
);

Now, if we run the same query, the execution plan changes to use an adaptive join like this:

You can read more on adaptive joins here if this is new behavior. Now, what if we decide that we just want to see a hash join all the time?

Adding a Join Hint

I'm not a giant fan of any of the query hints. Yeah, I use them. Heck, the OPTIMIZE FOR hint I advocate for (depending on the situation, etc.), but even that one makes me feel a little uncomfortable when I use it. However, the hints exist for a reason and there are valid choices in their use. Since an adaptive join may not perform as well as a hash join, we might see a situation where the hash join is superior. This means we decide to use the hint like this:

SELECT p.Name,
COUNT(th.ProductID) AS CountProductID,
SUM(th.Quantity) AS SumQuantity,
AVG(th.ActualCost) AS AvgActualCost
FROM Production.TransactionHistory AS th
INNER HASH JOIN Production.Product AS p
ON p.ProductID = th.ProductID
GROUP BY th.ProductID,
p.Name;

Capturing the execution plan, it now looks like this:

It's now using a hash join as we told it to. However, notice the plan shape. More importantly, notice the size of the pipes representing data flow. Before I explain what's going on, what about performance? On average the adaptive join ran in 70ms with 70 reads. On average the hinted query ran in 92 with 70 reads. What's going on?

Differences Caused by a Join Hint

The key thing to remember is that a hint is very bad name. They are not hints. They are commandments. It's not a suggestion or an idea that we're imparting to the optimizer, "Please consider using a hash join." The optimizer considered it, and rejected it in favor of the adaptive join. Why?

As always, the devil is in the details, or in this case, the properties. If we look at the Columnstore Index Scan from the plan with the Adaptive Join, we see a couple of interesting data points:

I've highlighted the interesting bit. "Actual Number of Locally Aggregated Rows" is part of aggregation push down, explained by the amazing Niko Negebauer here and here. Basically, the aggregation is occurring with the data access. So while there is a Hash Match operator for the aggregation, actually, the active part of the aggregation was performed within the columnstore. That's why the Actual Number of Rows coming out of the columnstore index itself is 0, but the number of rows coming out of the Hash Match Aggregate is 441.

So...why not another aggregate push down when we used the hint? Because the hint says, we MUST use a hash join. At that point, the optimizer has no choices on where, when, how it does data processing. It must, first, ensure that a hash join is used, so it does. First thing out of the gate, hash join. Then a hash aggregate. This difference in behavior results in a 24% decrease in performance. The only interesting thing is that the reads remained consistent. This means that it was just the processing of the join that added overhead.

Conclusion

There's nothing really all that fancy about an adaptive join if you decide to apply a join hint. The adaptive join doesn't take control away from the optimizer. You can choose how you want the query processed. However, that doesn't automatically mean you're going to be happy with the results.

Want to learn more about how to use tools like execution plans to do query tuning? I've got one last class scheduled this year.

Please join me in Munich on October 26, 2018. Go here now to join the class.

Joins (concurrency library) Database

Published at DZone with permission of Grant Fritchey, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Key Considerations When Implementing Virtual Kubernetes Clusters
  • Distributed SQL: An Alternative to Database Sharding
  • How To Use Terraform to Provision an AWS EC2 Instance
  • Understanding gRPC Concepts, Use Cases, and Best Practices

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: