DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
  1. DZone
  2. Data Engineering
  3. Databases
  4. An Overview of Adaptive Joins

An Overview of Adaptive Joins

An Adaptive Join helps database performance quite a lot. You can optimizing joins on the fly, with no code or structure changes needed.

Grant Fritchey user avatar by
Grant Fritchey
·
Feb. 28, 18 · Tutorial
Like (2)
Save
Tweet
Share
3.45K Views

Join the DZone community and get the full member experience.

Join For Free

I was surprised to find out that a lot of people hadn't heard about the new join type, Adaptive Join. So, I figured I could do a quick overview.

Adaptive Join Behavior

Currently, the Adaptive Join only works with columnstore indexes, but according to Microsoft, at some point, they will also work with rowstore. The concept is simple. For larger datasets, frequently (but not always... let's not try to cover every possible caveat — it depends, right?), a hash join is much faster than a loops join. For smaller datasets, frequently, a loops join is faster. Wouldn't it be nice if we could change the join type on the fly so that the most effective join is used depending on the data in the query? Ta-da! Enter: the Adaptive Join.

First, the statistics are used at compile time for the tables we're joining. Based on those statistics, a row target it set. Below that threshold, a loops join will be used. Above that threshold, a hash join. The way the row count is determined is that the operator will always build the hash table. With the hash table built and loaded, it will know how many rows it has. If it's going to do a loops join, the hash table is tossed and a loops join commences. If the threshold has been passed on the row counts and it's going to do a hash join, it already has the hash table built and proceeds to do a hash join. It's easy to understand. Let's see it in action.

Adaptive Join in Action

First, let's create a columnstore index inside AdventureWorks:

CREATE NONCLUSTERED COLUMNSTORE INDEX csTest
ON Production.TransactionHistory
(ProductID,
Quantity,
ActualCost);

With that in place, we can run a query that looks like this:

DECLARE @quantity INT; 

SET @quantity = 9726; 

SELECT p.NAME, 
       Count(th.productid) AS CountProductID, 
       Sum(th.quantity)    AS SumQuantity, 
       Avg(th.actualcost)  AS AvgActualCost 
FROM   production.transactionhistory AS th 
       JOIN production.product AS p 
         ON p.productid = th.productid 
WHERE  th.quantity = @quantity 
GROUP  BY th.productid, 
          p.NAME; 

SET @quantity = 1; 

SELECT p.NAME, 
       Count(th.productid) AS CountProductID, 
       Sum(th.quantity)    AS SumQuantity, 
       Avg(th.actualcost)  AS AvgActualCost 
FROM   production.transactionhistory AS th 
       JOIN production.product AS p 
         ON p.productid = th.productid 
WHERE  th.quantity = @quantity 
GROUP  BY th.productid, 
          p.NAME; 

What we have are two instances of the same query, just passing in two different values to get two different result sets. The first returns one row and the second returns 229 rows. Let's look at the estimated execution plan for the query:

Reading the plan in the logical processing order, the Adaptive Join operator is the first operator on the left after the SELECT operator (which, by the way, technically isn't an operator, but doesn't really have any other name, so...). The key to understanding how to read an Adaptive Join is to know that the two paths displayed below it represent, going down, the hash join path and the loops join path. In this case, then, the hash join path would have an Index Scan and a Filter operator while the loops path would have just a Clustered Index Seek.

To see the threshold, we just have to go to the properties of the Adaptive Join operator:

In this instance, then, the threshold is 11.4837 rows. Basically, 11 rows or less, you'll get a loops join (and the lower path), 12 rows or greater and you get a hash join (and the upper path).

Looking at an estimated plan, you can't see which path it took. You have to look at the actual plan. So, if we capture the plans for the query above, they look like this:

Just looking at the graphical plan, you only have one clue as to which path may have been taken. In the second lower plan, you can see that the pipes to the top input of the Adaptive Join operator are much thicker than they are in the first plan. That indicates data being moved, so you can assume that was the path. However, to know for sure, you have to look at the properties to see the actual number of rows being processed by the operators. If it's zero for one of the paths, then the other path was chosen. You can't get a situation where there will be values in both paths. It will always use a single path.

Conclusion

There are quite a bit more details to the implantation, but you can now see the basics. An Adaptive Join helps performance quite a lot. The nature of having to build the hash table before the path is picked does mean that the loops join path won't be as fast as a natural loops join, but who cares? We're optimizing joins on the fly now, with no code or structure changes needed. That's pretty cool and very handy.

Joins (concurrency library) Database

Published at DZone with permission of Grant Fritchey, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Test Execution Tutorial: A Comprehensive Guide With Examples and Best Practices
  • Low-Code Development: The Future of Software Development
  • How To Set Up and Run Cypress Test Cases in CI/CD TeamCity
  • 5 Steps for Getting Started in Deep Learning

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: