Why You Should Change the Cost Threshold for Parallelism
The default Cost Threshold set in your SQL Server instance is old and likely not an appropriate value anymore. Here's why and how you should tweak it.
I’ve written several times about the Cost Threshold for Parallelism and its relationship to your execution plans, how to determine your execution plan cost, and even how to decide what value to set your Cost Threshold to. What I haven’t explicitly addressed in extremely clear terms is why you should adjust your Cost Threshold for Parallelism. There are two reasons to modify this value.
Cost Threshold for Parallelism Default Value
The primary reason to change the Cost Threshold for Parallelism is because the default value is not a good choice for the vast majority of systems. The default value is 5. This means that when a query has an estimated cost greater than 5, it may get a parallel execution plan.
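You can confirm what your instance is currently using before changing anything. This is a minimal check against `sys.configurations`, the catalog view that exposes instance-level settings:

```sql
-- Check the current Cost Threshold for Parallelism setting
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'cost threshold for parallelism';
```

On an untouched instance, `value_in_use` will come back as 5, the default discussed here.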
Microsoft set the default value for the Cost Threshold for Parallelism back in the 1990s. They were developing SQL Server 2000. That means this value was determined as a good starting point for query plan costs over 17 years ago. In case you’re not aware, technology, T-SQL, SQL Server, and all the databases and database objects within them shifted, just a little, in the intervening 17 years. We can argue whether or not this value made sense as a starting point (and remember, the default settings are meant to be starting points covering a majority of cases, not a final immutable value) for determining your Cost Threshold for Parallelism 17 years ago. I think we can agree that it’s no longer even a good starting point.
The way we write queries has fundamentally changed. The options available to us in both our queries and the underlying data structures have changed. These changes lead to more complex, and therefore higher estimated cost, execution plans. It's much easier now to have a query that exceeds a cost of 5, leading to parallel execution plans for queries that will not benefit from the resources needed to run the query in parallel.
You should change the default value to a higher number. To determine that number, see the blog post above for a good method.
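The change itself is done through `sp_configure`. The value of 50 below is purely illustrative, not a recommendation; derive your own number from the estimated costs of the queries running on your system:

```sql
-- 'cost threshold for parallelism' is an advanced option,
-- so make it visible to sp_configure first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- 50 is an illustrative starting point only; set a value
-- based on your own workload's plan costs
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;
```

The setting takes effect immediately after `RECONFIGURE`; no restart is required.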
Max Degree of Parallelism
I see this all the time. People write queries that exceed the Cost Threshold for Parallelism. These queries then get a parallel execution plan. Performance on these queries degrades radically. To fix this, people will generally do one of two things. The first option that I see frequently is to set the server Max Degree of Parallelism to 1. This will eliminate all parallel execution of queries. Problem solved?
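For clarity, this is the server-wide change people make, shown here only to illustrate the anti-pattern, not as a recommendation:

```sql
-- What people frequently do (not recommended): disabling
-- parallelism for the entire instance
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 1;
RECONFIGURE;
```

With this in place, no query on the instance can get a parallel plan, regardless of its cost.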
Well no. Some queries, very complex queries against large data sets, will absolutely benefit from parallel execution. They won’t benefit a little. They’ll benefit a lot. Parallel execution can be the difference between a successful application and a failed, problematic application that you spend all sorts of time, money and effort attempting to fix. Yes, some of your queries should not be running in parallel, but the way to address that is to change the Cost Threshold for Parallelism so that those queries are no longer parallel. Turning off parallelism removes any possibility of benefit. I’m assuming your servers have multiple processors. Don’t you want to use them? Then let’s turn parallelism back on.
More Max Degree of Parallelism
The second thing that I see people do, and this is frankly more problematic than changing the server setting, is to put a query hint in to force the query to run with a MAXDOP of 1. Why is setting this value at the query level worse than setting it at the server level? You'd think that doing something at a more granular level would be preferred, targeted instead of broad brush and all that. Yes, that's true. However, the problem we have here is that people tend to set and follow patterns. I know lots of organizations that require every query to have a MAXDOP hint. Even if you're not that extreme, you're going to add the hint to lots and lots of queries.
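The hint in question looks like this. The table and columns are hypothetical, used only to show where the hint goes:

```sql
-- dbo.Orders is a hypothetical table, for illustration only;
-- OPTION (MAXDOP 1) forces this one query to run serially
SELECT OrderID, OrderDate, CustomerID
FROM dbo.Orders
WHERE OrderDate >= '2017-01-01'
OPTION (MAXDOP 1);
```

Once a hint like this is scattered across hundreds of queries, it can only be undone by editing each one.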
What happens when you decide/realize/discover that, oops, that was a bad idea? You now have to go back and edit every one of those queries. That’s going to be a pretty major undertaking in some organizations.
Instead of editing some or all of your queries, why not simply change the Cost Threshold for Parallelism so that all the queries you want to run serially do so, without editing the code?
Change the Cost Threshold for Parallelism
Parallel execution of queries is wonderful. In the case of getting batch mode execution from a columnstore index, it's vital. So you want queries to execute in parallel. However, the default value for the Cost Threshold for Parallelism is too low, causing queries that should not be parallel to execute that way. Help the performance of your system and change the Cost Threshold for Parallelism.
Published at DZone with permission of Grant Fritchey , DZone MVB. See the original article here.