Bad Query Performance Tips Rebutted
There is a litany of query performance tips that seem to make their rounds with query developers, but are all of them good?
Once upon a time, someone somewhere wrote a list of bad query performance tips and they've been copied all over the internet, over and over. Even worse, sometimes the tips are copied or edited incorrectly, making a bad suggestion even worse. Can we please stop copying this list? I've tried to poke holes in this list before, one point at a time. This time, I'm just taking it all on.
Follow This Advice
I'm going to list all the bad query performance tips and I'll explain where they're good and where they're bad. However, the single most important thing I can tell you is, test your own system. Some of these tips have some merit situationally. You'll need to see if you're in the situation. So please, don't just trust what I'm saying. Validate it in your environment.
Do Not Follow This Advice
Index the predicates in JOIN, WHERE, ORDER BY and GROUP BY clauses.
What about the HAVING clause? Does column order matter? Should it be a single-column or a multi-column index? What about INCLUDE columns? What kind of index: clustered, nonclustered, columnstore, XML, spatial? This piece of advice is benign but so non-specific it's almost useless. Let me summarize: indexes can be good.
Do not use sp_* naming convention
So, this one is true because it will add a VERY small amount of overhead as SQL Server searches the master database first for your object. However, for most of us, most of the time, this is so far down the list of worries about our database as to effectively vanish from sight.
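To make the overhead concrete, here's a minimal sketch (procedure and table names are hypothetical, not from the original article):

```sql
-- The sp_ prefix sends SQL Server to the master database first
-- to look for the object, before checking the user database.
CREATE PROCEDURE dbo.sp_GetOrders   -- incurs the (tiny) master lookup
AS
SELECT OrderID FROM dbo.Orders;
GO

-- Any other name skips that extra lookup entirely.
CREATE PROCEDURE dbo.GetOrders
AS
SELECT OrderID FROM dbo.Orders;
```

Again, the cost is real but minuscule; fix this only after everything that actually matters.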
Explore the tables used in the SELECT statement, so that filters can be applied in the WHERE clause. When there are queries with sub-selects, try to apply filters in the inner statement of the sub-selects.
I'm actually a little bit at a loss what this one is attempting to tell you. Filters are good? They are.
Only the columns which are selected will be read while using a column-oriented database management system.
You sure about that? Aren't columns used in the WHERE clause but not in the SELECT list also "read"? I think they're going for "only move the data you need to move, and only when you need to move it." That actually makes sense. Also, "column-oriented"? Are we talking columnstore? I don't think so, which means we're talking about row store tables, not ones that are "column-oriented."
Remove the JOINS from the unnecessary tables, which will help to reduce the database processing time.
Uhm... no. Back to "only move, etc.": only JOIN what you need to, when you need to. However, "just drop JOINs because the tables are unnecessary"? Unnecessary for what? Yes, don't join to tables that you never reference anywhere in the query if those tables are not a necessary part of the structure needed to return a valid result set. Clarity is key, and this tip is not clear.
In the SELECT statement, avoid writing *.
Honestly, I don't care that much anymore. I'm still on "only move, etc." but if you need to move everything, want to know the easiest way? SELECT *. Yes, there are ramifications, but I think lots of other things affect performance a lot more than an overuse of SELECT *.
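A quick illustrative sketch (hypothetical table and columns) of the trade-off being described:

```sql
-- Explicit column list: moves only the data the caller needs.
SELECT OrderID, OrderDate
FROM dbo.Orders;

-- SELECT *: perfectly reasonable when you genuinely need every column,
-- but it also drags along columns the caller may never use.
SELECT *
FROM dbo.Orders;
```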
If possible, avoid using nchar and nvarchar, as both the data types take double memory as that of char and varchar
Nope. Use the right data type for the data. Period. If you need Unicode, use these data types. Store the data correctly in all cases. Don't substitute VARCHAR for DATETIME and don't try to dodge NVARCHAR when you need it.
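A minimal sketch of "use the right data type," with a hypothetical table (names and types are illustrative, not from the article):

```sql
-- Pick the type that matches the data, not the one that saves bytes.
CREATE TABLE dbo.Customers
(
    CustomerID  int          NOT NULL,
    LastName    nvarchar(50) NOT NULL, -- names can need Unicode: NVARCHAR is correct
    PostalCode  varchar(10)  NOT NULL, -- ASCII-only data: VARCHAR is fine here
    CreatedDate datetime2(0) NOT NULL  -- never a VARCHAR standing in for a date
);
```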
Try to avoid NULL in a fixed-length field.
Nope. I mean, fine, if you're going for the "never have a NULL field" proposition, great. Otherwise, store NULL values where you need to.
Avoid the HAVING clause, as this clause is required only if the user needs to filter the result of aggregations.
Uhmmmm... so... if I need a HAVING clause, use it, but if I don't need it, don't use it. Right. And don't use GROUP BY when you're not aggregating and don't use a CTE definition when you're not defining a CTE and... this is silly advice. If you want to filter results in aggregation, you know what you're going to apply? HAVING.
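The division of labor between the two clauses can be sketched like this (hypothetical table and columns):

```sql
-- WHERE filters rows before aggregation; HAVING filters the aggregates.
SELECT   CustomerID,
         SUM(TotalDue) AS TotalSpent
FROM     dbo.Orders
WHERE    OrderDate >= '2024-01-01'  -- row filter: belongs in WHERE
GROUP BY CustomerID
HAVING   SUM(TotalDue) > 1000;      -- aggregate filter: only HAVING can do this
```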
Create clustered and non-clustered Indexes and keep clustered index small. Sometimes, it is also used in a non-clustered index.
Sometimes? Let's say, every time. Every. Single. Time. If there is a clustered index, its key is used as the row locator in every nonclustered index. Always. And we're back to "indexes can be useful."
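A sketch of the mechanism (hypothetical table and index names):

```sql
-- The clustered key (OrderID) rides along in every nonclustered index
-- as the row locator, whether or not you name it in the definition.
CREATE CLUSTERED INDEX cx_Orders
    ON dbo.Orders (OrderID);

CREATE NONCLUSTERED INDEX ix_Orders_CustomerID
    ON dbo.Orders (CustomerID);  -- physically stores CustomerID + OrderID
```

This is why a small clustered key matters: a wide one bloats every nonclustered index on the table.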
It is a smart approach to create indexes columns which have integer values instead of characters. As integer values takes less space than character values.
So, true as far as it goes. But seriously, what's the point? What if I want to search on the LastName column? Should I create an index on some other column instead? No, right?
Use WHERE clause to limit the size of the resultant tables that were created with joins.
Filters are good.
Select the appropriate Data Type.
Yes, 1000 times YES! And, of course, goes against the earlier tip.
To improve query performance in SQL server, use TABLOCKX while inserting into the table and use TABLOCK while merging.
Uhm... well, sort of, I guess. As long as you don't mind lots and lots of blocking while you hold the ENTIRE TABLE to insert one row (or however many rows you're inserting). That row insert will be faster. The rest of the planet can wait because... performance? Understand, this hint takes an exclusive lock on the entire table. Not only will that prevent anyone else from inserting, updating, or deleting rows while you hold the lock, they can't take out shared locks either, which means no reads. Generally, let SQL Server manage your locking. Don't force control over it through table hints. This represents one of the truly bad query performance tips in this list. This is not bad advice. It's dangerous advice.
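Here's a sketch of exactly what the hint does (hypothetical table; the blocking behavior is the point):

```sql
-- TABLOCKX takes an exclusive lock on the whole table for the
-- duration of the transaction: no other reads or writes until commit.
BEGIN TRANSACTION;

INSERT INTO dbo.Orders WITH (TABLOCKX) (CustomerID, OrderDate)
VALUES (42, SYSDATETIME());

-- Every other session touching dbo.Orders is now waiting right here.
COMMIT TRANSACTION;
```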
Try to use SET NOCOUNT ON and TRY-CATCH, which will help to avoid the deadlock condition. Also, try to use WITH(NOLOCK) while querying the data from any given table to improve query performance in SQL server.
These are three wildly different tips. So, SET NOCOUNT ON does help a little. It removes some round trip information. That's fine. What the heck does that have to do with TRY/CATCH?
Which brings us to: will TRY/CATCH avoid deadlocks (whether or not it's somehow in combination with NOCOUNT, which I just don't want to think about anymore)? No. TRY/CATCH does offer you the opportunity to catch the error caused by a deadlock and retry the query, hopefully avoiding the deadlock on the second execution. However, that is not "avoiding the deadlock condition." Not at all. You are still getting a deadlock; you're just preventing the "You've been chosen as a victim" message from going back to your users (no one likes to be called a victim).
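The catch-and-retry pattern being described looks roughly like this (a sketch with hypothetical table and columns; deadlock victims raise error 1205):

```sql
-- TRY/CATCH doesn't prevent the deadlock; it lets you catch error 1205
-- and retry instead of surfacing the victim message to the user.
DECLARE @retries int = 3;

WHILE @retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        UPDATE dbo.Orders SET Status = 'Shipped' WHERE OrderID = 42;
        COMMIT TRANSACTION;
        BREAK;  -- success: stop retrying
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        IF ERROR_NUMBER() = 1205 AND @retries > 1
            SET @retries -= 1;  -- deadlock victim: try again
        ELSE
            THROW;              -- anything else: re-raise to the caller
    END CATCH;
END;
```

Note that the deadlock still happened; the users just never hear about it.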
NOLOCK. Can this improve performance? It can. Should you be using this on all your code? No, no, no, no. OK, let's say that you literally don't care that you're getting duplicate and missing data because of the lack of locks (and yes, that's what happens). You should set the isolation level to READ UNCOMMITTED on the connection instead of putting NOLOCK in the code. You then won't need to modify all your code. Further, when you finally realize that you've hurt yourself, you have only a single change to make to fix the issue.
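The two approaches side by side (hypothetical table; same dirty-read behavior either way):

```sql
-- Instead of sprinkling the hint through every query in the codebase...
SELECT OrderID FROM dbo.Orders WITH (NOLOCK);

-- ...set the isolation level once on the connection. Same dirty reads,
-- but backing it out later is a one-line change instead of a code sweep.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT OrderID FROM dbo.Orders;
```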
For a perfect reflection on the problems caused by NOLOCK, watch this video.
If table variable is used in place of temp table, it makes a great difference to improve SQL performance.
Nope, nope, nope. Table variables use tempdb, just like temporary tables. Further, table variables have no statistics on them. That's nice if you're seeing lots of recompiles caused by DDL on temporary tables, but horrifying if you're searching, filtering, or joining on those table variables. The lack of statistics can seriously, NEGATIVELY, impact performance. So there is no simple "swap these out and you get better performance" tip here. It's just flat-out wrong.
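For clarity, the two objects being compared (hypothetical definitions):

```sql
-- Both of these live in tempdb; the real difference is statistics.
DECLARE @Orders table (OrderID int PRIMARY KEY);
-- No statistics: the optimizer generally guesses at row counts.

CREATE TABLE #Orders (OrderID int PRIMARY KEY);
-- Gets statistics, so joins and filters against it can be costed accurately.
```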
If UNION ALL is used in place of UNION, this improves the speed and thereby improves the SQL performance.
It sure does improve performance. It also changes the data returned by the query. I mean, if we can improve performance by just changing the data returned, the fastest queries are those that return no data at all. So you could certainly improve performance by truncating all the tables. Is that a viable tuning method? No? Neither is this.
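To see the difference in results, not just speed (hypothetical tables):

```sql
-- UNION removes duplicates (extra work, different results);
-- UNION ALL keeps them (faster, but not the same data).
SELECT City FROM dbo.Customers
UNION
SELECT City FROM dbo.Suppliers;   -- each distinct city appears once

SELECT City FROM dbo.Customers
UNION ALL
SELECT City FROM dbo.Suppliers;   -- duplicates preserved
```

Swap one for the other only when the result sets are equivalent, or when you actually want the duplicates.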
In case of complex and frequently used queries, use Stored Procedure for rapid data retrieval.
I like stored procedures. They give us the ability to tune queries independently of application code. Procedures also, when written correctly, allow us to avoid SQL injection. They promote a higher-performing cache. Stored procedures allow for the use of parameters, and these can be used to get better execution plans (although a parameterized query works the same way as a procedure) through parameter sniffing (which can also go bad).
However, there is literally nothing else that makes a query running inside of a stored procedure different than any other query. Complex or simple, once we're not talking about parameters and cache reuse, a query is a query. The optimizer will treat it the same. There's nothing really magical about stored procedures except the things I listed.
Multi-statement Table-Valued Functions (TVFs) are expensive compared to inline TVFs. Hence, avoid using Multi-statement TVFs.
OK. Here the tip is insufficiently fear-inducing. In fact, multi-statement, table-valued, user-defined functions are pure EVIL when it comes to performance. They're heinous. They should not merely be avoided. They should be set on fire. Lots of fire. Then scatter the ashes and sow the ground with salt. In short, I agree with this tip but would go a lot farther.
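The two forms being contrasted, as a sketch (hypothetical function and table names):

```sql
-- Multi-statement TVF: the body is opaque to the optimizer,
-- and the rows are built procedurally into a table variable.
CREATE FUNCTION dbo.GetOrders_Multi (@CustomerID int)
RETURNS @result table (OrderID int)
AS
BEGIN
    INSERT INTO @result
    SELECT OrderID FROM dbo.Orders WHERE CustomerID = @CustomerID;
    RETURN;
END;
GO

-- Inline TVF: a single SELECT the optimizer can expand like a view
-- and fold into the calling query's plan.
CREATE FUNCTION dbo.GetOrders_Inline (@CustomerID int)
RETURNS table
AS
RETURN
    SELECT OrderID FROM dbo.Orders WHERE CustomerID = @CustomerID;
```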
I get it. You want 10 or 15 items that you can go through, make the proposed changes, and suddenly your 5TB database running on your 8GB server with one core and a single disk of spinning rust will run so much faster. Unfortunately, it just doesn't work that way.
I wrote a 1000-page book on query tuning. The final chapter is a checklist, but it's 26 pages long. Why? Because you need to be very careful and clear in making performance tuning suggestions. There are caveats and gotchas and situational problems that must be explained. Probably the shortest tip in the chapter is "Explicitly define the owner of an object," but even that links back to a several-page explanation earlier in the book for exactly why this could be a problem.
Some of the tips listed above are quite benign and even accurate (if a little vague). A few of the tips are somewhat dangerous or silly. Some of the tips are downright stupid and need to be avoided.
I wish there was a way to get rid of this list that people keep copying, but it doesn't seem to be going away. Maybe having this rebuttal will help prevent people from using these bad query performance tips.
Published at DZone with permission of Grant Fritchey, DZone MVB. See the original article here.