Just came across this very helpful article. The main highlights are at the bottom as follows…
Good candidates for clustered indexes are:
- Primary keys of the lookup/reference/dimension/master tables
- Foreign keys of the fact/detail tables
- Datetime fields of the tables queried by the date range
Optimization Rules of Thumb
- Always look at the query plan first. It will show you the optimal current execution plan from the query engine's point of view. Find the most expensive part of the execution plan and start optimizing from there. However, even before that, make sure that the statistics on all tables in your query are up to date, by running the update statistics <TableName> command on all tables in your query.
- If you see table scan, optimize. Table scan is the slowest possible way of execution. Table scan means not only that no index is used, but that there is no clustered index for this table at all. Even if you can only replace table scan with clustered index scan, it is still worth it.
- If you see clustered index scan, find out whether it can be replaced with index seek. For that, find what conditions are applied to this table. Usually, conditions exist for two or three fields of the table. Find out the most selective condition (that is, the condition that would produce the smallest number of records if applied alone), and see whether an index on this field exists. Any index that lists this field first will qualify. If there is no such index, create it and see whether the query engine picks it up.
- If the query engine is not picking up the existing index (that is, if it is still doing a clustered index scan), check the output list. It is possible that seek on your index is faster than clustered index scan, but involves bookmark lookup that makes the combined cost greater than use of a clustered index. Clustered index operations (scan or seek) never need bookmark lookup, since a clustered index already contains all the data. If the output list is not big, add those fields to the index, and see whether the query engine picks it up. Please remember that the combined size is more important than the number of fields. Adding three integer fields to the index is less expensive than adding one varchar field with an average data length of 20.
Summarizing this rule, try to make your index covering, and see whether it works better than clustered index scan. Please note that it is not always possible to make the query engine pick up your index automatically. A small table or a low-selectivity index will produce clustered index scan, even if your index is covering.
- If you see bookmark lookup, it means that your index is not covering. Try to make it covering if it makes sense (see the preceding guidelines).
- The execution plan selected by the query engine may be not the best one. The query engine makes certain assumptions about disk subsystem and CPU cost versus IO cost. These assumptions sometimes can be incorrect. If you don't believe that the query engine's selection is the best one, run a query in the loop for 10 to 15 minutes with automatic selection, change the query to use your index (you will have to use index hint to force it), and then run it for 10 to 15 minutes again. Compare the results to see which one works better.
- Avoid any operations on the fields, where possible. Some operations will prevent the use of the index on this field even if it exists—for example, the infamous ltrim(rtrim(FieldName)); other operations will degrade the performance. For example, instead of using the condition cast(DateField as varchar(20)) = @dateString, try to convert @dateString to an expression of datetime type first, and then compare it toDateField.
- Please note that the query engine cost estimate does not include the cost of embedded procedure or function calls. If you compare between plain join and select from table-value functions, the latter would seemto have smaller cost, but it usually does not. In such a situation, use your own metrics to find out which query performs better.
- When it is not possible to avoid operation on the field, use an index built on that expression. This can be done in two ways:
- Create a calculated field based on your expression.
- Create a view, and build an index on it.
Note SQL Server requires certain conditions to be met in order to allow the use of calculated fields and indexed views (set quoted_identifier on, set arithabort on, and so on).
- Indexed views are a good way to further speed up the query if you are not satisfied with the results. Indexed view is a clustered index built over the view's select list. You can also define additional indexes for the indexed view, just as you can for any regular table. Indexed views take disk space and involve some maintenance overhead (every time underlying tables change, the indexed view also has to change), but they usually provide a good boost in performance, even after all other optimization techniques are exhausted.
Special thanks to Andrei Volkov for many interesting discussions about the SQL Server internals.