When you send a query to your SQL Server database (and this applies to Azure SQL Database, APS, and Azure SQL Data Warehouse), that query is going to go through a process known as query optimization. The query optimization process figures out if you can use indexes to assist the query, whether or not it can seek against those indexes or has to use a scan, and a whole bunch of other stuff. The primary driving force in making these decisions are the statistics available on the indexes and on your tables.
What Are Statistics
Statistics are a mathematical construct to represent the data in your tables. Instead of scanning through the data each and every time to determine how many possible rows are going to come back for the query you provided, SQL Server uses statistics, which are gathered and calculated automatically by default, to determine the likely number of rows. I cover the details behind statistics in this article on Simple-Talk. The important thing to remember is that statistics are what drives the optimizer. Without statistics, you would only ever see table scans and performance would be horrible most of the time. Statistics show how many rows are likely to be returned by a given value. Statistics are created on indexes automatically. They are also created on columns that do not have indexes, by default, when that column is referenced in a WHERE, ON or HAVING clause of a query.
It can’t be over-emphasized that you must have statistics for the optimizer to make good choices. You can find out more about statistics in this other article on Simple-Talk.
Automatic Settings for Statistics
Statistics are so important that SQL Server will automatically create them for you. There is a setting that allows you to partially change this behavior. AUTO_CREATE_STATISTICS is enabled by default on each database. You can disable it using an ALTER DATABASE command. What will then happen is that the only statistics you’ll have in your system are those that are created for indexes. For the overwhelming vast majority of databases out there, turning off the automatic creation of statistics is a very bad idea. You’re taking away the ability of the optimizer to determine the likely rows being returned by a query. That means the optimizer is likely to make poor choices.
Statistics are also automatically maintained by SQL Server. You can read about the automatic processes in this documentation at MSDN. There are a number of modifications you can make to this automated behavior, and you can turn off automatic statistic maintenance by changing the AUTO_UPDATE_STATISTICS setting on a database. Just like the automatic creation of statistics, most systems are benefitting from the automatic maintenance of statistics. Don’t turn this setting off unless you are ready to take direct control over your statistics. Interestingly enough though, the automatic statistics maintenance may not be enough for many systems. You might need to augment statistics maintenance.
Maintaining Your Statistics
For most systems, start with leaving the AUTO_UPDATE_STATISTICS setting to on. Then, as you find you might need to manually update statistics, you can do one of two things. An easy way to update your statistics is to run sp_updatestats. This command will update the statistics across an entire database. You can somewhat control how it behaves. Read more about it here on MSDN. If you want to take direct control over how any given set of statistics are maintained, you use UPDATE STATISTICS. This gives the maximum amount of control.
In addition to writing your own code to maintain statistics, you might look to third party choices. One very common option is to use the scripts provided by Ola Hollengren. Not only do they manage and maintain statistics, but they also help with index fragmentation. A newer and much more interesting choice is to look to Minion Reindex by the Midnight DBA team. This tool offers quite a lot of control and functionality that is not present in Ola’s scripts. I wrote a review of an earlier version of the tool here on SQL Server Central.
This blog post just skims the surface of statistics. For lots more detail, please follow and read the links mentioned above. You must ensure that your SQL Server databases have statistics. You must also ensure that your SQL Server databases maintain their statistics. Statistics are a vital part of query performance that ignoring will hurt.