SQL Server 2005 collects statistical information about indexes and column data stored in the database. These statistics are used by the SQL Server query optimizer to choose the most efficient plan for retrieving or updating data. Good statistics allow the optimizer to accurately assess the cost of different query plans, and choose a high-quality plan. All information about a single statistics object is stored in several columns of a single row in the sysindexes table, and in a statistics binary large object (statblob) kept in an internal-only table. In addition, information about statistics can be found in the metadata views sys.stats and sys.indexes
Statistics Used by the Query Optimizer in Microsoft SQL Server 2005
SQL Server 2005 has the ability to automatically create and update statistics. This feature is on by default in SQL Server 2005 and SQL Server 2000. This allows the SQL Server 2005 query optimizer to produce good query plans consistently, while keeping development and administration costs low. If you need more control over statistics creation and update to get good query plans and manage the overhead of gathering statistics, you can use manual statistics creation and update capabilities.
SQL Server 2005 statistics features allow you to:
SQL Server Management Studio allows you to graphically browse and control statistics objects within its Object Explorer view. Statistics are listed in Object Explorer in a folder underneath each table object.
SQL Server 2005 maintains the following information at the table level.
SQL Server 2005 collects the following statistics about table columns and stores them in a statistics object (statblob):
Use sp_helpindex and sp_helpstats to display the list of all statistics available for a given table. sp_helpindex lists all indexes on the table, and sp_helpstats lists all the statistics on the table. Each index also carries the statistical information for its columns. The statistical information created using the CREATE STATISTICS command is equivalent to the statistics built by a CREATE INDEX command on the same columns. The only difference is that the CREATE STATISTICS command uses sampling by default while the CREATE INDEX command gathers the statistics with fullscan since it has to process all rows for the index anyway.
Creating Statistics with SQL Server 2005
You can create statistics in SQL Server 2005 in a number of different ways, as described below.
• The optimizer automatically creates single-column statistics as needed as a side effect of optimizing SELECT, INSERT, UPDATE, and DELETE statements when AUTO_CREATE_STATISTICS is enabled, which is the default setting. • CREATE INDEX generates the declared index in the first place, and also creates one set of statistics for the column combinations constituting the index keys (but not other included columns). CREATE STATISTICS only generates the statistics for a given column or combination of columns. • Use sp_createstats to create statistics for all eligible columns (all except XML columns) for all user tables in the current database. A new statistics object will not be created for columns that already have a statistics object. • Use dbcc dbreindex to rebuild one or more indexes for a table in the specified database. • In Management Studio, expand the folder under a Table object, right click the Statistics folder, and choose New Statistics. • Use the Database Tuning Advisor (DTA) to create indexes.
Here is an example of a CREATE STATISTICS command on the AdventureWorks.Person.Contact table:
CREATE STATISTICS FirstLast2 ON Person.Contact(FirstName,LastName)WITH SAMPLE 50 PERCENT
Usually, statistics with default sampling are sufficient to allow good query plans to be produced. However, there may be cases when statistics with larger sample sizes may benefit query optimization, such as when the values in the given column sample are not random. Non-random samples may occur if the data is sorted or clustered. Sorting or clustering may be caused by the creation of indexes, or loading data into a heap structure that is already sorted or clustered. The most commonly used larger sample size is fullscan because it gives the most accurate statistics. The price for using statistics with larger sample sizes is the time required to create the statistics.
The above command creates a single two-column statistics object. In this case, the SAMPLE 50 PERCENT is ignored and a full scan is performed because the table is too small. Sampling is used primarily to avoid excessive scans of data and affects only tables and indices with 1,024 or more pages (8 MB).
In SQL Server 2005, statistics are created for all indices at the index creation time. SQL Server creates single-column statistics automatically when compiling queries. These statistics are created for columns where the optimizer would otherwise have to estimate the approximate density or distribution. There are the following exceptions to this rule. Statistics may not be created automatically when (1) the database is read-only, (2) there are too many outstanding compilations in progress, and (3) the column's data type is not supported for auto stats.
The automatic statistics creation function can be disabled at the database level by executing
ALTER DATABASE dbname SET AUTO_CREATE_STATISTICS OFF
It's recommended that you leave this setting ON. Only disable it if you need to do so to work around known performance issues, such as the need to specify other than the default sampling rate for some tables.
By default, statistics are created by sampling the data set when executing the CREATE STATISTICS command or when the statistics are automatically created. CREATE INDEX scans the whole data set anyway; therefore, the index statistics are initially created without sampling (the equivalent of fullscan). The CREATE STATISTICS command allows you to set the sample size in the WITH clause either by specifying fullscan or the percentage of data or number of rows to scan. It is also possible to inherit the previous sample size when specifying WITH RESAMPLE on the UPDATE STATISTICS command.
For small tables, a minimum of 8 MB of data is sampled. If a table starts small, and you sample with the default sampling rate, and thereafter use the resample option when updating the statistics, you will get the equivalent of fullscan even as the table grows beyond 8 MB. Avoid the use of resample if you want the default sampling rate as table size varies.
The dbcc show_statistics command displays the sample size under the Rows Sampled heading. Statistics created automatically, or updated automatically (as described in the next section) are always generated using default sampling. The default sampling rate is a slow-growing function of the table size, which allows statistics to be gathered relatively quickly even for very large tables.
SQL Server Profiler can monitor automatic statistics creation. The Auto Stats event is in the group of Performance trace events. Select also the IntegerData, Success and ObjectID columns for the Auto Stats event when defining the trace.
The DROP STATISTICS command is used to drop statistics, but it is not possible to drop statistics that are a byproduct of an index. Such statistics are removed only when the index is dropped.
Maintaining Statistics in SQL Server 2005
After a series of INSERTs, DELETEs, and/or UPDATEs are performed on a table, the statistics may not reflect the true data distribution in a given column or index. If the SQL Server query optimizer requires statistics for a particular column in a table that has undergone substantial update activity since the last time the statistics were created or updated, SQL Server automatically updates the statistics by sampling the column values (using auto update statistics). The statistics auto update is triggered by query optimization or by execution of a compiled plan, and involves only a subset of the columns referred to in the query. Statistics are updated before query compilation if AUTO_UPDATE_STATISTCS_ASYNC is OFF, and asynchronously if it is ON.
When a query is first compiled, if the optimizer needs a particular statistics object, and that statistics object exists, the statistics object is updated if it is out of date. When a query is executed and its plan is in the cache, the statistics the plan depends on are checked to see if they are out of date. If so, the plan is removed from the cache and during re-compilation of the query the statistics are updated. The plan also is removed from the cache if any of the statistics it depends on have changed.
SQL Server 2005 determines whether to update statistics based on changes to column modification counters (colmodctrs).
A statistics object is considered out of date in the following cases:
If the statistics object is defined on a temporary table, it is out of date as discussed above, except that there is an additional threshold for recomputation at 6 rows, with a test otherwise identical to test 2 in the previous list.
Table variables do not have statistics at all.
The auto update statistics feature described above may be turned off at different levels.
Re-enabling the automatic updating of statistics can be done similarly using ALTER DATABASE, UPDATE STATISTICS, or sp_autostats.
Auto statistics update is always performed by sampling the index or table using the default sampling rate. To set the sampling rate explicitly, run CREATE or UPDATE STATISTICS
Statistics update is covered by the same SQL Profiler event as statistics creation.