[Prior Post in Series] [Next Post in Series]

Adding statistics helps SQL Server to determine which tables to scan first when there are joins. To understand statistics better, see Statistics Used by the Query Optimizer in Microsoft SQL Server 2005. Statistics are typically added after indexes because if there is an index on the column then there is usually not a need for a statistic.

I have run a sample trace log through Database Engine Tuning Advisor (DTA) and received recommendations which I have saved as XML.

My approach for this demonstration tuning is simple:

  • Identify all column names recommended per table.
  • Put statistics on all tables with these column names.

This produces a good solution given the light-weight trace log and does not require much analysis time.

Getting the list of recommended statistics columns

As cited in the introduction, I use the XML results file to speed analysis. I opened one of my saved XML files and then pasted it below after doing some simple modifications.

  • Delete:
    • <?xml version="1.0" encoding="utf-16"?>
  • Change:
    • <DTAXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/dta">
    • To
    • <DTAXML>

As cited before, this avoids issues with XML namespaces from my TSQL below.

DECLARE @Xml Xml Set @Xml='{paste here}' Select ColName,Count(1) FROM (SELECT node.value('(.)[1]','sysname') as ColName FROM @Xml.nodes('//Statistics/Column') as Ref(node)) DTA GROUP BY ColName ORDER BY COUNT(1) DESC

Which produced results similar to those shown below.

clip_image002

The first column is the column name and the second one is the number of tables it occurs on.

The stored procedure below creates and then executes TSQL to add statistics across all tables that match the patterns identified in the DTA recommendations. These patterns are modified as follows:

  • Only one statistic is added per table
  • The columns in the statistics are added in the order that they appear in the table

This is a quick reasonable solution. If you have the time, a more detailed analysis is best.

This approach reduces the recommendations into just 16 patterns, shown below

clip_image004

The Code Solution

The TSQL below does some transformations on the XML and then creates statistics on all tables matching the pattern. The Statistics name is generated from the table name and the column's ordinal number(aka position in the table) in the statistics so the procedure may be executed as needed without duplicating statistics.

CREATE PROC dta_Statistics @Xml Xml AS SET NOCOUNT ON DECLARE @CMD nvarchar(max),@ColName nvarchar(max), @Table nvarchar(max),@LastTable nvarchar(max),@StatID nvarchar(max) DECLARE @ColCnt int,@ColIdx int CREATE Table #temp(TName varchar(128),CName varchar(128), ColNo int) INSERT INTO #temp(TName,CName) SELECT DISTINCT TableName, node2.value('.','sysname') as ColName FROM ( SELECT node.value('(parent::*/parent::*/parent::*/Name)[1]','sysname') as TableName, node.query('.') as sNode FROM @Xml.nodes('//Create/Statistics') as Ref(node) ) DTA CROSS APPLY snode.nodes('(Statistics/Column/Name)') Ref(node2) UPDATE #Temp SET ColNo=Ordinal_Position FROM #Temp JOIN INFORMATION_SCHEMA.Columns ON TName=Table_Name And Replace(Replace(CName,'[',''),']','')=Column_Name CREATE Table #temp2(InCol varchar(max), ColCnt int, StatId varchar(max)) SET @Cmd='' SET @StatId='' SET @LastTable='' SET @ColCnt=0 DECLARE PK_Cursor CURSOR FOR Select TName,CName,ColNo FROM #temp ORDER BY TName,ColNo OPEN PK_Cursor; FETCH NEXT FROM PK_Cursor INTO @Table,@ColName,@ColIdx WHILE @@FETCH_STATUS = 0 BEGIN IF @LastTable <> @TABLE BEGIN IF Len(@CMD) > 0 INSERT INTO #temp2 (InCol,ColCnt,StatID) Values(@CMD,@ColCnt,@StatID) SET @LastTable = @TABLE SET @CMD='' SET @StatId='' SET @ColCnt=0 END IF Len(@CMD) > 0 BEGIN SET @CMD=@CMD+',' SET @StatId=@StatId+'_' END SET @CMD=@CMD+@ColName SET @StatID=@StatID+Cast(@ColIdx as varchar(11)) SET @ColCnt=@ColCnt+1 FETCH NEXT FROM PK_Cursor INTO @Table,@ColName,@ColIdx END IF Len(@CMD) > 0 INSERT INTO #temp2 (InCol,ColCnt,StatId) Values(@CMD,@ColCnt,@StatId) CLOSE PK_Cursor; DEALLOCATE PK_Cursor; CREATE Table #temp3(TName varchar(max), InCol varchar(max), StatId varchar(max) ) SET @CMD='' DECLARE P2_Cursor CURSOR FOR Select InCol,ColCnt,StatId FROM #temp2 ORDER BY InCol,ColCnt OPEN P2_Cursor; FETCH NEXT FROM P2_Cursor INTO @ColName,@ColCnt,@StatId WHILE @@FETCH_STATUS = 0 BEGIN IF Len(@CMD) > 0 SET @CMD=@CMD+' UNION ' SET @CMD=@CMD+'Select Table_Name,'''+@ColName +''',''STATS_''+Table_Name+''_''+'''+@StatID+''' FROM Information_Schema.Columns JOIN Sys.Objects ON NAME=Table_Name AND Objects.Type=''U'' WHERE COLUMN_Name in ('+REPLACE(REPLACE(@ColName,'[',''''),']','''')+') GROUP BY TABLE_NAME HAVING Count(1)=' +Cast(@ColCnt as varchar(11)) FETCH NEXT FROM P2_Cursor INTO @ColName,@ColCnt,@StatId END CLOSE P2_Cursor; DEALLOCATE P2_Cursor; CREATE TABLE #Temp4 (TName sysname,Cols nvarchar(max), StatId sysname) INSERT INTO #Temp4 EXEC (@Cmd) DECLARE P3_Cursor CURSOR FOR Select TName,Cols,StatId FROM #temp4 OPEN P3_Cursor; FETCH NEXT FROM P3_Cursor INTO @Table,@ColName,@StatId WHILE @@FETCH_STATUS = 0 BEGIN IF NOT EXISTS(SELECT 1 FROM Sys.Stats where Name=@STATID) BEGIN SET @CMD='CREATE STATISTICS ['+@STATID+'] ON ['+@Table +'] ('+@ColName+')' EXEC(@CMD) PRINT @CMD END FETCH NEXT FROM P3_Cursor INTO @Table,@ColName,@StatId END CLOSE P3_Cursor; DEALLOCATE P3_Cursor;

I did a PRINT @CMD above so you can see the creation of the statistics code, for example:

CREATE STATISTICS [STATS_USER_GROUP_8_9_10_11]
ON [USER_GROUP] ([USR_OBJECT_ID],[GRP_OBJECT_ID],[CNTX_OBJECT_ID],[CNTX_CLASS_ID]) CREATE STATISTICS [STATS_USERS_1_2_3_10]
ON [USERS] ([OBJECT_ID],[CLASS_ID],[LOGIN],[FIRST_NAME])

For this database, a total of 291 statistics were added.

Summary

The above solution is a good solution but is likely not perfect. Some of the statistics are candidates for further optimization. For example, consider this set:

CREATE STATISTICS [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_16] CREATE STATISTICS [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_50] CREATE STATISTICS [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_51] CREATE STATISTICS [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_54] CREATE STATISTICS [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_57] CREATE STATISTICS [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_58]

Analysis and testing would reveal if dropping these 6 indexes and replacing them with one of the following would improve performance better:

  • [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10]
  • [STATS_TN_VIDEO_1_2_3_4_5_6_7_8_9_10_16_50_51_54_57_58]

There is a balance between the time spent improving performance and the performance gain, and common sense should be used.

A second item that I did not do was checking for indexes using the columns specified. The TSQL code above was already pushing the limit for complexity in a blog post.