The Setup:
Standard best practise is to have auto create and auto update statistics set for SQL Server databases. But there is no corresponding setting to remove statistics that are no longer necessary. Unless explicitly managed by the DBA these statistics could be causing excess resource usage.
Demo:
Create a new database and port some data across from the AdventureWorks sample database.
if db_id('stats_duplicate') is not null drop database stats_duplicate go create database stats_duplicate go use stats_duplicate go select * into dbo.only_table from AdventureWorks2012.Sales.SalesOrderDetail
Verify that auto create and auto update statistics are set as expected.
select is_auto_create_stats_on , is_auto_update_stats_on from sys.databases where name = 'stats_duplicate'
There is a single index (the heap) on the new table and no statistics.
select * from sys.indexes where [object_id] = object_id('dbo.only_table') select * from sys.stats where [object_id] = object_id('dbo.only_table')
Now we run a query with a predicate.
select * from dbo.only_table where ProductID = 715
The query optimiser will automatically create a statistic on the ProductID column.
And we can examine the statistic:
dbcc show_statistics('dbo.only_table',_WA_Sys_00000005_0EA330E9)
We can see that the statistic is created on the ProductID column. Take note of when the statistic was last updated. (I know – I should have better things to do at 9:45pm!)
Now if we create an index on the ProductID column this will also create a statistic on the column.
create nonclustered index ix_productid on dbo.only_table(ProductID)
Which we can examine:
dbcc show_statistics('dbo.only_table',ix_productid)
Again – take note of the updated time.
Now let’s run an update of the ProductID column against all rows. This will mark the statistics as stale. If we run a query that will use the statistics this will trigger an automatic update.
update dbo.only_table set ProductID += 1 select * from dbo.only_table where ProductID = 715
And look at the stat headers.
dbcc show_statistics('dbo.only_table',_WA_Sys_00000005_0EA330E9) with stat_header dbcc show_statistics('dbo.only_table',ix_productid) with stat_header
We can see by the dates that both statistics were updated even though only one will ever be used. We can confirm this by using a technique from Paul White that can show what statistics are considered, and what statistics are used for a given query.
DBCC FREEPROCCACHE go select * from dbo.only_table where ProductID = 715 OPTION ( QUERYTRACEON 3604, QUERYTRACEON 9292, QUERYTRACEON 9204 )
Conclusion: Automatically created statistics can be rendered obsolete by subsequent index creations but they will still be updated and managed by SQL Server – consuming resources. I’ll leave with a script that I’ve based on ideas from Erin Stellato that will pick out any duplicate statistics in a given database.
with stats_on_indexes([object_id],[table_column_id],[index_name]) as( select o.[object_id] as [object_id], ic.[column_id] as [table_column_id], i.name from sys.indexes i join sys.objects o on i.[object_id] = o.[object_id] join sys.stats st on i.[object_id] = st.[object_id] and i.name = st.name join sys.index_columns ic on i.index_id = ic.index_id and i.[object_id] = ic.[object_id] where o.is_ms_shipped = 0 and i.has_filter = 0 and ic.index_column_id = 1 ) select o.[object_id] as [ID], o.name as [Table], c.name as [Column], s.name as [AutoCreatedStatistic], stats_on_indexes.index_name as [Index] from sys.stats s join sys.stats_columns sc on s.stats_id = sc.stats_id and s.[object_id] = sc.[object_id] join sys.objects o on sc.[object_id] = o.[object_id] join sys.columns c on sc.[object_id] = c.[object_id] and sc.column_id = c.column_id join stats_on_indexes on o.[object_id] = stats_on_indexes.[object_id] and stats_on_indexes.table_column_id = c.column_id where s.auto_created = 1 and s.has_filter = 0
Pingback: (SFTW) SQL Server Links 30/08/13 • John Sansom
Same idea i had here. http://dnhlmssql.blogspot.com/2013/04/create-index-also-creates-statistics.html
Thanks for the info. Good script.
Hi, thank you for this interesting info.
I have a question regarding your last script for retrieving the stats involved in compiling a plan.
Here is my output:
…
Stats header loaded: DbName: stats_duplicate, ObjName: dbo.only_table, IndexId: 2, ColumnName: ProductID, EmptyTable: FALSE
Stats loaded: DbName: stats_duplicate, ObjName: dbo.only_table, IndexId: 2, ColumnName: ProductID, EmptyTable: FALSE
Stats header loaded: DbName: stats_duplicate, ObjName: dbo.only_table, IndexId: 3, ColumnName: ProductID, EmptyTable: FALSE
(1218 row(s) affected)
I’m puzzled by the index_ids; index_id=3 corresponds to ix_productid but I see no index_id=2 anywhere when I run the following:
select index_id from sys.indexes
where [object_id] = object_id(‘dbo.only_table’)
output:
index_id
0
3
I don’t know what to make of it; any thoughts?
Thanks again
Hi Marios, thanks for the thoughtful comment. As I said – I got that technique from Paul White who could likely answer your question with more surety than I. But – from what I can see it looks like that id matches the stats id rather than the index id. It looks like index id 3 matches stat id 3, whereas stat id 2 is the auto created stat, with no corresponding index.
I have same question as Marios. Your last script is not helpful at all. Even if the content is inspired from Paul White, i would expect you to test it first and then publish!!!
You might like to read my answer to Marios’s question – I believe that I addressed his issue.
I generally only publish scripts that I have written and tested myself. When I have borrowed ideas or parts of scripts from elsewhere – I always test in my own environment and attribute credit appropriately. If you do not find a script that I have published useful – please feel free not to use it 🙂
Hi Matt,
I was pondering this very thing. A quick search brought me here. The script was exactly what I was looking for. Thank you for a great blog post!
Pingback: Removing Duplicate Statistics | Shaun J Stuart
I believe there is a bug in this script. It incorrectly identifies the first column in an index. I posted an updated version on my blog at http://shaunjstuart.com/archive/2016/08/removing-duplicate-statistics/
Thanks for taking the time to research, fix, and to post the link 🙂
Pingback: Why is the auto created statistic on this column empty?