Wildcards and Indexes

I was asked a good question the other day, one that I wasn’t sure of the answer. The question was “does SQL Server use an index if you have a wildcard in the search predicate?”

My initial thought was it would if the wildcard was not the first character but I thought it would be worth testing.

I wanted a nice simple table to use for testing so I grabbed an Id and first, middle and last names from one of the AdventureWorks sample databases and ported them into a new table in my sandpit test database:

select 
	BusinessEntityID,
	FirstName,
	MiddleName,
	LastName 
into sandpit.dbo.tbl_WildCardTest
from AdventureWorks2012.Person.Person

Have a look at what we’ve got:

select * from sandpit.dbo.tbl_WildCardTest

On my SQL Server 2012 instance with the AdventureWorks2012 database this returns 19972 rows.

Create a non-clustered index on the LastName column:

use sandpit
go

create nonclustered index ix_WildCardTest 
on tbl_WildCardTest(LastName)

Verify:

select * from sys.indexes 
where [object_id] = object_id('dbo.tbl_WildCardTest')

Okay, I’ve had a quick scan through and found a nice selective last name – one that under normal circumstances will use our non clustered index.

select * 
from sandpit.dbo.tbl_WildCardTest
where LastName like 'Dievendorff'

This returns a single row and the execution plan as you would expect has an index seek against the non clustered index followed by a RID lookup on the heap.

So switch on show actual execution plan by pressing [ctrl]+[m] or using this button in the GUI.

ExecutionPlanButton

And run a range of queries with a wildcard at the start, at the end, and both:

select * from sandpit.dbo.tbl_WildCardTest
where LastName like 'Dievendorf%'

select * from sandpit.dbo.tbl_WildCardTest
where LastName like '%ievendorff'

select * from sandpit.dbo.tbl_WildCardTest
where LastName like '%ievendorf%'

And check the plans.

ExecutionPlans1

So when the wild card is not at the start of the search predicate SQL Server chooses an index seek. When there is a wild card at the beginning of the predicate SQL Server needs to use a scan – the index is still chosen because a scan of a narrow index is lighter weight than a scan of the heap.

Even without the wild card at the start of the string eventually the selectivity of the predicate drops to the point that the index seek is abandoned in favour of a table scan.

select * from sandpit.dbo.tbl_WildCardTest
where LastName like 'Die%'

select * from sandpit.dbo.tbl_WildCardTest
where LastName like 'Di%'

ExecutionPlans2

What about if we run the same tests but this time limit the columns in the select list so that they are covered by the index?

select LastName from sandpit.dbo.tbl_WildCardTest
where LastName like 'Dievendorf%'

select LastName from sandpit.dbo.tbl_WildCardTest
where LastName like '%ievendorff'

select LastName from sandpit.dbo.tbl_WildCardTest
where LastName like '%ievendorf%'

ExecutionPlans3

So the way SQL Server chooses seeks or scans is the same – the difference is that now the query is covered so the RID lookup isn’t needed.

And as we lose selectivity…

select LastName from sandpit.dbo.tbl_WildCardTest
where LastName like 'Die%'

select LastName from sandpit.dbo.tbl_WildCardTest
where LastName like 'Di%'

ExecutionPlans4

This time because the index covers the query SQL Server chooses an index seek over the table scan and will do so even for the least selective query:

select LastName from sandpit.dbo.tbl_WildCardTest
where LastName like 'D%'

Conclusion: SQL Server does often use an index when a wildcard is present in the search predicate.

An index seek can be used when the wildcard is not at the start of the string, and there are enough literal characters before the wild card for the selectivity to warrant a seek.

If the wild card is at the start of the string then a scan must be used – SQL Server will choose to scan the index rather than the table because the index is narrower.

If the index covers the query then a RID (or clustered key) look-up won’t be required and SQL Server will choose an index seek for even the minimally selective predicate with a single literal character preceding the wildcard.

Posted in SQL Server, T-SQL, Database Theory | Leave a comment

Quick tip: Using sp_server_diagnostics

In SQL Server 2012 a new procedure sp_server_diagnostics returns a payload of information about server health and performance. The sproc is leveraged by the Windows Failover Cluster service and is designed to run in repeat mode. However the sproc can be run in non-repeat mode and loaded into a temp table. From there we can slice and dice the XML payload as desired.

Create a temp table and load the results of sp_server_diagnostics:

create table #servdiag
(
	create_time datetime,
	component_type sysname,
	component_name sysname,
	[state] int,
	state_desc sysname,
	data varchar(max)
)

insert into #servdiag
exec sp_server_diagnostics

And have a look at the results:

select * from #servdiag

To have a look at the XML we can do a quick convert:

select cast(data as xml) as xml_data 
from #servdiag for xml auto,elements

Clicking on this link will show all five XML fragments concatenated into a single fragment.

From here, there’s a number of ways to shred the XML into relational data – xquery, XML data type methods or openxml. I’ll present some queries that use openxml. This method begins by preparing an XML document so that SQL Server is able to parse the resultant DOM. Using the XML data type methods will implicitly create this DOM per query. The data type methods are slightly more efficient for single queries, however if you are using multiple queries there might be some advantage in preparing the DOM once and then accessing it multiple times. Your mileage may vary.

Here are some queries against our temp table from above.

Memory metrics:

declare @x varchar(max)
declare @dochandle int

select @x = data 
from #servdiag 
where component_name = 'resource'

exec sp_xml_preparedocument @dochandle output, @x

select * 
from openxml(@dochandle, '/resource/memoryReport/entry', 3) 
with (description varchar(255), value bigint)

exec sp_xml_removedocument @dochandle

Top 10 waits by count:

declare @x varchar(max)
declare @dochandle int

select @x = data 
from #servdiag 
where component_name = 'query_processing'

exec sp_xml_preparedocument @dochandle output, @x

select * 
from openxml(@dochandle, '/queryProcessing/topWaits/nonPreemptive/byCount/wait', 3) 
with (waitType varchar(255), waits bigint, averageWaitTime bigint, maxWaitTime bigint)

exec sp_xml_removedocument @dochandle

Top 10 waits by duration:

declare @x varchar(max)
declare @dochandle int

select @x = data 
from #servdiag 
where component_name = 'query_processing'

exec sp_xml_preparedocument @dochandle output, @x

select * 
from openxml(@dochandle, '/queryProcessing/topWaits/nonPreemptive/byDuration/wait', 3) 
with (waitType varchar(255), waits bigint, averageWaitTime bigint, maxWaitTime bigint)

exec sp_xml_removedocument @dochandle

Drop the temp table when you’ve finished playing:

drop table #servdiag
Posted in SQL Server, T-SQL | 1 Comment

SQL Server 2008 Statistics: What does a DBA need to know?

I’ve been looking into the behaviour of statistics in SQL Server 2008 – so I’d like to present an overview from a DBA perspective.

What are statistics?

Statistics are a database scoped object that are used by the query optimiser in creating query plans. They describe the distribution of values in a column or columns and help the optimiser choose indexes to use. The optimiser uses a cost based weighting to choose a best plan, and the statistics are used to estimate the number of rows that will be processed by a given operator, and hence the cost of that operator.

Statistics are built by sampling the data values, the ratio of rows sampled to total rows is the sample rate.

A statistic object is automatically created for every index on a table, and if the auto_create_statistics database option is set then the optimiser will create a statistic object for every column in a query predicate. A statistics object can also be created with the CREATE STATISTICS ddl.

Statistics can be examined at a database level by querying the system view sys.stats. This query will list all stats on user tables in the current database, and uses the stats_date() function to determine when the stat was last updated.

select 
o.type_desc
,o.name as 'Object'
,s.name as 'Stat'
,stats_date(s.object_id,stats_id) as LastUpdated
from sys.stats s
inner join sys.objects o
on s.object_id = o.object_id
where o.type_desc <> 'SYSTEM_TABLE' --exclude internal tables

And an individual statistics object, the so-called stat blob, can be examined using DBCC SHOW_STATISTICS(). Have a look at the links at the end of this post if you want to know more detail about the data surfaced by DBCC SHOW_STATISTICS().

Automatic statistic maintenance:

If you look at a query plan and find large discrepancies between estimated and actual row counts – there is a good chance your statistics are out of date.

There are three database scoped options that directly impact statistic maintenance: auto_create, auto_update and auto_update_async.

Auto create dictates whether the query optimiser will create statistics on columns used in predicates during optimisation if required.

Auto update dictates whether the optimiser will automatically update an out of date or stale statistic if it is detected at optimisation time. (An update statistics event will also trigger a recompilation of any cached plans that might use the statistic). SQL server keeps track of modifications at a column level using the undocumented internal table sys.sysrscols and this is used to determine whether the statistic is out of date.

Auto update async means that the statistic updates occur asynchronously on a background thread. This helps to mitigate an issue I’ll discuss shortly.

Possible issues with automatic maintenance:

Auto create and auto update statistics are the SQL Server defaults and should be left enabled unless there is a compelling reason to disable them.

There are a few reasons that a DBA might choose to manually manage the maintenance of statistics:

1. There is a regular maintenance window during which statistics are updated.

2. On large tables – auto update events can take too long and cause performance issues.

3. Again, for large tables, the automatically calculated sample rate may be too small for useful statistics. Manually updating statistics allows the DBA to control sample rates. Additionally, an index rebuild operation will also rebuild the associated statistic with a 100% sample rate – likely to be a far better than an automatic update for large tables.

4. As mentioned, a statistics update is initiated by the optimiser during the query optimisation phase. This means the update is part of the original transaction. If the update takes too long and pushes the transaction past a query timeout threshold – the transaction, including the statistics update will be cancelled and rolled back. If the transaction is attempted again – the statistics are still stale, and an update is initiated…

Using the auto update async option should ameliorate this issue by spawning the update onto a background thread while the original query proceeds with the stale stats.

5. The threshold at which a statistic is declared stale for tables greater than 500 rows is 500 + 20% of total rows. For large tables this can be too infrequent. Highly skewed updates can cause statistics to become misleading long before this threshold is reached.

Overriding automatic behaviour:

Statistics are always automatically created for indexes. It is possible to suppress the creation of the additional column statistics that the optimiser creates on predicate fields at the database scope.

ALTER DATABASE <database name> SET AUTO_CREATE_STATISTICS OFF

And to override the automatic updating behaviour at a database scope.

ALTER DATABASE <database name> SET AUTO_UPDATE_STATISTICS OFF

The automatic updating behaviour can be overridden at a statistic level by using the NORECOMPUTE option with the CREATE or UPDATE STATISTICS statements or by using sp_autostats.

This will show the auto update status of all indexes on a given table or indexed view:

exec sp_autostats 'table or indexed view name'

This will disable automatic updating for an individual statistic:

exec sp_autostats 'table or indexed view name'
                  ,'off'
                  ,'statistic name'

NOTE: You can disable automatic updating for individual statistics if this is enabled at the database level. You can NOT enable automatic updating for individual statistics if this is disabled at the database level.

Manual statistic maintenance:

If you look at a query plan and find large discrepancies between estimated and actual row counts – there is a good chance your statistics are out of date.

Statistics can be manually updated with the sproc sp_updatestats or with the UPDATE STATISTICS statement.

Running sp_updatestats will update all stale statistics in the current database.

Using the resample parameter:

EXEC sp_updatestats 'resample'

will cause the updates to use the last sample rate for each statistic. If resample is not specified the default sample rate calculations are used.

NOTE: only statistics considered stale will be updated by the sproc. Consider the output when run on my test machine:

sp_updatestats

The UPDATE STATISTICS statement can be executed at the table or indexed view or individual statistic level. This allows granular control over the sample rate and the statistic will be updated whether SQL considers it to be stale or not.

Index maintenance and statistics:

The distribution of data is not altered when an index is rebuilt or reorganised so there is no requirement for statistics to be updated as a result of these operations. However statistics are updated/rebuilt with a 100% sample rate when an index is rebuilt but are not updated not when an index is reorganised. It is, therefore, counter-productive to update stats after a rebuild.

However, index maintenance is likely to be occurring in a maintenance window, so it does make sense to maintain statistics where necessary during this window.

Some trace flags:

2371 This trace flag was added in SQL Server 2008 R2 SP1 and lowers the threshold at which statistics are considered stale. This is recommend for SAP systems.

205 & 8721 These trace flags will log auto update activity to the SQL Server logs to help troubleshoot issues with statistics.

Conclusion:

Statistics are a critical component of SQL Server’s query optimisation system. By default the maintenance of statistics should be left to SQL Server. However, there is the potential to closely manage select statistics as part of regular maintenance to improve performance.

A full treatment of SQL Server statistics is well beyond the scope of this overview. The interested reader is directed to the links below, alongside the various links in the text.

References:


http://msdn.microsoft.com/en-us/library/dd535534(v=sql.100).aspx


http://msdn.microsoft.com/en-us/library/ms190397(v=sql.100).aspx


http://technet.microsoft.com/en-US/sqlserver/gg508890.aspx

Microsoft SQL Server 2008 Internals, Kalen Delaney et al.

Posted in Database options, SQL Server, Uncategorized | Tagged , | Leave a comment

Restore database permissions

Introduction:

I came across a couple of subtle gotchas around the permissions required to restore a database in SQL Server 2008+ recently. I’m interested in situations where a DBA needs to allow certain users the ability to restore a given database, for example refreshing UAT, while maintaining minimum permission levels and following accepted best practice.

BOL documentation states that:

“If the database being restored does not exist, the user must have CREATE DATABASE permissions to be able to execute RESTORE. If the database exists, RESTORE permissions default to members of the sysadmin and dbcreator fixed server roles and the owner (dbo) of the database.”

“RESTORE permissions are given to roles in which membership information is always readily available to the server. Because fixed database role membership can be checked only when the database is accessible and undamaged, which is not always the case when RESTORE is executed, members of the db_owner fixed database role do not have RESTORE permissions.”

Now the principal of least privilege means I’m not really interested in members of the sysadmin server role.

Also, according to generally accepted practice our database should be owned by the sa account. (Interestingly, if a DBA chooses not to have databases owned by the sa account, they face further restrictions. SQL Server will not allow a database to be owned by a login created from a Windows group, leaving individual windows accounts or SQL Server accounts.)

So I’m interested in how dbcreator server role membership, db_owner database role membership and create database permission in the master database relate to the following three restore scenarios: restore into an existing database, restore into an existing database with replace, and restore to a new database.

Setup:

To test I’ll create a database and take a backup, and then create a SQL Server login.

USE [master]
GO
CREATE DATABASE TestRestore
GO
ALTER AUTHORIZATION ON DATABASE::TestRestore TO sa
GO
BACKUP DATABASE TestRestore 
TO DISK = N'<backup path>\TestRestore.bak'
GO
CREATE LOGIN [OwnerRoleMember] 
WITH PASSWORD = '123', CHECK_POLICY = OFF
GO

Now I’ll gradually elevate permissions and see what happens with the restores, starting with.

db_owner database role membership:

Map the user and add to the db_owner role.

USE TestRestore
GO
CREATE USER [OwnerRoleMember] FROM LOGIN [OwnerRoleMember]
EXEC sp_addrolemember 'db_owner','OwnerRoleMember'

Impersonate the login/user we created:

EXECUTE AS LOGIN = 'OwnerRoleMember'

SELECT suser_sname()

And try some restores:

USE master
GO
RESTORE HEADERONLY 
FROM DISK = N'<backup path>\TestRestore.bak'
GO
RESTORE FILELISTONLY 
FROM DISK = N'<backup path>\TestRestore.bak'
GO
RESTORE DATABASE TestRestore 
FROM DISK = N'<backup path>\TestRestore.bak'
GO
RESTORE DATABASE TestRestore 
FROM DISK = N'<backup path>\TestRestore.bak' 
WITH REPLACE

All four restores error. If we drop the existing database and rerun the restores they will not surprisingly error again. If you didn’t delete the database and try the restores then you need to clean up and remove the db_owner membership.

REVERT --exit the security context from earlier
GO
USE TestRestore
GO
EXEC sp_droprolemember 'db_owner','OwnerRoleMember'

Create database permission in the master database:

Recreate the database if you dropped it during the last tests, map the login into the master database and grant the permission.

USE master
GO
CREATE USER [OwnerRoleMember] FROM LOGIN [OwnerRoleMember]
GO
GRANT CREATE DATABASE TO [OwnerRoleMember]

Have a look at the server scoped and database (in the master database) scoped permissions

EXECUTE AS LOGIN = 'OwnerRoleMember'

SELECT suser_sname()

SELECT * FROM fn_my_permissions(null,'database')
SELECT * FROM fn_my_permissions(null,'server')

We can see that our login has view any database server scoped permission and create database permission in the master database. Now we stay in our impersonated security context and test the restores against an existing database.

USE master
GO
RESTORE HEADERONLY 
FROM DISK = N'<backup path>\TestRestore.bak'
RESTORE FILELISTONLY 
FROM DISK = N'<backup path>\TestRestore.bak'
RESTORE DATABASE TestRestore 
FROM DISK = N'<backup path>\TestRestore.bak'
RESTORE DATABASE TestRestore 
FROM DISK = N'<backup path>\TestRestore.bak' WITH REPLACE

The restore headeronly and filelistonly both succeed but the actual restores fail. Delete the database and rerun the restores (you’ll have to revert security contexts to be able to drop the database).

REVERT
GO
DROP DATABASE TestRestore 
GO
EXECUTE AS LOGIN = 'OwnerRoleMember'
GO
RESTORE DATABASE TestRestore 
FROM DISK = N'<backup path>\TestRestore.bak'
GO
REVERT
DROP DATABASE TestRestore 
GO
EXECUTE AS LOGIN = 'OwnerRoleMember'
GO
RESTORE DATABASE TestRestore 
FROM DISK = N'<backup path>\TestRestore.bak' 
WITH REPLACE

All restores succeed. So this level of permission will allow you to restore a new database, but not restore over an existing database.
To make sure we start with a clean slate I’m going to drop the user/login. (After the most recent restore the database will be owned by our login so we need to set that back to sa)

ALTER AUTHORIZATION ON DATABASE::TestRestore TO sa
GO
REVOKE CREATE DATABASE FROM [OwnerRoleMember]
GO
DROP USER [OwnerRoleMember]
GO
DROP LOGIN [OwnerRoleMember];

dbcreator server role:

Recreate the database and login and add it to the dbcreator server role. Check the permissions and notice that we have the same permissions as before with the addition of the server scoped create any database privilege.

CREATE LOGIN [OwnerRoleMember] 
WITH PASSWORD = '123', CHECK_POLICY = OFF
GO
EXEC sp_addsrvrolemember [OwnerRoleMember],'dbcreator'
GO
EXECUTE AS LOGIN = 'OwnerRoleMember'

SELECT suser_sname()

SELECT * FROM fn_my_permissions(null,'database')
SELECT * FROM fn_my_permissions(null,'server')

And run the restores again. This time all restores succeed.

Conclusion:

In order to be able to restore a backup onto SQL Server requires some sort of server scoped permissions. The create database permission in the master database will also grant the view any database server scoped permission. This will allow the principal to restore into a new database but not over an existing database. Add the create any database server scoped permission (which is granted by the dbcreator server role) and you can also restore over the top of an existing database.

Posted in Database options, Disaster Recovery, Security, SQL Server, T-SQL | Tagged , , | Leave a comment

Migrating SQL Server Databases that use Database Master Keys

There’s a lot of things to consider when migrating databases between SQL Server instances. I want to provide some tips for dealing with Database Master Keys, and in particular the case when those keys are encrypted by the Service Master Key.

Before getting into the details, let’s take a quick overview of Service Master Keys, Database Master Keys and how they interact.

Service Master Key: At the top of the key hierarchy is the Service Master Key. There is one per SQL Server instance, it is a symmetric key, and it is stored in the master database. Used to encrypt Database Master Keys, Linked Server passwords and Credentials it is generated at first SQL Server startup.

There are no user configurable passwords associated with this key – it is encrypted by the SQL Server service account and the local machine key. On startup SQL Server can open the Service Master Key with either of these decryptions. If one of them fails – SQL Server will use the other one and ‘fix’ the failed decryption (if both fail – SQL Server will error). This is to account for situations like clusters where the local machine key will be different after a failover. This is also one reason why service accounts should be changed using SQL Server Configuration Manager – because then the Service Master Key encryption is regenerated correctly.

Service Master Keys can be manually regenerated using this statement:

alter service master key regenerate

This will decrypt and re-encrypt all secrets encrypted with the key.

The Service Master Key is backed up with the master database, but can be backed up and restored independently:

backup service master key to file = N'<filepath and filename>'
encryption by password = '<password>'
go

restore service master key from file = N'<filepath and filename>'
decryption by password = '<password>'
go

The Service Master Key properties are available by running the following query in the master database:

use master
go
select * from sys.symmetric_keys

Database Master Key: This is a database scoped symmetric key that is encrypted by the Service Master Key and stored in the database. It can be used to encrypt certificates and/or asymmetric keys within the database. Unlike the Service Master Key which is generated automatically, a Database Master Key must be created with DDL:

use <database>
go
create master key encryption by password = '<password>'

The Database Master Key can also be examined in the system catalog:

use <database>
go
select * from sys.symmetric_keys

Database Master Keys are backed up and restored with the database, but can also be backed up and restored independently.

use <database>
go
backup master key to file = N'<filepath and filename>'
	encryption by password = '<file password>'

restore master key from file = N'<filepath and filename>'
	decryption by password = '<file password>'
	encryption by password = '<new encryption password>'

There is one Database Master Key per database and it must be protected by a password. The Database Master Key can be encrypted by multiple passwords and any of these can be used to de-crypt the key. By default the Database Master Key is also encrypted by the Service Master Key, this can be switched off if needed.

Migrating Databases: If a database makes use of a Database Master Key, and that key is encrypted by the Service Master Key, then this encryption will need to be regenerated on the destination instance after any migration. Here are the basic steps.

  1. Firstly verify whether there are any Database Master Keys encrypted by the Service Master Key:
    select name
    from sys.databases
    where is_master_key_encrypted_by_server = 1
  2. (Optional) If you don’t know a valid password for the Database Master Key you can create a new one. (Remember that multiple passwords can encrypt the DMK)
    use <database>
    go
    alter master key 
    add encryption by password = 'migration_password'
    go
  3. Drop the encryption by the Service Master Key:
    use <database>
    go
    alter master key drop encryption by service master key
    go
  4. Migrate the database using either backup and restore, or detach and attach.
  5. Open the Database Master Key with a password (this could be the password created at step 2) and re-activate the encryption by Service Master Key – this will be mapped to the SMK on the new SQL instance:
    use <database>
    go
    open master key decryption by password = '<Password>'
    alter master key add encryption by service master key
    go
  6. (Optional) If you created a password specifically for the migration in step 2, then you should drop it:
    use <database>
    go
    alter master key 
    drop encryption by password = 'migration_password'
    go

Conclusion: The encryption, by default, of the Database Master Key by the Service Master Key means that it can easily go unnoticed by a DBA with sys admin privileges. My intention with this post was to provide a brief overview of the two keys, provide some statements used to modify and examine the keys, and outline the steps involved in migrating a database with a Database Master Key encrypted by the Service Master Key.

Posted in Database options, Disaster Recovery, Security, SQL Server, T-SQL | Tagged , , | 2 Comments

Don’t believe everything you read: Truncate table is not logged

Actually I’m not sure if anyone still believes this anymore. If you read the BOL description carefully it states that truncate de-allocates the pages rather than deleting the rows – meaning only the de-allocations are logged – resulting in far fewer log records. Incidentally this is also why the truncate operation can be rolled back – the pages haven’t been deleted, so can be re-allocated.

That said I had just discovered some cool queries in Itzik Ben-Gan’s book designed to analyse transaction log internals and wanted to try some experiments.

This query uses the undocumented fn_dblog() function. This function uses the current database context and accepts a start and end LSN as arguments, if these are passed as null then all active log records are returned.

The idea behind the query is to query the transaction log and store the number of log records, the size of the log records and the current time. Then execute some operation and query the transaction log again – comparing the results to the saved results to get the deltas for the operation. Results are aggregated for easier analysis and followed by a breakdown of the log operations.

Here’s the basic framework:

--declare variables for initial results
DECLARE @numrecords AS INT, @size AS BIGINT, @dt AS DATETIME;

--query and save initial results
SELECT 
  @numrecords = COUNT(*),
  @size       = COALESCE(SUM([Log Record Length]), 0),
  @dt         = CURRENT_TIMESTAMP
FROM fn_dblog(NULL, NULL) AS L
WHERE AllocUnitName = '<table name>' 
	OR AllocUnitName LIKE '<table name>.%';

--perform some operation

--query the log again and compare with initial results
SELECT 
  COUNT(*) - @numrecords AS numrecords,
  CAST((COALESCE(SUM([Log Record Length]), 0) - @size)
    / 1024. / 1024. AS NUMERIC(12, 2)) AS size_mb,
  CAST(DATEDIFF(millisecond, @dt, CURRENT_TIMESTAMP)/1000. 
   AS DECIMAL(12,3))
    AS duration_sec
FROM fn_dblog(NULL, NULL) AS L
WHERE AllocUnitName = '<table name>' 
	OR AllocUnitName LIKE '<table name>.%';

-- Breakdown of Log Record Types
SELECT Operation, Context,
  AVG([Log Record Length]) AS AvgLen, COUNT(*) AS Cnt
FROM fn_dblog(null, null) AS L
WHERE AllocUnitName = '<table name>' 
OR AllocUnitName LIKE '<table name>.%'
GROUP BY Operation, Context
ORDER BY Operation, Context;

So let’s put it to the test. For this demo I’m going to use the AdventureWorks2008 database – but any database will do really. I’m looking at the Sales.SalesOrderDetail table which has 121317 rows.
Backup the database first.

use master
go

Backup database AdventureWorks2008
to disk = N'<your backup directory>\AdventureWorks2008.bak'

Now run the query and delete every row from the table:

USE AdventureWorks2008
GO

DECLARE @numrecords AS INT, @size AS BIGINT, @dt AS DATETIME;

SELECT 
  @numrecords = COUNT(*),
  @size       = COALESCE(SUM([Log Record Length]), 0),
  @dt         = CURRENT_TIMESTAMP
FROM fn_dblog(NULL, NULL) AS D
WHERE AllocUnitName = 'Sales.SalesOrderDetail' 
OR AllocUnitName LIKE 'Sales.SalesOrderDetail.%';

DELETE [Sales].[SalesOrderDetail];

SELECT 
  COUNT(*) - @numrecords AS numrecords,
  CAST((COALESCE(SUM([Log Record Length]), 0) - @size)
    / 1024. / 1024. AS NUMERIC(12, 2)) AS size_mb,
  CAST(DATEDIFF(millisecond, @dt, CURRENT_TIMESTAMP)/1000. 
   AS DECIMAL(12,3))
    AS duration_sec
FROM fn_dblog(NULL, NULL) AS L
WHERE AllocUnitName = 'Sales.SalesOrderDetail' 
OR AllocUnitName LIKE 'Sales.SalesOrderDetail.%';

-- Breakdown of Log Record Types
SELECT Operation, Context,
  AVG([Log Record Length]) AS AvgLen, COUNT(*) AS Cnt
FROM fn_dblog(null, null) AS L
WHERE AllocUnitName = 'Sales.SalesOrderDetail' 
OR AllocUnitName LIKE 'Sales.SalesOrderDetail.%'
GROUP BY Operation, Context
ORDER BY Operation, Context;

With these results on my machine

386809 log records were created using 44MB of space and taking nearly 27 seconds.

Restore the backup and run the same query but this time with a truncate.

use master
go

restore database AdventureWorks2008
from disk = N'C:\MSSQL\SQLDataDumps\AdventureWorks2008.bak' 
with replace
go

USE AdventureWorks2008
GO

DECLARE @numrecords AS INT, @size AS BIGINT, @dt AS DATETIME;

SELECT 
  @numrecords = COUNT(*),
  @size       = COALESCE(SUM([Log Record Length]), 0),
  @dt         = CURRENT_TIMESTAMP
FROM fn_dblog(NULL, NULL) AS D
WHERE AllocUnitName = 'Sales.SalesOrderDetail' 
OR AllocUnitName LIKE 'Sales.SalesOrderDetail.%';

TRUNCATE TABLE [Sales].[SalesOrderDetail];

SELECT 
  COUNT(*) - @numrecords AS numrecords,
  CAST((COALESCE(SUM([Log Record Length]), 0) - @size)
    / 1024. / 1024. AS NUMERIC(12, 2)) AS size_mb,
  CAST(DATEDIFF(millisecond, @dt, CURRENT_TIMESTAMP)/1000. 
AS DECIMAL(12,3))
    AS duration_sec
FROM fn_dblog(NULL, NULL) AS L
WHERE AllocUnitName = 'Sales.SalesOrderDetail' 
OR AllocUnitName LIKE 'Sales.SalesOrderDetail.%';

-- Breakdown of Log Record Types
SELECT Operation, Context,
  AVG([Log Record Length]) AS AvgLen, COUNT(*) AS Cnt
FROM fn_dblog(null, null) AS L
WHERE AllocUnitName = 'Sales.SalesOrderDetail' 
OR AllocUnitName LIKE 'Sales.SalesOrderDetail.%'
GROUP BY Operation, Context
ORDER BY Operation, Context;

This time with these results:

 

Conclusion: The truncate is logged – as we can see it is all activity against PFS, IAM and GAM pages as we would expect for page de-allocations. It is also much faster and smaller than the delete with only 250 records on .02MB of space in less than a tenth of a second.

Posted in Database Theory, SQL Server, T-SQL | 1 Comment

Quick tip: Count table rows in a database, and locate those rows in the data files.

Here are a couple of quick database and table level queries that I have found useful lately.

How many rows in each table of a database?

The first temptation with this one was to use something like sp_MSforeachtable with a count(*), until I recalled that the catalog view sys.indexes has a row count as one of the columns. The view is database scoped so will return all indexes in the current database context. Books on line has the following index types as at SQL Server 2008R2:

0 = Heap

1 = Clustered

2 = Nonclustered

3 = XML

4 = Spatial

So if we filter on index types of 0 or 1 we will catch every table with no duplication.

Joining sys.indexes with sys.objects (which is also database scoped) allows us to filter out any system objects and just look at user tables, and also pick up some friendly names:

use <database name>
go
SELECT
	DB_NAME() as 'Database',
	SCHEMA_NAME(o.uid) as 'Schema',
    o.Name as 'Table'
    , i.Rows
FROM
    sysobjects o
    INNER JOIN sysindexes i
    ON o.id = i.id
WHERE
    type = 'U'	                --user table
    AND i.IndId in (0,1)	--heap, clustered index

Where are the rows stored?

If you want to dig a bit deeper and locate where particular rows are stored in the data files there are a couple of undocumented functions (2008+).

This query will return a column that contains the file number, row number and slot number separated by colons. (For this example to keep things general I’ve included all columns from the source table – you may choose to include as few or many columns as is useful)

SELECT 
	sys.fn_PhysLocFormatter (%%physloc%%) AS N'File:Page:Slot'
	, * 
	FROM <Schema>.<Table>;

The function sys.fn_PhysLocCracker returns the same information but is used slightly differently and will return the file, page and slot numbers as separate columns.

SELECT 
	files.file_id
	,files.page_id
	,files.slot_id
	, source.*
FROM <Schema>.<Table> as source 
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) as files
Posted in Database options, Database Theory, SQL Server, T-SQL | Leave a comment

Querying database usage

A question that a DBA will often be asked is what databases are actively being used on my SQL Server? We get asked it all the time, usually as part of a consolidation or rationalisation project.

My solution to this is to query the sys.dm_db_index_usage_stats DMV, which maintains counters of index usage since the last SQL Server restart.

This query will provide aggregated details of user interactions with all databases on the instance:

select 
	db_name(database_id) as 'database'
	,sum(user_seeks) as 'user seeks'
	,sum(user_scans) as 'user scans'
	,sum(user_lookups) as 'user lookups'
	,sum(user_updates) as 'user updates'
	,max(last_user_seek) as 'last user seek'	
	,max(last_user_scan) as 'last user scan'	
	,max(last_user_lookup) as 'last user lookup'	
	,max(last_user_update) as 'last user update'	
from sys.dm_db_index_usage_stats
group by database_id

This query will compare the databases that have recorded index usages with the databases on the instance and return databases that appear to not have been used:

select db_name(database_id) 
from sys.databases 
where database_id not in (select database_id from sys.dm_db_index_usage_stats)
and db_name(database_id) not in ('master','msdb','model','tempdb')

A question that bugged me was – will operations against heaps still be registered in sys.dm_db_index_stats?

So to test this I create a database with a single table with no indexes – i.e. a heap:

create database heap_db
go
use heap_db
go
create table heap(id int, value varchar(255))

Confirm that there are no indexes on the table:

sp_helpindex 'heap'

So let’s put some data in and see what happens:

insert into heap
values (1,'some'),(2,'values')

And see if this registers (I’m filtering to just the database, otherwise on my test instance lots of other noise is returned by this query)

And what about retrieving data?

select * from heap

And the result is a scan as we might expect:

Clean up:

use master
go
drop database heap_db

So that means that queries of sys.dm_db_index_stats will pick up heaps as well as indexed databases and can give us a good idea of database usage.

There are still a couple of caveats:

-The results are only valid from last SQL Server restart.

-Closing a database will clear any rows in the DMV for that database – i.e. autoclose databases might not get picked up correctly.

 

Posted in Database options, SQL Server, T-SQL | Leave a comment

Quick tip: Querying database metadata with DBCC DBINFO

Page 9 in file 1 (of the primary filegroup) in a SQL Server database is the header page and contains a wealth of metadata about the database. We can have a look at this with DBCC PAGE or DBCC DBINFO.

In order to redirect the output of the DBCC command to the results we first need to enable trace flag 3604.

DBCC TRACEON(3604)

We can then look at the header page with:

DBCC PAGE(<database name>,1,9,3)

Where 1 is the file number, 9 the page number and 3 the dump style (possible values for this parameter are 1,2 or 3).

Another option is to use this:

DBCC DBINFO(<database name>)

You can run this command without a database name parameter and it will return the header page for the current database.

You can also redirect the output in table form with the tableresults option:

DBCC DBINFO(<database name>) with tableresults

(You don’t need to run the trace flag when using tableresults).

This then naturally leads to using a table variable, loading it with the DBCC output and then having the ability to query the set.

When was the last log backup?

declare @dbinfo table
(
ParentObject varchar(255),
[Object] varchar(255),
[Field] varchar(255),
[Value] varchar(255))
insert into @dbinfo
execute('dbcc dbinfo(''<database name>'') with tableresults')
select Field, Value from @dbinfo
where Field = 'dbi_LastLogBackupTime'

When was the last clean DBCC check?

declare @dbinfo table
(
ParentObject varchar(255),
[Object] varchar(255),
[Field] varchar(255),
[Value] varchar(255))
insert into @dbinfo
execute('dbcc dbinfo(''<database name>'') with tableresults')
select Field, Value from @dbinfo
where Field = 'dbi_dbccLastKnownGood'

Caveat: DBCC PAGE and DBCC DBINFO are commonly used but they are undocumented and unsupported commands.

Posted in Database options, SQL Server, T-SQL | Tagged , | Leave a comment

Quick tip: Using DBCC SQLPERF(‘LogSpace’)

A lot of a DBA’s time is spent managing and investigating transaction log files.

A commonly used tool is the command:

DBCC SQLPERF('LogSpace')

which returns a list of all databases, the size of their transaction log file, and the percentage of that space that is used for active transactions. One limitation of the command is that the results are returned in no particular order – on an instance with a large number of databases this can be difficult to read.

So I wrote a quick helper script to load the data into a temporary table allowing me to use a where clause to filter the results.

It’s not rocket science – but it is handy:

declare @LogSpace table
(
DatabaseName varchar(255),
[Log Size (MB)] float,
[Log Space Used (%)] float,
[Status] int)
insert into @LogSpace
execute('dbcc sqlperf(''LogSpace'')')

select * from @LogSpace
--where DatabaseName = '' --use for a particular database
--order by [Log Size (MB)] desc --find the biggest log file
--order by [Log Space Used (%)] desc --find the fullest log file
Posted in SQL Server, T-SQL | 2 Comments