Wildcards and Indexes

I was asked a good question the other day, one that I wasn’t sure of the answer. The question was “does SQL Server use an index if you have a wildcard in the search predicate?”

My initial thought was it would if the wildcard was not the first character but I thought it would be worth testing.

I wanted a nice simple table to use for testing so I grabbed an Id and first, middle and last names from one of the AdventureWorks sample databases and ported them into a new table in my sandpit test database:

select 
	BusinessEntityID,
	FirstName,
	MiddleName,
	LastName 
into sandpit.dbo.tbl_WildCardTest
from AdventureWorks2012.Person.Person

Have a look at what we've got:

select * from sandpit.dbo.tbl_WildCardTest

On my SQL Server 2012 instance with the AdventureWorks2012 database this returns 19972 rows.

Create a non-clustered index on the LastName column:

use sandpit
go

create nonclustered index ix_WildCardTest 
on tbl_WildCardTest(LastName)

Verify:

select * from sys.indexes 
where [object_id] = object_id('dbo.tbl_WildCardTest')

Okay, I've had a quick scan through and found a nice selective last name - one that under normal circumstances will use our non clustered index.

select * 
from sandpit.dbo.tbl_WildCardTest
where LastName like 'Dievendorff'

This returns a single row and the execution plan as you would expect has an index seek against the non clustered index followed by a RID lookup on the heap.

So switch on show actual execution plan by pressing [ctrl]+[m] or using this button in the GUI.

ExecutionPlanButton

And run a range of queries with a wildcard at the start, at the end, and both:

select * from sandpit.dbo.tbl_WildCardTest
where LastName like 'Dievendorf%'

select * from sandpit.dbo.tbl_WildCardTest
where LastName like '%ievendorff'

select * from sandpit.dbo.tbl_WildCardTest
where LastName like '%ievendorf%'

And check the plans.

ExecutionPlans1

So when the wild card is not at the start of the search predicate SQL Server chooses an index seek. When there is a wild card at the beginning of the predicate SQL Server needs to use a scan - the index is still chosen because a scan of a narrow index is lighter weight than a scan of the heap.

Even without the wild card at the start of the string eventually the selectivity of the predicate drops to the point that the index seek is abandoned in favour of a table scan.

select * from sandpit.dbo.tbl_WildCardTest
where LastName like 'Die%'

select * from sandpit.dbo.tbl_WildCardTest
where LastName like 'Di%'

ExecutionPlans2

What about if we run the same tests but this time limit the columns in the select list so that they are covered by the index?

select LastName from sandpit.dbo.tbl_WildCardTest
where LastName like 'Dievendorf%'

select LastName from sandpit.dbo.tbl_WildCardTest
where LastName like '%ievendorff'

select LastName from sandpit.dbo.tbl_WildCardTest
where LastName like '%ievendorf%'

ExecutionPlans3

So the way SQL Server chooses seeks or scans is the same - the difference is that now the query is covered so the RID lookup isn't needed.

And as we lose selectivity...

select LastName from sandpit.dbo.tbl_WildCardTest
where LastName like 'Die%'

select LastName from sandpit.dbo.tbl_WildCardTest
where LastName like 'Di%'

ExecutionPlans4

This time because the index covers the query SQL Server chooses an index seek over the table scan and will do so even for the least selective query:

select LastName from sandpit.dbo.tbl_WildCardTest
where LastName like 'D%'

Conclusion: SQL Server does often use an index when a wildcard is present in the search predicate.

An index seek can be used when the wildcard is not at the start of the string, and there are enough literal characters before the wild card for the selectivity to warrant a seek.

If the wild card is at the start of the string then a scan must be used - SQL Server will choose to scan the index rather than the table because the index is narrower.

If the index covers the query then a RID (or clustered key) look-up won't be required and SQL Server will choose an index seek for even the minimally selective predicate with a single literal character preceding the wildcard.

About these ads
This entry was posted in Database Theory, SQL Server, T-SQL. Bookmark the permalink.

One Response to Wildcards and Indexes

  1. Rue Meridin says:

    Can you compare these 2 an tell me why they both return the same number of rows, and in both I believe all the rows must be read from the table. But the seek is so much more efficient than the scan.

    select LastName from sandpit.dbo.tbl_WildCardTest
    where LastName like ‘%ievendorf%’

    select LastName from sandpit.dbo.tbl_WildCardTest
    where LastName like ‘[A-Z]%ievendorf%’

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s