Filtered Indexes Work-Around

Recently, I needed to create a stored procedure that queried a rather large table. The table has a filtered index on a date column, and it covers the query. However, the Query Optimizer was not using the index, which was increasing the execution time (not to mention IO!) by at least 10x. This wasn’t the first time I’ve had the Optimizer fail to use a filtered index. Normally when this happens, I use a table hint to force the filtered index — after I verify that it is indeed faster, of course. However, since this was a stored procedure, I was receiving the following error message whenever I tried to execute the proc:

Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.

SQL Server would not allow me to execute the stored procedure using the filtered index hint. If I removed the hint, it executed, but it used a different, non-covering and far more expensive index. For those of you not familiar with this issue, allow me to illustrate the problem.

First, create a table to play with and populate it with some bogus data:

Create Table dbo.filteredIndexTest
(
      myID   int Identity(1,3)
    , myDate smalldatetime 
    , myData char(100)
 
    Constraint PK_filteredIndexTest
        Primary Key Clustered(myID)
);
 
Set NoCount On;
Declare @date smalldatetime = '2010-01-01';
 
While @date < '2010-02-01'
Begin
 
    Insert Into dbo.filteredIndexTest
    (
          myDate
        , myData
    )
    Select @date
        , 'Date: ' + Convert(varchar(20), @date, 102);
 
    Set @date = DateAdd(minute, 1, @date);
 
End;
 
Select Count(*) From dbo.filteredIndexTest;

It looks like this will generate 44,640 rows of test data… plenty enough for our purposes. Now, let’s create our filtered index and write a query that will use it:

Create NonClustered Index IX_filteredIndexTest_1
    On dbo.filteredIndexTest(myDate)
    Include (myData)
    Where myDate >= '2010-01-27';
 
Select Distinct myData
From dbo.filteredIndexTest
Where myDate >= '2010-01-28';

If you look at the execution plan for this query, you’ll notice that the Optimizer is using the filtered index. Perfect! Now let’s parameterize it.

Declare @myDate1 smalldatetime = '2010-01-28';
 
Select Distinct myData
From dbo.filteredIndexTest
Where myDate >= @myDate1;

Uh oh. Looking at the execution plan, we see that SQL Server is no longer using the filtered index. Instead, it’s scanning the clustered index! Why is this? There’s actually a good explanation for it. The reason is that I could, in theory, pass a date to my parameter that fell outside of the filtered date range. If that’s the case, then SQL Server could not utilize the filtered index. Personally, I think it’s a bug and SQL Server should identify whether or not a filtered index could be used based on the actual value submitted, but… that’s a whole other blog post. :)

So what can we do? Well, dynamic SQL may be able to help us out in this case. Let’s give it a go. First, let’s try parameterized dynamic SQL.

Declare @mySQL1 nvarchar(2000)
    , @myParam nvarchar(2000) = '@p_myDate2 smalldatetime'
    , @myDate2 smalldatetime = '2010-01-28';
 
Set @mySQL1 = 'Select Distinct myData
              From dbo.filteredIndexTest
              Where myDate >= @p_myDate2';
 
Execute sp_executeSQL @mySQL1, @myParam, @p_myDate2 = @myDate2;

Looking at the execution plan, we see we’re still scanning on the clustered index. This is because the parameterized dynamic SQL resolves to be the exact same query as the one above it. Let’s try unparameterized SQL instead:

Declare @mySQL2 nvarchar(2000)
    , @myDate3 smalldatetime = '2010-01-28';
 
Set @mySQL2 = 'Select Distinct myData
              From dbo.filteredIndexTest
              Where myDate >= ''' + Cast(@myDate3 As varchar(20)) + '''';
 
Execute sp_executeSQL @mySQL2;
 
-- Drop Table dbo.filteredIndexTest;

Voila! We have a seek on our filtered index. Why? Because the statement resolves to be identical to our first query, where we hard-coded the date value in the WHERE clause.

Now, I want to stress this fact: you should always, ALWAYS use parameterized dynamic SQL whenever possible. Not only is it safer, but it’s also faster, because it can reuse cached plans. But sometimes you just cannot accomplish the same tasks with it. This is one of those times. If you do end up needing to use unparameterized dynamic SQL as a work-around, please make sure you’re validating your input, especially if you’re interfacing with any sort of external source.

There’s an even easier work-around for this problem that Dave (http://www.crappycoding.com) shared with me: recompile.

Adding “Option (Recompile)” to the end of your statements will force the Optimizer to re-evaluate which index will best meet the needs of your query every time the statement is executed. More importantly, it evaluates the plan based on the actual values passed to the parameter… just like in our hard-coded and dynamic SQL examples. Let’s see it in action:

DECLARE @myDate4 SMALLDATETIME = '2010-01-28';
 
SELECT DISTINCT myData
FROM dbo.filteredIndexTest 
WHERE myDate >= @myDate4
OPTION (RECOMPILE);
 
DECLARE @myDate5 SMALLDATETIME = '2010-01-20';
 
SELECT DISTINCT myData
FROM dbo.filteredIndexTest 
WHERE myDate >= @myDate5
OPTION (RECOMPILE);

If we look at the execution plans for the 2 queries above, we see that the first query seeks on the filtered index, and the second query scans on the clustered index. This is because the second query cannot be satisfied with the filtered index because we initially limited our index to dates greater than or equal to 1/27/2010.

There are, of course, trade-offs associated with each approach, so use whichever one best meets your needs. Do you have another work-around for this issue? If so, please let me know. :)

Update:

Alex Kuznetsov (http://www.simple-talk.com/author/alex-kuznetsov/) shared this method too:

DECLARE @myDate1 SMALLDATETIME = '2010-01-28';
 
SELECT DISTINCT myData
FROM dbo.filteredIndexTest
WHERE myDate = @myDate1
AND myDate >= '2010-01-27';

Like the other examples, this will result in an index seek on the filtered index. Basically, by explicitly declaring the start date of your filter, you’re letting the Optimizer know that the filtered index can satisfy the request, regardless of the parameter value passed. Thanks for the tip, Alex! :)

Filtered Indexes: What You Need To Know

Filtered indexes are probably my favorite feature in 2008. That’s saying a lot, since there are so many great new features to choose from. In this post, I want to explore a little about how filtered indexes work, how they can be applied, and some of the “gotchas” to be aware of.

First, for those of you who may not yet know about filtered indexes, allow me enlighten you. In short, filtered indexes allow you to create an index on a subset of data using a filtering predicate. Filters can only be applied to non-clustered indexes. The general syntax of a filtered index is:

Create NonClustered Index [index_name]
On [table_name] ([column_list])
Include ([column_list])
Where [filtered_criteria];

For our purposes, we’re going to be working with the Sales.SalesOrderDetail table in the AdventureWorks database. Let’s look at a specific example. Suppose we have a query that regularly searches on the [SpecialOfferID] column.

Select SalesOrderID
    , Count(*) As 'CountOfLineItem'
    , Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail
Where SpecialOfferID <> 1
Group By SalesOrderID;

We notice that there’s no covering index for this query by looking at the actual execution plan:

Query Plan - Clustered Scan

Query Plan - Clustered Scan

If this is a commonly executed query, then we’d probably want to toss an index on it. Before we get started, let’s take a look at what the distribution of values are on that column:

Select SpecialOfferID
    , Count(*) As 'rows'
From Sales.SalesOrderDetail
Group By SpecialOfferID
Order By Count(*) Desc;

Our distribution of values is:

SpecialOfferID rows
-------------- -----------
1              115884
2              3428
3              606
13             524
14             244
16             169
7              137
8              98
11             84
4              80
9              61
5              2

As you can see, [SpecialOfferID] = 1 accounts for 96% of our values. In 2005, we’d create an index that may look something like this:

Create NonClustered Index IX_Sales_SalesOrderDetail_SpecialOfferID
    On Sales.SalesOrderDetail(SpecialOfferID)
    Include (SalesOrderID, LineTotal);

Now if we re-run our original query, this is what we see:

Indexed Query Plan

Indexed Query Plan

So we’re now performing a non-clustered index seek instead of a clustered index scan. Already this results in some pretty significant performance improvements. To see this, we’re going to use the INDEX query hint to force an index scan. We’re also going to use the DBCC command DROPCLEANBUFFERS, which will allow us to clear the buffer cache and better examine what’s happening with our IO.

Set Statistics IO On;
 
DBCC DropCleanBuffers;
 
Select SalesOrderID
    , Count(*) As 'CountOfLineItem'
    , Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail With 
    (Index(PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID))
Where SpecialOfferID <> 1
Group By SalesOrderID;
 
DBCC DropCleanBuffers;
 
Select SalesOrderID
    , Count(*) As 'CountOfLineItem'
    , Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail
Where SpecialOfferID <> 1
Group By SalesOrderID;
 
Set Statistics IO Off;
Clustered Index Scan:
Table 'SalesOrderDetail'. Scan count 1, logical reads 1240, physical reads 17, read-ahead reads 1242...
 
NonClustered Index Seek:
Table 'SalesOrderDetail'. Scan count 2, logical reads 30, physical reads 4, read-ahead reads 480...

As you can see, the non-clustered (NC) index seek performs quite a bit better. Now let’s create a filtered index and explore what happens:

Create NonClustered Index FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered
    On Sales.SalesOrderDetail(SalesOrderID)
    Include (LineTotal)
    Where SpecialOfferID <> 1;

First, let’s look at the pages consumed by each index:

SELECT i.name, ddips.index_depth, ddips.index_level
    , ddips.page_count, ddips.record_count
FROM sys.indexes AS i
Join sys.dm_db_index_physical_stats(DB_ID(), 
    OBJECT_ID(N'Sales.SalesOrderDetail'), Null, Null, N'Detailed') AS ddips
    ON i.OBJECT_ID = ddips.OBJECT_ID
    And i.index_id = ddips.index_id
WHERE i.name In ('IX_Sales_SalesOrderDetail_SpecialOfferID'
    , 'FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered'
    , 'PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID')
    AND ddips.index_level = 0;
name                                                       index_depth index_level page_count  record_count
---------------------------------------------------------- ----------- ----------- ----------- --------------------
PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID        3           0           1234        121317
IX_Sales_SalesOrderDetail_SpecialOfferID                   3           0           480         121317
FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered         2           0           19          5433

If you scroll over, you’ll see that the clustered index consumes the most pages, naturally. The non-filtered NC index consumes less pages than the clustered index because it’s narrower; however, it still consumes more pages than the filtered index because it’s storing every data row. The filtered index, with only 5433 rows stored, is by far our smallest index, consuming 96% less space than our non-filtered NC index.

Because we’re using less space to store this index, we should also see an equivalent performance boost. Let’s verify that this is the case:

Set Statistics IO On;
 
DBCC DropCleanBuffers;
 
Select SalesOrderID
    , Count(*) As 'CountOfLineItem'
    , Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail With (Index(IX_Sales_SalesOrderDetail_SpecialOfferID))
Where SpecialOfferID <> 1
Group By SalesOrderID;
 
DBCC DropCleanBuffers;
 
Select SalesOrderID
    , Count(*) As 'CountOfLineItem'
    , Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail
Where SpecialOfferID <> 1
Group By SalesOrderID;
 
Set Statistics IO Off;
NonClustered Index Seek:
Table 'SalesOrderDetail'. Scan count 2, logical reads 30, physical reads 4, read-ahead reads 480
 
Filtered Index Scan:
Table 'SalesOrderDetail'. Scan count 1, logical reads 24, physical reads 2, read-ahead reads 22
Filtered Query Plan

Filtered Query Plan

As expected, we get the best results with our filtered index scan.

You’ll notice that I did *not* create the index on the [SpecialOfferID] column like I did in [IX_Sales_SalesOrderDetail_SpecialOfferID]. This is because my query doesn’t care what my [SpecialOfferID] value is, just as long as it’s not equal to 1. My non-filtered NC index was created on [SpecialOfferID] because it needed to navigate the B-TREE to find the records where [SpecialOfferID] <> 1. With my filtered index, the query optimizer knows that all of my records already meet the criteria, so doesn’t need to navigate through the index to find the matching results.

We could choose to include the [SpecialOfferID] data in our filtered index, but we’d most likely want to make it an included column rather than part of the index key. In fact, it’s important to note that, if I don’t add [SpecialOfferID] as an included column and I want to return it in the results, i.e.

Select SalesOrderID
    , SpecialOfferID
    , Count(*) As 'CountOfLineItem'
    , Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail
Where SpecialOfferID <> 1
Group By SalesOrderID
    , SpecialOfferID;

my filtered index will not be used and I will instead scan on the clustered index once more (assuming [IX_Sales_SalesOrderDetail_SpecialOfferID] does not exist). This is because the filtering criteria is not included anywhere on the actual index page. This is actually good news, in my opinion, since it allows you to create even leaner indexes. And like I already mentioned, if you do need the data returned, you can always add the filtering criteria as included columns.

What if you’re trying to find out whether or not an index is filtered, and what it’s filtered on? The sys.indexes catalog view has been updated in 2008 to include this information:

Select name, has_filter, filter_definition
From sys.indexes 
Where name In ('IX_Sales_SalesOrderDetail_SpecialOfferID'
    , 'FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered'
    , 'PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID');
name                                                   has_filter filter_definition
------------------------------------------------------ ---------- -------------------------
FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered     1          ([SpecialOfferID]<>(1))
IX_Sales_SalesOrderDetail_SpecialOfferID               0          NULL
PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID    0          NULL

I personally recommend Kimberly Tripp’s system stored proc, sp_helpindex2. It returns a lot of good information about your indexes, such as included columns and filtering criteria.

That’s all I have for today. Hopefully, you now understand how powerful filtered indexes can be. When used properly, filtered indexes can use less space, consume less IO, and improve overall query performance.