Disposable Indexes
Today I had to run an ad hoc query on a 8.5 billion row table. The table had a dozen columns of a variety of data types and was clustered on a bigint identity. There were no other indexes on the table. My query involved a join to a smaller table with a date range restriction. Without an adequate index to use, SQL Server was going to be forced to scan this 8.5 billion row table. Now, I don't have much patience for waiting for long running queries. I want to run the ad hoc, e-mail the results, and forget about it. But short of adding a nonclustered index, which would take a very long time to build and probably require additional space requisitioned from the SAN team, what could I do? Enter disposable indexes. Now, you might be asking yourself, "What the frilly heck does she mean by a disposable index? Is that new in Denali?" No, dear reader. I am actually referring to filtered indexes, which is available in SQL Server 2008 and 2008 R2. I call them "disposable" because I create them to significantly speed up ad hoc queries, then I drop them when I'm done.
Here, allow me to demonstrate using the AdventureWorks2008R2 database. Although the tables are smaller, this query is very similar in structure to what I needed to run today.
Select Count(Distinct sod.SalesOrderID) As 'distinctCount' From AdventureWorks2008R2.Sales.SalesOrderDetail As sod Join AdventureWorks2008R2.Production.Product As p On sod.ProductID = p.ProductID Where sod.ModifiedDate Between '2008-01-01' And '2008-07-31' And p.MakeFlag = 0;
Now, let's take a look at the type of indexes we currently have available:
Select name, has_filter, filter_definition From sys.indexes Where object_id = object_id('Sales.SalesOrderDetail');
name has_filter filter_definition ----------------------------------------------------------------------------------------- PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID 0 NULL AK_SalesOrderDetail_rowguid 0 NULL IX_SalesOrderDetail_ProductID 0 NULL (3 row(s) affected)
We need an index on ModifiedDate and ProductID, which it doesn't look like we have currently. Without this, we're going to end up scanning on the clustered index. That means SQL Server will have to evaluate each and every single row in the table to see if the row matches the criteria of our query. Not pretty, and certainly not fast. So instead, let's create a filtered index on date. But we can greatly speed up the time it takes to create our filtered index by doing a little investigating upfront and finding a range of clustering key values that will cover the query. Doing this allows SQL Server to seek on the clustered index, greatly reducing the amount of reads necessary to create our filtered index. So let's see this in action. First, let's find out the current max value of the table:
Select Max(SalesOrderDetailID) As 'maxID' From AdventureWorks2008R2.Sales.SalesOrderDetail;
maxID ----------- 121317
Now we get to do a little guessing. Let's go back and see what date we get if we look at half of the records:
Select SalesOrderDetailID, ModifiedDate From AdventureWorks2008R2.Sales.SalesOrderDetail Where SalesOrderDetailID = (121317/2);
SalesOrderDetailID ModifiedDate ------------------ ----------------------- 60658 2007-11-01 00:00:00.000
Okay, SalesOrderDetailID 60658 gets us back to 11/1/2007. That's a little too far. Let's see how a SalesOrderDetailID value of 75000 does...
Select SalesOrderDetailID, ModifiedDate From AdventureWorks2008R2.Sales.SalesOrderDetail Where SalesOrderDetailID = 75000;
SalesOrderDetailID ModifiedDate ------------------ ----------------------- 75000 2007-12-27 00:00:00.000
Okay, SalesOrderDetailID 75000 takes us back to 12/27/2007. That's close enough to 1/1/2008 for my purposes. Of course, depending on the size of the table, in real life it may make sense to try to get closer to the value you're looking for. But for now, this will do. And because we're looking for data through the "current date" (7/31/2008 in the AdventureWorks2008R2 database), we already know our outer limit is 121317.
Now let's take these ranges and create a filtered index that will cover our query:
Create Nonclustered Index IX_SalesOrderDetail_filtered On Sales.SalesOrderDetail(ModifiedDate, ProductID) Include (SalesOrderID) Where SalesOrderDetailID >= 75000 And SalesOrderDetailID < 121317;
By having this range identified, SQL Server can perform a seek on the clustered index to create the nonclustered index on just the subset of records that you need for your query. Remember that 8.5 billion row table I mentioned earlier? I was able to create a filtered index that covered my query in 10 seconds. Yes, that's right... 10 SECONDS.
The last thing we need to do is include our filtered index definition in our ad hoc query to ensure that the filtered index is used. It also doesn't hurt to explicitly tell SQL Server to use the filtered index if you're absolutely sure it's the best index for the job.
Select Count(Distinct sod.SalesOrderID) As 'distinctCount' From AdventureWorks2008R2.Sales.SalesOrderDetail As sod With (Index(IX_SalesOrderDetail_filtered)) Join AdventureWorks2008R2.Production.Product As p On sod.ProductID = p.ProductID Where sod.ModifiedDate Between '2008-01-01' And '2008-07-31' And p.MakeFlag = 0 And sod.SalesOrderDetailID >= 75000 And sod.SalesOrderDetailID < 121317;
That's all there is to it. Using this method, I was able to complete my ad hoc request in 40 seconds: 10 seconds to create the filtered index and 30 seconds to actually execute the ad hoc. Of course, it also took a couple of minutes to write the query, look at existing indexes, and search for the correct identity values. All in all, from the time I received the request to the time I send the e-mail was about 5 minutes. All because of disposable filtered indexes. How's that for some SQL #awesomesauce?
Categories
- Business Intelligence
- Internals
- Miscellaneous
- PASS
- Performance & Tuning
- Presentations
- SQL 2008
- SQL Tips
- Syndication
- T-SQL Scripts
Subscribe to my blog!
| Like what you see? Subscribe! |
![]() |
Around the Web
Recent Tweets
- [New Blog Post] A Question For My Blog Readers http://t.co/QPwN5yGw #SQLServer #Teradata
- GoDaddy is looking for a rockstar Teradata expert. Great opportunity on a high-visibility project. Come work w/ me! :) http://t.co/NE2EHnkf
- @datachick grass is always greener on the other side. I think that's why my daughter prefers to eat broccoli off my plate instead of hers
Archives
- February 2012
- November 2011
- October 2011
- September 2011
- July 2011
- June 2011
- May 2011
- April 2011
- January 2011
- December 2010
- November 2010
- October 2010
- September 2010
- July 2010
- June 2010
- May 2010
- April 2010
- February 2010
- January 2010
- November 2009
- October 2009
- September 2009
- August 2009
- July 2009
- June 2009
- May 2009
- April 2009
- March 2009
- February 2009
- January 2009
- December 2008
- November 2008
- October 2008



December 15th, 2010 - 13:56
Hey Michelle,
I also encountered a similar problem, and here is my take at it:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/07/13/optimizing-yet-another-query-that-involves-highly-correlated-columns.aspx
What do you think?
December 15th, 2010 - 16:48
Seems like a really neat solution when you have a guarantee that column values outside the identity range chosen are guaranteed to not have values you are restricting against. Need to add this to my bag of tricks…
December 16th, 2010 - 23:46
Nice!
Will really come in useful.
Thanks
December 30th, 2010 - 14:55
I’m just impressed. I think I have to upgrade my SQL2k to 2k8+… (hehe)
I also have a similar problem, but the index creation fails…
Let’s see this in work after I’ve upgraded…
Thanks.
January 5th, 2011 - 21:04
Awesome! If you don’t mind me asking, how many drives support this db with 8.5 billion rows and how many partitions does it have?
January 6th, 2011 - 07:24
The data sits on a SAN. I’m not sure how the SAN is architected. We do have a weekly partitioning scheme with an annual filegroup strategy. Hope that helps.
January 6th, 2011 - 20:44
Thanks Michelle! You have a great blog with excellent info!
February 21st, 2011 - 15:07
Great entry, thanks!
March 1st, 2011 - 13:41
Michelle, great blog, but I have a question: Why not just add a term to the original query’s WHERE clause, such as: AND SalesOrderID > 75000 AND SalesOrderID < 121317 ?
It seems to me you can take advantage of your knowledge about the relationship between SalesOrderID and ModifiedDate directly in the original query, and skip the index altogether.
Am I missing something?
March 3rd, 2011 - 07:30
Hi Colin,
Good question! It really depends on the type of ad hocs, and whether or not you can hit the table just once (i.e. do you KNOW what you need, or are you doing more investigative work?). But yes, you’re exactly right… in some cases you can just skip the filtered index altogether.
Thanks for reading.
Michelle
February 20th, 2012 - 20:14
I have been working with SQL Server since 2002 and have come across various professional bloggers dedicated to SQL server, this is on such blog which has very practical scenrios discussed and explained in detail, not many organisation ask questions such as Data Utilisation etc, which are quite critical now and this blog does it quite effortlessly. Kudos! and please add me to your fan base. Look forward to more such real world tips and tricks.