A couple of weeks ago, I was working on a Microsoft PDW proof-of-concept (POC) and had to measure compression ratios. In order to do this, I fired up SSMS and wrote a little script. The script will iterate through all tables in a database and run the sp_estimate_data_compression_savings stored procedure. This will only work in SQL Server 2008+ versions running Enterprise edition.
If you're not familiar with this stored procedure, it estimates the effect PAGE or ROW compression will have on a table, index, or partition. There are pros and cons to compression. What I've tended to see is that compression has very positive results on space, IO, and query duration, with a negative impact on CPU and write speed. Like most things, it's a trade-off and the results will vary by environment, so I recommend you do some testing before you apply compression to all tables. I tend to use compression mostly for my historical tables and partitions and leave my recent data uncompressed. Back to the script: I use this stored procedure to estimate the impact of compression and to determine whether to use PAGE or ROW compression. PAGE is a higher level of compression, which means it's also more expensive in terms of CPU, so if the difference between the two results is negligible, I'm more apt to just use ROW compression.
Now that my impromptu compression discussion is done, let's get to the actual script. One final word of caution, however. This is an IO intensive process, so you may want to run it after peak business hours.
SET NOCOUNT ON;
DECLARE @printOnly BIT = 0 -- change to 1 if you don't want to execute, just print commands
, @tableName VARCHAR(256)
, @schemaName VARCHAR(100)
, @sqlStatement NVARCHAR(1000)
, @tableCount INT
, @statusMsg VARCHAR(1000);
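-- OBJECT_ID only resolves temp tables owned by this session; a LIKE search on
-- tempdb.sys.tables can match another session's table and make the DROP fail
IF OBJECT_ID('tempdb..#tables') IS NOT NULL
DROP TABLE #tables;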
CREATE TABLE #tables
(
database_name sysname
, schemaName sysname NULL
, tableName sysname NULL
, processed bit
);
IF OBJECT_ID('tempdb..#compressionResults') IS NOT NULL
DROP TABLE #compressionResults;
-- Columns map, in order, to the result set of sp_estimate_data_compression_savings (sizes are in KB)
CREATE TABLE #compressionResults
(
objectName sysname
, schemaName sysname
, index_id int
, partition_number int
, size_current_compression bigint
, size_requested_compression bigint
, sample_current_compression bigint
, sample_requested_compression bigint
);
INSERT INTO #tables
SELECT DB_NAME()
, SCHEMA_NAME([schema_id])
, name
, 0 -- unprocessed
FROM sys.tables;
SELECT @tableCount = COUNT(*) FROM #tables;
WHILE EXISTS(SELECT * FROM #tables WHERE processed = 0)
BEGIN
SELECT TOP 1 @tableName = tableName
, @schemaName = schemaName
FROM #tables WHERE processed = 0;
SELECT @statusMsg = 'Working on ' + CAST(((@tableCount - COUNT(*)) + 1) AS VARCHAR(10))
+ ' of ' + CAST(@tableCount AS VARCHAR(10))
FROM #tables
WHERE processed = 0;
RAISERROR(@statusMsg, 0, 42) WITH NOWAIT;
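-- Double any embedded single quotes so unusual object names cannot break the statement
SET @sqlStatement = 'EXECUTE sp_estimate_data_compression_savings '''
+ REPLACE(@schemaName, '''', '''''') + ''', ''' + REPLACE(@tableName, '''', '''''') + ''', NULL, NULL, ''PAGE'';' -- ROW, PAGE, or NONE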
IF @printOnly = 1
BEGIN
SELECT @sqlStatement;
END
ELSE
BEGIN
INSERT INTO #compressionResults
EXECUTE sp_executesql @sqlStatement;
END;
UPDATE #tables
SET processed = 1
WHERE tableName = @tableName
AND schemaName = @schemaName;
END;
SELECT *
FROM #compressionResults;
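To make the raw estimates easier to compare, a summary along these lines may help. This is only a sketch: it assumes the #compressionResults temp table populated by the script above is still in scope, and the column aliases (currentKB, requestedKB, estSavingsPct) are made up for illustration.

```sql
-- Sketch: summarize estimated savings per table from #compressionResults.
-- sp_estimate_data_compression_savings reports sizes in KB.
SELECT schemaName
     , objectName
     , SUM(size_current_compression)   AS currentKB
     , SUM(size_requested_compression) AS requestedKB
       -- NULLIF avoids a divide-by-zero error for empty tables
     , CAST(100.0 * (SUM(size_current_compression) - SUM(size_requested_compression))
            / NULLIF(SUM(size_current_compression), 0) AS DECIMAL(9,2)) AS estSavingsPct
FROM #compressionResults
GROUP BY schemaName, objectName
ORDER BY estSavingsPct DESC;
```

Running the main script once with 'ROW' and once with 'PAGE', then comparing the two summaries, is one way to decide whether the extra CPU cost of PAGE is worth it for a given table.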
The SQL blogosphere has been lit up with PASS Summit recaps.
I debated about whether or not to write my own post, until I remembered that this blog serves as a mini-journal for me too. I have a notoriously poor memory--my husband likes to say that my CPU and memory are good, but I must have an unusual clustering strategy--so maybe this blog post will be a good pointer for me when I start prepping for next year's Summit.
This was definitely the best PASS Summit conference ever. While there will always be opportunities to do things better--improvement is a never-ending process--it was clear that the organizers of this event listened to the feedback they had received the previous year. One of the best changes? Backpacks. These were very useful, as evidenced by their presence everywhere. Nice job, organizers!
My absolute favorite thing about Summit is the chance to meet and reconnect with so many amazing SQL folks. There were entirely too many people to list out, but some highlights include meeting Crys Manson, Jorge Segarra, and Karen Lopez for the first time. I also had a chance encounter with Ola Hallengren in the Sheraton elevator. Apparently we were only staying a few rooms apart this year. We ended up having a couple of really great discussions about index fragmentation, the differences between our scripts, and things we'd like to see changed in future releases of SQL Server.
I had the opportunity to sit on the panel at the WIT luncheon. All of the women on the panel were amazing, and I was honored just to be sitting at the same table as them. I was especially pleased to meet Nora Denzel, a Senior Vice President at Intuit. Intelligent, confident, and witty, she is a great role model for young technical women, myself included. I can only hope that some of her gumption rubbed off on me due to our close proximity.
After the event, I was pleasantly surprised by how many folks--men and women both--came up to me to tell me how much they enjoyed it. Thanks to the WIT VC for organizing another great event!
The lightning talk sessions were a new feature this year, and I think I like it. The format of the lightning session is 7 speakers presenting on a topic for 5 quick minutes. Watching these sessions is kind of like skipping right to the center of a tootsie pop: all content and no fluff. The standout lightning talk presentation for me was Adam Machanic's. It was beautifully rehearsed and choreographed. Nice job, Adam!
Another of the many highlights of the week was meeting the Microsoft execs. In addition to meeting Ted Kummert, Mark Souza, and Donald Farmer--all very nice gentlemen--I had the opportunity to speak at length with Jose Blakely about Parallel Data Warehouse (PDW). PDW, formerly codenamed Madison, was officially launched at Summit. Jose was kind enough to explain the PDW architecture, both where it came from and the vision for where it's going. I'd attempt to regurgitate it here, but I think the probability of me misquoting would be high.
Suffice it to say, this technology has me excited. Why? Quite frankly, I think PDW will do for data warehousing what SQL Server did for databases, and what Analysis Services did for BI: make it affordable. With a compelling cost-per-terabyte, an attractive scale-out approach, and an entry point at under $1 million, we'll see more small-to-midsized companies implementing data warehousing and business intelligence. This is good news for those of us looking for an affordable data warehouse solution and for those of us who make our living with SQL Server. And for those of you who might suggest that few companies need a data warehouse that can support multi-terabyte data, I'd like to point out that just 3 or 4 years ago, 100 GB was considered a lot of data.
I spent most of my week digging into the PDW architecture. It's not all roses--it's a first release and, as such, is immature compared to the much older and more established data warehouse systems--but again, it has a lot going for it, not least of all its easy integration within a SQL Server environment and the relatively low cost. We're currently investigating this as a possible data warehouse solution for our business intelligence environment, so expect to see more from me about PDW as I learn more about it.
Recently, I needed to create a stored procedure that queried a rather large table. The table has a filtered index on a date column, and it covers the query. However, the Query Optimizer was not using the index, which was increasing the execution time (not to mention IO!) by at least 10x. This wasn't the first time I've had the Optimizer fail to use a filtered index. Normally when this happens, I use a table hint to force the filtered index -- after I verify that it is indeed faster, of course. However, since this was a stored procedure, I was receiving the following error message whenever I tried to execute the proc:
Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.
SQL Server would not allow me to execute the stored procedure using the filtered index hint. If I removed the hint, it executed, but it used a different, non-covering and far more expensive index. For those of you not familiar with this issue, allow me to illustrate the problem.
First, create a table to play with and populate it with some bogus data:
Create Table dbo.filteredIndexTest
(
myID int Identity(1,3)
, myDate smalldatetime
, myData char(100)
, Constraint PK_filteredIndexTest
Primary Key Clustered(myID)
);
Set NoCount On;
Declare @date smalldatetime = '2010-01-01';
While @date < '2010-02-01'
Begin
Insert Into dbo.filteredIndexTest
(
myDate
, myData
)
Select @date
, 'Date: ' + Convert(varchar(20), @date, 102);
Set @date = DateAdd(minute, 1, @date);
End;
Select Count(*) From dbo.filteredIndexTest;
It looks like this will generate 44,640 rows of test data... plenty enough for our purposes. Now, let's create our filtered index and write a query that will use it:
Create NonClustered Index IX_filteredIndexTest_1
On dbo.filteredIndexTest(myDate)
Include (myData)
Where myDate >= '2010-01-27';
Select Distinct myData
From dbo.filteredIndexTest
Where myDate >= '2010-01-28';
If you look at the execution plan for this query, you'll notice that the Optimizer is using the filtered index. Perfect! Now let's parameterize it.
Declare @myDate1 smalldatetime = '2010-01-28';
Select Distinct myData
From dbo.filteredIndexTest
Where myDate >= @myDate1;
Uh oh. Looking at the execution plan, we see that SQL Server is no longer using the filtered index. Instead, it's scanning the clustered index! Why is this? There's actually a good explanation for it. The reason is that I could, in theory, pass a date to my parameter that fell outside of the filtered date range. If that's the case, then SQL Server could not utilize the filtered index. Personally, I think it's a bug and SQL Server should identify whether or not a filtered index could be used based on the actual value submitted, but... that's a whole other blog post.
So what can we do? Well, dynamic SQL may be able to help us out in this case. Let's give it a go. First, let's try parameterized dynamic SQL.
Declare @mySQL1 nvarchar(2000)
, @myParam nvarchar(2000) = '@p_myDate2 smalldatetime'
, @myDate2 smalldatetime = '2010-01-28';
Set @mySQL1 = 'Select Distinct myData
From dbo.filteredIndexTest
Where myDate >= @p_myDate2';
Execute sp_executeSQL @mySQL1, @myParam, @p_myDate2 = @myDate2;
Looking at the execution plan, we see we're still scanning on the clustered index. This is because the parameterized dynamic SQL resolves to be the exact same query as the one above it. Let's try unparameterized SQL instead:
Declare @mySQL2 nvarchar(2000)
, @myDate3 smalldatetime = '2010-01-28';
Set @mySQL2 = 'Select Distinct myData
From dbo.filteredIndexTest
Where myDate >= ''' + Cast(@myDate3 As varchar(20)) + '''';
Execute sp_executeSQL @mySQL2;
-- Drop Table dbo.filteredIndexTest;
Voila! We have a seek on our filtered index. Why? Because the statement resolves to be identical to our first query, where we hard-coded the date value in the WHERE clause.
Now, I want to stress this fact: you should always, ALWAYS use parameterized dynamic SQL whenever possible. Not only is it safer, but it's also faster, because it can reuse cached plans. But sometimes you just cannot accomplish the same tasks with it. This is one of those times. If you do end up needing to use unparameterized dynamic SQL as a work-around, please make sure you're validating your input, especially if you're interfacing with any sort of external source.
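One way to keep the unparameterized version reasonably safe is to accept the value only through a typed variable and build the literal yourself with a fixed conversion style. A minimal sketch, assuming the dbo.filteredIndexTest table from above; the variable names @mySQL3 and @myDate6 are illustrative:

```sql
Declare @mySQL3 nvarchar(2000)
    , @myDate6 smalldatetime = '2010-01-28';  -- typed: non-date input fails right here

-- Style 126 (ISO 8601) yields a language-independent literal made up only of
-- digits, dashes, colons, and 'T', so nothing user-supplied reaches the string.
Set @mySQL3 = N'Select Distinct myData
    From dbo.filteredIndexTest
    Where myDate >= ''' + Convert(varchar(30), @myDate6, 126) + N''';';

Print @mySQL3;                  -- inspect the statement before running it
Execute sp_executeSQL @mySQL3;
```

String or free-form inputs need stricter handling than this; the typed-variable trick only works because a date value cannot carry arbitrary text.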
There's an even easier work-around for this problem that Dave (http://www.crappycoding.com) shared with me: recompile.
Adding "Option (Recompile)" to the end of your statements will force the Optimizer to re-evaluate which index will best meet the needs of your query every time the statement is executed. More importantly, it evaluates the plan based on the actual values passed to the parameter... just like in our hard-coded and dynamic SQL examples. Let's see it in action:
DECLARE @myDate4 SMALLDATETIME = '2010-01-28';
SELECT DISTINCT myData
FROM dbo.filteredIndexTest
WHERE myDate >= @myDate4
OPTION (RECOMPILE);
DECLARE @myDate5 SMALLDATETIME = '2010-01-20';
SELECT DISTINCT myData
FROM dbo.filteredIndexTest
WHERE myDate >= @myDate5
OPTION (RECOMPILE);
If we look at the execution plans for the 2 queries above, we see that the first query seeks on the filtered index, and the second query scans on the clustered index. The second query cannot be satisfied with the filtered index because we limited our index to dates greater than or equal to 2010-01-27, and it asks for rows starting at 2010-01-20.
There are, of course, trade-offs associated with each approach, so use whichever one best meets your needs. Do you have another work-around for this issue? If so, please let me know.
Update:
Alex Kuznetsov (http://www.simple-talk.com/author/alex-kuznetsov/) shared this method too:
DECLARE @myDate1 SMALLDATETIME = '2010-01-28';
SELECT DISTINCT myData
FROM dbo.filteredIndexTest
WHERE myDate = @myDate1
AND myDate >= '2010-01-27';
Like the other examples, this will result in an index seek on the filtered index. Basically, by explicitly declaring the start date of your filter, you're letting the Optimizer know that the filtered index can satisfy the request, regardless of the parameter value passed. Thanks for the tip, Alex!
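Since the original problem showed up inside a stored procedure, here is roughly how the recompile work-around might look in that setting. This is a sketch, not the actual production proc: the procedure name dbo.filteredIndexTest_get_sp is hypothetical, and it assumes the dbo.filteredIndexTest table and filtered index created earlier.

```sql
-- Hypothetical procedure; Create Procedure must be the first statement in its batch
Create Procedure dbo.filteredIndexTest_get_sp
      @myDate smalldatetime
As
Begin
    Set NoCount On;

    Select Distinct myData
    From dbo.filteredIndexTest
    Where myDate >= @myDate
    Option (Recompile);   -- compile this statement using the actual parameter value
End;
Go

Execute dbo.filteredIndexTest_get_sp @myDate = '2010-01-28';
```

Whether the Optimizer actually picks the filtered index can depend on your build of SQL Server, so check the execution plan rather than taking it on faith.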
For those of you who are using partitioning, or who are considering using partitioning, allow me to share some tips with you.
Easy Partition Staging Tables
Switching partitions (or more specifically, hobts) in and out of a partitioned table requires the use of a staging table. The staging table has very specific requirements: it must be completely identical to the partitioned table, including indexing structures, and it must have a check constraint that limits data to the partitioning range. Thanks to my co-worker Jeff, I've recently started using the SQL Server Partition Management tool on CodePlex. I haven't used the automatic partition switching feature -- frankly, using any sort of data modification tool in a production environment makes me nervous -- but I've been using the scripting option to create staging tables in my development environment, which I then copy to production for use. It's nothing you can't do yourself, but it does make the whole process easy and painless, plus it saves you from annoying typos. But be careful when using this tool to just create the table and check constraints automatically, because you may need to...
Add Check Constraints After Loading Data
Most of the time, I add the check constraint when I create the staging table, then I load data and perform the partition switch. However, for some reason, I was receiving the following error:
.Net SqlClient Data Provider: Msg 4972, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. Check constraints or partition function of source table 'myStagingTable' allows values that are not allowed by check constraints or partition function on target table 'myDestinationTable'.
This drove me crazy. I confirmed my check constraints were correct, that I had the correct partition number, and that all schema and indexes matched identically. After about 30 minutes of this, I decided to drop and recreate the constraint. For some reason, it fixed the issue. Repeat tests produced the same results: the check constraint needed to be added *after* data was loaded. This error is occurring on a SQL Server 2008 SP1 box; to be honest, I'm not sure what's causing the error, so if you know, please leave me a comment. But I figured I'd share so that anyone else running into this issue can hopefully save some time and headache.
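For reference, the order of operations that worked for me looks something like the sketch below. Every object name here (pf_demo, ps_demo, dbo.demoPartitioned, dbo.demoStaging) is made up for illustration, and the boundary values are arbitrary.

```sql
-- All object names are hypothetical
Create Partition Function pf_demo (smalldatetime)
    As Range Right For Values ('2010-01-01', '2010-02-01');

Create Partition Scheme ps_demo
    As Partition pf_demo All To ([Primary]);

Create Table dbo.demoPartitioned
(
      myDate smalldatetime Not Null
    , myData char(100)     Null
    , Constraint PK_demoPartitioned Primary Key Clustered (myDate)
) On ps_demo(myDate);

-- Staging table: same columns and indexes, on the same filegroup as the target partition
Create Table dbo.demoStaging
(
      myDate smalldatetime Not Null
    , myData char(100)     Null
    , Constraint PK_demoStaging Primary Key Clustered (myDate)
) On [Primary];

-- 1) Load the data first
Insert Into dbo.demoStaging (myDate, myData)
Values ('2010-01-15', 'January row');

-- 2) Then add the check constraint matching the partition's range
Alter Table dbo.demoStaging With Check
    Add Constraint CK_demoStaging_myDate
    Check (myDate >= '2010-01-01' And myDate < '2010-02-01');

-- 3) Confirm the target partition number, then switch
Select $Partition.pf_demo('2010-01-15') As targetPartition;  -- January falls in partition 2

Alter Table dbo.demoStaging
    Switch To dbo.demoPartitioned Partition 2;
```

With Range Right and these two boundaries, partition 1 holds everything before 2010-01-01, partition 2 holds January, and partition 3 holds 2010-02-01 onward.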
Replicating Into Partitioned and Non-Partitioned Tables
Recently, we needed to replicate a non-partitioned table to two different destinations. We wanted to use partitioning for Server A, which has 2008 Enterprise; Server B, which is on 2005 Standard, could not take advantage of partitioning. The solution was really easy: create a pre-snapshot and post-snapshot script for the publication, then modify to handle each server group differently. Using pseudo-code, it looked something like this:
/* Identify which servers get the partitioned version */
If @@ServerName In ('yourServerNameList')
Begin
/* Create your partitioning function if necessary (the scheme references it, so it must exist first) */
If Not Exists(Select * From sys.partition_functions Where name = 'InsertPartitionFunction')
CREATE PARTITION FUNCTION InsertPartitionFunction (smalldatetime)
AS RANGE RIGHT FOR VALUES ('insertValues');
/* Create your partitioning scheme if necessary */
If Not Exists(Select * From sys.partition_schemes Where name = 'InsertPartitionScheme')
CREATE PARTITION SCHEME InsertPartitionScheme
AS PARTITION InsertPartitionFunction ALL TO ([PRIMARY]);
/* Create a partitioned version of your table */
CREATE TABLE [dbo].[yourTableName] (
[yourTableSchema]
) ON InsertPartitionScheme([partitioningKey]);
End
Else
Begin
/* Create a non-partitioned version of your table */
CREATE TABLE [dbo].[yourTableName] (
[yourTableSchema]
) ON [Primary];
End
You could also use an edition check instead of a server name check, if you prefer. The post-snapshot script basically looked the same, except you create partitioned indexes instead.
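An edition check might look something like this. It's a sketch: the Print statements are placeholders for the Create Table branches shown above.

```sql
/* Sketch: branch on edition rather than server name.
   EngineEdition = 3 covers the Enterprise, Developer, and Evaluation editions. */
If Cast(ServerProperty('EngineEdition') As int) = 3
Begin
    Print 'Create the partitioned version of the table here';
End
Else
Begin
    Print 'Create the non-partitioned version of the table here';
End
```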
Compress Old Partitions
Did you know you can set different compression levels for individual partitions? It's true! I've just completed doing this on our largest partitioned table. Here's how:
/* Apply compression to your partitioned table */
Alter Table dbo.yourTableName
Rebuild Partition = All
With
(
Data_Compression = Page On Partitions(1 to 9)
, Data_Compression = Row On Partitions(10 to 11)
, Data_Compression = None On Partitions(12)
);
/* Apply compression to your partitioned index */
Alter Index YourPartitionedIndex
On dbo.yourTableName
Rebuild Partition = All
With
(
Data_Compression = Page On Partitions(1 to 9)
, Data_Compression = Row On Partitions(10 to 11)
, Data_Compression = None On Partitions(12)
);
/* Apply compression to your unpartitioned index */
Alter Index YourUnpartitionedIndex
On dbo.yourTableName
Rebuild With (Data_Compression = Row);
A couple of things to note. In all of our proof-of-concept testing, we found that compression significantly reduced query execution time, reads (IO), and storage. However, CPU was also increased significantly. The results were more dramatic, both good and bad, with page compression versus row compression. Still, for our older partitions, which aren't queried regularly, it made sense to turn on page compression. The newer partitions receive row compression, and the newest partitions, which are still queried very regularly by routine processes, were left completely uncompressed. This seems to strike a nice balance in our environment, but of course, results will vary depending on how you use your data.
Something to be aware of is that compressing your clustered index does *not* compress your non-clustered indexes; those are separate operations. Lastly, for those who are curious, it took us about 1 minute to apply row compression and about 7 minutes to apply page compression to partitions averaging 30 million rows.
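To confirm what actually got compressed, you can check sys.partitions afterwards. A quick sketch, using the same dbo.yourTableName placeholder as the examples above:

```sql
/* Show the compression setting of every partition of every index on the table */
Select Object_Name(p.object_id) As tableName
    , i.name                    As indexName
    , p.partition_number
    , p.data_compression_desc   -- NONE, ROW, or PAGE
    , p.rows
From sys.partitions As p
Join sys.indexes As i
    On i.object_id = p.object_id
    And i.index_id = p.index_id
Where p.object_id = Object_Id(N'dbo.yourTableName')
Order By i.index_id, p.partition_number;
```

This also makes it easy to spot a non-clustered index that was left uncompressed after the clustered index was rebuilt.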
Looking for more information on table partitioning? Check out my overview of partitioning, my example code, and my article on indexing on partitioned tables.
If you've been following my blog for a little while, you'll know that I'm a fan of SQL Server internals. There's a lot that can be learned or better understood by rolling up your sleeves and getting into the nitty-gritty of data pages (e.g. see my post on Overhead in Non-Unique Clustered Indexes). So imagine how happy I was when my co-worker Jeff shared an undocumented function with me today that retrieves the file number, page number, and slot number of a single record. Very cool! Well, at least to me. So now let's see how you can use it.
The fn_physLocCracker function can be called in the following way:
Select Top 100 plc.*, soh.SalesOrderID
From Sales.SalesOrderHeader As soh
Cross Apply sys.fn_physLocCracker (%%physloc%%) As plc;
Results (just a sample):
file_id page_id slot_id SalesOrderID
----------- ----------- ----------- ------------
1 14032 0 43659
1 14032 1 43660
1 14032 2 43661
1 14032 3 43662
1 14032 4 43663
If you look at the sp_helptext for sys.fn_physLocCracker, %%physloc%% is apparently a virtual column that contains information on where the record is stored. In fact, you can even append %%physloc%% to your column list if you want to see how the information is stored. But for our purposes, we now have a file number, page number, and slot number. What do we do with it?
Well, you can use the investigation proc I wrote to retrieve the actual data page:
Execute dba_viewPageData_sp
@databaseName = 'AdventureWorks'
, @fileNumber = 1
, @pageNumber = 14032;
Results (just a sample):
Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
SalesOrderID = 43659
Slot 0 Column 2 Offset 0x8 Length 1 Length (physical) 1
RevisionNumber = 1
Slot 0 Column 3 Offset 0x9 Length 8 Length (physical) 8
OrderDate = 2001-07-01 00:00:00.000
Slot 0 Column 4 Offset 0x11 Length 8 Length (physical) 8
DueDate = 2001-07-13 00:00:00.000
Neat, huh? So why would you use fn_physLocCracker to look up the file and page number when you can just pass the table name and index name to my proc and retrieve data pages? Well, each covers a gap in the other. My investigation proc will retrieve data pages for any index type, but it will not find the data page for a specific record. The fn_physLocCracker function will find the page for a specific record, but it will only retrieve data for the clustered index. So just something to be aware of.
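If you don't have my proc handy, the two steps can be sketched with the function plus the undocumented DBCC PAGE command. The SalesOrderID value and the file and page numbers below are simply taken from the sample results above; yours will likely differ.

```sql
-- Step 1: find where one specific row lives
Select plc.file_id, plc.page_id, plc.slot_id
From Sales.SalesOrderHeader As soh
Cross Apply sys.fn_physLocCracker(%%physloc%%) As plc
Where soh.SalesOrderID = 43659;

-- Step 2: dump that page. DBCC PAGE is undocumented and unsupported.
-- Trace flag 3604 sends DBCC output to the client instead of the error log.
DBCC TraceOn(3604);
DBCC Page('AdventureWorks', 1, 14032, 3);  -- database, file, page, print option
DBCC TraceOff(3604);
```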
That's all for now. Back to the #24HoursOfPASS!
If you haven't heard, Microsoft released an update to Books Online for SQL Server 2008 yesterday. You can find the download here:
http://www.microsoft.com/downloads/details.aspx?FamilyID=765433f7-0983-4d7a-b628-0a98145bcb97&displaylang=en
Filtered indexes are probably my favorite feature in 2008. That's saying a lot, since there are so many great new features to choose from. In this post, I want to explore a little about how filtered indexes work, how they can be applied, and some of the "gotchas" to be aware of.
First, for those of you who may not yet know about filtered indexes, allow me to enlighten you. In short, filtered indexes allow you to create an index on a subset of data using a filtering predicate. Filters can only be applied to non-clustered indexes. The general syntax of a filtered index is:
Create NonClustered Index [index_name]
On [table_name] ([column_list])
Include ([column_list])
Where [filtered_criteria];
For our purposes, we're going to be working with the Sales.SalesOrderDetail table in the AdventureWorks database. Let's look at a specific example. Suppose we have a query that regularly searches on the [SpecialOfferID] column.
Select SalesOrderID
, Count(*) As 'CountOfLineItem'
, Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail
Where SpecialOfferID <> 1
Group By SalesOrderID;
We notice that there's no covering index for this query by looking at the actual execution plan:

Query Plan - Clustered Scan
If this is a commonly executed query, then we'd probably want to toss an index on it. Before we get started, let's take a look at what the distribution of values is on that column:
Select SpecialOfferID
, Count(*) As 'rows'
From Sales.SalesOrderDetail
Group By SpecialOfferID
Order By Count(*) Desc;
Our distribution of values is:
SpecialOfferID rows
-------------- -----------
1 115884
2 3428
3 606
13 524
14 244
16 169
7 137
8 98
11 84
4 80
9 61
5 2
As you can see, [SpecialOfferID] = 1 accounts for 96% of our values. In 2005, we'd create an index that may look something like this:
Create NonClustered Index IX_Sales_SalesOrderDetail_SpecialOfferID
On Sales.SalesOrderDetail(SpecialOfferID)
Include (SalesOrderID, LineTotal);
Now if we re-run our original query, this is what we see:

Indexed Query Plan
So we're now performing a non-clustered index seek instead of a clustered index scan. Already this results in some pretty significant performance improvements. To see this, we're going to use the INDEX table hint to force a clustered index scan for comparison. We're also going to use the DBCC command DROPCLEANBUFFERS, which clears the buffer cache so we can better examine what's happening with our IO (only do this on a test server).
Set Statistics IO On;
DBCC DropCleanBuffers;
Select SalesOrderID
, Count(*) As 'CountOfLineItem'
, Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail With
(Index(PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID))
Where SpecialOfferID <> 1
Group By SalesOrderID;
DBCC DropCleanBuffers;
Select SalesOrderID
, Count(*) As 'CountOfLineItem'
, Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail
Where SpecialOfferID <> 1
Group By SalesOrderID;
Set Statistics IO Off;
Clustered Index Scan:
Table 'SalesOrderDetail'. Scan count 1, logical reads 1240, physical reads 17, read-ahead reads 1242...
NonClustered Index Seek:
Table 'SalesOrderDetail'. Scan count 2, logical reads 30, physical reads 4, read-ahead reads 480...
As you can see, the non-clustered (NC) index seek performs quite a bit better. Now let's create a filtered index and explore what happens:
Create NonClustered Index FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered
On Sales.SalesOrderDetail(SalesOrderID)
Include (LineTotal)
Where SpecialOfferID <> 1;
First, let's look at the pages consumed by each index:
SELECT i.name, ddips.index_depth, ddips.index_level
, ddips.page_count, ddips.record_count
FROM sys.indexes AS i
Join sys.dm_db_index_physical_stats(DB_ID(),
OBJECT_ID(N'Sales.SalesOrderDetail'), Null, Null, N'Detailed') AS ddips
ON i.OBJECT_ID = ddips.OBJECT_ID
And i.index_id = ddips.index_id
WHERE i.name In ('IX_Sales_SalesOrderDetail_SpecialOfferID'
, 'FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered'
, 'PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID')
AND ddips.index_level = 0;
name index_depth index_level page_count record_count
---------------------------------------------------------- ----------- ----------- ----------- --------------------
PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID 3 0 1234 121317
IX_Sales_SalesOrderDetail_SpecialOfferID 3 0 480 121317
FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered 2 0 19 5433
If you scroll over, you'll see that the clustered index consumes the most pages, naturally. The non-filtered NC index consumes fewer pages than the clustered index because it's narrower; however, it still consumes more pages than the filtered index because it's storing every data row. The filtered index, with only 5433 rows stored, is by far our smallest index, consuming 96% less space than our non-filtered NC index (19 pages versus 480).
Because we're using less space to store this index, we should also see an equivalent performance boost. Let's verify that this is the case:
Set Statistics IO On;
DBCC DropCleanBuffers;
Select SalesOrderID
, Count(*) As 'CountOfLineItem'
, Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail With (Index(IX_Sales_SalesOrderDetail_SpecialOfferID))
Where SpecialOfferID <> 1
Group By SalesOrderID;
DBCC DropCleanBuffers;
Select SalesOrderID
, Count(*) As 'CountOfLineItem'
, Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail
Where SpecialOfferID <> 1
Group By SalesOrderID;
Set Statistics IO Off;
NonClustered Index Seek:
Table 'SalesOrderDetail'. Scan count 2, logical reads 30, physical reads 4, read-ahead reads 480
Filtered Index Scan:
Table 'SalesOrderDetail'. Scan count 1, logical reads 24, physical reads 2, read-ahead reads 22

Filtered Query Plan
As expected, we get the best results with our filtered index scan.
You'll notice that I did *not* create the index on the [SpecialOfferID] column like I did in [IX_Sales_SalesOrderDetail_SpecialOfferID]. This is because my query doesn't care what my [SpecialOfferID] value is, just as long as it's not equal to 1. My non-filtered NC index was created on [SpecialOfferID] because it needed to navigate the B-tree to find the records where [SpecialOfferID] <> 1. With my filtered index, the query optimizer knows that all of my records already meet the criteria, so it doesn't need to navigate through the index to find the matching results.
We could choose to include the [SpecialOfferID] data in our filtered index, but we'd most likely want to make it an included column rather than part of the index key. In fact, it's important to note that, if I don't add [SpecialOfferID] as an included column and I want to return it in the results, i.e.
Select SalesOrderID
, SpecialOfferID
, Count(*) As 'CountOfLineItem'
, Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail
Where SpecialOfferID <> 1
Group By SalesOrderID
, SpecialOfferID;
my filtered index will not be used and I will instead scan on the clustered index once more (assuming [IX_Sales_SalesOrderDetail_SpecialOfferID] does not exist). This is because the column used in the filter is not stored anywhere on the actual index pages. This is actually good news, in my opinion, since it allows you to create even leaner indexes. And like I already mentioned, if you do need the data returned, you can always add the filtered column as an included column.
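If you do want [SpecialOfferID] returned, one way to add it is to rebuild the filtered index in place. A sketch, assuming the filtered index created earlier in this post already exists (Drop_Existing fails otherwise):

```sql
-- Rebuild the existing filtered index with SpecialOfferID added as an included column
Create NonClustered Index FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered
    On Sales.SalesOrderDetail(SalesOrderID)
    Include (LineTotal, SpecialOfferID)
    Where SpecialOfferID <> 1
    With (Drop_Existing = On);
```

The index gets a little wider, but the query that groups by [SpecialOfferID] can then be covered by it.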
What if you're trying to find out whether or not an index is filtered, and what it's filtered on? The sys.indexes catalog view has been updated in 2008 to include this information:
Select name, has_filter, filter_definition
From sys.indexes
Where name In ('IX_Sales_SalesOrderDetail_SpecialOfferID'
, 'FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered'
, 'PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID');
name has_filter filter_definition
------------------------------------------------------ ---------- -------------------------
FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered 1 ([SpecialOfferID]<>(1))
IX_Sales_SalesOrderDetail_SpecialOfferID 0 NULL
PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID 0 NULL
I personally recommend Kimberly Tripp's system stored proc, sp_helpindex2. It returns a lot of good information about your indexes, such as included columns and filtering criteria.
That's all I have for today. Hopefully, you now understand how powerful filtered indexes can be. When used properly, filtered indexes can use less space, consume less IO, and improve overall query performance.
It still surprises me how many people don't know about some of the very things that make my job so much easier. So this next post is dedicated to sharing some of the tweaks and tools I've run across that will help anyone who works with SQL:
Indexes
Anyone who uses included columns is probably well aware of the frustrations that can come from having to look up information on which columns are included. I wrote a stored procedure, dba_indexLookup_sp, to help me with this, before discovering sp_helpindex2. If you haven't heard of sp_helpindex2, it's a re-write of sp_helpindex by Kimberly Tripp. You can find it on Kimberly's blog. The main difference is Kimberly's is a system stored procedure (mine is not) and my version returns partitioning information (Kimberly's does not). Check both out and use whichever one meets your needs best.
Keyboard Shortcuts
In SQL Server Management Studio (SSMS), click on:
Tools --> Options... --> Environment --> Keyboard

Keyboard Shortcuts
For your copying convenience:
Ctrl+3 Select Top 100 * From
Ctrl+4 sp_tables @table_owner = 'dbo'
Ctrl+5 sp_columns
Ctrl+6 sp_stored_procedures @sp_owner = 'dbo'
Ctrl+7 sp_spaceused
Ctrl+8 sp_helptext
Ctrl+9 dba_indexLookup_sp or sp_helpindex2
Please note that these settings will not take effect until you open a new query window. Here's an example of how you could use this: use Ctrl+4 to find a list of tables, then copy one into your query window; to view a sample of that table's data, highlight the table name (I usually double-click on it) and press Ctrl+3. It's a thing of beauty. Oh, and you may want to remove/change the schema filters if you use schemas other than dbo.
Query Execution Settings
After having one too many issues arise from non-DBAs connecting to the production environment and running a devastating ad hoc query, I've had all of our developers and analysts adopt the following settings. The only difference between my settings and theirs is that I have "Set Statistics IO" selected. FYI - you can also make these same setting changes in Visual Studio.
In SQL Server Management Studio (SSMS), click on:
Tools --> Options... --> Query Execution --> SQL Server --> Advanced

Query Execution Settings
Copy Behavior
This next tip actually has nothing to do with SQL Server, and can be done with any Microsoft product. However, I just learned about it a few weeks ago and already I use it quite frequently.
Holding down "Alt" while you drag your mouse will change your selection behavior to block selection.

Block Selection
Please note: The following tools require SQL 2008 Management Studio. They will also work when you connect SQL 2008 SSMS to a 2005 instance.
Object Detail Explorer
Finally, there's a reason to use the Object Detail Explorer (SSMS labels it Object Explorer Details; press F7 to open it)! My favorite use is to quickly find the table size and row counts of all the tables in a database. If these columns are not currently displayed, you may just need to right-click on the column headers and add them to the display.

Object Detail Explorer
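If you'd rather get similar numbers from a query window, here's a minimal sketch that reads sys.dm_db_partition_stats. To be clear, this isn't necessarily what SSMS runs behind the scenes; the column aliases are my own, and the size shown is reserved space, which may not match the SSMS column exactly.

```sql
/* Row counts and reserved space per table in the current database.
   Aliases (schemaName, tableName, totalRows, reservedKB) are illustrative. */
SELECT s.name AS schemaName
, t.name AS tableName
, SUM(CASE WHEN ps.index_id IN (0, 1)
      THEN ps.row_count ELSE 0 END) AS totalRows
  -- only count rows once: heap (0) or clustered index (1)
, SUM(ps.reserved_page_count) * 8 AS reservedKB
  -- pages are 8 KB each; includes all indexes on the table
FROM sys.dm_db_partition_stats AS ps
JOIN sys.tables AS t
    ON t.object_id = ps.object_id
JOIN sys.schemas AS s
    ON s.schema_id = t.schema_id
GROUP BY s.name, t.name
ORDER BY reservedKB DESC;
```

This works on SQL 2005 and later, so it's handy when you're stuck with an older copy of SSMS.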
Missing Indexes
And lastly, when you use SSMS 2008 to run Display Estimated Execution Plan (Ctrl+L), it will show you if you're missing any indexes. This will even work if you connect SSMS 2008 to SQL 2005!

Missing Index
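The plan hint only covers the query you're looking at. For a server-wide view, the missing index DMVs (available since SQL 2005) hold similar suggestions. Here's a rough sketch; the ordering by seeks times impact is just my own rule of thumb, not an official ranking, and the suggestions should be treated as hints to investigate rather than indexes to create blindly.

```sql
/* Missing index suggestions recorded for the current database.
   Note: these DMVs are cleared whenever the instance restarts. */
SELECT TOP 20
  mid.[statement] AS tableName
, mid.equality_columns
, mid.inequality_columns
, mid.included_columns
, migs.user_seeks
, migs.avg_user_impact  -- estimated % improvement
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
WHERE mid.database_id = DB_ID()
ORDER BY migs.user_seeks * migs.avg_user_impact DESC;
  -- assumption: a simple heuristic for "most useful first"
```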
That pretty much covers it for now. HTH!
Michelle
Thanks to everyone who left a comment or sent me an e-mail regarding the Index Defrag Script. I've received some great feedback and requests for features. I've also had some questions regarding how to use it, which I will answer at the end of this post.
Changes include:
- separate version for both Enterprise and Standard editions
- Standard edition removes partitioning and online options
- output option to see fragmentation levels
- page_count added to the log table
I've also verified that this script works well in SQL 2008.
Enterprise Version:
IF EXISTS(SELECT OBJECT_ID FROM sys.tables
WHERE [name] = N'dba_indexDefragLog')
BEGIN
DROP TABLE dbo.dba_indexDefragLog;
PRINT 'dba_indexDefragLog table dropped!';
END
CREATE TABLE dbo.dba_indexDefragLog
(
indexDefrag_id INT IDENTITY(1,1) NOT NULL
, objectID INT NOT NULL
, objectName NVARCHAR(130) NOT NULL
, indexID INT NOT NULL
, indexName NVARCHAR(130) NOT NULL
, partitionNumber SMALLINT NOT NULL
, fragmentation FLOAT NOT NULL
, page_count INT NOT NULL
, dateTimeStart DATETIME NOT NULL
, durationSeconds INT NOT NULL
CONSTRAINT PK_indexDefragLog
PRIMARY KEY CLUSTERED (indexDefrag_id)
);
PRINT 'dba_indexDefragLog Table Created';
IF OBJECTPROPERTY(OBJECT_ID('dbo.dba_indexDefrag_sp'),
N'IsProcedure') IS Null
BEGIN
EXECUTE ('Create Procedure dbo.dba_indexDefrag_sp
As Print ''Hello World!''');
RAISERROR('Procedure dba_indexDefrag_sp created.'
, 10, 1);
END;
Go
SET ANSI_Nulls ON;
SET Ansi_Padding ON;
SET Ansi_Warnings ON;
SET ArithAbort ON;
SET Concat_Null_Yields_Null ON;
SET NOCOUNT ON;
SET Numeric_RoundAbort OFF;
SET Quoted_Identifier ON;
Go
ALTER PROCEDURE dbo.dba_indexDefrag_sp
/* Declare Parameters */
@minFragmentation FLOAT = 10.0
/* in percent, will not defrag if fragmentation
less than specified */
, @rebuildThreshold FLOAT = 30.0
/* in percent, greater than @rebuildThreshold
will result in rebuild instead of reorg */
, @onlineRebuild BIT = 1
/* 1 = online rebuild; 0 = offline rebuild */
, @executeSQL BIT = 1
/* 1 = execute; 0 = print command only */
, @tableName VARCHAR(4000) = Null
/* Option to specify a table name */
, @printCommands BIT = 0
/* 1 = print commands; 0 = do not print commands */
, @printFragmentation BIT = 0
/* 1 = print fragmentation prior to defrag;
0 = do not print */
, @defragDelay CHAR(8) = '00:00:05'
/* time to wait between defrag commands */
AS
/********************************************************************
Name: dba_indexDefrag_sp
Author: Michelle F. Ufford
Purpose: Defrags all indexes for the current database
Notes: This script was designed for SQL Server 2005
Enterprise Edition.
CAUTION: Monitor transaction log if executing for the first time!
@minFragmentation defaulted to 10%, will not defrag if
fragmentation is less than specified.
@rebuildThreshold defaulted to 30% as recommended by
Microsoft in BOL;
greater than 30% will result in rebuild instead
@onlineRebuild 1 = online rebuild;
0 = offline rebuild
@executeSQL 1 = execute the SQL generated by this proc;
0 = print command only
@tableName Specify if you only want to defrag indexes
for a specific table
@printCommands 1 = print commands to screen;
0 = do not print commands
@printFragmentation 1 = print fragmentation to screen;
0 = do not print fragmentation
@defragDelay time to wait between defrag commands;
gives the server some time to catch up
Called by: SQL Agent Job or DBA
Date Initials Description
----------------------------------------------------------------
2008-10-27 MFU Initial Release
2008-11-17 MFU Added page_count to log table
, added @printFragmentation option
********************************************************************
Exec dbo.dba_indexDefrag_sp
@executeSQL = 1
, @printCommands = 1
, @minFragmentation = 0
, @printFragmentation = 1;
********************************************************************/
SET NOCOUNT ON;
SET XACT_Abort ON;
BEGIN
/* Declare our variables */
DECLARE @objectID INT
, @indexID INT
, @partitionCount BIGINT
, @schemaName NVARCHAR(130)
, @objectName NVARCHAR(130)
, @indexName NVARCHAR(130)
, @partitionNumber SMALLINT
, @partitions SMALLINT
, @fragmentation FLOAT
, @pageCount INT
, @sqlCommand NVARCHAR(4000)
, @rebuildCommand NVARCHAR(200)
, @dateTimeStart DATETIME
, @dateTimeEnd DATETIME
, @containsLOB BIT;
/* Just a little validation... */
IF @minFragmentation Not Between 0.00 And 100.0
SET @minFragmentation = 10.0;
IF @rebuildThreshold Not Between 0.00 And 100.0
SET @rebuildThreshold = 30.0;
IF @defragDelay Not Like '00:[0-5][0-9]:[0-5][0-9]'
SET @defragDelay = '00:00:05';
/* Determine which indexes to defrag using our
user-defined parameters */
SELECT
OBJECT_ID AS objectID
, index_id AS indexID
, partition_number AS partitionNumber
, avg_fragmentation_in_percent AS fragmentation
, page_count
, 0 AS 'defragStatus'
/* 0 = unprocessed, 1 = processed */
INTO #indexDefragList
FROM sys.dm_db_index_physical_stats
(DB_ID(), OBJECT_ID(@tableName), NULL , NULL, N'Limited')
WHERE avg_fragmentation_in_percent > @minFragmentation
And index_id > 0
OPTION (MaxDop 1);
/* Create a clustered index to boost performance a little */
CREATE CLUSTERED INDEX CIX_temp_indexDefragList
ON #indexDefragList(objectID, indexID, partitionNumber);
/* Begin our loop for defragging */
WHILE (SELECT COUNT(*) FROM #indexDefragList
WHERE defragStatus = 0) > 0
BEGIN
/* Grab the most fragmented index first to defrag */
SELECT TOP 1
@objectID = objectID
, @fragmentation = fragmentation
, @indexID = indexID
, @partitionNumber = partitionNumber
, @pageCount = page_count
FROM #indexDefragList
WHERE defragStatus = 0
ORDER BY fragmentation DESC;
/* Look up index information */
SELECT @objectName = QUOTENAME(o.name)
, @schemaName = QUOTENAME(s.name)
FROM sys.objects AS o
Inner Join sys.schemas AS s
ON s.schema_id = o.schema_id
WHERE o.OBJECT_ID = @objectID;
SELECT @indexName = QUOTENAME(name)
FROM sys.indexes
WHERE OBJECT_ID = @objectID
And index_id = @indexID
And type > 0;
/* Determine if the index is partitioned */
SELECT @partitionCount = COUNT(*)
FROM sys.partitions
WHERE OBJECT_ID = @objectID
And index_id = @indexID;
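/* Look for LOBs; reset the flag first, because a SELECT that
returns no rows would keep the previous index's value */
SET @containsLOB = 0;
SELECT TOP 1
@containsLOB = 1
FROM sys.columns WITH (NOLOCK)
WHERE
[OBJECT_ID] = @objectID
And (system_type_id In (34, 35, 99)
-- 34 = image, 35 = text, 99 = ntext
Or max_length = -1);
-- varbinary(max), varchar(max), nvarchar(max), xml
/* See if we should rebuild or reorganize; handle thusly */
IF (@fragmentation < @rebuildThreshold
Or @containsLOB = 1)
-- LOB tables are skipped by the rebuild below, so reorganize them
And @partitionCount <= 1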
SET @sqlCommand = N'Alter Index ' + @indexName + N' On '
+ @schemaName + N'.' + @objectName + N' ReOrganize';
IF @fragmentation >= @rebuildThreshold
And IsNull(@containsLOB, 0) = 0
-- Cannot rebuild if the table has one or more LOB
And @partitionCount <= 1
BEGIN
/* We should always rebuild online if possible
(SQL 2005 Enterprise) */
IF @onlineRebuild = 0
SET @rebuildCommand = N' Rebuild With
(Online = Off, MaxDop = 1)';
ELSE
SET @rebuildCommand = N' Rebuild With
(Online = On, MaxDop = 1)';
SET @sqlCommand = N'Alter Index ' + @indexName + N' On '
+ @schemaName + N'.' + @objectName + @rebuildCommand;
END;
/* If our index is partitioned, we should always reorganize */
IF @partitionCount > 1
SET @sqlCommand = N'Alter Index ' + @indexName + N' On '
+ @schemaName + N'.' + @objectName + N' ReOrganize'
+ N' Partition = '
+ CAST(@partitionNumber AS NVARCHAR(10));
-- no MaxDop needed, single threaded operation
/* Are we executing the SQL? If so, do it */
IF @executeSQL = 1
BEGIN
/* Grab the time for logging purposes */
SET @dateTimeStart = GETDATE();
EXECUTE (@sqlCommand);
SET @dateTimeEnd = GETDATE();
/* Log our actions */
INSERT INTO dbo.dba_indexDefragLog
(
objectID
, objectName
, indexID
, indexName
, partitionNumber
, fragmentation
, page_count
, dateTimeStart
, durationSeconds
)
SELECT
@objectID
, @objectName
, @indexID
, @indexName
, @partitionNumber
, @fragmentation
, @pageCount
, @dateTimeStart
, DATEDIFF(SECOND, @dateTimeStart, @dateTimeEnd);
/* Just a little breather for the server */
WAITFOR Delay @defragDelay;
/* Print if specified to do so */
IF @printCommands = 1
PRINT N'Executed: ' + @sqlCommand;
END
ELSE
/* Looks like we're not executing, just print
the commands */
BEGIN
IF @printCommands = 1
PRINT @sqlCommand;
END
/* Update our index defrag list when we've
finished with that index */
UPDATE #indexDefragList
SET defragStatus = 1
WHERE objectID = @objectID
And indexID = @indexID
And partitionNumber = @partitionNumber;
END
/* Do we want to output our fragmentation results? */
If @printFragmentation = 1
Select idl.objectID
, o.name As 'tableName'
, idl.indexID
, i.name As 'indexName'
, idl.fragmentation
, idl.page_count
From #indexDefragList As idl
Join sys.objects AS o
On idl.objectID = o.object_id
Join sys.indexes As i
On idl.objectID = i.object_id
And idl.indexID = i.index_id;
/* When everything is done, make sure to get rid of
our temp table */
DROP TABLE #indexDefragList;
SET NOCOUNT OFF;
RETURN 0
END
Go
Standard Version:
IF EXISTS(SELECT OBJECT_ID FROM sys.tables
WHERE [name] = N'dba_indexDefragLog')
BEGIN
DROP TABLE dbo.dba_indexDefragLog;
PRINT 'dba_indexDefragLog table dropped!';
END
CREATE TABLE dbo.dba_indexDefragLog
(
indexDefrag_id INT IDENTITY(1,1) NOT NULL
, objectID INT NOT NULL
, objectName NVARCHAR(130) NOT NULL
, indexID INT NOT NULL
, indexName NVARCHAR(130) NOT NULL
, fragmentation FLOAT NOT NULL
, page_count INT NOT NULL
, dateTimeStart DATETIME NOT NULL
, durationSeconds INT NOT NULL
CONSTRAINT PK_indexDefragLog
PRIMARY KEY CLUSTERED (indexDefrag_id)
);
PRINT 'dba_indexDefragLog Table Created';
IF OBJECTPROPERTY(OBJECT_ID('dbo.dba_indexDefragStandard_sp'),
N'IsProcedure') IS Null
BEGIN
EXECUTE ('Create Procedure dbo.dba_indexDefragStandard_sp
As Print ''Hello World!''');
RAISERROR('Procedure dba_indexDefragStandard_sp created.'
, 10, 1);
END;
Go
SET ANSI_Nulls ON;
SET Ansi_Padding ON;
SET Ansi_Warnings ON;
SET ArithAbort ON;
SET Concat_Null_Yields_Null ON;
SET NOCOUNT ON;
SET Numeric_RoundAbort OFF;
SET Quoted_Identifier ON;
Go
ALTER PROCEDURE dbo.dba_indexDefragStandard_sp
/* Declare Parameters */
@minFragmentation FLOAT = 10.0
/* in percent, will not defrag if fragmentation
less than specified */
, @rebuildThreshold FLOAT = 30.0
/* in percent, greater than @rebuildThreshold
will result in rebuild instead of reorg */
, @executeSQL BIT = 1
/* 1 = execute; 0 = print command only */
, @tableName VARCHAR(4000) = Null
/* Option to specify a table name */
, @printCommands BIT = 0
/* 1 = print commands; 0 = do not print commands */
, @printFragmentation BIT = 0
/* 1 = print fragmentation prior to defrag;
0 = do not print */
, @defragDelay CHAR(8) = '00:00:05'
/* time to wait between defrag commands */
AS
/********************************************************************
Name: dba_indexDefragStandard_sp
Author: Michelle F. Ufford
Purpose: Defrags all indexes for the current database
Notes: This script was designed for SQL Server 2005
Standard edition.
CAUTION: Monitor transaction log if executing for the first time!
@minFragmentation defaulted to 10%, will not defrag if
fragmentation is less than specified.
@rebuildThreshold defaulted to 30% as recommended by
Microsoft in BOL;
greater than 30% will result in rebuild instead
@executeSQL 1 = execute the SQL generated by this proc;
0 = print command only
@tableName Specify if you only want to defrag indexes
for a specific table
@printCommands 1 = print commands to screen;
0 = do not print commands
@printFragmentation 1 = print fragmentation to screen;
0 = do not print fragmentation
@defragDelay time to wait between defrag commands;
gives the server some time to catch up
Called by: SQL Agent Job or DBA
Date Initials Description
----------------------------------------------------------------
2008-10-27 MFU Initial Release
2008-11-17 MFU Added page_count to log table
, added @printFragmentation option
********************************************************************
Exec dbo.dba_indexDefragStandard_sp
@executeSQL = 1
, @printCommands = 1
, @minFragmentation = 0
, @printFragmentation = 1;
********************************************************************/
SET NOCOUNT ON;
SET XACT_Abort ON;
BEGIN
/* Declare our variables */
DECLARE @objectID INT
, @indexID INT
, @schemaName NVARCHAR(130)
, @objectName NVARCHAR(130)
, @indexName NVARCHAR(130)
, @fragmentation FLOAT
, @pageCount INT
, @sqlCommand NVARCHAR(4000)
, @rebuildCommand NVARCHAR(200)
, @dateTimeStart DATETIME
, @dateTimeEnd DATETIME
, @containsLOB BIT;
/* Just a little validation... */
IF @minFragmentation Not Between 0.00 And 100.0
SET @minFragmentation = 10.0;
IF @rebuildThreshold Not Between 0.00 And 100.0
SET @rebuildThreshold = 30.0;
IF @defragDelay Not Like '00:[0-5][0-9]:[0-5][0-9]'
SET @defragDelay = '00:00:05';
/* Determine which indexes to defrag using our
user-defined parameters */
SELECT
OBJECT_ID AS objectID
, index_id AS indexID
, avg_fragmentation_in_percent AS fragmentation
, page_count
, 0 AS 'defragStatus'
/* 0 = unprocessed, 1 = processed */
INTO #indexDefragList
FROM sys.dm_db_index_physical_stats
(DB_ID(), OBJECT_ID(@tableName), NULL , NULL, N'Limited')
WHERE avg_fragmentation_in_percent > @minFragmentation
And index_id > 0
OPTION (MaxDop 1);
/* Create a clustered index to boost performance a little */
CREATE CLUSTERED INDEX CIX_temp_indexDefragList
ON #indexDefragList(objectID, indexID);
/* Begin our loop for defragging */
WHILE (SELECT COUNT(*) FROM #indexDefragList
WHERE defragStatus = 0) > 0
BEGIN
/* Grab the most fragmented index first to defrag */
SELECT TOP 1
@objectID = objectID
, @fragmentation = fragmentation
, @indexID = indexID
, @pageCount = page_count
FROM #indexDefragList
WHERE defragStatus = 0
ORDER BY fragmentation DESC;
/* Look up index information */
SELECT @objectName = QUOTENAME(o.name)
, @schemaName = QUOTENAME(s.name)
FROM sys.objects AS o
Inner Join sys.schemas AS s
ON s.schema_id = o.schema_id
WHERE o.OBJECT_ID = @objectID;
SELECT @indexName = QUOTENAME(name)
FROM sys.indexes
WHERE OBJECT_ID = @objectID
And index_id = @indexID
And type > 0;
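/* Look for LOBs; reset the flag first, because a SELECT that
returns no rows would keep the previous index's value */
SET @containsLOB = 0;
SELECT TOP 1
@containsLOB = 1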
FROM sys.columns WITH (NOLOCK)
WHERE
[OBJECT_ID] = @objectID
And (system_type_id In (34, 35, 99)
-- 34 = image, 35 = text, 99 = ntext
Or max_length = -1);
-- varbinary(max), varchar(max), nvarchar(max), xml
/* See if we should rebuild or reorganize; handle thusly */
IF @fragmentation < @rebuildThreshold
Or IsNull(@containsLOB, 0) > 0
-- Cannot rebuild if the table has one or more LOB
SET @sqlCommand = N'Alter Index ' + @indexName + N' On '
+ @schemaName + N'.' + @objectName + N' ReOrganize;'
ELSE
SET @sqlCommand = N'Alter Index ' + @indexName + N' On '
+ @schemaName + N'.' + @objectName + ' Rebuild '
+ 'With (MaxDop = 1)'; -- minimize impact on server
/* Are we executing the SQL? If so, do it */
IF @executeSQL = 1
BEGIN
/* Grab the time for logging purposes */
SET @dateTimeStart = GETDATE();
EXECUTE (@sqlCommand);
SET @dateTimeEnd = GETDATE();
/* Log our actions */
INSERT INTO dbo.dba_indexDefragLog
(
objectID
, objectName
, indexID
, indexName
, fragmentation
, page_count
, dateTimeStart
, durationSeconds
)
SELECT
@objectID
, @objectName
, @indexID
, @indexName
, @fragmentation
, @pageCount
, @dateTimeStart
, DATEDIFF(SECOND, @dateTimeStart, @dateTimeEnd);
/* Just a little breather for the server */
WAITFOR Delay @defragDelay;
/* Print if specified to do so */
IF @printCommands = 1
PRINT N'Executed: ' + @sqlCommand;
END
ELSE
/* Looks like we're not executing, just print
the commands */
BEGIN
IF @printCommands = 1
PRINT @sqlCommand;
END
/* Update our index defrag list when we've
finished with that index */
UPDATE #indexDefragList
SET defragStatus = 1
WHERE objectID = @objectID
And indexID = @indexID;
END
/* Do we want to output our fragmentation results? */
IF @printFragmentation = 1
SELECT idl.objectID
, o.name As 'tableName'
, idl.indexID
, i.name As 'indexName'
, idl.fragmentation
, idl.page_count
FROM #indexDefragList AS idl
JOIN sys.objects AS o
ON idl.objectID = o.object_id
JOIN sys.indexes As i
ON idl.objectID = i.object_id
AND idl.indexID = i.index_id;
/* When everything is done, make sure to get rid of
our temp table */
DROP TABLE #indexDefragList;
SET NOCOUNT OFF;
RETURN 0
END
Go
For those who are having trouble with this script...
1) "Not all of my indexes were defragged!" or "Nothing happened when I executed this script."
This script will only defrag those indexes that surpass the specified threshold. If you're not seeing your index in the output, try executing this:
Exec dbo.dba_indexDefrag_sp
@executeSQL = 0
, @printCommands = 1
, @minFragmentation = 0
, @printFragmentation = 1;
Check to see what your index's fragmentation level is. Maybe it's not as fragmented as you feared.
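If you only care about one table, you can also skip the proc and ask the DMV directly. This is a minimal sketch using the same sys.dm_db_index_physical_stats call the script relies on; dbo.myTable is a placeholder, so swap in your own table name.

```sql
/* Fragmentation for every index on a single table.
   N'dbo.myTable' is a hypothetical name used for illustration. */
SELECT OBJECT_NAME(ps.object_id) AS tableName
, i.name AS indexName
, ps.partition_number
, ps.avg_fragmentation_in_percent
, ps.page_count
FROM sys.dm_db_index_physical_stats
    (DB_ID(), OBJECT_ID(N'dbo.myTable'), NULL, NULL, N'Limited') AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   And i.index_id = ps.index_id
WHERE ps.index_id > 0  -- skip heaps, same as the script
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

Compare avg_fragmentation_in_percent against your @minFragmentation value, and keep an eye on page_count: an index with only a handful of pages may report high fragmentation that no amount of defragging will fix.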
2) "My indexes are still fragmented after running this script."
To quote The Powers That Be (aka Microsoft)...
"In general, fragmentation on small indexes is often not controllable. The pages of small indexes are stored on mixed extents. Mixed extents are shared by up to eight objects, so the fragmentation in a small index might not be reduced after reorganizing or rebuilding the index." -- Reorganizing and Rebuilding Indexes
3) "Can I use this in my production environment?"
That really depends on your environment. I've successfully used this in some very large production environments. However, I wouldn't exactly recommend executing the script in the middle of a business day on a billion+ row, heavily fragmented, unpartitioned table, either.
If you're not sure what the impact will be, execute the commands-only version of the script...
Exec dbo.dba_indexDefrag_sp
@executeSQL = 0
, @printCommands = 1
, @printFragmentation = 1;
... then execute the statements one at a time. Make sure you monitor tempdb and the transaction log to ensure you don't have any space issues.
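For the monitoring part, here's one quick way to check between statements. It's only a sketch, and the aliases are mine; DBCC SQLPERF(LOGSPACE) reports log size and percent used for every database, and the second query reads the tempdb file space DMV.

```sql
/* Transaction log size and percent used, one row per database */
DBCC SQLPERF(LOGSPACE);

/* Rough picture of tempdb space; pages are 8 KB each */
SELECT SUM(unallocated_extent_page_count) * 8 / 1024
    AS tempdbFreeMB
, SUM(internal_object_reserved_page_count) * 8 / 1024
    AS internalObjectsMB  -- sorts, spools and similar work
, SUM(version_store_reserved_page_count) * 8 / 1024
    AS versionStoreMB
FROM tempdb.sys.dm_db_file_space_usage;
```

Run it before you start to get a baseline, then again after each command so you can see which indexes are the expensive ones.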
If you have any additional questions or suggestions for this script, leave me a comment and I'll be happy to help.
As promised, today I took a look at the performance of bulk inserts using XML and Table-Valued Parameters. I also compared it against singleton inserts to show the value in the bulk-insert approach.
My tests were pretty simple: insert 100 records using each method. Each test was executed 10 times to ensure consistency. The duration was recorded in microseconds.
The goal was to compare the performance of the inserts. Because I was executing this entire test within SQL Server, I had to isolate only the actual insert transactions and ignore everything else, such as the loading of the data; that work would normally be performed by the calling application.
So without further ado... screenshots of the Profiler traces: (click to enlarge)

Single Insert Method

XML Method

Table-Valued Parameter Method
Summary
| Method | Avg CPU | Avg Reads | Avg Writes | Avg Duration (microseconds) |
|---|---|---|---|---|
| Singleton Method | 3 | 202 | 0 | 13378 |
| XML Method | 0 | 222 | 0 | 3124 |
| TVP Method | 1 | 207 | 0 | 780 |
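As expected, both the XML and the TVP methods performed significantly better than the single-insert method; the XML method averaged 3,124 microseconds against 13,378 for singleton inserts, a little under a quarter of the time. As hoped, the table-valued parameter performed the best of all three: at an average of 780 microseconds, it was roughly 4 times faster than XML and about 17 times faster than singleton inserts.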
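In case you haven't seen the three approaches side by side, here's a stripped-down sketch of each. These are not the exact procs from my tests; every object name below (demoTarget, demoTargetType, and the demoInsert_* procs) is made up for illustration, and the TVP pieces require SQL 2008.

```sql
/* Hypothetical target table and matching table type */
CREATE TABLE dbo.demoTarget
(
  id INT NOT NULL
, val VARCHAR(50) NOT NULL
);
Go
CREATE TYPE dbo.demoTargetType AS TABLE
(
  id INT NOT NULL
, val VARCHAR(50) NOT NULL
);
Go
/* Singleton method: one call per row */
CREATE PROCEDURE dbo.demoInsert_single_sp
  @id INT
, @val VARCHAR(50)
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.demoTarget (id, val)
    VALUES (@id, @val);
END
Go
/* XML method: one call, rows shredded from an XML document */
CREATE PROCEDURE dbo.demoInsert_xml_sp
  @rows XML
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.demoTarget (id, val)
    SELECT r.value('@id', 'int')
    , r.value('@val', 'varchar(50)')
    FROM @rows.nodes('/rows/row') AS x(r);
END
Go
/* TVP method: one call, rows passed as a read-only table */
CREATE PROCEDURE dbo.demoInsert_tvp_sp
  @rows dbo.demoTargetType READONLY
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.demoTarget (id, val)
    SELECT id, val
    FROM @rows;
END
Go
/* Example calls */
EXECUTE dbo.demoInsert_single_sp @id = 1, @val = 'a';

EXECUTE dbo.demoInsert_xml_sp
  @rows = N'<rows><row id="2" val="b"/><row id="3" val="c"/></rows>';

DECLARE @tvp dbo.demoTargetType;
INSERT INTO @tvp (id, val)
VALUES (4, 'd'), (5, 'e');
EXECUTE dbo.demoInsert_tvp_sp @rows = @tvp;
```

The singleton proc has to be called once per row, while the XML and TVP procs each take the whole batch in a single call, which is the difference the numbers above are measuring.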