Replication Bug with Partitioned Tables
Filed under: Miscellaneous, SQL Tips, Syndication
Recently, we came across a bug in SQL Server 2005 on one of our production servers. Apparently, if you execute an ALTER TABLE statement on a replicated table with more than 128 partitions, the log reader will fail. A relatively obscure bug, I know. Microsoft has recognized this as a confirmed bug, but I couldn’t find it anywhere on the intertubes, thus the inspiration for this blog post. Microsoft’s official solution for this issue is to upgrade to SQL Server 2008.
For various reasons, we were unable to execute an upgrade at the time. And since this was a 2 terabyte database, we wanted to come up with a solution that wouldn’t involve reinitializing the entire publication. Our quick-fix while we were troubleshooting the issue was to create a linked server to the production box. Not ideal, I know, but it worked in a pinch and minimized exposure of the issue. Fortunately for us, we were able to solve the problem on the publication database pretty easily. All of the affected partition functions had empty partitions created several months in the future, so we simply merged any empty partition ranges for future dates. Our solution to our now-out-of-date subscribers was to apply static row filtering to any table with more than 100 million records. While this would introduce some overhead with the replication of these tables, it would allow us a much faster recovery time. We decided to use the start of the most recent hour as our filtering criteria, just to give us a “clean” filter, so we had to delete data from any table where we were going to apply the filter. After that, it was simply a matter of resuming replication.
All things considered, it took us a little over a day to recover from the issue. Most of that time was spent troubleshooting the problem and identifying a workable solution; actual execution of the changes was pretty quick. Moral of the story? Upgrade to SQL Server 2008.
Partitioning Tricks
Filed under: Miscellaneous, Performance & Tuning, SQL 2008, SQL Tips, Syndication
For those of you who are using partitioning, or who are considering using partitioning, allow me to share some tips with you.
Easy Partition Staging Tables
Switching partitions (or more specifically, hobts) in and out of a partitioned table requires the use of a staging table. The staging table has very specific requirements: it must be completely identical to the partitioned table, including indexing structures, and it must have a check constraint that limits data to the partitioning range. Thanks to my co-worker Jeff, I’ve recently started using the SQL Server Partition Management tool on CodePlex. I haven’t used the automatic partition switching feature — frankly, using any sort of data modification tool in a production environment makes me nervous — but I’ve been using the scripting option to create staging tables in my development environment, which I then copy to production for use. It’s nothing you can’t do yourself, but it does make the whole process easy and painless, plus it saves you from annoying typos. But be careful when using this tool to just create the table and check constraints automatically, because you may need to…
Add Check Constraints After Loading Data
Most of the time, I add the check constraint when I create the staging table, then I load data and perform the partition switch. However, for some reason, I was receiving the following error:
.Net SqlClient Data Provider: Msg 4972, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. Check constraints or partition function of source table ‘myStagingTable’ allows values that are not allowed by check constraints or partition function on target table ‘myDestinationTable’.
This drove me crazy. I confirmed my check constraints were correct, that I had the correct partition number, and that all schema and indexes matched identically. After about 30 minutes of this, I decided to drop and recreate the constraint. For some reason, it fixed the issue. Repeat tests produced the same results: the check constraint needed to be added *after* data was loaded. This error is occurring on a SQL Server 2008 SP1 box; to be honest, I’m not sure what’s causing the error, so if you know, please leave me a comment. But I figured I’d share so that anyone else running into this issue can hopefully save some time and headache.
Replicating Into Partitioned and Non-Partitioned Tables
Recently, we needed to replicate a non-partitioned table to two different destinations. We wanted to use partitioning for Server A, which has 2008 Enterprise; Server B, which is on 2005 Standard, could not take advantage of partitioning. The solution was really easy: create a pre-snapshot and post-snapshot script for the publication, then modify to handle each server group differently. Using pseudo-code, it looked something like this:
/* Identify which servers get the partitioned version */ IF @@SERVERNAME In ('yourServerNameList') BEGIN /* Create your partitioning scheme if necessary */ IF Not Exists(SELECT * FROM sys.partition_schemes WHERE name = 'InsertPartitionScheme') CREATE PARTITION SCHEME InsertPartitionScheme AS PARTITION InsertPartitionFunction ALL TO ([PRIMARY]); /* Create your partitioning function if necessary */ IF Not Exists(SELECT * FROM sys.partition_functions WHERE name = 'InsertPartitionFunction') CREATE PARTITION FUNCTION InsertPartitionFunction (SMALLDATETIME) AS RANGE RIGHT FOR VALUES ('insertValues'); /* Create a partitioned version of your table */ CREATE TABLE [dbo].[yourTableName] ( [yourTableSchema] ) ON InsertPartitionScheme([partitioningKey]); END ELSE BEGIN /* Create a non-partitioned version of your table */ CREATE TABLE [dbo].[yourTableName] ( [yourTableSchema] ) ON [PRIMARY]; END
You could also use an edition check instead of a server name check, if you prefer. The post-snapshot script basically looked the same, except you create partitioned indexes instead.
Compress Old Partitions
Did you know you can set different compression levels for individual partitions? It’s true! I’ve just completed doing this on our largest partitioned table. Here’s how:
/* Apply compression to your partitioned table */ ALTER TABLE dbo.yourTableName Rebuild Partition = All WITH ( Data_Compression = Page ON Partitions(1 TO 9) , Data_Compression = ROW ON Partitions(10 TO 11) , Data_Compression = NONE ON Partitions(12) ); /* Apply compression to your partitioned index */ ALTER INDEX YourPartitionedIndex ON dbo.yourTableName Rebuild Partition = All WITH ( Data_Compression = Page ON Partitions(1 TO 9) , Data_Compression = ROW ON Partitions(10 TO 11) , Data_Compression = NONE ON Partitions(12) ); /* Apply compression to your unpartitioned index */ ALTER INDEX YourUnpartitionedIndex ON dbo.yourTableName Rebuild WITH (Data_Compression = ROW);
A couple of things to note. In all of our proof-of-concept testing, we found that compression significantly reduced query execution time, reads (IO), and storage. However, CPU was also increased significantly. The results were more dramatic, both good and bad, with page compression versus row compression. Still, for our older partitions, which aren’t queried regularly, it made sense to turn on page compression. The newer partitions receive row compression, and the newest partitions, which are still queried very regularly by routine processes, were left completely uncompressed. This seems to strike a nice balance in our environment, but of course, results will vary depending on how you use your data.
Something to be aware of is that compressing your clustered index does *not* compress your non-clustered indexes; those are separate operations. Lastly, for those who are curious, it took us about 1 minute to apply row compression and about 7 minutes to apply page compression to partitions averaging 30 million rows.
Looking for more information on table partitioning? Check out my overview of partitioning, my example code, and my article on indexing on partitioned tables.
Automated Index Defrag Script
Filed under: Performance & Tuning, SQL 2008, SQL Tips, T-SQL Scripts
Note: This script has been updated. You can find the latest version here: http://sqlfool.com/2009/06/index-defrag-script-v30/
So after much delay, here’s the latest and greatest version of my index defrag script.
A summary of the changes:
- Added support for centralized execution. Option to defrag indexes for a single database from another database, or for all non-system databases.
- Consolidated Enterprise and Standard versions of the script with new @editionCheck variable.
- Added parametrization for MaxDop restrictions during rebuilds; validates that the value does not exceed the actual number of processors on the server.
- Changed minimum fragmentation default value from 10 to 5 to match BOL recommendations.
- Limited defrags to objects with more than 8 pages.
- Added a debug option to give a little more insight into what’s happening and to assist with troubleshooting.
- Updated logic for handling partitions and LOBs.
And a couple of notes and explanations:
Don’t you know you can just pass NULL to sys.dm_db_index_physical_stats for the DatabaseID?
Yes, I realize you can do this. But I don’t want to defrag the system databases, i.e. tempdb, so I opted to handle it this way instead.
How long will this take?
It depends. I don’t necessarily recommend running it without specifying a database; at least, not unmonitored. You *can* do that, but it could take a while. For example, to run sys.dm_db_index_physical_stats for all databases and tables, totaling 2TB, took me 4.5 hours; that doesn’t even count the actual defrags.
Where should I put this?
It’s up to you. If you have a database for items like centralized maintenance or scratch tables, that may be a good place for it. If you prefer, you could also put this in each individual database and call it locally, too. I would not put this in the master or msdb databases.
This is pretty close to a complete rewrite, so please let me know if you encounter any bugs. And now… the code!
IF Not Exists(SELECT [OBJECT_ID] FROM sys.tables WHERE name = N'dba_indexDefragLog') BEGIN -- Drop Table dbo.dba_indexDefragLog CREATE TABLE dbo.dba_indexDefragLog ( indexDefrag_id INT IDENTITY(1,1) Not Null , databaseID INT Not Null , databaseName NVARCHAR(128) Not Null , objectID INT Not Null , objectName NVARCHAR(128) Not Null , indexID INT Not Null , indexName NVARCHAR(128) Not Null , partitionNumber SMALLINT Not Null , fragmentation FLOAT Not Null , page_count INT Not Null , dateTimeStart DATETIME Not Null , durationSeconds INT Not Null CONSTRAINT PK_indexDefragLog PRIMARY KEY CLUSTERED (indexDefrag_id) ) PRINT 'dba_indexDefragLog Table Created'; END IF OBJECTPROPERTY(OBJECT_ID('dbo.dba_indexDefrag_sp'), N'IsProcedure') = 1 BEGIN DROP PROCEDURE dbo.dba_indexDefrag_sp; PRINT 'Procedure dba_indexDefrag_sp dropped'; END; Go CREATE PROCEDURE dbo.dba_indexDefrag_sp /* Declare Parameters */ @minFragmentation FLOAT = 5.0 /* in percent, will not defrag if fragmentation less than specified */ , @rebuildThreshold FLOAT = 30.0 /* in percent, greater than @rebuildThreshold will result in rebuild instead of reorg */ , @executeSQL BIT = 1 /* 1 = execute; 0 = print command only */ , @DATABASE VARCHAR(128) = Null /* Option to specify a database name; null will return all */ , @tableName VARCHAR(4000) = Null -- databaseName.schema.tableName /* Option to specify a table name; null will return all */ , @onlineRebuild BIT = 1 /* 1 = online rebuild; 0 = offline rebuild; only in Enterprise */ , @maxDopRestriction TINYINT = Null /* Option to restrict the number of processors for the operation; only in Enterprise */ , @printCommands BIT = 0 /* 1 = print commands; 0 = do not print commands */ , @printFragmentation BIT = 0 /* 1 = print fragmentation prior to defrag; 0 = do not print */ , @defragDelay CHAR(8) = '00:00:05' /* time to wait between defrag commands */ , @debugMode BIT = 0 /* display some useful comments to help determine if/where issues occur */ AS /********************************************************************************* Name: dba_indexDefrag_sp Author: Michelle Ufford, http://sqlfool.com Purpose: Defrags all indexes for one or more databases Notes: CAUTION: TRANSACTION LOG SIZE MUST BE MONITORED CLOSELY WHEN DEFRAGMENTING. @minFragmentation defaulted to 10%, will not defrag if fragmentation is less than that @rebuildThreshold defaulted to 30% as recommended by Microsoft in BOL; greater than 30% will result in rebuild instead @executeSQL 1 = execute the SQL generated by this proc; 0 = print command only @database Optional, specify specific database name to defrag; If not specified, all non-system databases will be defragged. @tableName Specify if you only want to defrag indexes for a specific table, format = databaseName.schema.tableName; if not specified, all tables will be defragged. @onlineRebuild 1 = online rebuild; 0 = offline rebuild @maxDopRestriction Option to specify a processor limit for index rebuilds @printCommands 1 = print commands to screen; 0 = do not print commands @printFragmentation 1 = print fragmentation to screen; 0 = do not print fragmentation @defragDelay time to wait between defrag commands; gives the server a little time to catch up @debugMode 1 = display debug comments; helps with troubleshooting 0 = do not display debug comments Called by: SQL Agent Job or DBA Date Initials Description ---------------------------------------------------------------------------- 2008-10-27 MFU Initial Release for public consumption 2008-11-17 MFU Added page-count to log table , added @printFragmentation option 2009-03-17 MFU Provided support for centralized execution, , consolidated Enterprise & Standard versions , added @debugMode, @maxDopRestriction , modified LOB and partition logic ********************************************************************************* Exec dbo.dba_indexDefrag_sp @executeSQL = 0 , @minFragmentation = 80 , @printCommands = 1 , @debugMode = 1 , @printFragmentation = 1 , @database = 'AdventureWorks' , @tableName = 'AdventureWorks.Sales.SalesOrderDetail'; *********************************************************************************/ SET NOCOUNT ON; SET XACT_Abort ON; SET Quoted_Identifier ON; BEGIN IF @debugMode = 1 RAISERROR('Dusting off the spiderwebs and starting up...', 0, 42) WITH NoWait; /* Declare our variables */ DECLARE @objectID INT , @databaseID INT , @databaseName NVARCHAR(128) , @indexID INT , @partitionCount BIGINT , @schemaName NVARCHAR(128) , @objectName NVARCHAR(128) , @indexName NVARCHAR(128) , @partitionNumber SMALLINT , @partitions SMALLINT , @fragmentation FLOAT , @pageCount INT , @sqlCommand NVARCHAR(4000) , @rebuildCommand NVARCHAR(200) , @dateTimeStart DATETIME , @dateTimeEnd DATETIME , @containsLOB BIT , @editionCheck BIT , @debugMessage VARCHAR(128) , @updateSQL NVARCHAR(4000) , @partitionSQL NVARCHAR(4000) , @partitionSQL_Param NVARCHAR(1000) , @LOB_SQL NVARCHAR(4000) , @LOB_SQL_Param NVARCHAR(1000); /* Create our temporary tables */ CREATE TABLE #indexDefragList ( databaseID INT , databaseName NVARCHAR(128) , objectID INT , indexID INT , partitionNumber SMALLINT , fragmentation FLOAT , page_count INT , defragStatus BIT , schemaName NVARCHAR(128) Null , objectName NVARCHAR(128) Null , indexName NVARCHAR(128) Null ); CREATE TABLE #databaseList ( databaseID INT , databaseName VARCHAR(128) ); CREATE TABLE #processor ( [INDEX] INT , Name VARCHAR(128) , Internal_Value INT , Character_Value INT ); IF @debugMode = 1 RAISERROR('Beginning validation...', 0, 42) WITH NoWait; /* Just a little validation... */ IF @minFragmentation Not Between 0.00 And 100.0 SET @minFragmentation = 5.0; IF @rebuildThreshold Not Between 0.00 And 100.0 SET @rebuildThreshold = 30.0; IF @defragDelay Not Like '00:[0-5][0-9]:[0-5][0-9]' SET @defragDelay = '00:00:05'; /* Make sure we're not exceeding the number of processors we have available */ INSERT INTO #processor EXECUTE XP_MSVER 'ProcessorCount'; IF @maxDopRestriction IS Not Null And @maxDopRestriction > (SELECT Internal_Value FROM #processor) SELECT @maxDopRestriction = Internal_Value FROM #processor; /* Check our server version; 1804890536 = Enterprise, 610778273 = Enterprise Evaluation, -2117995310 = Developer */ IF (SELECT SERVERPROPERTY('EditionID')) In (1804890536, 610778273, -2117995310) SET @editionCheck = 1 -- supports online rebuilds ELSE SET @editionCheck = 0; -- does not support online rebuilds IF @debugMode = 1 RAISERROR('Grabbing a list of our databases...', 0, 42) WITH NoWait; /* Retrieve the list of databases to investigate */ INSERT INTO #databaseList SELECT database_id , name FROM sys.databases WHERE name = IsNull(@DATABASE, name) And database_id > 4 -- exclude system databases And [STATE] = 0; -- state must be ONLINE IF @debugMode = 1 RAISERROR('Looping through our list of databases and checking for fragmentation...', 0, 42) WITH NoWait; /* Loop through our list of databases */ WHILE (SELECT COUNT(*) FROM #databaseList) > 0 BEGIN SELECT TOP 1 @databaseID = databaseID FROM #databaseList; SELECT @debugMessage = ' working on ' + DB_NAME(@databaseID) + '...'; IF @debugMode = 1 RAISERROR(@debugMessage, 0, 42) WITH NoWait; /* Determine which indexes to defrag using our user-defined parameters */ INSERT INTO #indexDefragList SELECT database_id AS databaseID , QUOTENAME(DB_NAME(database_id)) AS 'databaseName' , [OBJECT_ID] AS objectID , index_id AS indexID , partition_number AS partitionNumber , avg_fragmentation_in_percent AS fragmentation , page_count , 0 AS 'defragStatus' /* 0 = unprocessed, 1 = processed */ , Null AS 'schemaName' , Null AS 'objectName' , Null AS 'indexName' FROM sys.dm_db_index_physical_stats (@databaseID, OBJECT_ID(@tableName), Null , Null, N'Limited') WHERE avg_fragmentation_in_percent >= @minFragmentation And index_id > 0 -- ignore heaps And page_count > 8 -- ignore objects with less than 1 extent OPTION (MaxDop 1); DELETE FROM #databaseList WHERE databaseID = @databaseID; END CREATE CLUSTERED INDEX CIX_temp_indexDefragList ON #indexDefragList(databaseID, objectID, indexID, partitionNumber); SELECT @debugMessage = 'Looping through our list... there''s ' + CAST(COUNT(*) AS VARCHAR(10)) + ' indexes to defrag!' FROM #indexDefragList; IF @debugMode = 1 RAISERROR(@debugMessage, 0, 42) WITH NoWait; /* Begin our loop for defragging */ WHILE (SELECT COUNT(*) FROM #indexDefragList WHERE defragStatus = 0) > 0 BEGIN IF @debugMode = 1 RAISERROR(' Picking an index to beat into shape...', 0, 42) WITH NoWait; /* Grab the most fragmented index first to defrag */ SELECT TOP 1 @objectID = objectID , @indexID = indexID , @databaseID = databaseID , @databaseName = databaseName , @fragmentation = fragmentation , @partitionNumber = partitionNumber , @pageCount = page_count FROM #indexDefragList WHERE defragStatus = 0 ORDER BY fragmentation DESC; IF @debugMode = 1 RAISERROR(' Looking up the specifics for our index...', 0, 42) WITH NoWait; /* Look up index information */ SELECT @updateSQL = N'Update idl Set schemaName = QuoteName(s.name) , objectName = QuoteName(o.name) , indexName = QuoteName(i.name) From #indexDefragList As idl Inner Join ' + @databaseName + '.sys.objects As o On idl.objectID = o.object_id Inner Join ' + @databaseName + '.sys.indexes As i On o.object_id = i.object_id Inner Join ' + @databaseName + '.sys.schemas As s On o.schema_id = s.schema_id Where o.object_id = ' + CAST(@objectID AS VARCHAR(10)) + ' And i.index_id = ' + CAST(@indexID AS VARCHAR(10)) + ' And i.type > 0 And idl.databaseID = ' + CAST(@databaseID AS VARCHAR(10)); EXECUTE SP_EXECUTESQL @updateSQL; /* Grab our object names */ SELECT @objectName = objectName , @schemaName = schemaName , @indexName = indexName FROM #indexDefragList WHERE objectID = @objectID And indexID = @indexID And databaseID = @databaseID; IF @debugMode = 1 RAISERROR(' Grabbing the partition count...', 0, 42) WITH NoWait; /* Determine if the index is partitioned */ SELECT @partitionSQL = 'Select @partitionCount_OUT = Count(*) From ' + @databaseName + '.sys.partitions Where object_id = ' + CAST(@objectID AS VARCHAR(10)) + ' And index_id = ' + CAST(@indexID AS VARCHAR(10)) + ';' , @partitionSQL_Param = '@partitionCount_OUT int OutPut'; EXECUTE SP_EXECUTESQL @partitionSQL, @partitionSQL_Param, @partitionCount_OUT = @partitionCount OUTPUT; IF @debugMode = 1 RAISERROR(' Seeing if there''s any LOBs to be handled...', 0, 42) WITH NoWait; /* Determine if the table contains LOBs */ SELECT @LOB_SQL = ' Select Top 1 @containsLOB_OUT = column_id From ' + @databaseName + '.sys.columns With (NoLock) Where [object_id] = ' + CAST(@objectID AS VARCHAR(10)) + ' And (system_type_id In (34, 35, 99) Or max_length = -1);' /* system_type_id --> 34 = image, 35 = text, 99 = ntext max_length = -1 --> varbinary(max), varchar(max), nvarchar(max), xml */ , @LOB_SQL_Param = '@containsLOB_OUT int OutPut'; EXECUTE SP_EXECUTESQL @LOB_SQL, @LOB_SQL_Param, @containsLOB_OUT = @containsLOB OUTPUT; IF @debugMode = 1 RAISERROR(' Building our SQL statements...', 0, 42) WITH NoWait; /* If there's not a lot of fragmentation, or if we have a LOB, we should reorganize */ IF @fragmentation < @rebuildThreshold Or @containsLOB = 1 Or @partitionCount > 1 BEGIN SET @sqlCommand = N'Alter Index ' + @indexName + N' On ' + @databaseName + N'.' + @schemaName + N'.' + @objectName + N' ReOrganize'; /* If our index is partitioned, we should always reorganize */ IF @partitionCount > 1 SET @sqlCommand = @sqlCommand + N' Partition = ' + CAST(@partitionNumber AS NVARCHAR(10)); END; /* If the index is heavily fragmented and doesn't contain any partitions or LOB's, rebuild it */ IF @fragmentation >= @rebuildThreshold And IsNull(@containsLOB, 0) != 1 And @partitionCount <= 1 BEGIN /* Set online rebuild options; requires Enterprise Edition */ IF @onlineRebuild = 1 And @editionCheck = 1 SET @rebuildCommand = N' Rebuild With (Online = On'; ELSE SET @rebuildCommand = N' Rebuild With (Online = Off'; /* Set processor restriction options; requires Enterprise Edition */ IF @maxDopRestriction IS Not Null And @editionCheck = 1 SET @rebuildCommand = @rebuildCommand + N', MaxDop = ' + CAST(@maxDopRestriction AS VARCHAR(2)) + N')'; ELSE SET @rebuildCommand = @rebuildCommand + N')'; SET @sqlCommand = N'Alter Index ' + @indexName + N' On ' + @databaseName + N'.' + @schemaName + N'.' + @objectName + @rebuildCommand; END; /* Are we executing the SQL? If so, do it */ IF @executeSQL = 1 BEGIN IF @debugMode = 1 RAISERROR(' Executing SQL statements...', 0, 42) WITH NoWait; /* Grab the time for logging purposes */ SET @dateTimeStart = GETDATE(); EXECUTE SP_EXECUTESQL @sqlCommand; SET @dateTimeEnd = GETDATE(); /* Log our actions */ INSERT INTO dbo.dba_indexDefragLog ( databaseID , databaseName , objectID , objectName , indexID , indexName , partitionNumber , fragmentation , page_count , dateTimeStart , durationSeconds ) SELECT @databaseID , @databaseName , @objectID , @objectName , @indexID , @indexName , @partitionNumber , @fragmentation , @pageCount , @dateTimeStart , DATEDIFF(SECOND, @dateTimeStart, @dateTimeEnd); /* Just a little breather for the server */ WAITFOR Delay @defragDelay; /* Print if specified to do so */ IF @printCommands = 1 PRINT N'Executed: ' + @sqlCommand; END ELSE /* Looks like we're not executing, just printing the commands */ BEGIN IF @debugMode = 1 RAISERROR(' Printing SQL statements...', 0, 42) WITH NoWait; IF @printCommands = 1 PRINT IsNull(@sqlCommand, 'error!'); END IF @debugMode = 1 RAISERROR(' Updating our index defrag status...', 0, 42) WITH NoWait; /* Update our index defrag list so we know we've finished with that index */ UPDATE #indexDefragList SET defragStatus = 1 WHERE databaseID = @databaseID And objectID = @objectID And indexID = @indexID And partitionNumber = @partitionNumber; END /* Do we want to output our fragmentation results? */ IF @printFragmentation = 1 BEGIN IF @debugMode = 1 RAISERROR(' Displaying fragmentation results...', 0, 42) WITH NoWait; SELECT databaseID , databaseName , objectID , objectName , indexID , indexName , fragmentation , page_count FROM #indexDefragList; END; /* When everything is said and done, make sure to get rid of our temp table */ DROP TABLE #indexDefragList; DROP TABLE #databaseList; DROP TABLE #processor; IF @debugMode = 1 RAISERROR('DONE! Thank you for taking care of your indexes! :)', 0, 42) WITH NoWait; SET NOCOUNT OFF; RETURN 0 END Go
Thanks to my beta testers, @scoinva, @davidmtate, @jdanton, and @SuperCoolMoss!
Special thanks to SCM for keeping on me to finish this.
Happy Defragging!
Michelle
Source: http://sqlfool.com/2009/03/automated-index-defrag-script/
Creating a 60 GB Index
Recently, I needed to create an index on a 1.5 billion row table. I’ve created some large indexes before, but this was the largest, so I thought I’d share my experience in case anyone was interested.
The plan was to create the following index:
CREATE NONCLUSTERED INDEX IX_indexName_unpartitioned ON databaseName.dbo.tableName(columnList) Include (includedColumnList) WITH (MaxDop = 1, Online = ON, Sort_In_TempDB = ON) ON [PRIMARY];
This is an unpartitioned, non-clustered index being built on a partitioned table. Normally, when you build an aligned partitioned index, the index creation process requires less memory and has less noticeable impact on the system, because each partition is handled individually, one at a time. But as this is an unpartitioned (unaligned) index, each partition was built concurrently, requiring more memory and causing a greater impact on performance. Because of this, I needed to restrict the process to MaxDop 1; otherwise, the server would suffer because of too much memory pressure.
I chose Sort_In_TempDB = On because:
- I’m building this index online on a very busy table and cannot afford to impact normal oeprations. By using Sort_In_TempDB = On, index transactions are separated from user transactions, allowing the user transaction log to be truncated. *
- TempDB is on a different volume and therefore should reduce the duration of the operation.
- The recovery for the user database is full, and the recovery for the TempDB is simple. Sorting in TempDB would minimize logging.
* Note: the transaction log for the user database still grew at a much faster rate than normal and had to be closely monitored during this operation to ensure enough free space remained.
The size of the indexed columns is 25 bytes. So I ran my calculations and came up with 36gb space requirement. We increased TempDB to 50gb and gave it a go. An hour later… ERROR. The process terminated because there would not be enough space free in TempDB to complete the operation. Obviously, my calculations were incorrect. After speaking with Chris Leonard, a man who is way too smart for his own good, I realized I had not included my clustered index in the size calculations. Doh.
Re-running my estimates, here’s what I came up with:
| Index Size | 25 | bytes |
| Clustered Index Size | 16 | bytes |
| Records per Page | 197 | |
| Est. Rows | 1,575,000,000 | |
| Est. Number of Pages | 7,995,000 | |
| Space Requirements | 59 | GB |
Obviously, 50gb of free space just wasn’t going to cut it. I decided to give TempDB a little wiggle room and bumped up the space in TempDB to 70gb (not as easy as you’d imagine, I had to requisition more space on the SAN), then re-ran the operation. Success! The process completed in 3 hours and 24 minutes. There was a mild increase in CPU, but no applications or users experienced any issues.
For those interested in the particulars: this was used for a single-record look-up and could not be filtered by the partitioning key. The non-partitioned version of this index has 80% less reads and 11% less CPU than its partitioned counterpart.
If you’re interested in learning more about indexes, here’s some recommended reading:
- TempDB and Index Creation
- Transaction Log Disk Space for Index Operations
- Special Guidelines for Partitioned Indexes
Indexing for Partitioned Tables
So you’ve partitioned your table, and now you’re ready to performance tune. As with any table, indexing is a great place to start. And if you’re like most people new to partitioning, you probably created all of your indexes on the partitioned scheme, either by design or unintentionally.
Let’s take a step back here and discuss what a partitioned index is. A partitioned index, like a partitioned table, separates data into different physical structures (partitions) under one logical name. Specifically, each partition in a nonclustered index contains its own B-tree structure with a subset of rows, based upon the partitioning scheme. By default, an unpartitioned nonclustered index has just one partition.
Keep in mind, when you create an index on a partitioned table, i.e.
CREATE NONCLUSTERED INDEX IX_myIndex ON dbo.myTable(myColumn);
… you are creating the index on the partitioned scheme by default. In order to create a NON-partitioned index on that same table, you would need to explicitly declare “On [FileGroup]“, i.e.
CREATE NONCLUSTERED INDEX IX_myIndex ON dbo.myTable(myColumn) ON [PRIMARY];
But should you partition your index? That depends on how you use it. In fact, most environments will probably want to use a mix of partitioned and non-partitioned indexes. I’ve found that that partitioned indexes perform better when aggregating data or scanning partitions. Conversely, you’ll probably find that, if you need to locate a single, specific record, nothing performs better than a non-partitioned index on that column.
Let’s walk through some examples and see how they perform. I’ll bring back my trusty ol’ orders table for this.
/* Create a partition function. */ CREATE Partition FUNCTION [test_monthlyDateRange_pf] (DATETIME) AS Range RIGHT FOR VALUES ('2009-01-01', '2009-01-08', '2009-01-15' , '2009-01-22', '2009-01-29'); Go /* Associate the partition function with a partition scheme. */ CREATE Partition Scheme test_monthlyDateRange_ps AS Partition test_monthlyDateRange_pf All TO ([PRIMARY]); Go /* Create a partitioned table. */ CREATE TABLE dbo.orders ( order_id INT IDENTITY(1,1) Not Null , orderDate DATETIME Not Null , orderData SMALLDATETIME Not Null CONSTRAINT PK_orders PRIMARY KEY CLUSTERED ( order_id , orderDate ) ) ON test_monthlyDateRange_ps(orderDate); Go /* Create some records to play with. */ SET NOCOUNT ON; DECLARE @endDate DATETIME = '2009-01-01'; WHILE @endDate < '2009-02-01' BEGIN INSERT INTO dbo.orders SELECT @endDate, @endDate; SET @endDate = DATEADD(MINUTE, 1, @endDate); END; SET NOCOUNT OFF; /* Let’s create an aligned, partitioned index. */ CREATE NONCLUSTERED INDEX IX_orders_aligned ON dbo.orders(order_id) ON test_monthlyDateRange_ps(orderDate); /* you don't actually need to declare the last line of this unless you want to create the index on a different partitioning scheme. */ /* Now let’s create an unpartitioned index. */ CREATE NONCLUSTERED INDEX IX_orders_unpartitioned ON dbo.orders(order_id) ON [PRIMARY];
Now that we have both a partitioned and an unpartitioned index, let’s take a look at our sys.partitions table:
/* Let's take a look at our index partitions */ SELECT i.name , i.index_id , p.partition_number , p.ROWS FROM sys.partitions AS p Join sys.indexes AS i ON p.OBJECT_ID = i.OBJECT_ID And p.index_id = i.index_id WHERE p.OBJECT_ID = OBJECT_ID('orders') ORDER BY i.index_id, p.partition_number;
As expected, both of our partitioned indexes, PK_orders and IX_orders_aligned, have 6 partitions, with a subset of rows on each partition. Our unpartitioned non-clustered index, IX_orders_unpartitioned, on the other hand has just 1 partition containing all of the rows.
Now that we have our environment set up, let’s run through some different queries and see the performance impact of each type of index.
/* Query 1, specific record look-up, covered */ SELECT order_id, orderDate FROM dbo.orders WITH (INDEX(IX_orders_aligned)) WHERE order_id = 25000; SELECT order_id, orderDate FROM dbo.orders WITH (INDEX(IX_orders_unpartitioned)) WHERE order_id = 25000;
The unpartitioned index performs significantly better when given a specific record to look-up. Now let’s try the same query, but utilizing a scan instead of a seek:
/* Query 2, specific record look-up, uncovered */ SELECT order_id, orderDate, orderData FROM dbo.orders WITH (INDEX(IX_orders_aligned)) WHERE order_id = 30000; SELECT order_id, orderDate, orderData FROM dbo.orders WITH (INDEX(IX_orders_unpartitioned)) WHERE order_id = 30000;
Again we see that the non-partitioned index performs better with the single-record look-up. This can lead to some pretty dramatic performance implications. So when *would* we want to use a partitioned index? Two instances immediately pop to mind. First, partition switching can only be performed when all indexes on a table are aligned. Secondly, partitioned indexes perform better when manipulating large data sets. To see this in action, let’s try some simple aggregation…
/* Query 3, aggregation */ SELECT CAST(ROUND(CAST(orderdate AS FLOAT), 0, 1) AS SMALLDATETIME) AS 'order_date' , COUNT(*) FROM dbo.orders WITH (INDEX(IX_orders_aligned)) WHERE orderDate Between '2009-01-01' And '2009-01-07 23:59' GROUP BY CAST(ROUND(CAST(orderdate AS FLOAT), 0, 1) AS SMALLDATETIME) ORDER BY CAST(ROUND(CAST(orderdate AS FLOAT), 0, 1) AS SMALLDATETIME); SELECT CAST(ROUND(CAST(orderdate AS FLOAT), 0, 1) AS SMALLDATETIME) AS 'order_date' , COUNT(*) FROM dbo.orders WITH (INDEX(IX_orders_unpartitioned)) WHERE orderDate Between '2009-01-01' And '2009-01-07 23:59' GROUP BY CAST(ROUND(CAST(orderdate AS FLOAT), 0, 1) AS SMALLDATETIME) ORDER BY CAST(ROUND(CAST(orderdate AS FLOAT), 0, 1) AS SMALLDATETIME);
As you can see, partitioned indexes perform better when aggregating data. This is just a simple example, but the results can be even more dramatic in a large production environment. This is one of the reasons why partitioned tables and indexes are especially beneficial in data warehouses.
So now you have a general idea of what a partitioned index is and when to use a partitioned index vs a non-partitioned index. Ultimately, your indexing needs will depend largely on the application and how the data is used. When in doubt, test, test, test! So to recap…
- Specify “On [FileGroup]“ to create an unpartitioned index on a partitioned table
- Consider using non-partitioned indexes for single-record look-ups
- Use partitioned indexes for multiple records and data aggregations
- To enable partition switching, all indexes on the table must be aligned.
For more information on partitioning, check out my other partitioning articles:
Partitioning Example
Partitioning 101
Tips for Large Data Stores
Updated Index Defrag Script (2005, 2008)
Filed under: Performance & Tuning, SQL 2008, T-SQL Scripts
Thanks to everyone who left a comment or sent me an e-mail regarding the Index Defrag Script. I’ve received some great feedback and requests for features. I’ve also had some questions regarding how to use it, which I will answer at the end of this post.
Changes include:
– separate version for both Enterprise and Standard editions
- Standard edition removes partitioning and online options
– output option to see fragmentation levels
– page_count added to the log table
I’ve also verified that this script works well in SQL 2008.
Enterprise Version:
IF EXISTS(SELECT OBJECT_ID FROM sys.tables WHERE [name] = N'dba_indexDefragLog') BEGIN DROP TABLE dbo.dba_indexDefragLog; PRINT 'dba_indexDefragLog table dropped!'; END CREATE TABLE dbo.dba_indexDefragLog ( indexDefrag_id INT IDENTITY(1,1) NOT NULL , objectID INT NOT NULL , objectName NVARCHAR(130) NOT NULL , indexID INT NOT NULL , indexName NVARCHAR(130) NOT NULL , partitionNumber SMALLINT not null , fragmentation FLOAT NOT NULL , page_count INT NOT NULL , dateTimeStart DATETIME NOT NULL , durationSeconds INT NOT NULL CONSTRAINT PK_indexDefragLog PRIMARY KEY CLUSTERED (indexDefrag_id) ); PRINT 'dba_indexDefragLog Table Created'; IF OBJECTPROPERTY(OBJECT_ID('dbo.dba_indexDefrag_sp'), N'IsProcedure') IS Null BEGIN EXECUTE ('Create Procedure dbo.dba_indexDefrag_sp As Print ''Hello World!'''); RAISERROR('Procedure dba_indexDefrag_sp created.' , 10, 1); END; Go SET ANSI_Nulls ON; SET Ansi_Padding ON; SET Ansi_Warnings ON; SET ArithAbort ON; SET Concat_Null_Yields_Null ON; SET NOCOUNT ON; SET Numeric_RoundAbort OFF; SET Quoted_Identifier ON; Go ALTER PROCEDURE dbo.dba_indexDefrag_sp /* Declare Parameters */ @minFragmentation FLOAT = 10.0 /* in percent, will not defrag if fragmentation less than specified */ , @rebuildThreshold FLOAT = 30.0 /* in percent, greater than @rebuildThreshold will result in rebuild instead of reorg */ , @onlineRebuild BIT = 1 /* 1 = online rebuild; 0 = offline rebuild */ , @executeSQL BIT = 1 /* 1 = execute; 0 = print command only */ , @tableName VARCHAR(4000) = Null /* Option to specify a table name */ , @printCommands BIT = 0 /* 1 = print commands; 0 = do not print commands */ , @printFragmentation BIT = 0 /* 1 = print fragmentation prior to defrag; 0 = do not print */ , @defragDelay CHAR(8) = '00:00:05' /* time to wait between defrag commands */ AS /******************************************************************** Name: dba_indexDefrag_sp Author: Michelle F. Ufford Purpose: Defrags all indexes for the current database Notes: This script was designed for SQL Server 2005 Enterprise Edition. CAUTION: Monitor transaction log if executing for the first time! @minFragmentation defaulted to 10%, will not defrag if fragmentation if less than specified. @rebuildThreshold defaulted to 30% as recommended by Microsoft in BOL; > than 30% will result in rebuild instead @onlineRebuild 1 = online rebuild; 0 = offline rebuild @executeSQL 1 = execute the SQL generated by this proc; 0 = print command only @tableName Specify if you only want to defrag indexes for a specific table @printCommands 1 = print commands to screen; 0 = do not print commands @printFragmentation 1 = print fragmentation to screen; 0 = do not print fragmentation @defragDelay time to wait between defrag commands; gives the server some time to catch up Called by: SQL Agent Job or DBA Date Initials Description ---------------------------------------------------------------- 2008-10-27 MFU Initial Release 2008-11-17 MFU Added page_count to log table , added @printFragmentation option ******************************************************************** Exec dbo.dba_indexDefrag_sp @executeSQL = 1 , @printCommands = 1 , @minFragmentation = 0 , @printFragmentation = 1; ********************************************************************/ SET NOCOUNT ON; SET XACT_Abort ON; BEGIN /* Declare our variables */ DECLARE @objectID INT , @indexID INT , @partitionCount BIGINT , @schemaName NVARCHAR(130) , @objectName NVARCHAR(130) , @indexName NVARCHAR(130) , @partitionNumber SMALLINT , @partitions SMALLINT , @fragmentation FLOAT , @pageCount INT , @sqlCommand NVARCHAR(4000) , @rebuildCommand NVARCHAR(200) , @dateTimeStart DATETIME , @dateTimeEnd DATETIME , @containsLOB BIT; /* Just a little validation... */ IF @minFragmentation Not Between 0.00 And 100.0 SET @minFragmentation = 10.0; IF @rebuildThreshold Not Between 0.00 And 100.0 SET @rebuildThreshold = 30.0; IF @defragDelay Not Like '00:[0-5][0-9]:[0-5][0-9]' SET @defragDelay = '00:00:05'; /* Determine which indexes to defrag using our user-defined parameters */ SELECT OBJECT_ID AS objectID , index_id AS indexID , partition_number AS partitionNumber , avg_fragmentation_in_percent AS fragmentation , page_count , 0 AS 'defragStatus' /* 0 = unprocessed, 1 = processed */ INTO #indexDefragList FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(@tableName), NULL , NULL, N'Limited') WHERE avg_fragmentation_in_percent > @minFragmentation And index_id > 0 OPTION (MaxDop 1); /* Create a clustered index to boost performance a little */ CREATE CLUSTERED INDEX CIX_temp_indexDefragList ON #indexDefragList(objectID, indexID, partitionNumber); /* Begin our loop for defragging */ WHILE (SELECT COUNT(*) FROM #indexDefragList WHERE defragStatus = 0) > 0 BEGIN /* Grab the most fragmented index first to defrag */ SELECT TOP 1 @objectID = objectID , @fragmentation = fragmentation , @indexID = indexID , @partitionNumber = partitionNumber , @pageCount = page_count FROM #indexDefragList WHERE defragStatus = 0 ORDER BY fragmentation DESC; /* Look up index information */ SELECT @objectName = QUOTENAME(o.name) , @schemaName = QUOTENAME(s.name) FROM sys.objects AS o INNER Join sys.schemas AS s ON s.schema_id = o.schema_id WHERE o.OBJECT_ID = @objectID; SELECT @indexName = QUOTENAME(name) FROM sys.indexes WHERE OBJECT_ID = @objectID And index_id = @indexID And type > 0; /* Determine if the index is partitioned */ SELECT @partitionCount = COUNT(*) FROM sys.partitions WHERE OBJECT_ID = @objectID And index_id = @indexID; /* Look for LOBs */ SELECT TOP 1 @containsLOB = column_id FROM sys.columns WITH (NOLOCK) WHERE [OBJECT_ID] = @objectID And (system_type_id In (34, 35, 99) -- 34 = image, 35 = text, 99 = ntext Or max_length = -1); -- varbinary(max), varchar(max), nvarchar(max), xml /* See if we should rebuild or reorganize; handle thusly */ IF @fragmentation < @rebuildThreshold And @partitionCount <= 1 SET @sqlCommand = N'Alter Index ' + @indexName + N' On ' + @schemaName + N'.' + @objectName + N' ReOrganize'; IF @fragmentation >= @rebuildThreshold And IsNull(@containsLOB, 0) = 0 -- Cannot rebuild if the table has one or more LOB And @partitionCount <= 1 BEGIN /* We should always rebuild online if possible (SQL 2005 Enterprise) */ IF @onlineRebuild = 0 SET @rebuildCommand = N' Rebuild With (Online = Off, MaxDop = 1)'; ELSE SET @rebuildCommand = N' Rebuild With (Online = On, MaxDop = 1)'; SET @sqlCommand = N'Alter Index ' + @indexName + N' On ' + @schemaName + N'.' + @objectName + @rebuildCommand; END; /* If our index is partitioned, we should always reorganize */ IF @partitionCount > 1 SET @sqlCommand = N'Alter Index ' + @indexName + N' On ' + @schemaName + N'.' + @objectName + N' ReOrganize' + N' Partition = ' + CAST(@partitionNumber AS NVARCHAR(10)); -- no MaxDop needed, single threaded operation /* Are we executing the SQL? If so, do it */ IF @executeSQL = 1 BEGIN /* Grab the time for logging purposes */ SET @dateTimeStart = GETDATE(); EXECUTE (@sqlCommand); SET @dateTimeEnd = GETDATE(); /* Log our actions */ INSERT INTO dbo.dba_indexDefragLog ( objectID , objectName , indexID , indexName , partitionNumber , fragmentation , page_count , dateTimeStart , durationSeconds ) SELECT @objectID , @objectName , @indexID , @indexName , @partitionNumber , @fragmentation , @pageCount , @dateTimeStart , DATEDIFF(SECOND, @dateTimeStart, @dateTimeEnd); /* Just a little breather for the server */ WAITFOR Delay @defragDelay; /* Print if specified to do so */ IF @printCommands = 1 PRINT N'Executed: ' + @sqlCommand; END ELSE /* Looks like we're not executing, just print the commands */ BEGIN IF @printCommands = 1 PRINT @sqlCommand; END /* Update our index defrag list when we've finished with that index */ UPDATE #indexDefragList SET defragStatus = 1 WHERE objectID = @objectID And indexID = @indexID And partitionNumber = @partitionNumber; END /* Do we want to output our fragmentation results? */ IF @printFragmentation = 1 SELECT idl.objectID , o.name AS 'tableName' , idl.indexID , i.name AS 'indexName' , idl.fragmentation , idl.page_count FROM #indexDefragList AS idl Join sys.objects AS o ON idl.objectID = o.OBJECT_ID Join sys.indexes AS i ON idl.objectID = i.OBJECT_ID And idl.indexID = i.index_id; /* When everything is done, make sure to get rid of our temp table */ DROP TABLE #indexDefragList; SET NOCOUNT OFF; RETURN 0 END Go
Standard Version:
IF EXISTS(SELECT OBJECT_ID FROM sys.tables WHERE [name] = N'dba_indexDefragLog') BEGIN DROP TABLE dbo.dba_indexDefragLog; PRINT 'dba_indexDefragLog table dropped!'; END CREATE TABLE dbo.dba_indexDefragLog ( indexDefrag_id INT IDENTITY(1,1) NOT NULL , objectID INT NOT NULL , objectName NVARCHAR(130) NOT NULL , indexID INT NOT NULL , indexName NVARCHAR(130) NOT NULL , fragmentation FLOAT NOT NULL , page_count INT NOT NULL , dateTimeStart DATETIME NOT NULL , durationSeconds INT NOT NULL CONSTRAINT PK_indexDefragLog PRIMARY KEY CLUSTERED (indexDefrag_id) ); PRINT 'dba_indexDefragLog Table Created'; IF OBJECTPROPERTY(OBJECT_ID('dbo.dba_indexDefragStandard_sp'), N'IsProcedure') IS Null BEGIN EXECUTE ('Create Procedure dbo.dba_indexDefragStandard_sp As Print ''Hello World!'''); RAISERROR('Procedure dba_indexDefragStandard_sp created.' , 10, 1); END; Go SET ANSI_Nulls ON; SET Ansi_Padding ON; SET Ansi_Warnings ON; SET ArithAbort ON; SET Concat_Null_Yields_Null ON; SET NOCOUNT ON; SET Numeric_RoundAbort OFF; SET Quoted_Identifier ON; Go ALTER PROCEDURE dbo.dba_indexDefragStandard_sp /* Declare Parameters */ @minFragmentation FLOAT = 10.0 /* in percent, will not defrag if fragmentation less than specified */ , @rebuildThreshold FLOAT = 30.0 /* in percent, greater than @rebuildThreshold will result in rebuild instead of reorg */ , @executeSQL BIT = 1 /* 1 = execute; 0 = print command only */ , @tableName VARCHAR(4000) = Null /* Option to specify a table name */ , @printCommands BIT = 0 /* 1 = print commands; 0 = do not print commands */ , @printFragmentation BIT = 0 /* 1 = print fragmentation prior to defrag; 0 = do not print */ , @defragDelay CHAR(8) = '00:00:05' /* time to wait between defrag commands */ AS /******************************************************************** Name: dba_indexDefragStandard_sp Author: Michelle F. Ufford Purpose: Defrags all indexes for the current database Notes: This script was designed for SQL Server 2005 Standard edition. CAUTION: Monitor transaction log if executing for the first time! @minFragmentation defaulted to 10%, will not defrag if fragmentation if less than specified. @rebuildThreshold defaulted to 30% as recommended by Microsoft in BOL; > than 30% will result in rebuild instead @executeSQL 1 = execute the SQL generated by this proc; 0 = print command only @tableName Specify if you only want to defrag indexes for a specific table @printCommands 1 = print commands to screen; 0 = do not print commands @printFragmentation 1 = print fragmentation to screen; 0 = do not print fragmentation @defragDelay time to wait between defrag commands; gives the server some time to catch up Called by: SQL Agent Job or DBA Date Initials Description ---------------------------------------------------------------- 2008-10-27 MFU Initial Release 2008-11-17 MFU Added page_count to log table , added @printFragmentation option ******************************************************************** Exec dbo.dba_indexDefragStandard_sp @executeSQL = 1 , @printCommands = 1 , @minFragmentation = 0 , @printFragmentation = 1; ********************************************************************/ SET NOCOUNT ON; SET XACT_Abort ON; BEGIN /* Declare our variables */ DECLARE @objectID INT , @indexID INT , @schemaName NVARCHAR(130) , @objectName NVARCHAR(130) , @indexName NVARCHAR(130) , @fragmentation FLOAT , @pageCount INT , @sqlCommand NVARCHAR(4000) , @rebuildCommand NVARCHAR(200) , @dateTimeStart DATETIME , @dateTimeEnd DATETIME , @containsLOB BIT; /* Just a little validation... */ IF @minFragmentation Not Between 0.00 And 100.0 SET @minFragmentation = 10.0; IF @rebuildThreshold Not Between 0.00 And 100.0 SET @rebuildThreshold = 30.0; IF @defragDelay Not Like '00:[0-5][0-9]:[0-5][0-9]' SET @defragDelay = '00:00:05'; /* Determine which indexes to defrag using our user-defined parameters */ SELECT OBJECT_ID AS objectID , index_id AS indexID , avg_fragmentation_in_percent AS fragmentation , page_count , 0 AS 'defragStatus' /* 0 = unprocessed, 1 = processed */ INTO #indexDefragList FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID(@tableName), NULL , NULL, N'Limited') WHERE avg_fragmentation_in_percent > @minFragmentation And index_id > 0 OPTION (MaxDop 1); /* Create a clustered index to boost performance a little */ CREATE CLUSTERED INDEX CIX_temp_indexDefragList ON #indexDefragList(objectID, indexID); /* Begin our loop for defragging */ WHILE (SELECT COUNT(*) FROM #indexDefragList WHERE defragStatus = 0) > 0 BEGIN /* Grab the most fragmented index first to defrag */ SELECT TOP 1 @objectID = objectID , @fragmentation = fragmentation , @indexID = indexID , @pageCount = page_count FROM #indexDefragList WHERE defragStatus = 0 ORDER BY fragmentation DESC; /* Look up index information */ SELECT @objectName = QUOTENAME(o.name) , @schemaName = QUOTENAME(s.name) FROM sys.objects AS o INNER Join sys.schemas AS s ON s.schema_id = o.schema_id WHERE o.OBJECT_ID = @objectID; SELECT @indexName = QUOTENAME(name) FROM sys.indexes WHERE OBJECT_ID = @objectID And index_id = @indexID And type > 0; /* Look for LOBs */ SELECT TOP 1 @containsLOB = column_id FROM sys.columns WITH (NOLOCK) WHERE [OBJECT_ID] = @objectID And (system_type_id In (34, 35, 99) -- 34 = image, 35 = text, 99 = ntext Or max_length = -1); -- varbinary(max), varchar(max), nvarchar(max), xml /* See if we should rebuild or reorganize; handle thusly */ IF @fragmentation < @rebuildThreshold Or IsNull(@containsLOB, 0) > 0 -- Cannot rebuild if the table has one or more LOB SET @sqlCommand = N'Alter Index ' + @indexName + N' On ' + @schemaName + N'.' + @objectName + N' ReOrganize;' ELSE SET @sqlCommand = N'Alter Index ' + @indexName + N' On ' + @schemaName + N'.' + @objectName + ' Rebuild ' + 'With (MaxDop = 1)'; -- minimize impact on server /* Are we executing the SQL? If so, do it */ IF @executeSQL = 1 BEGIN /* Grab the time for logging purposes */ SET @dateTimeStart = GETDATE(); EXECUTE (@sqlCommand); SET @dateTimeEnd = GETDATE(); /* Log our actions */ INSERT INTO dbo.dba_indexDefragLog ( objectID , objectName , indexID , indexName , fragmentation , page_count , dateTimeStart , durationSeconds ) SELECT @objectID , @objectName , @indexID , @indexName , @fragmentation , @pageCount , @dateTimeStart , DATEDIFF(SECOND, @dateTimeStart, @dateTimeEnd); /* Just a little breather for the server */ WAITFOR Delay @defragDelay; /* Print if specified to do so */ IF @printCommands = 1 PRINT N'Executed: ' + @sqlCommand; END ELSE /* Looks like we're not executing, just print the commands */ BEGIN IF @printCommands = 1 PRINT @sqlCommand; END /* Update our index defrag list when we've finished with that index */ UPDATE #indexDefragList SET defragStatus = 1 WHERE objectID = @objectID And indexID = @indexID; END /* Do we want to output our fragmentation results? */ IF @printFragmentation = 1 SELECT idl.objectID , o.name AS 'tableName' , idl.indexID , i.name AS 'indexName' , idl.fragmentation , idl.page_count FROM #indexDefragList AS idl JOIN sys.objects AS o ON idl.objectID = o.OBJECT_ID JOIN sys.indexes AS i ON idl.objectID = i.OBJECT_ID AND idl.indexID = i.index_id; /* When everything is done, make sure to get rid of our temp table */ DROP TABLE #indexDefragList; SET NOCOUNT OFF; RETURN 0 END Go
For those who are having troubles with this script…
1) “Not all of my indexes were defragged!” or “Nothing happened when I executed this script.”
This script will only defrag those indexes that surpass the specified threshold. If you’re not seeing your index in the output, try executing this:
EXEC dbo.dba_indexDefrag_sp @executeSQL = 0 , @printCommands = 1 , @minFragmentation = 0 , @printFragmentation = 1;
Check to see what your index’s fragmentation level is. Maybe it’s not as fragmented as you feared.
2) “My indexes are still fragmented after running this script.”
To quote The Powers That Be (aka Microsoft)…
“In general, fragmentation on small indexes is often not controllable. The pages of small indexes are stored on mixed extents. Mixed extents are shared by up to eight objects, so the fragmentation in a small index might not be reduced after reorganizing or rebuilding the index.” — Reorganizing and Rebuilding Indexes
3) “Can I use this in my production environment?”
That really depends on your environment. I’ve successfully used this in some very large production environments. However, I wouldn’t exactly recommend executing the script in the middle of a business day on a billion+ row, heavily fragmented, unpartitioned table, either.
If you’re not sure what the impact will be, execute the commands-only version of the script…
EXEC dbo.dba_indexDefrag_sp @executeSQL = 0 , @printCommands = 1 , @printFragmentation = 1;
… then execute the statements one at a time. Make sure you monitor tempdb and the transaction log to ensure you don’t have any space issues.
If you have any additional questions or suggestions for this script, leave me a comment and I’ll be happy to help.
Partitioning Example
The following code will walk you through the process of creating a partitioned table:
-------------------------------- -- Create A Partitioned Table -- -------------------------------- /* Create a partition function. */ CREATE Partition FUNCTION [test_monthlyDateRange_pf] (SMALLDATETIME) AS Range RIGHT FOR VALUES ('2008-01-01', '2008-02-01', '2008-03-01'); Go /* Associate the partition function with a partition scheme. */ CREATE Partition Scheme test_monthlyDateRange_ps AS Partition test_monthlyDateRange_pf All TO ([PRIMARY]); Go /* Create your first partitioned table! Make sure the data types match. */ CREATE TABLE dbo.orders ( order_id INT IDENTITY(1,1) Not Null , orderDate SMALLDATETIME Not Null CONSTRAINT PK_orders PRIMARY KEY CLUSTERED ( orderDate , order_id ) ) ON test_monthlyDateRange_ps(orderDate); Go /* Create some records to play with. */ INSERT INTO dbo.orders SELECT '2007-12-31' UNION All SELECT '2008-01-02' UNION All SELECT '2008-01-03' UNION All SELECT '2008-01-04' UNION All SELECT '2008-02-01' UNION All SELECT '2008-02-02' UNION All SELECT '2008-03-01' UNION All SELECT '2008-03-02'; /* The $partition function can be used to interrogate partition data. Let's use it to see where those records are physically located. */ SELECT $partition.test_monthlyDateRange_pf(orderDate) AS 'partition_number' , * FROM dbo.orders; /* By default, all new indexes are created on the partition. Let's create an aligned index */ CREATE NONCLUSTERED INDEX IX_orders_aligned ON dbo.orders(order_id) ON test_monthlyDateRange_ps(orderDate); /* Now let's create an un-aligned index. We'll need to specify the filegroup. */ CREATE NONCLUSTERED INDEX IX_orders_nonaligned ON dbo.orders(order_id) ON [PRIMARY]; -- can be any filegroup /* Review your indexes */ EXECUTE SP_HELPINDEX orders;
Using the previous code as a building block, let’s try swapping partitions:
-------------------------- -- Swap Out A Partition -- -------------------------- /* We need to drop our un-aligned index; otherwise we'll get an error when we attempt to do the switch. */ DROP INDEX IX_orders_nonaligned ON dbo.orders; /* Create the table to hold the data you're swapping out. The table structures must match identically; however, DO NOT partition this table. */ CREATE TABLE dbo.orders_stage_swapOut ( order_id INT Not Null , orderDate SMALLDATETIME Not Null CONSTRAINT PK_orders_stage_swapOut PRIMARY KEY CLUSTERED ( orderDate , order_id ) ) ON [PRIMARY]; Go /* Create the table to hold the data you're swapping in. The table structures must match identically; however, DO NOT partition this table. */ CREATE TABLE dbo.orders_stage_swapIn ( order_id INT Not Null , orderDate SMALLDATETIME Not Null CONSTRAINT PK_orders_stage_swapIn PRIMARY KEY CLUSTERED ( orderDate , order_id ) ) ON [PRIMARY]; Go /* Populate the table you're swapping in. */ INSERT INTO dbo.orders_stage_swapIn SELECT -5, '2008-02-02' UNION All SELECT -4, '2008-02-03' UNION All SELECT -3, '2008-02-04' UNION All SELECT -2, '2008-02-05' UNION All SELECT -1, '2008-02-06'; /* Create any indexes on your table to match the indexes on your partitioned table. */ CREATE NONCLUSTERED INDEX IX_orders_stage_swapIn ON dbo.orders_stage_swapIn(order_id); /* Add a check constraint for the partition to be swapped in. This step is required. */ ALTER TABLE dbo.orders_stage_swapIn WITH CHECK ADD CONSTRAINT orders_stage_swapIn_orderDateCK CHECK (orderDate >= '2008-02-01' And orderDate < '2008-03-01'); Go /* Swap out the old partition. */ ALTER TABLE dbo.orders Switch Partition 3 TO dbo.orders_stage_swapOut; Go /* Swap in the new partition. */ ALTER TABLE dbo.orders_stage_swapIn Switch TO dbo.orders Partition 3; Go --------------------- -- Check your data -- --------------------- /* You should have 2 records in here. */ SELECT * FROM dbo.orders_stage_swapOut; /* You should have 5 records here. */ SELECT * FROM dbo.orders WHERE orderDate >= '2008-02-01' And orderDate < '2008-03-01'; /* There should be no records in this table. */ SELECT * FROM dbo.orders_stage_swapIn; SELECT $partition.test_monthlyDateRange_pf(orderDate) AS 'partition_number' , * FROM dbo.orders; /* Clean-up time! Drop Table dbo.orders Drop Table dbo.orders_stage_swapOut Drop Table dbo.orders_stage_swapIn Drop Partition Scheme test_monthlyDateRange_ps Drop Partition Function [test_monthlyDateRange_pf] */
Pretty easy, huh?
Partitioning 101
A few posts ago, I briefly discussed partitioning. This article will continue that discussion and focus on horizontal partitioning within SQL Server 2005.
What is partitioning?
As I’ve mentioned before, horizontal partitioning is the physical segregation of a single logical table name into multiple, *identical* physical structures. In SQL 2005, all table and index pages actually exist on a partition. However, if you’ve never created a partitioning scheme, then by default each of your tables and indexes contain just a single partition.
I’ll also be talking about aligned indexes and tables, so let’s go ahead and define that here. Two tables sharing the same partitioning function and the same partitioning key are aligned. Similarly, a partitioned index on a partitioned table sharing the same partitioning key is called an aligned index. Having tables and indexes in alignment can reduce CPU and improve parallel plan execution.
Why partitioning?
The two most common reasons to implement partitioning are performance improvement and ease of archiving. Let’s talk about performance improvement first. We made the decision to implement partitioning in one of our data warehouses because of the size of the tables. We had a couple of tables that exceeded a billion rows, and several tables that were fast approaching that line. Upon deploying partitioning, CPU utilization dropped by ~25%, and our regular CPU and disk spikes caused by reporting processes were all but eliminated. Many of our stored procedures experienced reduced durations and reads in the range of 30-60%, and some had even greater improvements.
Why were we seeing this improvement? Many of the stored procedures were accessing data for recent dates. Knowing this helped us to decide on a weekly partitioning scheme. Stored procedures joining on aligned tables were now only accessing a small portion of the data, resulting in the aforementioned improvements. In addition, ad hocs that performed index scans now completed in a fraction of the time because the amount of data being scanned was significantly reduced.
Table maintenance also saw drastic improvement. It was previously unheard of to attempt an online index defrag on some of the largest tables without first scheduling down-time. Now, these same tables were being defragged nightly in one minute or less, without issue. This is because index defrag operations can be performed by partition. This is a great resource save; obviously, once a partition is no longer written to and is fully defragmented, it will not become re-fragmented, so there is no need to waste resources on it. Refer to my Index Defrag Script for an example on how to automate this process.
Another great feature of partitioned tables is the ability to swap partitions in and out. I took advantage of this feature with great success when backfilling data. My backfill strategy primarily consisted of bcp’ing data in weekly chunks to a staging database. The table I was backfilling had a text column where one record happened to contain the column deliminator for the bcp file. This of course wasn’t caught until auditing data counts prior to deployment, when it was noticed that one week was short about ~2mm records. Now, this was ~2mm records out of ~800mm total records. The solution? Re-run the bcp process, using a different deliminator, dump the data in a staging table, and swap out the partition. The actual swap out took 2 seconds and there was no impact to the server. It was beautiful.
Swapping out partitions is particularly beneficial for archiving. Instead of writing a batch process to loop through and delete expired data, which is both time consuming and resource intensive, a partition can be swapped out and dropped in a matter of seconds. Keep in mind that, in SQL Server 2005, partitions can only be swapped out on non-replicated tables. SQL 2008 introduces functionality to allow partition swapping on a replicated table. Check out Replicating Partitioned Tables and Indexes.
That sounds great! Where do I sign up?
There are a few things to consider prior to implementing partitioning in your environment. One is the performance hit to writes. I’ve noticed around a 10% decrease in write performance in my environment. This is partly due to the overhead required to look up which partition the record should be written to. For this reason, I only partition tables that I expect to grow quickly (my rule of thumb is >10mm records per week), or tables that have regular archiving needs.
There’s also a performance hit when querying over many partitions. Basically, think of a giant UNION ALL statement that occurs every time more than one partition is accessed. With our production hardware, we noticed that duration was still less until you hit around 30 partitions. After that, the duration was greater for partitioned tables versus non-partitioned, but reads were still less overall. This will probably vary depending on your environment, so run some tests and make sure you know what kind of impact to expect. This will also help you to determine your partitioning scheme. For example, you wouldn’t want to partition on a date column using a weekly scheme if every report needs to access data for the previous 12 months.
Also make sure you have a good candidate for partitioning. That is, do not partition on a column that is frequently changed or is written non-sequentially. Both can cause overhead to your environment and may negate the benefits of partitioning. Keep in mind that when the value of the partitioning key is changed, the record is not relocated to the correct partition. Some ideal keys include surrogate identity and sequential datetime columns.
Other things to know…
• The partitioning function requires the declaration of a parameter data type. To partition a table, the partitioning key is defined during new table creation. The data types of the partitioning keys must match. For example, if you create a partitioning function using a datetime data type and try to apply it to a table with a smalldatetime column as the partitioning key, the CREATE TABLE statement will fail.
• Ensure that the partitioning key is *always* used in a query on a partitioned table. If the partitioning key is not included in the query search criteria, then the partitioned table will perform worse than a non-partitioned table.
• By default, any index created on a partitioned table is also partitioned. This is useful for creating aligned indexes, which are typically desired. Sometimes, however, you may want to create a non-aligned or non-partitioned index. In these cases, make sure to explicitly declare the desired filegroup during the index creation process.
• Temporary tables can also be partitioned; however, this would probably not make sense in most circumstances.
• Partitioning functions need to be monitored to ensure you don’t run out of defined ranges; if that were to happen, you’d be storing all new data in one really large partition, which wouldn’t be pleasant.
Thanks for wading through this long post. I hope the information contained proves informative and helpful. I’ll be following this post with some actual code to help illustrate some of what I’ve covered here.
Regards,
Michelle
Tips for Large Data Stores
I’m currently working on projects that involve the collection of massive amounts of data (i.e. terabyte-class, billion-row tables). One of the challenges of collecting so much data is the ability to report on the data quickly. To this end, I plan to spend the next several blog posts discussing tips for designing and managing large relational data stores. These are primarily design practices that have performed well for me within the various environments in which I’ve worked.
Partitioning
• Partitioning is the segregation of a single logical table into separate, identical physical structures. Basically, every table is stored in at least one partition; “partitioned tables” are merely comprised of many partitions.
• Implemented correctly, partitioning can have dramatic improvement on read performance, index maintenance cost reduction, storage of large fact tables on separate disks, and data archiving.
• If you’re not familiar with partitioning, check out Kimberly Tripp’s excellent white paper, Partitioned Tables and Indexes in SQL Server 2005.
• I cannot rave enough about the impact partitioning has had within our environment. To take one of the most extreme examples of improvement, the execution time of one BI report decreased from 2.5+ hours to 20 minutes after we implemented partitioning. While not all stored procedures experienced such phenomenal improvement, improvement in the range of 30% – 60% was very common.
• While partitioning does have many benefits, there are some negatives. Namely, more expensive writes (I’ve noticed around 10%), increased duration of queries that span many partitions (i.e. queries on long time spans, such as a year), and increased maintenance needs.
• My rule of thumb is to partition any table with growth rates > 10mm records per week or with regular archiving needs.
• Much more to come on partitioning in the near future.
Indexing
• Any experienced DBA can tell you that effective index management is critical.
• Ensure your stored procedures are using index seeks instead of scans. If scans cannot be avoided, consider using partitioning with aligned indexes to limit the amount of pages scanned.
• Periodically look for un-used indexes and remove them to improve write performance.
• Look for missing indexes, create one or two, and evaluate. Try composite indexes and included columns as a way to limit the amount of indexes you need to create.
• Defrag your indexes! Nightly, if you can get away with it.
Stored Procedure Tuning
• When querying on large amounts of data, try breaking your queries up into individual components and storing in temporary tables, then performing the joins.
• When querying on large date ranges, try looping through the days and inserting the data into a temp table. This can be especially beneficial when querying partitioned tables.
• Create indexes on your temporary tables!
• Utilize index seeks whenever possible.
• If permitted to use dirty reads and your environment’s isolation level is not defaulted to read uncommitted, use With (NoLock) hints.
• Temporary tables can be partitioned! This is usually not necessary but can sometimes be beneficial.
• Try replacing table variables with temporary tables. My rule of thumb is this: if I’m performing joins on the temporary data or if I have more than 100 records, use a temporary table. Table variables can decrease stored procedure recompiles but can sometimes have a negative impact on performance. When in doubt, test each method and evaluate which works better for your needs.
• *Always* include the partitioning key when querying partitioned tables.









