Replication Bug with Partitioned Tables

November 23, 2009 by Michelle Ufford · Leave a Comment
Filed under: Miscellaneous, SQL Tips, Syndication 

Recently, we came across a bug in SQL Server 2005 on one of our production servers. Apparently, if you execute an ALTER TABLE statement on a replicated table with more than 128 partitions, the log reader will fail. A relatively obscure bug, I know. Microsoft has recognized this as a confirmed bug, but I couldn’t find it anywhere on the intertubes, thus the inspiration for this blog post. Microsoft’s official solution for this issue is to upgrade to SQL Server 2008.

For various reasons, we were unable to execute an upgrade at the time. And since this was a 2 terabyte database, we wanted to come up with a solution that wouldn’t involve reinitializing the entire publication. Our quick-fix while we were troubleshooting the issue was to create a linked server to the production box. Not ideal, I know, but it worked in a pinch and minimized exposure of the issue. Fortunately for us, we were able to solve the problem on the publication database pretty easily. All of the affected partition functions had empty partitions created several months in the future, so we simply merged any empty partition ranges for future dates. Our solution to our now-out-of-date subscribers was to apply static row filtering to any table with more than 100 million records. While this would introduce some overhead with the replication of these tables, it would allow us a much faster recovery time. We decided to use the start of the most recent hour as our filtering criteria, just to give us a “clean” filter, so we had to delete data from any table where we were going to apply the filter. After that, it was simply a matter of resuming replication.

All things considered, it took us a little over a day to recover from the issue. Most of that time was spent troubleshooting the problem and identifying a workable solution; actual execution of the changes was pretty quick. Moral of the story? Upgrade to SQL Server 2008. :)

Partitioning Tricks

For those of you who are using partitioning, or who are considering using partitioning, allow me to share some tips with you.

Easy Partition Staging Tables

Switching partitions (or more specifically, hobts) in and out of a partitioned table requires the use of a staging table. The staging table has very specific requirements: it must be completely identical to the partitioned table, including indexing structures, and it must have a check constraint that limits data to the partitioning range. Thanks to my co-worker Jeff, I’ve recently started using the SQL Server Partition Management tool on CodePlex. I haven’t used the automatic partition switching feature — frankly, using any sort of data modification tool in a production environment makes me nervous — but I’ve been using the scripting option to create staging tables in my development environment, which I then copy to production for use. It’s nothing you can’t do yourself, but it does make the whole process easy and painless, plus it saves you from annoying typos. But be careful when using this tool to just create the table and check constraints automatically, because you may need to…

Add Check Constraints After Loading Data

Most of the time, I add the check constraint when I create the staging table, then I load data and perform the partition switch. However, for some reason, I was receiving the following error:

.Net SqlClient Data Provider: Msg 4972, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. Check constraints or partition function of source table ‘myStagingTable’ allows values that are not allowed by check constraints or partition function on target table ‘myDestinationTable’.

This drove me crazy. I confirmed my check constraints were correct, that I had the correct partition number, and that all schema and indexes matched identically. After about 30 minutes of this, I decided to drop and recreate the constraint. For some reason, it fixed the issue. Repeat tests produced the same results: the check constraint needed to be added *after* data was loaded. This error is occurring on a SQL Server 2008 SP1 box; to be honest, I’m not sure what’s causing the error, so if you know, please leave me a comment. But I figured I’d share so that anyone else running into this issue can hopefully save some time and headache. :)

Replicating Into Partitioned and Non-Partitioned Tables

Recently, we needed to replicate a non-partitioned table to two different destinations. We wanted to use partitioning for Server A, which has 2008 Enterprise; Server B, which is on 2005 Standard, could not take advantage of partitioning. The solution was really easy: create a pre-snapshot and post-snapshot script for the publication, then modify to handle each server group differently. Using pseudo-code, it looked something like this:

/* Identify which servers get the partitioned version */
IF @@SERVERNAME In ('yourServerNameList') 
BEGIN
 
    /* Create your partitioning scheme if necessary */
    IF Not Exists(SELECT * FROM sys.partition_schemes WHERE name = 'InsertPartitionScheme')
        CREATE PARTITION SCHEME InsertPartitionScheme 
            AS PARTITION InsertPartitionFunction ALL TO ([PRIMARY]);    
 
    /* Create your partitioning function if necessary */
    IF Not Exists(SELECT * FROM sys.partition_functions WHERE name = 'InsertPartitionFunction')
        CREATE PARTITION FUNCTION InsertPartitionFunction (SMALLDATETIME) 
            AS RANGE RIGHT FOR VALUES ('insertValues');    
 
    /* Create a partitioned version of your table */
    CREATE TABLE [dbo].[yourTableName] (
	    [yourTableSchema]
    ) ON InsertPartitionScheme([partitioningKey]);
 
END
ELSE
BEGIN
 
    /* Create a non-partitioned version of your table */
    CREATE TABLE [dbo].[yourTableName] (
	    [yourTableSchema]
    ) ON [PRIMARY];
 
END

You could also use an edition check instead of a server name check, if you prefer. The post-snapshot script basically looked the same, except you create partitioned indexes instead.

Compress Old Partitions

Did you know you can set different compression levels for individual partitions? It’s true! I’ve just completed doing this on our largest partitioned table. Here’s how:

/* Apply compression to your partitioned table */
ALTER TABLE dbo.yourTableName
Rebuild Partition = All
WITH 
(
      Data_Compression = Page ON Partitions(1 TO 9)
    , Data_Compression = ROW  ON Partitions(10 TO 11) 
    , Data_Compression = NONE ON Partitions(12)
);
 
/* Apply compression to your partitioned index */
ALTER INDEX YourPartitionedIndex
    ON dbo.yourTableName
    Rebuild Partition = All
    WITH 
    (
      Data_Compression = Page ON Partitions(1 TO 9)
    , Data_Compression = ROW  ON Partitions(10 TO 11) 
    , Data_Compression = NONE ON Partitions(12)
    );
 
/* Apply compression to your unpartitioned index */
ALTER INDEX YourUnpartitionedIndex
    ON dbo.yourTableName
    Rebuild WITH (Data_Compression = ROW);

A couple of things to note. In all of our proof-of-concept testing, we found that compression significantly reduced query execution time, reads (IO), and storage. However, CPU was also increased significantly. The results were more dramatic, both good and bad, with page compression versus row compression. Still, for our older partitions, which aren’t queried regularly, it made sense to turn on page compression. The newer partitions receive row compression, and the newest partitions, which are still queried very regularly by routine processes, were left completely uncompressed. This seems to strike a nice balance in our environment, but of course, results will vary depending on how you use your data.

Something to be aware of is that compressing your clustered index does *not* compress your non-clustered indexes; those are separate operations. Lastly, for those who are curious, it took us about 1 minute to apply row compression and about 7 minutes to apply page compression to partitions averaging 30 million rows.

Looking for more information on table partitioning? Check out my overview of partitioning, my example code, and my article on indexing on partitioned tables.

Automated Index Defrag Script

Note: This script has been updated. You can find the latest version here: http://sqlfool.com/2009/06/index-defrag-script-v30/

So after much delay, here’s the latest and greatest version of my index defrag script.

A summary of the changes:

  • Added support for centralized execution. Option to defrag indexes for a single database from another database, or for all non-system databases.
  • Consolidated Enterprise and Standard versions of the script with new @editionCheck variable.
  • Added parametrization for MaxDop restrictions during rebuilds; validates that the value does not exceed the actual number of processors on the server.
  • Changed minimum fragmentation default value from 10 to 5 to match BOL recommendations.
  • Limited defrags to objects with more than 8 pages.
  • Added a debug option to give a little more insight into what’s happening and to assist with troubleshooting.
  • Updated logic for handling partitions and LOBs.

And a couple of notes and explanations:

Don’t you know you can just pass NULL to sys.dm_db_index_physical_stats for the DatabaseID?
Yes, I realize you can do this. But I don’t want to defrag the system databases, i.e. tempdb, so I opted to handle it this way instead.

How long will this take?
It depends. I don’t necessarily recommend running it without specifying a database; at least, not unmonitored. You *can* do that, but it could take a while. For example, to run sys.dm_db_index_physical_stats for all databases and tables, totaling 2TB, took me 4.5 hours; that doesn’t even count the actual defrags.

Where should I put this?
It’s up to you. If you have a database for items like centralized maintenance or scratch tables, that may be a good place for it. If you prefer, you could also put this in each individual database and call it locally, too. I would not put this in the master or msdb databases.

This is pretty close to a complete rewrite, so please let me know if you encounter any bugs. And now… the code!

IF Not Exists(SELECT [OBJECT_ID] FROM sys.tables WHERE name = N'dba_indexDefragLog')
BEGIN
    -- Drop Table dbo.dba_indexDefragLog
    CREATE TABLE dbo.dba_indexDefragLog
    (
          indexDefrag_id    INT IDENTITY(1,1)   Not Null
        , databaseID        INT                 Not Null
        , databaseName      NVARCHAR(128)       Not Null
        , objectID          INT                 Not Null
        , objectName        NVARCHAR(128)       Not Null
        , indexID           INT                 Not Null
        , indexName         NVARCHAR(128)       Not Null
        , partitionNumber   SMALLINT            Not Null
        , fragmentation     FLOAT               Not Null
        , page_count        INT                 Not Null
        , dateTimeStart     DATETIME            Not Null
        , durationSeconds   INT                 Not Null
        CONSTRAINT PK_indexDefragLog PRIMARY KEY CLUSTERED (indexDefrag_id)
    )
 
    PRINT 'dba_indexDefragLog Table Created';
END
 
IF OBJECTPROPERTY(OBJECT_ID('dbo.dba_indexDefrag_sp'), N'IsProcedure') = 1
BEGIN
    DROP PROCEDURE dbo.dba_indexDefrag_sp;
    PRINT 'Procedure dba_indexDefrag_sp dropped';
END;
Go
 
 
CREATE PROCEDURE dbo.dba_indexDefrag_sp
 
    /* Declare Parameters */
      @minFragmentation     FLOAT           = 5.0  
        /* in percent, will not defrag if fragmentation less than specified */
    , @rebuildThreshold     FLOAT           = 30.0  
        /* in percent, greater than @rebuildThreshold will result in rebuild instead of reorg */
    , @executeSQL           BIT             = 1     
        /* 1 = execute; 0 = print command only */
    , @DATABASE             VARCHAR(128)    = Null
        /* Option to specify a database name; null will return all */
    , @tableName            VARCHAR(4000)   = Null  -- databaseName.schema.tableName
        /* Option to specify a table name; null will return all */
    , @onlineRebuild        BIT             = 1     
        /* 1 = online rebuild; 0 = offline rebuild; only in Enterprise */
    , @maxDopRestriction    TINYINT         = Null
        /* Option to restrict the number of processors for the operation; only in Enterprise */
    , @printCommands        BIT             = 0     
        /* 1 = print commands; 0 = do not print commands */
    , @printFragmentation   BIT             = 0
        /* 1 = print fragmentation prior to defrag; 
           0 = do not print */
    , @defragDelay          CHAR(8)         = '00:00:05'
        /* time to wait between defrag commands */
    , @debugMode            BIT             = 0
        /* display some useful comments to help determine if/where issues occur */
AS
/*********************************************************************************
    Name:       dba_indexDefrag_sp
 
    Author:     Michelle Ufford, http://sqlfool.com
 
    Purpose:    Defrags all indexes for one or more databases
 
    Notes:
 
    CAUTION: TRANSACTION LOG SIZE MUST BE MONITORED CLOSELY WHEN DEFRAGMENTING.
 
      @minFragmentation     defaulted to 10%, will not defrag if fragmentation 
                            is less than that
 
      @rebuildThreshold     defaulted to 30% as recommended by Microsoft in BOL;
                            greater than 30% will result in rebuild instead
 
      @executeSQL           1 = execute the SQL generated by this proc; 
                            0 = print command only
 
      @database             Optional, specify specific database name to defrag;
                            If not specified, all non-system databases will
                            be defragged.
 
      @tableName            Specify if you only want to defrag indexes for a 
                            specific table, format = databaseName.schema.tableName;
                            if not specified, all tables will be defragged.
 
      @onlineRebuild        1 = online rebuild; 
                            0 = offline rebuild
 
      @maxDopRestriction    Option to specify a processor limit for index rebuilds
 
      @printCommands        1 = print commands to screen; 
                            0 = do not print commands
 
      @printFragmentation   1 = print fragmentation to screen;
                            0 = do not print fragmentation
 
      @defragDelay          time to wait between defrag commands; gives the
                            server a little time to catch up 
 
      @debugMode            1 = display debug comments; helps with troubleshooting
                            0 = do not display debug comments
 
    Called by:  SQL Agent Job or DBA
 
    Date        Initials	Description
    ----------------------------------------------------------------------------
    2008-10-27  MFU         Initial Release for public consumption
    2008-11-17  MFU         Added page-count to log table
                            , added @printFragmentation option
    2009-03-17  MFU         Provided support for centralized execution, 
                            , consolidated Enterprise & Standard versions
                            , added @debugMode, @maxDopRestriction
                            , modified LOB and partition logic                            
*********************************************************************************
    Exec dbo.dba_indexDefrag_sp
          @executeSQL           = 0
        , @minFragmentation     = 80
        , @printCommands        = 1
        , @debugMode            = 1
        , @printFragmentation   = 1
        , @database             = 'AdventureWorks'
        , @tableName            = 'AdventureWorks.Sales.SalesOrderDetail';
*********************************************************************************/																
 
SET NOCOUNT ON;
SET XACT_Abort ON;
SET Quoted_Identifier ON;
 
BEGIN
 
    IF @debugMode = 1 RAISERROR('Dusting off the spiderwebs and starting up...', 0, 42) WITH NoWait;
 
    /* Declare our variables */
    DECLARE   @objectID             INT
            , @databaseID           INT
            , @databaseName         NVARCHAR(128)
            , @indexID              INT
            , @partitionCount       BIGINT
            , @schemaName           NVARCHAR(128)
            , @objectName           NVARCHAR(128)
            , @indexName            NVARCHAR(128)
            , @partitionNumber      SMALLINT
            , @partitions           SMALLINT
            , @fragmentation        FLOAT
            , @pageCount            INT
            , @sqlCommand           NVARCHAR(4000)
            , @rebuildCommand       NVARCHAR(200)
            , @dateTimeStart        DATETIME
            , @dateTimeEnd          DATETIME
            , @containsLOB          BIT
            , @editionCheck         BIT
            , @debugMessage         VARCHAR(128)
            , @updateSQL            NVARCHAR(4000)
            , @partitionSQL         NVARCHAR(4000)
            , @partitionSQL_Param   NVARCHAR(1000)
            , @LOB_SQL              NVARCHAR(4000)
            , @LOB_SQL_Param        NVARCHAR(1000);
 
    /* Create our temporary tables */
    CREATE TABLE #indexDefragList
    (
          databaseID        INT
        , databaseName      NVARCHAR(128)
        , objectID          INT
        , indexID           INT
        , partitionNumber   SMALLINT
        , fragmentation     FLOAT
        , page_count        INT
        , defragStatus      BIT
        , schemaName        NVARCHAR(128)   Null
        , objectName        NVARCHAR(128)   Null
        , indexName         NVARCHAR(128)   Null
    );
 
    CREATE TABLE #databaseList
    (
          databaseID        INT
        , databaseName      VARCHAR(128)
    );
 
    CREATE TABLE #processor 
    (
          [INDEX]           INT
        , Name              VARCHAR(128)
        , Internal_Value    INT
        , Character_Value   INT
    );
 
    IF @debugMode = 1 RAISERROR('Beginning validation...', 0, 42) WITH NoWait;
 
    /* Just a little validation... */
    IF @minFragmentation Not Between 0.00 And 100.0
        SET @minFragmentation = 5.0;
 
    IF @rebuildThreshold Not Between 0.00 And 100.0
        SET @rebuildThreshold = 30.0;
 
    IF @defragDelay Not Like '00:[0-5][0-9]:[0-5][0-9]'
        SET @defragDelay = '00:00:05';
 
    /* Make sure we're not exceeding the number of processors we have available */
    INSERT INTO #processor
    EXECUTE XP_MSVER 'ProcessorCount';
 
    IF @maxDopRestriction IS Not Null And @maxDopRestriction > (SELECT Internal_Value FROM #processor)
        SELECT @maxDopRestriction = Internal_Value
        FROM #processor;
 
    /* Check our server version; 1804890536 = Enterprise, 610778273 = Enterprise Evaluation, -2117995310 = Developer */
    IF (SELECT SERVERPROPERTY('EditionID')) In (1804890536, 610778273, -2117995310) 
        SET @editionCheck = 1 -- supports online rebuilds
    ELSE
        SET @editionCheck = 0; -- does not support online rebuilds
 
    IF @debugMode = 1 RAISERROR('Grabbing a list of our databases...', 0, 42) WITH NoWait;
 
    /* Retrieve the list of databases to investigate */
    INSERT INTO #databaseList
    SELECT database_id
        , name
    FROM sys.databases
    WHERE name = IsNull(@DATABASE, name)
        And database_id > 4 -- exclude system databases
        And [STATE] = 0; -- state must be ONLINE
 
    IF @debugMode = 1 RAISERROR('Looping through our list of databases and checking for fragmentation...', 0, 42) WITH NoWait;
 
    /* Loop through our list of databases */
    WHILE (SELECT COUNT(*) FROM #databaseList) > 0
    BEGIN
 
        SELECT TOP 1 @databaseID = databaseID
        FROM #databaseList;
 
        SELECT @debugMessage = '  working on ' + DB_NAME(@databaseID) + '...';
 
        IF @debugMode = 1
            RAISERROR(@debugMessage, 0, 42) WITH NoWait;
 
       /* Determine which indexes to defrag using our user-defined parameters */
        INSERT INTO #indexDefragList
        SELECT
              database_id AS databaseID
            , QUOTENAME(DB_NAME(database_id)) AS 'databaseName'
            , [OBJECT_ID] AS objectID
            , index_id AS indexID
            , partition_number AS partitionNumber
            , avg_fragmentation_in_percent AS fragmentation
            , page_count 
            , 0 AS 'defragStatus' /* 0 = unprocessed, 1 = processed */
            , Null AS 'schemaName'
            , Null AS 'objectName'
            , Null AS 'indexName'
        FROM sys.dm_db_index_physical_stats (@databaseID, OBJECT_ID(@tableName), Null , Null, N'Limited')
        WHERE avg_fragmentation_in_percent >= @minFragmentation 
            And index_id > 0 -- ignore heaps
            And page_count > 8 -- ignore objects with less than 1 extent
        OPTION (MaxDop 1);
 
        DELETE FROM #databaseList
        WHERE databaseID = @databaseID;
 
    END
 
    CREATE CLUSTERED INDEX CIX_temp_indexDefragList
        ON #indexDefragList(databaseID, objectID, indexID, partitionNumber);
 
    SELECT @debugMessage = 'Looping through our list... there''s ' + CAST(COUNT(*) AS VARCHAR(10)) + ' indexes to defrag!'
    FROM #indexDefragList;
 
    IF @debugMode = 1 RAISERROR(@debugMessage, 0, 42) WITH NoWait;
 
    /* Begin our loop for defragging */
    WHILE (SELECT COUNT(*) FROM #indexDefragList WHERE defragStatus = 0) > 0
    BEGIN
 
        IF @debugMode = 1 RAISERROR('  Picking an index to beat into shape...', 0, 42) WITH NoWait;
 
        /* Grab the most fragmented index first to defrag */
        SELECT TOP 1 
              @objectID         = objectID
            , @indexID          = indexID
            , @databaseID       = databaseID
            , @databaseName     = databaseName
            , @fragmentation    = fragmentation
            , @partitionNumber  = partitionNumber
            , @pageCount        = page_count
        FROM #indexDefragList
        WHERE defragStatus = 0
        ORDER BY fragmentation DESC;
 
        IF @debugMode = 1 RAISERROR('  Looking up the specifics for our index...', 0, 42) WITH NoWait;
 
        /* Look up index information */
        SELECT @updateSQL = N'Update idl
            Set schemaName = QuoteName(s.name)
                , objectName = QuoteName(o.name)
                , indexName = QuoteName(i.name)
            From #indexDefragList As idl
            Inner Join ' + @databaseName + '.sys.objects As o
                On idl.objectID = o.object_id
            Inner Join ' + @databaseName + '.sys.indexes As i
                On o.object_id = i.object_id
            Inner Join ' + @databaseName + '.sys.schemas As s
                On o.schema_id = s.schema_id
            Where o.object_id = ' + CAST(@objectID AS VARCHAR(10)) + '
                And i.index_id = ' + CAST(@indexID AS VARCHAR(10)) + '
                And i.type > 0
                And idl.databaseID = ' + CAST(@databaseID AS VARCHAR(10));
 
        EXECUTE SP_EXECUTESQL @updateSQL;
 
        /* Grab our object names */
        SELECT @objectName  = objectName
            , @schemaName   = schemaName
            , @indexName    = indexName
        FROM #indexDefragList
        WHERE objectID = @objectID
            And indexID = @indexID
            And databaseID = @databaseID;
 
        IF @debugMode = 1 RAISERROR('  Grabbing the partition count...', 0, 42) WITH NoWait;
 
        /* Determine if the index is partitioned */
        SELECT @partitionSQL = 'Select @partitionCount_OUT = Count(*)
                                    From ' + @databaseName + '.sys.partitions
                                    Where object_id = ' + CAST(@objectID AS VARCHAR(10)) + '
                                        And index_id = ' + CAST(@indexID AS VARCHAR(10)) + ';'
            , @partitionSQL_Param = '@partitionCount_OUT int OutPut';
 
        EXECUTE SP_EXECUTESQL @partitionSQL, @partitionSQL_Param, @partitionCount_OUT = @partitionCount OUTPUT;
 
        IF @debugMode = 1 RAISERROR('  Seeing if there''s any LOBs to be handled...', 0, 42) WITH NoWait;
 
        /* Determine if the table contains LOBs */
        SELECT @LOB_SQL = ' Select Top 1 @containsLOB_OUT = column_id
                            From ' + @databaseName + '.sys.columns With (NoLock) 
                            Where [object_id] = ' + CAST(@objectID AS VARCHAR(10)) + '
                                And (system_type_id In (34, 35, 99)
                                        Or max_length = -1);'
                            /*  system_type_id --> 34 = image, 35 = text, 99 = ntext
                                max_length = -1 --> varbinary(max), varchar(max), nvarchar(max), xml */
                , @LOB_SQL_Param = '@containsLOB_OUT int OutPut';
 
        EXECUTE SP_EXECUTESQL @LOB_SQL, @LOB_SQL_Param, @containsLOB_OUT = @containsLOB OUTPUT;
 
        IF @debugMode = 1 RAISERROR('  Building our SQL statements...', 0, 42) WITH NoWait;
 
        /* If there's not a lot of fragmentation, or if we have a LOB, we should reorganize */
        IF @fragmentation < @rebuildThreshold Or @containsLOB = 1 Or @partitionCount > 1
        BEGIN
 
            SET @sqlCommand = N'Alter Index ' + @indexName + N' On ' + @databaseName + N'.' 
                                + @schemaName + N'.' + @objectName + N' ReOrganize';
 
            /* If our index is partitioned, we should always reorganize */
            IF @partitionCount > 1
                SET @sqlCommand = @sqlCommand + N' Partition = ' 
                                + CAST(@partitionNumber AS NVARCHAR(10));
 
        END;
 
        /* If the index is heavily fragmented and doesn't contain any partitions or LOB's, rebuild it */
        IF @fragmentation >= @rebuildThreshold And IsNull(@containsLOB, 0) != 1 And @partitionCount <= 1
        BEGIN
 
            /* Set online rebuild options; requires Enterprise Edition */
            IF @onlineRebuild = 1 And @editionCheck = 1 
                SET @rebuildCommand = N' Rebuild With (Online = On';
            ELSE
                SET @rebuildCommand = N' Rebuild With (Online = Off';
 
            /* Set processor restriction options; requires Enterprise Edition */
            IF @maxDopRestriction IS Not Null And @editionCheck = 1
                SET @rebuildCommand = @rebuildCommand + N', MaxDop = ' + CAST(@maxDopRestriction AS VARCHAR(2)) + N')';
            ELSE
                SET @rebuildCommand = @rebuildCommand + N')';
 
            SET @sqlCommand = N'Alter Index ' + @indexName + N' On ' + @databaseName + N'.'
                            + @schemaName + N'.' + @objectName + @rebuildCommand;
 
        END;
 
        /* Are we executing the SQL?  If so, do it */
        IF @executeSQL = 1
        BEGIN
 
            IF @debugMode = 1 RAISERROR('  Executing SQL statements...', 0, 42) WITH NoWait;
 
            /* Grab the time for logging purposes */
            SET @dateTimeStart  = GETDATE();
            EXECUTE SP_EXECUTESQL @sqlCommand;
            SET @dateTimeEnd  = GETDATE();
 
            /* Log our actions */
            INSERT INTO dbo.dba_indexDefragLog
            (
                  databaseID
                , databaseName
                , objectID
                , objectName
                , indexID
                , indexName
                , partitionNumber
                , fragmentation
                , page_count
                , dateTimeStart
                , durationSeconds
            )
            SELECT
                  @databaseID
                , @databaseName
                , @objectID
                , @objectName
                , @indexID
                , @indexName
                , @partitionNumber
                , @fragmentation
                , @pageCount
                , @dateTimeStart
                , DATEDIFF(SECOND, @dateTimeStart, @dateTimeEnd);
 
            /* Just a little breather for the server */
            WAITFOR Delay @defragDelay;
 
            /* Print if specified to do so */
            IF @printCommands = 1
                PRINT N'Executed: ' + @sqlCommand;
        END
        ELSE
        /* Looks like we're not executing, just printing the commands */
        BEGIN
            IF @debugMode = 1 RAISERROR('  Printing SQL statements...', 0, 42) WITH NoWait;
 
            IF @printCommands = 1 PRINT IsNull(@sqlCommand, 'error!');
        END
 
        IF @debugMode = 1 RAISERROR('  Updating our index defrag status...', 0, 42) WITH NoWait;
 
        /* Update our index defrag list so we know we've finished with that index */
        UPDATE #indexDefragList
        SET defragStatus = 1
        WHERE databaseID       = @databaseID
          And objectID         = @objectID
          And indexID          = @indexID
          And partitionNumber  = @partitionNumber;
 
    END
 
    /* Do we want to output our fragmentation results? */
    IF @printFragmentation = 1
    BEGIN
 
        IF @debugMode = 1 RAISERROR('  Displaying fragmentation results...', 0, 42) WITH NoWait;
 
        SELECT databaseID
            , databaseName
            , objectID
            , objectName
            , indexID
            , indexName
            , fragmentation
            , page_count
        FROM #indexDefragList;
 
    END;
 
    /* When everything is said and done, make sure to get rid of our temp table */
    DROP TABLE #indexDefragList;
    DROP TABLE #databaseList;
    DROP TABLE #processor;
 
    IF @debugMode = 1 RAISERROR('DONE!  Thank you for taking care of your indexes!  :)', 0, 42) WITH NoWait;
 
    SET NOCOUNT OFF;
	RETURN 0
END
Go

Thanks to my beta testers, @scoinva, @davidmtate, @jdanton, and @SuperCoolMoss!
Special thanks to SCM for keeping on me to finish this.

Happy Defragging!

Michelle

Source: http://sqlfool.com/2009/03/automated-index-defrag-script/

Creating a 60 GB Index

January 24, 2009 by Michelle Ufford · 1 Comment
Filed under: Performance & Tuning, SQL Tips 

Recently, I needed to create an index on a 1.5 billion row table. I’ve created some large indexes before, but this was the largest, so I thought I’d share my experience in case anyone was interested.

The plan was to create the following index:

CREATE NONCLUSTERED INDEX IX_indexName_unpartitioned
    ON databaseName.dbo.tableName(columnList)
    Include (includedColumnList)
    WITH (MaxDop = 1, Online = ON, Sort_In_TempDB = ON)
    ON [PRIMARY];

This is an unpartitioned, non-clustered index being built on a partitioned table. Normally, when you build an aligned partitioned index, the index creation process requires less memory and has less noticeable impact on the system, because each partition is handled individually, one at a time. But as this is an unpartitioned (unaligned) index, each partition was built concurrently, requiring more memory and causing a greater impact on performance. Because of this, I needed to restrict the process to MaxDop 1; otherwise, the server would suffer because of too much memory pressure.

I chose Sort_In_TempDB = On because:

  • I’m building this index online on a very busy table and cannot afford to impact normal oeprations. By using Sort_In_TempDB = On, index transactions are separated from user transactions, allowing the user transaction log to be truncated. *
  • TempDB is on a different volume and therefore should reduce the duration of the operation.
  • The recovery for the user database is full, and the recovery for the TempDB is simple. Sorting in TempDB would minimize logging.

* Note: the transaction log for the user database still grew at a much faster rate than normal and had to be closely monitored during this operation to ensure enough free space remained.

The size of the indexed columns is 25 bytes. So I ran my calculations and came up with 36gb space requirement. We increased TempDB to 50gb and gave it a go. An hour later… ERROR. The process terminated because there would not be enough space free in TempDB to complete the operation. Obviously, my calculations were incorrect. After speaking with Chris Leonard, a man who is way too smart for his own good, I realized I had not included my clustered index in the size calculations. Doh.

Re-running my estimates, here’s what I came up with:

Index Size 25   bytes
Clustered Index Size 16   bytes
Records per Page 197
Est. Rows 1,575,000,000
Est. Number of Pages 7,995,000
Space Requirements 59   GB

 

Obviously, 50gb of free space just wasn’t going to cut it. I decided to give TempDB a little wiggle room and bumped up the space in TempDB to 70gb (not as easy as you’d imagine, I had to requisition more space on the SAN), then re-ran the operation. Success! The process completed in 3 hours and 24 minutes. There was a mild increase in CPU, but no applications or users experienced any issues.

For those interested in the particulars: this was used for a single-record look-up and could not be filtered by the partitioning key. The non-partitioned version of this index has 80% less reads and 11% less CPU than its partitioned counterpart.

If you’re interested in learning more about indexes, here’s some recommended reading:

Indexing for Partitioned Tables

December 23, 2008 by Michelle Ufford · Leave a Comment
Filed under: Performance & Tuning, SQL Tips 

So you’ve partitioned your table, and now you’re ready to performance tune. As with any table, indexing is a great place to start. And if you’re like most people new to partitioning, you probably created all of your indexes on the partitioned scheme, either by design or unintentionally.

Let’s take a step back here and discuss what a partitioned index is. A partitioned index, like a partitioned table, separates data into different physical structures (partitions) under one logical name. Specifically, each partition in a nonclustered index contains its own B-tree structure with a subset of rows, based upon the partitioning scheme. By default, an unpartitioned nonclustered index has just one partition.

Keep in mind, when you create an index on a partitioned table, i.e.

CREATE NONCLUSTERED INDEX IX_myIndex
    ON dbo.myTable(myColumn);

… you are creating the index on the partitioned scheme by default. In order to create a NON-partitioned index on that same table, you would need to explicitly declare “On [FileGroup]“, i.e.

CREATE NONCLUSTERED INDEX IX_myIndex
    ON dbo.myTable(myColumn)
    ON [PRIMARY];

 

But should you partition your index? That depends on how you use it. In fact, most environments will probably want to use a mix of partitioned and non-partitioned indexes. I’ve found that that partitioned indexes perform better when aggregating data or scanning partitions. Conversely, you’ll probably find that, if you need to locate a single, specific record, nothing performs better than a non-partitioned index on that column.

Let’s walk through some examples and see how they perform. I’ll bring back my trusty ol’ orders table for this.

/* Create a partition function. */
CREATE Partition FUNCTION 
    [test_monthlyDateRange_pf] (DATETIME)
    AS Range RIGHT FOR VALUES
    ('2009-01-01', '2009-01-08', '2009-01-15'
    , '2009-01-22', '2009-01-29');
Go
 
/* Associate the partition function with a partition scheme. */
CREATE Partition Scheme test_monthlyDateRange_ps
    AS Partition test_monthlyDateRange_pf
    All TO ([PRIMARY]);
Go
 
/* Create a partitioned table. */
CREATE TABLE dbo.orders
(
      order_id  INT IDENTITY(1,1)   Not Null
    , orderDate DATETIME            Not Null
    , orderData SMALLDATETIME       Not Null
 
    CONSTRAINT PK_orders PRIMARY KEY CLUSTERED
    (
        order_id
      , orderDate
    )
) ON test_monthlyDateRange_ps(orderDate);
Go
 
/* Create some records to play with. */
SET NOCOUNT ON;
 
DECLARE @endDate DATETIME = '2009-01-01';
 
WHILE @endDate < '2009-02-01'
BEGIN
 
    INSERT INTO dbo.orders
    SELECT @endDate, @endDate;
 
    SET @endDate = DATEADD(MINUTE, 1, @endDate);
 
END;
 
SET NOCOUNT OFF;
 
 
/* Let’s create an aligned, partitioned index. */
CREATE NONCLUSTERED INDEX IX_orders_aligned
    ON dbo.orders(order_id)
    ON test_monthlyDateRange_ps(orderDate); 
    /* you don't actually need to declare the last
       line of this unless you want to create the
       index on a different partitioning scheme.   */
 
/* Now let’s create an unpartitioned index. */
CREATE NONCLUSTERED INDEX IX_orders_unpartitioned
    ON dbo.orders(order_id)
    ON [PRIMARY];

 

Now that we have both a partitioned and an unpartitioned index, let’s take a look at our sys.partitions table:

/* Let's take a look at our index partitions */
SELECT i.name
    , i.index_id
    , p.partition_number
    , p.ROWS
FROM sys.partitions AS p
Join sys.indexes AS i
    ON p.OBJECT_ID = i.OBJECT_ID 
   And p.index_id = i.index_id
WHERE p.OBJECT_ID = OBJECT_ID('orders')
ORDER BY i.index_id, p.partition_number;

 

sys.partitions

sys.partitions

As expected, both of our partitioned indexes, PK_orders and IX_orders_aligned, have 6 partitions, with a subset of rows on each partition. Our unpartitioned non-clustered index, IX_orders_unpartitioned, on the other hand has just 1 partition containing all of the rows.

Now that we have our environment set up, let’s run through some different queries and see the performance impact of each type of index.

/* Query 1, specific record look-up, covered */
SELECT order_id, orderDate
FROM dbo.orders WITH (INDEX(IX_orders_aligned))
WHERE order_id = 25000;
 
SELECT order_id, orderDate
FROM dbo.orders WITH (INDEX(IX_orders_unpartitioned))
WHERE order_id = 25000;

 

Query 1

Query 1

The unpartitioned index performs significantly better when given a specific record to look-up. Now let’s try the same query, but utilizing a scan instead of a seek:

/* Query 2, specific record look-up, uncovered */
SELECT order_id, orderDate, orderData
FROM dbo.orders WITH (INDEX(IX_orders_aligned))
WHERE order_id = 30000;
 
SELECT order_id, orderDate, orderData
FROM dbo.orders WITH (INDEX(IX_orders_unpartitioned))
WHERE order_id = 30000;

 

Query 2

Query 2

Again we see that the non-partitioned index performs better with the single-record look-up. This can lead to some pretty dramatic performance implications. So when *would* we want to use a partitioned index? Two instances immediately pop to mind. First, partition switching can only be performed when all indexes on a table are aligned. Secondly, partitioned indexes perform better when manipulating large data sets. To see this in action, let’s try some simple aggregation…

/* Query 3, aggregation */
SELECT CAST(ROUND(CAST(orderdate AS FLOAT), 0, 1) AS SMALLDATETIME) AS 'order_date'
    , COUNT(*)
FROM dbo.orders WITH (INDEX(IX_orders_aligned))
WHERE orderDate Between '2009-01-01' And '2009-01-07 23:59'
GROUP BY CAST(ROUND(CAST(orderdate AS FLOAT), 0, 1) AS SMALLDATETIME) 
ORDER BY CAST(ROUND(CAST(orderdate AS FLOAT), 0, 1) AS SMALLDATETIME);
 
SELECT CAST(ROUND(CAST(orderdate AS FLOAT), 0, 1) AS SMALLDATETIME) AS 'order_date'
    , COUNT(*)
FROM dbo.orders WITH (INDEX(IX_orders_unpartitioned))
WHERE orderDate Between '2009-01-01' And '2009-01-07 23:59'
GROUP BY CAST(ROUND(CAST(orderdate AS FLOAT), 0, 1) AS SMALLDATETIME) 
ORDER BY CAST(ROUND(CAST(orderdate AS FLOAT), 0, 1) AS SMALLDATETIME);

 

Query 3

Query 3

As you can see, partitioned indexes perform better when aggregating data. This is just a simple example, but the results can be even more dramatic in a large production environment. This is one of the reasons why partitioned tables and indexes are especially beneficial in data warehouses.

So now you have a general idea of what a partitioned index is and when to use a partitioned index vs a non-partitioned index. Ultimately, your indexing needs will depend largely on the application and how the data is used. When in doubt, test, test, test! So to recap…

  • Specify “On [FileGroup]“ to create an unpartitioned index on a partitioned table
  • Consider using non-partitioned indexes for single-record look-ups
  • Use partitioned indexes for multiple records and data aggregations
  • To enable partition switching, all indexes on the table must be aligned.

For more information on partitioning, check out my other partitioning articles:

Partitioning Example
Partitioning 101
Tips for Large Data Stores

Updated Index Defrag Script (2005, 2008)

November 17, 2008 by Michelle Ufford · 18 Comments
Filed under: Performance & Tuning, SQL 2008, T-SQL Scripts 

Thanks to everyone who left a comment or sent me an e-mail regarding the Index Defrag Script. I’ve received some great feedback and requests for features. I’ve also had some questions regarding how to use it, which I will answer at the end of this post.

Changes include:
– separate version for both Enterprise and Standard editions
    - Standard edition removes partitioning and online options
– output option to see fragmentation levels
– page_count added to the log table

I’ve also verified that this script works well in SQL 2008.

Enterprise Version:

IF EXISTS(SELECT OBJECT_ID FROM sys.tables
                WHERE [name] = N'dba_indexDefragLog')
BEGIN
 
    DROP TABLE dbo.dba_indexDefragLog;
 
    PRINT 'dba_indexDefragLog table dropped!';
 
END
 
CREATE TABLE dbo.dba_indexDefragLog
(
      indexDefrag_id    INT IDENTITY(1,1)   NOT NULL
    , objectID          INT                 NOT NULL
    , objectName        NVARCHAR(130)       NOT NULL
    , indexID           INT                 NOT NULL
    , indexName         NVARCHAR(130)       NOT NULL
    , partitionNumber   SMALLINT            not null
    , fragmentation     FLOAT               NOT NULL
    , page_count        INT                 NOT NULL
    , dateTimeStart     DATETIME            NOT NULL
    , durationSeconds   INT                 NOT NULL
    CONSTRAINT PK_indexDefragLog
        PRIMARY KEY CLUSTERED (indexDefrag_id)
);
 
PRINT 'dba_indexDefragLog Table Created';
 
IF OBJECTPROPERTY(OBJECT_ID('dbo.dba_indexDefrag_sp'),
        N'IsProcedure') IS Null
BEGIN
    EXECUTE ('Create Procedure dbo.dba_indexDefrag_sp
        As Print ''Hello World!''');
    RAISERROR('Procedure dba_indexDefrag_sp created.'
        , 10, 1);
END;
Go
 
SET ANSI_Nulls ON;
SET Ansi_Padding ON;
SET Ansi_Warnings ON;
SET ArithAbort ON;
SET Concat_Null_Yields_Null ON;
SET NOCOUNT ON;
SET Numeric_RoundAbort OFF;
SET Quoted_Identifier ON;
Go
 
ALTER PROCEDURE dbo.dba_indexDefrag_sp
 
    /* Declare Parameters */
      @minFragmentation     FLOAT           = 10.0
        /* in percent, will not defrag if fragmentation
           less than specified */
    , @rebuildThreshold     FLOAT           = 30.0
        /* in percent, greater than @rebuildThreshold
           will result in rebuild instead of reorg */
    , @onlineRebuild        BIT             = 1     
        /* 1 = online rebuild; 0 = offline rebuild */
    , @executeSQL           BIT             = 1
        /* 1 = execute; 0 = print command only */
    , @tableName            VARCHAR(4000)   = Null 
        /* Option to specify a table name */
    , @printCommands        BIT             = 0
        /* 1 = print commands; 0 = do not print commands */
    , @printFragmentation   BIT             = 0
        /* 1 = print fragmentation prior to defrag; 
           0 = do not print */
    , @defragDelay          CHAR(8)         = '00:00:05'
        /* time to wait between defrag commands */
AS
/********************************************************************
    Name:       dba_indexDefrag_sp
 
    Author:     Michelle F. Ufford
 
    Purpose:    Defrags all indexes for the current database
 
    Notes:      This script was designed for SQL Server 2005
                Enterprise Edition.
 
    CAUTION: Monitor transaction log if executing for the first time!
 
      @minFragmentation     defaulted to 10%, will not defrag if
                            fragmentation if less than specified.
 
      @rebuildThreshold     defaulted to 30% as recommended by
                            Microsoft in BOL;
                            > than 30% will result in rebuild instead
 
      @onlineRebuild        1 = online rebuild; 
                            0 = offline rebuild
 
      @executeSQL           1 = execute the SQL generated by this proc;
                            0 = print command only
 
      @tableName            Specify if you only want to defrag indexes
                            for a specific table
 
      @printCommands        1 = print commands to screen;
                            0 = do not print commands
 
      @printFragmentation   1 = print fragmentation to screen;
                            0 = do not print fragmentation
 
      @defragDelay          time to wait between defrag commands;
                            gives the server some time to catch up
 
    Called by:  SQL Agent Job or DBA
 
    Date        Initials  Description
    ----------------------------------------------------------------
    2008-10-27  MFU       Initial Release
    2008-11-17  MFU       Added page_count to log table
                          , added @printFragmentation option
********************************************************************
    Exec dbo.dba_indexDefrag_sp
          @executeSQL         = 1
        , @printCommands      = 1
        , @minFragmentation   = 0
        , @printFragmentation = 1;
********************************************************************/
 
SET NOCOUNT ON;
SET XACT_Abort ON;
 
BEGIN
 
    /* Declare our variables */
    DECLARE   @objectID         INT
            , @indexID          INT
            , @partitionCount   BIGINT
            , @schemaName       NVARCHAR(130)
            , @objectName       NVARCHAR(130)
            , @indexName        NVARCHAR(130)
            , @partitionNumber  SMALLINT
            , @partitions       SMALLINT
            , @fragmentation    FLOAT
            , @pageCount        INT
            , @sqlCommand       NVARCHAR(4000)
            , @rebuildCommand   NVARCHAR(200)
            , @dateTimeStart    DATETIME
            , @dateTimeEnd      DATETIME
            , @containsLOB      BIT;
 
    /* Just a little validation... */
    IF @minFragmentation Not Between 0.00 And 100.0
        SET @minFragmentation = 10.0;
 
    IF @rebuildThreshold Not Between 0.00 And 100.0
        SET @rebuildThreshold = 30.0;
 
    IF @defragDelay Not Like '00:[0-5][0-9]:[0-5][0-9]'
        SET @defragDelay = '00:00:05';
 
    /* Determine which indexes to defrag using our
       user-defined parameters */
    SELECT
          OBJECT_ID AS objectID
        , index_id AS indexID
        , partition_number AS partitionNumber
        , avg_fragmentation_in_percent AS fragmentation
        , page_count 
        , 0 AS 'defragStatus'
            /* 0 = unprocessed, 1 = processed */
    INTO #indexDefragList
    FROM sys.dm_db_index_physical_stats
        (DB_ID(), OBJECT_ID(@tableName), NULL , NULL, N'Limited')
    WHERE avg_fragmentation_in_percent > @minFragmentation
        And index_id > 0
    OPTION (MaxDop 1);
 
    /* Create a clustered index to boost performance a little */
    CREATE CLUSTERED INDEX CIX_temp_indexDefragList
        ON #indexDefragList(objectID, indexID, partitionNumber);
 
    /* Begin our loop for defragging */
    WHILE (SELECT COUNT(*) FROM #indexDefragList
            WHERE defragStatus = 0) > 0
    BEGIN
 
        /* Grab the most fragmented index first to defrag */
        SELECT TOP 1
              @objectID         = objectID
            , @fragmentation    = fragmentation
            , @indexID          = indexID
            , @partitionNumber  = partitionNumber
            , @pageCount        = page_count
        FROM #indexDefragList
        WHERE defragStatus = 0
        ORDER BY fragmentation DESC;
 
        /* Look up index information */
        SELECT @objectName = QUOTENAME(o.name)
             , @schemaName = QUOTENAME(s.name)
        FROM sys.objects AS o
        INNER Join sys.schemas AS s
            ON s.schema_id = o.schema_id
        WHERE o.OBJECT_ID = @objectID;
 
        SELECT @indexName = QUOTENAME(name)
        FROM sys.indexes
        WHERE OBJECT_ID = @objectID
            And index_id = @indexID
            And type > 0;
 
        /* Determine if the index is partitioned */
        SELECT @partitionCount = COUNT(*)
        FROM sys.partitions
        WHERE OBJECT_ID = @objectID
            And index_id = @indexID;
 
        /* Look for LOBs */
        SELECT TOP 1
            @containsLOB = column_id
        FROM sys.columns WITH (NOLOCK)
        WHERE 
            [OBJECT_ID] = @objectID
            And (system_type_id In (34, 35, 99)
            -- 34 = image, 35 = text, 99 = ntext
                    Or max_length = -1);
            -- varbinary(max), varchar(max), nvarchar(max), xml
 
        /* See if we should rebuild or reorganize; handle thusly */
        IF @fragmentation < @rebuildThreshold And @partitionCount <= 1
            SET @sqlCommand = N'Alter Index ' + @indexName + N' On '
                + @schemaName + N'.' + @objectName + N' ReOrganize';
 
        IF @fragmentation >= @rebuildThreshold
            And IsNull(@containsLOB, 0) = 0
                -- Cannot rebuild if the table has one or more LOB
            And @partitionCount <= 1
        BEGIN
 
            /* We should always rebuild online if possible
                (SQL 2005 Enterprise) */
            IF @onlineRebuild = 0
                SET @rebuildCommand = N' Rebuild With
                    (Online = Off, MaxDop = 1)';
            ELSE
                SET @rebuildCommand = N' Rebuild With
                    (Online = On, MaxDop = 1)';
 
            SET @sqlCommand = N'Alter Index ' + @indexName + N' On '
                + @schemaName + N'.' + @objectName + @rebuildCommand;
        END;
 
        /* If our index is partitioned, we should always reorganize */
        IF @partitionCount > 1
            SET @sqlCommand = N'Alter Index ' + @indexName + N' On '
                + @schemaName + N'.' + @objectName + N' ReOrganize'
                + N' Partition = '
                + CAST(@partitionNumber AS NVARCHAR(10));
                -- no MaxDop needed, single threaded operation
 
        /* Are we executing the SQL?  If so, do it */
        IF @executeSQL = 1
        BEGIN
 
            /* Grab the time for logging purposes */
            SET @dateTimeStart  = GETDATE();
            EXECUTE (@sqlCommand);
            SET @dateTimeEnd  = GETDATE();
 
            /* Log our actions */
            INSERT INTO dbo.dba_indexDefragLog
            (
                  objectID
                , objectName
                , indexID
                , indexName
                , partitionNumber
                , fragmentation
                , page_count
                , dateTimeStart
                , durationSeconds
            )
            SELECT
                  @objectID
                , @objectName
                , @indexID
                , @indexName
                , @partitionNumber
                , @fragmentation
                , @pageCount
                , @dateTimeStart
                , DATEDIFF(SECOND, @dateTimeStart, @dateTimeEnd);
 
            /* Just a little breather for the server */
            WAITFOR Delay @defragDelay;
 
            /* Print if specified to do so */
            IF @printCommands = 1
                PRINT N'Executed: ' + @sqlCommand;
        END
        ELSE
        /* Looks like we're not executing, just print
            the commands */
        BEGIN
            IF @printCommands = 1
                PRINT @sqlCommand;
        END
 
        /* Update our index defrag list when we've
            finished with that index */
        UPDATE #indexDefragList
        SET defragStatus = 1
        WHERE objectID         = @objectID
          And indexID          = @indexID
          And partitionNumber  = @partitionNumber;
 
    END
 
    /* Do we want to output our fragmentation results? */
    IF @printFragmentation = 1
        SELECT idl.objectID
            , o.name AS 'tableName'
            , idl.indexID
            , i.name AS 'indexName'
            , idl.fragmentation
            , idl.page_count
        FROM #indexDefragList AS idl
        Join sys.objects AS o
            ON idl.objectID = o.OBJECT_ID
        Join sys.indexes AS i
            ON idl.objectID = i.OBJECT_ID
            And idl.indexID = i.index_id;
 
    /* When everything is done, make sure to get rid of
        our temp table */
    DROP TABLE #indexDefragList;
 
    SET NOCOUNT OFF;
    RETURN 0
END
Go

 

Standard Version:

IF EXISTS(SELECT OBJECT_ID FROM sys.tables
                WHERE [name] = N'dba_indexDefragLog')
BEGIN
 
    DROP TABLE dbo.dba_indexDefragLog;
 
    PRINT 'dba_indexDefragLog table dropped!';
 
END
 
CREATE TABLE dbo.dba_indexDefragLog
(
      indexDefrag_id    INT IDENTITY(1,1)   NOT NULL
    , objectID          INT                 NOT NULL
    , objectName        NVARCHAR(130)       NOT NULL
    , indexID           INT                 NOT NULL
    , indexName         NVARCHAR(130)       NOT NULL
    , fragmentation     FLOAT               NOT NULL
    , page_count        INT                 NOT NULL
    , dateTimeStart     DATETIME            NOT NULL
    , durationSeconds   INT                 NOT NULL
    CONSTRAINT PK_indexDefragLog
        PRIMARY KEY CLUSTERED (indexDefrag_id)
);
 
PRINT 'dba_indexDefragLog Table Created';
 
IF OBJECTPROPERTY(OBJECT_ID('dbo.dba_indexDefragStandard_sp'),
        N'IsProcedure') IS Null
BEGIN
    EXECUTE ('Create Procedure dbo.dba_indexDefragStandard_sp
        As Print ''Hello World!''');
    RAISERROR('Procedure dba_indexDefragStandard_sp created.'
        , 10, 1);
END;
Go
 
SET ANSI_Nulls ON;
SET Ansi_Padding ON;
SET Ansi_Warnings ON;
SET ArithAbort ON;
SET Concat_Null_Yields_Null ON;
SET NOCOUNT ON;
SET Numeric_RoundAbort OFF;
SET Quoted_Identifier ON;
Go
 
ALTER PROCEDURE dbo.dba_indexDefragStandard_sp
 
    /* Declare Parameters */
      @minFragmentation     FLOAT           = 10.0
        /* in percent, will not defrag if fragmentation
           less than specified */
    , @rebuildThreshold     FLOAT           = 30.0
        /* in percent, greater than @rebuildThreshold
           will result in rebuild instead of reorg */
    , @executeSQL           BIT             = 1
        /* 1 = execute; 0 = print command only */
    , @tableName            VARCHAR(4000)   = Null 
        /* Option to specify a table name */
    , @printCommands        BIT             = 0
        /* 1 = print commands; 0 = do not print commands */
    , @printFragmentation   BIT             = 0
        /* 1 = print fragmentation prior to defrag; 
           0 = do not print */
    , @defragDelay          CHAR(8)         = '00:00:05'
        /* time to wait between defrag commands */
AS
/********************************************************************
    Name:       dba_indexDefragStandard_sp
 
    Author:     Michelle F. Ufford
 
    Purpose:    Defrags all indexes for the current database
 
    Notes:      This script was designed for SQL Server 2005
                Standard edition.
 
    CAUTION: Monitor transaction log if executing for the first time!
 
      @minFragmentation     defaulted to 10%, will not defrag if
                            fragmentation if less than specified.
 
      @rebuildThreshold     defaulted to 30% as recommended by
                            Microsoft in BOL;
                            > than 30% will result in rebuild instead
 
      @executeSQL           1 = execute the SQL generated by this proc;
                            0 = print command only
 
      @tableName            Specify if you only want to defrag indexes
                            for a specific table
 
      @printCommands        1 = print commands to screen;
                            0 = do not print commands
 
      @printFragmentation   1 = print fragmentation to screen;
                            0 = do not print fragmentation
 
      @defragDelay          time to wait between defrag commands;
                            gives the server some time to catch up
 
    Called by:  SQL Agent Job or DBA
 
    Date        Initials  Description
    ----------------------------------------------------------------
    2008-10-27  MFU       Initial Release
    2008-11-17  MFU       Added page_count to log table
                          , added @printFragmentation option
********************************************************************
    Exec dbo.dba_indexDefragStandard_sp
          @executeSQL         = 1
        , @printCommands      = 1
        , @minFragmentation   = 0
        , @printFragmentation = 1;
********************************************************************/
 
SET NOCOUNT ON;
SET XACT_Abort ON;
 
BEGIN
 
    /* Declare our variables */
    DECLARE   @objectID         INT
            , @indexID          INT
            , @schemaName       NVARCHAR(130)
            , @objectName       NVARCHAR(130)
            , @indexName        NVARCHAR(130)
            , @fragmentation    FLOAT
            , @pageCount        INT
            , @sqlCommand       NVARCHAR(4000)
            , @rebuildCommand   NVARCHAR(200)
            , @dateTimeStart    DATETIME
            , @dateTimeEnd      DATETIME
            , @containsLOB      BIT;
 
    /* Just a little validation... */
    IF @minFragmentation Not Between 0.00 And 100.0
        SET @minFragmentation = 10.0;
 
    IF @rebuildThreshold Not Between 0.00 And 100.0
        SET @rebuildThreshold = 30.0;
 
    IF @defragDelay Not Like '00:[0-5][0-9]:[0-5][0-9]'
        SET @defragDelay = '00:00:05';
 
    /* Determine which indexes to defrag using our
       user-defined parameters */
    SELECT
          OBJECT_ID AS objectID
        , index_id AS indexID
        , avg_fragmentation_in_percent AS fragmentation
        , page_count 
        , 0 AS 'defragStatus'
            /* 0 = unprocessed, 1 = processed */
    INTO #indexDefragList
    FROM sys.dm_db_index_physical_stats
        (DB_ID(), OBJECT_ID(@tableName), NULL , NULL, N'Limited')
    WHERE avg_fragmentation_in_percent > @minFragmentation
        And index_id > 0
    OPTION (MaxDop 1);
 
    /* Create a clustered index to boost performance a little */
    CREATE CLUSTERED INDEX CIX_temp_indexDefragList
        ON #indexDefragList(objectID, indexID);
 
    /* Begin our loop for defragging */
    WHILE (SELECT COUNT(*) FROM #indexDefragList
            WHERE defragStatus = 0) > 0
    BEGIN
 
        /* Grab the most fragmented index first to defrag */
        SELECT TOP 1
              @objectID         = objectID
            , @fragmentation    = fragmentation
            , @indexID          = indexID
            , @pageCount        = page_count
        FROM #indexDefragList
        WHERE defragStatus = 0
        ORDER BY fragmentation DESC;
 
        /* Look up index information */
        SELECT @objectName = QUOTENAME(o.name)
             , @schemaName = QUOTENAME(s.name)
        FROM sys.objects AS o
        INNER Join sys.schemas AS s
            ON s.schema_id = o.schema_id
        WHERE o.OBJECT_ID = @objectID;
 
        SELECT @indexName = QUOTENAME(name)
        FROM sys.indexes
        WHERE OBJECT_ID = @objectID
            And index_id = @indexID
            And type > 0;
 
        /* Look for LOBs */
        SELECT TOP 1
            @containsLOB = column_id
        FROM sys.columns WITH (NOLOCK)
        WHERE 
            [OBJECT_ID] = @objectID
            And (system_type_id In (34, 35, 99)
            -- 34 = image, 35 = text, 99 = ntext
                    Or max_length = -1);
            -- varbinary(max), varchar(max), nvarchar(max), xml
 
        /* See if we should rebuild or reorganize; handle thusly */
        IF @fragmentation < @rebuildThreshold 
            Or IsNull(@containsLOB, 0) > 0 
            -- Cannot rebuild if the table has one or more LOB
            SET @sqlCommand = N'Alter Index ' + @indexName + N' On '
                + @schemaName + N'.' + @objectName + N' ReOrganize;'
        ELSE
            SET @sqlCommand = N'Alter Index ' + @indexName + N' On '
                + @schemaName + N'.' + @objectName +  ' Rebuild '
                + 'With (MaxDop = 1)'; -- minimize impact on server
 
        /* Are we executing the SQL?  If so, do it */
        IF @executeSQL = 1
        BEGIN
 
            /* Grab the time for logging purposes */
            SET @dateTimeStart  = GETDATE();
            EXECUTE (@sqlCommand);
            SET @dateTimeEnd  = GETDATE();
 
            /* Log our actions */
            INSERT INTO dbo.dba_indexDefragLog
            (
                  objectID
                , objectName
                , indexID
                , indexName
                , fragmentation
                , page_count
                , dateTimeStart
                , durationSeconds
            )
            SELECT
                  @objectID
                , @objectName
                , @indexID
                , @indexName
                , @fragmentation
                , @pageCount
                , @dateTimeStart
                , DATEDIFF(SECOND, @dateTimeStart, @dateTimeEnd);
 
            /* Just a little breather for the server */
            WAITFOR Delay @defragDelay;
 
            /* Print if specified to do so */
            IF @printCommands = 1
                PRINT N'Executed: ' + @sqlCommand;
        END
        ELSE
        /* Looks like we're not executing, just print
            the commands */
        BEGIN
            IF @printCommands = 1
                PRINT @sqlCommand;
        END
 
        /* Update our index defrag list when we've
            finished with that index */
        UPDATE #indexDefragList
        SET defragStatus = 1
        WHERE objectID  = @objectID
          And indexID   = @indexID;
 
    END
 
    /* Do we want to output our fragmentation results? */
    IF @printFragmentation = 1
        SELECT idl.objectID
            , o.name AS 'tableName'
            , idl.indexID
            , i.name AS 'indexName'
            , idl.fragmentation
            , idl.page_count
        FROM #indexDefragList AS idl
        JOIN sys.objects AS o
            ON idl.objectID = o.OBJECT_ID
        JOIN sys.indexes AS i
            ON idl.objectID = i.OBJECT_ID
            AND idl.indexID = i.index_id;
 
    /* When everything is done, make sure to get rid of
        our temp table */
    DROP TABLE #indexDefragList;
 
    SET NOCOUNT OFF;
    RETURN 0
END
Go

 

For those who are having troubles with this script…

1) “Not all of my indexes were defragged!” or “Nothing happened when I executed this script.”

This script will only defrag those indexes that surpass the specified threshold. If you’re not seeing your index in the output, try executing this:

    EXEC dbo.dba_indexDefrag_sp
          @executeSQL    = 0
        , @printCommands = 1
        , @minFragmentation = 0
        , @printFragmentation = 1;

Check to see what your index’s fragmentation level is. Maybe it’s not as fragmented as you feared. :)

2) “My indexes are still fragmented after running this script.”

To quote The Powers That Be (aka Microsoft)…

“In general, fragmentation on small indexes is often not controllable. The pages of small indexes are stored on mixed extents. Mixed extents are shared by up to eight objects, so the fragmentation in a small index might not be reduced after reorganizing or rebuilding the index.” — Reorganizing and Rebuilding Indexes

3) “Can I use this in my production environment?”

That really depends on your environment. I’ve successfully used this in some very large production environments. However, I wouldn’t exactly recommend executing the script in the middle of a business day on a billion+ row, heavily fragmented, unpartitioned table, either. :)

If you’re not sure what the impact will be, execute the commands-only version of the script…

    EXEC dbo.dba_indexDefrag_sp
	      @executeSQL    = 0
        , @printCommands = 1
        , @printFragmentation = 1;

… then execute the statements one at a time. Make sure you monitor tempdb and the transaction log to ensure you don’t have any space issues.

If you have any additional questions or suggestions for this script, leave me a comment and I’ll be happy to help. :)

Partitioning Example

November 3, 2008 by Michelle Ufford · 4 Comments
Filed under: T-SQL Scripts 

The following code will walk you through the process of creating a partitioned table:

--------------------------------
-- Create A Partitioned Table --
--------------------------------
 
/* Create a partition function. */
CREATE Partition FUNCTION [test_monthlyDateRange_pf] (SMALLDATETIME) 
    AS Range RIGHT FOR VALUES 
    ('2008-01-01', '2008-02-01', '2008-03-01');
Go
 
/* Associate the partition function with a partition scheme. */
CREATE Partition Scheme test_monthlyDateRange_ps 
    AS Partition test_monthlyDateRange_pf 
    All TO ([PRIMARY]);
Go
 
/* Create your first partitioned table! 
   Make sure the data types match.  */
CREATE TABLE dbo.orders
(
      order_id  INT IDENTITY(1,1)   Not Null
    , orderDate SMALLDATETIME       Not Null
 
    CONSTRAINT PK_orders PRIMARY KEY CLUSTERED
    (
        orderDate
      , order_id
    )
) ON test_monthlyDateRange_ps(orderDate);
Go
 
/* Create some records to play with. */
INSERT INTO dbo.orders
SELECT '2007-12-31' UNION All
SELECT '2008-01-02' UNION All
SELECT '2008-01-03' UNION All
SELECT '2008-01-04' UNION All
SELECT '2008-02-01' UNION All
SELECT '2008-02-02' UNION All
SELECT '2008-03-01' UNION All
SELECT '2008-03-02';
 
/* The $partition function can be used to interrogate partition data.
   Let's use it to see where those records are physically located. */
SELECT $partition.test_monthlyDateRange_pf(orderDate) 
    AS 'partition_number'
    , *
FROM dbo.orders;
 
/* By default, all new indexes are created on the partition.
   Let's create an aligned index */
CREATE NONCLUSTERED INDEX IX_orders_aligned
    ON dbo.orders(order_id)
    ON test_monthlyDateRange_ps(orderDate);
 
/* Now let's create an un-aligned index.
   We'll need to specify the filegroup. */
CREATE NONCLUSTERED INDEX IX_orders_nonaligned
    ON dbo.orders(order_id)
    ON [PRIMARY]; -- can be any filegroup
 
/* Review your indexes */
EXECUTE SP_HELPINDEX orders;

Using the previous code as a building block, let’s try swapping partitions:

--------------------------
-- Swap Out A Partition --
--------------------------
 
/* We need to drop our un-aligned index; otherwise we'll 
   get an error when we attempt to do the switch. */
DROP INDEX IX_orders_nonaligned ON dbo.orders;
 
/* Create the table to hold the data you're swapping out.
   The table structures must match identically; however, 
   DO NOT partition this table. */
CREATE TABLE dbo.orders_stage_swapOut
(
      order_id  INT             Not Null
    , orderDate SMALLDATETIME   Not Null
 
    CONSTRAINT PK_orders_stage_swapOut PRIMARY KEY CLUSTERED
    (
        orderDate
      , order_id
    )
) ON [PRIMARY];
Go
 
/* Create the table to hold the data you're swapping in.
   The table structures must match identically; however,
   DO NOT partition this table. */
CREATE TABLE dbo.orders_stage_swapIn
(
      order_id  INT             Not Null
    , orderDate SMALLDATETIME   Not Null
 
    CONSTRAINT PK_orders_stage_swapIn PRIMARY KEY CLUSTERED
    (
        orderDate
      , order_id
    )
) ON [PRIMARY];
Go
 
/* Populate the table you're swapping in. */
INSERT INTO dbo.orders_stage_swapIn
SELECT -5, '2008-02-02' UNION All
SELECT -4, '2008-02-03' UNION All
SELECT -3, '2008-02-04' UNION All
SELECT -2, '2008-02-05' UNION All
SELECT -1, '2008-02-06';
 
/* Create any indexes on your table to match the 
   indexes on your partitioned table. */
CREATE NONCLUSTERED INDEX IX_orders_stage_swapIn
    ON dbo.orders_stage_swapIn(order_id);
 
/* Add a check constraint for the 
   partition to be swapped in. 
   This step is required. */
ALTER TABLE dbo.orders_stage_swapIn
WITH CHECK
ADD CONSTRAINT orders_stage_swapIn_orderDateCK
	CHECK (orderDate >= '2008-02-01' 
		And orderDate < '2008-03-01');
Go
 
/* Swap out the old partition. */
ALTER TABLE dbo.orders 
    Switch Partition 3 TO dbo.orders_stage_swapOut;
Go
 
/* Swap in the new partition. */
ALTER TABLE dbo.orders_stage_swapIn
    Switch TO dbo.orders Partition 3;
Go
 
---------------------
-- Check your data --
---------------------
 
/* You should have 2 records in here. */
SELECT * FROM dbo.orders_stage_swapOut;
 
/* You should have 5 records here. */
SELECT * 
FROM dbo.orders
WHERE orderDate >= '2008-02-01' 
And orderDate < '2008-03-01';
 
/* There should be no records in this table. */
SELECT * FROM dbo.orders_stage_swapIn;
 
SELECT $partition.test_monthlyDateRange_pf(orderDate) 
    AS 'partition_number'
    , *
FROM dbo.orders;
 
/* Clean-up time!
Drop Table dbo.orders
Drop Table dbo.orders_stage_swapOut
Drop Table dbo.orders_stage_swapIn
Drop Partition Scheme test_monthlyDateRange_ps
Drop Partition Function [test_monthlyDateRange_pf]
*/

Pretty easy, huh? :)

Partitioning 101

November 3, 2008 by Michelle Ufford · 2 Comments
Filed under: Performance & Tuning 

A few posts ago, I briefly discussed partitioning. This article will continue that discussion and focus on horizontal partitioning within SQL Server 2005.

What is partitioning?

As I’ve mentioned before, horizontal partitioning is the physical segregation of a single logical table name into multiple, *identical* physical structures. In SQL 2005, all table and index pages actually exist on a partition. However, if you’ve never created a partitioning scheme, then by default each of your tables and indexes contain just a single partition.

I’ll also be talking about aligned indexes and tables, so let’s go ahead and define that here. Two tables sharing the same partitioning function and the same partitioning key are aligned. Similarly, a partitioned index on a partitioned table sharing the same partitioning key is called an aligned index. Having tables and indexes in alignment can reduce CPU and improve parallel plan execution.

Why partitioning?

The two most common reasons to implement partitioning are performance improvement and ease of archiving. Let’s talk about performance improvement first. We made the decision to implement partitioning in one of our data warehouses because of the size of the tables. We had a couple of tables that exceeded a billion rows, and several tables that were fast approaching that line. Upon deploying partitioning, CPU utilization dropped by ~25%, and our regular CPU and disk spikes caused by reporting processes were all but eliminated. Many of our stored procedures experienced reduced durations and reads in the range of 30-60%, and some had even greater improvements.

Why were we seeing this improvement? Many of the stored procedures were accessing data for recent dates. Knowing this helped us to decide on a weekly partitioning scheme. Stored procedures joining on aligned tables were now only accessing a small portion of the data, resulting in the aforementioned improvements. In addition, ad hocs that performed index scans now completed in a fraction of the time because the amount of data being scanned was significantly reduced.

Table maintenance also saw drastic improvement. It was previously unheard of to attempt an online index defrag on some of the largest tables without first scheduling down-time. Now, these same tables were being defragged nightly in one minute or less, without issue. This is because index defrag operations can be performed by partition.  This is a great resource save; obviously, once a partition is no longer written to and is fully defragmented, it will not become re-fragmented, so there is no need to waste resources on it. Refer to my Index Defrag Script for an example on how to automate this process.

Another great feature of partitioned tables is the ability to swap partitions in and out. I took advantage of this feature with great success when backfilling data. My backfill strategy primarily consisted of bcp’ing data in weekly chunks to a staging database. The table I was backfilling had a text column where one record happened to contain the column deliminator for the bcp file. This of course wasn’t caught until auditing data counts prior to deployment, when it was noticed that one week was short about ~2mm records. Now, this was ~2mm records out of ~800mm total records. The solution? Re-run the bcp process, using a different deliminator, dump the data in a staging table, and swap out the partition. The actual swap out took 2 seconds and there was no impact to the server. It was beautiful.

Swapping out partitions is particularly beneficial for archiving. Instead of writing a batch process to loop through and delete expired data, which is both time consuming and resource intensive, a partition can be swapped out and dropped in a matter of seconds. Keep in mind that, in SQL Server 2005, partitions can only be swapped out on non-replicated tables. SQL 2008 introduces functionality to allow partition swapping on a replicated table.  Check out Replicating Partitioned Tables and Indexes.

That sounds great! Where do I sign up?

There are a few things to consider prior to implementing partitioning in your environment. One is the performance hit to writes. I’ve noticed around a 10% decrease in write performance in my environment. This is partly due to the overhead required to look up which partition the record should be written to. For this reason, I only partition tables that I expect to grow quickly (my rule of thumb is >10mm records per week), or tables that have regular archiving needs.

There’s also a performance hit when querying over many partitions. Basically, think of a giant UNION ALL statement that occurs every time more than one partition is accessed. With our production hardware, we noticed that duration was still less until you hit around 30 partitions. After that, the duration was greater for partitioned tables versus non-partitioned, but reads were still less overall. This will probably vary depending on your environment, so run some tests and make sure you know what kind of impact to expect. This will also help you to determine your partitioning scheme. For example, you wouldn’t want to partition on a date column using a weekly scheme if every report needs to access data for the previous 12 months.

Also make sure you have a good candidate for partitioning. That is, do not partition on a column that is frequently changed or is written non-sequentially. Both can cause overhead to your environment and may negate the benefits of partitioning. Keep in mind that when the value of the partitioning key is changed, the record is not relocated to the correct partition. Some ideal keys include surrogate identity and sequential datetime columns.

Other things to know…

• The partitioning function requires the declaration of a parameter data type. To partition a table, the partitioning key is defined during new table creation. The data types of the partitioning keys must match. For example, if you create a partitioning function using a datetime data type and try to apply it to a table with a smalldatetime column as the partitioning key, the CREATE TABLE statement will fail.

• Ensure that the partitioning key is *always* used in a query on a partitioned table. If the partitioning key is not included in the query search criteria, then the partitioned table will perform worse than a non-partitioned table.

• By default, any index created on a partitioned table is also partitioned. This is useful for creating aligned indexes, which are typically desired. Sometimes, however, you may want to create a non-aligned or non-partitioned index. In these cases, make sure to explicitly declare the desired filegroup during the index creation process.

• Temporary tables can also be partitioned; however, this would probably not make sense in most circumstances.

• Partitioning functions need to be monitored to ensure you don’t run out of defined ranges; if that were to happen, you’d be storing all new data in one really large partition, which wouldn’t be pleasant.

Thanks for wading through this long post. I hope the information contained proves informative and helpful. I’ll be following this post with some actual code to help illustrate some of what I’ve covered here.

Regards,

Michelle

Tips for Large Data Stores

October 20, 2008 by Michelle Ufford · Leave a Comment
Filed under: Performance & Tuning 

I’m currently working on projects that involve the collection of massive amounts of data (i.e. terabyte-class, billion-row tables). One of the challenges of collecting so much data is the ability to report on the data quickly.  To this end, I plan to spend the next several blog posts discussing tips for designing and managing large relational data stores.  These are primarily design practices that have performed well for me within the various environments in which I’ve worked.

Partitioning

• Partitioning is the segregation of a single logical table into separate, identical physical structures. Basically, every table is stored in at least one partition; “partitioned tables” are merely comprised of many partitions.

• Implemented correctly, partitioning can have dramatic improvement on read performance, index maintenance cost reduction, storage of large fact tables on separate disks, and data archiving.

• If you’re not familiar with partitioning, check out Kimberly Tripp’s excellent white paper, Partitioned Tables and Indexes in SQL Server 2005.

• I cannot rave enough about the impact partitioning has had within our environment. To take one of the most extreme examples of improvement, the execution time of one BI report decreased from 2.5+ hours to 20 minutes after we implemented partitioning. While not all stored procedures experienced such phenomenal improvement, improvement in the range of 30% – 60% was very common.

• While partitioning does have many benefits, there are some negatives. Namely, more expensive writes (I’ve noticed around 10%), increased duration of queries that span many partitions (i.e. queries on long time spans, such as a year), and increased maintenance needs.

• My rule of thumb is to partition any table with growth rates > 10mm records per week or with regular archiving needs.

• Much more to come on partitioning in the near future. :)

Indexing

• Any experienced DBA can tell you that effective index management is critical.

• Ensure your stored procedures are using index seeks instead of scans. If scans cannot be avoided, consider using partitioning with aligned indexes to limit the amount of pages scanned.

• Periodically look for un-used indexes and remove them to improve write performance.

• Look for missing indexes, create one or two, and evaluate. Try composite indexes and included columns as a way to limit the amount of indexes you need to create.

• Defrag your indexes! Nightly, if you can get away with it.

Stored Procedure Tuning

• When querying on large amounts of data, try breaking your queries up into individual components and storing in temporary tables, then performing the joins.

• When querying on large date ranges, try looping through the days and inserting the data into a temp table. This can be especially beneficial when querying partitioned tables.

• Create indexes on your temporary tables!

• Utilize index seeks whenever possible.

• If permitted to use dirty reads and your environment’s isolation level is not defaulted to read uncommitted, use With (NoLock) hints.

• Temporary tables can be partitioned! This is usually not necessary but can sometimes be beneficial.

• Try replacing table variables with temporary tables. My rule of thumb is this: if I’m performing joins on the temporary data or if I have more than 100 records, use a temporary table. Table variables can decrease stored procedure recompiles but can sometimes have a negative impact on performance. When in doubt, test each method and evaluate which works better for your needs.

• *Always* include the partitioning key when querying partitioned tables.