Indexing for Partitioned Tables

So you’ve partitioned your table, and now you’re ready to performance tune. As with any table, indexing is a great place to start. And if you’re like most people new to partitioning, you probably created all of your indexes on the partitioned scheme, either by design or unintentionally.

Let’s take a step back here and discuss what a partitioned index is. A partitioned index, like a partitioned table, separates data into different physical structures (partitions) under one logical name. Specifically, each partition in a nonclustered index contains its own B-tree structure with a subset of rows, based upon the partitioning scheme. By default, an unpartitioned nonclustered index has just one partition.

Keep in mind, when you create an index on a partitioned table, i.e.

Create NonClustered Index IX_myIndex
    On dbo.myTable(myColumn);

… you are creating the index on the partitioned scheme by default. In order to create a NON-partitioned index on that same table, you would need to explicitly declare “On [FileGroup]“, i.e.

Create NonClustered Index IX_myIndex
    On dbo.myTable(myColumn)
    On [Primary];

 

But should you partition your index? That depends on how you use it. In fact, most environments will probably want to use a mix of partitioned and non-partitioned indexes. I’ve found that that partitioned indexes perform better when aggregating data or scanning partitions. Conversely, you’ll probably find that, if you need to locate a single, specific record, nothing performs better than a non-partitioned index on that column.

Let’s walk through some examples and see how they perform. I’ll bring back my trusty ol’ orders table for this.

/* Create a partition function. */
Create Partition Function 
    [test_monthlyDateRange_pf] (datetime)
    As Range Right For Values
    ('2009-01-01', '2009-01-08', '2009-01-15'
    , '2009-01-22', '2009-01-29');
Go
 
/* Associate the partition function with a partition scheme. */
Create Partition Scheme test_monthlyDateRange_ps
    As Partition test_monthlyDateRange_pf
    All To ([Primary]);
Go
 
/* Create a partitioned table. */
Create Table dbo.orders
(
      order_id  int Identity(1,1)   Not Null
    , orderDate datetime            Not Null
    , orderData smalldatetime       Not Null
 
    Constraint PK_orders Primary Key Clustered
    (
        order_id
      , orderDate
    )
) On test_monthlyDateRange_ps(orderDate);
Go
 
/* Create some records to play with. */
Set NoCount On;
 
Declare @endDate datetime = '2009-01-01';
 
While @endDate < '2009-02-01'
Begin
 
    Insert Into dbo.orders
    Select @endDate, @endDate;
 
    Set @endDate = DATEADD(minute, 1, @endDate);
 
End;
 
Set NoCount Off;
 
 
/* Let’s create an aligned, partitioned index. */
Create NonClustered Index IX_orders_aligned
    On dbo.orders(order_id)
    On test_monthlyDateRange_ps(orderDate); 
    /* you don't actually need to declare the last
       line of this unless you want to create the
       index on a different partitioning scheme.   */
 
/* Now let’s create an unpartitioned index. */
Create NonClustered Index IX_orders_unpartitioned
    On dbo.orders(order_id)
    On [Primary];

 

Now that we have both a partitioned and an unpartitioned index, let’s take a look at our sys.partitions table:

/* Let's take a look at our index partitions */
Select i.name
    , i.index_id
    , p.partition_number
    , p.rows
From sys.partitions As p
Join sys.indexes As i
    On p.object_id = i.object_id 
   And p.index_id = i.index_id
Where p.object_id = object_id('orders')
Order By i.index_id, p.partition_number;

 

sys.partitions

sys.partitions

As expected, both of our partitioned indexes, PK_orders and IX_orders_aligned, have 6 partitions, with a subset of rows on each partition. Our unpartitioned non-clustered index, IX_orders_unpartitioned, on the other hand has just 1 partition containing all of the rows.

Now that we have our environment set up, let’s run through some different queries and see the performance impact of each type of index.

/* Query 1, specific record look-up, covered */
Select order_id, orderDate
From dbo.orders With (Index(IX_orders_aligned))
Where order_id = 25000;
 
Select order_id, orderDate
From dbo.orders With (Index(IX_orders_unpartitioned))
Where order_id = 25000;

 

Query 1

Query 1

The unpartitioned index performs significantly better when given a specific record to look-up. Now let’s try the same query, but utilizing a scan instead of a seek:

/* Query 2, specific record look-up, uncovered */
Select order_id, orderDate, orderData
From dbo.orders With (Index(IX_orders_aligned))
Where order_id = 30000;
 
Select order_id, orderDate, orderData
From dbo.orders With (Index(IX_orders_unpartitioned))
Where order_id = 30000;

 

Query 2

Query 2

Again we see that the non-partitioned index performs better with the single-record look-up. This can lead to some pretty dramatic performance implications. So when *would* we want to use a partitioned index? Two instances immediately pop to mind. First, partition switching can only be performed when all indexes on a table are aligned. Secondly, partitioned indexes perform better when manipulating large data sets. To see this in action, let’s try some simple aggregation…

/* Query 3, aggregation */
Select Cast(Round(Cast(orderdate As float), 0, 1) As smalldatetime) As 'order_date'
    , Count(*)
From dbo.orders With (Index(IX_orders_aligned))
Where orderDate Between '2009-01-01' And '2009-01-07 23:59'
Group By Cast(Round(Cast(orderdate As float), 0, 1) As smalldatetime) 
Order By Cast(Round(Cast(orderdate As float), 0, 1) As smalldatetime);
 
Select Cast(Round(Cast(orderdate As float), 0, 1) As smalldatetime) As 'order_date'
    , Count(*)
From dbo.orders With (Index(IX_orders_unpartitioned))
Where orderDate Between '2009-01-01' And '2009-01-07 23:59'
Group By Cast(Round(Cast(orderdate As float), 0, 1) As smalldatetime) 
Order By Cast(Round(Cast(orderdate As float), 0, 1) As smalldatetime);

 

Query 3

Query 3

As you can see, partitioned indexes perform better when aggregating data. This is just a simple example, but the results can be even more dramatic in a large production environment. This is one of the reasons why partitioned tables and indexes are especially beneficial in data warehouses.

So now you have a general idea of what a partitioned index is and when to use a partitioned index vs a non-partitioned index. Ultimately, your indexing needs will depend largely on the application and how the data is used. When in doubt, test, test, test! So to recap…

  • Specify “On [FileGroup]“ to create an unpartitioned index on a partitioned table
  • Consider using non-partitioned indexes for single-record look-ups
  • Use partitioned indexes for multiple records and data aggregations
  • To enable partition switching, all indexes on the table must be aligned.

For more information on partitioning, check out my other partitioning articles:

Partitioning Example
Partitioning 101
Tips for Large Data Stores

BCP Basics

In this blog post, I’m going to walk through the basics of BCP (bulk copy program). BCP is a utility that installs with SQL Server and can assist with large data transfers.

Let’s see what parameter options are available to use. From the command line on a machine with SQL Server installed, type “bcp” and press Enter.

BCP Parameters

You can find out more information on BCP parameters on Books Online: http://msdn.microsoft.com/en-us/library/ms162802.aspx

For now, we’re going to examine just the basics. The simplest syntax of a BCP command is:

bcp
databaseName.Schema.TableName *or* “Query”
in, out, *or* queryout
-S ServerName\instanceName
-U userName -P password *or* -T
-c *or* -n *or* specify storage information for each column

Let’s look at these options in a little more detail:

databaseName.Schema.TableName *or* Query
You can specify either an entire table to copy or a query. The query should be surrounded in quotations and must also include the fully qualified table name.

in, out, *or* queryout
in = import, out = full table export, queryout = query to select data for export

-U userName -P password *or* -T
You can either specify a specific account to access SQL Server, or use -T to indicate Trusted Connection (i.e. Windows Authentication)

-c *or* -n *or* specify storage information for each column
-c indicates character data type, -n indicates native data type; if neither one is specified, by default you will be prompted for the data type for each column.

Now let’s put this together and run some BCP commands. All of these examples will use the AdventureWorks 2008 sample database.

First, let’s export an entire table. To do this, we’ll use the “out” parameter.

bcp AdventureWorks.Sales.SalesOrderDetail out
C:\bcp_outputTable.txt -SYourServerName -T -c

 

Export Table with BCP

Export Table with BCP - Results

I don’t normally export an entire table… or at least, not in one process. So let’s walk through what it would look like to export the same table using a query. This will use the “queryout” parameter.

 

bcp "Select SalesOrderID, SalesOrderDetailID, OrderQty, ProductID
From AdventureWorks.Sales.SalesOrderDetail" queryout
C:\bcp_outputQuery.txt -SYourServerName -T -c

 

Export Query with BCP

Export Query with BCP - Results

You’ll notice that the total duration for the query was shorter than for the full-table export. This is because we’re only exporting a few of the columns. This is important to keep in mind when bcp’ing data: you’ll get better performance if you only export the data elements that you actually need.

Now that we’ve exported some data, let’s walk through the process of importing this data. First, let’s create a table with a constraint that will result in some errors.

Create Table dbo.testBCPLoad
(
      SalesOrderID          int      Not Null
    , SalesOrderDetailID    int      Not Null
    , OrderQty              smallint Null
    , ProductID             int      Null
 
    Constraint PK_testBCPLoad
        Primary Key Clustered
        (SalesOrderID)
);

Now execute the BCP import command:

bcp sandbox.dbo.testBCPLoad in
C:\bcp_outputQuery.txt -SYourServername -T -c

 

Load Data with BCP

Load Data with BCP - Error

You should receive a Primary Key error. When you check your results in SQL Server, you should find no results loaded into the table. This is BCP’s default behavior.

Check Destination Table

Let’s change our constraint and try the same BCP command again:

Alter Table dbo.testBCPLoad
    Drop Constraint PK_testBCPLoad;
 
Alter Table dbo.testBCPLoad
    Add Constraint PK_testBCPLoad
    Primary Key Clustered
        (SalesOrderID, SalesOrderDetailID);

 

bcp sandbox.dbo.testBCPLoad in
C:\bcp_outputQuery.txt -SYourServername -T -c

You should now have the data loaded into your SQL Server destination table:

Import Data with BCP - Results

Destination Table

So there you have it, the basics of BCP! :)

A few BCP tips:

  • BCP commands are case-sensitive!
  • If you’re accessing the data across a WAN, perhaps via a VPN connection, try to remote desktop (mstsc) to the actual SQL Server to perform the BCP. If possible, keep the operation on the same local drive or even local network as the server; the less distance data needs to travel across a network, the faster BCP will perform.
  • If you need to copy large amounts of data (i.e. >100mm rows), try breaking the data into smaller chunks. This will help if you have an error during BCP (i.e. a PK error can rollback the entire import operation by default, although there are options that can change this behavior). When working with partitioned tables, I find it very efficient to segregate the data imported/exported by partition.
  • If you’re BCP’ing data into a new table, you can minimize impact on the server by waiting to create your indexes after all the data is loaded.
  • I like to construct my queries in SSMS, then copy them to BCP. Since the command-line utility does not support copy and pasting, I create a text file with my BCP command in NotePad, then save the command as a .cmd. To execute, just call the .cmd file.

Bulk Inserts with XML

Last week, I blogged about how to perform one-to-many inserts with table-valued parameters, a feature new in 2008. For those who do not yet have 2008 or will not have it in the near future, it may still be beneficial to use XML for bulk inserts.

Here’s a pretty simple example of how to accomplish this:

/* Create some tables to work with */
CREATE TABLE dbo.orders
(
      order_id      INT IDENTITY(1,1)   Not Null
    , orderDate     DATETIME            Not Null
    , customer_id   INT                 Not Null
 
    CONSTRAINT PK_orders
        PRIMARY KEY CLUSTERED(order_id)
);
 
CREATE TABLE dbo.orderDetails
(
      orderDetail_id    INT IDENTITY(1,1)   Not Null
    , order_id          INT                 Not Null
    , lineItem          INT                 Not Null
    , product_id        INT                 Not Null
 
    CONSTRAINT PK_orderDetails
        PRIMARY KEY CLUSTERED(orderDetail_id)
 
    CONSTRAINT FK_orderDetails_orderID
        FOREIGN KEY(order_id)
        REFERENCES dbo.orders(order_id)
);
Go
 
/* Create a new procedure using an XML parameter */
CREATE PROCEDURE dbo.insert_orderXML_sp
      @orderDate        DATETIME
    , @customer_id      INT
    , @orderDetailsXML  XML
AS
BEGIN
 
    SET NOCOUNT ON;
 
    DECLARE @myOrderID INT;
 
    INSERT INTO dbo.orders
    (
          orderDate
        , customer_id    
    )
    VALUES
    (
          @orderDate
        , @customer_id
    );
 
    SET @myOrderID = SCOPE_IDENTITY();
 
    INSERT INTO dbo.orderDetails
    (
          order_id
        , lineItem
        , product_id
    )
    SELECT @myOrderID
         , myXML.value('./@lineItem', 'int')
         , myXML.value('./@product_id', 'int')
    FROM @orderDetailsXML.nodes('/orderDetail') As nodes(myXML);
 
    SET NOCOUNT OFF;
 
END
GO
 
 
/* Call our stored procedure */
EXECUTE dbo.insert_orderXML_sp
      @orderDate = '2008-01-01'
    , @customer_id = 101
    , @orderDetailsXML = 
        '<orderDetail lineItem="1" product_id="123" />
         <orderDetail lineItem="2" product_id="456" />
         <orderDetail lineItem="3" product_id="789" />
         <orderDetail lineItem="4" product_id="246" />
         <orderDetail lineItem="5" product_id="135" />';
 
 
/* Check our data */
SELECT * FROM dbo.orders;
SELECT * FROM dbo.orderDetails;
 
 
/* Clean up our mess */
DROP PROCEDURE insert_orderXML_sp;
DROP TABLE dbo.orderDetails;
DROP TABLE dbo.orders;

I’ve found that this is more efficient when performing large parent/child inserts, i.e. 1 parent record to 100 child records. Keep in mind that there’s a point where doing an insert with XML is more expensive than using a traditional INSERT stored procedure. I haven’t run any tests yet to help define what that threshold is… more to come on this in the near future.

More on the Nodes() Method can be found here in Books Online: http://msdn.microsoft.com/en-us/library/ms188282(SQL.90).aspx

Update: I’ve just learned that the “value” keyword is case-sensitive. Apparently my code box plug-in was defaulting “value” to “VALUE.” :)

Here’s the error message you’ll get if you don’t have “value” in lower-case:
Cannot find either column “myXML” or the user-defined function or aggregate “myXML.VALUE”, or the name is ambiguous.

Updated Index Defrag Script (2005, 2008)

Thanks to everyone who left a comment or sent me an e-mail regarding the Index Defrag Script. I’ve received some great feedback and requests for features. I’ve also had some questions regarding how to use it, which I will answer at the end of this post.

Changes include:
– separate version for both Enterprise and Standard editions
    - Standard edition removes partitioning and online options
– output option to see fragmentation levels
– page_count added to the log table

I’ve also verified that this script works well in SQL 2008.

Enterprise Version:

IF EXISTS(SELECT OBJECT_ID FROM sys.tables
                WHERE [name] = N'dba_indexDefragLog')
BEGIN
 
    DROP TABLE dbo.dba_indexDefragLog;
 
    PRINT 'dba_indexDefragLog table dropped!';
 
END
 
CREATE TABLE dbo.dba_indexDefragLog
(
      indexDefrag_id    INT IDENTITY(1,1)   NOT NULL
    , objectID          INT                 NOT NULL
    , objectName        NVARCHAR(130)       NOT NULL
    , indexID           INT                 NOT NULL
    , indexName         NVARCHAR(130)       NOT NULL
    , partitionNumber   smallint            not null
    , fragmentation     FLOAT               NOT NULL
    , page_count        INT                 NOT NULL
    , dateTimeStart     DATETIME            NOT NULL
    , durationSeconds   INT                 NOT NULL
    CONSTRAINT PK_indexDefragLog
        PRIMARY KEY CLUSTERED (indexDefrag_id)
);
 
PRINT 'dba_indexDefragLog Table Created';
 
IF OBJECTPROPERTY(OBJECT_ID('dbo.dba_indexDefrag_sp'),
        N'IsProcedure') IS Null
BEGIN
    EXECUTE ('Create Procedure dbo.dba_indexDefrag_sp
        As Print ''Hello World!''');
    RAISERROR('Procedure dba_indexDefrag_sp created.'
        , 10, 1);
END;
Go
 
SET ANSI_Nulls ON;
SET Ansi_Padding ON;
SET Ansi_Warnings ON;
SET ArithAbort ON;
SET Concat_Null_Yields_Null ON;
SET NOCOUNT ON;
SET Numeric_RoundAbort OFF;
SET Quoted_Identifier ON;
Go
 
ALTER PROCEDURE dbo.dba_indexDefrag_sp
 
    /* Declare Parameters */
      @minFragmentation     FLOAT           = 10.0
        /* in percent, will not defrag if fragmentation
           less than specified */
    , @rebuildThreshold     FLOAT           = 30.0
        /* in percent, greater than @rebuildThreshold
           will result in rebuild instead of reorg */
    , @onlineRebuild        bit             = 1     
        /* 1 = online rebuild; 0 = offline rebuild */
    , @executeSQL           BIT             = 1
        /* 1 = execute; 0 = print command only */
    , @tableName            VARCHAR(4000)   = Null 
        /* Option to specify a table name */
    , @printCommands        BIT             = 0
        /* 1 = print commands; 0 = do not print commands */
    , @printFragmentation   BIT             = 0
        /* 1 = print fragmentation prior to defrag; 
           0 = do not print */
    , @defragDelay          CHAR(8)         = '00:00:05'
        /* time to wait between defrag commands */
AS
/********************************************************************
    Name:       dba_indexDefrag_sp
 
    Author:     Michelle F. Ufford
 
    Purpose:    Defrags all indexes for the current database
 
    Notes:      This script was designed for SQL Server 2005
                Enterprise Edition.
 
    CAUTION: Monitor transaction log if executing for the first time!
 
      @minFragmentation     defaulted to 10%, will not defrag if
                            fragmentation if less than specified.
 
      @rebuildThreshold     defaulted to 30% as recommended by
                            Microsoft in BOL;
                            > than 30% will result in rebuild instead
 
      @onlineRebuild        1 = online rebuild; 
                            0 = offline rebuild
 
      @executeSQL           1 = execute the SQL generated by this proc;
                            0 = print command only
 
      @tableName            Specify if you only want to defrag indexes
                            for a specific table
 
      @printCommands        1 = print commands to screen;
                            0 = do not print commands
 
      @printFragmentation   1 = print fragmentation to screen;
                            0 = do not print fragmentation
 
      @defragDelay          time to wait between defrag commands;
                            gives the server some time to catch up
 
    Called by:  SQL Agent Job or DBA
 
    Date        Initials  Description
    ----------------------------------------------------------------
    2008-10-27  MFU       Initial Release
    2008-11-17  MFU       Added page_count to log table
                          , added @printFragmentation option
********************************************************************
    Exec dbo.dba_indexDefrag_sp
          @executeSQL         = 1
        , @printCommands      = 1
        , @minFragmentation   = 0
        , @printFragmentation = 1;
********************************************************************/
 
SET NOCOUNT ON;
SET XACT_Abort ON;
 
BEGIN
 
    /* Declare our variables */
    DECLARE   @objectID         INT
            , @indexID          INT
            , @partitionCount   BIGINT
            , @schemaName       NVARCHAR(130)
            , @objectName       NVARCHAR(130)
            , @indexName        NVARCHAR(130)
            , @partitionNumber  SMALLINT
            , @partitions       SMALLINT
            , @fragmentation    FLOAT
            , @pageCount        INT
            , @sqlCommand       NVARCHAR(4000)
            , @rebuildCommand   NVARCHAR(200)
            , @dateTimeStart    DATETIME
            , @dateTimeEnd      DATETIME
            , @containsLOB      BIT;
 
    /* Just a little validation... */
    IF @minFragmentation Not Between 0.00 And 100.0
        SET @minFragmentation = 10.0;
 
    IF @rebuildThreshold Not Between 0.00 And 100.0
        SET @rebuildThreshold = 30.0;
 
    IF @defragDelay Not Like '00:[0-5][0-9]:[0-5][0-9]'
        SET @defragDelay = '00:00:05';
 
    /* Determine which indexes to defrag using our
       user-defined parameters */
    SELECT
          OBJECT_ID AS objectID
        , index_id AS indexID
        , partition_number AS partitionNumber
        , avg_fragmentation_in_percent AS fragmentation
        , page_count 
        , 0 AS 'defragStatus'
            /* 0 = unprocessed, 1 = processed */
    INTO #indexDefragList
    FROM sys.dm_db_index_physical_stats
        (DB_ID(), OBJECT_ID(@tableName), NULL , NULL, N'Limited')
    WHERE avg_fragmentation_in_percent > @minFragmentation
        And index_id > 0
    OPTION (MaxDop 1);
 
    /* Create a clustered index to boost performance a little */
    CREATE CLUSTERED INDEX CIX_temp_indexDefragList
        ON #indexDefragList(objectID, indexID, partitionNumber);
 
    /* Begin our loop for defragging */
    WHILE (SELECT COUNT(*) FROM #indexDefragList
            WHERE defragStatus = 0) > 0
    BEGIN
 
        /* Grab the most fragmented index first to defrag */
        SELECT TOP 1
              @objectID         = objectID
            , @fragmentation    = fragmentation
            , @indexID          = indexID
            , @partitionNumber  = partitionNumber
            , @pageCount        = page_count
        FROM #indexDefragList
        WHERE defragStatus = 0
        ORDER BY fragmentation DESC;
 
        /* Look up index information */
        SELECT @objectName = QUOTENAME(o.name)
             , @schemaName = QUOTENAME(s.name)
        FROM sys.objects AS o
        Inner Join sys.schemas AS s
            ON s.schema_id = o.schema_id
        WHERE o.OBJECT_ID = @objectID;
 
        SELECT @indexName = QUOTENAME(name)
        FROM sys.indexes
        WHERE OBJECT_ID = @objectID
            And index_id = @indexID
            And type > 0;
 
        /* Determine if the index is partitioned */
        SELECT @partitionCount = COUNT(*)
        FROM sys.partitions
        WHERE OBJECT_ID = @objectID
            And index_id = @indexID;
 
        /* Look for LOBs */
        SELECT TOP 1
            @containsLOB = column_id
        FROM sys.columns WITH (NOLOCK)
        WHERE 
            [OBJECT_ID] = @objectID
            And (system_type_id In (34, 35, 99)
            -- 34 = image, 35 = text, 99 = ntext
                    Or max_length = -1);
            -- varbinary(max), varchar(max), nvarchar(max), xml
 
        /* See if we should rebuild or reorganize; handle thusly */
        IF @fragmentation < @rebuildThreshold And @partitionCount <= 1
            SET @sqlCommand = N'Alter Index ' + @indexName + N' On '
                + @schemaName + N'.' + @objectName + N' ReOrganize';
 
        IF @fragmentation >= @rebuildThreshold
            And IsNull(@containsLOB, 0) = 0
                -- Cannot rebuild if the table has one or more LOB
            And @partitionCount <= 1
        BEGIN
 
            /* We should always rebuild online if possible
                (SQL 2005 Enterprise) */
            IF @onlineRebuild = 0
                SET @rebuildCommand = N' Rebuild With
                    (Online = Off, MaxDop = 1)';
            ELSE
                SET @rebuildCommand = N' Rebuild With
                    (Online = On, MaxDop = 1)';
 
            SET @sqlCommand = N'Alter Index ' + @indexName + N' On '
                + @schemaName + N'.' + @objectName + @rebuildCommand;
        END;
 
        /* If our index is partitioned, we should always reorganize */
        IF @partitionCount > 1
            SET @sqlCommand = N'Alter Index ' + @indexName + N' On '
                + @schemaName + N'.' + @objectName + N' ReOrganize'
                + N' Partition = '
                + CAST(@partitionNumber AS NVARCHAR(10));
                -- no MaxDop needed, single threaded operation
 
        /* Are we executing the SQL?  If so, do it */
        IF @executeSQL = 1
        BEGIN
 
            /* Grab the time for logging purposes */
            SET @dateTimeStart  = GETDATE();
            EXECUTE (@sqlCommand);
            SET @dateTimeEnd  = GETDATE();
 
            /* Log our actions */
            INSERT INTO dbo.dba_indexDefragLog
            (
                  objectID
                , objectName
                , indexID
                , indexName
                , partitionNumber
                , fragmentation
                , page_count
                , dateTimeStart
                , durationSeconds
            )
            SELECT
                  @objectID
                , @objectName
                , @indexID
                , @indexName
                , @partitionNumber
                , @fragmentation
                , @pageCount
                , @dateTimeStart
                , DATEDIFF(SECOND, @dateTimeStart, @dateTimeEnd);
 
            /* Just a little breather for the server */
            WAITFOR Delay @defragDelay;
 
            /* Print if specified to do so */
            IF @printCommands = 1
                PRINT N'Executed: ' + @sqlCommand;
        END
        ELSE
        /* Looks like we're not executing, just print
            the commands */
        BEGIN
            IF @printCommands = 1
                PRINT @sqlCommand;
        END
 
        /* Update our index defrag list when we've
            finished with that index */
        UPDATE #indexDefragList
        SET defragStatus = 1
        WHERE objectID         = @objectID
          And indexID          = @indexID
          And partitionNumber  = @partitionNumber;
 
    END
 
    /* Do we want to output our fragmentation results? */
    If @printFragmentation = 1
        Select idl.objectID
            , o.name As 'tableName'
            , idl.indexID
            , i.name As 'indexName'
            , idl.fragmentation
            , idl.page_count
        From #indexDefragList As idl
        Join sys.objects AS o
            On idl.objectID = o.object_id
        Join sys.indexes As i
            On idl.objectID = i.object_id
            And idl.indexID = i.index_id;
 
    /* When everything is done, make sure to get rid of
        our temp table */
    DROP TABLE #indexDefragList;
 
    SET NOCOUNT OFF;
    RETURN 0
END
Go

 

Standard Version:

IF EXISTS(SELECT OBJECT_ID FROM sys.tables
                WHERE [name] = N'dba_indexDefragLog')
BEGIN
 
    DROP TABLE dbo.dba_indexDefragLog;
 
    PRINT 'dba_indexDefragLog table dropped!';
 
END
 
CREATE TABLE dbo.dba_indexDefragLog
(
      indexDefrag_id    INT IDENTITY(1,1)   NOT NULL
    , objectID          INT                 NOT NULL
    , objectName        NVARCHAR(130)       NOT NULL
    , indexID           INT                 NOT NULL
    , indexName         NVARCHAR(130)       NOT NULL
    , fragmentation     FLOAT               NOT NULL
    , page_count        INT                 NOT NULL
    , dateTimeStart     DATETIME            NOT NULL
    , durationSeconds   INT                 NOT NULL
    CONSTRAINT PK_indexDefragLog
        PRIMARY KEY CLUSTERED (indexDefrag_id)
);
 
PRINT 'dba_indexDefragLog Table Created';
 
IF OBJECTPROPERTY(OBJECT_ID('dbo.dba_indexDefragStandard_sp'),
        N'IsProcedure') IS Null
BEGIN
    EXECUTE ('Create Procedure dbo.dba_indexDefragStandard_sp
        As Print ''Hello World!''');
    RAISERROR('Procedure dba_indexDefragStandard_sp created.'
        , 10, 1);
END;
Go
 
SET ANSI_Nulls ON;
SET Ansi_Padding ON;
SET Ansi_Warnings ON;
SET ArithAbort ON;
SET Concat_Null_Yields_Null ON;
SET NOCOUNT ON;
SET Numeric_RoundAbort OFF;
SET Quoted_Identifier ON;
Go
 
ALTER PROCEDURE dbo.dba_indexDefragStandard_sp
 
    /* Declare Parameters */
      @minFragmentation     FLOAT           = 10.0
        /* in percent, will not defrag if fragmentation
           less than specified */
    , @rebuildThreshold     FLOAT           = 30.0
        /* in percent, greater than @rebuildThreshold
           will result in rebuild instead of reorg */
    , @executeSQL           BIT             = 1
        /* 1 = execute; 0 = print command only */
    , @tableName            VARCHAR(4000)   = Null 
        /* Option to specify a table name */
    , @printCommands        BIT             = 0
        /* 1 = print commands; 0 = do not print commands */
    , @printFragmentation   BIT             = 0
        /* 1 = print fragmentation prior to defrag; 
           0 = do not print */
    , @defragDelay          CHAR(8)         = '00:00:05'
        /* time to wait between defrag commands */
AS
/********************************************************************
    Name:       dba_indexDefragStandard_sp
 
    Author:     Michelle F. Ufford
 
    Purpose:    Defrags all indexes for the current database
 
    Notes:      This script was designed for SQL Server 2005
                Standard edition.
 
    CAUTION: Monitor transaction log if executing for the first time!
 
      @minFragmentation     defaulted to 10%, will not defrag if
                            fragmentation if less than specified.
 
      @rebuildThreshold     defaulted to 30% as recommended by
                            Microsoft in BOL;
                            > than 30% will result in rebuild instead
 
      @executeSQL           1 = execute the SQL generated by this proc;
                            0 = print command only
 
      @tableName            Specify if you only want to defrag indexes
                            for a specific table
 
      @printCommands        1 = print commands to screen;
                            0 = do not print commands
 
      @printFragmentation   1 = print fragmentation to screen;
                            0 = do not print fragmentation
 
      @defragDelay          time to wait between defrag commands;
                            gives the server some time to catch up
 
    Called by:  SQL Agent Job or DBA
 
    Date        Initials  Description
    ----------------------------------------------------------------
    2008-10-27  MFU       Initial Release
    2008-11-17  MFU       Added page_count to log table
                          , added @printFragmentation option
********************************************************************
    Exec dbo.dba_indexDefragStandard_sp
          @executeSQL         = 1
        , @printCommands      = 1
        , @minFragmentation   = 0
        , @printFragmentation = 1;
********************************************************************/
 
SET NOCOUNT ON;
SET XACT_Abort ON;
 
BEGIN
 
    /* Declare our variables */
    DECLARE   @objectID         INT
            , @indexID          INT
            , @schemaName       NVARCHAR(130)
            , @objectName       NVARCHAR(130)
            , @indexName        NVARCHAR(130)
            , @fragmentation    FLOAT
            , @pageCount        INT
            , @sqlCommand       NVARCHAR(4000)
            , @rebuildCommand   NVARCHAR(200)
            , @dateTimeStart    DATETIME
            , @dateTimeEnd      DATETIME
            , @containsLOB      BIT;
 
    /* Just a little validation... */
    IF @minFragmentation Not Between 0.00 And 100.0
        SET @minFragmentation = 10.0;
 
    IF @rebuildThreshold Not Between 0.00 And 100.0
        SET @rebuildThreshold = 30.0;
 
    IF @defragDelay Not Like '00:[0-5][0-9]:[0-5][0-9]'
        SET @defragDelay = '00:00:05';
 
    /* Determine which indexes to defrag using our
       user-defined parameters */
    SELECT
          OBJECT_ID AS objectID
        , index_id AS indexID
        , avg_fragmentation_in_percent AS fragmentation
        , page_count 
        , 0 AS 'defragStatus'
            /* 0 = unprocessed, 1 = processed */
    INTO #indexDefragList
    FROM sys.dm_db_index_physical_stats
        (DB_ID(), OBJECT_ID(@tableName), NULL , NULL, N'Limited')
    WHERE avg_fragmentation_in_percent > @minFragmentation
        And index_id > 0
    OPTION (MaxDop 1);
 
    /* Create a clustered index to boost performance a little */
    CREATE CLUSTERED INDEX CIX_temp_indexDefragList
        ON #indexDefragList(objectID, indexID);
 
    /* Begin our loop for defragging */
    WHILE (SELECT COUNT(*) FROM #indexDefragList
            WHERE defragStatus = 0) > 0
    BEGIN
 
        /* Grab the most fragmented index first to defrag */
        SELECT TOP 1
              @objectID         = objectID
            , @fragmentation    = fragmentation
            , @indexID          = indexID
            , @pageCount        = page_count
        FROM #indexDefragList
        WHERE defragStatus = 0
        ORDER BY fragmentation DESC;
 
        /* Look up index information */
        SELECT @objectName = QUOTENAME(o.name)
             , @schemaName = QUOTENAME(s.name)
        FROM sys.objects AS o
        Inner Join sys.schemas AS s
            ON s.schema_id = o.schema_id
        WHERE o.OBJECT_ID = @objectID;
 
        SELECT @indexName = QUOTENAME(name)
        FROM sys.indexes
        WHERE OBJECT_ID = @objectID
            And index_id = @indexID
            And type > 0;
 
        /* Look for LOBs */
        SELECT TOP 1
            @containsLOB = column_id
        FROM sys.columns WITH (NOLOCK)
        WHERE 
            [OBJECT_ID] = @objectID
            And (system_type_id In (34, 35, 99)
            -- 34 = image, 35 = text, 99 = ntext
                    Or max_length = -1);
            -- varbinary(max), varchar(max), nvarchar(max), xml
 
        /* See if we should rebuild or reorganize; handle thusly */
        IF @fragmentation < @rebuildThreshold 
            Or IsNull(@containsLOB, 0) > 0 
            -- Cannot rebuild if the table has one or more LOB
            SET @sqlCommand = N'Alter Index ' + @indexName + N' On '
                + @schemaName + N'.' + @objectName + N' ReOrganize;'
        ELSE
            SET @sqlCommand = N'Alter Index ' + @indexName + N' On '
                + @schemaName + N'.' + @objectName +  ' Rebuild '
                + 'With (MaxDop = 1)'; -- minimize impact on server
 
        /* Are we executing the SQL?  If so, do it */
        IF @executeSQL = 1
        BEGIN
 
            /* Grab the time for logging purposes */
            SET @dateTimeStart  = GETDATE();
            EXECUTE (@sqlCommand);
            SET @dateTimeEnd  = GETDATE();
 
            /* Log our actions */
            INSERT INTO dbo.dba_indexDefragLog
            (
                  objectID
                , objectName
                , indexID
                , indexName
                , fragmentation
                , page_count
                , dateTimeStart
                , durationSeconds
            )
            SELECT
                  @objectID
                , @objectName
                , @indexID
                , @indexName
                , @fragmentation
                , @pageCount
                , @dateTimeStart
                , DATEDIFF(SECOND, @dateTimeStart, @dateTimeEnd);
 
            /* Just a little breather for the server */
            WAITFOR Delay @defragDelay;
 
            /* Print if specified to do so */
            IF @printCommands = 1
                PRINT N'Executed: ' + @sqlCommand;
        END
        ELSE
        /* Looks like we're not executing, just print
            the commands */
        BEGIN
            IF @printCommands = 1
                PRINT @sqlCommand;
        END
 
        /* Update our index defrag list when we've
            finished with that index */
        UPDATE #indexDefragList
        SET defragStatus = 1
        WHERE objectID  = @objectID
          And indexID   = @indexID;
 
    END
 
    /* Do we want to output our fragmentation results? */
    IF @printFragmentation = 1
        SELECT idl.objectID
            , o.name As 'tableName'
            , idl.indexID
            , i.name As 'indexName'
            , idl.fragmentation
            , idl.page_count
        FROM #indexDefragList AS idl
        JOIN sys.objects AS o
            ON idl.objectID = o.object_id
        JOIN sys.indexes As i
            ON idl.objectID = i.object_id
            AND idl.indexID = i.index_id;
 
    /* When everything is done, make sure to get rid of
        our temp table */
    DROP TABLE #indexDefragList;
 
    SET NOCOUNT OFF;
    RETURN 0
END
Go

 

For those who are having troubles with this script…

1) “Not all of my indexes were defragged!” or “Nothing happened when I executed this script.”

This script will only defrag those indexes that surpass the specified threshold. If you’re not seeing your index in the output, try executing this:

    Exec dbo.dba_indexDefrag_sp
          @executeSQL    = 0
        , @printCommands = 1
        , @minFragmentation = 0
        , @printFragmentation = 1;

Check to see what your index’s fragmentation level is. Maybe it’s not as fragmented as you feared. :)

2) “My indexes are still fragmented after running this script.”

To quote The Powers That Be (aka Microsoft)…

“In general, fragmentation on small indexes is often not controllable. The pages of small indexes are stored on mixed extents. Mixed extents are shared by up to eight objects, so the fragmentation in a small index might not be reduced after reorganizing or rebuilding the index.” — Reorganizing and Rebuilding Indexes

3) “Can I use this in my production environment?”

That really depends on your environment. I’ve successfully used this in some very large production environments. However, I wouldn’t exactly recommend executing the script in the middle of a business day on a billion+ row, heavily fragmented, unpartitioned table, either. :)

If you’re not sure what the impact will be, execute the commands-only version of the script…

    Exec dbo.dba_indexDefrag_sp
	      @executeSQL    = 0
        , @printCommands = 1
        , @printFragmentation = 1;

… then execute the statements one at a time. Make sure you monitor tempdb and the transaction log to ensure you don’t have any space issues.

If you have any additional questions or suggestions for this script, leave me a comment and I’ll be happy to help. :)

Performance Comparison of Singleton, XML, and TVP Inserts

As promised, today I took a look at the performance of bulk inserts using XML and Table-Valued Parameters. I also compared it against singleton inserts to show the value in the bulk-insert approach.

My tests were pretty simple: insert 100 records using each method. Each test was executed 10 times to ensure consistency. The duration was recorded in microseconds.

The goal was to compare the performance of the inserts. Because I was executing this entire test within SQL Server, I had to isolate only the actual insert transactions and ignore everything else, such as the loading of the data; that work would normally be performed by the calling application.

So without further ado… screenshots of the Profiler traces: (click to enlarge)

TVP

Single Insert Method

TVP

XML Method

TVP

Table-Valued Parameter Method

Summary

Method Avg CPU Avg Reads Avg Writes Avg Duration (micro)
Singleton Method 3 202 0 13378
XML Method 0 222 0 3124
TVP Method 1 207 0 780

 

As expected, both the XML and the TVP method performed significantly better than the single-insert method. As hoped, the table-valued parameter arguably performed the best of all 3.

Row Concatenation in T-SQL

Have you ever needed to create a comma-delimited list of related records, grouped by a parent record? This is one of those business requirements that just happens to pop up every so often. For example, turning this:

AuthorID    BookName
1 “Great Expectations”
1 “David Copperfield”
1 “Bleak House”
2 “Divine Comedy”
3 “Pride and Prejudice”
3 “Mansfield Park”
   

into this:

AuthorID    ListOfBooks
1 “Great Expectations”, “David Copperfield”, “Bleak House”
2 “Divine Comedy”
3 “Pride and Prejudice”, “Mansfield Park”
   

There’s a handful of ways you can handle this, especially when dealing with small data sets. However, the problem becomes a bit more tricky when dealing with large record sets (i.e. hundreds of thousands or even millions of records). My first attempt at a solution used a CTE (common table expression). It did the job but was very slow. Looking for a better solution, I discovered the XML method.

To give you a quick example:

/* Create a table variable to play with */
Declare @myTable Table 
    (customerID int, textData varchar(10));
 
/* Populate some test rescords */
Insert Into @myTable
Select 1, 'abc' Union All
Select 1, 'def' Union All
Select 2, 'uvw' Union All
Select 2, 'xyz'
 
/* Just take a look at the data
   before we format it */
Select * From @myTable;
 
/* Let's take a look at what
   For XML Raw will return 
   for us */
Select textData
From @myTable
Where customerID = 1
Order By textData
For XML Raw;
 
/* Now consolidate the data, using
   the For XML Raw option to 
   concatenate the textData column */
Select customerID
        , Replace( Replace( Replace(
            (   Select textData
                From @myTable As a
                Where a.customerID = b.customerID
                Order By textData
                For XML Raw)
                , '"/><row textData="', ', ')
                , '<row textData="', '')
                , '"/>', '')
            As 'textData'
From @myTable b
Group By customerID;

This has become my default method for handling this report requirement. While discussing this with a colleague, he mentioned using an approach that was similar in design but used Cross Apply on the XML. Wanting to see which performed better, I ran the two following queries in the AdventureWorks sample database (2008):

/* Method 1 */
Select ProductsOrdered
    , Count(*) As 'salesOrders'
From (
    Select SalesOrderID
            , Replace( Replace( Replace(
                (   Select Top 10 ProductID
                    From Sales.SalesOrderDetail As sod With (NoLock)
                    Where soh.SalesOrderID = sod.SalesOrderID
                    Order By ProductID
                    For XML Raw)
                    , '"/><row ProductID="', ', ')
                    , '<row ProductID="', '')
                    , '"/>', '')
                As 'ProductsOrdered'
    From Sales.SalesOrderHeader As soh With (NoLock)
) x
Group By ProductsOrdered
Order By Count(*) Desc
Option (MaxDop 1);
 
/* Method 2 */
Select ProductsOrdered
    , Count(*) As 'salesOrders'
From (
    Select SalesOrderID
        , SubString(ProductsOrdered, 1, Len(ProductsOrdered) - 1) 
              As 'ProductsOrdered'
    From Sales.SalesOrderHeader As soh With (NoLock)
    Cross Apply (
                    Select Top 10 
                        Cast(ProductID As varchar(10)) + ', '
                    From Sales.SalesOrderDetail As sod With (NoLock)
                    Where sod.SalesOrderID = soh.SalesOrderID
                    Order By ProductID
                    For XML Path('')
                ) X(ProductsOrdered)
) x
Group By ProductsOrdered
Order By Count(*) Desc;

Here’s the results: (click to enlarge)

Row Concatenation Comparison

Row Concatenation Comparison

As evidenced in the image above, the first method has slightly higher CPU and double the duration. The 2nd method had almost double the writes and significantly more reads.

I was hoping for a clear winner, but apparently each method has its benefits. I’ll probably continue to stick with my original, more resource-friendly method (Method 1), unless someone suggests a better solution. :)

Partitioning 101

A few posts ago, I briefly discussed partitioning. This article will continue that discussion and focus on horizontal partitioning within SQL Server 2005.

What is partitioning?

As I’ve mentioned before, horizontal partitioning is the physical segregation of a single logical table name into multiple, *identical* physical structures. In SQL 2005, all table and index pages actually exist on a partition. However, if you’ve never created a partitioning scheme, then by default each of your tables and indexes contain just a single partition.

I’ll also be talking about aligned indexes and tables, so let’s go ahead and define that here. Two tables sharing the same partitioning function and the same partitioning key are aligned. Similarly, a partitioned index on a partitioned table sharing the same partitioning key is called an aligned index. Having tables and indexes in alignment can reduce CPU and improve parallel plan execution.

Why partitioning?

The two most common reasons to implement partitioning are performance improvement and ease of archiving. Let’s talk about performance improvement first. We made the decision to implement partitioning in one of our data warehouses because of the size of the tables. We had a couple of tables that exceeded a billion rows, and several tables that were fast approaching that line. Upon deploying partitioning, CPU utilization dropped by ~25%, and our regular CPU and disk spikes caused by reporting processes were all but eliminated. Many of our stored procedures experienced reduced durations and reads in the range of 30-60%, and some had even greater improvements.

Why were we seeing this improvement? Many of the stored procedures were accessing data for recent dates. Knowing this helped us to decide on a weekly partitioning scheme. Stored procedures joining on aligned tables were now only accessing a small portion of the data, resulting in the aforementioned improvements. In addition, ad hocs that performed index scans now completed in a fraction of the time because the amount of data being scanned was significantly reduced.

Table maintenance also saw drastic improvement. It was previously unheard of to attempt an online index defrag on some of the largest tables without first scheduling down-time. Now, these same tables were being defragged nightly in one minute or less, without issue. This is because index defrag operations can be performed by partition.  This is a great resource save; obviously, once a partition is no longer written to and is fully defragmented, it will not become re-fragmented, so there is no need to waste resources on it. Refer to my Index Defrag Script for an example on how to automate this process.

Another great feature of partitioned tables is the ability to swap partitions in and out. I took advantage of this feature with great success when backfilling data. My backfill strategy primarily consisted of bcp’ing data in weekly chunks to a staging database. The table I was backfilling had a text column where one record happened to contain the column deliminator for the bcp file. This of course wasn’t caught until auditing data counts prior to deployment, when it was noticed that one week was short about ~2mm records. Now, this was ~2mm records out of ~800mm total records. The solution? Re-run the bcp process, using a different deliminator, dump the data in a staging table, and swap out the partition. The actual swap out took 2 seconds and there was no impact to the server. It was beautiful.

Swapping out partitions is particularly beneficial for archiving. Instead of writing a batch process to loop through and delete expired data, which is both time consuming and resource intensive, a partition can be swapped out and dropped in a matter of seconds. Keep in mind that, in SQL Server 2005, partitions can only be swapped out on non-replicated tables. SQL 2008 introduces functionality to allow partition swapping on a replicated table.  Check out Replicating Partitioned Tables and Indexes.

That sounds great! Where do I sign up?

There are a few things to consider prior to implementing partitioning in your environment. One is the performance hit to writes. I’ve noticed around a 10% decrease in write performance in my environment. This is partly due to the overhead required to look up which partition the record should be written to. For this reason, I only partition tables that I expect to grow quickly (my rule of thumb is >10mm records per week), or tables that have regular archiving needs.

There’s also a performance hit when querying over many partitions. Basically, think of a giant UNION ALL statement that occurs every time more than one partition is accessed. With our production hardware, we noticed that duration was still less until you hit around 30 partitions. After that, the duration was greater for partitioned tables versus non-partitioned, but reads were still less overall. This will probably vary depending on your environment, so run some tests and make sure you know what kind of impact to expect. This will also help you to determine your partitioning scheme. For example, you wouldn’t want to partition on a date column using a weekly scheme if every report needs to access data for the previous 12 months.

Also make sure you have a good candidate for partitioning. That is, do not partition on a column that is frequently changed or is written non-sequentially. Both can cause overhead to your environment and may negate the benefits of partitioning. Keep in mind that when the value of the partitioning key is changed, the record is not relocated to the correct partition. Some ideal keys include surrogate identity and sequential datetime columns.

Other things to know…

• The partitioning function requires the declaration of a parameter data type. To partition a table, the partitioning key is defined during new table creation. The data types of the partitioning keys must match. For example, if you create a partitioning function using a datetime data type and try to apply it to a table with a smalldatetime column as the partitioning key, the CREATE TABLE statement will fail.

• Ensure that the partitioning key is *always* used in a query on a partitioned table. If the partitioning key is not included in the query search criteria, then the partitioned table will perform worse than a non-partitioned table.

• By default, any index created on a partitioned table is also partitioned. This is useful for creating aligned indexes, which are typically desired. Sometimes, however, you may want to create a non-aligned or non-partitioned index. In these cases, make sure to explicitly declare the desired filegroup during the index creation process.

• Temporary tables can also be partitioned; however, this would probably not make sense in most circumstances.

• Partitioning functions need to be monitored to ensure you don’t run out of defined ranges; if that were to happen, you’d be storing all new data in one really large partition, which wouldn’t be pleasant.

Thanks for wading through this long post. I hope the information contained proves informative and helpful. I’ll be following this post with some actual code to help illustrate some of what I’ve covered here.

Regards,

Michelle

Index Interrogation Script

As a continuation of the last post on indexes, here’s a script I wrote for index interrogation.  This stored procedure was the result of repeatedly having to lookup information on included columns.

If ObjectProperty(Object_ID('dbo.dba_indexLookup_sp'), 
        N'IsProcedure') Is Null
Begin
    Execute ('Create Procedure dbo.dba_indexLookup_sp 
        As Print ''Hello World!''')
    RaisError('Procedure dbo.dba_indexLookup_sp created.', 10, 1);
End;
Go
 
Set ANSI_Nulls On;
Set Ansi_Padding On;
Set Ansi_Warnings On;
Set ArithAbort On;
Set Concat_Null_Yields_Null On;
Set NoCount On;
Set Numeric_RoundAbort Off;
Set Quoted_Identifier On;
Go
 
Alter Procedure dbo.dba_indexLookup_sp
 
        /* Declare Parameters */
        @tableName  varchar(128)  =  Null
As
/***********************************************************************
    Name:       dba_indexLookup_sp
 
    Author:     Michelle F. Ufford
 
    Purpose:    Retrieves index information for the specified table.
 
    Notes:      If the tableName is left null, it will return index 
                information for all tables and indexes.
 
    Called by:  DBA
 
    Date        User   Description
    --------------------------------------------------------------------
    2008-10-28  MFU    Initial Release
************************************************************************
    Exec dbo.dba_indexLookup_sp
        @tableName = 'myTableName';
***********************************************************************/
 
Set NoCount On;
Set XACT_Abort On;
 
Begin
 
    Declare @objectID int;
 
    If @tableName Is Not Null
        Set @objectID = Object_ID(@tableName);
 
    With indexCTE(partition_scheme_name
                , partition_function_name
                , data_space_id)
    As (
        Select sps.name
            , spf.name
            , sps.data_space_id
        From sys.partition_schemes As sps
        Join sys.partition_functions As spf
            On sps.function_id = spf.function_id
    )
 
    Select st.name As 'table_name'
        , IsNull(ix.name, '') As 'index_name'
        , ix.object_id
        , ix.index_id
		, Cast(
            Case When ix.index_id = 1 
                    Then 'clustered' 
                When ix.index_id =0
                    Then 'heap'
                Else 'nonclustered' End
			+ Case When ix.ignore_dup_key <> 0 
                Then ', ignore duplicate keys' 
                    Else '' End
			+ Case When ix.is_unique <> 0 
                Then ', unique' 
                    Else '' End
			+ Case When ix.is_primary_key <> 0 
                Then ', primary key' Else '' End As varchar(210)
            ) As 'index_description'
        , IsNull(Replace( Replace( Replace(
            (   
                Select c.name As 'columnName'
                From sys.index_columns As sic
                Join sys.columns As c 
                    On c.column_id = sic.column_id 
                    And c.object_id = sic.object_id
                Where sic.object_id = ix.object_id
                    And sic.index_id = ix.index_id
                    And is_included_column = 0
                Order By sic.index_column_id
                For XML Raw)
                , '"/><row columnName="', ', ')
                , '<row columnName="', '')
                , '"/>', ''), '')
            As 'indexed_columns'
        , IsNull(Replace( Replace( Replace(
            (   
                Select c.name As 'columnName'
                From sys.index_columns As sic
                Join sys.columns As c 
                    On c.column_id = sic.column_id 
                    And c.object_id = sic.object_id
                Where sic.object_id = ix.object_id
                    And sic.index_id = ix.index_id
                    And is_included_column = 1
                Order By sic.index_column_id
                For XML Raw)
                , '"/><row columnName="', ', ')
                , '<row columnName="', '')
                , '"/>', ''), '')
            As 'included_columns'
        , IsNull(cte.partition_scheme_name, '') 
            As 'partition_scheme_name'
        , Count(partition_number) As 'partition_count'
        , Sum(rows) As 'row_count'
    From sys.indexes As ix
    Join sys.partitions As sp
        On ix.object_id = sp.object_id
        And ix.index_id = sp.index_id
    Join sys.tables As st
        On ix.object_id = st.object_id
    Left Join indexCTE As cte
        On ix.data_space_id = cte.data_space_id
    Where ix.object_id = IsNull(@objectID, ix.object_id)
    Group By st.name
        , IsNull(ix.name, '')
        , ix.object_id
        , ix.index_id
		, Cast(
            Case When ix.index_id = 1 
                    Then 'clustered' 
                When ix.index_id =0
                    Then 'heap'
                Else 'nonclustered' End
			+ Case When ix.ignore_dup_key <> 0 
                Then ', ignore duplicate keys' 
                    Else '' End
			+ Case When ix.is_unique <> 0 
                Then ', unique' 
                    Else '' End
			+ Case When ix.is_primary_key <> 0 
                Then ', primary key' Else '' End As varchar(210)
            )
        , IsNull(cte.partition_scheme_name, '')
        , IsNull(cte.partition_function_name, '')
    Order By table_name
        , index_id;
 
    Set NoCount Off;
    Return 0;
End
Go

Index Defrag Script

UPDATE: This script has been significantly updated. Please view the latest version at http://sqlfool.com/2011/06/index-defrag-script-v4-1/.

Previously I posted that it’s important to keep your indexes in shape with weekly or (preferably) nightly defrags. Below is a script I use to execute nightly defrags in SQL 2005 Enterprise. I can’t claim complete credit… I believe this is a script I originally found on Microsoft and heavily modified to meet my needs.

You may want to modify the script if you’re doing much with XML or LOB’s. Also keep in mind that this is hitting the sys.dm_db_index_physical_stats table to view fragmentation information, which can be resource intensive.

If you’re not familiar with index defragmentation, check out “Alter Index” on Books Online.

If Not Exists(Select object_id From sys.tables 
                Where [name] = N'dba_indexDefragLog')
Begin
 
    Create Table dbo.dba_indexDefragLog
    (
          indexDefrag_id    int identity(1,1)   not null
        , objectID          int                 not null
        , objectName        nvarchar(130)       not null
        , indexID           int                 not null
        , indexName         nvarchar(130)       not null
        , partitionNumber   smallint            not null
        , fragmentation     float               not null
        , dateTimeStart     datetime            not null
        , durationSeconds   int                 not null
        Constraint PK_indexDefragLog 
            Primary Key Clustered (indexDefrag_id)
    );
 
    Print 'dba_indexDefragLog Table Created';
End
 
If ObjectProperty(Object_ID('dbo.dba_indexDefrag_sp'), 
        N'IsProcedure') Is Null
Begin
    Execute ('Create Procedure dbo.dba_indexDefrag_sp As 
                Print ''Hello World!''')
    RaisError('Procedure dba_indexDefrag_sp created.', 10, 1);
End;
Go
 
Set ANSI_Nulls On;
Set Ansi_Padding On;
Set Ansi_Warnings On;
Set ArithAbort On;
Set Concat_Null_Yields_Null On;
Set NoCount On;
Set Numeric_RoundAbort Off;
Set Quoted_Identifier On;
Go
 
Alter Procedure dbo.dba_indexDefrag_sp
 
    /* Declare Parameters */
      @minFragmentation     float           = 10.0  
        /* in percent, will not defrag if fragmentation 
           less than specified */
    , @rebuildThreshold     float           = 30.0  
        /* in percent, greater than @rebuildThreshold 
           will result in rebuild instead of reorg */
    , @onlineRebuild        bit             = 1     
        /* 1 = online rebuild; 0 = offline rebuild */
    , @executeSQL           bit             = 1     
        /* 1 = execute; 0 = print command only */
    , @tableName            varchar(4000)   = Null  
        /* Option to specify a table name */
    , @printCommands        bit             = 0     
        /* 1 = print commands; 0 = do not print commands */
    , @defragDelay          char(8)         = '00:00:05'
        /* time to wait between defrag commands */
As
/***********************************************************************
    Name:       dba_indexDefrag_sp
 
    Author:     Michelle F. Ufford
 
    Purpose:    Defrags all indexes for the current database
 
    Notes:
 
    CAUTION: Monitor transaction log when executing for the first time!
 
      @minFragmentation     defaulted to 10%, will not defrag if 
                            fragmentation if less than specified 
 
      @rebuildThreshold     defaulted to 30% as recommended by 
                            Microsoft in BOL;
                            > than 30% will result in rebuild instead
 
      @onlineRebuild        1 = online rebuild; 
                            0 = offline rebuild
 
      @executeSQL           1 = execute the SQL generated by this proc; 
                            0 = print command only
 
      @tableName            Specify if you only want to defrag indexes 
                            for a specific table
 
      @printCommands        1 = print commands to screen; 
                            0 = do not print commands
 
      @defragDelay          time to wait between defrag commands; 
                            gives the server a little time to catch up 
 
    Called by:  SQL Agent Job or DBA
 
    Date        Initials	Description
    -------------------------------------------------------------------
    2008-10-27  MFU         Initial Release
***********************************************************************
    Exec dbo.dba_indexDefrag_sp
	      @executeSQL    = 0
        , @printCommands = 1;
***********************************************************************/
 
Set NoCount On;
Set XACT_Abort On;
 
Begin
 
    /* Declare our variables */
    Declare   @objectID         int
            , @indexID          int
            , @partitionCount   bigint
            , @schemaName       nvarchar(130)
            , @objectName       nvarchar(130)
            , @indexName        nvarchar(130)
            , @partitionNumber  smallint
            , @partitions       smallint
            , @fragmentation    float
            , @sqlCommand       nvarchar(4000)
            , @rebuildCommand   nvarchar(200)
            , @dateTimeStart    datetime
            , @dateTimeEnd      datetime
            , @containsLOB      bit;
 
    /* Just a little validation... */
    If @minFragmentation Not Between 0.00 And 100.0
        Set @minFragmentation = 10.0;
 
    If @rebuildThreshold Not Between 0.00 And 100.0
        Set @rebuildThreshold = 30.0;
 
    If @defragDelay Not Like '00:[0-5][0-9]:[0-5][0-9]'
        Set @defragDelay = '00:00:05';
 
    /* Determine which indexes to defrag using our 
       user-defined parameters */
    Select
          Object_ID AS objectID
        , index_id AS indexID
        , partition_number AS partitionNumber
        , avg_fragmentation_in_percent AS fragmentation
        , 0 As 'defragStatus' 
            /* 0 = unprocessed, 1 = processed */
    Into #indexDefragList
    From sys.dm_db_index_physical_stats 
        (DB_ID(), Object_Id(@tableName), NULL , NULL, N'Limited')
    Where avg_fragmentation_in_percent > @minFragmentation 
        And index_id > 0
    Option (MaxDop 1);
 
    /* Create a clustered index to boost performance a little */
    Create Clustered Index CIX_temp_indexDefragList
        On #indexDefragList(objectID, indexID, partitionNumber);
 
    /* Begin our loop for defragging */
    While (Select Count(*) From #indexDefragList 
            Where defragStatus = 0) > 0
    Begin
 
        /* Grab the most fragmented index first to defrag */
        Select Top 1 
              @objectID         = objectID
            , @fragmentation    = fragmentation
            , @indexID          = indexID
            , @partitionNumber  = partitionNumber
        From #indexDefragList
        Where defragStatus = 0
        Order By fragmentation Desc;
 
        /* Look up index information */
        Select @objectName = QuoteName(o.name)
             , @schemaName = QuoteName(s.name)
        From sys.objects As o
        Inner Join sys.schemas As s 
            On s.schema_id = o.schema_id
        Where o.object_id = @objectID;
 
        Select @indexName = QuoteName(name)
        From sys.indexes 
        Where object_id = @objectID
            And index_id = @indexID
            And type > 0;
 
        /* Determine if the index is partitioned */
        Select @partitionCount = Count(*)
        From sys.partitions
        Where object_id = @objectID 
            And index_id = @indexID;
 
        /* Look for LOBs */
        Select Top 1 
            @containsLOB = column_id
        From sys.columns With (NoLock) 
        Where  
            [object_id] = @objectID
            And (system_type_id In (34, 35, 99) 
            -- 34 = image, 35 = text, 99 = ntext
                    Or max_length = -1); 
            -- varbinary(max), varchar(max), nvarchar(max), xml
 
        /* See if we should rebuild or reorganize; handle thusly */
        If @fragmentation < @rebuildThreshold And @partitionCount <= 1
            Set @sqlCommand = N'Alter Index ' + @indexName + N' On ' 
                + @schemaName + N'.' + @objectName + N' ReOrganize';
 
        If @fragmentation >= @rebuildThreshold 
            And IsNull(@containsLOB, 0) = 0 
                -- Cannot rebuild if the table has one or more LOB
            And @partitionCount <= 1
        Begin
 
            /* We should always rebuild online if possible 
                (SQL 2005 Enterprise) */
            If @onlineRebuild = 0 
                Set @rebuildCommand = N' Rebuild With 
                    (Online = Off, MaxDop = 1)';
            Else
                Set @rebuildCommand = N' Rebuild With 
                    (Online = On, MaxDop = 1)';
 
            Set @sqlCommand = N'Alter Index ' + @indexName + N' On ' 
                + @schemaName + N'.' + @objectName + @rebuildCommand;
        End;
 
        /* If our index is partitioned, we should always reorganize */
        If @partitionCount > 1
            Set @sqlCommand = N'Alter Index ' + @indexName + N' On ' 
                + @schemaName + N'.' + @objectName + N' ReOrganize'
                + N' Partition = ' 
                + Cast(@partitionNumber As nvarchar(10)); 
                -- no MaxDop needed, single threaded operation
 
        /* Are we executing the SQL?  If so, do it */
        If @executeSQL = 1
        Begin
 
            /* Grab the time for logging purposes */
            Set @dateTimeStart  = GetDate();
            Execute (@sqlCommand);
            Set @dateTimeEnd  = GetDate();
 
            /* Log our actions */
            Insert Into dbo.dba_indexDefragLog
            (
                  objectID
                , objectName
                , indexID
                , indexName
                , partitionNumber
                , fragmentation
                , dateTimeStart
                , durationSeconds
            )
            Select
                  @objectID
                , @objectName
                , @indexID
                , @indexName
                , @partitionNumber
                , @fragmentation
                , @dateTimeStart
                , DateDiff(second, @dateTimeStart, @dateTimeEnd);
 
            /* Just a little breather for the server */
            WaitFor Delay @defragDelay;
 
            /* Print if specified to do so */
            If @printCommands = 1
                Print N'Executed: ' + @sqlCommand;
        End
        Else
        /* Looks like we're not executing, just print 
            the commands */
        Begin
            If @printCommands = 1
                Print @sqlCommand;
        End
 
        /* Update our index defrag list when we've
            finished with that index */
        Update #indexDefragList
        Set defragStatus = 1
        Where objectID         = @objectID
          And indexID          = @indexID
          And partitionNumber  = @partitionNumber;
 
    End
 
    /* When everything is done, make sure to get rid of 
        our temp table */
    Drop Table #indexDefragList;
 
    Set NoCount Off;
    Return 0
End
Go
 
Set Quoted_Identifier Off 
Set ANSI_Nulls On
Go

20081117 UPDATE: Check out my Updated Index Defrag Script for SQL 2005 and 2008.

Tips for Large Data Stores

I’m currently working on projects that involve the collection of massive amounts of data (i.e. terabyte-class, billion-row tables). One of the challenges of collecting so much data is the ability to report on the data quickly.  To this end, I plan to spend the next several blog posts discussing tips for designing and managing large relational data stores.  These are primarily design practices that have performed well for me within the various environments in which I’ve worked.

Partitioning

• Partitioning is the segregation of a single logical table into separate, identical physical structures. Basically, every table is stored in at least one partition; “partitioned tables” are merely comprised of many partitions.

• Implemented correctly, partitioning can have dramatic improvement on read performance, index maintenance cost reduction, storage of large fact tables on separate disks, and data archiving.

• If you’re not familiar with partitioning, check out Kimberly Tripp’s excellent white paper, Partitioned Tables and Indexes in SQL Server 2005.

• I cannot rave enough about the impact partitioning has had within our environment. To take one of the most extreme examples of improvement, the execution time of one BI report decreased from 2.5+ hours to 20 minutes after we implemented partitioning. While not all stored procedures experienced such phenomenal improvement, improvement in the range of 30% – 60% was very common.

• While partitioning does have many benefits, there are some negatives. Namely, more expensive writes (I’ve noticed around 10%), increased duration of queries that span many partitions (i.e. queries on long time spans, such as a year), and increased maintenance needs.

• My rule of thumb is to partition any table with growth rates > 10mm records per week or with regular archiving needs.

• Much more to come on partitioning in the near future. :)

Indexing

• Any experienced DBA can tell you that effective index management is critical.

• Ensure your stored procedures are using index seeks instead of scans. If scans cannot be avoided, consider using partitioning with aligned indexes to limit the amount of pages scanned.

• Periodically look for un-used indexes and remove them to improve write performance.

• Look for missing indexes, create one or two, and evaluate. Try composite indexes and included columns as a way to limit the amount of indexes you need to create.

• Defrag your indexes! Nightly, if you can get away with it.

Stored Procedure Tuning

• When querying on large amounts of data, try breaking your queries up into individual components and storing in temporary tables, then performing the joins.

• When querying on large date ranges, try looping through the days and inserting the data into a temp table. This can be especially beneficial when querying partitioned tables.

• Create indexes on your temporary tables!

• Utilize index seeks whenever possible.

• If permitted to use dirty reads and your environment’s isolation level is not defaulted to read uncommitted, use With (NoLock) hints.

• Temporary tables can be partitioned! This is usually not necessary but can sometimes be beneficial.

• Try replacing table variables with temporary tables. My rule of thumb is this: if I’m performing joins on the temporary data or if I have more than 100 records, use a temporary table. Table variables can decrease stored procedure recompiles but can sometimes have a negative impact on performance. When in doubt, test each method and evaluate which works better for your needs.

• *Always* include the partitioning key when querying partitioned tables.