Max INT Identity Value Reached (DBCC CheckIdent)

One of my colleagues shared the following experience.

Recently, the identity column on one of his tables reached the maximum value allowable for an INT data type (2,147,483,647). Obviously, this caused all sorts of errors, as the application was unable to insert records into the table. This is a customer-facing table, so the issue needed to be resolved quickly. Converting the column to a BIGINT would require a significant amount of down-time. The solution? Reseed the identity column to -2,147,483,648 and have the values count upward toward zero. This took only seconds and the app was back up immediately. This is, of course, a temporary solution; it allows the team to schedule a more convenient time to resolve the issue.

Here’s how to do this:

DBCC CHECKIDENT ('tableName', RESEED, -2147483648);

Thanks, Peter, for sharing this tip!

Partitioning Example

The following code will walk you through the process of creating a partitioned table:

--------------------------------
-- Create A Partitioned Table --
--------------------------------
 
/* Create a partition function. */
Create Partition Function [test_monthlyDateRange_pf] (smalldatetime) 
    As Range Right For Values 
    ('2008-01-01', '2008-02-01', '2008-03-01');
Go
 
/* Associate the partition function with a partition scheme. */
Create Partition Scheme test_monthlyDateRange_ps 
    As Partition test_monthlyDateRange_pf 
    All To ([Primary]);
Go
 
/* Create your first partitioned table! 
   Make sure the data types match.  */
Create Table dbo.orders
(
      order_id  int Identity(1,1)   Not Null
    , orderDate smalldatetime       Not Null
 
    Constraint PK_orders Primary Key Clustered
    (
        orderDate
      , order_id
    )
) On test_monthlyDateRange_ps(orderDate);
Go
 
/* Create some records to play with. */
Insert Into dbo.orders
Select '2007-12-31' Union All
Select '2008-01-02' Union All
Select '2008-01-03' Union All
Select '2008-01-04' Union All
Select '2008-02-01' Union All
Select '2008-02-02' Union All
Select '2008-03-01' Union All
Select '2008-03-02';
 
/* The $partition function can be used to interrogate partition data.
   Let's use it to see where those records are physically located. */
Select $partition.test_monthlyDateRange_pf(orderDate) 
    As 'partition_number'
    , *
From dbo.orders;
 
/* By default, all new indexes are created on the partition.
   Let's create an aligned index */
Create NonClustered Index IX_orders_aligned
    On dbo.orders(order_id)
    On test_monthlyDateRange_ps(orderDate);
 
/* Now let's create an un-aligned index.
   We'll need to specify the filegroup. */
Create NonClustered Index IX_orders_nonaligned
    On dbo.orders(order_id)
    On [Primary]; -- can be any filegroup
 
/* Review your indexes */
Execute sp_helpindex orders;

Using the previous code as a building block, let’s try swapping partitions:

--------------------------
-- Swap Out A Partition --
--------------------------
 
/* We need to drop our un-aligned index; otherwise we'll 
   get an error when we attempt to do the switch. */
Drop Index IX_orders_nonaligned On dbo.orders;
 
/* Create the table to hold the data you're swapping out.
   The table structures must match identically; however, 
   DO NOT partition this table. */
Create Table dbo.orders_stage_swapOut
(
      order_id  int             Not Null
    , orderDate smalldatetime   Not Null
 
    Constraint PK_orders_stage_swapOut Primary Key Clustered
    (
        orderDate
      , order_id
    )
) On [Primary];
Go
 
/* Create the table to hold the data you're swapping in.
   The table structures must match identically; however,
   DO NOT partition this table. */
Create Table dbo.orders_stage_swapIn
(
      order_id  int             Not Null
    , orderDate smalldatetime   Not Null
 
    Constraint PK_orders_stage_swapIn Primary Key Clustered
    (
        orderDate
      , order_id
    )
) On [Primary];
Go
 
/* Populate the table you're swapping in. */
Insert Into dbo.orders_stage_swapIn
Select -5, '2008-02-02' Union All
Select -4, '2008-02-03' Union All
Select -3, '2008-02-04' Union All
Select -2, '2008-02-05' Union All
Select -1, '2008-02-06';
 
/* Create any indexes on your table to match the 
   indexes on your partitioned table. */
Create NonClustered Index IX_orders_stage_swapIn
    On dbo.orders_stage_swapIn(order_id);
 
/* Add a check constraint for the 
   partition to be swapped in. 
   This step is required. */
Alter Table dbo.orders_stage_swapIn
With Check
Add Constraint orders_stage_swapIn_orderDateCK
	Check (orderDate >= '2008-02-01' 
		And orderDate < '2008-03-01');
Go
 
/* Swap out the old partition. */
Alter Table dbo.orders 
    Switch Partition 3 To dbo.orders_stage_swapOut;
Go
 
/* Swap in the new partition. */
Alter Table dbo.orders_stage_swapIn
    Switch To dbo.orders Partition 3;
Go
 
---------------------
-- Check your data --
---------------------
 
/* You should have 2 records in here. */
Select * From dbo.orders_stage_swapOut;
 
/* You should have 5 records here. */
Select * 
From dbo.orders
Where orderDate >= '2008-02-01' 
And orderDate < '2008-03-01';
 
/* There should be no records in this table. */
Select * From dbo.orders_stage_swapIn;
 
Select $partition.test_monthlyDateRange_pf(orderDate) 
    As 'partition_number'
    , *
From dbo.orders;
 
/* Clean-up time!
Drop Table dbo.orders
Drop Table dbo.orders_stage_swapOut
Drop Table dbo.orders_stage_swapIn
Drop Partition Scheme test_monthlyDateRange_ps
Drop Partition Function [test_monthlyDateRange_pf]
*/

Pretty easy, huh? :)

Partitioning 101

A few posts ago, I briefly discussed partitioning. This article will continue that discussion and focus on horizontal partitioning within SQL Server 2005.

What is partitioning?

As I’ve mentioned before, horizontal partitioning is the physical segregation of a single logical table name into multiple, *identical* physical structures. In SQL 2005, all table and index pages actually exist on a partition. However, if you’ve never created a partitioning scheme, then by default each of your tables and indexes contain just a single partition.

I’ll also be talking about aligned indexes and tables, so let’s go ahead and define that here. Two tables sharing the same partitioning function and the same partitioning key are aligned. Similarly, a partitioned index on a partitioned table sharing the same partitioning key is called an aligned index. Having tables and indexes in alignment can reduce CPU and improve parallel plan execution.

Why partitioning?

The two most common reasons to implement partitioning are performance improvement and ease of archiving. Let’s talk about performance improvement first. We made the decision to implement partitioning in one of our data warehouses because of the size of the tables. We had a couple of tables that exceeded a billion rows, and several tables that were fast approaching that line. Upon deploying partitioning, CPU utilization dropped by ~25%, and our regular CPU and disk spikes caused by reporting processes were all but eliminated. Many of our stored procedures experienced reduced durations and reads in the range of 30-60%, and some had even greater improvements.

Why were we seeing this improvement? Many of the stored procedures were accessing data for recent dates. Knowing this helped us to decide on a weekly partitioning scheme. Stored procedures joining on aligned tables were now only accessing a small portion of the data, resulting in the aforementioned improvements. In addition, ad hocs that performed index scans now completed in a fraction of the time because the amount of data being scanned was significantly reduced.

Table maintenance also saw drastic improvement. It was previously unheard of to attempt an online index defrag on some of the largest tables without first scheduling down-time. Now, these same tables were being defragged nightly in one minute or less, without issue. This is because index defrag operations can be performed by partition.  This is a great resource save; obviously, once a partition is no longer written to and is fully defragmented, it will not become re-fragmented, so there is no need to waste resources on it. Refer to my Index Defrag Script for an example on how to automate this process.

Another great feature of partitioned tables is the ability to swap partitions in and out. I took advantage of this feature with great success when backfilling data. My backfill strategy primarily consisted of bcp’ing data in weekly chunks to a staging database. The table I was backfilling had a text column where one record happened to contain the column deliminator for the bcp file. This of course wasn’t caught until auditing data counts prior to deployment, when it was noticed that one week was short about ~2mm records. Now, this was ~2mm records out of ~800mm total records. The solution? Re-run the bcp process, using a different deliminator, dump the data in a staging table, and swap out the partition. The actual swap out took 2 seconds and there was no impact to the server. It was beautiful.

Swapping out partitions is particularly beneficial for archiving. Instead of writing a batch process to loop through and delete expired data, which is both time consuming and resource intensive, a partition can be swapped out and dropped in a matter of seconds. Keep in mind that, in SQL Server 2005, partitions can only be swapped out on non-replicated tables. SQL 2008 introduces functionality to allow partition swapping on a replicated table.  Check out Replicating Partitioned Tables and Indexes.

That sounds great! Where do I sign up?

There are a few things to consider prior to implementing partitioning in your environment. One is the performance hit to writes. I’ve noticed around a 10% decrease in write performance in my environment. This is partly due to the overhead required to look up which partition the record should be written to. For this reason, I only partition tables that I expect to grow quickly (my rule of thumb is >10mm records per week), or tables that have regular archiving needs.

There’s also a performance hit when querying over many partitions. Basically, think of a giant UNION ALL statement that occurs every time more than one partition is accessed. With our production hardware, we noticed that duration was still less until you hit around 30 partitions. After that, the duration was greater for partitioned tables versus non-partitioned, but reads were still less overall. This will probably vary depending on your environment, so run some tests and make sure you know what kind of impact to expect. This will also help you to determine your partitioning scheme. For example, you wouldn’t want to partition on a date column using a weekly scheme if every report needs to access data for the previous 12 months.

Also make sure you have a good candidate for partitioning. That is, do not partition on a column that is frequently changed or is written non-sequentially. Both can cause overhead to your environment and may negate the benefits of partitioning. Keep in mind that when the value of the partitioning key is changed, the record is not relocated to the correct partition. Some ideal keys include surrogate identity and sequential datetime columns.

Other things to know…

• The partitioning function requires the declaration of a parameter data type. To partition a table, the partitioning key is defined during new table creation. The data types of the partitioning keys must match. For example, if you create a partitioning function using a datetime data type and try to apply it to a table with a smalldatetime column as the partitioning key, the CREATE TABLE statement will fail.

• Ensure that the partitioning key is *always* used in a query on a partitioned table. If the partitioning key is not included in the query search criteria, then the partitioned table will perform worse than a non-partitioned table.

• By default, any index created on a partitioned table is also partitioned. This is useful for creating aligned indexes, which are typically desired. Sometimes, however, you may want to create a non-aligned or non-partitioned index. In these cases, make sure to explicitly declare the desired filegroup during the index creation process.

• Temporary tables can also be partitioned; however, this would probably not make sense in most circumstances.

• Partitioning functions need to be monitored to ensure you don’t run out of defined ranges; if that were to happen, you’d be storing all new data in one really large partition, which wouldn’t be pleasant.

Thanks for wading through this long post. I hope the information contained proves informative and helpful. I’ll be following this post with some actual code to help illustrate some of what I’ve covered here.

Regards,

Michelle

Index Interrogation Script

As a continuation of the last post on indexes, here’s a script I wrote for index interrogation.  This stored procedure was the result of repeatedly having to lookup information on included columns.

If ObjectProperty(Object_ID('dbo.dba_indexLookup_sp'), 
        N'IsProcedure') Is Null
Begin
    Execute ('Create Procedure dbo.dba_indexLookup_sp 
        As Print ''Hello World!''')
    RaisError('Procedure dbo.dba_indexLookup_sp created.', 10, 1);
End;
Go
 
Set ANSI_Nulls On;
Set Ansi_Padding On;
Set Ansi_Warnings On;
Set ArithAbort On;
Set Concat_Null_Yields_Null On;
Set NoCount On;
Set Numeric_RoundAbort Off;
Set Quoted_Identifier On;
Go
 
Alter Procedure dbo.dba_indexLookup_sp
 
        /* Declare Parameters */
        @tableName  varchar(128)  =  Null
As
/***********************************************************************
    Name:       dba_indexLookup_sp
 
    Author:     Michelle F. Ufford
 
    Purpose:    Retrieves index information for the specified table.
 
    Notes:      If the tableName is left null, it will return index 
                information for all tables and indexes.
 
    Called by:  DBA
 
    Date        User   Description
    --------------------------------------------------------------------
    2008-10-28  MFU    Initial Release
************************************************************************
    Exec dbo.dba_indexLookup_sp
        @tableName = 'myTableName';
***********************************************************************/
 
Set NoCount On;
Set XACT_Abort On;
 
Begin
 
    Declare @objectID int;
 
    If @tableName Is Not Null
        Set @objectID = Object_ID(@tableName);
 
    With indexCTE(partition_scheme_name
                , partition_function_name
                , data_space_id)
    As (
        Select sps.name
            , spf.name
            , sps.data_space_id
        From sys.partition_schemes As sps
        Join sys.partition_functions As spf
            On sps.function_id = spf.function_id
    )
 
    Select st.name As 'table_name'
        , IsNull(ix.name, '') As 'index_name'
        , ix.object_id
        , ix.index_id
		, Cast(
            Case When ix.index_id = 1 
                    Then 'clustered' 
                When ix.index_id =0
                    Then 'heap'
                Else 'nonclustered' End
			+ Case When ix.ignore_dup_key <> 0 
                Then ', ignore duplicate keys' 
                    Else '' End
			+ Case When ix.is_unique <> 0 
                Then ', unique' 
                    Else '' End
			+ Case When ix.is_primary_key <> 0 
                Then ', primary key' Else '' End As varchar(210)
            ) As 'index_description'
        , IsNull(Replace( Replace( Replace(
            (   
                Select c.name As 'columnName'
                From sys.index_columns As sic
                Join sys.columns As c 
                    On c.column_id = sic.column_id 
                    And c.object_id = sic.object_id
                Where sic.object_id = ix.object_id
                    And sic.index_id = ix.index_id
                    And is_included_column = 0
                Order By sic.index_column_id
                For XML Raw)
                , '"/><row columnName="', ', ')
                , '<row columnName="', '')
                , '"/>', ''), '')
            As 'indexed_columns'
        , IsNull(Replace( Replace( Replace(
            (   
                Select c.name As 'columnName'
                From sys.index_columns As sic
                Join sys.columns As c 
                    On c.column_id = sic.column_id 
                    And c.object_id = sic.object_id
                Where sic.object_id = ix.object_id
                    And sic.index_id = ix.index_id
                    And is_included_column = 1
                Order By sic.index_column_id
                For XML Raw)
                , '"/><row columnName="', ', ')
                , '<row columnName="', '')
                , '"/>', ''), '')
            As 'included_columns'
        , IsNull(cte.partition_scheme_name, '') 
            As 'partition_scheme_name'
        , Count(partition_number) As 'partition_count'
        , Sum(rows) As 'row_count'
    From sys.indexes As ix
    Join sys.partitions As sp
        On ix.object_id = sp.object_id
        And ix.index_id = sp.index_id
    Join sys.tables As st
        On ix.object_id = st.object_id
    Left Join indexCTE As cte
        On ix.data_space_id = cte.data_space_id
    Where ix.object_id = IsNull(@objectID, ix.object_id)
    Group By st.name
        , IsNull(ix.name, '')
        , ix.object_id
        , ix.index_id
		, Cast(
            Case When ix.index_id = 1 
                    Then 'clustered' 
                When ix.index_id =0
                    Then 'heap'
                Else 'nonclustered' End
			+ Case When ix.ignore_dup_key <> 0 
                Then ', ignore duplicate keys' 
                    Else '' End
			+ Case When ix.is_unique <> 0 
                Then ', unique' 
                    Else '' End
			+ Case When ix.is_primary_key <> 0 
                Then ', primary key' Else '' End As varchar(210)
            )
        , IsNull(cte.partition_scheme_name, '')
        , IsNull(cte.partition_function_name, '')
    Order By table_name
        , index_id;
 
    Set NoCount Off;
    Return 0;
End
Go

Index Defrag Script

UPDATE: This script has been significantly updated. Please view the latest version at http://sqlfool.com/2011/06/index-defrag-script-v4-1/.

Previously I posted that it’s important to keep your indexes in shape with weekly or (preferably) nightly defrags. Below is a script I use to execute nightly defrags in SQL 2005 Enterprise. I can’t claim complete credit… I believe this is a script I originally found on Microsoft and heavily modified to meet my needs.

You may want to modify the script if you’re doing much with XML or LOB’s. Also keep in mind that this is hitting the sys.dm_db_index_physical_stats table to view fragmentation information, which can be resource intensive.

If you’re not familiar with index defragmentation, check out “Alter Index” on Books Online.

If Not Exists(Select object_id From sys.tables 
                Where [name] = N'dba_indexDefragLog')
Begin
 
    Create Table dbo.dba_indexDefragLog
    (
          indexDefrag_id    int identity(1,1)   not null
        , objectID          int                 not null
        , objectName        nvarchar(130)       not null
        , indexID           int                 not null
        , indexName         nvarchar(130)       not null
        , partitionNumber   smallint            not null
        , fragmentation     float               not null
        , dateTimeStart     datetime            not null
        , durationSeconds   int                 not null
        Constraint PK_indexDefragLog 
            Primary Key Clustered (indexDefrag_id)
    );
 
    Print 'dba_indexDefragLog Table Created';
End
 
If ObjectProperty(Object_ID('dbo.dba_indexDefrag_sp'), 
        N'IsProcedure') Is Null
Begin
    Execute ('Create Procedure dbo.dba_indexDefrag_sp As 
                Print ''Hello World!''')
    RaisError('Procedure dba_indexDefrag_sp created.', 10, 1);
End;
Go
 
Set ANSI_Nulls On;
Set Ansi_Padding On;
Set Ansi_Warnings On;
Set ArithAbort On;
Set Concat_Null_Yields_Null On;
Set NoCount On;
Set Numeric_RoundAbort Off;
Set Quoted_Identifier On;
Go
 
Alter Procedure dbo.dba_indexDefrag_sp
 
    /* Declare Parameters */
      @minFragmentation     float           = 10.0  
        /* in percent, will not defrag if fragmentation 
           less than specified */
    , @rebuildThreshold     float           = 30.0  
        /* in percent, greater than @rebuildThreshold 
           will result in rebuild instead of reorg */
    , @onlineRebuild        bit             = 1     
        /* 1 = online rebuild; 0 = offline rebuild */
    , @executeSQL           bit             = 1     
        /* 1 = execute; 0 = print command only */
    , @tableName            varchar(4000)   = Null  
        /* Option to specify a table name */
    , @printCommands        bit             = 0     
        /* 1 = print commands; 0 = do not print commands */
    , @defragDelay          char(8)         = '00:00:05'
        /* time to wait between defrag commands */
As
/***********************************************************************
    Name:       dba_indexDefrag_sp
 
    Author:     Michelle F. Ufford
 
    Purpose:    Defrags all indexes for the current database
 
    Notes:
 
    CAUTION: Monitor transaction log when executing for the first time!
 
      @minFragmentation     defaulted to 10%, will not defrag if 
                            fragmentation if less than specified 
 
      @rebuildThreshold     defaulted to 30% as recommended by 
                            Microsoft in BOL;
                            > than 30% will result in rebuild instead
 
      @onlineRebuild        1 = online rebuild; 
                            0 = offline rebuild
 
      @executeSQL           1 = execute the SQL generated by this proc; 
                            0 = print command only
 
      @tableName            Specify if you only want to defrag indexes 
                            for a specific table
 
      @printCommands        1 = print commands to screen; 
                            0 = do not print commands
 
      @defragDelay          time to wait between defrag commands; 
                            gives the server a little time to catch up 
 
    Called by:  SQL Agent Job or DBA
 
    Date        Initials	Description
    -------------------------------------------------------------------
    2008-10-27  MFU         Initial Release
***********************************************************************
    Exec dbo.dba_indexDefrag_sp
	      @executeSQL    = 0
        , @printCommands = 1;
***********************************************************************/
 
Set NoCount On;
Set XACT_Abort On;
 
Begin
 
    /* Declare our variables */
    Declare   @objectID         int
            , @indexID          int
            , @partitionCount   bigint
            , @schemaName       nvarchar(130)
            , @objectName       nvarchar(130)
            , @indexName        nvarchar(130)
            , @partitionNumber  smallint
            , @partitions       smallint
            , @fragmentation    float
            , @sqlCommand       nvarchar(4000)
            , @rebuildCommand   nvarchar(200)
            , @dateTimeStart    datetime
            , @dateTimeEnd      datetime
            , @containsLOB      bit;
 
    /* Just a little validation... */
    If @minFragmentation Not Between 0.00 And 100.0
        Set @minFragmentation = 10.0;
 
    If @rebuildThreshold Not Between 0.00 And 100.0
        Set @rebuildThreshold = 30.0;
 
    If @defragDelay Not Like '00:[0-5][0-9]:[0-5][0-9]'
        Set @defragDelay = '00:00:05';
 
    /* Determine which indexes to defrag using our 
       user-defined parameters */
    Select
          Object_ID AS objectID
        , index_id AS indexID
        , partition_number AS partitionNumber
        , avg_fragmentation_in_percent AS fragmentation
        , 0 As 'defragStatus' 
            /* 0 = unprocessed, 1 = processed */
    Into #indexDefragList
    From sys.dm_db_index_physical_stats 
        (DB_ID(), Object_Id(@tableName), NULL , NULL, N'Limited')
    Where avg_fragmentation_in_percent > @minFragmentation 
        And index_id > 0
    Option (MaxDop 1);
 
    /* Create a clustered index to boost performance a little */
    Create Clustered Index CIX_temp_indexDefragList
        On #indexDefragList(objectID, indexID, partitionNumber);
 
    /* Begin our loop for defragging */
    While (Select Count(*) From #indexDefragList 
            Where defragStatus = 0) > 0
    Begin
 
        /* Grab the most fragmented index first to defrag */
        Select Top 1 
              @objectID         = objectID
            , @fragmentation    = fragmentation
            , @indexID          = indexID
            , @partitionNumber  = partitionNumber
        From #indexDefragList
        Where defragStatus = 0
        Order By fragmentation Desc;
 
        /* Look up index information */
        Select @objectName = QuoteName(o.name)
             , @schemaName = QuoteName(s.name)
        From sys.objects As o
        Inner Join sys.schemas As s 
            On s.schema_id = o.schema_id
        Where o.object_id = @objectID;
 
        Select @indexName = QuoteName(name)
        From sys.indexes 
        Where object_id = @objectID
            And index_id = @indexID
            And type > 0;
 
        /* Determine if the index is partitioned */
        Select @partitionCount = Count(*)
        From sys.partitions
        Where object_id = @objectID 
            And index_id = @indexID;
 
        /* Look for LOBs */
        Select Top 1 
            @containsLOB = column_id
        From sys.columns With (NoLock) 
        Where  
            [object_id] = @objectID
            And (system_type_id In (34, 35, 99) 
            -- 34 = image, 35 = text, 99 = ntext
                    Or max_length = -1); 
            -- varbinary(max), varchar(max), nvarchar(max), xml
 
        /* See if we should rebuild or reorganize; handle thusly */
        If @fragmentation < @rebuildThreshold And @partitionCount <= 1
            Set @sqlCommand = N'Alter Index ' + @indexName + N' On ' 
                + @schemaName + N'.' + @objectName + N' ReOrganize';
 
        If @fragmentation >= @rebuildThreshold 
            And IsNull(@containsLOB, 0) = 0 
                -- Cannot rebuild if the table has one or more LOB
            And @partitionCount <= 1
        Begin
 
            /* We should always rebuild online if possible 
                (SQL 2005 Enterprise) */
            If @onlineRebuild = 0 
                Set @rebuildCommand = N' Rebuild With 
                    (Online = Off, MaxDop = 1)';
            Else
                Set @rebuildCommand = N' Rebuild With 
                    (Online = On, MaxDop = 1)';
 
            Set @sqlCommand = N'Alter Index ' + @indexName + N' On ' 
                + @schemaName + N'.' + @objectName + @rebuildCommand;
        End;
 
        /* If our index is partitioned, we should always reorganize */
        If @partitionCount > 1
            Set @sqlCommand = N'Alter Index ' + @indexName + N' On ' 
                + @schemaName + N'.' + @objectName + N' ReOrganize'
                + N' Partition = ' 
                + Cast(@partitionNumber As nvarchar(10)); 
                -- no MaxDop needed, single threaded operation
 
        /* Are we executing the SQL?  If so, do it */
        If @executeSQL = 1
        Begin
 
            /* Grab the time for logging purposes */
            Set @dateTimeStart  = GetDate();
            Execute (@sqlCommand);
            Set @dateTimeEnd  = GetDate();
 
            /* Log our actions */
            Insert Into dbo.dba_indexDefragLog
            (
                  objectID
                , objectName
                , indexID
                , indexName
                , partitionNumber
                , fragmentation
                , dateTimeStart
                , durationSeconds
            )
            Select
                  @objectID
                , @objectName
                , @indexID
                , @indexName
                , @partitionNumber
                , @fragmentation
                , @dateTimeStart
                , DateDiff(second, @dateTimeStart, @dateTimeEnd);
 
            /* Just a little breather for the server */
            WaitFor Delay @defragDelay;
 
            /* Print if specified to do so */
            If @printCommands = 1
                Print N'Executed: ' + @sqlCommand;
        End
        Else
        /* Looks like we're not executing, just print 
            the commands */
        Begin
            If @printCommands = 1
                Print @sqlCommand;
        End
 
        /* Update our index defrag list when we've
            finished with that index */
        Update #indexDefragList
        Set defragStatus = 1
        Where objectID         = @objectID
          And indexID          = @indexID
          And partitionNumber  = @partitionNumber;
 
    End
 
    /* When everything is done, make sure to get rid of 
        our temp table */
    Drop Table #indexDefragList;
 
    Set NoCount Off;
    Return 0
End
Go
 
Set Quoted_Identifier Off 
Set ANSI_Nulls On
Go

20081117 UPDATE: Check out my Updated Index Defrag Script for SQL 2005 and 2008.

Tips for Large Data Stores

I’m currently working on projects that involve the collection of massive amounts of data (i.e. terabyte-class, billion-row tables). One of the challenges of collecting so much data is the ability to report on the data quickly.  To this end, I plan to spend the next several blog posts discussing tips for designing and managing large relational data stores.  These are primarily design practices that have performed well for me within the various environments in which I’ve worked.

Partitioning

• Partitioning is the segregation of a single logical table into separate, identical physical structures. Basically, every table is stored in at least one partition; “partitioned tables” are merely comprised of many partitions.

• Implemented correctly, partitioning can have dramatic improvement on read performance, index maintenance cost reduction, storage of large fact tables on separate disks, and data archiving.

• If you’re not familiar with partitioning, check out Kimberly Tripp’s excellent white paper, Partitioned Tables and Indexes in SQL Server 2005.

• I cannot rave enough about the impact partitioning has had within our environment. To take one of the most extreme examples of improvement, the execution time of one BI report decreased from 2.5+ hours to 20 minutes after we implemented partitioning. While not all stored procedures experienced such phenomenal improvement, improvement in the range of 30% – 60% was very common.

• While partitioning does have many benefits, there are some negatives. Namely, more expensive writes (I’ve noticed around 10%), increased duration of queries that span many partitions (i.e. queries on long time spans, such as a year), and increased maintenance needs.

• My rule of thumb is to partition any table with growth rates > 10mm records per week or with regular archiving needs.

• Much more to come on partitioning in the near future. :)

Indexing

• Any experienced DBA can tell you that effective index management is critical.

• Ensure your stored procedures are using index seeks instead of scans. If scans cannot be avoided, consider using partitioning with aligned indexes to limit the amount of pages scanned.

• Periodically look for un-used indexes and remove them to improve write performance.

• Look for missing indexes, create one or two, and evaluate. Try composite indexes and included columns as a way to limit the amount of indexes you need to create.

• Defrag your indexes! Nightly, if you can get away with it.

Stored Procedure Tuning

• When querying on large amounts of data, try breaking your queries up into individual components and storing in temporary tables, then performing the joins.

• When querying on large date ranges, try looping through the days and inserting the data into a temp table. This can be especially beneficial when querying partitioned tables.

• Create indexes on your temporary tables!

• Utilize index seeks whenever possible.

• If permitted to use dirty reads and your environment’s isolation level is not defaulted to read uncommitted, use With (NoLock) hints.

• Temporary tables can be partitioned! This is usually not necessary but can sometimes be beneficial.

• Try replacing table variables with temporary tables. My rule of thumb is this: if I’m performing joins on the temporary data or if I have more than 100 records, use a temporary table. Table variables can decrease stored procedure recompiles but can sometimes have a negative impact on performance. When in doubt, test each method and evaluate which works better for your needs.

• *Always* include the partitioning key when querying partitioned tables.

Scientific Notation with IsNumeric()

Be careful when using the built-in IsNumeric() function. When presented with a character string formatted as “number+D+number” or “number+E+number”, the function will return a value of 1 (true). This is because SQL Server evaluates [number][d,e][number] as scientific notation.

For example,

[cc lang="tsql"]
Declare @dNotation char(10)
, @eNotation char(10)
, @standardText char(5);

Select @dNotation = ’1d1′
, @eNotation = ’1e1′
, @standardText = ‘abcde’;

Select IsNumeric(@dNotation) As ‘dNotation’
, IsNumeric(@eNotation) As ‘eNotation’
, IsNumeric(@standardText) As ‘standardText’;
[/cc]

… will return:

dNotation   eNotation   standardText
————————————————-
1           1            0

I’ve seen a couple of different ways to approach this, and each have their merits. The most easy and elegant solution I’ve seen came from Dave C. — PatIndex(‘%[^0123456789]%’, columnName):

[cc lang="tsql"]
Select PatIndex(‘%[^0123456789]%’, @dNotation)
, PatIndex(‘%[^0123456789]%’, @eNotation)
, PatIndex(‘%[^0123456789]%’, @standardText);
[/cc]

I’ve also seen Replace() and user-defined functions used with great effect.

Thanks to Dave C. for sharing his solution! :)

HTH!

Michelle

Permissions Script

Let’s get this blog thing started!  :)

I sometimes find that appropriate permissions are not always assigned to stored procedures during creation.  This usually occurs when a proc is scripted and re-deployed in a hurry, or when a non-DBA decides to play in my sandbox.  I use this rather simple script to quickly find and grant all missing permissions.

Declare @schema_owner varchar(20);
Set @schema_owner = 'dbo';
 
Select [name] As 'storedProcedure', 'Grant Execute On '
    + @schema_owner + '.' + [name]
    + ' To [insertDatabaseRoleHere];' As 'sqlCode'
From sys.objects With (NoLock)
Where [name] Not In (
    Select o.name
    From sys.database_permissions p With (NoLock)
    Inner Join sys.objects o With (NoLock)
        On p.major_id = o.object_id
    Inner Join sys.database_principals u With (NoLock)
        On u.principal_id = p.grantee_principal_id
    Where u.name = '[insertDatabaseRoleHere]'
)
And [type] = 'P';

This is especially useful for when you have a common database role that all stored procedures need to belong to; for example, infrastructures that share role membership to allow applications to execute procs.