Large Updates on Replicated Tables

Late last night, I executed a 70mm update on a replicated table. This was a somewhat low priority update, so the primary goal (aside from the data change) was to avoid any issues on the publishing and subscribing servers, including replication latency errors. I have performed many large updates in the past, but this was the first on a replicated table.

To minimize impact, one of our system DBA’s suggested the use of a replicated stored procedure. Basically, instead of replicating each of the 70mm updates as a singleton transaction, the replicated stored procedure is called on the subscribing server, which then performs the bulk update locally. This was my first time using it and it worked beautifully.

Another of my colleagues, Jeff M., suggested the use of a control table. Normally, I would just output the last affected ID and update the script manually. However, this also worked so well that it will be adopted in all future update scripts.

Using the following pseudo-script and a replicated stored procedure, I was able to execute the update on 70mm records in 11.5 hours with *zero* impact on the servers and replication. I could’ve probably increased the batch size and reduced the execution time even further, but as I mentioned, this was a low priority update, so there was no need to push it.

This process should work equally well on non-replicated updates; merely replace the Execute statement with the actual update.

/************ Chunked Update Script with Control Table ************/
 
/* --------------------- Preparation Script --------------------- */
 
/* Note: Execute this section in a separate query window */
 
Use yourDatabase;
Go
 
Set NoCount On;
 
/* Create a control table to facilitate tweaking of parameters */
Create Table dbo.scratch_largeUpdate_control
(
      sizeOfBatch       int
    , waitForDelay      char(8)
    , minRecordToUpdate int 
    , maxRecordToUpdate int
);
 
/* Create your control data; you only want 1 row in this table */
Insert Into dbo.scratch_largeUpdate_control 
(sizeOfBatch, waitForDelay, minRecordToUpdate, maxRecordToUpdate)
Select 10000, '00:00:05', 40297132, 107459380;
 
/* Update Script */
Update dbo.scratch_largeUpdate_control
Set sizeOfBatch  = 100000
  , waitForDelay = '00:00:30';
 
 
/* ------------------------ Update Script ------------------------ */
Use yourDatabase;
Go
 
Set NoCount On;
 
Declare 
      @batchSize        int
    , @minID            int
    , @maxID            int
    , @procMinID        int
    , @procMaxID        int
    , @delay            char(8)
    , @statusMsg        varchar(1000);
 
Begin Try
 
    IF @@SERVERNAME Not In ('PRODSERVER')
        RAISERROR('Sorry, this cannot be executed here!', 16, 1);
 
    If Not Exists(
            Select object_id 
            From sys.objects 
            Where [name] = 'scratch_largeUpdate_control' 
            And type = 'U' )
        RaisError ('ERROR: Control table does not exist!', 16, 1)
            With NoWait;
    Else
        Select 
              @minID        = minRecordToUpdate 
            , @maxID        = maxRecordToUpdate 
            , @batchSize    = sizeOfBatch
            , @delay        = waitForDelay
        From dbo.scratch_largeUpdate_control With (NoLock);
 
    Set @statusMsg = 'Beginning update; batch size of ' 
        + Cast(@batchSize As varchar(10)) + ', delay of ' 
        + @delay + ' defined.  Estimate ' 
        + Cast((((@maxID - @minID) / @batchSize) + 1) As varchar(10)) 
        + ' iterations to be performed.'
 
    RaisError (@statusMsg, 10, 1) With NoWait;
 
    While @minID < @maxID
    Begin
 
        Select @procMinID = @minID
            , @procMaxID = (@minID + (@batchSize - 1));
 
        /* Execute actual update code here 
           OR 
           Call a replicated stored procedure, i.e. */
        Execute dbo.myReplicatedUpdateProc 
              @minRecordID = @procMinID
            , @maxRecordID = @procMaxID;
 
        Set @statusMsg = 
            'Updating records ' + Cast(@minID As varchar(10)) 
            + ' through ' + Cast((@minID + (@batchSize - 1)) 
            As varchar(10)) + '...';
 
        RaisError (@statusMsg, 10, 1) With NoWait;
 
        /* Update our control table with the last successfully
           updated record ID.  In the event of an error,
           we can start from here. */
        Update dbo.scratch_largeUpdate_control 
        Set minRecordToUpdate = @minID + @batchSize;
 
        Select @minID = @minID + @batchSize; 
        WaitFor Delay @delay; -- breather for the server
 
        /* Check to see if our control values have changed */
        If Not Exists(
            Select * 
            From dbo.scratch_largeUpdate_control With (NoLock) 
            Where @batchSize = sizeOfBatch And @delay = waitForDelay)
        Begin
 
            /* There was a change, so grab our new values */
            Select @batchSize = sizeOfBatch
                 , @delay = waitForDelay
            From dbo.scratch_largeUpdate_control With (NoLock)
 
            /* Print a status message with the new values */
            Set @statusMsg = 'Parameters changed:  batch size = ' 
                + Cast(@batchSize As varchar(10)) 
                + ', delay = ' + @delay;
 
            RaisError (@statusMsg, 10, 1) With NoWait;
 
        End
    End
 
    RaisError ('Success!', 10, 1) With NoWait;
 
End Try
/* Handle your errors */
Begin Catch
 
        Set @statusMsg = 'An error has occurred and the last '
                         + 'transaction has been rolled back. '
                         + 'Last record successfully updated was '
                         + 'record_id = ' 
                         + Cast((@minID + (@batchSize - 1)) 
                            As varchar(10));
 
        RaisError (@statusMsg, 16, 1) With NoWait;
 
        /* Return the error message */
        Select Error_Number()
            , Error_Procedure()
		    , DB_Name()
		    , Error_Line()
		    , Error_Message()
		    , Error_Severity()
		    , Error_State();
 
End Catch;
 
/* -------------------------- Clean-Up ----------------------------
Drop Table dbo.scratch_largeUpdate_control;
----------------------------------------------------------------- */

Row Concatenation in T-SQL

Have you ever needed to create a comma-delimited list of related records, grouped by a parent record? This is one of those business requirements that just happens to pop up every so often. For example, turning this:

AuthorID    BookName
1 “Great Expectations”
1 “David Copperfield”
1 “Bleak House”
2 “Divine Comedy”
3 “Pride and Prejudice”
3 “Mansfield Park”
   

into this:

AuthorID    ListOfBooks
1 “Great Expectations”, “David Copperfield”, “Bleak House”
2 “Divine Comedy”
3 “Pride and Prejudice”, “Mansfield Park”
   

There’s a handful of ways you can handle this, especially when dealing with small data sets. However, the problem becomes a bit more tricky when dealing with large record sets (i.e. hundreds of thousands or even millions of records). My first attempt at a solution used a CTE (common table expression). It did the job but was very slow. Looking for a better solution, I discovered the XML method.

To give you a quick example:

/* Create a table variable to play with */
Declare @myTable Table 
    (customerID int, textData varchar(10));
 
/* Populate some test rescords */
Insert Into @myTable
Select 1, 'abc' Union All
Select 1, 'def' Union All
Select 2, 'uvw' Union All
Select 2, 'xyz'
 
/* Just take a look at the data
   before we format it */
Select * From @myTable;
 
/* Let's take a look at what
   For XML Raw will return 
   for us */
Select textData
From @myTable
Where customerID = 1
Order By textData
For XML Raw;
 
/* Now consolidate the data, using
   the For XML Raw option to 
   concatenate the textData column */
Select customerID
        , Replace( Replace( Replace(
            (   Select textData
                From @myTable As a
                Where a.customerID = b.customerID
                Order By textData
                For XML Raw)
                , '"/><row textData="', ', ')
                , '<row textData="', '')
                , '"/>', '')
            As 'textData'
From @myTable b
Group By customerID;

This has become my default method for handling this report requirement. While discussing this with a colleague, he mentioned using an approach that was similar in design but used Cross Apply on the XML. Wanting to see which performed better, I ran the two following queries in the AdventureWorks sample database (2008):

/* Method 1 */
Select ProductsOrdered
    , Count(*) As 'salesOrders'
From (
    Select SalesOrderID
            , Replace( Replace( Replace(
                (   Select Top 10 ProductID
                    From Sales.SalesOrderDetail As sod With (NoLock)
                    Where soh.SalesOrderID = sod.SalesOrderID
                    Order By ProductID
                    For XML Raw)
                    , '"/><row ProductID="', ', ')
                    , '<row ProductID="', '')
                    , '"/>', '')
                As 'ProductsOrdered'
    From Sales.SalesOrderHeader As soh With (NoLock)
) x
Group By ProductsOrdered
Order By Count(*) Desc
Option (MaxDop 1);
 
/* Method 2 */
Select ProductsOrdered
    , Count(*) As 'salesOrders'
From (
    Select SalesOrderID
        , SubString(ProductsOrdered, 1, Len(ProductsOrdered) - 1) 
              As 'ProductsOrdered'
    From Sales.SalesOrderHeader As soh With (NoLock)
    Cross Apply (
                    Select Top 10 
                        Cast(ProductID As varchar(10)) + ', '
                    From Sales.SalesOrderDetail As sod With (NoLock)
                    Where sod.SalesOrderID = soh.SalesOrderID
                    Order By ProductID
                    For XML Path('')
                ) X(ProductsOrdered)
) x
Group By ProductsOrdered
Order By Count(*) Desc;

Here’s the results: (click to enlarge)

Row Concatenation Comparison

Row Concatenation Comparison

As evidenced in the image above, the first method has slightly higher CPU and double the duration. The 2nd method had almost double the writes and significantly more reads.

I was hoping for a clear winner, but apparently each method has its benefits. I’ll probably continue to stick with my original, more resource-friendly method (Method 1), unless someone suggests a better solution. :)

Max INT Identity Value Reached (DBCC CheckIdent)

One of my colleagues shared the following experience.

Recently, the identity column on one of his tables reached the maximum value allowable for an INT data type (2,147,483,647). Obviously, this caused all sorts of errors, as the application was unable to insert records into the table. This is a customer-facing table, so the issue needed to be resolved quickly. Converting the column to a BIGINT would require a significant amount of down-time. The solution? Reseed the identity column to -2,147,483,648 and have the values count upward toward zero. This took only seconds and the app was back up immediately. This is, of course, a temporary solution; it allows the team to schedule a more convenient time to resolve the issue.

Here’s how to do this:

DBCC CHECKIDENT ('tableName', RESEED, -2147483648);

Thanks, Peter, for sharing this tip!

Partitioning Example

The following code will walk you through the process of creating a partitioned table:

--------------------------------
-- Create A Partitioned Table --
--------------------------------
 
/* Create a partition function. */
Create Partition Function [test_monthlyDateRange_pf] (smalldatetime) 
    As Range Right For Values 
    ('2008-01-01', '2008-02-01', '2008-03-01');
Go
 
/* Associate the partition function with a partition scheme. */
Create Partition Scheme test_monthlyDateRange_ps 
    As Partition test_monthlyDateRange_pf 
    All To ([Primary]);
Go
 
/* Create your first partitioned table! 
   Make sure the data types match.  */
Create Table dbo.orders
(
      order_id  int Identity(1,1)   Not Null
    , orderDate smalldatetime       Not Null
 
    Constraint PK_orders Primary Key Clustered
    (
        orderDate
      , order_id
    )
) On test_monthlyDateRange_ps(orderDate);
Go
 
/* Create some records to play with. */
Insert Into dbo.orders
Select '2007-12-31' Union All
Select '2008-01-02' Union All
Select '2008-01-03' Union All
Select '2008-01-04' Union All
Select '2008-02-01' Union All
Select '2008-02-02' Union All
Select '2008-03-01' Union All
Select '2008-03-02';
 
/* The $partition function can be used to interrogate partition data.
   Let's use it to see where those records are physically located. */
Select $partition.test_monthlyDateRange_pf(orderDate) 
    As 'partition_number'
    , *
From dbo.orders;
 
/* By default, all new indexes are created on the partition.
   Let's create an aligned index */
Create NonClustered Index IX_orders_aligned
    On dbo.orders(order_id)
    On test_monthlyDateRange_ps(orderDate);
 
/* Now let's create an un-aligned index.
   We'll need to specify the filegroup. */
Create NonClustered Index IX_orders_nonaligned
    On dbo.orders(order_id)
    On [Primary]; -- can be any filegroup
 
/* Review your indexes */
Execute sp_helpindex orders;

Using the previous code as a building block, let’s try swapping partitions:

--------------------------
-- Swap Out A Partition --
--------------------------
 
/* We need to drop our un-aligned index; otherwise we'll 
   get an error when we attempt to do the switch. */
Drop Index IX_orders_nonaligned On dbo.orders;
 
/* Create the table to hold the data you're swapping out.
   The table structures must match identically; however, 
   DO NOT partition this table. */
Create Table dbo.orders_stage_swapOut
(
      order_id  int             Not Null
    , orderDate smalldatetime   Not Null
 
    Constraint PK_orders_stage_swapOut Primary Key Clustered
    (
        orderDate
      , order_id
    )
) On [Primary];
Go
 
/* Create the table to hold the data you're swapping in.
   The table structures must match identically; however,
   DO NOT partition this table. */
Create Table dbo.orders_stage_swapIn
(
      order_id  int             Not Null
    , orderDate smalldatetime   Not Null
 
    Constraint PK_orders_stage_swapIn Primary Key Clustered
    (
        orderDate
      , order_id
    )
) On [Primary];
Go
 
/* Populate the table you're swapping in. */
Insert Into dbo.orders_stage_swapIn
Select -5, '2008-02-02' Union All
Select -4, '2008-02-03' Union All
Select -3, '2008-02-04' Union All
Select -2, '2008-02-05' Union All
Select -1, '2008-02-06';
 
/* Create any indexes on your table to match the 
   indexes on your partitioned table. */
Create NonClustered Index IX_orders_stage_swapIn
    On dbo.orders_stage_swapIn(order_id);
 
/* Add a check constraint for the 
   partition to be swapped in. 
   This step is required. */
Alter Table dbo.orders_stage_swapIn
With Check
Add Constraint orders_stage_swapIn_orderDateCK
	Check (orderDate >= '2008-02-01' 
		And orderDate < '2008-03-01');
Go
 
/* Swap out the old partition. */
Alter Table dbo.orders 
    Switch Partition 3 To dbo.orders_stage_swapOut;
Go
 
/* Swap in the new partition. */
Alter Table dbo.orders_stage_swapIn
    Switch To dbo.orders Partition 3;
Go
 
---------------------
-- Check your data --
---------------------
 
/* You should have 2 records in here. */
Select * From dbo.orders_stage_swapOut;
 
/* You should have 5 records here. */
Select * 
From dbo.orders
Where orderDate >= '2008-02-01' 
And orderDate < '2008-03-01';
 
/* There should be no records in this table. */
Select * From dbo.orders_stage_swapIn;
 
Select $partition.test_monthlyDateRange_pf(orderDate) 
    As 'partition_number'
    , *
From dbo.orders;
 
/* Clean-up time!
Drop Table dbo.orders
Drop Table dbo.orders_stage_swapOut
Drop Table dbo.orders_stage_swapIn
Drop Partition Scheme test_monthlyDateRange_ps
Drop Partition Function [test_monthlyDateRange_pf]
*/

Pretty easy, huh? :)

Index Interrogation Script

As a continuation of the last post on indexes, here’s a script I wrote for index interrogation.  This stored procedure was the result of repeatedly having to lookup information on included columns.

If ObjectProperty(Object_ID('dbo.dba_indexLookup_sp'), 
        N'IsProcedure') Is Null
Begin
    Execute ('Create Procedure dbo.dba_indexLookup_sp 
        As Print ''Hello World!''')
    RaisError('Procedure dbo.dba_indexLookup_sp created.', 10, 1);
End;
Go
 
Set ANSI_Nulls On;
Set Ansi_Padding On;
Set Ansi_Warnings On;
Set ArithAbort On;
Set Concat_Null_Yields_Null On;
Set NoCount On;
Set Numeric_RoundAbort Off;
Set Quoted_Identifier On;
Go
 
Alter Procedure dbo.dba_indexLookup_sp
 
        /* Declare Parameters */
        @tableName  varchar(128)  =  Null
As
/***********************************************************************
    Name:       dba_indexLookup_sp
 
    Author:     Michelle F. Ufford
 
    Purpose:    Retrieves index information for the specified table.
 
    Notes:      If the tableName is left null, it will return index 
                information for all tables and indexes.
 
    Called by:  DBA
 
    Date        User   Description
    --------------------------------------------------------------------
    2008-10-28  MFU    Initial Release
************************************************************************
    Exec dbo.dba_indexLookup_sp
        @tableName = 'myTableName';
***********************************************************************/
 
Set NoCount On;
Set XACT_Abort On;
 
Begin
 
    Declare @objectID int;
 
    If @tableName Is Not Null
        Set @objectID = Object_ID(@tableName);
 
    With indexCTE(partition_scheme_name
                , partition_function_name
                , data_space_id)
    As (
        Select sps.name
            , spf.name
            , sps.data_space_id
        From sys.partition_schemes As sps
        Join sys.partition_functions As spf
            On sps.function_id = spf.function_id
    )
 
    Select st.name As 'table_name'
        , IsNull(ix.name, '') As 'index_name'
        , ix.object_id
        , ix.index_id
		, Cast(
            Case When ix.index_id = 1 
                    Then 'clustered' 
                When ix.index_id =0
                    Then 'heap'
                Else 'nonclustered' End
			+ Case When ix.ignore_dup_key <> 0 
                Then ', ignore duplicate keys' 
                    Else '' End
			+ Case When ix.is_unique <> 0 
                Then ', unique' 
                    Else '' End
			+ Case When ix.is_primary_key <> 0 
                Then ', primary key' Else '' End As varchar(210)
            ) As 'index_description'
        , IsNull(Replace( Replace( Replace(
            (   
                Select c.name As 'columnName'
                From sys.index_columns As sic
                Join sys.columns As c 
                    On c.column_id = sic.column_id 
                    And c.object_id = sic.object_id
                Where sic.object_id = ix.object_id
                    And sic.index_id = ix.index_id
                    And is_included_column = 0
                Order By sic.index_column_id
                For XML Raw)
                , '"/><row columnName="', ', ')
                , '<row columnName="', '')
                , '"/>', ''), '')
            As 'indexed_columns'
        , IsNull(Replace( Replace( Replace(
            (   
                Select c.name As 'columnName'
                From sys.index_columns As sic
                Join sys.columns As c 
                    On c.column_id = sic.column_id 
                    And c.object_id = sic.object_id
                Where sic.object_id = ix.object_id
                    And sic.index_id = ix.index_id
                    And is_included_column = 1
                Order By sic.index_column_id
                For XML Raw)
                , '"/><row columnName="', ', ')
                , '<row columnName="', '')
                , '"/>', ''), '')
            As 'included_columns'
        , IsNull(cte.partition_scheme_name, '') 
            As 'partition_scheme_name'
        , Count(partition_number) As 'partition_count'
        , Sum(rows) As 'row_count'
    From sys.indexes As ix
    Join sys.partitions As sp
        On ix.object_id = sp.object_id
        And ix.index_id = sp.index_id
    Join sys.tables As st
        On ix.object_id = st.object_id
    Left Join indexCTE As cte
        On ix.data_space_id = cte.data_space_id
    Where ix.object_id = IsNull(@objectID, ix.object_id)
    Group By st.name
        , IsNull(ix.name, '')
        , ix.object_id
        , ix.index_id
		, Cast(
            Case When ix.index_id = 1 
                    Then 'clustered' 
                When ix.index_id =0
                    Then 'heap'
                Else 'nonclustered' End
			+ Case When ix.ignore_dup_key <> 0 
                Then ', ignore duplicate keys' 
                    Else '' End
			+ Case When ix.is_unique <> 0 
                Then ', unique' 
                    Else '' End
			+ Case When ix.is_primary_key <> 0 
                Then ', primary key' Else '' End As varchar(210)
            )
        , IsNull(cte.partition_scheme_name, '')
        , IsNull(cte.partition_function_name, '')
    Order By table_name
        , index_id;
 
    Set NoCount Off;
    Return 0;
End
Go