Identity Columns: Are You Nearing The Limits?

I use identity columns frequently. After all, identity columns make great clustering keys. But it’s important when using identity columns to check on the amount of values you have left before you reach the limit of your data type. An identity column has a fixed amount of values it can use based upon whether you specified tinyint, smallint, int, or bigint when you defined the column. If you reach this limit, your inserts will blow up and cause a Chernobyl-like SQL meltdown will begin to fail. I just finished an audit of my tables and thought I’d share the script. I would like to warn that this script is *not* perfect… namely, it doesn’t handle negative integer values very elegantly. It also doesn’t know if you started your seed at zero, approached your max positive limit, then reseeded to the negative limit (see my “quick and dirty fix” tip at the end of this article).

/* Define how close we are to the value limit
   before we start throwing up the red flag.
   The higher the value, the closer to the limit. */
Declare @threshold decimal(3,2) = .85;
/* Create a temp table */
Create Table #identityStatus
      database_name     varchar(128)
    , table_name        varchar(128)
    , column_name       varchar(128)
    , data_type         varchar(128)
    , last_value        bigint
    , max_value         bigint
/* Use an undocumented command to run a SQL statement
   in each database on a server */
Execute sp_msforeachdb '
    Use [?];
    Insert Into #identityStatus
    Select ''?'' As [database_name]
        , Object_Name(id.object_id, DB_ID(''?'')) As [table_name]
        , As [column_name]
        , As [data_type]
        , Cast(id.last_value As bigint) As [last_value]
        , Case 
            When = ''tinyint''   Then 255 
            When = ''smallint''  Then 32767 
            When = ''int''       Then 2147483647 
            When = ''bigint''    Then 9223372036854775807
          End As [max_value]
    From sys.identity_columns As id
    Join sys.types As t
        On id.system_type_id = t.system_type_id
    Where id.last_value Is Not Null';
/* Retrieve our results and format it all prettily */
Select database_name
    , table_name
    , column_name
    , data_type
    , last_value
    , Case 
        When last_value < 0 Then 100
        Else (1 - Cast(last_value As float(4)) / max_value) * 100 
      End As [percentLeft]
    , Case 
        When Cast(last_value As float(4)) / max_value >= @threshold
            Then 'warning: approaching max limit'
        Else 'okay'
        End As [id_status]
From #identityStatus
Order By percentLeft;
/* Clean up after ourselves */
Drop Table #identityStatus;

If you find yourself quickly approaching your max limit and need to implement a quick and dirty fix, you can reseed your identity column. Of course, this only works if you started at zero instead of the actual lower, negative limit.

SQL Agent Job Script

This blog post is brought to you by the awesomeness that is the SQL Server Twitter community. Contributions were made by several awesome Tweeples, including Denny Cherry, Kendra Little, Ken Simmons, and Magnus Ahlkvist, among others.

What started this is something you’ve all probably run into at one time or another. We’re currently having some resource contention on our server when a ton of processes kickoff and overlap in the morning. Now, I have a script that I’ve used in the past for monitoring SQL agent jobs, but this time I wanted to add some historical run-time information. I know the sysjobhistory table contains the information I need, but it has some… let’s say, creative ways of storing the data. Opinions on the reasons why vary:

Regardless the reason, I needed to do some conversions. Denny shared with us the msdb.dbo.agent_datetime function to convert run_date and run_time into an actual datetime value. I have to say, this certainly cleans up the code quite a bit! Then Magnus shared a method to convert run_duration into seconds, which he modified from a script on SQLServerCentral. I was able to use these two tidbits to update my old script. You can now run this script to get back a list of all enabled procs, a script that will disable them, a breakdown of the schedule, and a historical run-time average.

/*  How many days do you want to include in your run-time average?
    Recent values tend to be more useful. */
Declare @daysToAverage smallint = 30;
Declare @avgRunTime Table
      job_id      uniqueidentifier
    , avgRunTime  int
/* We need to parse the schedule into something we can understand */
Declare @weekDay Table (
      mask       int
    , maskValue  varchar(32)
Insert Into @weekDay
Select 1, 'Sunday'  Union All
Select 2, 'Monday'  Union All
Select 4, 'Tuesday'  Union All
Select 8, 'Wednesday'  Union All
Select 16, 'Thursday'  Union All
Select 32, 'Friday'  Union All
Select 64, 'Saturday';
/* First, let's get our run-time average */
Insert Into @avgRunTime
Select job_id
    , Avg((run_duration/10000) * 3600 + (run_duration/100%100)*60 + run_duration%100) As 'avgRunTime' /* convert HHMMSS to seconds */
From msdb.dbo.sysjobhistory
Where step_id = 0 -- only grab our total run-time
    And run_status = 1 -- only grab successful executions
    And msdb.dbo.agent_datetime(run_date, run_time) >= DateAdd(day, -@daysToAverage, GetDate())
Group By job_id;
/* Now let's get our schedule information */
With myCTE
    Select As 'scheduleName'
        , sched.schedule_id
        , jobsched.job_id
        , Case When sched.freq_type = 1 Then 'Once' 
            When sched.freq_type = 4 
                And sched.freq_interval = 1 
                    Then 'Daily'
            When sched.freq_type = 4 
                Then 'Every ' + Cast(sched.freq_interval As varchar(5)) + ' days'
            When sched.freq_type = 8 Then 
                Replace( Replace( Replace(( 
                    Select maskValue 
                    From @weekDay As x 
                    Where sched.freq_interval & x.mask <> 0 
                    Order By mask For XML Raw)
                , '"/><row maskValue="', ', '), '<row maskValue="', ''), '"/>', '') 
                + Case When sched.freq_recurrence_factor <> 0 
                        And sched.freq_recurrence_factor = 1 
                            Then '; weekly' 
                    When sched.freq_recurrence_factor <> 0 Then '; every ' 
                + Cast(sched.freq_recurrence_factor As varchar(10)) + ' weeks' End
            When sched.freq_type = 16 Then 'On day ' 
                + Cast(sched.freq_interval As varchar(10)) + ' of every '
                + Cast(sched.freq_recurrence_factor As varchar(10)) + ' months' 
            When sched.freq_type = 32 Then 
                Case When sched.freq_relative_interval = 1 Then 'First'
                    When sched.freq_relative_interval = 2 Then 'Second'
                    When sched.freq_relative_interval = 4 Then 'Third'
                    When sched.freq_relative_interval = 8 Then 'Fourth'
                    When sched.freq_relative_interval = 16 Then 'Last'
                End + 
                Case When sched.freq_interval = 1 Then ' Sunday'
                    When sched.freq_interval = 2 Then ' Monday'
                    When sched.freq_interval = 3 Then ' Tuesday'
                    When sched.freq_interval = 4 Then ' Wednesday'
                    When sched.freq_interval = 5 Then ' Thursday'
                    When sched.freq_interval = 6 Then ' Friday'
                    When sched.freq_interval = 7 Then ' Saturday'
                    When sched.freq_interval = 8 Then ' Day'
                    When sched.freq_interval = 9 Then ' Weekday'
                    When sched.freq_interval = 10 Then ' Weekend'
                + Case When sched.freq_recurrence_factor <> 0 
                        And sched.freq_recurrence_factor = 1 Then '; monthly'
                    When sched.freq_recurrence_factor <> 0 Then '; every ' 
                + Cast(sched.freq_recurrence_factor As varchar(10)) + ' months' End
            When sched.freq_type = 64 Then 'StartUp'
            When sched.freq_type = 128 Then 'Idle'
          End As 'frequency'
        , IsNull('Every ' + Cast(sched.freq_subday_interval As varchar(10)) + 
            Case When sched.freq_subday_type = 2 Then ' seconds'
                When sched.freq_subday_type = 4 Then ' minutes'
                When sched.freq_subday_type = 8 Then ' hours'
            End, 'Once') As 'subFrequency'
        , Replicate('0', 6 - Len(sched.active_start_time)) 
            + Cast(sched.active_start_time As varchar(6)) As 'startTime'
        , Replicate('0', 6 - Len(sched.active_end_time)) 
            + Cast(sched.active_end_time As varchar(6)) As 'endTime'
        , Replicate('0', 6 - Len(jobsched.next_run_time)) 
            + Cast(jobsched.next_run_time As varchar(6)) As 'nextRunTime'
        , Cast(jobsched.next_run_date As char(8)) As 'nextRunDate'
    From msdb.dbo.sysschedules As sched
    Join msdb.dbo.sysjobschedules As jobsched
        On sched.schedule_id = jobsched.schedule_id
    Where sched.enabled = 1
/* Finally, let's look at our actual jobs and tie it all together */
Select As 'jobName'
    , sched.scheduleName
    , sched.frequency
    , sched.subFrequency
    , SubString(sched.startTime, 1, 2) + ':' 
        + SubString(sched.startTime, 3, 2) + ' - ' 
        + SubString(sched.endTime, 1, 2) + ':' 
        + SubString(sched.endTime, 3, 2) 
        As 'scheduleTime' -- HH:MM
    , SubString(sched.nextRunDate, 1, 4) + '/' 
        + SubString(sched.nextRunDate, 5, 2) + '/' 
        + SubString(sched.nextRunDate, 7, 2) + ' ' 
        + SubString(sched.nextRunTime, 1, 2) + ':' 
        + SubString(sched.nextRunTime, 3, 2) As 'nextRunDate'
      /* Note: the sysjobschedules table refreshes every 20 min, 
        so nextRunDate may be out of date */
    , 'Execute msdb.dbo.sp_update_job @job_id = ''' 
        + Cast(job.job_id As char(36)) + ''', @enabled = 0;' As 'disableScript'
    , art.avgRunTime As 'avgRunTime_inSec' -- in seconds
    , (art.avgRunTime / 60) As 'avgRunTime_inMin' -- convert to minutes
From msdb.dbo.sysjobs As job
Join myCTE As sched
    On job.job_id = sched.job_id
Left Join @avgRunTime As art
    On job.job_id = art.job_id
Where job.enabled = 1 -- do not display disabled jobs
Order By nextRunDate;

If this doesn’t do exactly what you want, check out SQLJobVis, which Ken recommended. It’s a free tool that helps visualize the job history of SQL jobs.

Metadata for Table Valued Parameters

Table-valued parameters (TVP) are a great feature that was new in SQL Server 2008 that allow you to insert a dataset into a table. Previously, the most common way of doing this was by passing and parsing XML. As I’ve previously posted, TVP’s perform an astounding 94% faster than singleton inserts and 75% faster than XML inserts. But for some reason, TVP’s still aren’t widely used and understood. In this post, I’ll walk you through how to use these and how to query the metadata for TVP’s.

I’ve previously posted about what TVP’s are and how to use them. But in honor of Halloween this week, I’ve updated my demo script:

/* Create some tables to work with */
CREATE TABLE dbo.orders
      order_id      INT IDENTITY(1,1)   Not Null
    , orderDate     DATE                Not Null
    , customer      VARCHAR(20)         Not Null
    CONSTRAINT PK_orders
        PRIMARY KEY CLUSTERED(order_id)
CREATE TABLE dbo.orderDetails
      orderDetail_id    INT IDENTITY(1,1)   Not Null
    , order_id          INT                 Not Null
    , lineItem          INT                 Not Null
    , product           VARCHAR(20)         Not Null
    CONSTRAINT PK_orderDetails
        PRIMARY KEY CLUSTERED(orderDetail_id)
    CONSTRAINT FK_orderDetails_orderID
        FOREIGN KEY(order_id)
        REFERENCES dbo.orders(order_id)
/* Create our new table types */
CREATE TYPE dbo.orderTable AS TABLE 
      orderDate     DATE
    , customer      VARCHAR(20)
CREATE TYPE dbo.orderDetailTable AS TABLE 
      lineItem      INT
    , product       VARCHAR(20)
/* Create a new procedure using a table-valued parameter */
CREATE PROCEDURE dbo.insert_orderTVP_sp
      @myOrderTable orderTable READONLY
    , @myOrderDetailTable orderDetailTable READONLY
    DECLARE @myOrderID INT;
    INSERT INTO dbo.orders
    SELECT orderDate
        , customer
    FROM @myOrderTable;
    SET @myOrderID = SCOPE_IDENTITY();
    INSERT INTO dbo.orderDetails
    SELECT @myOrderID
        , lineItem
        , product
    FROM @myOrderDetailTable;
/* Call our new proc! */
DECLARE @myTableHeaderData AS orderTable
    , @myTableDetailData AS orderDetailTable;
INSERT INTO @myTableHeaderData
(orderDate, customer)
VALUES (GETDATE(), 'Zombie');
INSERT INTO @myTableDetailData
(lineItem, product)
SELECT 20, 'More Brains';
EXECUTE dbo.insert_orderTVP_sp 
    , @myTableDetailData;
DELETE FROM @myTableHeaderData;
DELETE FROM @myTableDetailData;
INSERT INTO @myTableHeaderData
(orderDate, customer)
VALUES (GETDATE(), 'Vampire');
INSERT INTO @myTableDetailData
(lineItem, product)
SELECT 10, 'Blood Type O+' UNION ALL
SELECT 20, 'Blood Type B-' UNION ALL
SELECT 30, 'Blood Type AB+' UNION ALL
SELECT 40, 'Blood Type A+';
EXECUTE dbo.insert_orderTVP_sp 
    , @myTableDetailData;
/* Check our data */
SELECT * FROM dbo.orders;
SELECT * FROM dbo.orderDetails;

Once you’ve run this, you should see the following data:

order_id    orderDate  customer
----------- ---------- --------------------
1           2010-10-28 Zombie
2           2010-10-28 Vampire
(2 row(s) affected)
orderDetail_id order_id    lineItem    product
-------------- ----------- ----------- --------------------
1              1           10          Brains
2              1           20          More Brains
3              2           10          Blood Type O+
4              2           20          Blood Type B-
5              2           30          Blood Type AB+
6              2           40          Blood Type A+
(6 row(s) affected)

Now that we’ve successfully created a couple of table types to support our TVP’s, how do we go back and find out which objects we’ve created? You can query the sys.types catalog view to find out. Just search for system_type_id 243, which identifies the record as a table type.

/* Let's check out our new data types */
    , system_type_id
    , is_table_type
FROM sys.types
WHERE system_type_id = 243;
name                 system_type_id is_table_type
-------------------- -------------- -------------
orderTable           243            1
orderDetailTable     243            1
(2 row(s) affected)

Even better yet, you can use the sys.table_types catalog view. This gives us the same information as sys.types but also gives us the type_table_object_id, which we’ll need shortly.

    , system_type_id
    , is_table_type
    , type_table_object_id
FROM sys.table_types;
name                 system_type_id is_table_type type_table_object_id
-------------------- -------------- ------------- --------------------
orderTable           243            1             917578307
orderDetailTable     243            1             933578364
(2 row(s) affected)

What if you need to look up the table type definition? You can do this using the type_table_object_id and joining to sys.columns.

SELECT AS 'table_type_name'
    , AS 'column_name'
    , AS 'data_type'
FROM sys.table_types AS tt
JOIN sys.columns AS c
    ON tt.type_table_object_id = c.object_id
JOIN sys.types As t
    ON c.system_type_id = t.system_type_id;
table_type_name      column_name     data_type
-------------------- --------------- ---------------
orderTable           orderDate       date
orderDetailTable     lineItem        int
orderTable           customer        varchar
orderDetailTable     product         varchar
(4 row(s) affected)

And last, but certainly not least, how do we see if any procs are currently using the table types? SQL Server 2008 makes this easy for us with the sys.dm_sql_referencing_entities DMV.

SELECT referencing_schema_name, referencing_entity_name, referencing_id
FROM sys.dm_sql_referencing_entities ('dbo.orderTable', 'TYPE');
referencing_schema_name referencing_entity_name referencing_id
----------------------- ----------------------- --------------
dbo                     insert_orderTVP_sp      949578421
(1 row(s) affected)

If you’re wondering how to implement SQL Server TVP’s in your .NET code, well… I can’t tell you how to do it, but I can point you to a place that can. Stephen Forte has a post that explains how easy it is to do.

So now that you have a better understanding of how to work with TVP’s, why don’t you go implement one in your environment and see how for yourself just how awesome it is? :)

Oh, and Happy Halloween!

Rename Database Objects En Masse

Ever need to rename all objects in a database? Yeah, okay… it doesn’t happen very often, but when it does, it can be time consuming. This recently came up as something I needed to do. When you consider all the tables involved and you add in defaults, indexes, and foreign keys, well… you can imagine how the number of objects adds up quickly. After doing a few tedious renames, it occurred to me that I could write a script to generate the rename scripts for me. Work smarter, not harder, right? For anyone interested, here’s the script.

Select name
    , [object_id] 
    , Case 
        When [type] = 'U' Then 'Execute sp_rename N''' + name + ''', N''old_' + name + ''''
        When [type] IN ('D', 'PK', 'F') Then 
            'Execute sp_rename N''' + name + ''', N''old_' + name + ''', N''OBJECT'''
        End As 'renameScript'
    , Case When parent_object_id > 0 Then 0 Else 1 End As 'sortOrder'
From sys.objects
Where [type] In ('D', 'PK', 'U', 'F')
Union ALL
    , o.[object_id]
    , 'Execute sp_rename N''' + + '.' + + ''', N''old_' + + ''', N''INDEX''' As 'renameScript'
    , 0 As 'sortOrder'
From sys.indexes As i
JOIN sys.objects As o
    On i.object_id = o.object_id
Where i.is_primary_key = 0 -- exclude PKs, we take care of that above
    AND i.type <> 0 -- exclude heaps
    AND o.type Not In ('S', 'IT') -- exclude system & internal tables
Order By sortOrder;

Be forewarned that I only tested this on a couple of databases, but it seemed to run without problem for tables, indexes, primary keys, defaults, and foreign keys. The sortOrder column is there only to ensure that table renames are performed last. Otherwise, your index renames would fail. This will only run on SQL Server 2005 or 2008. If you have any problems with the script, please let me know. :)

Check VLF Counts

Today I stumbled across a database with 87,302 VLF’s. Yes, that’s right… 87 THOUSAND. Most of our databases have a few dozen VLF’s, but this was an old database that had grown to 1.5 TB and had the default autogrowth settings left in tact. How did we discover this? During a routine reboot of the server, this database took 30 minutes to recover, but there were no error messages or status messages in the log.

Now, this blog post is not about VLF’s or why you should keep the number of VLF’s to a small, manageable number — although I hear under 50 is a good rule of thumb. No, the purpose of this blog post is to share a little script I wrote to check the number of VLF’s each database uses:

Create Table #stage(
    FileID      int
  , FileSize    bigint
  , StartOffset bigint
  , FSeqNo      bigint
  , [Status]    bigint
  , Parity      bigint
  , CreateLSN   numeric(38)
Create Table #results(
    Database_Name   sysname
  , VLF_count       int 
Exec sp_msforeachdb N'Use ?; 
            Insert Into #stage 
            Exec sp_executeSQL N''DBCC LogInfo(?)''; 
            Insert Into #results 
            Select DB_Name(), Count(*) 
            From #stage; 
            Truncate Table #stage;'
Select * 
From #results
Order By VLF_count Desc;
Drop Table #stage;
Drop Table #results;

This script is low-impact and is safe to run on large, production databases during business hours. However, just be aware that it’s using some undocumented commands.

For more information on VLF’s, check out these excellent articles:

Filtered Indexes Work-Around

Recently, I needed to create a stored procedure that queried a rather large table. The table has a filtered index on a date column, and it covers the query. However, the Query Optimizer was not using the index, which was increasing the execution time (not to mention IO!) by at least 10x. This wasn’t the first time I’ve had the Optimizer fail to use a filtered index. Normally when this happens, I use a table hint to force the filtered index — after I verify that it is indeed faster, of course. However, since this was a stored procedure, I was receiving the following error message whenever I tried to execute the proc:

Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.

SQL Server would not allow me to execute the stored procedure using the filtered index hint. If I removed the hint, it executed, but it used a different, non-covering and far more expensive index. For those of you not familiar with this issue, allow me to illustrate the problem.

First, create a table to play with and populate it with some bogus data:

Create Table dbo.filteredIndexTest
      myID   int Identity(1,3)
    , myDate smalldatetime 
    , myData char(100)
    Constraint PK_filteredIndexTest
        Primary Key Clustered(myID)
Set NoCount On;
Declare @date smalldatetime = '2010-01-01';
While @date < '2010-02-01'
    Insert Into dbo.filteredIndexTest
        , myData
    Select @date
        , 'Date: ' + Convert(varchar(20), @date, 102);
    Set @date = DateAdd(minute, 1, @date);
Select Count(*) From dbo.filteredIndexTest;

It looks like this will generate 44,640 rows of test data… plenty enough for our purposes. Now, let’s create our filtered index and write a query that will use it:

Create NonClustered Index IX_filteredIndexTest_1
    On dbo.filteredIndexTest(myDate)
    Include (myData)
    Where myDate >= '2010-01-27';
Select Distinct myData
From dbo.filteredIndexTest
Where myDate >= '2010-01-28';

If you look at the execution plan for this query, you’ll notice that the Optimizer is using the filtered index. Perfect! Now let’s parameterize it.

Declare @myDate1 smalldatetime = '2010-01-28';
Select Distinct myData
From dbo.filteredIndexTest
Where myDate >= @myDate1;

Uh oh. Looking at the execution plan, we see that SQL Server is no longer using the filtered index. Instead, it’s scanning the clustered index! Why is this? There’s actually a good explanation for it. The reason is that I could, in theory, pass a date to my parameter that fell outside of the filtered date range. If that’s the case, then SQL Server could not utilize the filtered index. Personally, I think it’s a bug and SQL Server should identify whether or not a filtered index could be used based on the actual value submitted, but… that’s a whole other blog post. :)

So what can we do? Well, dynamic SQL may be able to help us out in this case. Let’s give it a go. First, let’s try parameterized dynamic SQL.

Declare @mySQL1 nvarchar(2000)
    , @myParam nvarchar(2000) = '@p_myDate2 smalldatetime'
    , @myDate2 smalldatetime = '2010-01-28';
Set @mySQL1 = 'Select Distinct myData
              From dbo.filteredIndexTest
              Where myDate >= @p_myDate2';
Execute sp_executeSQL @mySQL1, @myParam, @p_myDate2 = @myDate2;

Looking at the execution plan, we see we’re still scanning on the clustered index. This is because the parameterized dynamic SQL resolves to be the exact same query as the one above it. Let’s try unparameterized SQL instead:

Declare @mySQL2 nvarchar(2000)
    , @myDate3 smalldatetime = '2010-01-28';
Set @mySQL2 = 'Select Distinct myData
              From dbo.filteredIndexTest
              Where myDate >= ''' + Cast(@myDate3 As varchar(20)) + '''';
Execute sp_executeSQL @mySQL2;
-- Drop Table dbo.filteredIndexTest;

Voila! We have a seek on our filtered index. Why? Because the statement resolves to be identical to our first query, where we hard-coded the date value in the WHERE clause.

Now, I want to stress this fact: you should always, ALWAYS use parameterized dynamic SQL whenever possible. Not only is it safer, but it’s also faster, because it can reuse cached plans. But sometimes you just cannot accomplish the same tasks with it. This is one of those times. If you do end up needing to use unparameterized dynamic SQL as a work-around, please make sure you’re validating your input, especially if you’re interfacing with any sort of external source.

There’s an even easier work-around for this problem that Dave ( shared with me: recompile.

Adding “Option (Recompile)” to the end of your statements will force the Optimizer to re-evaluate which index will best meet the needs of your query every time the statement is executed. More importantly, it evaluates the plan based on the actual values passed to the parameter… just like in our hard-coded and dynamic SQL examples. Let’s see it in action:

DECLARE @myDate4 SMALLDATETIME = '2010-01-28';
FROM dbo.filteredIndexTest 
WHERE myDate >= @myDate4
DECLARE @myDate5 SMALLDATETIME = '2010-01-20';
FROM dbo.filteredIndexTest 
WHERE myDate >= @myDate5

If we look at the execution plans for the 2 queries above, we see that the first query seeks on the filtered index, and the second query scans on the clustered index. This is because the second query cannot be satisfied with the filtered index because we initially limited our index to dates greater than or equal to 1/27/2010.

There are, of course, trade-offs associated with each approach, so use whichever one best meets your needs. Do you have another work-around for this issue? If so, please let me know. :)


Alex Kuznetsov ( shared this method too:

DECLARE @myDate1 SMALLDATETIME = '2010-01-28';
FROM dbo.filteredIndexTest
WHERE myDate = @myDate1
AND myDate >= '2010-01-27';

Like the other examples, this will result in an index seek on the filtered index. Basically, by explicitly declaring the start date of your filter, you’re letting the Optimizer know that the filtered index can satisfy the request, regardless of the parameter value passed. Thanks for the tip, Alex! :)

Replication Bug with Partitioned Tables

Recently, we came across a bug in SQL Server 2005 on one of our production servers. Apparently, if you execute an ALTER TABLE statement on a replicated table with more than 128 partitions, the log reader will fail. A relatively obscure bug, I know. Microsoft has recognized this as a confirmed bug, but I couldn’t find it anywhere on the intertubes, thus the inspiration for this blog post. Microsoft’s official solution for this issue is to upgrade to SQL Server 2008.

For various reasons, we were unable to execute an upgrade at the time. And since this was a 2 terabyte database, we wanted to come up with a solution that wouldn’t involve reinitializing the entire publication. Our quick-fix while we were troubleshooting the issue was to create a linked server to the production box. Not ideal, I know, but it worked in a pinch and minimized exposure of the issue. Fortunately for us, we were able to solve the problem on the publication database pretty easily. All of the affected partition functions had empty partitions created several months in the future, so we simply merged any empty partition ranges for future dates. Our solution to our now-out-of-date subscribers was to apply static row filtering to any table with more than 100 million records. While this would introduce some overhead with the replication of these tables, it would allow us a much faster recovery time. We decided to use the start of the most recent hour as our filtering criteria, just to give us a “clean” filter, so we had to delete data from any table where we were going to apply the filter. After that, it was simply a matter of resuming replication.

All things considered, it took us a little over a day to recover from the issue. Most of that time was spent troubleshooting the problem and identifying a workable solution; actual execution of the changes was pretty quick. Moral of the story? Upgrade to SQL Server 2008. :)

Partitioning Tricks

For those of you who are using partitioning, or who are considering using partitioning, allow me to share some tips with you.

Easy Partition Staging Tables

Switching partitions (or more specifically, hobts) in and out of a partitioned table requires the use of a staging table. The staging table has very specific requirements: it must be completely identical to the partitioned table, including indexing structures, and it must have a check constraint that limits data to the partitioning range. Thanks to my co-worker Jeff, I’ve recently started using the SQL Server Partition Management tool on CodePlex. I haven’t used the automatic partition switching feature — frankly, using any sort of data modification tool in a production environment makes me nervous — but I’ve been using the scripting option to create staging tables in my development environment, which I then copy to production for use. It’s nothing you can’t do yourself, but it does make the whole process easy and painless, plus it saves you from annoying typos. But be careful when using this tool to just create the table and check constraints automatically, because you may need to…

Add Check Constraints After Loading Data

Most of the time, I add the check constraint when I create the staging table, then I load data and perform the partition switch. However, for some reason, I was receiving the following error:

.Net SqlClient Data Provider: Msg 4972, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. Check constraints or partition function of source table ‘myStagingTable’ allows values that are not allowed by check constraints or partition function on target table ‘myDestinationTable’.

This drove me crazy. I confirmed my check constraints were correct, that I had the correct partition number, and that all schema and indexes matched identically. After about 30 minutes of this, I decided to drop and recreate the constraint. For some reason, it fixed the issue. Repeat tests produced the same results: the check constraint needed to be added *after* data was loaded. This error is occurring on a SQL Server 2008 SP1 box; to be honest, I’m not sure what’s causing the error, so if you know, please leave me a comment. But I figured I’d share so that anyone else running into this issue can hopefully save some time and headache. :)

Replicating Into Partitioned and Non-Partitioned Tables

Recently, we needed to replicate a non-partitioned table to two different destinations. We wanted to use partitioning for Server A, which has 2008 Enterprise; Server B, which is on 2005 Standard, could not take advantage of partitioning. The solution was really easy: create a pre-snapshot and post-snapshot script for the publication, then modify to handle each server group differently. Using pseudo-code, it looked something like this:

/* Identify which servers get the partitioned version */
If @@ServerName In ('yourServerNameList') 
    /* Create your partitioning scheme if necessary */
    If Not Exists(Select * From sys.partition_schemes Where name = 'InsertPartitionScheme')
        CREATE PARTITION SCHEME InsertPartitionScheme 
            AS PARTITION InsertPartitionFunction ALL TO ([PRIMARY]);    
    /* Create your partitioning function if necessary */
    If Not Exists(Select * From sys.partition_functions Where name = 'InsertPartitionFunction')
        CREATE PARTITION FUNCTION InsertPartitionFunction (smalldatetime) 
            AS RANGE RIGHT FOR VALUES ('insertValues');    
    /* Create a partitioned version of your table */
    CREATE TABLE [dbo].[yourTableName] (
    ) ON InsertPartitionScheme([partitioningKey]);
    /* Create a non-partitioned version of your table */
    CREATE TABLE [dbo].[yourTableName] (
    ) ON [Primary];

You could also use an edition check instead of a server name check, if you prefer. The post-snapshot script basically looked the same, except you create partitioned indexes instead.

Compress Old Partitions

Did you know you can set different compression levels for individual partitions? It’s true! I’ve just completed doing this on our largest partitioned table. Here’s how:

/* Apply compression to your partitioned table */
Alter Table dbo.yourTableName
Rebuild Partition = All
      Data_Compression = Page On Partitions(1 to 9)
    , Data_Compression = Row  On Partitions(10 to 11) 
    , Data_Compression = None On Partitions(12)
/* Apply compression to your partitioned index */
Alter Index YourPartitionedIndex
    On dbo.yourTableName
    Rebuild Partition = All
      Data_Compression = Page On Partitions(1 to 9)
    , Data_Compression = Row  On Partitions(10 to 11) 
    , Data_Compression = None On Partitions(12)
/* Apply compression to your unpartitioned index */
Alter Index YourUnpartitionedIndex
    On dbo.yourTableName
    Rebuild With (Data_Compression = Row);

A couple of things to note. In all of our proof-of-concept testing, we found that compression significantly reduced query execution time, reads (IO), and storage. However, CPU was also increased significantly. The results were more dramatic, both good and bad, with page compression versus row compression. Still, for our older partitions, which aren’t queried regularly, it made sense to turn on page compression. The newer partitions receive row compression, and the newest partitions, which are still queried very regularly by routine processes, were left completely uncompressed. This seems to strike a nice balance in our environment, but of course, results will vary depending on how you use your data.

Something to be aware of is that compressing your clustered index does *not* compress your non-clustered indexes; those are separate operations. Lastly, for those who are curious, it took us about 1 minute to apply row compression and about 7 minutes to apply page compression to partitions averaging 30 million rows.

Looking for more information on table partitioning? Check out my overview of partitioning, my example code, and my article on indexing on partitioned tables.

Find Recently Executed Stored Procedures

This past weekend, we had an issue where replication fell far behind on one of our databases. The replicated database is used for all sorts of reporting, so the immediate need was to identify processes that may have been affected by the incomplete data.

Now, there’s hundreds of stored procedures that reference the affected database; the trick is finding out which ones are relevant. To do this, I used the sys.dm_exec_query_stats DMV. This does two things for me. One, it shows me a list of stored procedures in cache, meaning they’ve been executed relatively recently and are probably relevant to the search. Secondly, it shows me the last execution time, which in some cases may have been before the issue, meaning I do not need to worry about re-running those processes.

Here’s the query I used:

Select DB_Name(dest.[dbid]) As 'databaseName'
    , Object_Name(dest.objectid, dest.[dbid]) As 'procName'
    , Max(deqs.last_execution_time) As 'last_execution'
From sys.dm_exec_query_stats As deqs
Cross Apply sys.dm_exec_sql_text(deqs.sql_handle) As dest
Where dest.[text] Like '%yourTableName%' -- replace
    And dest.[dbid] Is Not Null  -- exclude ad-hocs
Group By db_name(dest.[dbid])
    , Object_Name(dest.objectid, dest.[dbid])
Order By databaseName
    , procName
Option (MaxDop 1);

This will return results similar to:

databaseName         procName                       last_execution
-------------------- ------------------------------ -----------------------
AdventureWorks       ufnGetProductListPrice         2009-08-03 09:57:25.390
AdventureWorksDW     DimProductCategoryGet_sp       2009-08-03 09:59:05.820
AdventureWorksDW     DimProductGet_sp               2009-08-03 09:58:38.370

I want to stress that this is *not* a list of all referencing objects, but rather a list of recently executed stored procedures that are still in memory. This list may not be accurate if your cache has recently been flushed or if you’ve recently rebooted your server.

Poor (Wo)Man’s Graph

Lary shared this poor (wo)man’s graph with me today, and I thought it was pretty awesome:

Select OrderDate 
    , COUNT(*) As 'orders'
    , REPLICATE('=', COUNT(*)) As 'orderGraph'
    , SUM(TotalDue) As 'revenue'
    , REPLICATE('$', SUM(TotalDue)/1000) As 'revenueGraph'
From AdventureWorks.Sales.SalesOrderHeader
Where OrderDate Between '2003-07-15' And '2003-07-31'
Group By OrderDate
Order By OrderDate;

This will return a simple but effective “graph” for you:

orderDate  orders orderGraph                     revenue  revenueGraph
---------- ------ ------------------------------ -------- ----------------------------------------
2003-07-15 19     ===================            34025.24 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$      
2003-07-16 14     ==============                 26687.65 $$$$$$$$$$$$$$$$$$$$$$$$$$$             
2003-07-17 16     ================               32411.93 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$        
2003-07-18 9      =========                      18634.91 $$$$$$$$$$$$$$$$$$$                     
2003-07-19 13     =============                  19603.23 $$$$$$$$$$$$$$$$$$$$                    
2003-07-20 24     ========================       47522.80 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
2003-07-21 9      =========                      11781.62 $$$$$$$$$$$$                            
2003-07-22 17     =================              32322.50 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$        
2003-07-23 15     ===============                30906.44 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$         
2003-07-24 28     ============================   51107.90 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
2003-07-25 15     ===============                27058.10 $$$$$$$$$$$$$$$$$$$$$$$$$$$             
2003-07-26 18     ==================             41076.49 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
2003-07-27 15     ===============                22169.88 $$$$$$$$$$$$$$$$$$$$$$                  
2003-07-28 16     ================               23945.80 $$$$$$$$$$$$$$$$$$$$$$$$                
2003-07-29 25     =========================      51122.95 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
2003-07-30 12     ============                   23476.44 $$$$$$$$$$$$$$$$$$$$$$$                 
2003-07-31 18     ==================             36266.76 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

Who needs Reporting Services when you’ve got REPLICATE? ;)