Index Defrag Script Updates – Beta Testers Needed

Update: Wow! I’ve received a ton of responses to my request for beta testers. Thank you all! The SQL Community is really amazing. I’ll hopefully have the new version online in just a few days. :)

Over the last few months, I’ve received many great comments and suggestions regarding my Index Defrag Script v3.0. I’ve just recently had time to implement most of these suggestions, plus some other things that I thought would be useful. :)

Here’s some of what you can look forward to shortly:

  • Probably the single most requested feature: the new version of the script lets you set a time limit for index defrags.
  • There’s now a static table for managing the status of index defrags. This way, when your time limit is reached, you can pick up where you left off the next day, without the need to rescan indexes.
  • There’s now an option to prioritize defrags by range scan counts, fragmentation level, or page counts.
  • For those using partitioning, there is now an option to exclude the right-most populated partition from defrags (in theory, the one you’re writing to in a sliding-window scenario).
  • Options such as page count limits and SORT_IN_TEMPDB are now parameterized.
  • I’ve enhanced error logging.
  • … and more!

Right now, I’m looking for a few folks who are willing to beta test the script. If you’re interested, please send me an e-mail at michelle at sqlfool dot com with the editions of SQL Server you can test this on (e.g. 2005 Standard, 2008 Enterprise, etc.).

Thank you! :)

#PASSAwesomeness

Allen Kinsel on Twitter (@sqlinsaneo) recently started a new Twitter tag, #PASSAwesomeness, about all of the cool things about PASS Summit. I really like the tag, so I’m going to blatantly steal… er, borrow it for this post. :)

First, and long overdue, I want to give a brief recap of the East Iowa SQL Saturday. On October 17th, our local PASS chapter, 380PASS, sponsored our first ever SQL Saturday at the University of Iowa in Iowa City. By all accounts, the event was a great success! We had 90 attendees, 11 speakers, and 21 sessions. We received numerous compliments on the quality of the speakers, the niceness of the facilities, and the abundance of food. Not too shabby for our first time hosting the event, if I do say so myself. :)

I’d like to thank all of our wonderful speakers, especially those who traveled from out of town and out of state, for making this event such a success. I’d also like to thank our amazing volunteers for helping put this all together. Lastly, but certainly not least, I’d like to thank our generous sponsors, without whom this event would not be possible. Because this event went so smoothly and was so well received in the community, we’ve already started planning our next big SQL event! In the meantime, don’t forget to check out our monthly 380PASS meetings to tide you over.

I’d also like to take a moment to discuss the PASS Summit. Unless you’re a DBA who’s been living under a rock, you’ve probably heard of the PASS Summit. If you *have* been living under a rock — and hey, I’m not poking fun, I used to live under a rock, too! — then what you need to know is that the Summit is the largest SQL Server conference in the world. It’s a gathering of Microsoft developers and SQL Server gurus; the rest of us show up to try to absorb as much from them as possible. Since I’ve recently moved to the Business Intelligence team, I’m extremely excited to delve into the amazing amount of BI content offered.

I’m also deeply honored to be presenting at the Summit this year on some of the performance tuning techniques I’ve used with great success in my production environments. The session is titled, Super Bowl, Super Load – A Look At Performance Tuning for VLDB’s. If you’re interested in performance tuning or VLDB (very large database) topics, consider stopping by to catch my session. From what I can tell, I’ll be presenting on Tuesday from 10:15am – 11:30am in room(s?) 602-604.

If you read my blog, or if we’ve ever interacted in any way on the internet — Twitter, LinkedIn, e-mails, blog comments, etc. — please stop by and say “hi”! Aside from all of the awesome SQL Server content, I’m really looking forward to meeting as many new folks as possible.

And on that note…

Getting to meet all of the amazing SQL Server professionals out there who have inspired and encouraged me in so many ways #PASSAwesomeness

Partitioning Tricks

For those of you who are using partitioning, or who are considering using partitioning, allow me to share some tips with you.

Easy Partition Staging Tables

Switching partitions (or more specifically, HoBTs) in and out of a partitioned table requires the use of a staging table. The staging table has very specific requirements: it must be identical to the partitioned table, including indexing structures, and it must have a check constraint that limits data to the partitioning range. Thanks to my co-worker Jeff, I’ve recently started using the SQL Server Partition Management tool on CodePlex. I haven’t used the automatic partition switching feature — frankly, using any sort of data modification tool in a production environment makes me nervous — but I’ve been using the scripting option to create staging tables in my development environment, which I then copy to production for use. It’s nothing you can’t do yourself, but it does make the whole process easy and painless, plus it saves you from annoying typos. But be careful when using this tool to just create the table and check constraints automatically, because you may need to…

Add Check Constraints After Loading Data

Most of the time, I add the check constraint when I create the staging table, then I load data and perform the partition switch. However, for some reason, I was receiving the following error:

.Net SqlClient Data Provider: Msg 4972, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. Check constraints or partition function of source table 'myStagingTable' allows values that are not allowed by check constraints or partition function on target table 'myDestinationTable'.

This drove me crazy. I confirmed my check constraints were correct, that I had the correct partition number, and that all schema and indexes matched identically. After about 30 minutes of this, I decided to drop and recreate the constraint. For some reason, it fixed the issue. Repeat tests produced the same results: the check constraint needed to be added *after* data was loaded. This error is occurring on a SQL Server 2008 SP1 box; to be honest, I’m not sure what’s causing the error, so if you know, please leave me a comment. But I figured I’d share so that anyone else running into this issue can hopefully save some time and headache. :)
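For reference, here’s the order of operations that worked for me, as a sketch; the table names come from the error message above, and the constraint name and date range are made up:

/* Create and load the staging table first, *then* add the check constraint */
Alter Table dbo.myStagingTable With Check
    Add Constraint CK_myStagingTable_range
    Check (myDate >= '2009-01-01' And myDate < '2009-02-01');
 
/* Now the partition switch succeeds */
Alter Table dbo.myStagingTable
    Switch To dbo.myDestinationTable Partition 5;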

Replicating Into Partitioned and Non-Partitioned Tables

Recently, we needed to replicate a non-partitioned table to two different destinations. We wanted to use partitioning for Server A, which is on 2008 Enterprise; Server B, which is on 2005 Standard, could not take advantage of partitioning. The solution was really easy: create a pre-snapshot and a post-snapshot script for the publication, then modify them to handle each server group differently. Using pseudo-code, it looked something like this:

/* Identify which servers get the partitioned version */
If @@ServerName In ('yourServerNameList') 
Begin
 
    /* Create your partitioning function if necessary; it must exist before the scheme */
    If Not Exists(Select * From sys.partition_functions Where name = 'InsertPartitionFunction')
        CREATE PARTITION FUNCTION InsertPartitionFunction (smalldatetime) 
            AS RANGE RIGHT FOR VALUES ('insertValues');    
 
    /* Create your partitioning scheme if necessary */
    If Not Exists(Select * From sys.partition_schemes Where name = 'InsertPartitionScheme')
        CREATE PARTITION SCHEME InsertPartitionScheme 
            AS PARTITION InsertPartitionFunction ALL TO ([PRIMARY]);    
 
    /* Create a partitioned version of your table */
    CREATE TABLE [dbo].[yourTableName] (
	    [yourTableSchema]
    ) ON InsertPartitionScheme([partitioningKey]);
 
End
Else
Begin
 
    /* Create a non-partitioned version of your table */
    CREATE TABLE [dbo].[yourTableName] (
	    [yourTableSchema]
    ) ON [Primary];
 
End

You could also use an edition check instead of a server name check, if you prefer. The post-snapshot script basically looked the same, except you create partitioned indexes instead.
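In rough pseudo-code again, with placeholder index and key names, the partitioned branch creates the clustered index on the partition scheme, while the other branch keeps it on [PRIMARY]:

/* Identify which servers get the partitioned indexes */
If @@ServerName In ('yourServerNameList')
    CREATE CLUSTERED INDEX CIX_yourTableName
        ON dbo.yourTableName ([partitioningKey])
        ON InsertPartitionScheme ([partitioningKey]);
Else
    CREATE CLUSTERED INDEX CIX_yourTableName
        ON dbo.yourTableName ([partitioningKey])
        ON [Primary];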

Compress Old Partitions

Did you know you can set different compression levels for individual partitions? It’s true! I’ve just completed doing this on our largest partitioned table. Here’s how:

/* Apply compression to your partitioned table */
Alter Table dbo.yourTableName
Rebuild Partition = All
With 
(
      Data_Compression = Page On Partitions(1 to 9)
    , Data_Compression = Row  On Partitions(10 to 11) 
    , Data_Compression = None On Partitions(12)
);
 
/* Apply compression to your partitioned index */
Alter Index YourPartitionedIndex
    On dbo.yourTableName
    Rebuild Partition = All
    With 
    (
      Data_Compression = Page On Partitions(1 to 9)
    , Data_Compression = Row  On Partitions(10 to 11) 
    , Data_Compression = None On Partitions(12)
    );
 
/* Apply compression to your unpartitioned index */
Alter Index YourUnpartitionedIndex
    On dbo.yourTableName
    Rebuild With (Data_Compression = Row);

A couple of things to note. In all of our proof-of-concept testing, we found that compression significantly reduced query execution time, reads (IO), and storage. However, CPU utilization also increased significantly. The results were more dramatic, both good and bad, with page compression versus row compression. Still, for our older partitions, which aren’t queried regularly, it made sense to turn on page compression. The newer partitions receive row compression, and the newest partitions, which are still queried very regularly by routine processes, were left completely uncompressed. This seems to strike a nice balance in our environment, but of course, results will vary depending on how you use your data.

Something to be aware of is that compressing your clustered index does *not* compress your non-clustered indexes; those are separate operations. Lastly, for those who are curious, it took us about 1 minute to apply row compression and about 7 minutes to apply page compression to partitions averaging 30 million rows.
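If you want to confirm what each partition actually ended up with, sys.partitions exposes the compression setting. A quick check (substitute your own table name):

/* Verify the compression setting of each partition */
Select p.index_id
    , p.partition_number
    , p.rows
    , p.data_compression_desc
From sys.partitions As p
Where p.[object_id] = Object_Id('dbo.yourTableName')
Order By p.index_id
    , p.partition_number;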

Looking for more information on table partitioning? Check out my overview of partitioning, my example code, and my article on indexing on partitioned tables.

Monitoring Process for Performance Counters

Recently I needed to create a process to monitor performance counters over a short period of time. We were going to implement a change and we wanted to compare performance before and after to see if there was any impact.

To do this, I first created a couple of tables. One table is used to actually store the monitored values. The second table is used for configuration; you insert only the counters you want to monitor.

/* Create the table to store our logged perfmon counters */
Create Table dbo.dba_perfCounterMonitor
(
      capture_id    int Identity(1,1)   Not Null
    , captureDate   smalldatetime       Not Null
    , objectName    nvarchar(128)       Not Null
    , counterName   nvarchar(128)       Not Null
    , instanceName  nvarchar(128)       Not Null
    , value         float(6)            Not Null
    , valueType     nvarchar(10)        Not Null
 
    Constraint PK_dba_perfCounterMonitor
        Primary Key Clustered(capture_id)
);
 
/* Create the table that controls which counters we're going to monitor */
Create Table dbo.dba_perfCounterMonitorConfig
(
      objectName    nvarchar(128)   Not Null
    , counterName   nvarchar(128)   Not Null
    , instanceName  nvarchar(128)   Null
);

If you leave the instanceName NULL in the config table, it’ll monitor all instances. Now we’re going to insert some sample performance counters into the config table. The counters you’re interested in can, and likely will, vary.

/* Insert some perfmon counters to be monitored */
Insert Into dbo.dba_perfCounterMonitorConfig
Select 'SQLServer:Buffer Manager', 'Page Life Expectancy', Null Union All
Select 'SQLServer:Locks', 'Lock Requests/sec', Null Union All
Select 'SQLServer:Locks', 'Lock Waits/sec', Null Union All
Select 'SQLServer:Locks', 'Lock Wait Time (ms)', Null Union All
Select 'SQLServer:Buffer Manager', 'Page reads/sec', Null Union All
Select 'SQLServer:Buffer Manager', 'Page writes/sec', Null Union All
Select 'SQLServer:Buffer Manager', 'Buffer cache hit ratio', Null Union All
Select 'SQLServer:Databases', 'Transactions/sec', 'AdventureWorks' Union All
Select 'SQLServer:General Statistics', 'Processes blocked', Null;

Now let’s create our proc. This proc will run for a specified time period and will *average* the counters over that time. I personally take snapshots every 15 seconds for 4 minutes; I have a scheduled task that runs this every 5 minutes. It’s not perfect, but it gives me a good idea of what’s happening on the server.

Create Procedure dbo.dba_perfCounterMonitor_sp
 
        /* Declare Parameters */
          @samplePeriod    int      =  240  /* how long to sample, in seconds */
        , @sampleRate      char(8)  =  '00:00:15'  /* how frequently to sample (hh:mm:ss) */
        , @displayResults  bit      =  0  /* display the results when done */
As
/*********************************************************************************
    Name:       dba_perfCounterMonitor_sp
 
    Author:     Michelle Ufford, http://sqlfool.com
 
    Purpose:    Monitors performance counters.  Uses the dba_perfCounterMonitorConfig
                table to manage which perf counters to monitor.  
 
                @samplePeriod - specifies how long the process will try to monitor
                                performance counters; in seconds.
 
                @sampleRate - how long to wait between samples, as hh:mm:ss.
 
                The average values over the sample period are then logged to
                the dba_perfCounterMonitor table.
 
    Notes:      There are 3 basic types of performance counter calculations:
 
                Value/Base: these calculations require 2 counters. The value 
                            counter (cntr_type = 537003264) has to be divided 
                            by the base counter (cntr_type = 1073939712).
 
                Per Second: these counters store cumulative values; the
                            value must be compared at 2 different times to
                            calculate the difference (cntr_type = 272696576).
 
                Point In Time:  these counters show what the value of the
                                counter is at the current point-in-time 
                                (cntr_type = 65792).  No calculation is 
                                necessary to derive the value.
 
    Called by:  DBA
 
    Date        User    Description
    ----------------------------------------------------------------------------
    2009-09-04  MFU     Initial Release
*********************************************************************************
    Exec dbo.dba_perfCounterMonitor_sp
          @samplePeriod     = 60
        , @sampleRate       = '00:00:01'
        , @displayResults   = 1;
*********************************************************************************/
 
Set NoCount On;
Set XACT_Abort On;
Set Ansi_Padding On;
Set Ansi_Warnings On;
Set ArithAbort On;
Set Concat_Null_Yields_Null On;
Set Numeric_RoundAbort Off;
 
Begin
 
    /* Declare Variables */
    Declare @startTime datetime
        , @endTime datetime
        , @iteration int;
 
    Select @startTime = GetDate()
        , @iteration = 1;
 
    Declare @samples Table
    (
          iteration     int             Not Null
        , objectName    nvarchar(128)   Not Null
        , counterName   nvarchar(128)   Not Null
        , instanceName  nvarchar(128)   Not Null
        , cntr_value    float           Not Null
        , base_value    float           Null
        , cntr_type     bigint          Not Null
    );
 
    Begin Try
 
        /* Start a new transaction */
        Begin Transaction;
 
        /* Grab all of our counters */
        Insert Into @samples
        Select @iteration
            , RTrim(dopc.object_name)
            , RTrim(dopc.counter_name)
            , RTrim(dopc.instance_name)
            , RTrim(dopc.cntr_value)
            , (Select cntr_value From sys.dm_os_performance_counters As dopc1
                Where dopc1.object_name = pcml.objectName
                And dopc1.counter_name = pcml.counterName + ' base'
                And dopc1.instance_name = IsNull(pcml.instanceName, dopc.instance_name))
            , dopc.cntr_type
        From sys.dm_os_performance_counters As dopc
        Join dbo.dba_perfCounterMonitorConfig As pcml
            On dopc.object_name = pcml.objectName
                And dopc.counter_name = pcml.counterName
                And dopc.instance_name = IsNull(pcml.instanceName, dopc.instance_name);
 
        /* During our sample period, grab our counter values and store the results */
        While GetDate() < DateAdd(second, @samplePeriod, @startTime)
        Begin
 
            Set @iteration = @iteration + 1;
 
            Insert Into @samples
            Select @iteration
                , RTrim(dopc.object_name)
                , RTrim(dopc.counter_name)
                , RTrim(dopc.instance_name)
                , dopc.cntr_value
                , (Select cntr_value From sys.dm_os_performance_counters As dopc1
                    Where dopc1.object_name = pcml.objectName
                    And dopc1.counter_name = pcml.counterName + ' base'
                    And dopc1.instance_name = IsNull(pcml.instanceName, dopc.instance_name))
                , dopc.cntr_type
            From sys.dm_os_performance_counters As dopc
            Join dbo.dba_perfCounterMonitorConfig As pcml
                On dopc.object_name = pcml.objectName
                    And dopc.counter_name = pcml.counterName
                    And dopc.instance_name = IsNull(pcml.instanceName, dopc.instance_name);
 
            /* Wait for a small delay */
            WaitFor Delay @sampleRate;
 
        End;
 
        /* Grab our end time for calculations */
        Set @endTime = GetDate();
 
        /* Store the average of our point-in-time counters */
        Insert Into dbo.dba_perfCounterMonitor 
        (
			  captureDate
			, objectName
			, counterName
			, instanceName
			, value
			, valueType
		) 
		Select @startTime
		    , objectName
		    , counterName
		    , instanceName
		    , Avg(cntr_value)
		    , 'value'
		From @samples
		Where cntr_type = 65792
		Group By objectName
		    , counterName
		    , instanceName;
 
        /* Store the average of the value vs the base for cntr_type = 537003264 */
        Insert Into dbo.dba_perfCounterMonitor 
        (
			  captureDate
			, objectName
			, counterName
			, instanceName
			, value
			, valueType
		) 
		Select @startTime
		    , objectName
		    , counterName
		    , instanceName
		    , Avg(cntr_value)/Avg(IsNull(base_value, 1))
		    , 'percent'
		From @samples
		Where cntr_type = 537003264
		Group By objectName
		    , counterName
		    , instanceName;
 
        /* Compare the first and last values for our cumulative, per-second counters */
        Insert Into dbo.dba_perfCounterMonitor 
        (
			  captureDate
			, objectName
			, counterName
			, instanceName
			, value
			, valueType
		) 
		Select @startTime
		    , objectName
		    , counterName
		    , instanceName
		    , (Max(cntr_value) - Min(cntr_value)) / DateDiff(second, @startTime, @endTime)
		    , 'value'
		From @samples
		Where cntr_type = 272696576
        Group By objectName
		    , counterName
		    , instanceName;
 
        /* Should we display the results of our most recent execution?  */
        If @displayResults = 1
            Select captureDate
                , objectName
                , counterName
                , instanceName
                , value
                , valueType
            From dbo.dba_perfCounterMonitor With (NoLock)
            Where captureDate = Cast(@startTime As smalldatetime)
            Order By objectName
                , counterName
                , instanceName;
 
        /* If you have an open transaction, commit it */
        If @@TranCount > 0
            Commit Transaction;
 
    End Try
    Begin Catch
 
        /* Whoops, there was an error... rollback! */
        If @@TranCount > 0
            Rollback Transaction;
 
        /* Return an error message and log it */
        Execute dbo.dba_logError_sp;
 
    End Catch;
 
    Set NoCount Off;
    Return 0;
End
Go

Like I said, it’s not perfect, but it gets the job done. :)
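Once you’ve collected samples on both sides of your change, a quick comparison query will show the before/after averages. This is just a sketch; the cutoff time below is a made-up example:

/* Compare average counter values before and after a change */
Declare @changeDate smalldatetime;
Set @changeDate = '2009-09-04 12:00';
 
Select objectName
    , counterName
    , instanceName
    , Avg(Case When captureDate < @changeDate Then value End) As 'avgBefore'
    , Avg(Case When captureDate >= @changeDate Then value End) As 'avgAfter'
From dbo.dba_perfCounterMonitor With (NoLock)
Group By objectName
    , counterName
    , instanceName
Order By objectName
    , counterName
    , instanceName;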

Getting an error about dba_logError_sp? Take a look at my error handling proc.

Bored this summer?

Bored this summer? Do you like to help others? Do you have too much free time? Do you find yourself thinking, “Man, I really should spend more time indoors”? If you answered “yes” to any of these questions, then have I got a proposition for you!

What could be more fun than getting second-degree burns at the waterpark, you ask? Volunteering on the PASS Performance SIG! That’s right, we’re looking for a few good women and men to join our ranks as content contributors. Specifically, we’re looking for people to write articles and/or host LiveMeeting events on performance-related topics. Not a performance expert? This can be a great way for you to learn more.

In case I scared you off in my opening paragraph, let me assure you that it really does not take that much time to be a volunteer. Just 3-4 hours a month can be a huge help. We’re also looking for contributors of all experience levels, so if you’re only comfortable writing intro-level articles, that’s definitely okay.

Oh, and while I’m begging for volunteers, we’re still looking for speakers for the SQL Saturday in East Iowa. :)

If you’re interested in either, then please send me an e-mail at michelle at sqlfool dot com for more information.

Index Defrag Script, v3.0

UPDATE: This script has been significantly updated. Please view the latest version at http://sqlfool.com/2011/06/index-defrag-script-v4-1/.

I’ve just completed the latest version of my index defrag script! Here’s a brief list of the updates:

  • Fixed a bug with the LOB logic. In the previous version, after a LOB was encountered, all subsequent indexes would be reorganized.
  • Added support for stat rebuilds after the index defrag is complete (@rebuildStats)
  • Added an exclusion list table (dba_indexDefragExclusion) to support index scheduling
  • Modified logging to show which defrags are “in progress”; added columns to dba_indexDefragLog
  • Added support for the defrag of the model and msdb databases
  • Added @scanMode as a configurable parameter

So what can this index defrag script do? Well, for starters, you can:

  • Schedule it to run with the default settings; it works “right out of the box” with no additional configuration necessary
  • Run this one script from a centralized database for all databases on a server
  • Run this script for a specific database or table
  • Configure custom threshold limits and the point at which a rebuild should be performed (instead of a reorganize)
  • Defrag individual partitions
  • Log its actions and the duration of the defrag
  • Run in “commands only” mode (@executeSQL = 0, @printCommands = 1)
  • Customize performance parameters such as @maxDopRestriction and @defragDelay to minimize impact on the server
  • Schedule specific indexes to only be defragged on weekends, or every other day

To use this last option, you need to add a record to the dba_indexDefragExclusion table. I think all of the columns are pretty self-explanatory except the [exclusionMask] column. The way this works is each day of the week is assigned a value:
1=Sunday, 2=Monday, 4=Tuesday, 8=Wednesday, 16=Thursday, 32=Friday, 64=Saturday

Take a SUM of the values for the days that you want excluded. So if you want an index to only be defragged on weekends, you would add up Monday through Friday (2+4+8+16+32) and use a value of 62 for the exclusionMask column. For a little more information on how this works, check out my blog post on Bitwise Operations.
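For example, here’s what a weekend-only exclusion record might look like. All of the ID and name values below are made up for illustration; pull the real ones from sys.databases, sys.objects, and sys.indexes:

/* Exclude this index from weekday defrags; weekends only */
Insert Into dbo.dba_indexDefragExclusion
(
      databaseID
    , databaseName
    , objectID
    , objectName
    , indexID
    , indexName
    , exclusionMask
)
Select DB_ID('AdventureWorks')
    , 'AdventureWorks'
    , Object_ID('AdventureWorks.Sales.SalesOrderDetail')
    , 'SalesOrderDetail'
    , 2  /* index_id from sys.indexes */
    , 'IX_SalesOrderDetail_ProductID'
    , 62; /* 2+4+8+16+32 = Monday through Friday excluded */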

Please note: if you don’t insert any records into the dba_indexDefragExclusion table, all indexes that exceed the specified thresholds will be defragged on every run. This is normal behavior and may be perfectly fine in your environment. However, if the dba_indexDefragExclusion table does not exist, the script will fail.

I try to document each parameter within the code, so check the comments section in the script for a full list of parameters and what they do.

Special thanks to everyone who helped beta test this script! :)

Without further ado, the script:

/* Drop Table Scripts:
Drop Table dbo.dba_indexDefragLog;
Drop Table dbo.dba_indexDefragExclusion;
*/
If Not Exists(Select [object_id] From sys.tables 
    Where [name] In (N'dba_indexDefragLog', 'dba_indexDefragExclusion'))
Begin
 
    Create Table dbo.dba_indexDefragLog
    (
          indexDefrag_id    int identity(1,1)   Not Null
        , databaseID        int                 Not Null
        , databaseName      nvarchar(128)       Not Null
        , objectID          int                 Not Null
        , objectName        nvarchar(128)       Not Null
        , indexID           int                 Not Null
        , indexName         nvarchar(128)       Not Null
        , partitionNumber   smallint            Not Null
        , fragmentation     float               Not Null
        , page_count        int                 Not Null
        , dateTimeStart     datetime            Not Null
        , dateTimeEnd       datetime            Null
        , durationSeconds   int                 Null
 
        Constraint PK_indexDefragLog 
            Primary Key Clustered (indexDefrag_id)
    );
 
    Print 'dba_indexDefragLog Table Created';
 
    Create Table dbo.dba_indexDefragExclusion
    (
          databaseID        int                 Not Null
        , databaseName      nvarchar(128)       Not Null
        , objectID          int                 Not Null
        , objectName        nvarchar(128)       Not Null
        , indexID           int                 Not Null
        , indexName         nvarchar(128)       Not Null
        , exclusionMask     int                 Not Null
            /* 1=Sunday, 2=Monday, 4=Tuesday, 8=Wednesday, 16=Thursday, 32=Friday, 64=Saturday */
 
        Constraint PK_indexDefragExclusion 
            Primary Key Clustered (databaseID, objectID, indexID)
    );
 
    Print 'dba_indexDefragExclusion Table Created';
 
End
Else
    RaisError('One or more tables already exist.  Please drop or rename before proceeding.', 16, 0);
 
If ObjectProperty(Object_ID('dbo.dba_indexDefrag_sp'), N'IsProcedure') = 1
Begin
    Drop Procedure dbo.dba_indexDefrag_sp;
    Print 'Procedure dba_indexDefrag_sp dropped';
End;
Go
 
 
Create Procedure dbo.dba_indexDefrag_sp
 
    /* Declare Parameters */
      @minFragmentation     float           = 5.0  
        /* in percent, will not defrag if fragmentation less than specified */
    , @rebuildThreshold     float           = 30.0  
        /* in percent, greater than @rebuildThreshold will result in rebuild instead of reorg */
    , @executeSQL           bit             = 1     
        /* 1 = execute; 0 = print command only */
    , @database             varchar(128)    = Null
        /* Option to specify a database name; null will return all */
    , @tableName            varchar(4000)   = Null  -- databaseName.schema.tableName
        /* Option to specify a table name; null will return all */
    , @scanMode             varchar(10)     = N'LIMITED'
        /* Options are LIMITED, SAMPLED, and DETAILED */
    , @onlineRebuild        bit             = 1     
        /* 1 = online rebuild; 0 = offline rebuild; only in Enterprise */
    , @maxDopRestriction    tinyint         = Null
        /* Option to restrict the number of processors for the operation; only in Enterprise */
    , @printCommands        bit             = 0     
        /* 1 = print commands; 0 = do not print commands */
    , @printFragmentation   bit             = 0
        /* 1 = print fragmentation prior to defrag; 
           0 = do not print */
    , @defragDelay          char(8)         = '00:00:05'
        /* time to wait between defrag commands */
    , @debugMode            bit             = 0
        /* display some useful comments to help determine if/where issues occur */
    , @rebuildStats         bit             = 1
        /* option to rebuild stats after completed index defrags */
 
As
/*********************************************************************************
    Name:       dba_indexDefrag_sp
 
    Author:     Michelle Ufford, http://sqlfool.com
 
    Purpose:    Defrags all indexes for the current database
 
    Notes:
 
    CAUTION: TRANSACTION LOG SIZE SHOULD BE MONITORED CLOSELY WHEN DEFRAGMENTING.
 
      @minFragmentation     defaulted to 5%, will not defrag if fragmentation 
                            is less than that
 
      @rebuildThreshold     defaulted to 30% as recommended by Microsoft in BOL;
                            greater than 30% will result in rebuild instead
 
      @executeSQL           1 = execute the SQL generated by this proc; 
                            0 = print command only
 
      @database             Optional, specify specific database name to defrag;
                            If not specified, all non-system databases will
                            be defragged.
 
      @tableName            Specify if you only want to defrag indexes for a 
                            specific table, format = databaseName.schema.tableName;
                            if not specified, all tables will be defragged.
 
      @scanMode             Specifies which scan mode to use to determine
                            fragmentation levels.  Options are:
                            LIMITED - scans the parent level; quickest mode,
                                      recommended for most cases.
                            SAMPLED - samples 1% of all data pages; if less than
                                      10k pages, performs a DETAILED scan.
                            DETAILED - scans all data pages.  Use great care with
                                       this mode, as it can cause performance issues.
 
      @onlineRebuild        1 = online rebuild; 
                            0 = offline rebuild
 
      @maxDopRestriction    Option to specify a processor limit for index rebuilds
 
      @printCommands        1 = print commands to screen; 
                            0 = do not print commands
 
      @printFragmentation   1 = print fragmentation to screen;
                            0 = do not print fragmentation
 
      @defragDelay          Time to wait between defrag commands; gives the
                            server a little time to catch up 
 
      @debugMode            1 = display debug comments; helps with troubleshooting
                            0 = do not display debug comments
 
      @rebuildStats         Affects only statistics that need to be rebuilt
                            1 = rebuild stats
                            0 = do not rebuild stats
 
    Called by:  SQL Agent Job or DBA
 
    Date        Initials	Version Description
    ----------------------------------------------------------------------------
    2007-12-18  MFU         1.0     Initial Release
    2008-10-17  MFU         1.1     Added @defragDelay, CIX_temp_indexDefragList
    2008-11-17  MFU         1.2     Added page_count to log table
                                    , added @printFragmentation option
    2009-03-17  MFU         2.0     Provided support for centralized execution
                                    , consolidated Enterprise & Standard versions
                                    , added @debugMode, @maxDopRestriction
                                    , modified LOB and partition logic  
    2009-06-18  MFU         3.0     Fixed bug in LOB logic, added @scanMode option
                                    , added support for stat rebuilds (@rebuildStats)
                                    , support model and msdb defrag
                                    , added columns to the dba_indexDefragLog table
                                    , modified logging to show "in progress" defrags
                                    , added defrag exclusion list (scheduling)
*********************************************************************************
    Exec dbo.dba_indexDefrag_sp
          @executeSQL           = 0
        , @printCommands        = 1
        , @debugMode            = 1
        , @printFragmentation   = 1;
*********************************************************************************/																
 
Set NoCount On;
Set XACT_Abort On;
Set Ansi_Padding On;
Set Ansi_Warnings On;
Set ArithAbort On;
Set Concat_Null_Yields_Null On;
Set Numeric_RoundAbort Off;
Set Quoted_Identifier On;
 
Begin
 
    If @debugMode = 1 RaisError('Undusting the cogs and starting up...', 0, 42) With NoWait;
 
    /* Declare our variables */
    Declare   @objectID             int
            , @databaseID           int
            , @databaseName         nvarchar(128)
            , @indexID              int
            , @partitionCount       bigint
            , @schemaName           nvarchar(128)
            , @objectName           nvarchar(128)
            , @indexName            nvarchar(128)
            , @partitionNumber      smallint
            , @fragmentation        float
            , @pageCount            int
            , @sqlCommand           nvarchar(4000)
            , @rebuildCommand       nvarchar(200)
            , @dateTimeStart        datetime
            , @dateTimeEnd          datetime
            , @containsLOB          bit
            , @editionCheck         bit
            , @debugMessage         varchar(128)
            , @updateSQL            nvarchar(4000)
            , @partitionSQL         nvarchar(4000)
            , @partitionSQL_Param   nvarchar(1000)
            , @LOB_SQL              nvarchar(4000)
            , @LOB_SQL_Param        nvarchar(1000)
            , @rebuildStatsID       int
            , @rebuildStatsSQL      nvarchar(1000)
            , @indexDefrag_id       int;
 
    /* Create our temporary tables */
    Create Table #indexDefragList
    (
          databaseID        int
        , databaseName      nvarchar(128)
        , objectID          int
        , indexID           int
        , partitionNumber   smallint
        , fragmentation     float
        , page_count        int
        , defragStatus      bit
        , schemaName        nvarchar(128)   Null
        , objectName        nvarchar(128)   Null
        , indexName         nvarchar(128)   Null
    );
 
    Create Table #databaseList
    (
          databaseID        int
        , databaseName      varchar(128)
        , scanStatus        bit
        , statsStatus       bit
    );
 
    Create Table #processor 
    (
          [index]           int
        , Name              varchar(128)
        , Internal_Value    int
        , Character_Value   int
    );
 
    If @debugMode = 1 RaisError('Beginning validation...', 0, 42) With NoWait;
 
    /* Just a little validation... */
    If @minFragmentation Not Between 0.00 And 100.0
        Set @minFragmentation = 10.0;
 
    If @rebuildThreshold Not Between 0.00 And 100.0
        Set @rebuildThreshold = 30.0;
 
    If @defragDelay Not Like '00:[0-5][0-9]:[0-5][0-9]'
        Set @defragDelay = '00:00:05';
 
    If @scanMode Not In ('LIMITED', 'SAMPLED', 'DETAILED')
        Set @scanMode = 'LIMITED';
 
    /* Make sure we're not exceeding the number of processors we have available */
    Insert Into #processor
    Execute xp_msver 'ProcessorCount';
 
    If @maxDopRestriction Is Not Null And @maxDopRestriction > (Select Internal_Value From #processor)
        Select @maxDopRestriction = Internal_Value
        From #processor;
 
    /* Check our server version; 1804890536 = Enterprise, 610778273 = Enterprise Evaluation, -2117995310 = Developer */
    If (Select ServerProperty('EditionID')) In (1804890536, 610778273, -2117995310) 
        Set @editionCheck = 1 -- supports online rebuilds
    Else
        Set @editionCheck = 0; -- does not support online rebuilds
 
    If @debugMode = 1 RaisError('Grabbing a list of our databases...', 0, 42) With NoWait;
 
    /* Retrieve the list of databases to investigate */
    Insert Into #databaseList
    Select database_id
        , name
        , 0 -- not scanned yet for fragmentation
        , 0 -- statistics not yet updated
    From sys.databases
    Where name = IsNull(@database, name)
        And [name] Not In ('master', 'tempdb')-- exclude system databases
        And [state] = 0; -- state must be ONLINE
 
    If @debugMode = 1 RaisError('Looping through our list of databases and checking for fragmentation...', 0, 42) With NoWait;
 
    /* Loop through our list of databases */
    While (Select Count(*) From #databaseList Where scanStatus = 0) > 0
    Begin
 
        Select Top 1 @databaseID = databaseID
        From #databaseList
        Where scanStatus = 0;
 
        Select @debugMessage = '  working on ' + DB_Name(@databaseID) + '...';
 
        If @debugMode = 1
            RaisError(@debugMessage, 0, 42) With NoWait;
 
       /* Determine which indexes to defrag using our user-defined parameters */
        Insert Into #indexDefragList
        Select
              database_id As databaseID
            , QuoteName(DB_Name(database_id)) As 'databaseName'
            , [object_id] As objectID
            , index_id As indexID
            , partition_number As partitionNumber
            , avg_fragmentation_in_percent As fragmentation
            , page_count 
            , 0 As 'defragStatus' /* 0 = unprocessed, 1 = processed */
            , Null As 'schemaName'
            , Null As 'objectName'
            , Null As 'indexName'
        From sys.dm_db_index_physical_stats (@databaseID, Object_Id(@tableName), Null , Null, @scanMode)
        Where avg_fragmentation_in_percent >= @minFragmentation 
            And index_id > 0 -- ignore heaps
            And page_count > 8 -- ignore objects with less than 1 extent
            And index_level = 0 -- leaf-level nodes only, supports @scanMode
        Option (MaxDop 2);
 
        /* Keep track of which databases have already been scanned */
        Update #databaseList
        Set scanStatus = 1
        Where databaseID = @databaseID;
 
    End
 
    Create Clustered Index CIX_temp_indexDefragList
        On #indexDefragList(databaseID, objectID, indexID, partitionNumber);
 
    /* Delete any indexes from our to-do that are also in our exclusion list for today */
    Delete idl
    From #indexDefragList As idl
    Join dbo.dba_indexDefragExclusion As ide
        On idl.databaseID = ide.databaseID
        And idl.objectID = ide.objectID
        And idl.indexID = ide.indexID
    Where exclusionMask & Power(2, DatePart(weekday, GetDate())-1) > 0;
 
    Select @debugMessage = 'Looping through our list... there''s ' + Cast(Count(*) As varchar(10)) + ' indexes to defrag!'
    From #indexDefragList;
 
    If @debugMode = 1 RaisError(@debugMessage, 0, 42) With NoWait;
 
    /* Begin our loop for defragging */
    While (Select Count(*) From #indexDefragList Where defragStatus = 0) > 0
    Begin
 
        If @debugMode = 1 RaisError('  Picking an index to beat into shape...', 0, 42) With NoWait;
 
        /* Grab the most fragmented index first to defrag */
        Select Top 1 
              @objectID         = objectID
            , @indexID          = indexID
            , @databaseID       = databaseID
            , @databaseName     = databaseName
            , @fragmentation    = fragmentation
            , @partitionNumber  = partitionNumber
            , @pageCount        = page_count
        From #indexDefragList
        Where defragStatus = 0
        Order By fragmentation Desc;
 
        If @debugMode = 1 RaisError('  Looking up the specifics for our index...', 0, 42) With NoWait;
 
        /* Look up index information */
        Select @updateSQL = N'Update idl
            Set schemaName = QuoteName(s.name)
                , objectName = QuoteName(o.name)
                , indexName = QuoteName(i.name)
            From #indexDefragList As idl
            Inner Join ' + @databaseName + '.sys.objects As o
                On idl.objectID = o.object_id
            Inner Join ' + @databaseName + '.sys.indexes As i
                On o.object_id = i.object_id
            Inner Join ' + @databaseName + '.sys.schemas As s
                On o.schema_id = s.schema_id
            Where o.object_id = ' + Cast(@objectID As varchar(10)) + '
                And i.index_id = ' + Cast(@indexID As varchar(10)) + '
                And i.type > 0
                And idl.databaseID = ' + Cast(@databaseID As varchar(10));
 
        Execute sp_executeSQL @updateSQL;
 
        /* Grab our object names */
        Select @objectName  = objectName
            , @schemaName   = schemaName
            , @indexName    = indexName
        From #indexDefragList
        Where objectID = @objectID
            And indexID = @indexID
            And databaseID = @databaseID;
 
        If @debugMode = 1 RaisError('  Grabbing the partition count...', 0, 42) With NoWait;
 
        /* Determine if the index is partitioned */
        Select @partitionSQL = 'Select @partitionCount_OUT = Count(*)
                                    From ' + @databaseName + '.sys.partitions
                                    Where object_id = ' + Cast(@objectID As varchar(10)) + '
                                        And index_id = ' + Cast(@indexID As varchar(10)) + ';'
            , @partitionSQL_Param = '@partitionCount_OUT int OutPut';
 
        Execute sp_executeSQL @partitionSQL, @partitionSQL_Param, @partitionCount_OUT = @partitionCount OutPut;
 
        If @debugMode = 1 RaisError('  Seeing if there''s any LOBs to be handled...', 0, 42) With NoWait;
 
        /* Determine if the table contains LOBs */
        Select @LOB_SQL = ' Select @containsLOB_OUT = Count(*)
                            From ' + @databaseName + '.sys.columns With (NoLock) 
                            Where [object_id] = ' + Cast(@objectID As varchar(10)) + '
                                And (system_type_id In (34, 35, 99)
                                        Or max_length = -1);'
                            /*  system_type_id --> 34 = image, 35 = text, 99 = ntext
                                max_length = -1 --> varbinary(max), varchar(max), nvarchar(max), xml */
                , @LOB_SQL_Param = '@containsLOB_OUT int OutPut';
 
        Execute sp_executeSQL @LOB_SQL, @LOB_SQL_Param, @containsLOB_OUT = @containsLOB OutPut;
 
        If @debugMode = 1 RaisError('  Building our SQL statements...', 0, 42) With NoWait;
 
        /* If there's not a lot of fragmentation, or if we have a LOB, we should reorganize */
        If @fragmentation < @rebuildThreshold Or @containsLOB >= 1 Or @partitionCount > 1
        Begin
 
            Set @sqlCommand = N'Alter Index ' + @indexName + N' On ' + @databaseName + N'.' 
                                + @schemaName + N'.' + @objectName + N' ReOrganize';
 
            /* If our index is partitioned, we should always reorganize */
            If @partitionCount > 1
                Set @sqlCommand = @sqlCommand + N' Partition = ' 
                                + Cast(@partitionNumber As nvarchar(10));
 
        End;
 
        /* If the index is heavily fragmented and doesn't contain any partitions or LOB's, rebuild it */
        If @fragmentation >= @rebuildThreshold And IsNull(@containsLOB, 0) != 1 And @partitionCount <= 1
        Begin
 
            /* Set online rebuild options; requires Enterprise Edition */
            If @onlineRebuild = 1 And @editionCheck = 1 
                Set @rebuildCommand = N' Rebuild With (Online = On';
            Else
                Set @rebuildCommand = N' Rebuild With (Online = Off';
 
            /* Set processor restriction options; requires Enterprise Edition */
            If @maxDopRestriction Is Not Null And @editionCheck = 1
                Set @rebuildCommand = @rebuildCommand + N', MaxDop = ' + Cast(@maxDopRestriction As varchar(2)) + N')';
            Else
                Set @rebuildCommand = @rebuildCommand + N')';
 
            Set @sqlCommand = N'Alter Index ' + @indexName + N' On ' + @databaseName + N'.'
                            + @schemaName + N'.' + @objectName + @rebuildCommand;
 
        End;
 
        /* Are we executing the SQL?  If so, do it */
        If @executeSQL = 1
        Begin
 
            If @debugMode = 1 RaisError('  Executing SQL statements...', 0, 42) With NoWait;
 
            /* Grab the time for logging purposes */
            Set @dateTimeStart  = GetDate();
 
            /* Log our actions */
            Insert Into dbo.dba_indexDefragLog
            (
                  databaseID
                , databaseName
                , objectID
                , objectName
                , indexID
                , indexName
                , partitionNumber
                , fragmentation
                , page_count
                , dateTimeStart
            )
            Select
                  @databaseID
                , @databaseName
                , @objectID
                , @objectName
                , @indexID
                , @indexName
                , @partitionNumber
                , @fragmentation
                , @pageCount
                , @dateTimeStart;
 
            Set @indexDefrag_id = Scope_Identity();
 
            /* Execute our defrag! */
            Execute sp_executeSQL @sqlCommand;
            Set @dateTimeEnd  = GetDate();
 
            /* Update our log with our completion time */
            Update dbo.dba_indexDefragLog
            Set dateTimeEnd = @dateTimeEnd
                , durationSeconds = DateDiff(second, @dateTimeStart, @dateTimeEnd)
            Where indexDefrag_id = @indexDefrag_id;
 
            /* Just a little breather for the server */
            WaitFor Delay @defragDelay;
 
            /* Print if specified to do so */
            If @printCommands = 1
                Print N'Executed: ' + @sqlCommand;
        End
        Else
        /* Looks like we're not executing, just printing the commands */
        Begin
            If @debugMode = 1 RaisError('  Printing SQL statements...', 0, 42) With NoWait;
 
            If @printCommands = 1 Print IsNull(@sqlCommand, 'error!');
        End
 
        If @debugMode = 1 RaisError('  Updating our index defrag status...', 0, 42) With NoWait;
 
        /* Update our index defrag list so we know we've finished with that index */
        Update #indexDefragList
        Set defragStatus = 1
        Where databaseID       = @databaseID
          And objectID         = @objectID
          And indexID          = @indexID
          And partitionNumber  = @partitionNumber;
 
    End
 
    /* Do we want to output our fragmentation results? */
    If @printFragmentation = 1
    Begin
 
        If @debugMode = 1 RaisError('  Displaying fragmentation results...', 0, 42) With NoWait;
 
        Select databaseID
            , databaseName
            , objectID
            , objectName
            , indexID
            , indexName
            , fragmentation
            , page_count
        From #indexDefragList;
 
    End;
 
    /* Do we want to rebuild stats? */
    If @rebuildStats = 1
    Begin
 
        While Exists(Select Top 1 * From #databaseList Where statsStatus = 0)
        Begin
 
            /* Build our SQL statement to update stats */
            Select Top 1 @rebuildStatsSQL = 'Use [' + databaseName + ']; ' + 
                                            'Execute sp_updatestats;'
                    , @rebuildStatsID = databaseID
            From #databaseList
            Where statsStatus = 0;
 
            Set @debugMessage = 'Rebuilding Statistics: ' + @rebuildStatsSQL;
 
            If @debugMode = 1 RaisError(@debugMessage, 0, 42) With NoWait;
 
            /* Execute our stats update! */
            Execute sp_executesql @rebuildStatsSQL;
 
            /* Keep track of which databases have been updated */
            Update #databaseList 
            Set statsStatus = 1
            Where databaseID = @rebuildStatsID;
 
        End;
    End;
 
    /* When everything is said and done, make sure to get rid of our temp table */
    Drop Table #indexDefragList;
    Drop Table #databaseList;
    Drop Table #processor;
 
    If @debugMode = 1 RaisError('DONE!  Thank you for taking care of your indexes!  :)', 0, 42) With NoWait;
 
    Set NoCount Off;
    Return 0
End
Go
 
Set Quoted_Identifier Off 
Set ANSI_Nulls On
Go

PASS Summit 2009

My abstract for PASS Summit 2009 was accepted! Woot! You may not be able to see it from where you’re sitting, but I’m doing the happy dance. :)

In case you missed my original post on my abstract submission, here’s what I’ll be presenting on:

Super Bowl, Super Load – A Look at Performance Tuning for VLDB’s

Few DBA’s have the opportunity to experience a real-life load test in their production environment. Michelle Ufford works for GoDaddy.com, a company that has experienced phenomenal success with its Super Bowl ads. These ads are designed to drive traffic to the company’s websites, which puts the database servers under high load. In her presentation, Michelle will explore the performance tuning techniques that have resulted in an 80% reduction in server response times and allowed her VLDB’s to reach rates of 27k transactions per second. Topics will include vertical and horizontal partitioning, bulk operations, table design, and indexing.

Do you read my blog? Do I read yours? Do we exchange weird messages on Twitter? Do you have free cookies? If you’re going to the PASS Summit and answered “yes” to any of these questions, then I want to meet you! Make sure to say “hi” to me in Seattle. :)

Random Number Generator in T-SQL

Ever need to generate a random number in T-SQL? I have, on a couple of different occasions. I’m pretty sure that there’s several different ways of doing this in T-SQL, but here’s what I use:

Declare @maxRandomValue tinyint = 100
	, @minRandomValue tinyint = 0;
 
Select Cast(((@maxRandomValue + 1) - @minRandomValue) 
	* Rand() + @minRandomValue As tinyint) As 'randomNumber';

This approach uses the RAND() function to generate a random value, then scales it so that the value returned falls between the specified min and max values. I’ve been using this method in one stored procedure that’s called a couple of hundred times per second, and it seems to perform pretty well.
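One caveat: RAND() is evaluated only once per query, so the method above returns the same number for every row of a multi-row SELECT. If you need a different value per row, a common workaround (just a sketch, not what my proc uses) is to derive the value from NEWID() instead:

/* Per-row random numbers using NEWID(); each row gets its own value */
Declare @maxRandomValue tinyint = 100
	, @minRandomValue tinyint = 0;
 
Select Abs(Checksum(NewId())) % ((@maxRandomValue + 1) - @minRandomValue)
	+ @minRandomValue As 'randomNumber'
From sys.objects;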

What method do YOU use to generate a random number? Is it faster than this method?

Overhead in Non-Unique Clustered Indexes

I’ve received a couple of questions regarding my article, Performance Considerations of Data Types, and the overhead associated with non-unique clustered indexes. I started to respond via e-mail, but my response was so long I decided to turn it into a blog post instead. ;)

I should start by clarifying that non-unique clustered indexes do not necessarily consume more space and overhead; it depends on the data stored. If you have duplicate clustered key values, the first instance of the value will be handled as though it were unique. Any subsequent values, however, will incur overhead to manage the uniquifier that SQL Server adds to maintain row uniqueness. The same overhead is incurred in non-clustered indexes, too, adding to the overall expense of this approach.

I think it helps to actually look at the data, so let’s walk through a few different common scenarios. We’ll create a table with a unique clustered index, a table with a non-unique clustered index but no duplicates, and a table with duplicate key values.

Also, a little warning that I started to write this in SQL Server 2008, and since I’m on a 2008 kick, I decided to leave it that way. You can modify this pretty easily to work in 2005, if necessary.
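The main changes for 2005: swap the date columns for smalldatetime, split the variable declarations from their initializations, and replace the += operator. For example, the declarations in the population script further down would become:

/* 2005-compatible declarations: no date type, no inline initialization */
Declare @myDate smalldatetime
    , @myNumber int;
 
Select @myDate = '1990-01-01'
    , @myNumber = 1;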

Use sandbox;
Go
 
/* Unique, clustered index, no duplicate values */
Create Table dbo.uniqueClustered
(
      myDate    date        Not Null
    , myNumber  int         Not Null
    , myColumn  char(995)   Not Null
);
 
Create Unique Clustered Index CIX_uniqueClustered
    On dbo.uniqueClustered(myDate);
 
 
/* Non-unique clustered index, but no duplicate values */
Create Table dbo.nonUniqueNoDups
(
      myDate    date        Not Null
    , myNumber  int         Not Null
    , myColumn  char(995)   Not Null
);
 
Create Clustered Index CIX_nonUniqueNoDups
    On dbo.nonUniqueNoDups(myDate);
 
 
/* Non-unique clustered index, duplicate values */
Create Table dbo.nonUniqueDuplicates
(
      myDate    date        Not Null
    , myNumber  int         Not Null
    , myColumn  char(995)   Not Null
);
 
Create Clustered Index CIX_nonUniqueDuplicates
    On dbo.nonUniqueDuplicates(myDate);

I’m going to use the date data type in 2008 for my clustered index key. To ensure uniqueness for the first two tables, I’ll iterate through a few years’ worth of dates. This is typical of what you may see in a data mart, where you’d have one record with an aggregation of each day’s data. For the table with duplicate values, I’m going to insert the same date for each row.

/* Populate some test data */
Set NoCount On;
Declare @myDate date = '1990-01-01'
    , @myNumber int = 1;
 
While @myDate < '2010-01-01'
Begin
 
    Insert Into dbo.uniqueClustered
    Select @myDate, @myNumber, 'data';
 
    Insert Into dbo.nonUniqueNoDups
    Select @myDate, @myNumber, 'data';
 
    Insert Into dbo.nonUniqueDuplicates
    Select '2009-01-01', @myNumber, 'data';
 
    Select @myDate = DateAdd(day, 1, @myDate)
        , @myNumber += 1;
 
End;

After running the above script, each table should have 7,305 records. This is obviously pretty small for a table, but it’ll serve our purposes. Now let’s take a look at the size of our tables:

/* Look at the details of our indexes */
 
/* Unique, clustered index, no duplicate values */
Select 'unique' As 'type', page_count, avg_page_space_used_in_percent, record_count
    , min_record_size_in_bytes, max_record_size_in_bytes
From sys.dm_db_index_physical_stats(DB_ID(), Object_ID(N'uniqueClustered'), Null, Null, N'Detailed') 
Where index_level = 0
Union All
/* Non-unique clustered index, but no duplicate values */
Select 'non-unique, no dups', page_count, avg_page_space_used_in_percent, record_count
    , min_record_size_in_bytes, max_record_size_in_bytes
From sys.dm_db_index_physical_stats(DB_ID(), Object_ID(N'nonUniqueNoDups'), Null, Null, N'Detailed') 
Where index_level = 0
Union All
/* Non-unique clustered index, duplicate values */
Select 'duplicates', page_count, avg_page_space_used_in_percent, record_count
    , min_record_size_in_bytes, max_record_size_in_bytes
From sys.dm_db_index_physical_stats(DB_ID(), Object_ID(N'nonUniqueDuplicates'), Null, Null, N'Detailed') 
Where index_level = 0;

Here’s the results:

type                page_count           avg_page_space_used_in_percent record_count         min_record_size_in_bytes max_record_size_in_bytes
------------------- -------------------- ------------------------------ -------------------- ------------------------ ------------------------
unique              914                  99.8055102545095               7305                 1009                     1009
non-unique, no dups 914                  99.8055102545095               7305                 1009                     1009
duplicates          1044                 88.066036570299                7305                 1009                     1017

I want to point out a couple of things. First, there is no difference in the number of pages between the non-unique clustered index with no duplicates ([nonUniqueNoDups]) and the unique clustered index ([uniqueClustered]). The table with duplicate clustered key values, however, requires 14% more pages to store the same amount of data. Secondly, the [max_record_size_in_bytes] of the [nonUniqueDuplicates] table is 8 bytes more than that of the other two. We’ll discuss why in a minute.

Now let’s take a look at the actual data pages. For this, I’m going to use my page internals proc.

EXECUTE dbo.dba_viewPageData_sp
      @databaseName = 'sandbox'
    , @tableName    = 'sandbox.dbo.uniqueClustered'
    , @indexName    = 'CIX_uniqueClustered';
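(If you don’t have the proc handy, you can get roughly the same view with the undocumented DBCC IND and DBCC PAGE commands. This is only a sketch, and both commands are unsupported, so stick to a sandbox environment:)

/* Send DBCC PAGE output to the client instead of the error log */
DBCC TRACEON(3604);
 
/* List the pages belonging to the clustered index (index id 1) */
DBCC IND('sandbox', 'dbo.uniqueClustered', 1);
 
/* Pick a PageType = 1 (data) row from the output above, then dump it;
   print option 3 includes the per-row, per-column interpretation */
-- DBCC PAGE('sandbox', <fileId>, <pageId>, 3);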

I’m not going to post the entire results here, but I want to draw your attention to “m_slotCnt = 8”, which is near the top of the page. That means 8 records are stored on this page, which makes sense: 8 records × 1,009 bytes = 8,072 bytes, which, along with the 2-byte-per-record slot array, just fits in the 8,096 bytes a page has available after its 96-byte header. Also, when you look near the end of the first record (Slot 0), you should see the following results:

Slot 0 Offset 0x60 Length 1009
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP     Record Size = 1009
Memory Dump @0x00A9C060
00000000:   1000ee03 c3150b01 00000064 61746120 †..î.Ã......data 
[...]
000003F0:   00†††††††††††††††††††††††††††††††††††.  
 
Slot 0 Column 1 Offset 0x4 Length 3 Length (physical) 3
myDate = 1990-01-01                  
 
Slot 0 Column 2 Offset 0x7 Length 4 Length (physical) 4
myNumber = 1                         
 
Slot 0 Column 3 Offset 0xb Length 995 Length (physical) 995
myColumn = data

Now let’s look at the table that has a non-unique clustered index but no duplicates:

Execute dbo.dba_viewPageData_sp
      @databaseName = 'sandbox'
    , @tableName    = 'sandbox.dbo.nonUniqueNoDups'
    , @indexName    = 'CIX_nonUniqueNoDups';

The m_slotCnt value is also 8 for this page. This time, let’s glance at the first and second records (Slots 0 and 1, respectively):

Slot 0 Offset 0x60 Length 1009
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP     Record Size = 1009
Memory Dump @0x62FDC060
00000000:   1000ee03 c3150b01 00000064 61746120 †..î.Ã......data    
[...]
000003F0:   00†††††††††††††††††††††††††††††††††††.                        
 
Slot 0 Column 0 Offset 0x0 Length 4 Length (physical) 0
UNIQUIFIER = 0                       
 
Slot 0 Column 1 Offset 0x4 Length 3 Length (physical) 3
myDate = 1990-01-01                  
 
Slot 0 Column 2 Offset 0x7 Length 4 Length (physical) 4
myNumber = 1                         
 
Slot 0 Column 3 Offset 0xb Length 995 Length (physical) 995
myColumn = data
 
 
Slot 1 Offset 0x451 Length 1009
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP     Record Size = 1009
Memory Dump @0x62FDC451
 
00000000:   1000ee03 c4150b02 00000064 61746120 †..î.Ä......data          
[...]
000003F0:   00†††††††††††††††††††††††††††††††††††.                        
 
Slot 1 Column 0 Offset 0x0 Length 4 Length (physical) 0
UNIQUIFIER = 0                       
 
Slot 1 Column 1 Offset 0x4 Length 3 Length (physical) 3
myDate = 1990-01-02                  
 
Slot 1 Column 2 Offset 0x7 Length 4 Length (physical) 4
myNumber = 2                         
 
Slot 1 Column 3 Offset 0xb Length 995 Length (physical) 995
myColumn = data

We now see a new addition to the row, “UNIQUIFIER = 0.” This is SQL Server’s way of managing row uniqueness internally. You’ll notice that, because the clustered key values are unique, the UNIQUIFIER is set to 0 and the row size is still 1009; for all intents and purposes, the UNIQUIFIER is not consuming any space.

Update: The DBCC God himself, Paul Randal, explained that non-dupes actually have a NULL UNIQUIFIER, which DBCC PAGE displays as a 0. Thanks for explaining, Paul! I wondered about that but chalked it up to SQL voodoo.

Now let’s look at our final case, a non-unique clustered index with duplicate key values:

Execute dbo.dba_viewPageData_sp
      @databaseName = 'sandbox'
    , @tableName    = 'sandbox.dbo.nonUniqueDuplicates'
    , @indexName    = 'CIX_nonUniqueDuplicates';

Here’s where things get interesting. The m_slotCnt value is now 7, which means we’re storing one record fewer per page. That makes sense, too: once rows carry a uniquifier, eight 1,017-byte records would total 8,136 bytes, more than the 8,096 bytes a page has available, so we top out at seven. Let’s look at the details:

Slot 0 Offset 0x60 Length 1009
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP     Record Size = 1009
Memory Dump @0x00A9C060
00000000:   1000ee03 df300b01 00000064 61746120 †..î.ß0.....data  
[...]
000003F0:   00†††††††††††††††††††††††††††††††††††.                        
 
Slot 0 Column 0 Offset 0x0 Length 4 Length (physical) 0
UNIQUIFIER = 0                       
 
Slot 0 Column 1 Offset 0x4 Length 3 Length (physical) 3
myDate = 2009-01-01                  
 
Slot 0 Column 2 Offset 0x7 Length 4 Length (physical) 4
myNumber = 1                         
 
Slot 0 Column 3 Offset 0xb Length 995 Length (physical) 995
myColumn = data
 
 
Slot 1 Offset 0x451 Length 1017
 
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 1017                   
Memory Dump @0x00A9C451
00000000:   3000ee03 df300b02 00000064 61746120 †0.î.ß0.....data  
[...]
000003F0:   000100f9 03010000 00†††††††††††††††††...ù.....                
 
Slot 1 Column 0 Offset 0x3f5 Length 4 Length (physical) 4
UNIQUIFIER = 1                       
 
Slot 1 Column 1 Offset 0x4 Length 3 Length (physical) 3
myDate = 2009-01-01                  
 
Slot 1 Column 2 Offset 0x7 Length 4 Length (physical) 4
myNumber = 2                         
 
Slot 1 Column 3 Offset 0xb Length 995 Length (physical) 995
myColumn = data

The first record, Slot 0, looks exactly the same as in the previous table; the UNIQUIFIER is 0 and the row size is 1009. The second record (Slot 1), however, now has a UNIQUIFIER value of 1, and the row size is 1017. Notice that the “Record Attributes” of Slot 1 are also different, with the addition of “VARIABLE_COLUMNS.” This is because the UNIQUIFIER is stored as a variable-length column. The extra 8 bytes of overhead break down to 4 bytes for the UNIQUIFIER value itself, 2 bytes for its entry in the variable-length column offset array, and 2 bytes for the variable-length column count. The tables we created used all fixed-length columns; if your table already contains variable-length columns, the 2-byte column count is already being paid for, so the incremental overhead of the UNIQUIFIER will be a little smaller (6 bytes instead of 8).
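Since this overhead only comes into play with non-unique clustered indexes, here’s a quick way to see which of your own tables are candidates. This is just a sketch; it checks the index definition only, not whether duplicate key values actually exist:

/* Find clustered indexes that were not declared unique */
Select Object_Name(i.object_id) As 'table'
    , i.name As 'index'
From sys.indexes As i
Where i.type = 1        -- clustered
    And i.is_unique = 0 -- not declared unique, so rows may carry a uniquifier
Order By 1;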

To summarize: there is indeed a difference in the page structure between a unique clustered index and a non-unique clustered index, but the space and performance impact only materializes when you actually store duplicate clustered key values. So there you go, more detail than you ever wanted to know about clustered indexes and uniqueness!
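One last practical note: if some combination of columns in your table is already unique, you can sidestep the uniquifier entirely by declaring that uniqueness yourself. Here’s a minimal sketch using our test table, which happens to be unique on ([myDate], [myNumber]):

/* Rebuild the clustered index as unique by widening the key;
   Drop_Existing = On rebuilds it in a single operation */
Create Unique Clustered Index CIX_nonUniqueDuplicates
    On dbo.nonUniqueDuplicates(myDate, myNumber)
    With (Drop_Existing = On);

Of course, widening the clustered key has its own cost, since every nonclustered index carries the clustered key in its rows, so weigh the extra key bytes against the uniquifier overhead for your particular table.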

Performance Considerations of Data Types

I’ve just finished my first real content for the PASS Performance SIG. I decided to write on “Performance Considerations of Data Types,” as I think this is one of the easiest, yet most overlooked, topics in performance tuning. Here’s a summary:

Selecting inappropriate data types, especially on large tables with millions or billions of rows, can have significant performance implications. In this article, I’ll explain why and offer suggestions on how to select the most appropriate data type for your needs. The primary focus will be on common data types in SQL Server 2005 and 2008, but I’ll also discuss some aspects of clustered indexes and column properties. Most importantly, I’ll show some examples of common data-type misuse.

If you’re interested in this content, you can find it here: Performance Considerations of Data Types.

Special thanks to Paul Randal and Paul Nielsen for providing me with technical reviews and great feedback. You guys are awesome!

Thanks also to Mladen Prajdic and Jeremiah Peschka for their great input. You guys are awesome, too!