A Look at Missing Indexes

Tim Ford (@SQLAgentMan) recently blogged about his Top 5 SQL Server Indexing Best Practices. I thought it was a good list, and it inspired this blog post. I’ve recently been doing a little index spring cleaning, and I thought some people may be interested in the process I go through. So, here it is… a journey through madness an overview of my general missing index process.

I start with my trusty dba_missingIndexStoredProc table. If this table sounds completely foreign to you, check out my post, Find Missing Indexes In Stored Procs. Basically, I have a process that runs every night, scanning the XML of every query plan on the server to find procs that are possibly missing indexes. I then log the details for later action.

So I take a look at my table, and I find 8 stored procedures that are possibly missing indexes. Clicking on the XML link will show me the logged query plan:

Procs With Missing Indexes

Procs With Missing Indexes

Right clicking on the “Missing Index” description will give me the details of the recommended index:

Missing Index Details

Missing Index Details

Here’s an example of what SQL Server will return for you:

/*
Missing Index Details from ExecutionPlan2.sqlplan
The Query Processor estimates that implementing the following index 
could improve the query cost by 85.7327%.
*/
 
/*
USE [msdb]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[sysjobhistory] ([job_id])
INCLUDE ([instance_id],[step_id],[sql_message_id],[sql_severity],
[run_status],[run_date],[run_time],[run_duration],[operator_id_emailed], 
[operator_id_netsent],[operator_id_paged],[retries_attempted],[server])
GO
*/

I now compare the details of this proposed index to the missing index DMV suggestions, using this query:

SELECT t.name AS 'affected_table'
    , 'Create NonClustered Index IX_' + t.name + '_missing_' 
        + CAST(ddmid.index_handle AS VARCHAR(10))
        + ' On ' + ddmid.STATEMENT 
        + ' (' + IsNull(ddmid.equality_columns,'') 
        + CASE WHEN ddmid.equality_columns IS Not Null 
            And ddmid.inequality_columns IS Not Null THEN ',' 
                ELSE '' END 
        + IsNull(ddmid.inequality_columns, '')
        + ')' 
        + IsNull(' Include (' + ddmid.included_columns + ');', ';'
        ) AS sql_statement
    , ddmigs.user_seeks
    , ddmigs.user_scans
    , CAST((ddmigs.user_seeks + ddmigs.user_scans) 
        * ddmigs.avg_user_impact AS INT) AS 'est_impact'
    , ddmigs.last_user_seek
FROM sys.dm_db_missing_index_groups AS ddmig
INNER JOIN sys.dm_db_missing_index_group_stats AS ddmigs
    ON ddmigs.group_handle = ddmig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS ddmid 
    ON ddmig.index_handle = ddmid.index_handle
INNER JOIN sys.tables AS t
    ON ddmid.OBJECT_ID = t.OBJECT_ID
WHERE ddmid.database_id = DB_ID()
    --AND t.name = 'myTableName' 
ORDER BY CAST((ddmigs.user_seeks + ddmigs.user_scans) 
    * ddmigs.avg_user_impact AS INT) DESC;

I usually find the data in both places, but not always. One reason why is because the missing index DMV will only store data since your last reboot. So if I’m taking a look at this DMV on Monday and I just rebooted on Sunday, I may not have enough history to give me meaningful recommendations. This is just something to be aware of.

What I’m looking for in this DMV is the number of user_seeks and the est_impact. Also, if I haven’t rebooted my server in a while, I take a look at last_user_seek so I can determine whether or not it’s still accurate.

Next, I take a look at my existing indexes using Kimberly Tripp’s sp_helpindex2 system stored proc. I use her proc instead of sp_helpindex because I need to see included columns.

If you’re wondering why I’m looking at existing indexes, the reason is because I’m looking for indexes that can be modified slightly to accommodate my missing index needs. By “modified slightly,” I mean that I’d only want to make a change to an existing index if it did not drastically change the size or composition of an index, i.e. adding one or two narrow columns as included columns. I do NOT mean making changes that double the size of your index; in those cases, you’d probably be better off creating a brand new index.

Looking at existing indexes is actually a pretty critical part of the puzzle. If I have a proc that only gets called a few times an hour and could benefit from a better index, I may not create that index if it means adding a wide, expensive index to a busy table. If I can make a small modification to an existing index, then there’s a greater chance I’ll make the change and cover my query.

At this point, I should have enough information to start making decisions. I was going to write out the path I normally take when making decisions, but I thought, “Hey! What a great time for a diagram.” So here you go:

Decision Path

Decision Path

Disclaimer: I’m *not* a Visio wizard, so if I butchered the use of certain symbols in my diagram, please let me know so I can a) fix it, and b) learn from it!

It’s hard to really put all of the decision paths into a single, small diagram like this. There’s a lot of variables that I’m not even touching here. But I think this is a fairly good “generic” representation of the path I take. When I hit an “end” process, it means I don’t create the missing index at this time. Maybe in the future, it’ll become necessary, but I prefer to err on the side of less indexes.

So there you have it, a brief look at my missing index process. Hopefully someone finds it helpful. :)

Filtered Indexes: What You Need To Know

Filtered indexes are probably my favorite feature in 2008. That’s saying a lot, since there are so many great new features to choose from. In this post, I want to explore a little about how filtered indexes work, how they can be applied, and some of the “gotchas” to be aware of.

First, for those of you who may not yet know about filtered indexes, allow me enlighten you. In short, filtered indexes allow you to create an index on a subset of data using a filtering predicate. Filters can only be applied to non-clustered indexes. The general syntax of a filtered index is:

Create NonClustered Index [index_name]
On [table_name] ([column_list])
Include ([column_list])
Where [filtered_criteria];

For our purposes, we’re going to be working with the Sales.SalesOrderDetail table in the AdventureWorks database. Let’s look at a specific example. Suppose we have a query that regularly searches on the [SpecialOfferID] column.

Select SalesOrderID
    , Count(*) As 'CountOfLineItem'
    , Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail
Where SpecialOfferID <> 1
Group By SalesOrderID;

We notice that there’s no covering index for this query by looking at the actual execution plan:

Query Plan - Clustered Scan

Query Plan - Clustered Scan

If this is a commonly executed query, then we’d probably want to toss an index on it. Before we get started, let’s take a look at what the distribution of values are on that column:

Select SpecialOfferID
    , Count(*) As 'rows'
From Sales.SalesOrderDetail
Group By SpecialOfferID
Order By Count(*) Desc;

Our distribution of values is:

SpecialOfferID rows
-------------- -----------
1              115884
2              3428
3              606
13             524
14             244
16             169
7              137
8              98
11             84
4              80
9              61
5              2

As you can see, [SpecialOfferID] = 1 accounts for 96% of our values. In 2005, we’d create an index that may look something like this:

Create NonClustered Index IX_Sales_SalesOrderDetail_SpecialOfferID
    On Sales.SalesOrderDetail(SpecialOfferID)
    Include (SalesOrderID, LineTotal);

Now if we re-run our original query, this is what we see:

Indexed Query Plan

Indexed Query Plan

So we’re now performing a non-clustered index seek instead of a clustered index scan. Already this results in some pretty significant performance improvements. To see this, we’re going to use the INDEX query hint to force an index scan. We’re also going to use the DBCC command DROPCLEANBUFFERS, which will allow us to clear the buffer cache and better examine what’s happening with our IO.

Set Statistics IO On;
 
DBCC DropCleanBuffers;
 
Select SalesOrderID
    , Count(*) As 'CountOfLineItem'
    , Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail With 
    (Index(PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID))
Where SpecialOfferID <> 1
Group By SalesOrderID;
 
DBCC DropCleanBuffers;
 
Select SalesOrderID
    , Count(*) As 'CountOfLineItem'
    , Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail
Where SpecialOfferID <> 1
Group By SalesOrderID;
 
Set Statistics IO Off;
Clustered Index Scan:
Table 'SalesOrderDetail'. Scan count 1, logical reads 1240, physical reads 17, read-ahead reads 1242...
 
NonClustered Index Seek:
Table 'SalesOrderDetail'. Scan count 2, logical reads 30, physical reads 4, read-ahead reads 480...

As you can see, the non-clustered (NC) index seek performs quite a bit better. Now let’s create a filtered index and explore what happens:

Create NonClustered Index FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered
    On Sales.SalesOrderDetail(SalesOrderID)
    Include (LineTotal)
    Where SpecialOfferID <> 1;

First, let’s look at the pages consumed by each index:

SELECT i.name, ddips.index_depth, ddips.index_level
    , ddips.page_count, ddips.record_count
FROM sys.indexes AS i
Join sys.dm_db_index_physical_stats(DB_ID(), 
    OBJECT_ID(N'Sales.SalesOrderDetail'), Null, Null, N'Detailed') AS ddips
    ON i.OBJECT_ID = ddips.OBJECT_ID
    And i.index_id = ddips.index_id
WHERE i.name In ('IX_Sales_SalesOrderDetail_SpecialOfferID'
    , 'FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered'
    , 'PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID')
    AND ddips.index_level = 0;
name                                                       index_depth index_level page_count  record_count
---------------------------------------------------------- ----------- ----------- ----------- --------------------
PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID        3           0           1234        121317
IX_Sales_SalesOrderDetail_SpecialOfferID                   3           0           480         121317
FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered         2           0           19          5433

If you scroll over, you’ll see that the clustered index consumes the most pages, naturally. The non-filtered NC index consumes less pages than the clustered index because it’s narrower; however, it still consumes more pages than the filtered index because it’s storing every data row. The filtered index, with only 5433 rows stored, is by far our smallest index, consuming 96% less space than our non-filtered NC index.

Because we’re using less space to store this index, we should also see an equivalent performance boost. Let’s verify that this is the case:

Set Statistics IO On;
 
DBCC DropCleanBuffers;
 
Select SalesOrderID
    , Count(*) As 'CountOfLineItem'
    , Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail With (Index(IX_Sales_SalesOrderDetail_SpecialOfferID))
Where SpecialOfferID <> 1
Group By SalesOrderID;
 
DBCC DropCleanBuffers;
 
Select SalesOrderID
    , Count(*) As 'CountOfLineItem'
    , Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail
Where SpecialOfferID <> 1
Group By SalesOrderID;
 
Set Statistics IO Off;
NonClustered Index Seek:
Table 'SalesOrderDetail'. Scan count 2, logical reads 30, physical reads 4, read-ahead reads 480
 
Filtered Index Scan:
Table 'SalesOrderDetail'. Scan count 1, logical reads 24, physical reads 2, read-ahead reads 22
Filtered Query Plan

Filtered Query Plan

As expected, we get the best results with our filtered index scan.

You’ll notice that I did *not* create the index on the [SpecialOfferID] column like I did in [IX_Sales_SalesOrderDetail_SpecialOfferID]. This is because my query doesn’t care what my [SpecialOfferID] value is, just as long as it’s not equal to 1. My non-filtered NC index was created on [SpecialOfferID] because it needed to navigate the B-TREE to find the records where [SpecialOfferID] <> 1. With my filtered index, the query optimizer knows that all of my records already meet the criteria, so doesn’t need to navigate through the index to find the matching results.

We could choose to include the [SpecialOfferID] data in our filtered index, but we’d most likely want to make it an included column rather than part of the index key. In fact, it’s important to note that, if I don’t add [SpecialOfferID] as an included column and I want to return it in the results, i.e.

Select SalesOrderID
    , SpecialOfferID
    , Count(*) As 'CountOfLineItem'
    , Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail
Where SpecialOfferID <> 1
Group By SalesOrderID
    , SpecialOfferID;

my filtered index will not be used and I will instead scan on the clustered index once more (assuming [IX_Sales_SalesOrderDetail_SpecialOfferID] does not exist). This is because the filtering criteria is not included anywhere on the actual index page. This is actually good news, in my opinion, since it allows you to create even leaner indexes. And like I already mentioned, if you do need the data returned, you can always add the filtering criteria as included columns.

What if you’re trying to find out whether or not an index is filtered, and what it’s filtered on? The sys.indexes catalog view has been updated in 2008 to include this information:

Select name, has_filter, filter_definition
From sys.indexes 
Where name In ('IX_Sales_SalesOrderDetail_SpecialOfferID'
    , 'FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered'
    , 'PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID');
name                                                   has_filter filter_definition
------------------------------------------------------ ---------- -------------------------
FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered     1          ([SpecialOfferID]<>(1))
IX_Sales_SalesOrderDetail_SpecialOfferID               0          NULL
PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID    0          NULL

I personally recommend Kimberly Tripp’s system stored proc, sp_helpindex2. It returns a lot of good information about your indexes, such as included columns and filtering criteria.

That’s all I have for today. Hopefully, you now understand how powerful filtered indexes can be. When used properly, filtered indexes can use less space, consume less IO, and improve overall query performance.

Page Splitting & Rollbacks

So while I was at the grocery store last night, my mind wandered to SQL Server. This time, I was pondering what happens to a page split if the transaction is rolled back. I didn’t know the answer, but my guess was that the page split remained, since it would be less expensive for SQL Server to leave the data where it was sitting. Also, in theory, if the page split occurred once, it could need to occur again, so why undergo that same expense twice?

I decided to run a simple test to see what would happen. First, I created a really wide table and tossed 4 rows into it:

Create Table myTable
(
      id            int Primary Key
    , wideColumn    char(2000) 
);
 
Insert Into myTable
Select 1, Replicate('A', 2000) Union All
Select 2, Replicate('B', 2000) Union All
Select 4, Replicate('C', 2000) Union All
Select 5, Replicate('D', 2000);

I left an open spot for id=3, so I can force a page split. Next, I looked at the page data using the script I posted in February.

Here’s what I saw:

Before

Before

Pay attention to the 2 items boxed in red. m_slotCnt tells us how many records are on the page, and m_nextPage tells us the address of the next page. As you may have guessed, a m_nextPage value of 0:0 means you’re looking at the last page allocated to the object.

Now let’s insert a record, roll it back, and see what happens:

Begin Transaction;
 
Insert Into myTable
Select 3, Replicate('E', 2000);
 
Rollback Transaction;

I ran my DBCC Page command again and here’s what I saw:

After

After

As you can see, m_slotCnt is now 2, and m_nextPage is no longer 0:0 (although your actual page number will probably be different than mine). If I pull up the new page, I find my 2 relocated records, id’s 4 and 5.

So what’s this all mean? In short, page splits are NOT reversed when a transaction is rolled back. Why should you care? Well, you probably wouldn’t care much, unless you roll back a lot of transactions. But this is also a good thing to keep in mind if you have to abort a really large insert or update; if you don’t plan to re-execute the aborted script, you may want to defrag your indexes to fix the splits.

Source: http://sqlfool.com/2009/04/page-splitting-rollbacks/

Automated Index Defrag Script

UPDATE: This script has been significantly updated. Please view the latest version at http://sqlfool.com/2011/06/index-defrag-script-v4-1/.

So after much delay, here’s the latest and greatest version of my index defrag script.

A summary of the changes:

  • Added support for centralized execution. Option to defrag indexes for a single database from another database, or for all non-system databases.
  • Consolidated Enterprise and Standard versions of the script with new @editionCheck variable.
  • Added parametrization for MaxDop restrictions during rebuilds; validates that the value does not exceed the actual number of processors on the server.
  • Changed minimum fragmentation default value from 10 to 5 to match BOL recommendations.
  • Limited defrags to objects with more than 8 pages.
  • Added a debug option to give a little more insight into what’s happening and to assist with troubleshooting.
  • Updated logic for handling partitions and LOBs.

And a couple of notes and explanations:

Don’t you know you can just pass NULL to sys.dm_db_index_physical_stats for the DatabaseID?
Yes, I realize you can do this. But I don’t want to defrag the system databases, i.e. tempdb, so I opted to handle it this way instead.

How long will this take?
It depends. I don’t necessarily recommend running it without specifying a database; at least, not unmonitored. You *can* do that, but it could take a while. For example, to run sys.dm_db_index_physical_stats for all databases and tables, totaling 2TB, took me 4.5 hours; that doesn’t even count the actual defrags.

Where should I put this?
It’s up to you. If you have a database for items like centralized maintenance or scratch tables, that may be a good place for it. If you prefer, you could also put this in each individual database and call it locally, too. I would not put this in the master or msdb databases.

This is pretty close to a complete rewrite, so please let me know if you encounter any bugs. And now… the code!

If Not Exists(Select [object_id] From sys.tables Where name = N'dba_indexDefragLog')
Begin
    -- Drop Table dbo.dba_indexDefragLog
    Create Table dbo.dba_indexDefragLog
    (
          indexDefrag_id    int identity(1,1)   Not Null
        , databaseID        int                 Not Null
        , databaseName      nvarchar(128)       Not Null
        , objectID          int                 Not Null
        , objectName        nvarchar(128)       Not Null
        , indexID           int                 Not Null
        , indexName         nvarchar(128)       Not Null
        , partitionNumber   smallint            Not Null
        , fragmentation     float               Not Null
        , page_count        int                 Not Null
        , dateTimeStart     datetime            Not Null
        , durationSeconds   int                 Not Null
        Constraint PK_indexDefragLog Primary Key Clustered (indexDefrag_id)
    )
 
    Print 'dba_indexDefragLog Table Created';
End
 
If ObjectProperty(Object_ID('dbo.dba_indexDefrag_sp'), N'IsProcedure') = 1
Begin
    Drop Procedure dbo.dba_indexDefrag_sp;
    Print 'Procedure dba_indexDefrag_sp dropped';
End;
Go
 
 
Create Procedure dbo.dba_indexDefrag_sp
 
    /* Declare Parameters */
      @minFragmentation     float           = 5.0  
        /* in percent, will not defrag if fragmentation less than specified */
    , @rebuildThreshold     float           = 30.0  
        /* in percent, greater than @rebuildThreshold will result in rebuild instead of reorg */
    , @executeSQL           bit             = 1     
        /* 1 = execute; 0 = print command only */
    , @database             varchar(128)    = Null
        /* Option to specify a database name; null will return all */
    , @tableName            varchar(4000)   = Null  -- databaseName.schema.tableName
        /* Option to specify a table name; null will return all */
    , @onlineRebuild        bit             = 1     
        /* 1 = online rebuild; 0 = offline rebuild; only in Enterprise */
    , @maxDopRestriction    tinyint         = Null
        /* Option to restrict the number of processors for the operation; only in Enterprise */
    , @printCommands        bit             = 0     
        /* 1 = print commands; 0 = do not print commands */
    , @printFragmentation   bit             = 0
        /* 1 = print fragmentation prior to defrag; 
           0 = do not print */
    , @defragDelay          char(8)         = '00:00:05'
        /* time to wait between defrag commands */
    , @debugMode            bit             = 0
        /* display some useful comments to help determine if/where issues occur */
As
/*********************************************************************************
    Name:       dba_indexDefrag_sp
 
    Author:     Michelle Ufford, http://sqlfool.com
 
    Purpose:    Defrags all indexes for one or more databases
 
    Notes:
 
    CAUTION: TRANSACTION LOG SIZE MUST BE MONITORED CLOSELY WHEN DEFRAGMENTING.
 
      @minFragmentation     defaulted to 10%, will not defrag if fragmentation 
                            is less than that
 
      @rebuildThreshold     defaulted to 30% as recommended by Microsoft in BOL;
                            greater than 30% will result in rebuild instead
 
      @executeSQL           1 = execute the SQL generated by this proc; 
                            0 = print command only
 
      @database             Optional, specify specific database name to defrag;
                            If not specified, all non-system databases will
                            be defragged.
 
      @tableName            Specify if you only want to defrag indexes for a 
                            specific table, format = databaseName.schema.tableName;
                            if not specified, all tables will be defragged.
 
      @onlineRebuild        1 = online rebuild; 
                            0 = offline rebuild
 
      @maxDopRestriction    Option to specify a processor limit for index rebuilds
 
      @printCommands        1 = print commands to screen; 
                            0 = do not print commands
 
      @printFragmentation   1 = print fragmentation to screen;
                            0 = do not print fragmentation
 
      @defragDelay          time to wait between defrag commands; gives the
                            server a little time to catch up 
 
      @debugMode            1 = display debug comments; helps with troubleshooting
                            0 = do not display debug comments
 
    Called by:  SQL Agent Job or DBA
 
    Date        Initials	Description
    ----------------------------------------------------------------------------
    2008-10-27  MFU         Initial Release for public consumption
    2008-11-17  MFU         Added page-count to log table
                            , added @printFragmentation option
    2009-03-17  MFU         Provided support for centralized execution, 
                            , consolidated Enterprise & Standard versions
                            , added @debugMode, @maxDopRestriction
                            , modified LOB and partition logic                            
*********************************************************************************
    Exec dbo.dba_indexDefrag_sp
          @executeSQL           = 0
        , @minFragmentation     = 80
        , @printCommands        = 1
        , @debugMode            = 1
        , @printFragmentation   = 1
        , @database             = 'AdventureWorks'
        , @tableName            = 'AdventureWorks.Sales.SalesOrderDetail';
*********************************************************************************/																
 
Set NoCount On;
Set XACT_Abort On;
Set Quoted_Identifier On;
 
Begin
 
    If @debugMode = 1 RaisError('Dusting off the spiderwebs and starting up...', 0, 42) With NoWait;
 
    /* Declare our variables */
    Declare   @objectID             int
            , @databaseID           int
            , @databaseName         nvarchar(128)
            , @indexID              int
            , @partitionCount       bigint
            , @schemaName           nvarchar(128)
            , @objectName           nvarchar(128)
            , @indexName            nvarchar(128)
            , @partitionNumber      smallint
            , @partitions           smallint
            , @fragmentation        float
            , @pageCount            int
            , @sqlCommand           nvarchar(4000)
            , @rebuildCommand       nvarchar(200)
            , @dateTimeStart        datetime
            , @dateTimeEnd          datetime
            , @containsLOB          bit
            , @editionCheck         bit
            , @debugMessage         varchar(128)
            , @updateSQL            nvarchar(4000)
            , @partitionSQL         nvarchar(4000)
            , @partitionSQL_Param   nvarchar(1000)
            , @LOB_SQL              nvarchar(4000)
            , @LOB_SQL_Param        nvarchar(1000);
 
    /* Create our temporary tables */
    Create Table #indexDefragList
    (
          databaseID        int
        , databaseName      nvarchar(128)
        , objectID          int
        , indexID           int
        , partitionNumber   smallint
        , fragmentation     float
        , page_count        int
        , defragStatus      bit
        , schemaName        nvarchar(128)   Null
        , objectName        nvarchar(128)   Null
        , indexName         nvarchar(128)   Null
    );
 
    Create Table #databaseList
    (
          databaseID        int
        , databaseName      varchar(128)
    );
 
    Create Table #processor 
    (
          [index]           int
        , Name              varchar(128)
        , Internal_Value    int
        , Character_Value   int
    );
 
    If @debugMode = 1 RaisError('Beginning validation...', 0, 42) With NoWait;
 
    /* Just a little validation... */
    If @minFragmentation Not Between 0.00 And 100.0
        Set @minFragmentation = 5.0;
 
    If @rebuildThreshold Not Between 0.00 And 100.0
        Set @rebuildThreshold = 30.0;
 
    If @defragDelay Not Like '00:[0-5][0-9]:[0-5][0-9]'
        Set @defragDelay = '00:00:05';
 
    /* Make sure we're not exceeding the number of processors we have available */
    Insert Into #processor
    Execute xp_msver 'ProcessorCount';
 
    If @maxDopRestriction Is Not Null And @maxDopRestriction > (Select Internal_Value From #processor)
        Select @maxDopRestriction = Internal_Value
        From #processor;
 
    /* Check our server version; 1804890536 = Enterprise, 610778273 = Enterprise Evaluation, -2117995310 = Developer */
    If (Select ServerProperty('EditionID')) In (1804890536, 610778273, -2117995310) 
        Set @editionCheck = 1 -- supports online rebuilds
    Else
        Set @editionCheck = 0; -- does not support online rebuilds
 
    If @debugMode = 1 RaisError('Grabbing a list of our databases...', 0, 42) With NoWait;
 
    /* Retrieve the list of databases to investigate */
    Insert Into #databaseList
    Select database_id
        , name
    From sys.databases
    Where name = IsNull(@database, name)
        And database_id > 4 -- exclude system databases
        And [state] = 0; -- state must be ONLINE
 
    If @debugMode = 1 RaisError('Looping through our list of databases and checking for fragmentation...', 0, 42) With NoWait;
 
    /* Loop through our list of databases */
    While (Select Count(*) From #databaseList) > 0
    Begin
 
        Select Top 1 @databaseID = databaseID
        From #databaseList;
 
        Select @debugMessage = '  working on ' + DB_Name(@databaseID) + '...';
 
        If @debugMode = 1
            RaisError(@debugMessage, 0, 42) With NoWait;
 
       /* Determine which indexes to defrag using our user-defined parameters */
        Insert Into #indexDefragList
        Select
              database_id As databaseID
            , QuoteName(DB_Name(database_id)) As 'databaseName'
            , [object_id] As objectID
            , index_id As indexID
            , partition_number As partitionNumber
            , avg_fragmentation_in_percent As fragmentation
            , page_count 
            , 0 As 'defragStatus' /* 0 = unprocessed, 1 = processed */
            , Null As 'schemaName'
            , Null As 'objectName'
            , Null As 'indexName'
        From sys.dm_db_index_physical_stats (@databaseID, Object_Id(@tableName), Null , Null, N'Limited')
        Where avg_fragmentation_in_percent >= @minFragmentation 
            And index_id > 0 -- ignore heaps
            And page_count > 8 -- ignore objects with less than 1 extent
        Option (MaxDop 1);
 
        Delete From #databaseList
        Where databaseID = @databaseID;
 
    End
 
    Create Clustered Index CIX_temp_indexDefragList
        On #indexDefragList(databaseID, objectID, indexID, partitionNumber);
 
    Select @debugMessage = 'Looping through our list... there''s ' + Cast(Count(*) As varchar(10)) + ' indexes to defrag!'
    From #indexDefragList;
 
    If @debugMode = 1 RaisError(@debugMessage, 0, 42) With NoWait;
 
    /* Begin our loop for defragging */
    While (Select Count(*) From #indexDefragList Where defragStatus = 0) > 0
    Begin
 
        If @debugMode = 1 RaisError('  Picking an index to beat into shape...', 0, 42) With NoWait;
 
        /* Grab the most fragmented index first to defrag */
        Select Top 1 
              @objectID         = objectID
            , @indexID          = indexID
            , @databaseID       = databaseID
            , @databaseName     = databaseName
            , @fragmentation    = fragmentation
            , @partitionNumber  = partitionNumber
            , @pageCount        = page_count
        From #indexDefragList
        Where defragStatus = 0
        Order By fragmentation Desc;
 
        If @debugMode = 1 RaisError('  Looking up the specifics for our index...', 0, 42) With NoWait;
 
        /* Look up index information */
        Select @updateSQL = N'Update idl
            Set schemaName = QuoteName(s.name)
                , objectName = QuoteName(o.name)
                , indexName = QuoteName(i.name)
            From #indexDefragList As idl
            Inner Join ' + @databaseName + '.sys.objects As o
                On idl.objectID = o.object_id
            Inner Join ' + @databaseName + '.sys.indexes As i
                On o.object_id = i.object_id
            Inner Join ' + @databaseName + '.sys.schemas As s
                On o.schema_id = s.schema_id
            Where o.object_id = ' + Cast(@objectID As varchar(10)) + '
                And i.index_id = ' + Cast(@indexID As varchar(10)) + '
                And i.type > 0
                And idl.databaseID = ' + Cast(@databaseID As varchar(10));
 
        Execute sp_executeSQL @updateSQL;
 
        /* Grab our object names */
        Select @objectName  = objectName
            , @schemaName   = schemaName
            , @indexName    = indexName
        From #indexDefragList
        Where objectID = @objectID
            And indexID = @indexID
            And databaseID = @databaseID;
 
        If @debugMode = 1 RaisError('  Grabbing the partition count...', 0, 42) With NoWait;
 
        /* Determine if the index is partitioned */
        Select @partitionSQL = 'Select @partitionCount_OUT = Count(*)
                                    From ' + @databaseName + '.sys.partitions
                                    Where object_id = ' + Cast(@objectID As varchar(10)) + '
                                        And index_id = ' + Cast(@indexID As varchar(10)) + ';'
            , @partitionSQL_Param = '@partitionCount_OUT int OutPut';
 
        Execute sp_executeSQL @partitionSQL, @partitionSQL_Param, @partitionCount_OUT = @partitionCount OutPut;
 
        If @debugMode = 1 RaisError('  Seeing if there''s any LOBs to be handled...', 0, 42) With NoWait;
 
        /* Determine if the table contains LOBs */
        Select @LOB_SQL = ' Select Top 1 @containsLOB_OUT = column_id
                            From ' + @databaseName + '.sys.columns With (NoLock) 
                            Where [object_id] = ' + Cast(@objectID As varchar(10)) + '
                                And (system_type_id In (34, 35, 99)
                                        Or max_length = -1);'
                            /*  system_type_id --> 34 = image, 35 = text, 99 = ntext
                                max_length = -1 --> varbinary(max), varchar(max), nvarchar(max), xml */
                , @LOB_SQL_Param = '@containsLOB_OUT int OutPut';
 
        Execute sp_executeSQL @LOB_SQL, @LOB_SQL_Param, @containsLOB_OUT = @containsLOB OutPut;
 
        If @debugMode = 1 RaisError('  Building our SQL statements...', 0, 42) With NoWait;
 
        /* If there's not a lot of fragmentation, or if we have a LOB, we should reorganize */
        If @fragmentation < @rebuildThreshold Or @containsLOB = 1 Or @partitionCount > 1
        Begin
 
            Set @sqlCommand = N'Alter Index ' + @indexName + N' On ' + @databaseName + N'.' 
                                + @schemaName + N'.' + @objectName + N' ReOrganize';
 
            /* If our index is partitioned, we should always reorganize */
            If @partitionCount > 1
                Set @sqlCommand = @sqlCommand + N' Partition = ' 
                                + Cast(@partitionNumber As nvarchar(10));
 
        End;
 
        /* If the index is heavily fragmented and doesn't contain any partitions or LOB's, rebuild it */
        If @fragmentation >= @rebuildThreshold And IsNull(@containsLOB, 0) != 1 And @partitionCount <= 1
        Begin
 
            /* Set online rebuild options; requires Enterprise Edition */
            If @onlineRebuild = 1 And @editionCheck = 1 
                Set @rebuildCommand = N' Rebuild With (Online = On';
            Else
                Set @rebuildCommand = N' Rebuild With (Online = Off';
 
            /* Set processor restriction options; requires Enterprise Edition */
            If @maxDopRestriction Is Not Null And @editionCheck = 1
                Set @rebuildCommand = @rebuildCommand + N', MaxDop = ' + Cast(@maxDopRestriction As varchar(2)) + N')';
            Else
                Set @rebuildCommand = @rebuildCommand + N')';
 
            Set @sqlCommand = N'Alter Index ' + @indexName + N' On ' + @databaseName + N'.'
                            + @schemaName + N'.' + @objectName + @rebuildCommand;
 
        End;
 
        /* Are we executing the SQL?  If so, do it */
        If @executeSQL = 1
        Begin
 
            If @debugMode = 1 RaisError('  Executing SQL statements...', 0, 42) With NoWait;
 
            /* Grab the time for logging purposes */
            Set @dateTimeStart  = GetDate();
            Execute sp_executeSQL @sqlCommand;
            Set @dateTimeEnd  = GetDate();
 
            /* Log our actions */
            Insert Into dbo.dba_indexDefragLog
            (
                  databaseID
                , databaseName
                , objectID
                , objectName
                , indexID
                , indexName
                , partitionNumber
                , fragmentation
                , page_count
                , dateTimeStart
                , durationSeconds
            )
            Select
                  @databaseID
                , @databaseName
                , @objectID
                , @objectName
                , @indexID
                , @indexName
                , @partitionNumber
                , @fragmentation
                , @pageCount
                , @dateTimeStart
                , DateDiff(second, @dateTimeStart, @dateTimeEnd);
 
            /* Just a little breather for the server */
            WaitFor Delay @defragDelay;
 
            /* Print if specified to do so */
            If @printCommands = 1
                Print N'Executed: ' + @sqlCommand;
        End
        Else
        /* Looks like we're not executing, just printing the commands */
        Begin
            If @debugMode = 1 RaisError('  Printing SQL statements...', 0, 42) With NoWait;
 
            If @printCommands = 1 Print IsNull(@sqlCommand, 'error!');
        End
 
        If @debugMode = 1 RaisError('  Updating our index defrag status...', 0, 42) With NoWait;
 
        /* Update our index defrag list so we know we've finished with that index */
        Update #indexDefragList
        Set defragStatus = 1
        Where databaseID       = @databaseID
          And objectID         = @objectID
          And indexID          = @indexID
          And partitionNumber  = @partitionNumber;
 
    End
 
    /* Do we want to output our fragmentation results? */
    If @printFragmentation = 1
    Begin
 
        If @debugMode = 1 RaisError('  Displaying fragmentation results...', 0, 42) With NoWait;
 
        Select databaseID
            , databaseName
            , objectID
            , objectName
            , indexID
            , indexName
            , fragmentation
            , page_count
        From #indexDefragList;
 
    End;
 
    /* When everything is said and done, make sure to get rid of our temp table */
    Drop Table #indexDefragList;
    Drop Table #databaseList;
    Drop Table #processor;
 
    If @debugMode = 1 RaisError('DONE!  Thank you for taking care of your indexes!  :)', 0, 42) With NoWait;
 
    Set NoCount Off;
	Return 0
End
Go

Thanks to my beta testers, @scoinva, @davidmtate, @jdanton, and @SuperCoolMoss!
Special thanks to SCM for keeping on me to finish this.

Happy Defragging!

Michelle

Source: http://sqlfool.com/2009/03/automated-index-defrag-script/

Index Fragmentation PotW Webcast – Now Available

The Pain-of-the-Week webcast I did last Thursday is now online! You can find it here: http://www.quest.com/common/registration.aspx?requestdefid=23092.

The topic of the webcast is index fragmentation: what is it, how to identify it, and how to fix it. You can find the materials for this webcast in my previous blog post. I’ll also be updating SQLServerPedia with the materials, hopefully tonight.

Thanks to the great feedback I’ve received, I’ll be doing another Pain-of-the-Week webcast, this time with Brent Ozar. More info to follow as we hammer out the specifics.

In a related note, I have updated my index defrag script. I’m just waiting for my awesome beta testers to let me know everything works for them. It works just fine on my servers (lol) but I don’t have access to any Standard editions, etc., so I just want to be extra sure before I unleash it on the SQL world. I should have it online tomorrow.

Source: http://sqlfool.com/2009/03/index-fragmentation-potw-webcast/

Find Missing Indexes in Stored Procs with T-SQL

This post should probably be called “The Power of Twitter.” I’ve said it before, I’ll say it again: I love Twitter. I mostly follow SQL Server people, so it’s become a great source of new information and help when I feel like banging my head against the wall when I get stumped.

So last week, Jonathan Kehayias (@jmkehayias) posted a link to a missing index script on the MSDN forums. Jonathan’s script is modified from something he put together for a question posed by Jeremiah Peschka (@peschkaj).

Jonathan’s script intrigued me. I’d never tried to search a query plan’s XML before, and it certainly presents some interesting possibilities. After dealing with a missing index in production a few weeks ago (caused by an index change), I thought it’d be a great idea to put a regular monitor in place.

Everything was going well until I ran into a problem where I couldn’t get the proc name returned. The results spanned numerous databases; I had the object_id and database_id, but I wanted to store the proc name instead. I tried several different methods, including sp_msforeachdb and sp_executeSQL, and while I had a working solution, it was a little more clunky than I liked. So I asked my awesome followers on Twitter for any tips and within minutes I had half a dozen responses. In the end, @MladenPrajdic solved my problem with a pretty simple solution: put ‘Use ?;’ at the start of my sp_msforeachdb statement. Thanks again, Mladen!

I’ve now had this process running on my server for a few days now, with good success. The stored procedure below will return the database name, proc name, and query plan XML for any stored proc with a missing index. This is a centralized proc that will store the results in a table for later action. If you click on the XML, you should see the actual query plan with the missing index details. Because this looks at cached query plans, your best bet is to run it fairly regularly (maybe daily) to increase your chances of catching any problem procs.

/* Create a stored procedure skeleton */
If ObjectProperty(Object_ID('dbo.dba_missingIndexStoredProc_sp'), N'IsProcedure') Is Null
Begin
    Execute ('Create Procedure dbo.dba_missingIndexStoredProc_sp As Print ''Hello World!''')
    RaisError('Procedure dba_missingIndexStoredProc_sp created.', 10, 1);
End;
Go
 
/* Drop our table if it already exists */
If Exists(Select Object_ID From sys.tables Where [name] = N'dba_missingIndexStoredProc')
Begin
    Drop Table dbo.dba_missingIndexStoredProc
    Print 'dba_missingIndexStoredProc table dropped!';
End
 
/* Create our table */
Create Table dbo.dba_missingIndexStoredProc
(
      missingIndexSP_id int Identity(1,1)   Not Null
    , databaseName      varchar(128)        Not Null
    , databaseID        int                 Not Null
    , objectName        varchar(128)        Not Null
    , objectID          int                 Not Null
    , query_plan        xml                 Not Null
    , executionDate     smalldatetime       Not Null
 
    Constraint PK_missingIndexStoredProc
        Primary Key Clustered(missingIndexSP_id)
);
 
Print 'dba_missingIndexStoredProc Table Created';
 
/* Configure our settings */
Set ANSI_Nulls On;
Set Quoted_Identifier On;
Go
 
Alter Procedure dbo.dba_missingIndexStoredProc_sp
 
        /* Declare Parameters */
            @lastExecuted_inDays    int = 7
          , @minExecutionCount      int = 7
          , @logResults             bit = 1
          , @displayResults         bit = 0
 
As
/*********************************************************************************
    Name:       dba_missingIndexStoredProc_sp
 
    Author:     Michelle Ufford, http://sqlfool.com
 
    Purpose:    Retrieves stored procedures with missing indexes in their
                cached query plans.
 
                @lastExecuted_inDays = number of days old the cached query plan
                                       can be to still appear in the results;
                                       the HIGHER the number, the longer the
                                       execution time.
 
                @minExecutionCount = minimum number of executions the cached
                                     query plan can have to still appear 
                                     in the results; the LOWER the number,
                                     the longer the execution time.
 
                @logResults = store results in dba_missingIndexStoredProc
 
                @displayResults = return results to the caller
 
    Notes:      This is not 100% guaranteed to catch all missing indexes in
                a stored procedure.  It will only catch it if the stored proc's
                query plan is still in cache.  Run regularly to help minimize
                the chance of missing a proc.
 
    Called by:  DBA and/or SQL Agent Job
 
    Date        User    Description
    ----------------------------------------------------------------------------
    2009-03-02  MFU     Initial Release for public consumption
*********************************************************************************
    Exec dbo.dba_missingIndexStoredProc_sp
          @lastExecuted_inDays  = 30
        , @minExecutionCount    = 5
        , @logResults           = 1
        , @displayResults       = 1;
*********************************************************************************/
 
Set NoCount On;
Set XACT_Abort On;
Set Ansi_Padding On;
Set Ansi_Warnings On;
Set ArithAbort On;
Set Concat_Null_Yields_Null On;
Set Numeric_RoundAbort Off;
 
Begin
 
    /* Declare Variables */
    Declare @currentDateTime smalldatetime;
 
    Set @currentDateTime = GetDate();
 
    Declare @plan_handles Table
    (
        plan_handle     varbinary(64)   Not Null
    );
 
    Create Table #missingIndexes
    (
          databaseID    int             Not Null
        , objectID      int             Not Null
        , query_plan    xml             Not Null
 
        Constraint PK_temp_missingIndexes Primary Key Clustered
        (
            databaseID, objectID
        )
    );
 
    Begin Try
 
        /* Perform some data validation */
        If @logResults = 0 And @displayResults = 0
        Begin
 
            /* Log the fact that there were open transactions */
            Execute dbo.dba_logError_sp
                  @errorType            = 'app'
                , @app_errorProcedure   = 'dba_missingIndexStoredProc_sp'
                , @app_errorMessage     = '@logResults = 0 and @displayResults = 0; no action taken, exiting stored proc.'
                , @forceExit            = 1
                , @returnError          = 1;  
 
        End;
 
        Begin Transaction;
 
        /* Retrieve distinct plan handles to minimize dm_exec_query_plan lookups */
        Insert Into @plan_handles
        Select Distinct plan_handle
        From sys.dm_exec_query_stats
        Where last_execution_time > DateAdd(day, -@lastExecuted_inDays, @currentDateTime)
            And execution_count > @minExecutionCount;
 
        With xmlNameSpaces (
            Default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'
        )
 
        /* Retrieve our query plan's XML if there's a missing index */
        Insert Into #missingIndexes
        Select deqp.[dbid]
            , deqp.objectid
            , deqp.query_plan 
        From @plan_handles As ph
        Cross Apply sys.dm_exec_query_plan(ph.plan_handle) As deqp 
        Where deqp.query_plan.exist('//MissingIndex') = 1
            And deqp.objectid Is Not Null;
 
        /* Do we want to store the results of our process? */
        If @logResults = 1
        Begin
            Insert Into dbo.dba_missingIndexStoredProc
            Execute sp_msForEachDB 'Use ?; 
                                    Select ''?''
                                        , mi.databaseID
                                        , Object_Name(o.object_id)
                                        , o.object_id
                                        , mi.query_plan
                                        , GetDate()
                                    From sys.objects As o 
                                    Join #missingIndexes As mi 
                                        On o.object_id = mi.objectID 
                                    Where databaseID = DB_ID();';
 
        End
        /* We're not logging it, so let's display it */
        Else
        Begin
            Execute sp_msForEachDB 'Use ?; 
                                    Select ''?''
                                        , mi.databaseID
                                        , Object_Name(o.object_id)
                                        , o.object_id
                                        , mi.query_plan
                                        , GetDate()
                                    From sys.objects As o 
                                    Join #missingIndexes As mi 
                                        On o.object_id = mi.objectID 
                                    Where databaseID = DB_ID();';
        End;
 
        /* See above; this part will only work if we've 
           logged our data. */
        If @displayResults = 1 And @logResults = 1
        Begin
            Select *
            From dbo.dba_missingIndexStoredProc
            Where executionDate >= @currentDateTime;
        End;
 
        /* If you have an open transaction, commit it */
        If @@TranCount > 0
            Commit Transaction;
 
    End Try
    Begin Catch
 
        /* Whoops, there was an error... rollback! */
        If @@TranCount > 0
            Rollback Transaction;
 
        /* Return an error message and log it */
        Execute dbo.dba_logError_sp;
 
    End Catch;
 
    /* Clean-Up! */
    Drop Table #missingIndexes;
 
    Set NoCount Off;
    Return 0;
End
Go

Not know what “EXECUTE dbo.dba_logError_sp;” is doing? Check out my blog post on Error Handling in T-SQL.

HTH!

Michelle

Source: http://sqlfool.com/2009/03/find-missing-indexes/

T-SQL Bitwise Operations

I’ve seen bit-product columns from time-to-time, mostly in SQL Server 2000 system tables, but it’s never been something I’ve had to work with. And when I’ve needed to, I’ve known how to figure out which options are selected, i.e. a bit product of 9 means options 8 and 1 are selected. If you’ve ever taken a look at the [status] column on the sysdatabases table (SQL 2000), you’ll know what I’m talking about.

What I’ve never known how to do, until recently, was calculate these options programmatically. That’s why, when I noticed the [freq_interval] on the sysschedules table was a bit-product column, I decided to spend a little time figuring it out. Fortunately for me, a couple of my awesome co-workers, Jeff M. and Jason H., have worked with this before and were able to explain it to me. And, it turns out, it’s actually quite easy.

Let me back up a few steps in case you’re not familiar with this topic. If you check out the Books Online entry for the sysschedules table (2005), you’ll notice the following statement:

freq_interval is one or more of the following:
1 = Sunday
2 = Monday
4 = Tuesday
8 = Wednesday
16 = Thursday
32 = Friday
64 = Saturday

When I looked at the actual value in the table, the schedule has a [freq_interval] value of 42, which is the sum of the bit values for the days selected.

If there were more than 7 options, the bit values would continue to double, i.e. 128, 256, etc. And regardless of how many bit values you select, you’re guaranteed one and only one possible answer, as the sum of all previous bit values will never exceed the next bit value:
1 + 2 = 3
1 + 2 + 4 = 7
1 + 2 + 4 + 8 = 15

Knowing this, I’m able to retrieve the values manually: I start with the highest bit value that does not exceed 42, then subtract it; I repeat until I’m left with 0.

So…
42 – 32 = 10
10 – 8 = 2
2 – 2 = 0

That means my job is scheduled to run on Friday’s (32), Wednesday’s (8), and Monday’s (2).

Now how do I do this with T-SQL? SQL Server provides an operator specifically for this task: the bitwise AND operator (&). For now, I’m going to skip the “why” behind this and just get to the practical application. If you’re interested in the “why,” let me know and I’ll write a follow-up post on binary and logical AND and OR operations.

For example, to use the bitwise AND to find out which days are selected…

Select 42 & 1 As 'Sunday'
    , 42 & 2 As 'Monday'
    , 42 & 4 As 'Tuesday'
    , 42 & 8 As 'Wednesday'
    , 42 & 16 As 'Thursday'
    , 42 & 32 As 'Friday'
    , 42 & 64 As 'Saturday';

… will return …

Sunday      Monday      Tuesday     Wednesday   Thursday    Friday      Saturday
----------- ----------- ----------- ----------- ----------- ----------- -----------
0           2           0           8           0           32          0

If the result is not equal to zero, then that day is selected. Easy as key lime pie, right?

Now let’s take it a step further and create our own working example. Let’s say we’re going to track the characteristics of various objects in a single bit-product column (note: this is not necessarily the best way to accomplish this in the real world, but it’s a good illustration). First, set up a table to use in our example. This table will have a column, [attributes], which will hold the sum of our bit values.

Create Table myTable
(
      id            int identity(1,1)
    , item          varchar(10)
    , attributes    int
);
 
Insert Into myTable
Select 'Broccoli', 200 Union All
Select 'Tomato', 193 Union All
Select 'Car', 276 Union All
Select 'Ball', 292;

Next, we’re going to create a table variable that holds characteristics and their values. We’ll then join these two tables together to see which attributes exist for each item.

Declare @statusLookup Table
(
      attribute int
    , value     varchar(10)
);
 
Insert Into @statusLookup
Select 1, 'Red' Union All
Select 4, 'Blue' Union All
Select 8, 'Green' Union All
Select 16, 'Metal' Union All
 
Select 32, 'Plastic' Union All
Select 64, 'Plant' Union All
Select 128, 'Edible' Union All
Select 256, 'Non-Edible';
 
Select a.item, b.value
From myTable a
Cross Join @statusLookup b
Where a.attributes & b.attribute <> 0
Order By a.item
    , b.value

You should get this result:

item       value
---------- ----------
Ball       Blue
Ball       Non-Edible
Ball       Plastic
Broccoli   Edible
Broccoli   Green
Broccoli   Plant
Car        Blue
Car        Metal
Car        Non-Edible
Tomato     Edible
Tomato     Plant
Tomato     Red

Great, now we know broccoli is edible! Let’s apply a little XML to clean up the results…

Select a.item
    , Replace( Replace( Replace(( 
        Select value 
        From @statusLookup AS b 
        Where a.attributes & b.attribute <> 0 
        Order By b.value For XML Raw)
        , '"/><row value="', ', '), '<row value="', ''), '"/>', '') 
        As 'attributes'
From myTable a
Order By a.item;
item       attributes
----------------------------------------
Ball       Blue, Non-Edible, Plastic
Broccoli   Edible, Green, Plant
Car        Blue, Metal, Non-Edible
Tomato     Edible, Plant, Red

Voila! There you have it, how to use the bitwise AND (&) operator to retrieve multiple values from a bit-product column. Pretty neat stuff!

Special thanks to Jeff M. and Jason H. for their assistance. :)

Happy Coding!

Michelle Ufford (aka SQLFool)

Source: http://sqlfool.com/2009/02/bitwise-operations/

Fragmentation on Replicated Tables

I’ve heard more than one person state that they do not need to defrag their tables, because they’re replicated and the source tables are defragged regularly. Let’s set this myth straight.

Repeat after me: defragmentation processes are not replicated.

Good. Thank you.

For those non-believers, let me prove it:

SET NOCOUNT ON;
 
/* Create publication db */
CREATE DATABASE sandbox_publisher;
GO
 
/* Create subscription db */
CREATE DATABASE sandbox_subscriber;
GO
 
/* Navigate to publisher db */
USE sandbox_publisher;
GO
 
/* Create a table to play with */
CREATE TABLE myTable
(
      myID UNIQUEIDENTIFIER
    , myColumn VARCHAR(2000)
 
    CONSTRAINT PK_myTable PRIMARY KEY CLUSTERED (myID)
);
/* NOTE: please never, ever use this
   type of clustered index in real-life.
   The whole purpose is to generate
   substantial fragmentation for 
   illustrative purposes. */
 
/* Populate it with some non-sequential
   records to generate fragmentation */
DECLARE @iteration INT;
SET @iteration = 0;
 
WHILE @iteration < 1000
BEGIN
    INSERT INTO myTable
    SELECT NewID(), REPLICATE('A', 2000);
 
    SET @iteration = @iteration + 1;
END
 
/* Now let's verify that we really did
   make a mess of our clustered index */
SELECT avg_fragmentation_in_percent
    , page_count
FROM sys.dm_db_index_physical_stats 
(DB_ID(), OBJECT_ID(N'MyTable'), NULL , NULL, N'Limited') 
OPTION (MaxDop 1);
 
/* Results:
 
avg_fragmentation_in_percent page_count
---------------------------- --------------------
99.0157480314961             508
 
99% fragmented, not too shabby!  Your results 
may vary slightly, based on the guid values
generated by NewID().
*/
 
/* Now, let's set up replication */
USE sandbox_publisher;
GO
 
/* Enable publication */
EXECUTE SP_REPLICATIONDBOPTION
      @dbname = N'sandbox_publisher'
    , @optname = N'publish'
    , @VALUE = N'true';
GO
 
/* Create our publication */
EXECUTE SP_ADDPUBLICATION
      @publication = N'myTestPublication' 
    , @sync_method = N'concurrent'
    , @description = N'Test Publication'
    , @status = N'active'
    , @independent_agent = N'true'
    , @immediate_sync = N'true'
    , @retention = 0
GO
 
/* Create our snapshot agent */
EXECUTE SP_ADDPUBLICATION_SNAPSHOT
      @publication = N'myTestPublication'
    , @frequency_type = 1;
 
/* Add our table as an article */
EXECUTE SP_ADDARTICLE
      @publication = N'myTestPublication'
    , @article = N'myTable'
    , @source_owner = N'dbo'
    , @source_object = N'myTable'
    , @type = N'logbased'
    , @destination_table = N'myTable'
    , @destination_owner = N'dbo'
    , @ins_cmd = N'CALL sp_MSins_dbomyTable'
    , @del_cmd = N'CALL sp_MSdel_dbomyTable'
    , @upd_cmd = N'MCALL sp_MSupd_dbomyTable';
GO
 
/* Generate an initial snapshot */
EXECUTE sp_startpublication_snapshot
      @publication = N'myTestPublication';
 
/* Create our subscription */
EXECUTE SP_ADDSUBSCRIPTION 
      @publication = N'myTestPublication'
    , @subscriber = N'YourServerName'
    , @destination_db = N'sandbox_subscriber';
 
/* At this point, basic transactional replication 
   should be configured.  You should now have a 
   copy of myTable in your subscription database. */
 
USE sandbox_subscriber;
GO
 
/* Let's take a look at fragmentation on the 
   subscription database. */
SELECT avg_fragmentation_in_percent
    , page_count
FROM sys.dm_db_index_physical_stats 
(DB_ID(), OBJECT_ID(N'MyTable'), NULL , NULL, N'Limited') 
OPTION (MaxDop 1);
 
/* Results:
 
avg_fragmentation_in_percent page_count
---------------------------- --------------------
0.29940119760479             334
 
Less than 1% fragmented!  Why is this?  During
initialization, a copy of the schema and data
is sent to the subscriber.  The data is sorted
before insertion into the destination table.
*/
 
USE sandbox_publisher;
GO
 
/* Let's insert another 1000 records and see
   what happens... */
DECLARE @iteration INT;
SET @iteration = 0;
 
WHILE @iteration < 1000
BEGIN
    INSERT INTO sandbox_publisher.dbo.myTable
    SELECT NewID(), REPLICATE('A', 2000);
 
    SET @iteration = @iteration + 1;
END
 
SELECT avg_fragmentation_in_percent
    , page_count
FROM sys.dm_db_index_physical_stats 
(DB_ID(), OBJECT_ID(N'MyTable'), NULL , NULL, N'Limited') 
OPTION (MaxDop 1);
 
/* Results:
 
avg_fragmentation_in_percent page_count
---------------------------- --------------------
98.6193293885602             1014
 
The publisher is still very fragmented 
(as expected) */
 
USE sandbox_subscriber;
GO
 
SELECT avg_fragmentation_in_percent
    , page_count
FROM sys.dm_db_index_physical_stats 
(DB_ID(), OBJECT_ID(N'MyTable'), NULL , NULL, N'Limited') 
OPTION (MaxDop 1);
 
/* Results:
 
avg_fragmentation_in_percent page_count
---------------------------- --------------------
95.4034729315628             979
 
But now the subscriber is fragmented too!  This
is because each individual transaction is 
replicated, so out-of-order inserts affect
both the publishing and the subcribing 
databases equally.
*/
 
/* Finally, let's rebuild our index on the 
   publisher and see what happens */
USE sandbox_publisher;
GO
 
ALTER INDEX PK_myTable ON myTable Rebuild;
 
SELECT avg_fragmentation_in_percent
    , page_count
FROM sys.dm_db_index_physical_stats 
(DB_ID(), OBJECT_ID(N'MyTable'), NULL , NULL, N'Limited') 
OPTION (MaxDop 1);
 
/* Results:
 
avg_fragmentation_in_percent page_count
---------------------------- --------------------
0                            667
 
0% fragmentation in the publishing table!  
What about our subscribing table?
*/
 
USE sandbox_subscriber;
GO
 
SELECT avg_fragmentation_in_percent
    , page_count
FROM sys.dm_db_index_physical_stats 
(DB_ID(), OBJECT_ID(N'MyTable'), NULL , NULL, N'Limited') 
OPTION (MaxDop 1);
 
/* Results:
 
avg_fragmentation_in_percent page_count
---------------------------- --------------------
95.4034729315628             979
 
As you can see, the subcribing table was
completely unaffected by the defrag
activities on the publisher.
*/

So there you have it, proof that defrag operations are not replicated! Now go defrag your replicated tables. :)

Oh, and if you need help defragging your indexes, check out my index defrag script (2005, 2008).

Creating a 60 GB Index

Recently, I needed to create an index on a 1.5 billion row table. I’ve created some large indexes before, but this was the largest, so I thought I’d share my experience in case anyone was interested.

The plan was to create the following index:

Create NonClustered Index IX_indexName_unpartitioned
    On databaseName.dbo.tableName(columnList)
    Include (includedColumnList)
    With (MaxDop = 1, Online = On, Sort_In_TempDB = On)
    On [Primary];

This is an unpartitioned, non-clustered index being built on a partitioned table. Normally, when you build an aligned partitioned index, the index creation process requires less memory and has less noticeable impact on the system, because each partition is handled individually, one at a time. But as this is an unpartitioned (unaligned) index, each partition was built concurrently, requiring more memory and causing a greater impact on performance. Because of this, I needed to restrict the process to MaxDop 1; otherwise, the server would suffer because of too much memory pressure.

I chose Sort_In_TempDB = On because:

  • I’m building this index online on a very busy table and cannot afford to impact normal oeprations. By using Sort_In_TempDB = On, index transactions are separated from user transactions, allowing the user transaction log to be truncated. *
  • TempDB is on a different volume and therefore should reduce the duration of the operation.
  • The recovery for the user database is full, and the recovery for the TempDB is simple. Sorting in TempDB would minimize logging.

* Note: the transaction log for the user database still grew at a much faster rate than normal and had to be closely monitored during this operation to ensure enough free space remained.

The size of the indexed columns is 25 bytes. So I ran my calculations and came up with 36gb space requirement. We increased TempDB to 50gb and gave it a go. An hour later… ERROR. The process terminated because there would not be enough space free in TempDB to complete the operation. Obviously, my calculations were incorrect. After speaking with Chris Leonard, a man who is way too smart for his own good, I realized I had not included my clustered index in the size calculations. Doh.

Re-running my estimates, here’s what I came up with:

Index Size 25   bytes
Clustered Index Size 16   bytes
Records per Page 197
Est. Rows 1,575,000,000
Est. Number of Pages 7,995,000
Space Requirements 59   GB

 

Obviously, 50gb of free space just wasn’t going to cut it. I decided to give TempDB a little wiggle room and bumped up the space in TempDB to 70gb (not as easy as you’d imagine, I had to requisition more space on the SAN), then re-ran the operation. Success! The process completed in 3 hours and 24 minutes. There was a mild increase in CPU, but no applications or users experienced any issues.

For those interested in the particulars: this was used for a single-record look-up and could not be filtered by the partitioning key. The non-partitioned version of this index has 80% less reads and 11% less CPU than its partitioned counterpart.

If you’re interested in learning more about indexes, here’s some recommended reading:

Index Clean-Up Scripts

I’ve been spending a lot of time lately looking at indexing in my production environments… dropping un-used ones, adding missing ones, and fine-tuning the ones I already have. I thought I’d share some of the scripts I’ve been using to accomplish this.

Here’s the script I use to find any un-used indexes. This relies heavily on the sys.dm_db_index_usage_stats DMV (2005+). This query will also return the SQL statements needed to drop the indexes for convenience. This does NOT mean you should necessarily drop the index. This is only a guide and a starting point; only you know how your application is used and whether SQL Server’s recommendations make sense.

Un-Used Indexes Script

Declare @dbid int
    , @dbName varchar(100);
 
Select @dbid = DB_ID()
    , @dbName = DB_Name();
 
With partitionCTE (object_id, index_id, row_count, partition_count) 
As
(
    Select [object_id]
        , index_id
        , Sum([rows]) As 'row_count'
        , Count(partition_id) As 'partition_count'
    From sys.partitions
    Group By [object_id]
        , index_id
) 
 
Select Object_Name(i.[object_id]) as objectName
        , i.name
        , Case 
            When i.is_unique = 1 
                Then 'UNIQUE ' 
            Else '' 
          End + i.type_desc As 'indexType'
        , ddius.user_seeks
        , ddius.user_scans
        , ddius.user_lookups
        , ddius.user_updates
        , cte.row_count
        , Case When partition_count > 1 Then 'yes' 
            Else 'no' End As 'partitioned?'
        , Case 
            When i.type = 2 And i.is_unique_constraint = 0
                Then 'Drop Index ' + i.name 
                    + ' On ' + @dbName 
                    + '.dbo.' + Object_Name(ddius.[object_id]) + ';'
            When i.type = 2 And i.is_unique_constraint = 1
                Then 'Alter Table ' + @dbName 
                    + '.dbo.' + Object_Name(ddius.[object_ID]) 
                    + ' Drop Constraint ' + i.name + ';'
            Else '' 
          End As 'SQL_DropStatement'
From sys.indexes As i
Inner Join sys.dm_db_index_usage_stats ddius
    On i.object_id = ddius.object_id
        And i.index_id = ddius.index_id
Inner Join partitionCTE As cte
    On i.object_id = cte.object_id
        And i.index_id = cte.index_id
Where ddius.database_id = @dbid
Order By 
    (ddius.user_seeks + ddius.user_scans + ddius.user_lookups) Asc
    , user_updates Desc;

This next script relies on several DMV’s (2005+) that identify missing indexes. While this is good information, the index recommendations do not always make sense and/or sometimes overlap. Also, these DMV’s store data since the SQL Server was last restarted, so if it’s been a while since your server was rebooted, this data may be out of date. This script also provides a SQL statement, in case you do decide to create the index, but it doesn’t take into consideration advanced parameters (i.e. sort_in_tempDB, Online, MaxDop, etc.) and only provides a basic create statement. Nonetheless, it’s another good starting point.

Missing Index Script

Select t.name As 'affected_table'
    , 'Create NonClustered Index IX_' + t.name + '_missing_' 
        + Cast(ddmid.index_handle As varchar(10))
        + ' On ' + ddmid.statement 
        + ' (' + IsNull(ddmid.equality_columns,'') 
        + Case When ddmid.equality_columns Is Not Null 
            And ddmid.inequality_columns Is Not Null Then ',' 
                Else '' End 
        + IsNull(ddmid.inequality_columns, '')
        + ')' 
        + IsNull(' Include (' + ddmid.included_columns + ');', ';'
        ) As sql_statement
    , ddmigs.user_seeks
    , ddmigs.user_scans
    , Cast((ddmigs.user_seeks + ddmigs.user_scans) 
        * ddmigs.avg_user_impact As int) As 'est_impact'
    , ddmigs.last_user_seek
From sys.dm_db_missing_index_groups As ddmig
Inner Join sys.dm_db_missing_index_group_stats As ddmigs
    On ddmigs.group_handle = ddmig.index_group_handle
Inner Join sys.dm_db_missing_index_details As ddmid 
    On ddmig.index_handle = ddmid.index_handle
Inner Join sys.tables As t
    On ddmid.object_id = t.object_id
Where ddmid.database_id = DB_ID()
    And Cast((ddmigs.user_seeks + ddmigs.user_scans) 
        * ddmigs.avg_user_impact As int) > 100
Order By Cast((ddmigs.user_seeks + ddmigs.user_scans) 
    * ddmigs.avg_user_impact As int) Desc;