Summit 2010 Abstract Submission

Jeremiah Peschka just tweeted about the looming deadline to submit an abstract for Summit 2010. I’ve been trying to think of a good topic to present on, and this finally got my butt in gear to submit one! For those interested, here’s what I submitted:

Heaps of Trouble, Clusters of Glory – A Look At Index Internals

Indexes are a crucial component of SQL Server, especially in performance tuning, yet many DBA’s don’t fully understand how indexes work. In this in-depth session, Michelle will examine the anatomy of indexes, from how they’re stored to how they get fragmented. And to make our inner geeks happy, she’ll look at page data to show you what’s happening behind the scenes. Topics covered will include index structure, fragmentation and defragmentation, index partitioning, and index filtering. If you’ve ever wondered just *what* goes on in an index, don’t miss this session!

Special thanks to Chris for help with the session title. :)

Undocumented Function in SQL 2008

If you’ve been following my blog for a little while, you’ll know that I’m a fan of SQL Server internals. There’s a lot that can be learned or better understood by rolling up your sleeves and getting into the nitty-gritty of data pages (i.e. see my post on Overhead in Non-Unique Clustered Indexes). So imagine how happy I was when my co-worker Jeff shared an undocumented function with me today that retrieves the file number, page number, and slot number of a single record. Very cool! Well, at least to me. So now let’s see how you can use it.

The fn_physLocCracker function can be called in the following way:

Select Top 100 plc.*, soh.SalesOrderID
From Sales.SalesOrderHeader As soh
Cross Apply sys.fn_physLocCracker (%%physloc%%) As plc;

Results (just a sample):

file_id     page_id     slot_id     SalesOrderID
----------- ----------- ----------- ------------
1           14032       0           43659
1           14032       1           43660
1           14032       2           43661
1           14032       3           43662
1           14032       4           43663

If you look at the sp_helptext for sys.fn_physLocCracker, %%physloc%% is apparently a virtual column that contains information on where the record is stored. In fact, you can even append %%physloc%% to your column list if you want to see how the information is stored. But for our purposes, we now have a file number, page number, and slot number. What do we do with it?

Well, you can use the investigation proc I wrote to retrieve the actual data page:

Execute dba_viewPageData_sp 
      @databaseName = 'AdventureWorks'
    , @fileNumber = 1
    , @pageNumber = 14032;

Results (just a sample):

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
 
SalesOrderID = 43659                 
 
Slot 0 Column 2 Offset 0x8 Length 1 Length (physical) 1
 
RevisionNumber = 1                   
 
Slot 0 Column 3 Offset 0x9 Length 8 Length (physical) 8
 
OrderDate = 2001-07-01 00:00:00.000  
 
Slot 0 Column 4 Offset 0x11 Length 8 Length (physical) 8
 
DueDate = 2001-07-13 00:00:00.000

Neat, huh? So why would you use it to look up the data page and file number when you can just pass the table name and index name to my proc and retrieve data pages? Well, my investigation proc will retrieve data pages for any index type — the fn_physLocCracker function will only retrieve data for the clustered index — but it will not retrieve the data page for a specific record. So just something to be aware of.

That’s all for now. Back to the #24HoursOfPASS! :)

Overhead in Non-Unique Clustered Indexes

I’ve received a couple of questions regarding my article, Performance Considerations of Data Types, and the overhead associated with non-unique clustered indexes. I started to respond via e-mail, but my response was so long I decided to turn it into a blog post instead. ;)

I should start by clarifying that non-unique clustered indexes do not necessarily consume more space and overhead; it depends on the data stored. If you have duplicate clustered key values, the first instance of the value will be handled as though it were unique. Any subsequent values, however, will incur overhead to manage the uniquifier that SQL Server adds to maintain row uniqueness. This same overhead is also incurred in non-clustered indexes, too, adding to the overall expense of this approach.

I think it helps to actually look at the data, so let’s walk through a few different common scenarios. We’ll create a table with a unique clustered index, a table with a non-unique clustered index but no duplicates, and a table with duplicate key values.

Also, a little warning that I started to write this in SQL Server 2008, and since I’m on a 2008 kick, I decided to leave it that way. You can modify this pretty easily to work in 2005, if necessary.

Use sandbox;
Go
 
/* Unique, clustered index, no duplicate values */
Create Table dbo.uniqueClustered
(
      myDate    date        Not Null
    , myNumber  int         Not Null
    , myColumn  char(995)   Not Null
);
 
Create Unique Clustered Index CIX_uniqueClustered
    On dbo.uniqueClustered(myDate);
 
 
/* Non-unique clustered index, but no duplicate values */
Create Table dbo.nonUniqueNoDups
(
      myDate    date        Not Null
    , myNumber  int         Not Null
    , myColumn  char(995)   Not Null
);
 
Create Clustered Index CIX_nonUniqueNoDups
    On dbo.nonUniqueNoDups(myDate);
 
 
/* Non-unique clustered index, duplicate values */
Create Table dbo.nonUniqueDuplicates
(
      myDate    date        Not Null
    , myNumber  int         Not Null
    , myColumn  char(995)   Not Null
);
 
Create Clustered Index CIX_nonUniqueDuplicates
    On dbo.nonUniqueDuplicates(myDate);

I’m going to use the date data type in 2008 for my clustered index key. To ensure uniqueness for the first two tables, I’ll iterate through a few years’ worth of dates. This is typical of what you may see in a data mart, where you’d have one record with an aggregation of each day’s data. For the table with duplicate values, I’m going to insert the same date for each row.

/* Populate some test data */
Set NoCount On;
Declare @myDate date = '1990-01-01'
    , @myNumber int = 1;
 
While @myDate < '2010-01-01'
Begin
 
    Insert Into dbo.uniqueClustered
    Select @myDate, @myNumber, 'data';
 
    Insert Into dbo.nonUniqueNoDups
    Select @myDate, @myNumber, 'data';
 
    Insert Into dbo.nonUniqueDuplicates
    Select '2009-01-01', @myNumber, 'data';
 
    Select @myDate = DateAdd(day, 1, @myDate)
        , @myNumber += 1;
 
End;

After running the above script, each table should have 7,305 records. This is obviously pretty small for a table, but it’ll serve our purposes. Now let’s take a look at the size of our tables:

/* Look at the details of our indexes */
 
/* Unique, clustered index, no duplicate values */
Select 'unique' As 'type', page_count, avg_page_space_used_in_percent, record_count
    , min_record_size_in_bytes, max_record_size_in_bytes
From sys.dm_db_index_physical_stats(DB_ID(), Object_ID(N'uniqueClustered'), Null, Null, N'Detailed') 
Where index_level = 0
Union All
/* Non-unique clustered index, but no duplicate values */
Select 'non-unique, no dups', page_count, avg_page_space_used_in_percent, record_count
    , min_record_size_in_bytes, max_record_size_in_bytes
From sys.dm_db_index_physical_stats(DB_ID(), Object_ID(N'nonUniqueNoDups'), Null, Null, N'Detailed') 
Where index_level = 0
Union All
/* Non-unique clustered index, duplicate values */
Select 'duplicates', page_count, avg_page_space_used_in_percent, record_count
    , min_record_size_in_bytes, max_record_size_in_bytes
From sys.dm_db_index_physical_stats(DB_ID(), Object_ID(N'nonUniqueDuplicates'), Null, Null, N'Detailed') 
Where index_level = 0;

Here’s the results:

type                page_count           avg_page_space_used_in_percent record_count         min_record_size_in_bytes max_record_size_in_bytes
------------------- -------------------- ------------------------------ -------------------- ------------------------ ------------------------
unique              914                  99.8055102545095               7305                 1009                     1009
non-unique, no dups 914                  99.8055102545095               7305                 1009                     1009
duplicates          1044                 88.066036570299                7305                 1009                     1017

I want to point out a couple of things. First, there is no difference in the number of pages between the non-unique clustered index with no duplicates ([nonUniqueNoDups]) and the unique clustered index ([uniqueClustered]). The table with duplicate clustered key values, however, requires 14% more pages to store the same amount of data. Secondly, the [max_record_size_in_bytes] of the [nonUniqueDuplicates] table is 8 bytes more than that of the other two. We’ll discuss why in a minute.

Now let’s take a look at the actual data pages. For this, I’m going to use my page internals proc.

EXECUTE dbo.dba_viewPageData_sp
      @databaseName = 'sandbox'
    , @tableName    = 'sandbox.dbo.uniqueClustered'
    , @indexName    = 'CIX_uniqueClustered';

I’m not going to post the entire results here, but I want to draw your attention to “m_slotCnt = 8″, which is near the top of the page. That means 8 records are stored on this page. Also, when you look near the end of the first record (Slot 0), you should see the following results:

Slot 0 Offset 0x60 Length 1009
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP     Record Size = 1009
Memory Dump @0x00A9C060
00000000:   1000ee03 c3150b01 00000064 61746120 †..î.Ã......data 
[...]
000003F0:   00†††††††††††††††††††††††††††††††††††.  
 
Slot 0 Column 1 Offset 0x4 Length 3 Length (physical) 3
myDate = 1990-01-01                  
 
Slot 0 Column 2 Offset 0x7 Length 4 Length (physical) 4
myNumber = 1                         
 
Slot 0 Column 3 Offset 0xb Length 995 Length (physical) 995
myColumn = data

Now let’s look at the table that has a non-unique clustered index but no duplicates:

Execute dbo.dba_viewPageData_sp
      @databaseName = 'sandbox'
    , @tableName    = 'sandbox.dbo.nonUniqueNoDups'
    , @indexName    = 'CIX_nonUniqueNoDups';

The m_slotCnt count is also 8 for this page. This time, let’s glance at the first and second records (Slot 0 and 1 respectively):

Slot 0 Offset 0x60 Length 1009
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP     Record Size = 1009
Memory Dump @0x62FDC060
00000000:   1000ee03 c3150b01 00000064 61746120 †..î.Ã......data    
[...]
000003F0:   00†††††††††††††††††††††††††††††††††††.                        
 
Slot 0 Column 0 Offset 0x0 Length 4 Length (physical) 0
UNIQUIFIER = 0                       
 
Slot 0 Column 1 Offset 0x4 Length 3 Length (physical) 3
myDate = 1990-01-01                  
 
Slot 0 Column 2 Offset 0x7 Length 4 Length (physical) 4
myNumber = 1                         
 
Slot 0 Column 3 Offset 0xb Length 995 Length (physical) 995
myColumn = data
 
 
Slot 1 Offset 0x451 Length 1009
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP     Record Size = 1009
Memory Dump @0x62FDC451
 
00000000:   1000ee03 c4150b02 00000064 61746120 †..î.Ä......data          
[...]
000003F0:   00†††††††††††††††††††††††††††††††††††.                        
 
Slot 1 Column 0 Offset 0x0 Length 4 Length (physical) 0
UNIQUIFIER = 0                       
 
Slot 1 Column 1 Offset 0x4 Length 3 Length (physical) 3
myDate = 1990-01-02                  
 
Slot 1 Column 2 Offset 0x7 Length 4 Length (physical) 4
myNumber = 2                         
 
Slot 1 Column 3 Offset 0xb Length 995 Length (physical) 995
myColumn = data

We now see a new addition to the row, “UNIQUIFIER = 0.” This is SQL Server’s way of managing row uniqueness internally. You’ll notice that, because the clustered key values are unique, the UNIQUIFIER is set to 0 and the row size is still 1009; for all intents and purposes, the UNIQUIFIER is not consuming any space.

Update: The DBCC God himself, Paul Randal, explained that non-dupes actually have a NULL UNIQUIFIER, which DBCC PAGE displays as a 0. Thanks for explaining, Paul! I wondered about that but chalked it up to SQL voodoo.

Now let’s look at our final case, a non-unique clustered index with duplicate key values:

Execute dbo.dba_viewPageData_sp
      @databaseName = 'sandbox'
    , @tableName    = 'sandbox.dbo.nonUniqueDuplicates'
    , @indexName    = 'CIX_nonUniqueDuplicates';

Here’s where things get interesting. The m_slotCnt value is now 7, which means we’re now storing 1 record less per page. Let’s look at the details:

Slot 0 Offset 0x60 Length 1009
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP     Record Size = 1009
Memory Dump @0x00A9C060
00000000:   1000ee03 df300b01 00000064 61746120 †..î.ß0.....data  
[...]
000003F0:   00†††††††††††††††††††††††††††††††††††.                        
 
Slot 0 Column 0 Offset 0x0 Length 4 Length (physical) 0
UNIQUIFIER = 0                       
 
Slot 0 Column 1 Offset 0x4 Length 3 Length (physical) 3
myDate = 2009-01-01                  
 
Slot 0 Column 2 Offset 0x7 Length 4 Length (physical) 4
myNumber = 1                         
 
Slot 0 Column 3 Offset 0xb Length 995 Length (physical) 995
myColumn = data
 
 
Slot 1 Offset 0x451 Length 1017
 
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 1017                   
Memory Dump @0x00A9C451
00000000:   3000ee03 df300b02 00000064 61746120 †0.î.ß0.....data  
[...]
000003F0:   000100f9 03010000 00†††††††††††††††††...ù.....                
 
Slot 1 Column 0 Offset 0x3f5 Length 4 Length (physical) 4
UNIQUIFIER = 1                       
 
Slot 1 Column 1 Offset 0x4 Length 3 Length (physical) 3
myDate = 2009-01-01                  
 
Slot 1 Column 2 Offset 0x7 Length 4 Length (physical) 4
myNumber = 2                         
 
Slot 1 Column 3 Offset 0xb Length 995 Length (physical) 995
myColumn = data

The first record, Slot 0, looks exactly the same as in the previous table; the UNIQUIFIER is 0 and the row size is 1009. The second record (Slot 1), however, now has a UNIQUIFIER value of 1 and the row size is 1017. If you notice, the “Record Attributes” of Slot 1 are also different, with the addition of “VARIABLE_COLUMNS.” This is because the UNIQUIFIER is stored as a variable column. The extra 8 bytes of overhead break down to 4 bytes to store the UNIQUIFIER, 2 bytes to store the variable column offset, and 2 bytes to store the variable count. The tables we created used all fixed-length columns; you may notice some minor overhead differences if your table already contains variable columns.

To summarize, there is indeed a difference in the page structure between a unique clustered index and a non-unique clustered index; however, there’s only a possible performance and space impact when storing duplicate clustered key values. So there you go, more detail than you ever wanted to know about clustered indexes and uniqueness!

Page Internals – Investigation Proc

As many of you know, I like to crawl around in page internals in my free time. It can be very enlightening, or just a good check to make sure that what you think is happening, is actually happening. To help with this process, I’ve created myself a little stored procedure that I can simply pass a few parameters to and have it return the page data for me. So for those who don’t have anything better to do who are as interested in page internals as I am, here’s my proc:

Create Procedure dbo.dba_viewPageData_sp
 
        /* Declare Parameters */
          @databaseName varchar(128)
        , @tableName    varchar(128)    = Null -- database.schema.tableName
        , @indexName    varchar(128)    = Null
        , @fileNumber   int             = Null
        , @pageNumber   int             = Null
        , @printOption  int             = 3    -- 0, 1, 2, or 3
        , @pageType     char(4)         = 'Leaf' -- Leaf, Root, or IAM
 
As
/*********************************************************************************
    Name:       dba_viewPageData_sp
 
    Author:     Michelle Ufford
 
    Purpose:    Retrieves page data for the specified table/page.
 
    Notes:      Can pass either the table name or the pageID, but must pass one, or
                you'll end up with no results. 
                If the table name is passed, it will return the first page.
 
        @tableName must be '<databaseName>.<schemaName>.<tableName>' in order to
            function correctly for cross-database joins.  
 
        @printOption can be one of following values:
            0 - print just the page header
            1 - page header plus per-row hex dumps and a dump of the page slot array
            2 - page header plus whole page hex dump
            3 - page header plus detailed per-row interpretation
 
        Page Options borrowed from: 
        https://blogs.msdn.com/sqlserverstorageengine/archive/2006/06/10/625659.aspx
 
        @pageType must be one of the following values:
            Leaf - returns the first page of the leaf level of your index or heap
            Root - returns the root page of your index
            IAM - returns the index allocation map chain for your index or heap
 
        Conversions borrowed from:
        http://sqlskills.com/blogs/paul/post/Inside-The-Storage-Engine-
        sp_AllocationMetadata-putting-undocumented-system-catalog-views-to-work.aspx
 
    Called by:  DBA
 
    Date        User    Description
    ----------------------------------------------------------------------------
    2009-05-06  MFU     Initial release for public consumption
*********************************************************************************
    Exec dbo.dba_viewPageData_sp
          @databaseName = 'AdventureWorks'
        , @tableName    = 'AdventureWorks.Sales.SalesOrderDetail'
        , @indexName    = 'IX_SalesOrderDetail_ProductID'
        --, @fileNumber   = 1
        --, @pageNumber   = 38208
        , @printOption  = 3
        , @pageType     = 'Root';
*********************************************************************************/
 
Set NoCount On;
Set XACT_Abort On;
Set Ansi_Padding On;
Set Ansi_Warnings On;
Set ArithAbort On;
Set Concat_Null_Yields_Null On;
Set Numeric_RoundAbort Off;
 
Begin
 
    Declare @fileID         int
        , @pageID           int
        , @sqlStatement     nvarchar(1200)
        , @sqlParameters    nvarchar(255)
        , @errorMessage     varchar(100);
 
    Begin Try
 
        If @fileNumber Is Null And @pageNumber Is Null And @tableName Is Null
        Begin
            Set @errorMessage = 'You must provide either a file/page number, or a table name!';
            RaisError(@errorMessage, 16, 1);
        End;
 
        If @pageType Not In ('Leaf', 'Root', 'IAM')
        Begin
            Set @errorMessage = 'You have entered an invalid page type; valid options are "Leaf", "Root", or "IAM"';
            RaisError(@errorMessage, 16, 1);
        End;
 
        If @fileNumber Is Null Or @pageNumber Is Null
        Begin
 
            Set @sqlStatement = 
            Case When @pageType = 'Leaf' Then
                'Select Top 1 @p_fileID = Convert (varchar(6), Convert (int, 
                    SubString (au.first_page, 6, 1) +
                    SubString (au.first_page, 5, 1)))
                , @p_pageID = Convert (varchar(20), Convert (int, 
                     SubString (au.first_page, 4, 1) +
                     SubString (au.first_page, 3, 1) +
                     SubString (au.first_page, 2, 1) +
                     SubString (au.first_page, 1, 1)))'
            When @pageType = 'Root' Then
                'Select Top 1 @p_fileID = Convert (varchar(6), Convert (int, 
                    SubString (au.root_page, 6, 1) +
                    SubString (au.root_page, 5, 1)))
                , @p_pageID = Convert (varchar(20), Convert (int, 
                     SubString (au.root_page, 4, 1) +
                     SubString (au.root_page, 3, 1) +
                     SubString (au.root_page, 2, 1) +
                     SubString (au.root_page, 1, 1)))'
            When @pageType = 'IAM' Then
                'Select Top 1 @p_fileID = Convert (varchar(6), Convert (int, 
                    SubString (au.first_iam_page, 6, 1) +
                    SubString (au.first_iam_page, 5, 1)))
                , @p_pageID = Convert (varchar(20), Convert (int, 
                     SubString (au.first_iam_page, 4, 1) +
                     SubString (au.first_iam_page, 3, 1) +
                     SubString (au.first_iam_page, 2, 1) +
                     SubString (au.first_iam_page, 1, 1)))'
            End + 
            'From ' + QuoteName(ParseName(@databaseName, 1)) + '.sys.indexes AS i
            Join ' + QuoteName(ParseName(@databaseName, 1)) + '.sys.partitions AS p
                On i.[object_id] = p.[object_id]
                And i.index_id = p.index_id
            Join ' + QuoteName(ParseName(@databaseName, 1)) + '.sys.system_internals_allocation_units AS au
                On p.hobt_id = au.container_id
            Where p.[object_id] = Object_ID(@p_tableName)
                And au.first_page > 0x000000000000 ' 
                + Case When @indexName Is Null 
                    Then ';' 
                    Else 'And i.name = @p_indexName;' End;
 
            Set @sqlParameters = '@p_tableName varchar(128)
                                , @p_indexName varchar(128)
                                , @p_fileID int OUTPUT
                                , @p_pageID int OUTPUT';
 
            Execute sp_executeSQL @sqlStatement
                        , @sqlParameters
                        , @p_tableName = @tableName
                        , @p_indexName = @indexName
                        , @p_fileID = @fileID OUTPUT
                        , @p_pageID = @pageID OUTPUT;
 
            End
            Else
            Begin
                Select @fileID = @fileNumber
                    , @pageID = @pageNumber;
            End;
 
        DBCC TraceOn (3604);
        DBCC Page (@databaseName, @fileID, @pageID, @printOption);
        DBCC TraceOff (3604);
 
    End Try
    Begin Catch
 
        Print @errorMessage;
 
    End Catch;
 
    Set NoCount Off;
    Return 0;
End
Go

This proc does have cross-database support, i.e. you can install it in your DBA database and use it to investigate data in other databases. Here’s an example…

    Exec dbo.dba_viewPageData_sp
          @databaseName = 'AdventureWorks'
        , @tableName    = 'AdventureWorks.Sales.SalesOrderDetail'
        , @indexName    = 'IX_SalesOrderDetail_ProductID';

… will return a nice data page:

DBCC execution completed. If DBCC printed error messages, contact your system administrator.
 
PAGE: (1:11000)
 
BUFFER:
 
BUF @0x0391F140
 
bpage = 0x0C0C0000                   bhash = 0x00000000                   bpageno = (1:11000)
bdbid = 7                            breferences = 0                      bUse1 = 35177
bstat = 0x1c00009                    blog = 0x21212159                    bnext = 0x00000000

[waits for the "oohs" and "aahs" to subside...]

I also give you the option to specify a specific page, in case you want to follow the page trail (i.e. m_nextPage). I’m not really providing support for partitions, although I do have a little dirty piece of code to return a hobt with data if possible (i.e. “first_page > 0×000000000000″).

Update: Special thanks to Jeremiah Peschka and Adam Machanic for showing me
QUOTENAME(PARSENAME(@databaseName, 1))! :)

Also, thanks to Paul Randal for his excellent blog posts on this very topic! Check out his blog post on DBCC Page. The conversion code was borrowed from his sp_AllocationMetadata proc.

Update 2: I’ve made some quick changes based on some feedback from Twitter. Thanks for the suggestions and hopefully you enjoy the updates.

Happy crawling!

Page Splitting & Rollbacks

So while I was at the grocery store last night, my mind wandered to SQL Server. This time, I was pondering what happens to a page split if the transaction is rolled back. I didn’t know the answer, but my guess was that the page split remained, since it would be less expensive for SQL Server to leave the data where it was sitting. Also, in theory, if the page split occurred once, it could need to occur again, so why undergo that same expense twice?

I decided to run a simple test to see what would happen. First, I created a really wide table and tossed 4 rows into it:

Create Table myTable
(
      id            int Primary Key
    , wideColumn    char(2000) 
);
 
Insert Into myTable
Select 1, Replicate('A', 2000) Union All
Select 2, Replicate('B', 2000) Union All
Select 4, Replicate('C', 2000) Union All
Select 5, Replicate('D', 2000);

I left an open spot for id=3, so I can force a page split. Next, I looked at the page data using the script I posted in February.

Here’s what I saw:

Before

Before

Pay attention to the 2 items boxed in red. m_slotCnt tells us how many records are on the page, and m_nextPage tells us the address of the next page. As you may have guessed, a m_nextPage value of 0:0 means you’re looking at the last page allocated to the object.

Now let’s insert a record, roll it back, and see what happens:

Begin Transaction;
 
Insert Into myTable
Select 3, Replicate('E', 2000);
 
Rollback Transaction;

I ran my DBCC Page command again and here’s what I saw:

After

After

As you can see, m_slotCnt is now 2, and m_nextPage is no longer 0:0 (although your actual page number will probably be different than mine). If I pull up the new page, I find my 2 relocated records, id’s 4 and 5.

So what’s this all mean? In short, page splits are NOT reversed when a transaction is rolled back. Why should you care? Well, you probably wouldn’t care much, unless you roll back a lot of transactions. But this is also a good thing to keep in mind if you have to abort a really large insert or update; if you don’t plan to re-execute the aborted script, you may want to defrag your indexes to fix the splits.

Source: http://sqlfool.com/2009/04/page-splitting-rollbacks/

Digging Around in SQL Internals – View Page Data

So lately I’ve been geeking out on SQL internals. My most recent find involves a couple of undocumented toys, DBCC Page and sys.system_internals_allocation_units.

DBCC Page will actually display the contents of a page. I found a blog post by Paul Randal, back when he still worked for Microsoft, describing the DBCC Page command. Here’s a summary:

dbcc page ( {‘dbname’ | dbid}, filenum, pagenum [, printopt={0|1|2|3} ])

The filenum and pagenum parameters are taken from the page IDs that come from various system tables and appear in DBCC or other system error messages. A page ID of, say, (1:354) has filenum = 1 and pagenum = 354.

The printopt parameter has the following meanings:

* 0 – print just the page header
* 1 – page header plus per-row hex dumps and a dump of the page slot array (unless its a page that doesn’t have one, like allocation bitmaps)
* 2 – page header plus whole page hex dump
* 3 – page header plus detailed per-row interpretation

The per-row interpretation work for all page types, including allocation bitmaps.

We’ll come back to actually viewing a page in just a minute. Now that we know we can view page contents, how do we find out which pages contain the data we want to look at? This is where sys.system_internals_allocation_units can help. The sys.system_internals_allocation_units DMV is just like sys.allocation_units, except it has a few additional columns of interest: [first_page], [root_page], and [first_iam_page]. The query below will return the filenum and pagenum values, for use in the DBCC Page command, for a specific table.

Select Object_Name(p.object_id) As 'tableName'
    , i.name As 'indexName'
    , p.partition_number
    , au.type_desc
    , Convert (varchar(6),
      Convert (int, SubString (au.first_page, 6, 1) +
         SubString (au.first_page, 5, 1))) +
   ':' + Convert (varchar(20),
      Convert (int, SubString (au.first_page, 4, 1) +
         SubString (au.first_page, 3, 1) +
         SubString (au.first_page, 2, 1) +
         SubString (au.first_page, 1, 1))) As 'firstPage'
    , Convert (varchar(6),
      Convert (int, SubString (au.root_page, 6, 1) +
         SubString (au.root_page, 5, 1))) +
   ':' + Convert (varchar(20),
      Convert (int, SubString (au.root_page, 4, 1) +
         SubString (au.root_page, 3, 1) +
         SubString (au.root_page, 2, 1) +
         SubString (au.root_page, 1, 1))) As 'rootPage'
    , Convert (varchar(6),
      Convert (int, SubString (au.first_iam_page, 6, 1) +
         SubString (au.first_iam_page, 5, 1))) +
   ':' + Convert (varchar(20),
      Convert (int, SubString (au.first_iam_page, 4, 1) +
         SubString (au.first_iam_page, 3, 1) +
         SubString (au.first_iam_page, 2, 1) +
         SubString (au.first_iam_page, 1, 1))) As 'firstIAM_page'
From sys.indexes As i
Join sys.partitions As p
    On i.object_id = p.object_id
    And i.index_id = p.index_id
Join sys.system_internals_allocation_units As au
    On p.hobt_id = au.container_id
Where Object_Name(p.object_id) = 'ProductReview'
Order By tableName;

Note: the conversion code was borrowed from one of Kimberley Tripp’s posts, see the References section for the link.

Running the preceding query in the AdventureWorks database will produce the following:

tableName  indexName                           partition_number type_desc    firstPage  rootPage  firstIAM_page
---------- ----------------------------------- ---------------- ------------ ---------- --------- --------------
ProductRev PK_ProductReview_ProductReviewID    1                IN_ROW_DATA  1:770      1:773     1:771
ProductRev IX_ProductReview_ProductID_Name     1                IN_ROW_DATA  1:911      1:914     1:912

Great, now we have a starting place! Let’s now take DBCC Page out for a spin and see what it can do. I’m going to use the [firstPage] value for the IX_ProductReview_ProductID_Name index. Remember, the value preceding the colon (:) is your file number, and the value after it is your page number.

DBCC TraceOn (3604);
DBCC Page (AdventureWorks, 1, 911, 3);
DBCC TraceOff (3604);

You should get back something like the following. (Note: for brevity’s sake, I’m only displaying part of the results)

Allocation Status
 
GAM (1:2) = ALLOCATED                SGAM (1:3) = NOT ALLOCATED           
PFS (1:1) = 0x60 MIXED_EXT ALLOCATED   0_PCT_FULL                         DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED            
FileId PageId      Row    Level  ProductID (key) ReviewerName (key)                                 ProductReviewID (key) Comments
------ ----------- ------ ------ --------------- -------------------------------------------------- --------------------- -------------------------------------------------------------------------
1      911         0      0      709             John Smith                                         1                     I can't believe I'm singing the praises of a pair of socks, but...(79025483b74e)

Let’s take a moment to consider why we’re seeing this. You’ll notice there’s 3 columns with (key) next to the name: [ProductID], [ReviewerName], and [ProductReviewID]. There’s one column without (key): [Comments]. This is exactly what I was expecting to see. Why? If you take a look at the details of IX_ProductReview_ProductID_Name, you’ll notice it’s a non-clustered index on only [ProductID, ReviewerName], with an included column, [Comments]. You’ll also notice the clustered index on Production.ProductReview is [ProductReviewID]. Remember, the composition of a non-clustered index includes the clustered index value as a sort of row pointer.

Not sure how to view the details of an index, like included columns? Check out my Index Interrogation Script.

I was interested in the ordering of the data. I wasn’t sure whether I’d find [ProductReviewID] first or last. But after looking at several of these examples, I’ve always found the clustered index to be nestled behind the non-clustered index columns and before the included columns (as exampled above).

That’s all I have for now. Try it out, play with the different printopt values, look at different types of pages, and have fun. Oh, and if anyone has a link to a blog post or a book that explains each of the items in the page header, I’d greatly appreciate it. :)

Happy Digging!

Michelle

References:

Source: http://sqlfool.com/2009/02/view_page_data/