Overhead in Non-Unique Clustered Indexes

I’ve received a couple of questions regarding my article, Performance Considerations of Data Types, and the overhead associated with non-unique clustered indexes. I started to respond via e-mail, but my response was so long I decided to turn it into a blog post instead. ;)

I should start by clarifying that non-unique clustered indexes do not necessarily consume more space and overhead; it depends on the data stored. If you have duplicate clustered key values, the first instance of the value will be handled as though it were unique. Any subsequent values, however, will incur overhead to manage the uniquifier that SQL Server adds to maintain row uniqueness. This same overhead is also incurred in non-clustered indexes, too, adding to the overall expense of this approach.

I think it helps to actually look at the data, so let’s walk through a few different common scenarios. We’ll create a table with a unique clustered index, a table with a non-unique clustered index but no duplicates, and a table with duplicate key values.

Also, a little warning that I started to write this in SQL Server 2008, and since I’m on a 2008 kick, I decided to leave it that way. You can modify this pretty easily to work in 2005, if necessary.

Use sandbox;
Go
 
/* Unique, clustered index, no duplicate values */
Create Table dbo.uniqueClustered
(
      myDate    date        Not Null
    , myNumber  int         Not Null
    , myColumn  char(995)   Not Null
);
 
Create Unique Clustered Index CIX_uniqueClustered
    On dbo.uniqueClustered(myDate);
 
 
/* Non-unique clustered index, but no duplicate values */
Create Table dbo.nonUniqueNoDups
(
      myDate    date        Not Null
    , myNumber  int         Not Null
    , myColumn  char(995)   Not Null
);
 
Create Clustered Index CIX_nonUniqueNoDups
    On dbo.nonUniqueNoDups(myDate);
 
 
/* Non-unique clustered index, duplicate values */
Create Table dbo.nonUniqueDuplicates
(
      myDate    date        Not Null
    , myNumber  int         Not Null
    , myColumn  char(995)   Not Null
);
 
Create Clustered Index CIX_nonUniqueDuplicates
    On dbo.nonUniqueDuplicates(myDate);

I’m going to use the date data type in 2008 for my clustered index key. To ensure uniqueness for the first two tables, I’ll iterate through a few years’ worth of dates. This is typical of what you may see in a data mart, where you’d have one record with an aggregation of each day’s data. For the table with duplicate values, I’m going to insert the same date for each row.

/* Populate some test data */
Set NoCount On;
Declare @myDate date = '1990-01-01'
    , @myNumber int = 1;
 
While @myDate < '2010-01-01'
Begin
 
    Insert Into dbo.uniqueClustered
    Select @myDate, @myNumber, 'data';
 
    Insert Into dbo.nonUniqueNoDups
    Select @myDate, @myNumber, 'data';
 
    Insert Into dbo.nonUniqueDuplicates
    Select '2009-01-01', @myNumber, 'data';
 
    Select @myDate = DateAdd(day, 1, @myDate)
        , @myNumber += 1;
 
End;

After running the above script, each table should have 7,305 records. This is obviously pretty small for a table, but it’ll serve our purposes. Now let’s take a look at the size of our tables:

/* Look at the details of our indexes */
 
/* Unique, clustered index, no duplicate values */
Select 'unique' As 'type', page_count, avg_page_space_used_in_percent, record_count
    , min_record_size_in_bytes, max_record_size_in_bytes
From sys.dm_db_index_physical_stats(DB_ID(), Object_ID(N'uniqueClustered'), Null, Null, N'Detailed') 
Where index_level = 0
Union All
/* Non-unique clustered index, but no duplicate values */
Select 'non-unique, no dups', page_count, avg_page_space_used_in_percent, record_count
    , min_record_size_in_bytes, max_record_size_in_bytes
From sys.dm_db_index_physical_stats(DB_ID(), Object_ID(N'nonUniqueNoDups'), Null, Null, N'Detailed') 
Where index_level = 0
Union All
/* Non-unique clustered index, duplicate values */
Select 'duplicates', page_count, avg_page_space_used_in_percent, record_count
    , min_record_size_in_bytes, max_record_size_in_bytes
From sys.dm_db_index_physical_stats(DB_ID(), Object_ID(N'nonUniqueDuplicates'), Null, Null, N'Detailed') 
Where index_level = 0;

Here’s the results:

type                page_count           avg_page_space_used_in_percent record_count         min_record_size_in_bytes max_record_size_in_bytes
------------------- -------------------- ------------------------------ -------------------- ------------------------ ------------------------
unique              914                  99.8055102545095               7305                 1009                     1009
non-unique, no dups 914                  99.8055102545095               7305                 1009                     1009
duplicates          1044                 88.066036570299                7305                 1009                     1017

I want to point out a couple of things. First, there is no difference in the number of pages between the non-unique clustered index with no duplicates ([nonUniqueNoDups]) and the unique clustered index ([uniqueClustered]). The table with duplicate clustered key values, however, requires 14% more pages to store the same amount of data. Secondly, the [max_record_size_in_bytes] of the [nonUniqueDuplicates] table is 8 bytes more than that of the other two. We’ll discuss why in a minute.

Now let’s take a look at the actual data pages. For this, I’m going to use my page internals proc.

EXECUTE dbo.dba_viewPageData_sp
      @databaseName = 'sandbox'
    , @tableName    = 'sandbox.dbo.uniqueClustered'
    , @indexName    = 'CIX_uniqueClustered';

I’m not going to post the entire results here, but I want to draw your attention to “m_slotCnt = 8″, which is near the top of the page. That means 8 records are stored on this page. Also, when you look near the end of the first record (Slot 0), you should see the following results:

Slot 0 Offset 0x60 Length 1009
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP     Record Size = 1009
Memory Dump @0x00A9C060
00000000:   1000ee03 c3150b01 00000064 61746120 †..î.Ã......data 
[...]
000003F0:   00†††††††††††††††††††††††††††††††††††.  
 
Slot 0 Column 1 Offset 0x4 Length 3 Length (physical) 3
myDate = 1990-01-01                  
 
Slot 0 Column 2 Offset 0x7 Length 4 Length (physical) 4
myNumber = 1                         
 
Slot 0 Column 3 Offset 0xb Length 995 Length (physical) 995
myColumn = data

Now let’s look at the table that has a non-unique clustered index but no duplicates:

Execute dbo.dba_viewPageData_sp
      @databaseName = 'sandbox'
    , @tableName    = 'sandbox.dbo.nonUniqueNoDups'
    , @indexName    = 'CIX_nonUniqueNoDups';

The m_slotCnt count is also 8 for this page. This time, let’s glance at the first and second records (Slot 0 and 1 respectively):

Slot 0 Offset 0x60 Length 1009
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP     Record Size = 1009
Memory Dump @0x62FDC060
00000000:   1000ee03 c3150b01 00000064 61746120 †..î.Ã......data    
[...]
000003F0:   00†††††††††††††††††††††††††††††††††††.                        
 
Slot 0 Column 0 Offset 0x0 Length 4 Length (physical) 0
UNIQUIFIER = 0                       
 
Slot 0 Column 1 Offset 0x4 Length 3 Length (physical) 3
myDate = 1990-01-01                  
 
Slot 0 Column 2 Offset 0x7 Length 4 Length (physical) 4
myNumber = 1                         
 
Slot 0 Column 3 Offset 0xb Length 995 Length (physical) 995
myColumn = data
 
 
Slot 1 Offset 0x451 Length 1009
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP     Record Size = 1009
Memory Dump @0x62FDC451
 
00000000:   1000ee03 c4150b02 00000064 61746120 †..î.Ä......data          
[...]
000003F0:   00†††††††††††††††††††††††††††††††††††.                        
 
Slot 1 Column 0 Offset 0x0 Length 4 Length (physical) 0
UNIQUIFIER = 0                       
 
Slot 1 Column 1 Offset 0x4 Length 3 Length (physical) 3
myDate = 1990-01-02                  
 
Slot 1 Column 2 Offset 0x7 Length 4 Length (physical) 4
myNumber = 2                         
 
Slot 1 Column 3 Offset 0xb Length 995 Length (physical) 995
myColumn = data

We now see a new addition to the row, “UNIQUIFIER = 0.” This is SQL Server’s way of managing row uniqueness internally. You’ll notice that, because the clustered key values are unique, the UNIQUIFIER is set to 0 and the row size is still 1009; for all intents and purposes, the UNIQUIFIER is not consuming any space.

Update: The DBCC God himself, Paul Randal, explained that non-dupes actually have a NULL UNIQUIFIER, which DBCC PAGE displays as a 0. Thanks for explaining, Paul! I wondered about that but chalked it up to SQL voodoo.

Now let’s look at our final case, a non-unique clustered index with duplicate key values:

Execute dbo.dba_viewPageData_sp
      @databaseName = 'sandbox'
    , @tableName    = 'sandbox.dbo.nonUniqueDuplicates'
    , @indexName    = 'CIX_nonUniqueDuplicates';

Here’s where things get interesting. The m_slotCnt value is now 7, which means we’re now storing 1 record less per page. Let’s look at the details:

Slot 0 Offset 0x60 Length 1009
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP     Record Size = 1009
Memory Dump @0x00A9C060
00000000:   1000ee03 df300b01 00000064 61746120 †..î.ß0.....data  
[...]
000003F0:   00†††††††††††††††††††††††††††††††††††.                        
 
Slot 0 Column 0 Offset 0x0 Length 4 Length (physical) 0
UNIQUIFIER = 0                       
 
Slot 0 Column 1 Offset 0x4 Length 3 Length (physical) 3
myDate = 2009-01-01                  
 
Slot 0 Column 2 Offset 0x7 Length 4 Length (physical) 4
myNumber = 1                         
 
Slot 0 Column 3 Offset 0xb Length 995 Length (physical) 995
myColumn = data
 
 
Slot 1 Offset 0x451 Length 1017
 
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 1017                   
Memory Dump @0x00A9C451
00000000:   3000ee03 df300b02 00000064 61746120 †0.î.ß0.....data  
[...]
000003F0:   000100f9 03010000 00†††††††††††††††††...ù.....                
 
Slot 1 Column 0 Offset 0x3f5 Length 4 Length (physical) 4
UNIQUIFIER = 1                       
 
Slot 1 Column 1 Offset 0x4 Length 3 Length (physical) 3
myDate = 2009-01-01                  
 
Slot 1 Column 2 Offset 0x7 Length 4 Length (physical) 4
myNumber = 2                         
 
Slot 1 Column 3 Offset 0xb Length 995 Length (physical) 995
myColumn = data

The first record, Slot 0, looks exactly the same as in the previous table; the UNIQUIFIER is 0 and the row size is 1009. The second record (Slot 1), however, now has a UNIQUIFIER value of 1 and the row size is 1017. If you notice, the “Record Attributes” of Slot 1 are also different, with the addition of “VARIABLE_COLUMNS.” This is because the UNIQUIFIER is stored as a variable column. The extra 8 bytes of overhead break down to 4 bytes to store the UNIQUIFIER, 2 bytes to store the variable column offset, and 2 bytes to store the variable count. The tables we created used all fixed-length columns; you may notice some minor overhead differences if your table already contains variable columns.

To summarize, there is indeed a difference in the page structure between a unique clustered index and a non-unique clustered index; however, there’s only a possible performance and space impact when storing duplicate clustered key values. So there you go, more detail than you ever wanted to know about clustered indexes and uniqueness!

0saves
If you enjoyed this post, please consider leaving a comment or subscribing to the RSS feed to have future articles delivered to your feed reader.
Tagged , , , . Bookmark the permalink.

8 Responses to Overhead in Non-Unique Clustered Indexes

  1. Paul Randal says:

    Hey Michelle,

    Actually in the case where there’s no duplicate in the non-unique clustered index, the UNIQUIFIER column is actually NULL, not 0 – otherwise it would have to take up space in the record. When I wrote DBCC PAGE, I decided to dump it out as a zero, to make more sense in the progression of UNIQUIFIER values when there are duplicates.

    Thanks

  2. Jack Corbett says:

    Great post clearly explaining how clustered indexes work.

  3. Pingback: Log Buffer #147: a Carnival of the Vanities for DBAs | Pythian Group Blog

  4. Rob Boek says:

    Great post. Thanks!

  5. Pingback: Undocumented Function in SQL 2008 : SQL Fool

  6. SQLSharma says:

    Love the post Michelle!
    One question though, how do you arrive at the 2 bytes for variable column offset, and 2 bytes to store the variable count? Or am I missing that output somewhere….
    Thank you!

  7. Natasha says:

    Michelle – what is the overhead of using the ignore_dup_key option ON with a unique clustered index – instead of the non-unique clustered index? which option would be better?

  8. Shrikant says:

    thank you for excellent post!!
    what is impact on fragmentation and page split if I create cluster index on date column which is part of partition key ?

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>