Indexing for Partitioned Tables

So you’ve partitioned your table, and now you’re ready to performance tune. As with any table, indexing is a great place to start. And if you’re like most people new to partitioning, you probably created all of your indexes on the partitioned scheme, either by design or unintentionally.

Let’s take a step back here and discuss what a partitioned index is. A partitioned index, like a partitioned table, separates data into different physical structures (partitions) under one logical name. Specifically, each partition in a nonclustered index contains its own B-tree structure with a subset of rows, based upon the partitioning scheme. By default, an unpartitioned nonclustered index has just one partition.

Keep in mind, when you create an index on a partitioned table, i.e.

Create NonClustered Index IX_myIndex
    On dbo.myTable(myColumn);

… you are creating the index on the partitioned scheme by default. In order to create a NON-partitioned index on that same table, you would need to explicitly declare “On [FileGroup]“, i.e.

Create NonClustered Index IX_myIndex
    On dbo.myTable(myColumn)
    On [Primary];

 

But should you partition your index? That depends on how you use it. In fact, most environments will probably want to use a mix of partitioned and non-partitioned indexes. I’ve found that that partitioned indexes perform better when aggregating data or scanning partitions. Conversely, you’ll probably find that, if you need to locate a single, specific record, nothing performs better than a non-partitioned index on that column.

Let’s walk through some examples and see how they perform. I’ll bring back my trusty ol’ orders table for this.

/* Create a partition function. */
Create Partition Function 
    [test_monthlyDateRange_pf] (datetime)
    As Range Right For Values
    ('2009-01-01', '2009-01-08', '2009-01-15'
    , '2009-01-22', '2009-01-29');
Go
 
/* Associate the partition function with a partition scheme. */
Create Partition Scheme test_monthlyDateRange_ps
    As Partition test_monthlyDateRange_pf
    All To ([Primary]);
Go
 
/* Create a partitioned table. */
Create Table dbo.orders
(
      order_id  int Identity(1,1)   Not Null
    , orderDate datetime            Not Null
    , orderData smalldatetime       Not Null
 
    Constraint PK_orders Primary Key Clustered
    (
        order_id
      , orderDate
    )
) On test_monthlyDateRange_ps(orderDate);
Go
 
/* Create some records to play with. */
Set NoCount On;
 
Declare @endDate datetime = '2009-01-01';
 
While @endDate < '2009-02-01'
Begin
 
    Insert Into dbo.orders
    Select @endDate, @endDate;
 
    Set @endDate = DATEADD(minute, 1, @endDate);
 
End;
 
Set NoCount Off;
 
 
/* Let’s create an aligned, partitioned index. */
Create NonClustered Index IX_orders_aligned
    On dbo.orders(order_id)
    On test_monthlyDateRange_ps(orderDate); 
    /* you don't actually need to declare the last
       line of this unless you want to create the
       index on a different partitioning scheme.   */
 
/* Now let’s create an unpartitioned index. */
Create NonClustered Index IX_orders_unpartitioned
    On dbo.orders(order_id)
    On [Primary];

 

Now that we have both a partitioned and an unpartitioned index, let’s take a look at our sys.partitions table:

/* Let's take a look at our index partitions */
Select i.name
    , i.index_id
    , p.partition_number
    , p.rows
From sys.partitions As p
Join sys.indexes As i
    On p.object_id = i.object_id 
   And p.index_id = i.index_id
Where p.object_id = object_id('orders')
Order By i.index_id, p.partition_number;

 

sys.partitions

sys.partitions

As expected, both of our partitioned indexes, PK_orders and IX_orders_aligned, have 6 partitions, with a subset of rows on each partition. Our unpartitioned non-clustered index, IX_orders_unpartitioned, on the other hand has just 1 partition containing all of the rows.

Now that we have our environment set up, let’s run through some different queries and see the performance impact of each type of index.

/* Query 1, specific record look-up, covered */
Select order_id, orderDate
From dbo.orders With (Index(IX_orders_aligned))
Where order_id = 25000;
 
Select order_id, orderDate
From dbo.orders With (Index(IX_orders_unpartitioned))
Where order_id = 25000;

 

Query 1

Query 1

The unpartitioned index performs significantly better when given a specific record to look-up. Now let’s try the same query, but utilizing a scan instead of a seek:

/* Query 2, specific record look-up, uncovered */
Select order_id, orderDate, orderData
From dbo.orders With (Index(IX_orders_aligned))
Where order_id = 30000;
 
Select order_id, orderDate, orderData
From dbo.orders With (Index(IX_orders_unpartitioned))
Where order_id = 30000;

 

Query 2

Query 2

Again we see that the non-partitioned index performs better with the single-record look-up. This can lead to some pretty dramatic performance implications. So when *would* we want to use a partitioned index? Two instances immediately pop to mind. First, partition switching can only be performed when all indexes on a table are aligned. Secondly, partitioned indexes perform better when manipulating large data sets. To see this in action, let’s try some simple aggregation…

/* Query 3, aggregation */
Select Cast(Round(Cast(orderdate As float), 0, 1) As smalldatetime) As 'order_date'
    , Count(*)
From dbo.orders With (Index(IX_orders_aligned))
Where orderDate Between '2009-01-01' And '2009-01-07 23:59'
Group By Cast(Round(Cast(orderdate As float), 0, 1) As smalldatetime) 
Order By Cast(Round(Cast(orderdate As float), 0, 1) As smalldatetime);
 
Select Cast(Round(Cast(orderdate As float), 0, 1) As smalldatetime) As 'order_date'
    , Count(*)
From dbo.orders With (Index(IX_orders_unpartitioned))
Where orderDate Between '2009-01-01' And '2009-01-07 23:59'
Group By Cast(Round(Cast(orderdate As float), 0, 1) As smalldatetime) 
Order By Cast(Round(Cast(orderdate As float), 0, 1) As smalldatetime);

 

Query 3

Query 3

As you can see, partitioned indexes perform better when aggregating data. This is just a simple example, but the results can be even more dramatic in a large production environment. This is one of the reasons why partitioned tables and indexes are especially beneficial in data warehouses.

So now you have a general idea of what a partitioned index is and when to use a partitioned index vs a non-partitioned index. Ultimately, your indexing needs will depend largely on the application and how the data is used. When in doubt, test, test, test! So to recap…

  • Specify “On [FileGroup]“ to create an unpartitioned index on a partitioned table
  • Consider using non-partitioned indexes for single-record look-ups
  • Use partitioned indexes for multiple records and data aggregations
  • To enable partition switching, all indexes on the table must be aligned.

For more information on partitioning, check out my other partitioning articles:

Partitioning Example
Partitioning 101
Tips for Large Data Stores

0saves
If you enjoyed this post, please consider leaving a comment or subscribing to the RSS feed to have future articles delivered to your feed reader.
Tagged , , , . Bookmark the permalink.

14 Responses to Indexing for Partitioned Tables

  1. senthilsjc says:

    Great..good artcile .. keep it up..thaks lot ..this what i expect for the one month..

  2. mahesh says:

    valuable points.. thanks for posting.

  3. veerabhadram says:

    hi,
    it is very understandable, keep it up.

    Here i have one doubt based on “index on partition” .
    As per my understanding, if we create index on partition table that data pages will be in an order and the leaf level pages will be added to the based on partition key if the statement is correct then why there is a difference in execution cost. I believe my thought was wrong, can you tell me how the internal architecture of index if we have partitions.

  4. Yogish Bhat says:

    Nice Article explained in detail. Thank You

  5. Why wouldn’t the clustered index (in this case, the PK) be
    (orderDate, order_ID)
    instead of
    (order_ID, orderDate)?

    Isn’t the first job of the partition table to determine which partition range to work with?

  6. Maulin Thaker says:

    Hi,

    First of all a very good article…
    I want to partition Few TABLEs having trillions of data in it. Most of the data is historical. The issue is that the tables are already either in MERGE or in Transactional replication. How to partition the table to get the best performance.

    awating for your reply.
    Maulin.

  7. Pingback: Partition Table TXT | MaulinThaker

  8. Pingback: Less Than Dot - Blog - Awesome

  9. umasankar says:

    Nice article

  10. Pingback: SQL Server Partitioning 101 | phoebix

  11. Pingback: SQL Server Partitioning 101 - SQL Server - SQL Server - Toad World

  12. Thomas Franz says:

    Your example is correct but you forgot to mention the main purpose for this behavior: you queried in the slow examples only for order_id = xyz.

    There are many articles (even from MS) online, that strongly recommend, that – if you are using partitioning – EVERY query should contain the partitioning key (order_date in your example) in the WHERE / JOIN condition.

    If you ignore this suggestion the SQL server has to scan every partial index for the order_id in the aligned index, which will be slower as to scan only one full non-aligned index.

    When you put an “order_date = xxx” or at least an “order_date between ” into your query the aligned index would be faster than the unaligned.

    If you are not able to do this (because you don’t know the order_date), you would have had choosen the order_id as partition key.

    I did this for one of our big tables and created a SQL Agent job, that triggers at midnight of the first day of every month (or week or quarter or year or just midnight …) and adds another file to the partition scheme and did a split on the partition function with the max(order_id) at this moment:


    -- untested code example (just to demonstrade)
    ALTER PARTITION SCHEME partition_scheme_name NEXT USED [ filegroup_name ];

    declare @maxId int = (select max(order_id) from dbo.orders);
    ALTER PARTITION FUNCTION test_monthlyDateRange_pf
    SPLIT RANGE (@maxId);

    Alternatively you could define, that every order_id (should be a bigint) has to start with the date f.e. yyyymmdd (f.e. order_id =20090101000001 instead of id = 1) or yyyymm or just the number of months since f.e. 2000-01-01. In this case you would only have to modify your sequence / sequence table / identity column in the job to start with the new range and could create a partition function that splits at order_id = 20090101000000, 20090108000000, 20090115000001, 20090122000001 …

    Downside of this (compared to the first suggestion (new SPLIT)):
    - you have eighter to to use bigints or can’t split daily or may have only few entries per day (in this case you should not split per day)
    - if you have an sequence table and a procedure, that returns you the next sequence number you can’t use bigints because procedures can only return ints (except you are using an output parameter)

  13. Thomas Franz says:

    PS:
    There is another reason to use the order_id as partition key:
    The partition key has to be part of the PRIMARY KEY and of every UNIQUE KEY in your table.

    Okay, you may think, than I would just add the order_date to the primary key and create a trigger that enforces the uniquess of ther order_id over the table (slow because you don’t have the order_date this time).

    But what if you are using FOREIGN KEYs? Those are assigned to the PRIMARY KEY and if you add the order_date to the PRIMARY KEY, you have to add it to every table that have a foreign key alignement to the orders table too.

    And this would have many impacts to your whole application and would result in tons of code lines that have to been modified – except you would use another trigger to manual enforce the consistency, but even in this case every JOIN would be slow because of the missing order_date in the child tables.

    PPS: if you are using an UNID instead an int / bigint as primary key you are lost …

  14. praveen says:

    i am getting this error by alter switch statement moving the data one partition table to another partition table . can you please me to solve this error

    Msg 7733, Level 16, State 4, Line 1

    ‘ALTER TABLE SWITCH’ statement failed. The table ‘ABIDataWarehouse.dim.Type1′ is partitioned while index ‘UK_type1′ is not partitioned.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>