T-SQL Script for Estimating Compression Savings

A couple of weeks ago, I was working on a Microsoft PDW proof-of-concept (POC) and had to measure compression ratios. In order to do this, I fired up SSMS and wrote a little script. The script will iterate through all tables in a database and run the sp_estimate_data_compression_savings stored procedure. This will only work in SQL Server 2008+ versions running Enterprise edition.

If you’re not familiar with this stored procedure, it basically will tell you what effect PAGE or ROW compression will have on your table/index/partition, etc. There are pro’s and con’s with compression. What I’ve tended to see is that compression has very positive results on space, IO, and query duration, with a negative impact on CPU and write speed. Like most things, it’s a trade-off and the results will vary by environment, so I recommend you do some testing before you apply compression to all tables. I tend to use compression mostly for my historical tables and partitions and leave my recent data uncompressed. And, back to the script, I use this stored procedure to estimate the impact of compression and to determine whether to use PAGE or ROW compression. PAGE is a higher level of compression, which means it’s also more expensive in terms of CPU, so if the difference between the two results is negligible, I’m more apt to just use ROW compression.

Now that my impromptu compression discussion is done, let’s get to the actual script. One final word of caution, however. This is an IO intensive process, so you may want to run it after peak business hours.

DECLARE @printOnly  BIT = 0 -- change to 1 if you don't want to execute, just print commands
    , @tableName    VARCHAR(256)
    , @schemaName   VARCHAR(100)
    , @sqlStatement NVARCHAR(1000)
    , @tableCount   INT
    , @statusMsg    VARCHAR(1000);
IF EXISTS(SELECT * FROM tempdb.sys.tables WHERE name LIKE '%#tables%')
    DROP TABLE #tables; 
      database_name     sysname
    , schemaName        sysname NULL
    , tableName         sysname NULL
    , processed         bit
IF EXISTS(SELECT * FROM tempdb.sys.tables WHERE name LIKE '%#compression%')
    DROP TABLE #compressionResults;
IF NOT EXISTS(SELECT * FROM tempdb.sys.tables WHERE name LIKE '%#compression%')
    CREATE TABLE #compressionResults
          objectName                    varchar(100)
        , schemaName                    varchar(50)
        , index_id                      int
        , partition_number              int
        , size_current_compression      bigint
        , size_requested_compression    bigint
        , sample_current_compression    bigint
        , sample_requested_compression  bigint
    , SCHEMA_NAME([schema_id])
    , name
    , 0 -- unprocessed
FROM sys.tables;
SELECT @tableCount = COUNT(*) FROM #tables;
WHILE EXISTS(SELECT * FROM #tables WHERE processed = 0)
    SELECT TOP 1 @tableName = tableName
        , @schemaName = schemaName
    FROM #tables WHERE processed = 0;
    SELECT @statusMsg = 'Working on ' + CAST(((@tableCount - COUNT(*)) + 1) AS VARCHAR(10)) 
        + ' of ' + CAST(@tableCount AS VARCHAR(10))
    FROM #tables
    WHERE processed = 0;
    RAISERROR(@statusMsg, 0, 42) WITH NOWAIT;
    SET @sqlStatement = 'EXECUTE sp_estimate_data_compression_savings ''' 
                        + @schemaName + ''', ''' + @tableName + ''', NULL, NULL, ''PAGE'';' -- ROW, PAGE, or NONE
    IF @printOnly = 1
        SELECT @sqlStatement;
        INSERT INTO #compressionResults
        EXECUTE sp_executesql @sqlStatement;
    UPDATE #tables
    SET processed = 1
    WHERE tableName = @tableName
        AND schemaName = @schemaName;
FROM #compressionResults;

Partitioning Tricks

For those of you who are using partitioning, or who are considering using partitioning, allow me to share some tips with you.

Easy Partition Staging Tables

Switching partitions (or more specifically, hobts) in and out of a partitioned table requires the use of a staging table. The staging table has very specific requirements: it must be completely identical to the partitioned table, including indexing structures, and it must have a check constraint that limits data to the partitioning range. Thanks to my co-worker Jeff, I’ve recently started using the SQL Server Partition Management tool on CodePlex. I haven’t used the automatic partition switching feature — frankly, using any sort of data modification tool in a production environment makes me nervous — but I’ve been using the scripting option to create staging tables in my development environment, which I then copy to production for use. It’s nothing you can’t do yourself, but it does make the whole process easy and painless, plus it saves you from annoying typos. But be careful when using this tool to just create the table and check constraints automatically, because you may need to…

Add Check Constraints After Loading Data

Most of the time, I add the check constraint when I create the staging table, then I load data and perform the partition switch. However, for some reason, I was receiving the following error:

.Net SqlClient Data Provider: Msg 4972, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. Check constraints or partition function of source table ‘myStagingTable’ allows values that are not allowed by check constraints or partition function on target table ‘myDestinationTable’.

This drove me crazy. I confirmed my check constraints were correct, that I had the correct partition number, and that all schema and indexes matched identically. After about 30 minutes of this, I decided to drop and recreate the constraint. For some reason, it fixed the issue. Repeat tests produced the same results: the check constraint needed to be added *after* data was loaded. This error is occurring on a SQL Server 2008 SP1 box; to be honest, I’m not sure what’s causing the error, so if you know, please leave me a comment. But I figured I’d share so that anyone else running into this issue can hopefully save some time and headache. :)

Replicating Into Partitioned and Non-Partitioned Tables

Recently, we needed to replicate a non-partitioned table to two different destinations. We wanted to use partitioning for Server A, which has 2008 Enterprise; Server B, which is on 2005 Standard, could not take advantage of partitioning. The solution was really easy: create a pre-snapshot and post-snapshot script for the publication, then modify to handle each server group differently. Using pseudo-code, it looked something like this:

/* Identify which servers get the partitioned version */
If @@ServerName In ('yourServerNameList') 
    /* Create your partitioning scheme if necessary */
    If Not Exists(Select * From sys.partition_schemes Where name = 'InsertPartitionScheme')
        CREATE PARTITION SCHEME InsertPartitionScheme 
            AS PARTITION InsertPartitionFunction ALL TO ([PRIMARY]);    
    /* Create your partitioning function if necessary */
    If Not Exists(Select * From sys.partition_functions Where name = 'InsertPartitionFunction')
        CREATE PARTITION FUNCTION InsertPartitionFunction (smalldatetime) 
            AS RANGE RIGHT FOR VALUES ('insertValues');    
    /* Create a partitioned version of your table */
    CREATE TABLE [dbo].[yourTableName] (
    ) ON InsertPartitionScheme([partitioningKey]);
    /* Create a non-partitioned version of your table */
    CREATE TABLE [dbo].[yourTableName] (
    ) ON [Primary];

You could also use an edition check instead of a server name check, if you prefer. The post-snapshot script basically looked the same, except you create partitioned indexes instead.

Compress Old Partitions

Did you know you can set different compression levels for individual partitions? It’s true! I’ve just completed doing this on our largest partitioned table. Here’s how:

/* Apply compression to your partitioned table */
Alter Table dbo.yourTableName
Rebuild Partition = All
      Data_Compression = Page On Partitions(1 to 9)
    , Data_Compression = Row  On Partitions(10 to 11) 
    , Data_Compression = None On Partitions(12)
/* Apply compression to your partitioned index */
Alter Index YourPartitionedIndex
    On dbo.yourTableName
    Rebuild Partition = All
      Data_Compression = Page On Partitions(1 to 9)
    , Data_Compression = Row  On Partitions(10 to 11) 
    , Data_Compression = None On Partitions(12)
/* Apply compression to your unpartitioned index */
Alter Index YourUnpartitionedIndex
    On dbo.yourTableName
    Rebuild With (Data_Compression = Row);

A couple of things to note. In all of our proof-of-concept testing, we found that compression significantly reduced query execution time, reads (IO), and storage. However, CPU was also increased significantly. The results were more dramatic, both good and bad, with page compression versus row compression. Still, for our older partitions, which aren’t queried regularly, it made sense to turn on page compression. The newer partitions receive row compression, and the newest partitions, which are still queried very regularly by routine processes, were left completely uncompressed. This seems to strike a nice balance in our environment, but of course, results will vary depending on how you use your data.

Something to be aware of is that compressing your clustered index does *not* compress your non-clustered indexes; those are separate operations. Lastly, for those who are curious, it took us about 1 minute to apply row compression and about 7 minutes to apply page compression to partitions averaging 30 million rows.

Looking for more information on table partitioning? Check out my overview of partitioning, my example code, and my article on indexing on partitioned tables.