T-SQL Script for Estimating Compression Savings
A couple of weeks ago, I was working on a Microsoft PDW proof-of-concept (POC) and had to measure compression ratios. In order to do this, I fired up SSMS and wrote a little script. The script will iterate through all tables in a database and run the sp_estimate_data_compression_savings stored procedure. This will only work in SQL Server 2008+ versions running Enterprise edition.
If you're not familiar with this stored procedure, it basically will tell you what effect PAGE or ROW compression will have on your table/index/partition, etc. There are pro's and con's with compression. What I've tended to see is that compression has very positive results on space, IO, and query duration, with a negative impact on CPU and write speed. Like most things, it's a trade-off and the results will vary by environment, so I recommend you do some testing before you apply compression to all tables. I tend to use compression mostly for my historical tables and partitions and leave my recent data uncompressed. And, back to the script, I use this stored procedure to estimate the impact of compression and to determine whether to use PAGE or ROW compression. PAGE is a higher level of compression, which means it's also more expensive in terms of CPU, so if the difference between the two results is negligible, I'm more apt to just use ROW compression.
Now that my impromptu compression discussion is done, let's get to the actual script. One final word of caution, however. This is an IO intensive process, so you may want to run it after peak business hours.
SET NOCOUNT ON; DECLARE @printOnly BIT = 0 -- change to 1 if you don't want to execute, just print commands , @tableName VARCHAR(256) , @schemaName VARCHAR(100) , @sqlStatement NVARCHAR(1000) , @tableCount INT , @statusMsg VARCHAR(1000); IF EXISTS(SELECT * FROM tempdb.sys.tables WHERE name LIKE '%#tables%') DROP TABLE #tables; CREATE TABLE #tables ( database_name sysname , schemaName sysname NULL , tableName sysname NULL , processed bit ); IF EXISTS(SELECT * FROM tempdb.sys.tables WHERE name LIKE '%#compression%') DROP TABLE #compressionResults; IF NOT EXISTS(SELECT * FROM tempdb.sys.tables WHERE name LIKE '%#compression%') BEGIN CREATE TABLE #compressionResults ( objectName varchar(100) , schemaName varchar(50) , index_id int , partition_number int , size_current_compression bigint , size_requested_compression bigint , sample_current_compression bigint , sample_requested_compression bigint ); END; INSERT INTO #tables SELECT DB_NAME() , SCHEMA_NAME([schema_id]) , name , 0 -- unprocessed FROM sys.tables; SELECT @tableCount = COUNT(*) FROM #tables; WHILE EXISTS(SELECT * FROM #tables WHERE processed = 0) BEGIN SELECT TOP 1 @tableName = tableName , @schemaName = schemaName FROM #tables WHERE processed = 0; SELECT @statusMsg = 'Working on ' + CAST(((@tableCount - COUNT(*)) + 1) AS VARCHAR(10)) + ' of ' + CAST(@tableCount AS VARCHAR(10)) FROM #tables WHERE processed = 0; RAISERROR(@statusMsg, 0, 42) WITH NOWAIT; SET @sqlStatement = 'EXECUTE sp_estimate_data_compression_savings ''' + @schemaName + ''', ''' + @tableName + ''', NULL, NULL, ''PAGE'';' -- ROW, PAGE, or NONE IF @printOnly = 1 BEGIN SELECT @sqlStatement; END ELSE BEGIN INSERT INTO #compressionResults EXECUTE sp_executesql @sqlStatement; END; UPDATE #tables SET processed = 1 WHERE tableName = @tableName AND schemaName = @schemaName; END; SELECT * FROM #compressionResults;
Partitioning Tricks
For those of you who are using partitioning, or who are considering using partitioning, allow me to share some tips with you.
Easy Partition Staging Tables
Switching partitions (or more specifically, hobts) in and out of a partitioned table requires the use of a staging table. The staging table has very specific requirements: it must be completely identical to the partitioned table, including indexing structures, and it must have a check constraint that limits data to the partitioning range. Thanks to my co-worker Jeff, I've recently started using the SQL Server Partition Management tool on CodePlex. I haven't used the automatic partition switching feature -- frankly, using any sort of data modification tool in a production environment makes me nervous -- but I've been using the scripting option to create staging tables in my development environment, which I then copy to production for use. It's nothing you can't do yourself, but it does make the whole process easy and painless, plus it saves you from annoying typos. But be careful when using this tool to just create the table and check constraints automatically, because you may need to...
Add Check Constraints After Loading Data
Most of the time, I add the check constraint when I create the staging table, then I load data and perform the partition switch. However, for some reason, I was receiving the following error:
.Net SqlClient Data Provider: Msg 4972, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. Check constraints or partition function of source table 'myStagingTable' allows values that are not allowed by check constraints or partition function on target table 'myDestinationTable'.
This drove me crazy. I confirmed my check constraints were correct, that I had the correct partition number, and that all schema and indexes matched identically. After about 30 minutes of this, I decided to drop and recreate the constraint. For some reason, it fixed the issue. Repeat tests produced the same results: the check constraint needed to be added *after* data was loaded. This error is occurring on a SQL Server 2008 SP1 box; to be honest, I'm not sure what's causing the error, so if you know, please leave me a comment. But I figured I'd share so that anyone else running into this issue can hopefully save some time and headache.
Replicating Into Partitioned and Non-Partitioned Tables
Recently, we needed to replicate a non-partitioned table to two different destinations. We wanted to use partitioning for Server A, which has 2008 Enterprise; Server B, which is on 2005 Standard, could not take advantage of partitioning. The solution was really easy: create a pre-snapshot and post-snapshot script for the publication, then modify to handle each server group differently. Using pseudo-code, it looked something like this:
/* Identify which servers get the partitioned version */ If @@ServerName In ('yourServerNameList') Begin /* Create your partitioning scheme if necessary */ If Not Exists(Select * From sys.partition_schemes Where name = 'InsertPartitionScheme') CREATE PARTITION SCHEME InsertPartitionScheme AS PARTITION InsertPartitionFunction ALL TO ([PRIMARY]); /* Create your partitioning function if necessary */ If Not Exists(Select * From sys.partition_functions Where name = 'InsertPartitionFunction') CREATE PARTITION FUNCTION InsertPartitionFunction (smalldatetime) AS RANGE RIGHT FOR VALUES ('insertValues'); /* Create a partitioned version of your table */ CREATE TABLE [dbo].[yourTableName] ( [yourTableSchema] ) ON InsertPartitionScheme([partitioningKey]); End Else Begin /* Create a non-partitioned version of your table */ CREATE TABLE [dbo].[yourTableName] ( [yourTableSchema] ) ON [Primary]; End
You could also use an edition check instead of a server name check, if you prefer. The post-snapshot script basically looked the same, except you create partitioned indexes instead.
Compress Old Partitions
Did you know you can set different compression levels for individual partitions? It's true! I've just completed doing this on our largest partitioned table. Here's how:
/* Apply compression to your partitioned table */ Alter Table dbo.yourTableName Rebuild Partition = All With ( Data_Compression = Page On Partitions(1 to 9) , Data_Compression = Row On Partitions(10 to 11) , Data_Compression = None On Partitions(12) ); /* Apply compression to your partitioned index */ Alter Index YourPartitionedIndex On dbo.yourTableName Rebuild Partition = All With ( Data_Compression = Page On Partitions(1 to 9) , Data_Compression = Row On Partitions(10 to 11) , Data_Compression = None On Partitions(12) ); /* Apply compression to your unpartitioned index */ Alter Index YourUnpartitionedIndex On dbo.yourTableName Rebuild With (Data_Compression = Row);
A couple of things to note. In all of our proof-of-concept testing, we found that compression significantly reduced query execution time, reads (IO), and storage. However, CPU was also increased significantly. The results were more dramatic, both good and bad, with page compression versus row compression. Still, for our older partitions, which aren't queried regularly, it made sense to turn on page compression. The newer partitions receive row compression, and the newest partitions, which are still queried very regularly by routine processes, were left completely uncompressed. This seems to strike a nice balance in our environment, but of course, results will vary depending on how you use your data.
Something to be aware of is that compressing your clustered index does *not* compress your non-clustered indexes; those are separate operations. Lastly, for those who are curious, it took us about 1 minute to apply row compression and about 7 minutes to apply page compression to partitions averaging 30 million rows.
Looking for more information on table partitioning? Check out my overview of partitioning, my example code, and my article on indexing on partitioned tables.
Categories
- Business Intelligence
- Internals
- Miscellaneous
- PASS
- Performance & Tuning
- Presentations
- SQL 2008
- SQL Tips
- Syndication
- T-SQL Scripts
Subscribe to my blog!
| Like what you see? Subscribe! |
![]() |
Around the Web
Recent Tweets
- RT @Phil_Factor: SQL Server table columns under the hood http://t.co/vp7gQ77B < what a great post
- @tradney haha that's awesome :)
- OH: @AdamMachanic's #sqlpass session was one of the best I've ever seen.
Archives
- November 2011
- October 2011
- September 2011
- July 2011
- June 2011
- May 2011
- April 2011
- January 2011
- December 2010
- November 2010
- October 2010
- September 2010
- July 2010
- June 2010
- May 2010
- April 2010
- February 2010
- January 2010
- November 2009
- October 2009
- September 2009
- August 2009
- July 2009
- June 2009
- May 2009
- April 2009
- March 2009
- February 2009
- January 2009
- December 2008
- November 2008
- October 2008


