Performance Considerations of Data Types

I’ve just finished my first real content for the PASS Performance SIG. I decided to write on “Performance Considerations of Data Types,” as I think this is one of the easiest and most overlooked topics in performance tuning. Here’s a summary:

Selecting inappropriate data types, especially on large tables with millions or billions of rows, can have significant performance implications. In this article, I’ll explain why and offer suggestions on how to select the most appropriate data type for your needs. The primary focus will be on common data types in SQL Server 2005 and 2008, but I’ll also discuss some aspects of clustered indexes and column properties. Most importantly, I’ll show some examples of common data-type misuse.

If you’re interested in this content, you can find it here: Performance Considerations of Data Types.

Special thanks to Paul Randal and Paul Nielsen for providing me with technical reviews and great feedback. You guys are awesome!

Thanks also to Mladen Prajdic and Jeremiah Peschka for their great input. You guys are awesome, too!

0saves
If you enjoyed this post, please consider leaving a comment or subscribing to the RSS feed to have future articles delivered to your feed reader.
Tagged , , , , . Bookmark the permalink.

11 Responses to Performance Considerations of Data Types

  1. Great article! One note regarding clustered indexes on uniqueidentifiers:

    “Uniqueidentifiers are an expensive pointer because they are typically not sequential and because they consume 16 bytes. One environment in which I’ve worked contained two tables, each with over a billion rows, clustered on a non-sequential uniqueidentifier, and 99.998% fragmented.”

    I have run into similar situations. Depending on how and where the GUID is being generated, NEWSEQUENTIALID() can be a great alternative to refactoring the database. http://msdn.microsoft.com/en-us/library/ms189786.aspx You still have the 16 byte vs 8 byte consideration, but the fragmentation problem is largely mitigated.

    Caleb

  2. Pingback: SQL Server and Cloud Links for the Week | Brent Ozar - SQL Server DBA

  3. David Hay says:

    Great article! I’m having a bit of a problem understanding the non unique clusterd index portion though, particularly about dates.

    “However, creating a non-unique clustered index on a column with many duplicate values, perhaps on a column of date data type where you might have thousands of records with the same clustered key value, could result in a significant amount of internal overhead.”

    We have sales data that we mainly query by date or date range, by putting the clustered index over the date column that has significantly increased the performance of the related queries. Is the overhead only in space used, or is there more to consider that I am missing?

  4. @Caleb thanks! Yes, you’re exactly right about NEWSEQUENTIALID(), but one of the most prevalent reasons to use GUIDs appears to be the fact that they’re generated in the .NET tier and then passed to SQL Server. So although I work with many applications that store UNIQUEIDENTIFIERS, I’ve never once had the occasion to actually use NEWSEQUENTIALID(). That may not be true in other environments, though, but has been my experience. Thanks for the comment! :)

  5. @David Thanks! I’m going to write a blog post about this to help clarify a little bit what’s happening. Basically, your searches would perform faster if you’re clustering on date and searching on date, but yes, there’s overhead in terms of the space used to store the clustered index. And because your clustering key is also stored in your non-clustered keys, that overhead is felt in every other index, too. This isn’t such a big deal in smaller tables, and it’s not necessarily a bad thing anyway. Every environment and every application is different, so it’s impossible to say more without knowing your specifics. Are you OLTP, DSS, OLAP? What’s more important, fast reads or fast writes? All of this is important to consider and makes a difference when choosing a clustering key. The point my article was making is really that performance can, in general, be improved by using an unique clustered key.

    I hope that helps clarify, and like I said, I’m working on a blog post to help clarify things further. Thanks for the comment! :)

  6. Pingback: Overhead in Non-Unique Clustered Indexes : SQL Fool

  7. Stephen says:

    I can’t find this article on the SQL Pass site, it seems they reorg’d the site. Is there another place I can find this article?

  8. Pingback: Another Super Bowl, Another 27k TPS « SQL Fool

  9. Pingback: Important considerations when performance tuning | Paul S. Randal

  10. Pingback: Effective Clustered Indexes « T-SQL

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>