SQL RAP Contest!

I’ve been surprised at how many messages I’ve received regarding the SQL RAP I hastily wrote last night, like:

loriedwards: @sqlfool Much love for your sql rap. Does this make you an MC DBA?

AndyLeonard: Green tea all over my laptop monitor – this is hilarious @sqlfool !

Jimmy May: OMG!!! I am howling!!! All y’all have soitenly gotten me beat. So to speak…

If you haven’t read Ward’s excellent response to my poor excuse for a rap, you can find it on his blog. Ward beat me soundly (I definitely “hit a nuke with rubber bands!”), but in my defense, I did not know he was an accomplished singer and songwriter! The most musical talent I have is beating Guitar Hero on the Hard difficulty mode. Hey, it’s impressive if you’re me. :)

But what I’ve been surprised to find, aside from Ward’s musical genius, is how many *other* DBAs are inclined to spin a little rhyme:

SQLChicken: @sqlfool you know w/ who u be messin?Take your broke ass rhymes and learn this lesson.I’ll purge u like bad data,so step off and see u lata!

mike_walsh: @sqlfool – @sqltwit thinks he’s a master of prose but the emperors got no clothes. Webcam? What a sham… #sqlrapsaga

jeffrush: @sqlfool @sqltwit @andyleonard you are all crazy like an orange / rhyming isn’t hard at… DOH!

Clearly, we need to give these folks an avenue to vent their repressed rhythms; the last thing we need is more maniacal DBAs running around. So here’s your chance to show off your rap skillz. I am officially throwing down the proverbial gauntlet to the SQL Server and .NET community to submit your own SQL RAP.

Why should you participate? Well, for street cred, of course! Oh, you want something more? How about a shiny new iPod Touch?

Now that I have your attention, here are the rules:

  • Your rap has to be original, but may parody existing songs.
  • Your rap has to relate to SQL Server in some form; the geekier, the better. Bonus points for working SQL Server 2008 features or SQL bloggers into the song.
  • You only need to submit the lyrics to your song. Bonus points for video and/or audio accompaniments.

Meet your judges:

Brent Ozar – a SQL Server expert by trade and a smart-ass at heart, Brent will be judging your rap on its overall entertainment value.

Ward Pond – a SQL Server guru by day, songwriter by night, Ward will be judging your rap’s overall “musical flow”… whatever that means.

Michelle Ufford – she’s already proved she has no ability to rap, but it’s her contest, so she gets to be a judge. She’ll be judging your song on its je ne sais quoi… literally.

Realistically, we may not get that many submissions, so this may be the best odds you ever have at winning something. Plus, I hear Brent is pretty easy to bribe.

Send your raps to me at michelle @ sqlfool dot com. The deadline for submissions is 8pm CST on Monday, 18 May 2009. I’ll post the entries and the winners about a week after that, depending on how busy I am. :)

So what are you waiting for? Go throw on some Snoop and/or a Kimberly Tripp webcast and get rappin’!

Ward’s RAP

Ward Pond has been talking about RAP all day today on Twitter. I’m not normally one to RAP, but I’ve been inspired to put out a little lyrical verse in Ward’s honor.

[ahem]

Ward’s RAP:

My name is Ward, last name Pond
SQL Server is of what I’m fond
I like my data relational
and my processes transactional
(say what?)
If you don’t know about t-logs
Then you best be reading my blogs
‘Cause I’m a real SQL master
I knows how to make the SQL go faster
Command line is my only tool
Not like that SSMS junkie, SQLFool
So don’t mess with me, or you’ll be TechEd skooled

Word. And I don’t mean Office.

[/pain]

A Look at Missing Indexes

Tim Ford (@SQLAgentMan) recently blogged about his Top 5 SQL Server Indexing Best Practices. I thought it was a good list, and it inspired this blog post. I’ve recently been doing a little index spring cleaning, and I thought some people may be interested in the process I go through. So, here it is… an overview of my general missing index process.

I start with my trusty dba_missingIndexStoredProc table. If this table sounds completely foreign to you, check out my post, Find Missing Indexes In Stored Procs. Basically, I have a process that runs every night, scanning the XML of every query plan on the server to find procs that are possibly missing indexes. I then log the details for later action.
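
If you’re curious what that looks like under the covers, here’s a minimal sketch of the idea; it’s *not* the actual proc (see the post linked above for that), just the general approach of scanning cached plans for a MissingIndexes element:

-- Sketch only: find cached proc plans that contain missing index hints.
-- Scanning the entire plan cache can be expensive; run it off-hours.
With XmlNameSpaces (
    Default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'
)
Select Db_Name(qp.dbid) As 'database_name'
    , Object_Name(qp.objectid, qp.dbid) As 'proc_name'
    , qp.query_plan
From sys.dm_exec_cached_plans As cp
Cross Apply sys.dm_exec_query_plan(cp.plan_handle) As qp
Where cp.objtype = 'Proc'
    And qp.query_plan.exist('//MissingIndexes') = 1;

This only sees plans that are still in cache, which is one reason to run it nightly and log the results.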

So I take a look at my table, and I find 8 stored procedures that are possibly missing indexes. Clicking on the XML link will show me the logged query plan:

Procs With Missing Indexes

Right clicking on the “Missing Index” description will give me the details of the recommended index:

Missing Index Details

Here’s an example of what SQL Server will return for you:

/*
Missing Index Details from ExecutionPlan2.sqlplan
The Query Processor estimates that implementing the following index 
could improve the query cost by 85.7327%.
*/
 
/*
USE [msdb]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[sysjobhistory] ([job_id])
INCLUDE ([instance_id],[step_id],[sql_message_id],[sql_severity],
[run_status],[run_date],[run_time],[run_duration],[operator_id_emailed], 
[operator_id_netsent],[operator_id_paged],[retries_attempted],[server])
GO
*/
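
To use it, replace the name placeholder with your own naming convention and uncomment the statement. For example (the index name below is just my hypothetical convention):

USE [msdb]
GO
CREATE NONCLUSTERED INDEX IX_sysjobhistory_job_id
ON [dbo].[sysjobhistory] ([job_id])
INCLUDE ([instance_id],[step_id],[sql_message_id],[sql_severity],
[run_status],[run_date],[run_time],[run_duration],[operator_id_emailed],
[operator_id_netsent],[operator_id_paged],[retries_attempted],[server])
GO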

I now compare the details of this proposed index to the missing index DMV suggestions, using this query:

SELECT t.name AS 'affected_table'
    , 'Create NonClustered Index IX_' + t.name + '_missing_' 
        + CAST(ddmid.index_handle AS VARCHAR(10))
        + ' On ' + ddmid.STATEMENT 
        + ' (' + IsNull(ddmid.equality_columns,'') 
        + CASE WHEN ddmid.equality_columns IS Not Null 
            And ddmid.inequality_columns IS Not Null THEN ',' 
                ELSE '' END 
        + IsNull(ddmid.inequality_columns, '')
        + ')' 
        + IsNull(' Include (' + ddmid.included_columns + ');', ';'
        ) AS sql_statement
    , ddmigs.user_seeks
    , ddmigs.user_scans
    , CAST((ddmigs.user_seeks + ddmigs.user_scans) 
        * ddmigs.avg_user_impact AS INT) AS 'est_impact'
    , ddmigs.last_user_seek
FROM sys.dm_db_missing_index_groups AS ddmig
INNER JOIN sys.dm_db_missing_index_group_stats AS ddmigs
    ON ddmigs.group_handle = ddmig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS ddmid 
    ON ddmig.index_handle = ddmid.index_handle
INNER JOIN sys.tables AS t
    ON ddmid.OBJECT_ID = t.OBJECT_ID
WHERE ddmid.database_id = DB_ID()
    --AND t.name = 'myTableName' 
ORDER BY CAST((ddmigs.user_seeks + ddmigs.user_scans) 
    * ddmigs.avg_user_impact AS INT) DESC;

I usually find the data in both places, but not always. One reason is that the missing index DMVs only retain data accumulated since the last time the instance was restarted. So if I’m taking a look at this DMV on Monday and I just rebooted on Sunday, I may not have enough history to give me meaningful recommendations. This is just something to be aware of.
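
If you’re not sure when that last restart was, here’s a quick trick: tempdb is recreated every time the instance starts, so its create_date doubles as the last restart time.

Select create_date As 'last_restart'
From sys.databases
Where name = 'tempdb';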

What I’m looking for in this DMV is the number of user_seeks and the est_impact. Also, if I haven’t rebooted my server in a while, I take a look at last_user_seek to determine whether the recommendation is still relevant.

Next, I take a look at my existing indexes using Kimberly Tripp’s sp_helpindex2 system stored proc. I use her proc instead of sp_helpindex because I need to see included columns.
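
If you’ve installed her proc, it’s called just like sp_helpindex; for example (the table name here is just for illustration):

-- Assumes Kimberly's sp_helpindex2 has been created on your server
Exec sp_helpindex2 'Sales.SalesOrderDetail';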

If you’re wondering why I look at existing indexes, it’s because I’m looking for indexes that can be modified slightly to accommodate my missing index needs. By “modified slightly,” I mean I’d only change an existing index if it did not drastically alter that index’s size or composition, e.g. adding one or two narrow columns as included columns. I do NOT mean making changes that double the size of your index; in those cases, you’d probably be better off creating a brand new index.

Looking at existing indexes is actually a pretty critical part of the puzzle. If I have a proc that only gets called a few times an hour and could benefit from a better index, I may not create that index if it means adding a wide, expensive index to a busy table. If I can make a small modification to an existing index, then there’s a greater chance I’ll make the change and cover my query.
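
As a quick, hypothetical sketch of what I mean by a “small modification,” the Drop_Existing option lets you rebuild an index with an extra included column in a single statement (all names below are made up):

-- Hypothetical index and column names; adds one narrow included column
-- to an existing index without a separate drop step
Create NonClustered Index IX_myTable_myColumn
    On dbo.myTable (myColumn)
    Include (existingCol, newNarrowCol)
    With (Drop_Existing = On);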

At this point, I should have enough information to start making decisions. I was going to write out the path I normally take when making decisions, but I thought, “Hey! What a great time for a diagram.” So here you go:

Decision Path

Disclaimer: I’m *not* a Visio wizard, so if I butchered the use of certain symbols in my diagram, please let me know so I can a) fix it, and b) learn from it!

It’s hard to really put all of the decision paths into a single, small diagram like this. There are a lot of variables that I’m not even touching here. But I think this is a fairly good “generic” representation of the path I take. When I hit an “end” process, it means I don’t create the missing index at this time. Maybe it’ll become necessary in the future, but I prefer to err on the side of fewer indexes.

So there you have it, a brief look at my missing index process. Hopefully someone finds it helpful. :)

SQL Quiz #4

Chris Shaw has started up round 4 of his excellent SQL Quiz series. I was tagged by that crazy dynamic SQL guy, Jeremiah Peschka, who was tagged by the SQLBatman, who was in turn tagged by Mr. Shaw himself.

The question this time is…

Who has been a great leader in your career and what made them a great leader?

I’ve had some great managers, and I’ve known some great leaders. It’s hard to differentiate, and even harder to narrow it down to just one person. But if I have to choose just one, then I’m going with the person who first had a major impact on my career: Richard.

Richard is the type of person that just commands silent respect. When he speaks, people listen. But the other side of that is, when other people speak, he listens too. You never have a problem doing what he asks, because you know that he’d never ask you to do something that he would not do himself. In fact, chances are he’s already doing the work of two people, just so he doesn’t have to overload *you*. You know he has your best interests at heart, even when it’s not necessarily the best for him or the company. By the time something comes up where he needs help, he’s got a contingent of people willing to bend over backwards for him, whether it’s working late or just making the coffee, because he’s earned their trust, respect, and admiration.

No doubt about it, Richard is a great manager. But take all of these great managerial and interpersonal traits and add vision, and now you have a great leader who’s capable of moving proverbial mountains. And Richard has vision in spades.

I’ve unfortunately lost touch with Richard over the years, but I’ve never forgotten the lessons I’ve learned from him. Richard, if you ever stumble across this… thank you.

Tag! You’re it. Don’t break the chain, or SQLBatman will break YOU.

Filtered Indexes: What You Need To Know

Filtered indexes are probably my favorite new feature in SQL Server 2008. That’s saying a lot, since there are so many great new features to choose from. In this post, I want to explore a little about how filtered indexes work, how they can be applied, and some of the “gotchas” to be aware of.

First, for those of you who may not yet know about filtered indexes, allow me to enlighten you. In short, filtered indexes allow you to create an index on a subset of data using a filtering predicate. Filters can only be applied to non-clustered indexes. The general syntax of a filtered index is:

Create NonClustered Index [index_name]
On [table_name] ([column_list])
Include ([column_list])
Where [filtered_criteria];

For our purposes, we’re going to be working with the Sales.SalesOrderDetail table in the AdventureWorks database. Let’s look at a specific example. Suppose we have a query that regularly searches on the [SpecialOfferID] column.

Select SalesOrderID
    , Count(*) As 'CountOfLineItem'
    , Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail
Where SpecialOfferID <> 1
Group By SalesOrderID;

We notice that there’s no covering index for this query by looking at the actual execution plan:

Query Plan - Clustered Scan

If this is a commonly executed query, then we’d probably want to toss an index on it. Before we get started, let’s take a look at the distribution of values in that column:

Select SpecialOfferID
    , Count(*) As 'rows'
From Sales.SalesOrderDetail
Group By SpecialOfferID
Order By Count(*) Desc;

Our distribution of values is:

SpecialOfferID rows
-------------- -----------
1              115884
2              3428
3              606
13             524
14             244
16             169
7              137
8              98
11             84
4              80
9              61
5              2

As you can see, [SpecialOfferID] = 1 accounts for 96% of our values. In SQL Server 2005, we’d create an index that might look something like this:

Create NonClustered Index IX_Sales_SalesOrderDetail_SpecialOfferID
    On Sales.SalesOrderDetail(SpecialOfferID)
    Include (SalesOrderID, LineTotal);

Now if we re-run our original query, this is what we see:

Indexed Query Plan

So we’re now performing a non-clustered index seek instead of a clustered index scan, which already results in some pretty significant performance improvements. To see this, we’re going to use the INDEX query hint to force a clustered index scan for comparison. We’re also going to use the DBCC command DROPCLEANBUFFERS, which clears the buffer cache so we can better examine what’s happening with our IO. (A word of caution: DROPCLEANBUFFERS flushes the buffer cache for the whole instance, so don’t run it on a production server.)

Set Statistics IO On;
 
DBCC DropCleanBuffers;
 
Select SalesOrderID
    , Count(*) As 'CountOfLineItem'
    , Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail With 
    (Index(PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID))
Where SpecialOfferID <> 1
Group By SalesOrderID;
 
DBCC DropCleanBuffers;
 
Select SalesOrderID
    , Count(*) As 'CountOfLineItem'
    , Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail
Where SpecialOfferID <> 1
Group By SalesOrderID;
 
Set Statistics IO Off;

Clustered Index Scan:
Table 'SalesOrderDetail'. Scan count 1, logical reads 1240, physical reads 17, read-ahead reads 1242...
 
NonClustered Index Seek:
Table 'SalesOrderDetail'. Scan count 2, logical reads 30, physical reads 4, read-ahead reads 480...

As you can see, the non-clustered (NC) index seek performs quite a bit better. Now let’s create a filtered index and explore what happens:

Create NonClustered Index FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered
    On Sales.SalesOrderDetail(SalesOrderID)
    Include (LineTotal)
    Where SpecialOfferID <> 1;

First, let’s look at the pages consumed by each index:

SELECT i.name, ddips.index_depth, ddips.index_level
    , ddips.page_count, ddips.record_count
FROM sys.indexes AS i
Join sys.dm_db_index_physical_stats(DB_ID(), 
    OBJECT_ID(N'Sales.SalesOrderDetail'), Null, Null, N'Detailed') AS ddips
    ON i.OBJECT_ID = ddips.OBJECT_ID
    And i.index_id = ddips.index_id
WHERE i.name In ('IX_Sales_SalesOrderDetail_SpecialOfferID'
    , 'FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered'
    , 'PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID')
    AND ddips.index_level = 0;

name                                                       index_depth index_level page_count  record_count
---------------------------------------------------------- ----------- ----------- ----------- --------------------
PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID        3           0           1234        121317
IX_Sales_SalesOrderDetail_SpecialOfferID                   3           0           480         121317
FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered         2           0           19          5433

If you scroll over, you’ll see that the clustered index consumes the most pages, naturally. The non-filtered NC index consumes fewer pages than the clustered index because it’s narrower; however, it still consumes more pages than the filtered index because it stores every data row. The filtered index, with only 5433 rows stored, is by far our smallest index, consuming 96% less space than our non-filtered NC index.

Because we’re storing this index in far fewer pages, we should also see a corresponding performance boost. Let’s verify that this is the case:

Set Statistics IO On;
 
DBCC DropCleanBuffers;
 
Select SalesOrderID
    , Count(*) As 'CountOfLineItem'
    , Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail With (Index(IX_Sales_SalesOrderDetail_SpecialOfferID))
Where SpecialOfferID <> 1
Group By SalesOrderID;
 
DBCC DropCleanBuffers;
 
Select SalesOrderID
    , Count(*) As 'CountOfLineItem'
    , Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail
Where SpecialOfferID <> 1
Group By SalesOrderID;
 
Set Statistics IO Off;

NonClustered Index Seek:
Table 'SalesOrderDetail'. Scan count 2, logical reads 30, physical reads 4, read-ahead reads 480
 
Filtered Index Scan:
Table 'SalesOrderDetail'. Scan count 1, logical reads 24, physical reads 2, read-ahead reads 22

Filtered Query Plan

As expected, we get the best results with our filtered index scan.

You’ll notice that I did *not* create the filtered index on the [SpecialOfferID] column like I did with [IX_Sales_SalesOrderDetail_SpecialOfferID]. This is because my query doesn’t care what my [SpecialOfferID] value is, just as long as it’s not equal to 1. My non-filtered NC index was created on [SpecialOfferID] because it needed to navigate the b-tree to find the records where [SpecialOfferID] <> 1. With my filtered index, the query optimizer knows that all of the records already meet the criteria, so it doesn’t need to navigate through the index to find the matching results.

We could choose to include the [SpecialOfferID] data in our filtered index, but we’d most likely want to make it an included column rather than part of the index key. In fact, it’s important to note that if I don’t add [SpecialOfferID] as an included column and I want to return it in the results, e.g.

Select SalesOrderID
    , SpecialOfferID
    , Count(*) As 'CountOfLineItem'
    , Sum(LineTotal) As 'SumOfLineTotal'
From Sales.SalesOrderDetail
Where SpecialOfferID <> 1
Group By SalesOrderID
    , SpecialOfferID;

my filtered index will not be used; I will instead scan the clustered index once more (assuming [IX_Sales_SalesOrderDetail_SpecialOfferID] does not exist). This is because the filtering column is not actually stored anywhere on the index page. This is good news, in my opinion, since it allows you to create even leaner indexes. And as I already mentioned, if you do need the data returned, you can always add the filtering columns as included columns.
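
For example, here’s one way to rebuild our filtered index with [SpecialOfferID] as an included column (Drop_Existing rebuilds the index in place):

Create NonClustered Index FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered
    On Sales.SalesOrderDetail(SalesOrderID)
    Include (LineTotal, SpecialOfferID)
    Where SpecialOfferID <> 1
    With (Drop_Existing = On);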

What if you’re trying to find out whether or not an index is filtered, and what it’s filtered on? The sys.indexes catalog view has been updated in 2008 to include this information:

Select name, has_filter, filter_definition
From sys.indexes 
Where name In ('IX_Sales_SalesOrderDetail_SpecialOfferID'
    , 'FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered'
    , 'PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID');

name                                                   has_filter filter_definition
------------------------------------------------------ ---------- -------------------------
FIX_Sales_SalesOrderDetail_SpecialOfferID_Filtered     1          ([SpecialOfferID]<>(1))
IX_Sales_SalesOrderDetail_SpecialOfferID               0          NULL
PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID    0          NULL

I personally recommend Kimberly Tripp’s system stored proc, sp_helpindex2. It returns a lot of good information about your indexes, such as included columns and filtering criteria.

That’s all I have for today. Hopefully, you now understand how powerful filtered indexes can be. When used properly, filtered indexes can use less space, consume less IO, and improve overall query performance.

380PASS – Another Successful Meeting!

Our 2nd 380PASS meeting was another success, by all accounts. We had 31 attendees, twice the number who had registered and on par with our first meeting. The meeting was held in one of the very nice, state-of-the-art conference rooms at the University of Iowa. We began the meeting with a new feature: quick and easy SQL tips. I gave the first tip, which was how SSMS 2008 displays missing indexes. Afterwards, Zakir Durumeric from the Research Information Systems (RIS) team at the University spoke on database mirroring. Zakir gave an excellent overview of high availability options, the pros and cons of various methods, and some of the obstacles and solutions he’s faced in his environment. The materials for last night’s meeting will be uploaded to http://380pass.org later today.

Following Zakir’s presentation, Russ Allen shared some powerful and time-saving tips on how to use Central Management Servers to manage multi-server environments. The meeting ended with an interesting and informative discussion on Red Gate’s software. The conversation primarily revolved around SQL Compare, SQL Data Compare, and SQL Prompt, and the positive impact those applications have had on our lives.

We’re pleased to announce SQL Server MVP Hilary Cotter as the guest speaker for our May meeting. Hilary will be presenting remotely on performance tuning topics. More details to follow as we finalize them.

Thank you to all who attended and who have helped make our new chapter such a success!

Interview with SQLBatman

Recently, SQLBatman asked me for an interview via e-mail. Why the Dark SQL Knight would want to interview me is anyone’s guess. But I agreed, hoping for a chance to meet and defeat the SQL Joker, my arch nemesis (on principle, of course). Alas, the SQL Joker did not make an appearance and continues to be the bane of my existence. But for those interested, you can find the details of the interview at the SQLBatCave.

380PASS – Meeting Tomorrow Night!

Hi, everyone! Just a reminder that tomorrow night will be our 2nd meeting of the 380PASS chapter. Zakir Durumeric from the University of Iowa will be presenting on Database Mirroring topics. We had 31 members at our first meeting, and we’re hoping to keep the momentum going. This 2nd meeting will be held at the University of Iowa in Iowa City. Due to scheduling conflicts, this meeting will start at 6:00pm, with the presentation starting at 6:30pm. This is a one-time schedule change; future meetings will resume at 5:30pm. Additional details and directions can be found on our website at http://380pass.org.

If you haven’t already, please register for the event on our website at http://380pass.org. This will help ensure I order enough pizza. :)

To register for an event, you’ll first need to log into the site. If you’ve never logged in before, you’ll need to register for the site as well. Registration is free, quick, and fairly painless. Send me an e-mail at michelle @ sqlfool dot com if you’re having any trouble.

I hope to see you there!

Page Splitting & Rollbacks

So while I was at the grocery store last night, my mind wandered to SQL Server. This time, I was pondering what happens to a page split if the transaction is rolled back. I didn’t know the answer, but my guess was that the page split remained, since it would be less expensive for SQL Server to leave the data where it was sitting. Also, in theory, if the page split occurred once, it could need to occur again, so why undergo that same expense twice?

I decided to run a simple test to see what would happen. First, I created a really wide table and tossed 4 rows into it:

Create Table myTable
(
      id            int Primary Key
    , wideColumn    char(2000) 
);
 
Insert Into myTable
Select 1, Replicate('A', 2000) Union All
Select 2, Replicate('B', 2000) Union All
Select 4, Replicate('C', 2000) Union All
Select 5, Replicate('D', 2000);

I left an open spot for id=3, so I can force a page split. Next, I looked at the page data using the script I posted in February.
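
If you don’t have that script handy, the basic approach looks something like this; the database name and page number below are hypothetical, so plug in the values DBCC Ind returns for your environment:

DBCC TraceOn(3604);                  -- send DBCC Page output to the client
DBCC Ind('sandbox', 'myTable', -1);  -- list all pages allocated to myTable
DBCC Page('sandbox', 1, 153, 2);     -- dump page 153 in file 1 (hypothetical)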

Here’s what I saw:

Before

Pay attention to the 2 items boxed in red. m_slotCnt tells us how many records are on the page, and m_nextPage tells us the address of the next page. As you may have guessed, an m_nextPage value of 0:0 means you’re looking at the last page allocated to the object.

Now let’s insert a record, roll it back, and see what happens:

Begin Transaction;
 
Insert Into myTable
Select 3, Replicate('E', 2000);
 
Rollback Transaction;

I ran my DBCC Page command again and here’s what I saw:

After

As you can see, m_slotCnt is now 2, and m_nextPage is no longer 0:0 (although your actual page number will probably be different from mine). If I pull up the new page, I find my 2 relocated records, ids 4 and 5.

So what does this all mean? In short, page splits are NOT reversed when a transaction is rolled back. Why should you care? You probably wouldn’t care much unless you roll back a lot of transactions. But it’s also good to keep in mind if you have to abort a really large insert or update: if you don’t plan to re-execute the aborted script, you may want to defrag your indexes to clean up the splits.
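
For example, here’s a quick way to check the fragmentation those splits leave behind on our demo table and, if needed, clean it up with a rebuild:

Select index_id, avg_fragmentation_in_percent, page_count
From sys.dm_db_index_physical_stats
    (Db_Id(), Object_Id('dbo.myTable'), Null, Null, 'Limited');

Alter Index All On dbo.myTable Rebuild;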

PASS Summit – My Abstract, Deadline Extended!

PASS has extended the deadline for abstract submissions to Friday, 24 April at midnight.
If you’re even thinking about submitting an abstract, just do it.

In related news, yesterday I received the green light from GoDaddy legal to submit my abstract. This is my first-ever abstract for PASS, and I’m pretty excited about the topic. After some technical difficulties that resulted in SQL statements being displayed in the on-screen error message (tsk tsk), I finally submitted and received my confirmation number. Here’s an overview, just in case you plan to speak on the same topic…

Super Bowl, Super Load – A Look at Performance Tuning for VLDBs

Few DBAs have the opportunity to experience a real-life load test in their production environment. Michelle Ufford works for GoDaddy.com, a company that has experienced phenomenal success with its Super Bowl ads. These ads are designed to drive traffic to the company’s websites, which puts the database servers under high load. In her presentation, Michelle will explore the performance tuning techniques that have resulted in an 80% reduction in server response times and allowed her VLDBs to reach rates of 27k transactions per second. Topics will include vertical and horizontal partitioning, bulk operations, table design, and indexing.
