Check VLF Counts
Today I stumbled across a database with 87,302 VLFs. Yes, that's right... 87 THOUSAND. Most of our databases have a few dozen VLFs, but this was an old database that had grown to 1.5 TB and had the default autogrowth settings left intact. How did we discover this? During a routine reboot of the server, this database took 30 minutes to recover, but there were no error messages or status messages in the log.
Now, this blog post is not about VLFs or why you should keep the number of VLFs small and manageable -- although I hear under 50 is a good rule of thumb. No, the purpose of this blog post is to share a little script I wrote to check the number of VLFs each database uses:
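/* Note: SQL Server 2012 and later return an extra leading
   RecoveryUnitId column from DBCC LogInfo; add it to #stage there. */
CREATE TABLE #stage(
      FileID      INT
    , FileSize    BIGINT
    , StartOffset BIGINT
    , FSeqNo      BIGINT
    , [Status]    BIGINT
    , Parity      BIGINT
    , CreateLSN   NUMERIC(38)
);

CREATE TABLE #results(
      Database_Name sysname
    , VLF_count     INT
);

/* [?] is bracketed so database names with spaces or special characters work;
   DBCC LogInfo without arguments reports on the current database */
EXEC sp_msforeachdb N'Use [?];
    Insert Into #stage Exec sp_executeSQL N''DBCC LogInfo'';
    Insert Into #results Select DB_Name(), Count(*) From #stage;
    Truncate Table #stage;';

SELECT *
FROM #results
ORDER BY VLF_count DESC;

DROP TABLE #stage;
DROP TABLE #results;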
This script is low-impact and is safe to run on large, production databases during business hours. However, just be aware that it's using some undocumented commands.
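If you are on a newer build, a documented alternative may be simpler. The sketch below assumes SQL Server 2016 SP2 or later, where the sys.dm_db_log_info function is available; it is not part of the original script and returns one row per VLF for each database.

```sql
/* Assumes SQL Server 2016 SP2 or later (sys.dm_db_log_info) */
SELECT d.name   AS Database_Name
     , COUNT(*) AS VLF_count
FROM sys.databases AS d
CROSS APPLY sys.dm_db_log_info(d.database_id) AS li
GROUP BY d.name
ORDER BY VLF_count DESC;
```

This avoids both the temp tables and the undocumented commands, at the cost of not working on older versions.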
For more information on VLFs, check out these excellent articles:
Filtered Indexes Work-Around
Recently, I needed to create a stored procedure that queried a rather large table. The table has a filtered index on a date column, and it covers the query. However, the Query Optimizer was not using the index, which was increasing the execution time (not to mention IO!) by at least 10x. This wasn't the first time I've had the Optimizer fail to use a filtered index. Normally when this happens, I use a table hint to force the filtered index -- after I verify that it is indeed faster, of course. However, since this was a stored procedure, I was receiving the following error message whenever I tried to execute the proc:
Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.
SQL Server would not allow me to execute the stored procedure using the filtered index hint. If I removed the hint, it executed, but it used a different, non-covering and far more expensive index. For those of you not familiar with this issue, allow me to illustrate the problem.
First, create a table to play with and populate it with some bogus data:
CREATE TABLE dbo.filteredIndexTest
(
      myID   INT IDENTITY(1,3)
    , myDate SMALLDATETIME
    , myData CHAR(100)
    , CONSTRAINT PK_filteredIndexTest PRIMARY KEY CLUSTERED(myID)
);

SET NOCOUNT ON;

DECLARE @DATE SMALLDATETIME = '2010-01-01';

/* One row per minute for the month of January */
WHILE @DATE < '2010-02-01'
BEGIN
    INSERT INTO dbo.filteredIndexTest ( myDate , myData )
    SELECT @DATE , 'Date: ' + CONVERT(VARCHAR(20), @DATE, 102);

    SET @DATE = DATEADD(MINUTE, 1, @DATE);
END;

SELECT COUNT(*) FROM dbo.filteredIndexTest;
It looks like this will generate 44,640 rows of test data... plenty enough for our purposes. Now, let's create our filtered index and write a query that will use it:
CREATE NONCLUSTERED INDEX IX_filteredIndexTest_1
    ON dbo.filteredIndexTest(myDate)
    Include (myData)
    WHERE myDate >= '2010-01-27';

SELECT DISTINCT myData
FROM dbo.filteredIndexTest
WHERE myDate >= '2010-01-28';
If you look at the execution plan for this query, you'll notice that the Optimizer is using the filtered index. Perfect! Now let's parameterize it.
DECLARE @myDate1 SMALLDATETIME = '2010-01-28';

SELECT DISTINCT myData
FROM dbo.filteredIndexTest
WHERE myDate >= @myDate1;
Uh oh. Looking at the execution plan, we see that SQL Server is no longer using the filtered index. Instead, it's scanning the clustered index! Why is this? There's actually a good explanation for it. The reason is that I could, in theory, pass a date to my parameter that fell outside of the filtered date range. If that's the case, then SQL Server could not utilize the filtered index. Personally, I think it's a bug and SQL Server should identify whether or not a filtered index could be used based on the actual value submitted, but... that's a whole other blog post.
So what can we do? Well, dynamic SQL may be able to help us out in this case. Let's give it a go. First, let's try parameterized dynamic SQL.
DECLARE @mySQL1  NVARCHAR(2000)
      , @myParam NVARCHAR(2000) = '@p_myDate2 smalldatetime'
      , @myDate2 SMALLDATETIME  = '2010-01-28';

SET @mySQL1 = 'Select Distinct myData From dbo.filteredIndexTest Where myDate >= @p_myDate2';

EXECUTE SP_EXECUTESQL @mySQL1, @myParam, @p_myDate2 = @myDate2;
Looking at the execution plan, we see we're still scanning on the clustered index. This is because the parameterized dynamic SQL resolves to be the exact same query as the one above it. Let's try unparameterized SQL instead:
DECLARE @mySQL2  NVARCHAR(2000)
      , @myDate3 SMALLDATETIME = '2010-01-28';

SET @mySQL2 = 'Select Distinct myData From dbo.filteredIndexTest Where myDate >= '''
            + CAST(@myDate3 AS VARCHAR(20)) + '''';

EXECUTE SP_EXECUTESQL @mySQL2;

-- Drop Table dbo.filteredIndexTest;
Voila! We have a seek on our filtered index. Why? Because the statement resolves to be identical to our first query, where we hard-coded the date value in the WHERE clause.
Now, I want to stress this fact: you should always, ALWAYS use parameterized dynamic SQL whenever possible. Not only is it safer, but it's also faster, because it can reuse cached plans. But sometimes you just cannot accomplish the same tasks with it. This is one of those times. If you do end up needing to use unparameterized dynamic SQL as a work-around, please make sure you're validating your input, especially if you're interfacing with any sort of external source.
There's an even easier work-around for this problem that Dave (http://www.crappycoding.com) shared with me: recompile.
Adding "Option (Recompile)" to the end of your statements will force the Optimizer to re-evaluate which index will best meet the needs of your query every time the statement is executed. More importantly, it evaluates the plan based on the actual values passed to the parameter... just like in our hard-coded and dynamic SQL examples. Let's see it in action:
DECLARE @myDate4 SMALLDATETIME = '2010-01-28';

SELECT DISTINCT myData
FROM dbo.filteredIndexTest
WHERE myDate >= @myDate4
OPTION (RECOMPILE);

DECLARE @myDate5 SMALLDATETIME = '2010-01-20';

SELECT DISTINCT myData
FROM dbo.filteredIndexTest
WHERE myDate >= @myDate5
OPTION (RECOMPILE);
If we look at the execution plans for the 2 queries above, we see that the first query seeks on the filtered index, and the second query scans on the clustered index. This is because the second query cannot be satisfied with the filtered index because we initially limited our index to dates greater than or equal to 1/27/2010.
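Since the original problem showed up inside a stored procedure, here is a minimal sketch of how the recompile work-around might look there. The procedure name is hypothetical; the table and column come from the test setup earlier in this post.

```sql
/* Hypothetical procedure name; uses the dbo.filteredIndexTest table created earlier */
CREATE PROCEDURE dbo.filteredIndexTest_get
    @myDate SMALLDATETIME
AS
BEGIN
    SET NOCOUNT ON;

    /* The statement-level hint recompiles only this query, so the Optimizer
       sees the actual parameter value each time it runs */
    SELECT DISTINCT myData
    FROM dbo.filteredIndexTest
    WHERE myDate >= @myDate
    OPTION (RECOMPILE);
END;
```

Calling it with EXECUTE dbo.filteredIndexTest_get @myDate = '2010-01-28'; should behave like the first query above, while a date before 2010-01-27 should fall back to the clustered index. Keep in mind that recompiling on every call costs CPU, which matters for procedures that run very frequently.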
There are, of course, trade-offs associated with each approach, so use whichever one best meets your needs. Do you have another work-around for this issue? If so, please let me know.
Update:
Alex Kuznetsov (http://www.simple-talk.com/author/alex-kuznetsov/) shared this method too:
DECLARE @myDate1 SMALLDATETIME = '2010-01-28';

SELECT DISTINCT myData
FROM dbo.filteredIndexTest
WHERE myDate = @myDate1
    AND myDate >= '2010-01-27';
Like the other examples, this will result in an index seek on the filtered index. Basically, by explicitly declaring the start date of your filter, you're letting the Optimizer know that the filtered index can satisfy the request, regardless of the parameter value passed. Thanks for the tip, Alex!
Replication Bug with Partitioned Tables
Recently, we came across a bug in SQL Server 2005 on one of our production servers. Apparently, if you execute an ALTER TABLE statement on a replicated table with more than 128 partitions, the log reader will fail. A relatively obscure bug, I know. Microsoft has recognized this as a confirmed bug, but I couldn't find it anywhere on the intertubes, thus the inspiration for this blog post. Microsoft's official solution for this issue is to upgrade to SQL Server 2008.
For various reasons, we were unable to execute an upgrade at the time. And since this was a 2 terabyte database, we wanted to come up with a solution that wouldn't involve reinitializing the entire publication. Our quick-fix while we were troubleshooting the issue was to create a linked server to the production box. Not ideal, I know, but it worked in a pinch and minimized exposure of the issue. Fortunately for us, we were able to solve the problem on the publication database pretty easily. All of the affected partition functions had empty partitions created several months in the future, so we simply merged any empty partition ranges for future dates. Our solution to our now-out-of-date subscribers was to apply static row filtering to any table with more than 100 million records. While this would introduce some overhead with the replication of these tables, it would allow us a much faster recovery time. We decided to use the start of the most recent hour as our filtering criteria, just to give us a "clean" filter, so we had to delete data from any table where we were going to apply the filter. After that, it was simply a matter of resuming replication.
All things considered, it took us a little over a day to recover from the issue. Most of that time was spent troubleshooting the problem and identifying a workable solution; actual execution of the changes was pretty quick. Moral of the story? Upgrade to SQL Server 2008.
Partitioning Tricks
For those of you who are using partitioning, or who are considering using partitioning, allow me to share some tips with you.
Easy Partition Staging Tables
Switching partitions (or more specifically, hobts) in and out of a partitioned table requires the use of a staging table. The staging table has very specific requirements: it must be completely identical to the partitioned table, including indexing structures, and it must have a check constraint that limits data to the partitioning range. Thanks to my co-worker Jeff, I've recently started using the SQL Server Partition Management tool on CodePlex. I haven't used the automatic partition switching feature -- frankly, using any sort of data modification tool in a production environment makes me nervous -- but I've been using the scripting option to create staging tables in my development environment, which I then copy to production for use. It's nothing you can't do yourself, but it does make the whole process easy and painless, plus it saves you from annoying typos. But be careful when using this tool to just create the table and check constraints automatically, because you may need to...
Add Check Constraints After Loading Data
Most of the time, I add the check constraint when I create the staging table, then I load data and perform the partition switch. However, for some reason, I was receiving the following error:
.Net SqlClient Data Provider: Msg 4972, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. Check constraints or partition function of source table 'myStagingTable' allows values that are not allowed by check constraints or partition function on target table 'myDestinationTable'.
This drove me crazy. I confirmed my check constraints were correct, that I had the correct partition number, and that all schema and indexes matched identically. After about 30 minutes of this, I decided to drop and recreate the constraint. For some reason, it fixed the issue. Repeat tests produced the same results: the check constraint needed to be added *after* data was loaded. This error is occurring on a SQL Server 2008 SP1 box; to be honest, I'm not sure what's causing the error, so if you know, please leave me a comment. But I figured I'd share so that anyone else running into this issue can hopefully save some time and headache.
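I can't say for certain what caused the error above, but two things worth ruling out with Msg 4972 are a nullable partitioning column whose check constraint does not exclude NULLs, and a check constraint that SQL Server does not consider trusted. The sketch below uses hypothetical column names, boundary values, and a hypothetical partition number; only the table names come from the error message.

```sql
/* Hypothetical column, boundaries, and partition number; adjust to your partition function */
ALTER TABLE dbo.myStagingTable WITH CHECK
    ADD CONSTRAINT CK_myStagingTable_range
    CHECK (    myDate >= '2010-01-01'
           AND myDate <  '2010-02-01'
           AND myDate IS NOT NULL);

/* Confirm the constraint is enabled and trusted before switching */
SELECT name
     , is_not_trusted
     , is_disabled
     , definition
FROM sys.check_constraints
WHERE parent_object_id = OBJECT_ID(N'dbo.myStagingTable');

ALTER TABLE dbo.myStagingTable
    SWITCH TO dbo.myDestinationTable PARTITION 2;
```

If is_not_trusted comes back as 1, dropping and recreating the constraint WITH CHECK after the load would make it trusted again, which may be related to the behavior described above.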
Replicating Into Partitioned and Non-Partitioned Tables
Recently, we needed to replicate a non-partitioned table to two different destinations. We wanted to use partitioning for Server A, which has 2008 Enterprise; Server B, which is on 2005 Standard, could not take advantage of partitioning. The solution was really easy: create a pre-snapshot and post-snapshot script for the publication, then modify to handle each server group differently. Using pseudo-code, it looked something like this:
/* Identify which servers get the partitioned version */
IF @@SERVERNAME In ('yourServerNameList')
BEGIN

    /* Create your partitioning function if necessary
       (the scheme depends on it, so it must exist first) */
    IF Not Exists(SELECT * FROM sys.partition_functions WHERE name = 'InsertPartitionFunction')
        CREATE PARTITION FUNCTION InsertPartitionFunction (SMALLDATETIME)
            AS RANGE RIGHT FOR VALUES ('insertValues');

    /* Create your partitioning scheme if necessary */
    IF Not Exists(SELECT * FROM sys.partition_schemes WHERE name = 'InsertPartitionScheme')
        CREATE PARTITION SCHEME InsertPartitionScheme
            AS PARTITION InsertPartitionFunction ALL TO ([PRIMARY]);

    /* Create a partitioned version of your table */
    CREATE TABLE [dbo].[yourTableName]
    (
        [yourTableSchema]
    ) ON InsertPartitionScheme([partitioningKey]);

END
ELSE
BEGIN

    /* Create a non-partitioned version of your table */
    CREATE TABLE [dbo].[yourTableName]
    (
        [yourTableSchema]
    ) ON [PRIMARY];

END
You could also use an edition check instead of a server name check, if you prefer. The post-snapshot script basically looked the same, except you create partitioned indexes instead.
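As a rough sketch of the edition check, SERVERPROPERTY('EngineEdition') returns 3 for Enterprise-class engines (Enterprise, Developer, and Evaluation), which is a reasonable proxy for "partitioning is available" on 2005 and 2008. The PRINT statements are placeholders for the CREATE TABLE branches.

```sql
/* EngineEdition = 3 covers Enterprise, Developer, and Evaluation editions */
IF CAST(SERVERPROPERTY('EngineEdition') AS INT) = 3
BEGIN
    PRINT 'Enterprise-class engine: create the partitioned version here.';
END
ELSE
BEGIN
    PRINT 'Other edition: create the non-partitioned version here.';
END
```

The advantage over a server name list is that the script does not need to change when servers are added or renamed.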
Compress Old Partitions
Did you know you can set different compression levels for individual partitions? It's true! I've just completed doing this on our largest partitioned table. Here's how:
/* Apply compression to your partitioned table */
ALTER TABLE dbo.yourTableName
Rebuild Partition = All
WITH
(
      Data_Compression = Page ON Partitions(1 TO 9)
    , Data_Compression = ROW  ON Partitions(10 TO 11)
    , Data_Compression = NONE ON Partitions(12)
);

/* Apply compression to your partitioned index */
ALTER INDEX YourPartitionedIndex
    ON dbo.yourTableName
Rebuild Partition = All
WITH
(
      Data_Compression = Page ON Partitions(1 TO 9)
    , Data_Compression = ROW  ON Partitions(10 TO 11)
    , Data_Compression = NONE ON Partitions(12)
);

/* Apply compression to your unpartitioned index */
ALTER INDEX YourUnpartitionedIndex
    ON dbo.yourTableName
Rebuild WITH (Data_Compression = ROW);
A couple of things to note. In all of our proof-of-concept testing, we found that compression significantly reduced query execution time, reads (IO), and storage. However, CPU was also increased significantly. The results were more dramatic, both good and bad, with page compression versus row compression. Still, for our older partitions, which aren't queried regularly, it made sense to turn on page compression. The newer partitions receive row compression, and the newest partitions, which are still queried very regularly by routine processes, were left completely uncompressed. This seems to strike a nice balance in our environment, but of course, results will vary depending on how you use your data.
Something to be aware of is that compressing your clustered index does *not* compress your non-clustered indexes; those are separate operations. Lastly, for those who are curious, it took us about 1 minute to apply row compression and about 7 minutes to apply page compression to partitions averaging 30 million rows.
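Before and after a rebuild like this, it can help to estimate the savings and to confirm which setting each partition actually ended up with. The following is only a sketch: the table name matches the placeholder used above, and the partition number is an arbitrary example.

```sql
/* Estimate page compression savings for one partition (partition 1 is an arbitrary example) */
EXEC sys.sp_estimate_data_compression_savings
      @schema_name      = N'dbo'
    , @object_name      = N'yourTableName'
    , @index_id         = NULL
    , @partition_number = 1
    , @data_compression = N'PAGE';

/* Show the current compression setting for every partition of every index */
SELECT p.index_id
     , p.partition_number
     , p.rows
     , p.data_compression_desc
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.yourTableName')
ORDER BY p.index_id
       , p.partition_number;
```

The second query is a quick way to spot non-clustered indexes that were left uncompressed after the clustered index was rebuilt.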
Looking for more information on table partitioning? Check out my overview of partitioning, my example code, and my article on indexing on partitioned tables.
Find Recently Executed Stored Procedures
This past weekend, we had an issue where replication fell far behind on one of our databases. The replicated database is used for all sorts of reporting, so the immediate need was to identify processes that may have been affected by the incomplete data.
Now, there are hundreds of stored procedures that reference the affected database; the trick is finding out which ones are relevant. To do this, I used the sys.dm_exec_query_stats DMV. This does two things for me. First, it shows me a list of stored procedures in cache, meaning they've been executed relatively recently and are probably relevant to the search. Second, it shows me the last execution time, which in some cases may have been before the issue, meaning I do not need to worry about re-running those processes.
Here's the query I used:
SELECT DB_NAME(dest.[dbid]) AS 'databaseName'
    , OBJECT_NAME(dest.objectid, dest.[dbid]) AS 'procName'
    , MAX(deqs.last_execution_time) AS 'last_execution'
FROM sys.dm_exec_query_stats AS deqs
Cross Apply sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE dest.[TEXT] Like '%yourTableName%' -- replace
    And dest.[dbid] IS Not Null          -- exclude ad-hocs
GROUP BY DB_NAME(dest.[dbid])
    , OBJECT_NAME(dest.objectid, dest.[dbid])
ORDER BY databaseName
    , procName
OPTION (MaxDop 1);
This will return results similar to:
databaseName         procName                       last_execution
-------------------- ------------------------------ -----------------------
AdventureWorks       ufnGetProductListPrice         2009-08-03 09:57:25.390
AdventureWorksDW     DimProductCategoryGet_sp       2009-08-03 09:59:05.820
AdventureWorksDW     DimProductGet_sp               2009-08-03 09:58:38.370
I want to stress that this is *not* a list of all referencing objects, but rather a list of recently executed stored procedures that are still in memory. This list may not be accurate if your cache has recently been flushed or if you've recently rebooted your server.
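If you only care about stored procedures, the sys.dm_exec_procedure_stats DMV (SQL Server 2008 and later) offers a related view of the cache. This sketch is an alternative, not the query I used; the database name is a placeholder, and it filters on the database the procedure lives in rather than on the text it references.

```sql
/* 'yourDatabaseName' is a placeholder; same cache caveats as above apply */
SELECT DB_NAME(deps.database_id)                      AS databaseName
     , OBJECT_NAME(deps.[object_id], deps.database_id) AS procName
     , deps.last_execution_time
     , deps.execution_count
FROM sys.dm_exec_procedure_stats AS deps
WHERE deps.database_id = DB_ID(N'yourDatabaseName')
ORDER BY deps.last_execution_time DESC;
```

Because it does not search the procedure text, it is best used to narrow down a list of candidates you have already identified.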
Poor (Wo)Man’s Graph
Lary shared this poor (wo)man's graph with me today, and I thought it was pretty awesome:
SELECT OrderDate
    , COUNT(*) AS 'orders'
    , REPLICATE('=', COUNT(*)) AS 'orderGraph'
    , SUM(TotalDue) AS 'revenue'
    , REPLICATE('$', SUM(TotalDue)/1000) AS 'revenueGraph'
FROM AdventureWorks.Sales.SalesOrderHeader
WHERE OrderDate Between '2003-07-15' And '2003-07-31'
GROUP BY OrderDate
ORDER BY OrderDate;
This will return a simple but effective "graph" for you:
orderDate  orders orderGraph                     revenue  revenueGraph
---------- ------ ------------------------------ -------- ----------------------------------------
2003-07-15     19 ===================            34025.24 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
2003-07-16     14 ==============                 26687.65 $$$$$$$$$$$$$$$$$$$$$$$$$$$
2003-07-17     16 ================               32411.93 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
2003-07-18      9 =========                      18634.91 $$$$$$$$$$$$$$$$$$$
2003-07-19     13 =============                  19603.23 $$$$$$$$$$$$$$$$$$$$
2003-07-20     24 ========================       47522.80 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
2003-07-21      9 =========                      11781.62 $$$$$$$$$$$$
2003-07-22     17 =================              32322.50 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
2003-07-23     15 ===============                30906.44 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
2003-07-24     28 ============================   51107.90 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
2003-07-25     15 ===============                27058.10 $$$$$$$$$$$$$$$$$$$$$$$$$$$
2003-07-26     18 ==================             41076.49 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
2003-07-27     15 ===============                22169.88 $$$$$$$$$$$$$$$$$$$$$$
2003-07-28     16 ================               23945.80 $$$$$$$$$$$$$$$$$$$$$$$$
2003-07-29     25 =========================      51122.95 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
2003-07-30     12 ============                   23476.44 $$$$$$$$$$$$$$$$$$$$$$$
2003-07-31     18 ==================             36266.76 $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Who needs Reporting Services when you've got REPLICATE?
Random Number Generator in T-SQL
Ever need to generate a random number in T-SQL? I have, on a couple of different occasions. I'm pretty sure that there are several different ways of doing this in T-SQL, but here's what I use:
DECLARE @maxRandomValue TINYINT = 100
      , @minRandomValue TINYINT = 0;

SELECT CAST(((@maxRandomValue + 1) - @minRandomValue)
    * RAND() + @minRandomValue AS TINYINT) AS 'randomNumber';
This approach uses the RAND() function to generate a pseudo-random float between 0 and 1 (exclusive), scales it to the requested range, and truncates it to an integer, so the value returned is always between the specified min and max values, inclusive. I've been using this method in one stored procedure that's called a couple of hundred times per second, and it seems to perform pretty well.
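One caveat: RAND() without a seed is evaluated once per query, so every row in a multi-row SELECT gets the same value. If you need a different number on each row, a common alternative is to derive it from NEWID(). This is just a sketch of that idea; sys.all_objects is only used here as a convenient source of rows.

```sql
DECLARE @maxRandomValue INT = 100
      , @minRandomValue INT = 0;

/* NEWID() is evaluated per row; the bitwise AND clears the sign bit so the
   modulo result is never negative */
SELECT TOP (5)
      o.name
    , (CHECKSUM(NEWID()) & 2147483647)
        % (@maxRandomValue - @minRandomValue + 1)
        + @minRandomValue AS randomNumber
FROM sys.all_objects AS o;
```

The modulo introduces a very slight bias toward lower values, which is negligible for small ranges like this one.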
What method do YOU use to generate a random number? Is it faster than this method?
Webcast Tomorrow!
I'm excited to be doing a webcast tomorrow with the illustrious Brent Ozar for Quest's Pain-of-the-Week. The title is "Getting Started with SQL Server Management Studio," and as you've probably gathered, it's pretty entry-level stuff. If you read my blog, then chances are you don't need to watch this webcast. But if you know anyone who's trying to learn SQL Server or is trying to make the upgrade from 2000 to 2005/2008, this may be a good webcast for them.
I've also got a few other speaking engagements coming up:
June 2nd: Cedar Valley .NET User Group
I'll be reprising my Iowa Code Camp presentation on "SQL Server for the .NET Developer" for CVINETA. This presentation focuses on what you need to know about good table design, indexing strategies, and fragmentation... you know, what you wish every .NET developer knew about SQL Server.
June 11th: PoTW: Time-Saving SQL Server Management Studio Tips & Tricks
I'll also be doing this webcast with @BrentO as a follow-up to our webcast tomorrow. It will focus on how to save time and improve your sanity by using some neat little tricks in SSMS 2008.
Performance Considerations of Data Types
I've just finished my first real content for the PASS Performance SIG. I decided to write on "Performance Considerations of Data Types," as I think this is one of the easiest and most overlooked topics in performance tuning. Here's a summary:
Selecting inappropriate data types, especially on large tables with millions or billions of rows, can have significant performance implications. In this article, I’ll explain why and offer suggestions on how to select the most appropriate data type for your needs. The primary focus will be on common data types in SQL Server 2005 and 2008, but I’ll also discuss some aspects of clustered indexes and column properties. Most importantly, I’ll show some examples of common data-type misuse.
If you're interested in this content, you can find it here: Performance Considerations of Data Types.
Special thanks to Paul Randal and Paul Nielsen for providing me with technical reviews and great feedback. You guys are awesome!
Thanks also to Mladen Prajdic and Jeremiah Peschka for their great input. You guys are awesome, too!
Generate Columns for Update Statements
I'm not a fan of most CRUD generators. The formatting doesn't match my style, and I usually spend about as much time modifying the generated code as I would spend just writing it from scratch. But there have been times when I've considered using CRUD generators, mainly when I'm writing updates on wide tables. If you've never written an update for a table with many columns, it's not sexy. You're wasting valuable time on a tedious task that you could instead spend reading SQL Server 2008 Internals or chewing the cud with the SQL Twitterati.
Fortunately, Dave Carlile shared another tip with me that helps with this and has made its way into my little bag of tricks.
Let's assume you have the following outline:
UPDATE sales
SET ['insert really long column list']
FROM Sales.vStoreWithDemographics AS sales
Join myTempTable AS mtt
    ON sales.someColumn = mtt.someColumn;
You could use the following code to generate a list of columns for you:
SELECT name + ' = sales.' + name + ','
FROM sys.columns
WHERE OBJECT_ID = OBJECT_ID('Sales.vStoreWithDemographics')
ORDER BY column_id;
Just replace [Sales.vStoreWithDemographics] with a table of your choice, and replace "sales." with the appropriate alias. This will return a list of nicely formatted columns for you. Best of all, no potential for column typos! Just don't forget to remove the very last comma; otherwise, you'll get a syntax error.
CustomerID = sales.CustomerID,
Name = sales.Name,
ContactType = sales.ContactType,
(etc.)
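If you happen to be on SQL Server 2017 or later, the trailing comma can be avoided entirely with STRING_AGG. This is a sketch rather than part of Dave's tip: it assumes the source alias is "mtt" (as in the outline above) and adds QUOTENAME so that column names with spaces or reserved words still produce valid syntax.

```sql
/* Assumes SQL Server 2017+ (STRING_AGG); 'mtt' is the assumed source alias */
SELECT STRING_AGG(
           CAST(QUOTENAME(c.name) + N' = mtt.' + QUOTENAME(c.name) AS NVARCHAR(MAX))
         , N',' + CHAR(13) + CHAR(10)
       ) WITHIN GROUP (ORDER BY c.column_id) AS columnList
FROM sys.columns AS c
WHERE c.[object_id] = OBJECT_ID(N'Sales.vStoreWithDemographics');
```

The result is a single value with one assignment per line and no comma after the last column, ready to paste into the SET clause.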
I know, nothing earth shattering, but definitely one of those "huh, why didn't I think of that?" moments. So, thanks, Dave!
Source: http://sqlfool.com/2009/03/generate-columns-for-update-statements