SQL Fool Adventures in SQL Tuning – a blog for the rest of us

2Nov/111

Are You Approaching Your Partition Range Limits?

In my post last week, How To Estimate Data Utilization, I said that it may be my last post for a while. Well... apparently I lied. :)

For those of you who use table partitioning, you know that you need to define a partitioning scheme and function prior to applying partitioning to an index. Personally, I tend to build the function for a couple of years out, and I tend to create them through the end of a calendar year. Now, if I failed to expand a partition range at the end of the year, then come January 1st, all of my data would be written to the same partition. Not the end of the world, no, but it causes all kinds of nasty performance and maintenance issues. Thus, as part of my end-of-year / maternity-leave preparations, I'm in the process of examining all partitioned functions to identify those that need to have their partition ranges expanded. For those interested, here's the script I used:

IF OBJECT_ID('tempdb..#Results') IS NOT NULL
    DROP TABLE #Results;
 
CREATE TABLE #Results
(
      databaseName  NVARCHAR(128)
    , schemaName    NVARCHAR(128)
    , functionName  NVARCHAR(128)
    , data_space_id INT
    , maxRangeValue SQL_VARIANT
)
 
/* Grab results for each database and store in our temp table.  
   And no, I don't *need* to select from sys.indexes and perform 
   left joins, but I'm overly cautious and want to make sure 
   I'm not accidentally missing any databases. :) */
 
--EXECUTE master.dbo.sp_msforeachdb
EXECUTE sp_foreachdb 'USE ?;
INSERT INTO #Results
SELECT DB_NAME() AS databaseName
    , sps.name AS schemaName
    , spf.name AS functionName
    , sps.data_space_id 
    , MAX(prv.value) AS maxRangeValue
FROM sys.indexes AS i
LEFT JOIN sys.partition_schemes AS sps WITH (NOLOCK)
    ON i.data_space_id = sps.data_space_id
LEFT JOIN sys.partition_functions AS spf WITH (NOLOCK)
    ON sps.function_id = spf.function_id
LEFT JOIN sys.partition_range_values AS prv WITH (NOLOCK)
    ON spf.function_id = prv.function_id
GROUP BY sps.name
    , spf.name
    , sps.data_space_id;';
/*  
    sp_foreachdb was written by SQL MVP Aaron Bertrand and can be downloaded 
    at http://www.mssqltips.com/sqlservertip/2201/making-a-more-reliable-and-flexible-spmsforeachdb/
    Alternatively, you can also use sys.sp_MSforeachdb
*/
 
/* Make sure we're not missing any major databases */
SELECT * FROM sys.databases WHERE name NOT IN (SELECT databaseName FROM #Results);
 
/* Retrieve our results */
SELECT * 
FROM #Results
WHERE schemaName IS NOT NULL
ORDER BY maxRangeValue;

Example Results:

databaseName        schemaName                      functionName                          data_space_id   maxRangeValue
------------------- ------------------------------- ------------------------------------- --------------- -------------------------
HistoricalMart      dailyRangeDate_ps               dailyRangeDate_pf                     65609           2011-12-31 00:00:00.000
AdventureWorks      yearlyRangeSmallDateTime_ps     yearlyRangeSmallDateTime_pf           65605           2012-01-01 00:00:00.000
dbaTools            monthlyRangeDateTime_ps         monthlyRangeDateTime_pf               65604           2012-12-01 00:00:00.000
28Oct/112

How To Estimate Data Utilization

Recently, on a conference call presenting data growth rates and database capacity projections, I had a top-line executive ask, "But how much of that data are we actually using today?" The question was met with silence; unless you have rigorous auditing in place -- and kudos to you if you do -- it's a difficult question to answer. But it begs the question, is there some way to gleam this information from SQL Server? I think the answer is "yes," if you make some assumptions and understand what you're looking at.

SQL Server collects stats about every time an index is used and how it is used (i.e. whether a user seeked or scanned the index, etc.). It also provides a DMV to view these stats: sys.dm_db_index_usage_stats.

This DMV provides a wealth of great information, but to answer our question of "What data is actually being used?", we have to refine our criteria. Are we talking in terms of table counts or data size? I'd argue that data size is more important than table counts; one unqueried millow-row table is more wasteful than a hundred ten-row tables.

Also, are we looking at indexes or content? From a database perspective, I'm more interested in indexes: how much space are we wasting on unused indexes? To identify this, I need to look at the activity on each individual index.

From a business perspective, I would be more interested in content (i.e. tables): how much business information is being stored that no one is even looking at? To answer this question, I need to roll up all index usage to see if *any* of the indexes on a table were used. Since both were of interest to me, I decided to write queries to answer both questions.

Lastly, we need to understand the flaws with this data. Chiefly, I cannot tell whether a user requested one row from a million-row table, or if [s]he needed all of the data in the table. This is a pretty important issue, especially with large historical data stores, and it's where I have to make the biggest assumption: if even one person looked at one row in the table, I count all pages in the table as having been accessed.

Now, you may make different decisions than I did above, and that's fine... each environment and project has different needs. But these assumptions are very important to understanding the output of the query below:

USE master;
GO
 
/* 
    This will give you an approximation of how much data is being utilized on a server.
    Since the data is only valid as of the last server reboot, we should start off with
    an idea of how much data we've accrued.  
*/
 
/* Find out when the server was last rebooted */
-- 2008
SELECT sqlserver_start_time FROM sys.dm_os_sys_info;
-- 2005
SELECT create_date FROM sys.databases WHERE name = 'tempdb';
 
 
/* Create a temporary table to hold our data, since we're going to iterate through databases */
IF OBJECT_ID('tempdb..#Results') IS NOT NULL
    DROP TABLE #Results;
 
CREATE TABLE #Results
(
      databaseName  NVARCHAR(128)
    , tableName     NVARCHAR(128)
    , indexID       INT
    , records       BIGINT
    , activity      BIGINT
    , totalPages    BIGINT
);
 
/*  
    sp_foreachdb was written by SQL MVP Aaron Bertrand and can be downloaded 
    at http://www.mssqltips.com/sqlservertip/2201/making-a-more-reliable-and-flexible-spmsforeachdb/
    Alternatively, you can also use sys.sp_MSforeachdb
*/
--EXECUTE master.dbo.sp_foreachdb
EXECUTE sys.sp_MSforeachdb
'   USE ?; 
 
    -- You can gleam a lot of information about historical data usage from partitions
    -- but for now, we will just roll up any partitions we may have
    WITH myCTE AS
    (
        SELECT p.[object_id] AS objectID
            , p.index_id
            , SUM(p.[rows]) AS records
            , SUM(au.total_pages) AS totalPages
        FROM sys.partitions AS p WITH (NOLOCK)
        JOIN sys.allocation_units AS au WITH (NOLOCK)
            ON p.hobt_id = au.container_id
        GROUP BY p.[object_id] 
            , p.index_id
    )
 
    -- Grab all tables and join to our usage stats DMV
    INSERT INTO #Results
    SELECT DB_NAME() AS databaseName
        , t.name
        , x.index_id
        , MAX(x.records) AS records
        , ISNULL(SUM(us.user_lookups + us.user_scans + us.user_seeks), 0) AS activity
        , SUM(x.totalPages) AS totalPages
    FROM sys.tables AS t WITH (NOLOCK)
    JOIN myCTE AS x
        ON t.[object_id] = x.objectID
    LEFT JOIN sys.dm_db_index_usage_stats AS us WITH (NOLOCK)
        ON us.[object_id] = x.objectID
        AND us.index_id = x.index_id
        AND us.database_id = DB_ID()
    GROUP BY t.name
    , x.index_id;'
 
/* Because we're looping through databases, make sure we're not missing any major ones */
SELECT * FROM sys.databases WHERE name NOT IN (SELECT databaseName FROM #Results);
 
/* Retrieve actual % data utilization, which is performed at the index level */
SELECT databaseName
    , SUM(queriedPages) AS TotalQueriedPages
    , SUM(totalPages) AS TotalPages
    , CAST(SUM(queriedPages) AS FLOAT) / REPLACE(SUM(totalPages), 0, 1) AS '%DataUtil'
FROM (
    SELECT databaseName
        , tableName
        , indexID
        , CASE -- If we have any activity at all on an index, count it as activity
            WHEN activity = 0 THEN 0.0
            ELSE totalPages
          END AS queriedPages
        , totalPages
    FROM #Results
    WHERE databaseName NOT IN ('master', 'tempdb', 'msdb', 'model')
) x
GROUP BY databaseName
ORDER BY databaseName;
 
/* Retrieve % content utilization, which is performed at the table level */
SELECT databaseName
    , SUM(queriedPages) AS TotalQueriedPages
    , SUM(totalPages) AS TotalPages
    , CAST(SUM(queriedPages) AS FLOAT) / REPLACE(SUM(totalPages), 0, 1) AS '%ContentUtil'
FROM (
    SELECT databaseName
        , tableName
        , MAX(records) AS records
        , CASE WHEN SUM(activity) > 0 THEN SUM(totalPages) ELSE 0 END AS queriedPages
        , SUM(totalPages) AS totalPages
    FROM #Results
    WHERE databaseName NOT IN ('master', 'tempdb', 'msdb', 'model')
    GROUP BY databaseName
        , tableName
) x
GROUP BY databaseName
ORDER BY databaseName;

Results:

databaseName               TotalQueriedPages   TotalPages           %DataUtil
-------------------------- ------------------- -------------------- ----------------------
Database1 		   127618701           130607247            0.969619893356378
Database2 		   567188              1614958              0.351209133612143
Database3 		   34269036            34579469             0.991022620966216
Database4 		   137970594           170733391            0.803399928206158
Database5 		   74632930            101543575            0.66909214627557
Database6 		   55809933            72884205             0.765734157938039
Database7 		   560810026           620609815            0.902175272517656
 
databaseName               TotalQueriedPages   TotalPages           %ContentUtil
-------------------------- ------------------- -------------------- ----------------------
Database1 		   127763715           130607247            0.970721679051682
Database2 		   571125              1614958              0.353646967908763
Database3 		   34269036            34579469             0.991022620966216
Database4 		   137970921           170733391            0.803399928206158
Database5 		   96144726            101543575            0.861947682777784
Database6 		   72269666            72884205             0.991568146820268
Database7 		   620525938           620609815            0.998240279711804

The first result set examines the utilization of indexes, and the second result set examines the utilization of data at the content (table) level. For example, if we look at Database6, we'll see that we are only utilizing 77% of our indexes, but we're looking at 99% of our table data. So this is a good indicator that we have some unused indexes to clean up in that database.

Know a better way to answer this question using SQL Server DMV's? Please leave me a comment so I can learn from your experience. :)

In unrelated news, this may be my last blog post for a little while. I'm due with my second child a week from today and expect all of my free time to be consumed by him for a little while. That and, quite frankly, I do not trust myself near a computer, especially a database, in such a sleep-deprived state. :)

13Jul/115

Calculate Rows Inserted per Second for All Tables

Ever needed to calculate the number of rows inserted every second, for every table in every database on a server? Or, have you ever needed to validate that all processes have stopped writing to tables? These types of questions come up routinely for me. To help with this, I've written the following script, which examines metadata values using sys.partitions. This method isn't as accurate as running SELECT COUNT(*) FROM, but it's much faster. Keep in mind, since it's just looking at row counts, it's not much help on tables that have a lot of update/delete activity. But it does what I need it to do, and I use it pretty regularly, so I thought I'd share in case anyone else can benefit from it too. :)

/* Declare Parameters */
DECLARE @newBaseline BIT = 1 -- change to 0 when you don't want to replace the baseline, i.e. after initial run
  , @delay CHAR(8) = '00:00:30'; -- change as needed
 
IF @newBaseline = 1 
BEGIN
    IF OBJECT_ID('tempdb..#baseline') IS NOT NULL
        DROP TABLE #baseline;
 
    CREATE TABLE #baseline
    (
         database_name  SYSNAME
       , table_name     SYSNAME
       , table_rows     BIGINT
       , captureTime    DATETIME NULL
    );
END
 
IF OBJECT_ID('tempdb..#current') IS NOT NULL
    DROP TABLE #current;
 
CREATE TABLE #current
(
     database_name  SYSNAME
   , table_name     SYSNAME
   , table_rows     BIGINT
   , captureTime    DATETIME NULL
);
 
IF @newBaseline = 1 
BEGIN
    EXECUTE sp_MSforeachdb 'USE ?; 
        INSERT INTO #baseline
        SELECT DB_NAME()
            , o.name As [tableName]
            , SUM(p.[rows]) As [rowCnt]
            , GETDATE() As [captureTime]
        FROM sys.indexes As i
        JOIN sys.partitions As p
            ON i.[object_id] = p.[object_id]
           AND i.index_id  = p.index_id
        JOIN sys.objects As o
            ON i.[object_id] = o.[object_id]
        WHERE i.[type] = 1
        GROUP BY o.name;'
 
    WAITFOR DELAY @delay;
END
 
EXECUTE sp_MSforeachdb 'USE ?; 
INSERT INTO #current
SELECT DB_NAME()
    , o.name As [tableName]
    , SUM(p.[rows]) As [rowCnt]
    , GETDATE() As [captureTime]
FROM sys.indexes As i
JOIN sys.partitions As p
    ON i.[object_id] = p.[object_id]
   AND i.index_id  = p.index_id
JOIN sys.objects As o
    ON i.[object_id] = o.[object_id]
WHERE i.[type] = 1
GROUP BY o.name;'
 
SELECT  c.*
      , c.table_rows - b.table_rows AS 'new_rows'
      , DATEDIFF(second, b.captureTime, c.captureTime) AS 'time_diff'
      , (c.table_rows - b.table_rows) / DATEDIFF(second, b.captureTime, c.captureTime) AS 'rows_per_sec'
FROM #baseline AS b
JOIN #current AS c
    ON b.table_name = c.table_name
   AND b.database_name = c.database_name
ORDER BY new_rows DESC;
10Jun/115

T-SQL Script for Estimating Compression Savings

A couple of weeks ago, I was working on a Microsoft PDW proof-of-concept (POC) and had to measure compression ratios. In order to do this, I fired up SSMS and wrote a little script. The script will iterate through all tables in a database and run the sp_estimate_data_compression_savings stored procedure. This will only work in SQL Server 2008+ versions running Enterprise edition.

If you're not familiar with this stored procedure, it basically will tell you what effect PAGE or ROW compression will have on your table/index/partition, etc. There are pro's and con's with compression. What I've tended to see is that compression has very positive results on space, IO, and query duration, with a negative impact on CPU and write speed. Like most things, it's a trade-off and the results will vary by environment, so I recommend you do some testing before you apply compression to all tables. I tend to use compression mostly for my historical tables and partitions and leave my recent data uncompressed. And, back to the script, I use this stored procedure to estimate the impact of compression and to determine whether to use PAGE or ROW compression. PAGE is a higher level of compression, which means it's also more expensive in terms of CPU, so if the difference between the two results is negligible, I'm more apt to just use ROW compression.

Now that my impromptu compression discussion is done, let's get to the actual script. One final word of caution, however. This is an IO intensive process, so you may want to run it after peak business hours.

SET NOCOUNT ON;
 
DECLARE @printOnly  BIT = 0 -- change to 1 if you don't want to execute, just print commands
    , @tableName    VARCHAR(256)
    , @schemaName   VARCHAR(100)
    , @sqlStatement NVARCHAR(1000)
    , @tableCount   INT
    , @statusMsg    VARCHAR(1000);
 
IF EXISTS(SELECT * FROM tempdb.sys.tables WHERE name LIKE '%#tables%')
    DROP TABLE #tables; 
 
CREATE TABLE #tables
(
      database_name     sysname
    , schemaName        sysname NULL
    , tableName         sysname NULL
    , processed         bit
);
 
IF EXISTS(SELECT * FROM tempdb.sys.tables WHERE name LIKE '%#compression%')
    DROP TABLE #compressionResults;
 
IF NOT EXISTS(SELECT * FROM tempdb.sys.tables WHERE name LIKE '%#compression%')
BEGIN 
 
    CREATE TABLE #compressionResults
    (
          objectName                    varchar(100)
        , schemaName                    varchar(50)
        , index_id                      int
        , partition_number              int
        , size_current_compression      bigint
        , size_requested_compression    bigint
        , sample_current_compression    bigint
        , sample_requested_compression  bigint
    );
 
END;
 
INSERT INTO #tables
SELECT DB_NAME()
    , SCHEMA_NAME([schema_id])
    , name
    , 0 -- unprocessed
FROM sys.tables;
 
SELECT @tableCount = COUNT(*) FROM #tables;
 
WHILE EXISTS(SELECT * FROM #tables WHERE processed = 0)
BEGIN
 
    SELECT TOP 1 @tableName = tableName
        , @schemaName = schemaName
    FROM #tables WHERE processed = 0;
 
    SELECT @statusMsg = 'Working on ' + CAST(((@tableCount - COUNT(*)) + 1) AS VARCHAR(10)) 
        + ' of ' + CAST(@tableCount AS VARCHAR(10))
    FROM #tables
    WHERE processed = 0;
 
    RAISERROR(@statusMsg, 0, 42) WITH NOWAIT;
 
    SET @sqlStatement = 'EXECUTE sp_estimate_data_compression_savings ''' 
                        + @schemaName + ''', ''' + @tableName + ''', NULL, NULL, ''PAGE'';' -- ROW, PAGE, or NONE
 
    IF @printOnly = 1
    BEGIN 
 
        SELECT @sqlStatement;
 
    END
    ELSE
    BEGIN
 
        INSERT INTO #compressionResults
        EXECUTE sp_executesql @sqlStatement;
 
    END;
 
    UPDATE #tables
    SET processed = 1
    WHERE tableName = @tableName
        AND schemaName = @schemaName;
 
END;
 
SELECT * 
FROM #compressionResults;
13Jan/115

Identity Columns: Are You Nearing The Limits?

I use identity columns frequently. After all, identity columns make great clustering keys. But it's important when using identity columns to check on the amount of values you have left before you reach the limit of your data type. An identity column has a fixed amount of values it can use based upon whether you specified tinyint, smallint, int, or bigint when you defined the column. If you reach this limit, your inserts will blow up and cause a Chernobyl-like SQL meltdown will begin to fail. I just finished an audit of my tables and thought I'd share the script. I would like to warn that this script is *not* perfect... namely, it doesn't handle negative integer values very elegantly. It also doesn't know if you started your seed at zero, approached your max positive limit, then reseeded to the negative limit (see my "quick and dirty fix" tip at the end of this article).

/* Define how close we are to the value limit
   before we start throwing up the red flag.
   The higher the value, the closer to the limit. */
Declare @threshold decimal(3,2) = .85;
 
/* Create a temp table */
Create Table #identityStatus
(
      database_name     varchar(128)
    , table_name        varchar(128)
    , column_name       varchar(128)
    , data_type         varchar(128)
    , last_value        bigint
    , max_value         bigint
);
 
/* Use an undocumented command to run a SQL statement
   in each database on a server */
Execute sp_msforeachdb '
    Use [?];
    Insert Into #identityStatus
    Select ''?'' As [database_name]
        , Object_Name(id.object_id, DB_ID(''?'')) As [table_name]
        , id.name As [column_name]
        , t.name As [data_type]
        , Cast(id.last_value As bigint) As [last_value]
        , Case 
            When t.name = ''tinyint''   Then 255 
            When t.name = ''smallint''  Then 32767 
            When t.name = ''int''       Then 2147483647 
            When t.name = ''bigint''    Then 9223372036854775807
          End As [max_value]
    From sys.identity_columns As id
    Join sys.types As t
        On id.system_type_id = t.system_type_id
    Where id.last_value Is Not Null';
 
/* Retrieve our results and format it all prettily */
Select database_name
    , table_name
    , column_name
    , data_type
    , last_value
    , Case 
        When last_value < 0 Then 100
        Else (1 - Cast(last_value As float(4)) / max_value) * 100 
      End As [percentLeft]
    , Case 
        When Cast(last_value As float(4)) / max_value >= @threshold
            Then 'warning: approaching max limit'
        Else 'okay'
        End As [id_status]
From #identityStatus
Order By percentLeft;
 
/* Clean up after ourselves */
Drop Table #identityStatus;

If you find yourself quickly approaching your max limit and need to implement a quick and dirty fix, you can reseed your identity column. Of course, this only works if you started at zero instead of the actual lower, negative limit.

20Dec/105

SQL Agent Job Script

This blog post is brought to you by the awesomeness that is the SQL Server Twitter community. Contributions were made by several awesome Tweeples, including Denny Cherry, Kendra Little, Ken Simmons, and Magnus Ahlkvist, among others.

What started this is something you've all probably run into at one time or another. We're currently having some resource contention on our server when a ton of processes kickoff and overlap in the morning. Now, I have a script that I've used in the past for monitoring SQL agent jobs, but this time I wanted to add some historical run-time information. I know the sysjobhistory table contains the information I need, but it has some... let's say, creative ways of storing the data. Opinions on the reasons why vary:



Regardless the reason, I needed to do some conversions. Denny shared with us the msdb.dbo.agent_datetime function to convert run_date and run_time into an actual datetime value. I have to say, this certainly cleans up the code quite a bit! Then Magnus shared a method to convert run_duration into seconds, which he modified from a script on SQLServerCentral. I was able to use these two tidbits to update my old script. You can now run this script to get back a list of all enabled procs, a script that will disable them, a breakdown of the schedule, and a historical run-time average.

/*  How many days do you want to include in your run-time average?
    Recent values tend to be more useful. */
Declare @daysToAverage smallint = 30;
 
Declare @avgRunTime Table
(
      job_id      uniqueidentifier
    , avgRunTime  int
);
 
/* We need to parse the schedule into something we can understand */
Declare @weekDay Table (
      mask       int
    , maskValue  varchar(32)
);
 
Insert Into @weekDay
Select 1, 'Sunday'  Union All
Select 2, 'Monday'  Union All
Select 4, 'Tuesday'  Union All
Select 8, 'Wednesday'  Union All
Select 16, 'Thursday'  Union All
Select 32, 'Friday'  Union All
Select 64, 'Saturday';
 
/* First, let's get our run-time average */
Insert Into @avgRunTime
Select job_id
    , Avg((run_duration/10000) * 3600 + (run_duration/100%100)*60 + run_duration%100) As 'avgRunTime' /* convert HHMMSS to seconds */
From msdb.dbo.sysjobhistory
Where step_id = 0 -- only grab our total run-time
    And run_status = 1 -- only grab successful executions
    And msdb.dbo.agent_datetime(run_date, run_time) >= DateAdd(day, -@daysToAverage, GetDate())
Group By job_id;
 
/* Now let's get our schedule information */
With myCTE
As(
    Select sched.name As 'scheduleName'
        , sched.schedule_id
        , jobsched.job_id
        , Case When sched.freq_type = 1 Then 'Once' 
            When sched.freq_type = 4 
                And sched.freq_interval = 1 
                    Then 'Daily'
            When sched.freq_type = 4 
                Then 'Every ' + Cast(sched.freq_interval As varchar(5)) + ' days'
            When sched.freq_type = 8 Then 
                Replace( Replace( Replace(( 
                    Select maskValue 
                    From @weekDay As x 
                    Where sched.freq_interval & x.mask <> 0 
                    Order By mask For XML Raw)
                , '"/><row maskValue="', ', '), '<row maskValue="', ''), '"/>', '') 
                + Case When sched.freq_recurrence_factor <> 0 
                        And sched.freq_recurrence_factor = 1 
                            Then '; weekly' 
                    When sched.freq_recurrence_factor <> 0 Then '; every ' 
                + Cast(sched.freq_recurrence_factor As varchar(10)) + ' weeks' End
            When sched.freq_type = 16 Then 'On day ' 
                + Cast(sched.freq_interval As varchar(10)) + ' of every '
                + Cast(sched.freq_recurrence_factor As varchar(10)) + ' months' 
            When sched.freq_type = 32 Then 
                Case When sched.freq_relative_interval = 1 Then 'First'
                    When sched.freq_relative_interval = 2 Then 'Second'
                    When sched.freq_relative_interval = 4 Then 'Third'
                    When sched.freq_relative_interval = 8 Then 'Fourth'
                    When sched.freq_relative_interval = 16 Then 'Last'
                End + 
                Case When sched.freq_interval = 1 Then ' Sunday'
                    When sched.freq_interval = 2 Then ' Monday'
                    When sched.freq_interval = 3 Then ' Tuesday'
                    When sched.freq_interval = 4 Then ' Wednesday'
                    When sched.freq_interval = 5 Then ' Thursday'
                    When sched.freq_interval = 6 Then ' Friday'
                    When sched.freq_interval = 7 Then ' Saturday'
                    When sched.freq_interval = 8 Then ' Day'
                    When sched.freq_interval = 9 Then ' Weekday'
                    When sched.freq_interval = 10 Then ' Weekend'
                End
                + Case When sched.freq_recurrence_factor <> 0 
                        And sched.freq_recurrence_factor = 1 Then '; monthly'
                    When sched.freq_recurrence_factor <> 0 Then '; every ' 
                + Cast(sched.freq_recurrence_factor As varchar(10)) + ' months' End
            When sched.freq_type = 64 Then 'StartUp'
            When sched.freq_type = 128 Then 'Idle'
          End As 'frequency'
        , IsNull('Every ' + Cast(sched.freq_subday_interval As varchar(10)) + 
            Case When sched.freq_subday_type = 2 Then ' seconds'
                When sched.freq_subday_type = 4 Then ' minutes'
                When sched.freq_subday_type = 8 Then ' hours'
            End, 'Once') As 'subFrequency'
        , Replicate('0', 6 - Len(sched.active_start_time)) 
            + Cast(sched.active_start_time As varchar(6)) As 'startTime'
        , Replicate('0', 6 - Len(sched.active_end_time)) 
            + Cast(sched.active_end_time As varchar(6)) As 'endTime'
        , Replicate('0', 6 - Len(jobsched.next_run_time)) 
            + Cast(jobsched.next_run_time As varchar(6)) As 'nextRunTime'
        , Cast(jobsched.next_run_date As char(8)) As 'nextRunDate'
    From msdb.dbo.sysschedules As sched
    Join msdb.dbo.sysjobschedules As jobsched
        On sched.schedule_id = jobsched.schedule_id
    Where sched.enabled = 1
)
 
/* Finally, let's look at our actual jobs and tie it all together */
Select job.name As 'jobName'
    , sched.scheduleName
    , sched.frequency
    , sched.subFrequency
    , SubString(sched.startTime, 1, 2) + ':' 
        + SubString(sched.startTime, 3, 2) + ' - ' 
        + SubString(sched.endTime, 1, 2) + ':' 
        + SubString(sched.endTime, 3, 2) 
        As 'scheduleTime' -- HH:MM
    , SubString(sched.nextRunDate, 1, 4) + '/' 
        + SubString(sched.nextRunDate, 5, 2) + '/' 
        + SubString(sched.nextRunDate, 7, 2) + ' ' 
        + SubString(sched.nextRunTime, 1, 2) + ':' 
        + SubString(sched.nextRunTime, 3, 2) As 'nextRunDate'
      /* Note: the sysjobschedules table refreshes every 20 min, 
        so nextRunDate may be out of date */
    , 'Execute msdb.dbo.sp_update_job @job_id = ''' 
        + Cast(job.job_id As char(36)) + ''', @enabled = 0;' As 'disableScript'
    , art.avgRunTime As 'avgRunTime_inSec' -- in seconds
    , (art.avgRunTime / 60) As 'avgRunTime_inMin' -- convert to minutes
From msdb.dbo.sysjobs As job
Join myCTE As sched
    On job.job_id = sched.job_id
Left Join @avgRunTime As art
    On job.job_id = art.job_id
Where job.enabled = 1 -- do not display disabled jobs
Order By nextRunDate;

If this doesn't do exactly what you want, check out SQLJobVis, which Ken recommended. It's a free tool that helps visualize the job history of SQL jobs.

28Oct/101

Metadata for Table Valued Parameters

Table-valued parameters (TVP) are a great feature that was new in SQL Server 2008 that allow you to insert a dataset into a table. Previously, the most common way of doing this was by passing and parsing XML. As I've previously posted, TVP's perform an astounding 94% faster than singleton inserts and 75% faster than XML inserts. But for some reason, TVP's still aren't widely used and understood. In this post, I'll walk you through how to use these and how to query the metadata for TVP's.

I've previously posted about what TVP's are and how to use them. But in honor of Halloween this week, I've updated my demo script:

/* Create some tables to work with */
CREATE TABLE dbo.orders
(
      order_id      INT IDENTITY(1,1)   Not Null
    , orderDate     DATE                Not Null
    , customer      VARCHAR(20)         Not Null
 
    CONSTRAINT PK_orders
        PRIMARY KEY CLUSTERED(order_id)
);
 
CREATE TABLE dbo.orderDetails
(
      orderDetail_id    INT IDENTITY(1,1)   Not Null
    , order_id          INT                 Not Null
    , lineItem          INT                 Not Null
    , product           VARCHAR(20)         Not Null
 
    CONSTRAINT PK_orderDetails
        PRIMARY KEY CLUSTERED(orderDetail_id)
 
    CONSTRAINT FK_orderDetails_orderID
        FOREIGN KEY(order_id)
        REFERENCES dbo.orders(order_id)
);
 
 
/* Create our new table types */
CREATE TYPE dbo.orderTable AS TABLE 
( 
      orderDate     DATE
    , customer      VARCHAR(20)
);
GO
 
CREATE TYPE dbo.orderDetailTable AS TABLE 
( 
      lineItem      INT
    , product       VARCHAR(20)
);
GO
 
 
/* Create a new procedure using a table-valued parameter */
CREATE PROCEDURE dbo.insert_orderTVP_sp
      @myOrderTable orderTable READONLY
    , @myOrderDetailTable orderDetailTable READONLY
AS
BEGIN
 
    SET NOCOUNT ON;
 
    DECLARE @myOrderID INT;
 
    INSERT INTO dbo.orders
    SELECT orderDate
        , customer
    FROM @myOrderTable;
 
    SET @myOrderID = SCOPE_IDENTITY();
 
    INSERT INTO dbo.orderDetails
    SELECT @myOrderID
        , lineItem
        , product
    FROM @myOrderDetailTable;
 
    SET NOCOUNT OFF;
 
END
GO
 
 
/* Call our new proc! */
DECLARE @myTableHeaderData AS orderTable
    , @myTableDetailData AS orderDetailTable;
 
INSERT INTO @myTableHeaderData
(orderDate, customer)
VALUES (GETDATE(), 'Zombie');
 
INSERT INTO @myTableDetailData
(lineItem, product)
SELECT 10, 'Brains' UNION ALL
SELECT 20, 'More Brains';
 
EXECUTE dbo.insert_orderTVP_sp 
      @myTableHeaderData
    , @myTableDetailData;
 
DELETE FROM @myTableHeaderData;
DELETE FROM @myTableDetailData;
 
INSERT INTO @myTableHeaderData
(orderDate, customer)
VALUES (GETDATE(), 'Vampire');
 
INSERT INTO @myTableDetailData
(lineItem, product)
SELECT 10, 'Blood Type O+' UNION ALL
SELECT 20, 'Blood Type B-' UNION ALL
SELECT 30, 'Blood Type AB+' UNION ALL
SELECT 40, 'Blood Type A+';
 
EXECUTE dbo.insert_orderTVP_sp 
      @myTableHeaderData
    , @myTableDetailData;
 
 
/* Check our data */
SELECT * FROM dbo.orders;
SELECT * FROM dbo.orderDetails;

Once you've run this, you should see the following data:

order_id    orderDate  customer
----------- ---------- --------------------
1           2010-10-28 Zombie
2           2010-10-28 Vampire
 
(2 row(s) affected)
 
orderDetail_id order_id    lineItem    product
-------------- ----------- ----------- --------------------
1              1           10          Brains
2              1           20          More Brains
3              2           10          Blood Type O+
4              2           20          Blood Type B-
5              2           30          Blood Type AB+
6              2           40          Blood Type A+
 
(6 row(s) affected)

Now that we've successfully created a couple of table types to support our TVP's, how do we go back and find out which objects we've created? You can query the sys.types catalog view to find out. Just search for system_type_id 243, which identifies the record as a table type.

/* Let's check out our new data types */
SELECT name
    , system_type_id
    , is_table_type
FROM sys.types
WHERE system_type_id = 243;
GO
name                 system_type_id is_table_type
-------------------- -------------- -------------
orderTable           243            1
orderDetailTable     243            1
 
(2 row(s) affected)

Even better yet, you can use the sys.table_types catalog view. This gives us the same information as sys.types but also gives us the type_table_object_id, which we'll need shortly.

SELECT name
    , system_type_id
    , is_table_type
    , type_table_object_id
FROM sys.table_types;
name                 system_type_id is_table_type type_table_object_id
-------------------- -------------- ------------- --------------------
orderTable           243            1             917578307
orderDetailTable     243            1             933578364
 
(2 row(s) affected)

What if you need to look up the table type definition? You can do this using the type_table_object_id and joining to sys.columns.

SELECT tt.name AS 'table_type_name'
    , c.name AS 'column_name'
    , t.name AS 'data_type'
FROM sys.table_types AS tt
JOIN sys.columns AS c
    ON tt.type_table_object_id = c.object_id
JOIN sys.types As t
    ON c.system_type_id = t.system_type_id;
table_type_name      column_name     data_type
-------------------- --------------- ---------------
orderTable           orderDate       date
orderDetailTable     lineItem        int
orderTable           customer        varchar
orderDetailTable     product         varchar
 
(4 row(s) affected)

And last, but certainly not least, how do we see if any procs are currently using the table types? SQL Server 2008 makes this easy for us with the sys.dm_sql_referencing_entities DMV.

SELECT referencing_schema_name, referencing_entity_name, referencing_id
FROM sys.dm_sql_referencing_entities ('dbo.orderTable', 'TYPE');
referencing_schema_name referencing_entity_name referencing_id
----------------------- ----------------------- --------------
dbo                     insert_orderTVP_sp      949578421
 
(1 row(s) affected)

If you're wondering how to implement SQL Server TVP's in your .NET code, well... I can't tell you how to do it, but I can point you to a place that can. Stephen Forte has a post that explains how easy it is to do.

So now that you have a better understanding of how to work with TVP's, why don't you go implement one in your environment and see how for yourself just how awesome it is? :)

Oh, and Happy Halloween!

14Sep/102

Rename Database Objects En Masse

Ever need to rename all objects in a database? Yeah, okay... it doesn't happen very often, but when it does, it can be time consuming. This recently came up as something I needed to do. When you consider all the tables involved and you add in defaults, indexes, and foreign keys, well... you can imagine how the number of objects adds up quickly. After doing a few tedious renames, it occurred to me that I could write a script to generate the rename scripts for me. Work smarter, not harder, right? For anyone interested, here's the script.

Select name
    , [object_id] 
    , Case 
        When [type] = 'U' Then 'Execute sp_rename N''' + name + ''', N''old_' + name + ''''
        When [type] IN ('D', 'PK', 'F') Then 
            'Execute sp_rename N''' + name + ''', N''old_' + name + ''', N''OBJECT'''
        End As 'renameScript'
    , Case When parent_object_id > 0 Then 0 Else 1 End As 'sortOrder'
From sys.objects
Where [type] In ('D', 'PK', 'U', 'F')
Union ALL
Select i.name
    , o.[object_id]
    , 'Execute sp_rename N''' + o.name + '.' + i.name + ''', N''old_' + i.name + ''', N''INDEX''' As 'renameScript'
    , 0 As 'sortOrder'
From sys.indexes As i
JOIN sys.objects As o
    On i.object_id = o.object_id
Where i.is_primary_key = 0 -- exclude PKs, we take care of that above
    AND i.type <> 0 -- exclude heaps
    AND o.type Not In ('S', 'IT') -- exclude system & internal tables
Order By sortOrder;

Be forewarned that I only tested this on a couple of databases, but it seemed to run without problem for tables, indexes, primary keys, defaults, and foreign keys. The sortOrder column is there only to ensure that table renames are performed last. Otherwise, your index renames would fail. This will only run on SQL Server 2005 or 2008. If you have any problems with the script, please let me know. :)

8Jun/1018

Check VLF Counts

Today I stumbled across a database with 87,302 VLF's. Yes, that's right... 87 THOUSAND. Most of our databases have a few dozen VLF's, but this was an old database that had grown to 1.5 TB and had the default autogrowth settings left in tact. How did we discover this? During a routine reboot of the server, this database took 30 minutes to recover, but there were no error messages or status messages in the log.

Now, this blog post is not about VLF's or why you should keep the number of VLF's to a small, manageable number -- although I hear under 50 is a good rule of thumb. No, the purpose of this blog post is to share a little script I wrote to check the number of VLF's each database uses:

Create Table #stage(
    FileID      int
  , FileSize    bigint
  , StartOffset bigint
  , FSeqNo      bigint
  , [Status]    bigint
  , Parity      bigint
  , CreateLSN   numeric(38)
);
 
Create Table #results(
    Database_Name   sysname
  , VLF_count       int 
);
 
Exec sp_msforeachdb N'Use ?; 
            Insert Into #stage 
            Exec sp_executeSQL N''DBCC LogInfo(?)''; 
 
            Insert Into #results 
            Select DB_Name(), Count(*) 
            From #stage; 
 
            Truncate Table #stage;'
 
Select * 
From #results
Order By VLF_count Desc;
 
Drop Table #stage;
Drop Table #results;

This script is low-impact and is safe to run on large, production databases during business hours. However, just be aware that it's using some undocumented commands.

For more information on VLF's, check out these excellent articles:

1Feb/106

Filtered Indexes Work-Around

Recently, I needed to create a stored procedure that queried a rather large table. The table has a filtered index on a date column, and it covers the query. However, the Query Optimizer was not using the index, which was increasing the execution time (not to mention IO!) by at least 10x. This wasn't the first time I've had the Optimizer fail to use a filtered index. Normally when this happens, I use a table hint to force the filtered index -- after I verify that it is indeed faster, of course. However, since this was a stored procedure, I was receiving the following error message whenever I tried to execute the proc:

Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.

SQL Server would not allow me to execute the stored procedure using the filtered index hint. If I removed the hint, it executed, but it used a different, non-covering and far more expensive index. For those of you not familiar with this issue, allow me to illustrate the problem.

First, create a table to play with and populate it with some bogus data:

Create Table dbo.filteredIndexTest
(
      myID   int Identity(1,3)
    , myDate smalldatetime 
    , myData char(100)
 
    Constraint PK_filteredIndexTest
        Primary Key Clustered(myID)
);
 
Set NoCount On;
Declare @date smalldatetime = '2010-01-01';
 
While @date < '2010-02-01'
Begin
 
    Insert Into dbo.filteredIndexTest
    (
          myDate
        , myData
    )
    Select @date
        , 'Date: ' + Convert(varchar(20), @date, 102);
 
    Set @date = DateAdd(minute, 1, @date);
 
End;
 
Select Count(*) From dbo.filteredIndexTest;

It looks like this will generate 44,640 rows of test data... plenty enough for our purposes. Now, let's create our filtered index and write a query that will use it:

Create NonClustered Index IX_filteredIndexTest_1
    On dbo.filteredIndexTest(myDate)
    Include (myData)
    Where myDate >= '2010-01-27';
 
Select Distinct myData
From dbo.filteredIndexTest
Where myDate >= '2010-01-28';

If you look at the execution plan for this query, you'll notice that the Optimizer is using the filtered index. Perfect! Now let's parameterize it.

Declare @myDate1 smalldatetime = '2010-01-28';
 
Select Distinct myData
From dbo.filteredIndexTest
Where myDate >= @myDate1;

Uh oh. Looking at the execution plan, we see that SQL Server is no longer using the filtered index. Instead, it's scanning the clustered index! Why is this? There's actually a good explanation for it. The reason is that I could, in theory, pass a date to my parameter that fell outside of the filtered date range. If that's the case, then SQL Server could not utilize the filtered index. Personally, I think it's a bug and SQL Server should identify whether or not a filtered index could be used based on the actual value submitted, but... that's a whole other blog post. :)

So what can we do? Well, dynamic SQL may be able to help us out in this case. Let's give it a go. First, let's try parameterized dynamic SQL.

Declare @mySQL1 nvarchar(2000)
    , @myParam nvarchar(2000) = '@p_myDate2 smalldatetime'
    , @myDate2 smalldatetime = '2010-01-28';
 
Set @mySQL1 = 'Select Distinct myData
              From dbo.filteredIndexTest
              Where myDate >= @p_myDate2';
 
Execute sp_executeSQL @mySQL1, @myParam, @p_myDate2 = @myDate2;

Looking at the execution plan, we see we're still scanning on the clustered index. This is because the parameterized dynamic SQL resolves to be the exact same query as the one above it. Let's try unparameterized SQL instead:

Declare @mySQL2 nvarchar(2000)
    , @myDate3 smalldatetime = '2010-01-28';
 
Set @mySQL2 = 'Select Distinct myData
              From dbo.filteredIndexTest
              Where myDate >= ''' + Cast(@myDate3 As varchar(20)) + '''';
 
Execute sp_executeSQL @mySQL2;
 
-- Drop Table dbo.filteredIndexTest;

Voila! We have a seek on our filtered index. Why? Because the statement resolves to be identical to our first query, where we hard-coded the date value in the WHERE clause.

Now, I want to stress this fact: you should always, ALWAYS use parameterized dynamic SQL whenever possible. Not only is it safer, but it's also faster, because it can reuse cached plans. But sometimes you just cannot accomplish the same tasks with it. This is one of those times. If you do end up needing to use unparameterized dynamic SQL as a work-around, please make sure you're validating your input, especially if you're interfacing with any sort of external source.

There's an even easier work-around for this problem that Dave (http://www.crappycoding.com) shared with me: recompile.

Adding "Option (Recompile)" to the end of your statements will force the Optimizer to re-evaluate which index will best meet the needs of your query every time the statement is executed. More importantly, it evaluates the plan based on the actual values passed to the parameter... just like in our hard-coded and dynamic SQL examples. Let's see it in action:

DECLARE @myDate4 SMALLDATETIME = '2010-01-28';
 
SELECT DISTINCT myData
FROM dbo.filteredIndexTest 
WHERE myDate >= @myDate4
OPTION (RECOMPILE);
 
DECLARE @myDate5 SMALLDATETIME = '2010-01-20';
 
SELECT DISTINCT myData
FROM dbo.filteredIndexTest 
WHERE myDate >= @myDate5
OPTION (RECOMPILE);

If we look at the execution plans for the 2 queries above, we see that the first query seeks on the filtered index, and the second query scans on the clustered index. This is because the second query cannot be satisfied with the filtered index because we initially limited our index to dates greater than or equal to 1/27/2010.

There are, of course, trade-offs associated with each approach, so use whichever one best meets your needs. Do you have another work-around for this issue? If so, please let me know. :)

Update:

Alex Kuznetsov (http://www.simple-talk.com/author/alex-kuznetsov/) shared this method too:

DECLARE @myDate1 SMALLDATETIME = '2010-01-28';
 
SELECT DISTINCT myData
FROM dbo.filteredIndexTest
WHERE myDate = @myDate1
AND myDate >= '2010-01-27';

Like the other examples, this will result in an index seek on the filtered index. Basically, by explicitly declaring the start date of your filter, you're letting the Optimizer know that the filtered index can satisfy the request, regardless of the parameter value passed. Thanks for the tip, Alex! :)