SQL Agent Job Script

I’ve been working on a script for a demo in my upcoming 24 Hours of PASS presentation, which I thought I’d share on my blog. In the presentation, I use this script to demonstrate how to execute one or more queries dynamically against a list of SQL Server instances.

The script itself explores SQL Agent job metadata to get job statuses — when the job last ran, when it will run again, an aggregate count of successful and failed executions in the queried time period, T-SQL code to disable the job, etc. I’ve only tested this in a handful of DEV and PROD environments, so please let me know if you run into any issues in your environment. As with most of my scripts, this will only work on SQL Server 2005 and newer.

I was planning to send the SQL Server development team a case of #bacon if they had finally fixed the sysjob% date and time columns in SQL Server 2012, but alas, it seems they shall sadly remain pork-free. Credit for the logic that handles the time conversion goes to an awesome community member who posted it on the MSDN forums; I wish I had the URL, but it was a long time ago.
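
To make the math a little easier to follow, here’s a standalone sketch of the same run_date/run_time conversion on a made-up value pair (run_time is stored as an HHMMSS integer, run_date as YYYYMMDD):

/* Standalone sketch of the run_date/run_time conversion used in the script below.
   The sample values are hypothetical: 2012-09-17 at 2:00:01 PM. */
DECLARE @run_date INT
    , @run_time INT;

SELECT @run_date = 20120917     -- hypothetical run_date (yyyymmdd)
    , @run_time = 140001;       -- hypothetical run_time (hhmmss)

SELECT CONVERT(DATETIME, RTRIM(@run_date))      -- date portion
    + ((@run_time / 10000 * 3600)               -- hours converted to seconds
    + ((@run_time % 10000) / 100 * 60)          -- minutes converted to seconds
    + (@run_time % 10000) % 100)                -- seconds
    / (86399.9964) AS execution_start;          -- total seconds as a fraction of a day

With that out of the way, on to the script itself: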

DECLARE @jobHistory TABLE
(
      job_id                UNIQUEIDENTIFIER
    , success               INT
    , cancel                INT
    , fail                  INT
    , retry                 INT
    , last_execution_id     INT
    , last_duration         CHAR(8)
    , last_execution_start  DATETIME
);
 
WITH lastExecution
AS
(
    SELECT job_id
    , MAX(instance_id) AS last_instance_id
    FROM msdb.dbo.sysjobhistory
    WHERE step_id = 0
    GROUP BY job_id
)
 
INSERT INTO @jobHistory
SELECT sjh.job_id
    , SUM(CASE WHEN sjh.run_status = 1 AND step_id = 0 THEN 1 ELSE 0 END) AS success
    , SUM(CASE WHEN sjh.run_status = 3 AND step_id = 0 THEN 1 ELSE 0 END) AS cancel
    , SUM(CASE WHEN sjh.run_status = 0 AND step_id = 0 THEN 1 ELSE 0 END) AS fail
    , SUM(CASE WHEN sjh.run_status = 2 THEN 1 ELSE 0 END) AS retry
    , MAX(CASE WHEN sjh.step_id = 0 THEN instance_id ELSE NULL END) last_execution_id
    , SUBSTRING(CAST(MAX(CASE WHEN le.job_id IS NOT NULL THEN sjh.run_duration ELSE NULL END) + 1000000 AS VARCHAR(7)),2,2) + ':' 
            + SUBSTRING(CAST(MAX(CASE WHEN le.job_id IS NOT NULL THEN sjh.run_duration ELSE NULL END) + 1000000 AS VARCHAR(7)),4,2) + ':' 
            + SUBSTRING(CAST(MAX(CASE WHEN le.job_id IS NOT NULL THEN sjh.run_duration ELSE NULL END) + 1000000 AS VARCHAR(7)),6,2)
            AS last_duration
    , MAX(CASE WHEN le.last_instance_id IS NOT NULL THEN 
        CONVERT(datetime, RTRIM(run_date)) 
        + ((run_time / 10000 *  3600) 
        + ((run_time % 10000) / 100 * 60) 
        + (run_time  % 10000) % 100) / (86399.9964)
      ELSE '1900-01-01' END) AS last_execution_start 
FROM msdb.dbo.sysjobhistory AS sjh
LEFT JOIN lastExecution     AS le
    ON sjh.job_id = le.job_id
   AND sjh.instance_id = le.last_instance_id
GROUP BY sjh.job_id;
 
/* We need to parse the schedule into something we can understand */
DECLARE @weekDay TABLE (
      mask          INT
    , maskValue     VARCHAR(32)
);
 
INSERT INTO @weekDay
SELECT 1, 'Sunday'      UNION ALL
SELECT 2, 'Monday'      UNION ALL
SELECT 4, 'Tuesday'     UNION ALL
SELECT 8, 'Wednesday'   UNION ALL
SELECT 16, 'Thursday'   UNION ALL
SELECT 32, 'Friday'     UNION ALL
SELECT 64, 'Saturday';
 
 
/* Now let's get our schedule information */
WITH myCTE
AS(
    SELECT sched.name AS 'scheduleName'
        , sched.schedule_id
        , jobsched.job_id
        , CASE 
            WHEN sched.freq_type = 1 
                THEN 'Once' 
            WHEN sched.freq_type = 4 
                AND sched.freq_interval = 1 
                    THEN 'Daily'
            WHEN sched.freq_type = 4 
                THEN 'Every ' + CAST(sched.freq_interval AS VARCHAR(5)) + ' days'
            WHEN sched.freq_type = 8 THEN 
                REPLACE( REPLACE( REPLACE(( 
                    SELECT maskValue 
                    FROM @weekDay AS x 
                    WHERE sched.freq_interval & x.mask <> 0 
                    ORDER BY mask FOR XML RAW)
                , '"/><row maskValue="', ', '), '<row maskValue="', ''), '"/>', '') 
                + CASE 
                    WHEN sched.freq_recurrence_factor <> 0 
                        AND sched.freq_recurrence_factor = 1 
                            THEN '; weekly' 
                    WHEN sched.freq_recurrence_factor <> 0 
                        THEN '; every ' 
                + CAST(sched.freq_recurrence_factor AS VARCHAR(10)) + ' weeks' END
            WHEN sched.freq_type = 16 THEN 'On day ' 
                + CAST(sched.freq_interval AS VARCHAR(10)) + ' of every '
                + CAST(sched.freq_recurrence_factor AS VARCHAR(10)) + ' months' 
            WHEN sched.freq_type = 32 THEN 
                CASE 
                    WHEN sched.freq_relative_interval = 1 THEN 'First'
                    WHEN sched.freq_relative_interval = 2 THEN 'Second'
                    WHEN sched.freq_relative_interval = 4 THEN 'Third'
                    WHEN sched.freq_relative_interval = 8 THEN 'Fourth'
                    WHEN sched.freq_relative_interval = 16 THEN 'Last'
                END + 
                CASE 
                    WHEN sched.freq_interval = 1 THEN ' Sunday'
                    WHEN sched.freq_interval = 2 THEN ' Monday'
                    WHEN sched.freq_interval = 3 THEN ' Tuesday'
                    WHEN sched.freq_interval = 4 THEN ' Wednesday'
                    WHEN sched.freq_interval = 5 THEN ' Thursday'
                    WHEN sched.freq_interval = 6 THEN ' Friday'
                    WHEN sched.freq_interval = 7 THEN ' Saturday'
                    WHEN sched.freq_interval = 8 THEN ' Day'
                    WHEN sched.freq_interval = 9 THEN ' Weekday'
                    WHEN sched.freq_interval = 10 THEN ' Weekend'
                END
                + CASE 
                    WHEN sched.freq_recurrence_factor <> 0 
                        AND sched.freq_recurrence_factor = 1 
                            THEN '; monthly'
                    WHEN sched.freq_recurrence_factor <> 0 
                        THEN '; every ' 
                + CAST(sched.freq_recurrence_factor AS VARCHAR(10)) + ' months' 
                  END
            WHEN sched.freq_type = 64   THEN 'StartUp'
            WHEN sched.freq_type = 128  THEN 'Idle'
          END AS 'frequency'
        , ISNULL('Every ' + CAST(sched.freq_subday_interval AS VARCHAR(10)) + 
            CASE 
                WHEN sched.freq_subday_type = 2 THEN ' seconds'
                WHEN sched.freq_subday_type = 4 THEN ' minutes'
                WHEN sched.freq_subday_type = 8 THEN ' hours'
            END, 'Once') AS 'subFrequency'
        , REPLICATE('0', 6 - LEN(sched.active_start_time)) 
            + CAST(sched.active_start_time AS VARCHAR(6)) AS 'startTime'
        , REPLICATE('0', 6 - LEN(sched.active_end_time)) 
            + CAST(sched.active_end_time AS VARCHAR(6)) AS 'endTime'
        , REPLICATE('0', 6 - LEN(jobsched.next_run_time)) 
            + CAST(jobsched.next_run_time AS VARCHAR(6)) AS 'nextRunTime'
        , CAST(jobsched.next_run_date AS CHAR(8)) AS 'nextRunDate'
    FROM msdb.dbo.sysschedules      AS sched
    JOIN msdb.dbo.sysjobschedules   AS jobsched
        ON sched.schedule_id = jobsched.schedule_id
    WHERE sched.enabled = 1
)
 
/* Finally, let's look at our actual jobs and tie it all together */
SELECT CONVERT(NVARCHAR(128), SERVERPROPERTY('Servername'))             AS [serverName]
    , job.job_id                                                        AS [jobID]
    , job.name                                                          AS [jobName]
    , CASE WHEN job.enabled = 1 THEN 'Enabled' ELSE 'Disabled' END      AS [jobStatus]
    , COALESCE(sched.scheduleName, '(unscheduled)')                     AS [scheduleName]
    , COALESCE(sched.frequency, '')                                     AS [frequency]
    , COALESCE(sched.subFrequency, '')                                  AS [subFrequency]
    , COALESCE(SUBSTRING(sched.startTime, 1, 2) + ':' 
        + SUBSTRING(sched.startTime, 3, 2) + ' - ' 
        + SUBSTRING(sched.endTime, 1, 2) + ':' 
        + SUBSTRING(sched.endTime, 3, 2), '')                           AS [scheduleTime] -- HH:MM
    , COALESCE(SUBSTRING(sched.nextRunDate, 1, 4) + '/' 
        + SUBSTRING(sched.nextRunDate, 5, 2) + '/' 
        + SUBSTRING(sched.nextRunDate, 7, 2) + ' ' 
        + SUBSTRING(sched.nextRunTime, 1, 2) + ':' 
        + SUBSTRING(sched.nextRunTime, 3, 2), '')                       AS [nextRunDate]
      /* Note: the sysjobschedules table refreshes every 20 min, so nextRunDate may be out of date */
    , COALESCE(jh.success, 0)                                           AS [success]
    , COALESCE(jh.cancel, 0)                                            AS [cancel]
    , COALESCE(jh.fail, 0)                                              AS [fail]
    , COALESCE(jh.retry, 0)                                             AS [retry]
    , COALESCE(jh.last_execution_id, 0)                                 AS [lastExecutionID]
    , jh.last_execution_start                                           AS [lastExecutionStart]
    , COALESCE(jh.last_duration, '00:00:01')                            AS [lastDuration]
    , 'EXECUTE msdb.dbo.sp_update_job @job_id = ''' 
        + CAST(job.job_id AS CHAR(36)) + ''', @enabled = 0;'            AS [disableSQLScript]
FROM msdb.dbo.sysjobs               AS job
LEFT JOIN myCTE                     AS sched
    ON job.job_id = sched.job_id
LEFT JOIN @jobHistory               AS jh
    ON job.job_id = jh.job_id
WHERE job.enabled = 1 -- do not display disabled jobs
    --AND jh.last_execution_start >= DATEADD(day, -1, GETDATE()) /* Pull just the last 24 hours */
ORDER BY nextRunDate;

Example Output:

serverName       jobID                                 jobName                      jobStatus scheduleName                      frequency  subFrequency   scheduleTime   nextRunDate       success  cancel  fail  retry  lastExecutionID  lastExecutionStart       lastDuration  disableSQLScript
---------------- ------------------------------------- ---------------------------- --------- --------------------------------- ---------- -------------- -------------- ----------------- -------- ------- ----- ------ ---------------- ------------------------ ------------- ----------------------------------------------------------------------------------------------
SQLFOOL\SQL2012  180BFD8E-AE0C-44F8-992E-27522611992A  DW FactOrders Update         Enabled   Every 4 Hours                     Daily      Every 4 hours  00:00 - 23:59  2012/09/19 20:00  12       0       0     0      84               2012-09-17 14:00:01.000  00:00:04      EXECUTE msdb.dbo.sp_update_job @job_id = '180BFD8E-AE0C-44F8-992E-27522611992A', @enabled = 0;
SQLFOOL\SQL2012  3470C9E5-A2CD-454A-89A1-DEF55FF186D3  SSIS Server Maintenance Job  Enabled   SSISDB Scheduler                  Daily      Once           00:00 - 12:00  2012/09/20 00:00  3        0       0     0      68               2012-09-17 11:50:51.000  00:00:01      EXECUTE msdb.dbo.sp_update_job @job_id = '3470C9E5-A2CD-454A-89A1-DEF55FF186D3', @enabled = 0;
SQLFOOL\SQL2012  F965B24D-60EB-4B95-91B6-C7D66057A883  syspolicy_purge_history      Enabled   syspolicy_purge_history_schedule  Daily      Once           02:00 - 23:59  2012/09/20 02:00  3        0       1     1      70               2012-09-17 11:50:51.000  00:01:24      EXECUTE msdb.dbo.sp_update_job @job_id = 'F965B24D-60EB-4B95-91B6-C7D66057A883', @enabled = 0;

The SQL Server DBA’s Guide to Teradata

Six months ago, I returned from maternity leave only to be handed the proverbial keys to a new database kingdom: Teradata. Being a long-time devotee of all things Microsoft SQL Server, I had quite a bit of learning to do to get up to speed on this new environment. In an effort to save others from the same headaches that plagued me during the first few months, I have created a new guide to aid those new to the Teradata platform. I will primarily focus on what I have found most lacking: examples of how to convert SQL Server T-SQL to Teradata SQL. Rather than create a series of posts with random tidbits, I thought it would make more sense to create a single page for this guide and update it incrementally. You can find the very beginnings of this guide here: The SQL Server DBA’s Guide to Teradata. Good luck, and happy converting. :)

TVP Permissions

Here’s just a short post to feed the search engine gerbils. So today, I created a stored procedure that accepted a list of IDs and filtered them. The most efficient way to handle this was using table-valued parameters (TVPs). My code looks similar to this:

USE AdventureWorks2012;
GO
 
/* Create our new table type */
IF NOT EXISTS(SELECT * FROM sys.types WHERE name = 'CustomerList')
BEGIN
    CREATE TYPE dbo.CustomerList AS TABLE 
    ( 
          CustomerID INT
        , ID         INT
    );
END;
 
IF OBJECTPROPERTY(OBJECT_ID('dbo.CustomerTerritoryFilterGet_sp')
, N'IsProcedure') IS NOT NULL
    DROP PROCEDURE dbo.CustomerTerritoryFilterGet_sp;
GO
 
/* Create a proc to test with */
CREATE PROCEDURE dbo.CustomerTerritoryFilterGet_sp
      @Customers        CustomerList READONLY
    , @TerritoryFilter  INT
 
AS
BEGIN
 
    SELECT DISTINCT sc.CustomerID, c.ID
    FROM Sales.Customer AS sc
    JOIN @Customers AS c
        ON sc.CustomerID = c.CustomerID
    WHERE sc.TerritoryID = @TerritoryFilter;
 
    RETURN 0;
END
GO
 
/* Test our stored procedure */
DECLARE @myList CustomerList;
INSERT INTO @myList 
VALUES (1, 10), (2, 20), (3, 30), (4, 40), (5, 50), (6, 60), (7, 70);
 
EXECUTE dbo.CustomerTerritoryFilterGet_sp
      @Customers        = @myList
    , @TerritoryFilter  = 1;
GO
 
GRANT EXECUTE ON dbo.CustomerTerritoryFilterGet_sp To myApp;
GO

This would typically be sufficient for most stored procedures. However, my app dev was getting the following error:

The EXECUTE permission was denied on the object 'customerList', database 'AdventureWorks2012', schema 'dbo'.

Wait, we need to explicitly grant permissions to the new data type? Okay, that’s easy enough… let’s just run a quick GRANT statement:

GRANT EXECUTE ON dbo.CustomerList To myApp;
Msg 15151, Level 16, State 1, Line 1
Cannot find the object 'CustomerList', because it does not exist or you do not have permission.

Okay, now I’m confused… what am I missing? It turns out, you need to use a slightly different syntax to grant permissions to a new data type. This syntax, which explicitly tells SQL Server that we’re granting permissions on a Type class, works just fine:

GRANT EXECUTE ON TYPE::dbo.CustomerList To myApp;
 
/* Time to clean up! */
-- DROP PROC dbo.CustomerTerritoryFilterGet_sp;
-- DROP TYPE dbo.CustomerList;
Command(s) completed successfully.
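
If you want to double-check where that grant actually landed, you can look for it in sys.database_permissions; type permissions show up under the TYPE class. A quick sketch, assuming the myApp user from above:

/* Sketch: confirm the EXECUTE grant exists on the table type itself */
SELECT pr.name AS grantee
    , pe.permission_name
    , pe.state_desc
FROM sys.database_permissions AS pe
JOIN sys.database_principals  AS pr
    ON pe.grantee_principal_id = pr.principal_id
WHERE pe.class_desc = 'TYPE'
    AND pe.major_id = TYPE_ID('dbo.CustomerList');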

You can find the full syntax here on Books Online:
http://msdn.microsoft.com/en-US/library/ms174346(v=SQL.90).aspx

Tidbits I Discovered Today…

I’ve figured out a couple of tidbits today that I wanted to share. First and foremost, I’ve discovered a (new to me) option in SSMS to convert tabs to spaces. I had previously seen the “Untabify Selected Lines” option (under Edit -> Advanced), but this only seemed to remove the tabs at the beginning of the line; it would not remove in-line tabs. I’ve now found another option that will remove the tabs throughout the selected code.

Here’s how you can do it:

Disclaimer: I’m using SSMS 2008 and have not confirmed this in other versions of SSMS.

From your toolbar, right-click in the empty gray space, or click on the drop-down arrow. Navigate to Customize:

Click on the Commands tab, then navigate down to Edit -> Convert Tabs to Spaces.
Note: if you’re one of those weirdos who like tabs instead of spaces, you can convert in the other (wrong) direction, too. ;)

Once you have found the option you want, click on it and drag it to your toolbar.


Cool, huh?

Another little thing I ran into today was a mismatch between nullable and non-nullable columns in related tables. I started to perform a manual audit before I said, “There has to be a faster way.” And you know what? There was. Here’s the little script I used:

SELECT c1.name AS [columnName]
    , c1.is_nullable AS [FactInternetSales_isNullable]
    , c2.is_nullable AS [FactResellerSales_isNullable]
/* replace AdventureWorksDW2008R2 with your database */
FROM AdventureWorksDW2008R2.sys.columns AS c1 
JOIN AdventureWorksDW2008R2.sys.columns AS c2
    ON c1.name = c2.name
WHERE c1.object_id = object_id('dbo.FactInternetSales') -- replace with your table
    AND c2.object_id = object_id('dbo.FactResellerSales') -- replace with your table
    AND c1.is_nullable <> c2.is_nullable; -- find discrepancies

This returned the following results:

columnName             FactInternetSales_isNullable FactResellerSales_isNullable
---------------------- ---------------------------- ----------------------------
RevisionNumber         0                            1
OrderQuantity          0                            1
UnitPrice              0                            1
ExtendedAmount         0                            1
UnitPriceDiscountPct   0                            1
DiscountAmount         0                            1
ProductStandardCost    0                            1
TotalProductCost       0                            1
SalesAmount            0                            1
TaxAmt                 0                            1
Freight                0                            1
 
(11 row(s) affected)

This script isn’t as polished as my usual, but it’s really just a quick-and-dirty way of checking for mismatches in NULL properties.
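
If you also want to catch columns that exist in one table but not the other, a slight variation using a FULL OUTER JOIN will flag those as well. This is just a sketch with the same caveats as above: run it from the AdventureWorksDW2008R2 database (or fully qualify the OBJECT_ID calls) and swap in your own table names:

SELECT COALESCE(c1.name, c2.name) AS [columnName]
    , c1.is_nullable AS [FactInternetSales_isNullable]
    , c2.is_nullable AS [FactResellerSales_isNullable]
FROM (SELECT name, is_nullable FROM AdventureWorksDW2008R2.sys.columns
        WHERE object_id = OBJECT_ID('dbo.FactInternetSales')) AS c1 -- replace with your table
FULL OUTER JOIN
     (SELECT name, is_nullable FROM AdventureWorksDW2008R2.sys.columns
        WHERE object_id = OBJECT_ID('dbo.FactResellerSales')) AS c2 -- replace with your table
    ON c1.name = c2.name
WHERE c1.is_nullable <> c2.is_nullable -- nullability mismatch
    OR c1.name IS NULL                 -- column only exists in the second table
    OR c2.name IS NULL;                -- column only exists in the first table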

Okay, that’s all I have for now, my SQL friends. Oh, and if you know any other cool little tidbits, please feel free to share with me. :)

BCP Script Generator

I’m currently working on the logic migration of data marts from SQL Server to Teradata. While another team is working on the actual table migration, it’s still helpful to have some data in the tables for developing against. The easiest method I’ve found to do this is to use BCP to export some sample data. So of course, I’ve created a SQL script that will generate the BCP code for me. Because that’s what I like to do on Sunday evenings.

-- User-defined variables --
 
DECLARE @tableToBCP NVARCHAR(128)   = 'AdventureWorksDW2008R2.dbo.DimCustomer'
    , @Top          VARCHAR(10)     = NULL -- Leave NULL for all rows
    , @Delimiter    VARCHAR(4)      = '|'
    , @UseNULL      BIT             = 1
    , @OverrideChar CHAR(1)         = '~'
    , @MaxDop       CHAR(1)         = '1'
    , @Directory    VARCHAR(256)    = 'C:\bcp_output\';
 
 
-- Script-defined variables -- 
 
DECLARE @columnList TABLE (columnID INT);
 
DECLARE @bcpStatement NVARCHAR(MAX) = 'BCP "SELECT '
    , @currentID INT
    , @firstID INT;
 
INSERT INTO @columnList
SELECT column_id 
FROM sys.columns 
WHERE object_id = OBJECT_ID(@tableToBCP)
ORDER BY column_id;
 
IF @Top IS NOT NULL
    SET @bcpStatement = @bcpStatement + 'TOP (' + @Top + ') ';
 
SELECT @firstID = MIN(columnID) FROM @columnList;
 
WHILE EXISTS(SELECT * FROM @columnList)
BEGIN
 
    SELECT @currentID = MIN(columnID) FROM @columnList;
 
    IF @currentID <> @firstID
        SET @bcpStatement = @bcpStatement + ',';
 
    SELECT @bcpStatement = @bcpStatement + name
    FROM sys.columns 
    WHERE object_id = OBJECT_ID(@tableToBCP)
        AND column_id = @currentID;
 
    DELETE FROM @columnList WHERE columnID = @currentID;
 
 
END;
 
SET @bcpStatement = @bcpStatement + ' FROM ' + @tableToBCP 
    + ' WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut '
    + @Directory + REPLACE(@tableToBCP, '.', '_') + '.dat -S' + @@SERVERNAME
    + ' -T -t"' + @Delimiter + '" -c -C;'
 
SELECT @bcpStatement;

This will generate a standard BCP script:

BCP "SELECT CustomerKey,GeographyKey,CustomerAlternateKey,Title,FirstName,MiddleName,
LastName,NameStyle,BirthDate,MaritalStatus,Suffix,Gender,EmailAddress,YearlyIncome,
TotalChildren,NumberChildrenAtHome,EnglishEducation,SpanishEducation,FrenchEducation,
EnglishOccupation,SpanishOccupation,FrenchOccupation,HouseOwnerFlag,NumberCarsOwned,
AddressLine1,AddressLine2,Phone,DateFirstPurchase,CommuteDistance 
FROM AdventureWorksDW2008R2.dbo.DimCustomer WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut 
C:\bcp_output\AdventureWorksDW2008R2_dbo_DimCustomer.dat -SSQLFOOL -T -t"|" -c -C;
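
As an aside, if you were loading that file back into another SQL Server instance instead of Teradata, the import side of BCP looks roughly like this; the target database, table, and server name below are placeholders:

BCP TestDB.dbo.DimCustomer IN C:\bcp_output\AdventureWorksDW2008R2_dbo_DimCustomer.dat -SYOURSERVER -T -t"|" -c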

However, I’ve been running into some issues with the data load. See, the thing… I’m finding that some of my tables don’t… that is to say… they may possibly contain… [whisper]uncleansed data[/whisper]. I know, I know… this may come as a shock to many of you, and all I ask is that you please don’t judge me for it. ;)

What do I mean by “uncleansed data?” I mostly mean user-inputted VARCHAR columns that contain pipes (|), tabs, carriage returns, and line feeds. These types of characters tend to mess with the data import process. Also, I’ve not yet found a way to import a data file into Teradata where a non-nullable character column contains an empty string (''). Obviously, the vast majority of the data is fine, but even one of these issues can throw an error during the import process. I’ve modified the script above to handle these specific exceptions.

Since I’m only using this data for testing purposes, I found it pretty easy to simply replace the offending values with ~’s (or NULL, when the column allows it). I’m not sure if anyone else has a need for this particular script, but I figured you could modify it pretty easily to do whatever you need.

-- User-defined variables --
 
DECLARE @tableToBCP NVARCHAR(128)   = 'AdventureWorksDW2008R2.dbo.DimCustomer'
    , @Top          VARCHAR(10)     = NULL -- Leave NULL for all rows
    , @Delimiter    VARCHAR(4)      = '|'
    , @UseNULL      BIT             = 1
    , @OverrideChar CHAR(1)         = '~'
    , @MaxDop       CHAR(1)         = '1'
    , @Directory    VARCHAR(256)    = 'C:\bcp_output\';
 
 
-- Script-defined variables -- 
 
DECLARE @columnList TABLE (columnID INT);
 
DECLARE @bcpStatement NVARCHAR(MAX) = 'BCP "SELECT '
    , @currentID INT
    , @firstID INT;
 
INSERT INTO @columnList
SELECT column_id 
FROM sys.columns 
WHERE object_id = OBJECT_ID(@tableToBCP)
ORDER BY column_id;
 
IF @Top IS NOT NULL
    SET @bcpStatement = @bcpStatement + 'TOP (' + @Top + ') ';
 
SELECT @firstID = MIN(columnID) FROM @columnList;
 
WHILE EXISTS(SELECT * FROM @columnList)
BEGIN
 
    SELECT @currentID = MIN(columnID) FROM @columnList;
 
    IF @currentID <> @firstID
        SET @bcpStatement = @bcpStatement + ',';
 
    SELECT @bcpStatement = @bcpStatement + 
                            CASE 
                                WHEN user_type_id IN (231, 167, 175, 239) 
                                THEN 'CASE WHEN ' + name + ' = '''' THEN ' 
                                    + CASE 
                                        WHEN is_nullable = 1 THEN 'NULL' 
                                        ELSE '''' + REPLICATE(@OverrideChar, max_length) + ''''
                                      END
                                    + ' WHEN ' + name + ' LIKE ''%' + @Delimiter + '%'''
                                        + ' OR ' + name + ' LIKE ''%'' + CHAR(9) + ''%''' -- tab
                                        + ' OR ' + name + ' LIKE ''%'' + CHAR(10) + ''%''' -- line feed
                                        + ' OR ' + name + ' LIKE ''%'' + CHAR(13) + ''%''' -- carriage return
                                        + ' THEN ' 
                                        + CASE 
                                            WHEN is_nullable = 1 THEN 'NULL' 
                                            ELSE '''' + REPLICATE(@OverrideChar, max_length) + ''''
                                          END
                                    + ' ELSE ' + name + ' END' 
                                ELSE name 
                            END 
    FROM sys.columns 
    WHERE object_id = OBJECT_ID(@tableToBCP)
        AND column_id = @currentID;
 
    DELETE FROM @columnList WHERE columnID = @currentID;
 
 
END;
 
SET @bcpStatement = @bcpStatement + ' FROM ' + @tableToBCP 
    + ' WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut '
    + @Directory + REPLACE(@tableToBCP, '.', '_') + '.dat -S' + @@SERVERNAME
    + ' -T -t"' + @Delimiter + '" -c -C;'
 
SELECT @bcpStatement;

The sample output of this would look like:

BCP "SELECT CustomerKey,GeographyKey,CASE WHEN CustomerAlternateKey = '' THEN 
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' WHEN CustomerAlternateKey LIKE '%|%' OR 
CustomerAlternateKey LIKE '%' + CHAR(9) + '%' OR CustomerAlternateKey LIKE 
'%' + CHAR(10) + '%' OR CustomerAlternateKey LIKE '%' + CHAR(13) + '%' 
THEN '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' ELSE CustomerAlternateKey END,
CASE WHEN Title = '' THEN NULL WHEN Title LIKE '%|%' OR Title LIKE '%' + CHAR(9)
 + '%' OR Title LIKE '%' + CHAR(10) + '%' OR Title LIKE '%' + CHAR(13) + '%' 
THEN NULL ELSE Title END,CASE WHEN FirstName = '' THEN NULL WHEN FirstName 
LIKE '%|%' OR FirstName LIKE '%' + CHAR(9) + '%' OR FirstName LIKE '%' + 
CHAR(10) + '%' OR FirstName LIKE '%' + CHAR(13) + '%' THEN NULL ELSE 
FirstName END,CASE WHEN MiddleName = '' THEN NULL WHEN MiddleName LIKE '%|%'
 OR MiddleName LIKE '%' + CHAR(9) + '%' OR MiddleName LIKE '%' + CHAR(10) +
 '%' OR MiddleName LIKE '%' + CHAR(13) + '%' THEN NULL ELSE MiddleName END,
CASE WHEN LastName = '' THEN NULL WHEN LastName LIKE '%|%' OR LastName LIKE
 '%' + CHAR(9) + '%' OR LastName LIKE '%' + CHAR(10) + '%' OR LastName LIKE
 '%' + CHAR(13) + '%' THEN NULL ELSE LastName END,NameStyle,BirthDate,CASE 
WHEN MaritalStatus = '' THEN NULL WHEN MaritalStatus LIKE '%|%' OR 
MaritalStatus LIKE '%' + CHAR(9) + '%' OR MaritalStatus LIKE '%' + CHAR(10) 
+ '%' OR MaritalStatus LIKE '%' + CHAR(13) + '%' THEN NULL ELSE MaritalStatus 
END,CASE WHEN Suffix = '' THEN NULL WHEN Suffix LIKE '%|%' OR Suffix LIKE '%' 
+ CHAR(9) + '%' OR Suffix LIKE '%' + CHAR(10) + '%' OR Suffix LIKE '%' + 
CHAR(13) + '%' THEN NULL ELSE Suffix END,CASE WHEN Gender = '' THEN NULL 
WHEN Gender LIKE '%|%' OR Gender LIKE '%' + CHAR(9) + '%' OR Gender LIKE '%' 
+ CHAR(10) + '%' OR Gender LIKE '%' + CHAR(13) + '%' THEN NULL ELSE Gender 
END,CASE WHEN EmailAddress = '' THEN NULL WHEN EmailAddress LIKE '%|%' OR 
EmailAddress LIKE '%' + CHAR(9) + '%' OR EmailAddress LIKE '%' + CHAR(10) + 
'%' OR EmailAddress LIKE '%' + CHAR(13) + '%' THEN NULL ELSE EmailAddress END,
YearlyIncome,TotalChildren,NumberChildrenAtHome, CASE WHEN EnglishEducation = '' 
THEN NULL WHEN EnglishEducation LIKE '%|%' OR 
EnglishEducation LIKE '%' + CHAR(9) + '%' OR EnglishEducation LIKE '%' + 
CHAR(10) + '%' OR EnglishEducation LIKE '%' 
+ CHAR(13) + '%' THEN NULL ELSE EnglishEducation END,CASE WHEN 
SpanishEducation = '' THEN NULL WHEN SpanishEducation LIKE '%|%' OR 
SpanishEducation LIKE '%' + CHAR(9) + '%' OR SpanishEducation LIKE '%' + 
CHAR(10) + '%' OR SpanishEducation LIKE '%' + CHAR(13) + '%' THEN NULL 
ELSE SpanishEducation END,CASE WHEN FrenchEducation = '' THEN NULL WHEN 
FrenchEducation LIKE '%|%' OR FrenchEducation LIKE '%' + CHAR(9) + '%' 
OR FrenchEducation LIKE '%' + CHAR(10) + '%' OR FrenchEducation LIKE '%' 
+ CHAR(13) + '%' THEN NULL ELSE FrenchEducation END,CASE WHEN 
EnglishOccupation = '' THEN NULL WHEN EnglishOccupation LIKE '%|%' OR 
EnglishOccupation LIKE '%' + CHAR(9) + '%' OR EnglishOccupation LIKE '%' 
+ CHAR(10) + '%' OR EnglishOccupation LIKE '%' + CHAR(13) + '%' THEN 
NULL ELSE EnglishOccupation END,CASE WHEN SpanishOccupation = '' THEN 
NULL WHEN SpanishOccupation LIKE '%|%' OR SpanishOccupation LIKE '%' 
+ CHAR(9) + '%' OR SpanishOccupation LIKE '%' + CHAR(10) + '%' OR 
SpanishOccupation LIKE '%' + CHAR(13) + '%' THEN NULL ELSE SpanishOccupation 
END,CASE WHEN FrenchOccupation = '' THEN NULL WHEN FrenchOccupation LIKE 
'%|%' OR FrenchOccupation LIKE '%' + CHAR(9) + '%' OR FrenchOccupation 
LIKE '%' + CHAR(10) + '%' OR FrenchOccupation LIKE '%' + CHAR(13) + '%' 
THEN NULL ELSE FrenchOccupation END,CASE WHEN HouseOwnerFlag = '' THEN 
NULL WHEN HouseOwnerFlag LIKE '%|%' OR HouseOwnerFlag LIKE '%' + CHAR(9) 
+ '%' OR HouseOwnerFlag LIKE '%' + CHAR(10) + '%' OR HouseOwnerFlag LIKE 
'%' + CHAR(13) + '%' THEN NULL ELSE HouseOwnerFlag END,NumberCarsOwned,CASE 
WHEN AddressLine1 = '' THEN NULL WHEN AddressLine1 LIKE '%|%' OR AddressLine1 
LIKE '%' + CHAR(9) + '%' OR AddressLine1 LIKE '%' + CHAR(10) + '%' OR 
AddressLine1 LIKE '%' + CHAR(13) + '%' THEN NULL ELSE AddressLine1 END,CASE 
WHEN AddressLine2 = '' THEN NULL WHEN AddressLine2 LIKE '%|%' OR AddressLine2 
LIKE '%' + CHAR(9) + '%' OR AddressLine2 LIKE '%' + CHAR(10) + '%' OR 
AddressLine2 LIKE '%' + CHAR(13) + '%' THEN NULL ELSE AddressLine2 END,CASE 
WHEN Phone = '' THEN NULL WHEN Phone LIKE '%|%' OR Phone LIKE '%' + CHAR(9) 
+ '%' OR Phone LIKE '%' + CHAR(10) + '%' OR Phone LIKE '%' + CHAR(13) + '%' 
THEN NULL ELSE Phone END,DateFirstPurchase,CASE WHEN CommuteDistance = '' 
THEN NULL WHEN CommuteDistance LIKE '%|%' OR CommuteDistance LIKE '%' + 
CHAR(9) + '%' OR CommuteDistance LIKE '%' + CHAR(10) + '%' OR CommuteDistance 
LIKE '%' + CHAR(13) + '%' THEN NULL ELSE CommuteDistance END 
FROM AdventureWorksDW2008R2.dbo.DimCustomer WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut 
C:\bcp_output\AdventureWorksDW2008R2_dbo_DimCustomer.dat -SSQLFOOL -T -t"|" -c -C;

I don’t know about you, but that would take me a bit longer to write manually than to execute the script above. :)

One note: I’ve found that copying this code into a batch file will actually render the CHAR functions, i.e. CHAR(13) will be replaced with a carriage return in the script. To avoid this, copy and paste the BCP script directly into your command window.

Not familiar with BCP? Refer to my blog post on BCP Basics to help get you started.

A question for my blog readers…

My role at work has recently changed. While I will continue to be working with SQL Server, I will also be spending a good chunk of my time working in our newly acquired Teradata environment. My plan is to blog about both as time permits, but I am trying to determine whether I should create a separate blog for the Teradata content. So my question for you, dear reader, is whether you think I should blog on this site about both SQL Server *and* Teradata, or whether I should separate the content? Any input is appreciated.

Are You Approaching Your Partition Range Limits?

In my post last week, How To Estimate Data Utilization, I said that it may be my last post for a while. Well… apparently I lied. :)

For those of you who use table partitioning, you know that you need to define a partition function and scheme prior to applying partitioning to an index. Personally, I tend to build the function a couple of years out, with ranges running through the end of a calendar year. Now, if I failed to expand a partition range at the end of the year, then come January 1st, all of my data would be written to the same partition. Not the end of the world, no, but it causes all kinds of nasty performance and maintenance issues. Thus, as part of my end-of-year / maternity-leave preparations, I’m in the process of examining all partition functions to identify those that need to have their ranges expanded. For those interested, here’s the script I used:

IF OBJECT_ID('tempdb..#Results') IS NOT NULL
    DROP TABLE #Results;
 
CREATE TABLE #Results
(
      databaseName  NVARCHAR(128)
    , schemaName    NVARCHAR(128)
    , functionName  NVARCHAR(128)
    , data_space_id INT
    , maxRangeValue SQL_VARIANT
)
 
/* Grab results for each database and store in our temp table.  
   And no, I don't *need* to select from sys.indexes and perform 
   left joins, but I'm overly cautious and want to make sure 
   I'm not accidentally missing any databases. :) */
 
--EXECUTE master.dbo.sp_msforeachdb
EXECUTE sp_foreachdb 'USE ?;
INSERT INTO #Results
SELECT DB_NAME() AS databaseName
    , sps.name AS schemaName
    , spf.name AS functionName
    , sps.data_space_id 
    , MAX(prv.value) AS maxRangeValue
FROM sys.indexes AS i
LEFT JOIN sys.partition_schemes AS sps WITH (NOLOCK)
    ON i.data_space_id = sps.data_space_id
LEFT JOIN sys.partition_functions AS spf WITH (NOLOCK)
    ON sps.function_id = spf.function_id
LEFT JOIN sys.partition_range_values AS prv WITH (NOLOCK)
    ON spf.function_id = prv.function_id
GROUP BY sps.name
    , spf.name
    , sps.data_space_id;';
/*  
    sp_foreachdb was written by SQL MVP Aaron Bertrand and can be downloaded 
    at http://www.mssqltips.com/sqlservertip/2201/making-a-more-reliable-and-flexible-spmsforeachdb/
    Alternatively, you can also use sys.sp_MSforeachdb
*/
 
/* Make sure we're not missing any major databases */
SELECT * FROM sys.databases WHERE name NOT IN (SELECT databaseName FROM #Results);
 
/* Retrieve our results */
SELECT * 
FROM #Results
WHERE schemaName IS NOT NULL
ORDER BY maxRangeValue;

Example Results:

databaseName        schemaName                      functionName                          data_space_id   maxRangeValue
------------------- ------------------------------- ------------------------------------- --------------- -------------------------
HistoricalMart      dailyRangeDate_ps               dailyRangeDate_pf                     65609           2011-12-31 00:00:00.000
AdventureWorks      yearlyRangeSmallDateTime_ps     yearlyRangeSmallDateTime_pf           65605           2012-01-01 00:00:00.000
dbaTools            monthlyRangeDateTime_ps         monthlyRangeDateTime_pf               65604           2012-12-01 00:00:00.000
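
Once a function shows up in these results, extending it is just a matter of telling the partition scheme which filegroup to use next and then splitting in a new boundary. Here’s a sketch using the monthlyRangeDateTime example above; the filegroup and boundary value are assumptions, so use whatever fits your scheme:

/* Sketch: add a new boundary to the right-hand end of the partition function */
ALTER PARTITION SCHEME monthlyRangeDateTime_ps
    NEXT USED [PRIMARY]; -- assumption: point this at the filegroup your scheme actually uses

ALTER PARTITION FUNCTION monthlyRangeDateTime_pf()
    SPLIT RANGE ('2013-01-01T00:00:00'); -- assumption: the next boundary you need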

How To Estimate Data Utilization

Recently, on a conference call presenting data growth rates and database capacity projections, I had a top-line executive ask, “But how much of that data are we actually using today?” The question was met with silence; unless you have rigorous auditing in place — and kudos to you if you do — it’s a difficult question to answer. But it raises the question: is there some way to glean this information from SQL Server? I think the answer is “yes,” if you make some assumptions and understand what you’re looking at.

SQL Server collects stats about how often and in what way each index is used (i.e. whether queries performed seeks or scans against the index, etc.). It also provides a DMV to view these stats: sys.dm_db_index_usage_stats.
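
If you’ve never poked around in it, a quick look at the raw DMV output for the current database is a good starting point. A minimal sketch:

/* Minimal sketch: raw index usage stats for the current database */
SELECT OBJECT_NAME(us.[object_id]) AS tableName
    , us.index_id
    , us.user_seeks
    , us.user_scans
    , us.user_lookups
    , us.user_updates
    , us.last_user_seek
    , us.last_user_scan
FROM sys.dm_db_index_usage_stats AS us
WHERE us.database_id = DB_ID()
ORDER BY us.user_seeks + us.user_scans + us.user_lookups DESC;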

This DMV provides a wealth of great information, but to answer our question of “What data is actually being used?”, we have to refine our criteria. Are we talking in terms of table counts or data size? I’d argue that data size is more important than table counts; one unqueried million-row table is more wasteful than a hundred ten-row tables.

Also, are we looking at indexes or content? From a database perspective, I’m more interested in indexes: how much space are we wasting on unused indexes? To identify this, I need to look at the activity on each individual index.

From a business perspective, I would be more interested in content (i.e. tables): how much business information is being stored that no one is even looking at? To answer this question, I need to roll up all index usage to see if *any* of the indexes on a table were used. Since both were of interest to me, I decided to write queries to answer both questions.

Lastly, we need to understand the flaws with this data. Chiefly, I cannot tell whether a user requested one row from a million-row table, or if [s]he needed all of the data in the table. This is a pretty important issue, especially with large historical data stores, and it’s where I have to make the biggest assumption: if even one person looked at one row in the table, I count all pages in the table as having been accessed.

Now, you may make different decisions than I did above, and that’s fine… each environment and project has different needs. But these assumptions are very important to understanding the output of the query below:

USE master;
GO
 
/* 
    This will give you an approximation of how much data is being utilized on a server.
    Since the data is only valid as of the last server reboot, we should start off with
    an idea of how much data we've accrued.  
*/
 
/* Find out when the server was last rebooted */
-- 2008
SELECT sqlserver_start_time FROM sys.dm_os_sys_info;
-- 2005
SELECT create_date FROM sys.databases WHERE name = 'tempdb';
 
 
/* Create a temporary table to hold our data, since we're going to iterate through databases */
IF OBJECT_ID('tempdb..#Results') IS NOT NULL
    DROP TABLE #Results;
 
CREATE TABLE #Results
(
      databaseName  NVARCHAR(128)
    , tableName     NVARCHAR(128)
    , indexID       INT
    , records       BIGINT
    , activity      BIGINT
    , totalPages    BIGINT
);
 
/*  
    sp_foreachdb was written by SQL MVP Aaron Bertrand and can be downloaded 
    at http://www.mssqltips.com/sqlservertip/2201/making-a-more-reliable-and-flexible-spmsforeachdb/
    Alternatively, you can also use sys.sp_MSforeachdb
*/
--EXECUTE master.dbo.sp_foreachdb
EXECUTE sys.sp_MSforeachdb
'   USE ?; 
 
    -- You can glean a lot of information about historical data usage from partitions
    -- but for now, we will just roll up any partitions we may have
    WITH myCTE AS
    (
        SELECT p.[object_id] AS objectID
            , p.index_id
            , SUM(p.[rows]) AS records
            , SUM(au.total_pages) AS totalPages
        FROM sys.partitions AS p WITH (NOLOCK)
        JOIN sys.allocation_units AS au WITH (NOLOCK)
            ON p.hobt_id = au.container_id
        GROUP BY p.[object_id] 
            , p.index_id
    )
 
    -- Grab all tables and join to our usage stats DMV
    INSERT INTO #Results
    SELECT DB_NAME() AS databaseName
        , t.name
        , x.index_id
        , MAX(x.records) AS records
        , ISNULL(SUM(us.user_lookups + us.user_scans + us.user_seeks), 0) AS activity
        , SUM(x.totalPages) AS totalPages
    FROM sys.tables AS t WITH (NOLOCK)
    JOIN myCTE AS x
        ON t.[object_id] = x.objectID
    LEFT JOIN sys.dm_db_index_usage_stats AS us WITH (NOLOCK)
        ON us.[object_id] = x.objectID
        AND us.index_id = x.index_id
        AND us.database_id = DB_ID()
    GROUP BY t.name
    , x.index_id;'
 
/* Because we're looping through databases, make sure we're not missing any major ones */
SELECT * FROM sys.databases WHERE name NOT IN (SELECT databaseName FROM #Results);
 
/* Retrieve actual % data utilization, which is performed at the index level */
SELECT databaseName
    , SUM(queriedPages) AS TotalQueriedPages
    , SUM(totalPages) AS TotalPages
    , ISNULL(CAST(SUM(queriedPages) AS FLOAT) / NULLIF(SUM(totalPages), 0), 0) AS '%DataUtil' -- NULLIF/ISNULL guard against divide-by-zero
FROM (
    SELECT databaseName
        , tableName
        , indexID
        , CASE -- If we have any activity at all on an index, count it as activity
            WHEN activity = 0 THEN 0.0
            ELSE totalPages
          END AS queriedPages
        , totalPages
    FROM #Results
    WHERE databaseName NOT IN ('master', 'tempdb', 'msdb', 'model')
) x
GROUP BY databaseName
ORDER BY databaseName;
 
/* Retrieve % content utilization, which is performed at the table level */
SELECT databaseName
    , SUM(queriedPages) AS TotalQueriedPages
    , SUM(totalPages) AS TotalPages
    , ISNULL(CAST(SUM(queriedPages) AS FLOAT) / NULLIF(SUM(totalPages), 0), 0) AS '%ContentUtil' -- NULLIF/ISNULL guard against divide-by-zero
FROM (
    SELECT databaseName
        , tableName
        , MAX(records) AS records
        , CASE WHEN SUM(activity) > 0 THEN SUM(totalPages) ELSE 0 END AS queriedPages
        , SUM(totalPages) AS totalPages
    FROM #Results
    WHERE databaseName NOT IN ('master', 'tempdb', 'msdb', 'model')
    GROUP BY databaseName
        , tableName
) x
GROUP BY databaseName
ORDER BY databaseName;

Results:

databaseName               TotalQueriedPages   TotalPages           %DataUtil
-------------------------- ------------------- -------------------- ----------------------
Database1                  127618701           130607247            0.969619893356378
Database2                  567188              1614958              0.351209133612143
Database3                  34269036            34579469             0.991022620966216
Database4                  137970594           170733391            0.803399928206158
Database5                  74632930            101543575            0.66909214627557
Database6                  55809933            72884205             0.765734157938039
Database7                  560810026           620609815            0.902175272517656
 
databaseName               TotalQueriedPages   TotalPages           %ContentUtil
-------------------------- ------------------- -------------------- ----------------------
Database1                  127763715           130607247            0.970721679051682
Database2                  571125              1614958              0.353646967908763
Database3                  34269036            34579469             0.991022620966216
Database4                  137970921           170733391            0.803399928206158
Database5                  96144726            101543575            0.861947682777784
Database6                  72269666            72884205             0.991568146820268
Database7                  620525938           620609815            0.998240279711804

The first result set examines the utilization of indexes, and the second result set examines the utilization of data at the content (table) level. For example, if we look at Database6, we’ll see that we are only utilizing 77% of our indexes, but we’re looking at 99% of our table data. So this is a good indicator that we have some unused indexes to clean up in that database.
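
Since the #Results temp table from the script above already holds per-index rows, it’s easy to follow up with a list of the specific indexes that have not been touched. A quick sketch (keep in mind that index_id 0 and 1 are the heap or clustered index, i.e. the table itself):

/* Sketch: the individual indexes with no recorded activity, largest first */
SELECT databaseName
    , tableName
    , indexID
    , records
    , totalPages
FROM #Results
WHERE activity = 0
    AND databaseName NOT IN ('master', 'tempdb', 'msdb', 'model')
ORDER BY totalPages DESC;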

Know a better way to answer this question using SQL Server DMV’s? Please leave me a comment so I can learn from your experience. :)

In unrelated news, this may be my last blog post for a little while. I’m due with my second child a week from today and expect all of my free time to be consumed by him for a little while. That and, quite frankly, I do not trust myself near a computer, especially a database, in such a sleep-deprived state. :)

East Iowa SQL Saturday is Next Saturday! (Oct 1st)

East Iowa’s 3rd annual SQL Saturday is right around the corner! This year’s event will be on Saturday, October 1st. It’s being held in the same location as previous years, at the University of Iowa Capitol Centre in Iowa City.

We have a great mix of speakers this year, including nationally-recognized speakers such as Jason Strate and Ted Krueger. Topics include Denali, PowerPivot, High Availability, Disaster Recovery, PowerShell, SSIS, Analysis Services, Reporting Services, database tuning, and more. Don’t miss out on this great opportunity to network with SQL-savvy locals while receiving FREE training.

If you haven’t registered yet, please do, so we can make sure we order enough food. Oh, and make sure to stick around for the after-party, too. The after-party is one of my favorite parts of the whole event. This year’s party will be held at The Mill, which is within walking distance of the conference center.

I hope to see you there! :)