The End of an Era – SQLFool scripts are now open-sourced

Hello, dear readers. It’s been a while.

For those who haven’t yet heard the news, I have switched to the dark side: management. I’m still kicking ass at GoDaddy and I’m still technical, but I now lead a team of big data engineers. As you can imagine, this means that I spend the majority of my time with Hadoop and its related ecosystem (Pig, Hive, Spark, etc.). This is very different from the performance-focused SQL Server role I had when I first created this blog. For those of you who make your living primarily or solely with SQL Server, please rest assured… SQL Server is still awesome, is still relevant, and still has a place in most enterprise data platform strategies. In fact, I’m presently working on egress automation from our Hadoop data warehouse to SQL Server for serving BI reports. Unfortunately, my change in technology focus means that my SQL Server blogging has largely come to an end, and with it, new posts on this blog. But while this blog may end, my blogging activities do not. For those interested, I have launched a new blog that chronicles my adventures in a non-Microsoft world with Hadoop, open-source, and development on a Mac. :)

But back to the reason for this post. Recently, I saw a trackback comment from SQLBlimp that listed my index defrag script as #3 on its SQL Server Helpfulness Hall of Fame. This filled me with mixed emotions; while it made me happy that the script has been so helpful to so many people, I couldn’t help but feel sad because I knew I would no longer be keeping it updated. I have spent countless hours on this script, which has won numerous awards and has been mentioned in blogs, presentations, online publications, & books. It’s also still extremely popular; I stopped paying attention to its metrics when it broke 100,000 downloads 3 years ago. Similarly, I’ve been surprised to observe that this site, which broke 1 million visits in April 2014, now receives more web traffic today than it ever did at the height of my blogging activities.

For this reason, I have made two decisions. First, I will keep this blog online so that people can continue to stumble across blog posts that, while a couple of years old, are still largely relevant.

Second, I have decided to open-source all of my SQL Server procedures & scripts on GitHub. I haven’t yet had a chance to upload all of my scripts, but I started with the most popular ones (based on traffic stats). These scripts are available in a single repository to make downloads easy. And, if you are so inclined, feel free to contribute back to the repo. For the index defrag script in particular, I have some automated regression tests that I can run against any pull requests to aid in merge approvals.

One final comment. This may be the end of my SQLFool era, but I hope to maintain the friendships I’ve made along this incredible journey. I wholeheartedly believe that the SQL Server community is the best damn technical community in the world. Thank you ALL for your camaraderie, friendship, help, support, and awesomeness. Keep kickin’ ass, my friends. :)



SqlDev PowerShell 102: Working with Data and Variables

In my previous post, SqlDev PowerShell 101: Getting Started with PowerShell, I walked through how to configure and use PowerShell to run basic commands. This post will cover working with data and variables in PowerShell. Specifically, I’m going to demonstrate how to connect to a database (i.e. SQL Server) using Invoke-Sqlcmd, retrieve data, store the data in a variable, and work with that variable.

Note: Invoke-Sqlcmd does require the use of the sqlps module. If you don’t have it installed yet, check here for more information.
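If PowerShell complains that the cmdlet isn’t recognized, the module probably just isn’t loaded yet. Something along these lines usually does the trick (module specifics vary by SQL Server version, so treat this as a sketch):

# Load the sqlps module; -DisableNameChecking suppresses the warning
# about the module's use of unapproved verb names.
Import-Module sqlps -DisableNameChecking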

From a new PowerShell window, run the following command:

Invoke-Sqlcmd -Query "SELECT DB_NAME();"


Tip: You can auto-complete “Invoke-Sqlcmd” after typing “Invoke-S” and hitting TAB on your keyboard.

What’s happening here? Well, because no server instance was specified, the cmdlet connects to the default instance on the local computer and runs the query specified. “Column1” is returned just as it would be in SQL Server Management Studio (SSMS), because no column name was specified. You can see this by running:

Invoke-Sqlcmd -Query "SELECT DB_NAME() AS MyDatabaseName;"


But what if you don’t have a local database instance, or you want to connect to another database? No problem. Just add the -ServerInstance parameter:

Invoke-Sqlcmd -Query "SELECT DB_NAME() AS MyDatabaseName;" -ServerInstance "."

“.” refers to the local, default SQL Server instance. If you want to connect to a named or remote instance, just replace this value with your server name.
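For example, to point the cmdlet at a hypothetical named instance (MyServer\MyInstance is a placeholder; substitute your own server and instance names):

Invoke-Sqlcmd -Query "SELECT DB_NAME() AS MyDatabaseName;" -ServerInstance "MyServer\MyInstance"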

Tip: Hit the “Up” arrow to retrieve the previous command, then just edit it.


So that’s all pretty easy so far, but honestly, if all I needed to do was run a simple query, I’d do it in SSMS. No, the real power of PowerShell lies in the .NET Framework. So let’s see this in action by assigning our results to a variable. A variable is defined using the dollar sign ($) and a name; defining a type is optional.
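For example, both of these assignments are valid; the second just pins the variable to a specific .NET type (the $MyServer name here is purely for illustration):

$MyServer = "."
[string]$MyServer = "."

Let’s do that now with our query results.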

$DbName = Invoke-Sqlcmd -Query "SELECT DB_NAME() AS MyDatabaseName;" -ServerInstance "."


To view the value stored in the variable, just append a semicolon (;) and the variable’s name to the previous command.

$DbName = Invoke-Sqlcmd -Query "SELECT DB_NAME() AS MyDatabaseName;" -ServerInstance "."; $DbName


Optionally, you could also just type the variable’s name and press Enter:

$DbName

Now let’s run a query that returns multiple columns:

$DbDetails = Invoke-Sqlcmd -Query "SELECT database_id, name, state_desc, recovery_model_desc FROM sys.databases WHERE name = 'master';" -ServerInstance "."; $DbDetails


We didn’t define a data type, so let’s see what data type PowerShell assigned for us. To do this, we’ll use the GetType method:

$DbDetails.GetType()

You can see that the variable is defined as a system object; specifically, it’s a data row. That’s important, because it determines how we’ll interact with the variable. For instance, what if we wanted to reference the value stored in the variable, not the variable as an object? You can do this by referencing the column stored in the object directly, i.e.

$

Now let’s run the same query, but remove the database criteria from the WHERE statement:

$DbDetailsAll = Invoke-Sqlcmd -Query "SELECT database_id, name, state_desc, recovery_model_desc FROM sys.databases;" -ServerInstance "."; $DbDetailsAll


Your results will vary, of course, but the important thing is that you should have more than 1 row returned.

Now let’s return just the database names again:

$DbDetailsAll | Select-Object name

Hopefully you’re starting to see how retrieving an array of values and accessing just a single attribute could be useful.

Let’s now iterate through our list of databases and perform an action. In this example, we’ll just iterate through each database and retrieve the count of tables. To do this, we’ll need to use a Foreach loop.

Foreach ($DatabaseRow In $DbDetailsAll) {$TableCount = Invoke-Sqlcmd -Query "SELECT COUNT(*) AS TableCount FROM sys.tables;" -ServerInstance "." -Database $; $TableCount }


This is the biggest statement we’ve written yet, so let’s take a minute to break it down.

We’re starting the loop by calling Foreach, then we’re specifying how we’re iterating through the array. As each row in the array comes into focus, we’re storing the row in the $DatabaseRow variable that we’ve just defined. Everything specified inside the curly brackets ({}) is executing for each row in the array. We’re then connecting to the local instance and changing the context of the database to be the value currently stored in $DatabaseRow. The results are stored in a new variable, $TableCount, which is then outputted to the screen.
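By the way, if the one-liner is hard to read, here’s the exact same loop spread across multiple lines with comments:

# Iterate over each row returned by our earlier query
Foreach ($DatabaseRow In $DbDetailsAll)
{
    # Run the count query in the context of the current row's database
    $TableCount = Invoke-Sqlcmd -Query "SELECT COUNT(*) AS TableCount FROM sys.tables;" `
                                -ServerInstance "." `
                                -Database $

    # Output the result to the screen
    $TableCount
}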

It’s actually a pretty short piece of code, given all that it’s doing. But it’s difficult to tell which TableCount maps to which database, so let’s add that information to our output:

Foreach ($DatabaseRow In $DbDetailsAll) {$TableCount = Invoke-Sqlcmd -Query "SELECT '$' AS DatabaseName, COUNT(*) AS TableCount FROM sys.tables;" -ServerInstance "." -Database $; $TableCount }


Woah, that’s not what we want! PowerShell is displaying the object type, not the value. In order to actually retrieve the value from an object variable, we’re going to store the database name in a string variable first, then reference that newly-defined variable.

Foreach ($DatabaseRow In $DbDetailsAll) {$DatabaseName = $; $TableCount = Invoke-Sqlcmd -Query "SELECT '$DatabaseName' AS DatabaseName, COUNT(*) AS TableCount FROM sys.tables;" -ServerInstance "." -Database $DatabaseName; $TableCount }


Voila! Now we have a list of databases on a server and the associated count of tables in each database.
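As an aside, PowerShell’s subexpression syntax, $(...), will also expand an object property inside a double-quoted string, so you could skip the intermediate variable entirely; a minimal sketch:

# $($ evaluates the property before the string is expanded
Foreach ($DatabaseRow In $DbDetailsAll) { Invoke-Sqlcmd -Query "SELECT '$($' AS DatabaseName, COUNT(*) AS TableCount FROM sys.tables;" -ServerInstance "." -Database $ }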

If you’re wondering why you’d ever actually use this PowerShell script, well… you probably wouldn’t. But it’s important to have these building blocks before going into more advanced concepts. In my next post, we’ll be creating a config table of SQL Server instances, connecting to each server, retrieving details about missing indexes, and writing those results out to disk. That should be a little more useful. ;)

Until then… happy scripting! :)

Go Daddy Insiders

I don’t hide the fact that I work at Go Daddy. All discussions of advertising methods aside, it’s a great company to work for. Not only am I treated well as an employee, I also get to work in a world-class technical environment. However, the marketing campaigns tend to steal the spotlight. As a result, few people are aware of the technology it takes to be the #1 hosting provider in the world. Some examples of little-known facts about Go Daddy:

  • 10 billion DNS queries answered daily
  • Over 35,000 servers & 100,000 square feet of state-of-the-art global data centers
  • 25 petabytes — yes, petabytes! — of networked data storage

Pretty cool, huh? Go Daddy has launched a new blog called Inside Go Daddy as a way to share all the nitty gritty details of what it takes to support this kind of environment. Here’s a blurb from the site:

This is your inside source for what’s going on with Go Daddy’s tech experts. You’ll get insight and opinions from Go Daddy’s tech leaders on industry topics, company projects & open source initiatives … the leading edge, unconventional, “behind-the-scenes” information you won’t find anywhere else. It’s not PR, it’s not executive talk, it’s the story straight from Go Daddy’s developers, engineers & IT personnel.

Shockingly, I’ve signed up to blog about database scalability. ;) I’ve just started a new series that explores the tuning and design changes required to support 27k transactions per second during the airing of Go Daddy’s Super Bowl commercials. Those who attended my Summit 2009 session might recognize some of the high-level content, but this series will explore the topics in depth and with never-before-revealed detail. My first article, Scaling the Database: Data Types, is now live.

If you find the content helpful or interesting, please share the article or leave a comment. My employer monitors blog traffic, and we have a bit of a contest going on to see what topics get the most hits. Quite frankly, it’d be cool if the SQL Server topics outperformed the NoSQL topics. ;)

Also, I’ll entertain topic requests, so if there’s something you’re just dying to know about what we do or how we do it, let me know. :)

The SQL Server DBA’s Guide to Teradata

Six months ago, I returned from maternity leave only to be handed the proverbial keys to a new database kingdom: Teradata. Being a long-time devotee of all things Microsoft SQL Server, I had quite a bit of learning to do to get up to speed on this new environment. In an effort to save others from the same headaches that plagued me the first few months, I have created a new guide to aid those new to the Teradata platform. I will primarily focus on what I have found most lacking: examples of how to convert SQL Server T-SQL to Teradata SQL. Rather than create a series of posts with random tidbits, I thought it would make more sense to create a single page for this guide and update it incrementally. You can find the very beginnings of this guide here: The SQL Server DBA’s Guide to Teradata. Good luck, and happy converting. :)

TVP Permissions

Here’s just a short post to feed the search engine gerbils. Today, I created a stored procedure that accepted a list of IDs and filtered them. The most efficient way to handle this was using table-valued parameters (TVPs). My code looks similar to this:

USE AdventureWorks2012;

/* Create our new table type */
IF NOT EXISTS(SELECT * FROM sys.types WHERE name = 'CustomerList')
    CREATE TYPE dbo.CustomerList AS TABLE
    (
          CustomerID INT
        , ID         INT
    );
GO

IF OBJECTPROPERTY(OBJECT_ID('dbo.CustomerTerritoryFilterGet_sp')
    , N'IsProcedure') IS NOT NULL
    DROP PROCEDURE dbo.CustomerTerritoryFilterGet_sp;
GO

/* Create a proc to test with */
CREATE PROCEDURE dbo.CustomerTerritoryFilterGet_sp
      @Customers        CustomerList READONLY
    , @TerritoryFilter  INT
AS
BEGIN

    SELECT DISTINCT sc.CustomerID, c.ID
    FROM Sales.Customer AS sc
    JOIN @Customers AS c
        ON sc.CustomerID = c.CustomerID
    WHERE sc.TerritoryID = @TerritoryFilter;

    RETURN 0;

END
GO

/* Test our stored procedure */
DECLARE @myList CustomerList;

INSERT INTO @myList (CustomerID, ID)
VALUES (1, 10), (2, 20), (3, 30), (4, 40), (5, 50), (6, 60), (7, 70);

EXECUTE dbo.CustomerTerritoryFilterGet_sp
      @Customers        = @myList
    , @TerritoryFilter  = 1;

GRANT EXECUTE ON dbo.CustomerTerritoryFilterGet_sp To myApp;

This would typically be sufficient for most stored procedures. However, my app dev was getting the following error:

The EXECUTE permission was denied on the object 'customerList', database 'AdventureWorks2012', schema 'dbo'.

Wait, we need to explicitly grant permissions to the new data type? Okay, that’s easy enough… let’s just run a quick GRANT statement:

GRANT EXECUTE ON dbo.CustomerList To myApp;
Msg 15151, Level 16, State 1, Line 1
Cannot find the object 'CustomerList', because it does not exist or you do not have permission.

Okay, now I’m confused… what am I missing? It turns out, you need to use a slightly different syntax to grant permissions to a new data type. This syntax, which explicitly tells SQL Server that we’re granting permissions on a Type class, works just fine:

GRANT EXECUTE ON TYPE::dbo.CustomerList To myApp;

Command(s) completed successfully.

/* Time to clean up! */
-- DROP PROC dbo.CustomerTerritoryFilterGet_sp;
-- DROP TYPE dbo.CustomerList;

You can find the full GRANT syntax in Books Online.

Tidbits I Discovered Today…

I’ve figured out a couple of tidbits today that I wanted to share. First and foremost, I’ve discovered a (new to me) option in SSMS to convert tabs to spaces. I had previously seen the “Untabify Selected Lines” option (under Edit -> Advanced), but this only seemed to remove the tabs at the beginning of the line; it would not remove in-line tabs. I’ve now found another option that will remove the tabs throughout the selected code.

Here’s how you can do it:

Disclaimer: I’m using SSMS 2008 and have not confirmed this in other versions of SSMS.

From your toolbar, right-click in the empty gray space, or click on the drop-down arrow. Navigate to Customize:

Click on the Commands tab, then navigate down to Edit -> Convert Tabs to Spaces.
Note: if you’re one of those weirdos who like tabs instead of spaces, you can convert in the other (wrong) direction, too. ;)

Once you have found the option you want, click on it and drag it to your toolbar.

Cool, huh?

Another little thing I ran into today was a mismatch between nullable and non-nullable columns in related tables. I started to perform a manual audit before I said, “There has to be a faster way.” And you know what? There was. Here’s the little script I used:

SELECT AS [columnName]
    , c1.is_nullable AS [FactInternetSales_isNullable]
    , c2.is_nullable AS [FactResellerSales_isNullable]
/* replace AdventureWorksDW2008R2 with your database */
FROM AdventureWorksDW2008R2.sys.columns AS c1 
JOIN AdventureWorksDW2008R2.sys.columns AS c2
    ON =
WHERE c1.object_id = object_id('dbo.FactInternetSales') -- replace with your table
    AND c2.object_id = object_id('dbo.FactResellerSales') -- replace with your table
    AND c1.is_nullable <> c2.is_nullable; -- find discrepancies

This returned the following results:

columnName             FactInternetSales_isNullable FactResellerSales_isNullable
---------------------- ---------------------------- ----------------------------
RevisionNumber         0                            1
OrderQuantity          0                            1
UnitPrice              0                            1
ExtendedAmount         0                            1
UnitPriceDiscountPct   0                            1
DiscountAmount         0                            1
ProductStandardCost    0                            1
TotalProductCost       0                            1
SalesAmount            0                            1
TaxAmt                 0                            1
Freight                0                            1
(11 row(s) affected)

This script isn’t as polished as my usual, but it’s really just a quick-and-dirty way of checking for mismatches in NULL properties.
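If you want to cast a wider net, the same sys.columns approach extends to other column properties, too. Here’s a quick variation (consider it a sketch to adapt, not gospel) that also flags data type and length mismatches:

SELECT AS [columnName]
    , c1.is_nullable AS [FactInternetSales_isNullable]
    , c2.is_nullable AS [FactResellerSales_isNullable]
FROM AdventureWorksDW2008R2.sys.columns AS c1
JOIN AdventureWorksDW2008R2.sys.columns AS c2
    ON =
WHERE c1.object_id = OBJECT_ID('AdventureWorksDW2008R2.dbo.FactInternetSales')
    AND c2.object_id = OBJECT_ID('AdventureWorksDW2008R2.dbo.FactResellerSales')
    AND (c1.is_nullable <> c2.is_nullable        -- nullability mismatches
      OR c1.system_type_id <> c2.system_type_id  -- data type mismatches
      OR c1.max_length <> c2.max_length);        -- length mismatches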

Okay, that’s all I have for now, my SQL friends. Oh, and if you know any other cool little tidbits, please feel free to share with me. :)

BCP Script Generator

I’m currently working on the logic migration of data marts from SQL Server to Teradata. While another team is working on the actual table migration, it’s still helpful to have some data in the tables for developing against. The easiest method I’ve found to do this is to use BCP to export some sample data. So of course, I’ve created a SQL script that will generate the BCP code for me. Because that’s what I like to do on Sunday evenings.

-- User-defined variables --
DECLARE @tableToBCP NVARCHAR(128)   = 'AdventureWorksDW2008R2.dbo.DimCustomer'
    , @Top          VARCHAR(10)     = NULL -- Leave NULL for all rows
    , @Delimiter    VARCHAR(4)      = '|'
    , @UseNULL      BIT             = 1
    , @OverrideChar CHAR(1)         = '~'
    , @MaxDop       CHAR(1)         = '1'
    , @Directory    VARCHAR(256)    = 'C:\bcp_output\';

-- Script-defined variables -- 
DECLARE @columnList TABLE (columnID INT);

DECLARE @bcpStatement NVARCHAR(MAX) = 'BCP "SELECT '
    , @currentID INT
    , @firstID INT;

INSERT INTO @columnList
SELECT column_id 
FROM sys.columns 
WHERE object_id = OBJECT_ID(@tableToBCP)
ORDER BY column_id;

IF @Top IS NOT NULL
    SET @bcpStatement = @bcpStatement + 'TOP (' + @Top + ') ';

SELECT @firstID = MIN(columnID) FROM @columnList;

WHILE EXISTS (SELECT * FROM @columnList)
BEGIN

    SELECT @currentID = MIN(columnID) FROM @columnList;

    IF @currentID <> @firstID
        SET @bcpStatement = @bcpStatement + ',';

    SELECT @bcpStatement = @bcpStatement + name
    FROM sys.columns 
    WHERE object_id = OBJECT_ID(@tableToBCP)
        AND column_id = @currentID;

    DELETE FROM @columnList WHERE columnID = @currentID;

END

SET @bcpStatement = @bcpStatement + ' FROM ' + @tableToBCP 
    + ' WITH (NOLOCK) OPTION (MAXDOP ' + @MaxDop + ');" queryOut '
    + @Directory + REPLACE(@tableToBCP, '.', '_') + '.dat -S' + @@SERVERNAME
    + ' -T -t"' + @Delimiter + '" -c -C;';

SELECT @bcpStatement;

This will generate a standard BCP script (the column list is truncated here for readability):

BCP "SELECT CustomerKey,GeographyKey,CustomerAlternateKey,Title,FirstName,MiddleName,... 
FROM AdventureWorksDW2008R2.dbo.DimCustomer WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut 
C:\bcp_output\AdventureWorksDW2008R2_dbo_DimCustomer.dat -SSQLFOOL -T -t"|" -c -C;

However, I’ve been running into some issues with the data load. See, the thing… I’m finding that some of my tables don’t… that is to say… they may possibly contain… [whisper]uncleansed data[/whisper]. I know, I know… this may come as a shock to many of you, and all I ask is that you please don’t judge me for it. ;)

What do I mean by “uncleansed data?” I mostly mean user-inputted VARCHAR columns that contain pipes (|), tabs, carriage returns, and line feeds. These types of characters tend to mess with the data import process. Also, I’ve not yet found a way to import a data file into Teradata where a non-nullable character column contains an empty string (''). Obviously, the vast majority of the data is fine, but even one of these issues can throw an error during the import process. I’ve modified the script above to handle these specific exceptions.

Since I’m only using this data for testing purposes, I found it pretty easy to simply replace the offending records with ~. I’m not sure if anyone else has a need for this particular script, but I figured you could modify it pretty easily to do whatever you need.

-- User-defined variables --
DECLARE @tableToBCP NVARCHAR(128)   = 'AdventureWorksDW2008R2.dbo.DimCustomer'
    , @Top          VARCHAR(10)     = NULL -- Leave NULL for all rows
    , @Delimiter    VARCHAR(4)      = '|'
    , @UseNULL      BIT             = 1
    , @OverrideChar CHAR(1)         = '~'
    , @MaxDop       CHAR(1)         = '1'
    , @Directory    VARCHAR(256)    = 'C:\bcp_output\';

-- Script-defined variables -- 
DECLARE @columnList TABLE (columnID INT);

DECLARE @bcpStatement NVARCHAR(MAX) = 'BCP "SELECT '
    , @currentID INT
    , @firstID INT;

INSERT INTO @columnList
SELECT column_id 
FROM sys.columns 
WHERE object_id = OBJECT_ID(@tableToBCP)
ORDER BY column_id;

IF @Top IS NOT NULL
    SET @bcpStatement = @bcpStatement + 'TOP (' + @Top + ') ';

SELECT @firstID = MIN(columnID) FROM @columnList;

WHILE EXISTS (SELECT * FROM @columnList)
BEGIN

    SELECT @currentID = MIN(columnID) FROM @columnList;

    IF @currentID <> @firstID
        SET @bcpStatement = @bcpStatement + ',';

    SELECT @bcpStatement = @bcpStatement + 
                            CASE -- character data types only (nvarchar, varchar, char, nchar)
                                WHEN user_type_id IN (231, 167, 175, 239) 
                                THEN 'CASE WHEN ' + name + ' = '''' THEN ' 
                                    + CASE 
                                        WHEN is_nullable = 1 THEN 'NULL' 
                                        ELSE '''' + REPLICATE(@OverrideChar, max_length) + ''''
                                      END
                                    + ' WHEN ' + name + ' LIKE ''%' + @Delimiter + '%'''
                                        + ' OR ' + name + ' LIKE ''%'' + CHAR(9) + ''%''' -- tab
                                        + ' OR ' + name + ' LIKE ''%'' + CHAR(10) + ''%''' -- line feed
                                        + ' OR ' + name + ' LIKE ''%'' + CHAR(13) + ''%''' -- carriage return
                                        + ' THEN ' 
                                        + CASE 
                                            WHEN is_nullable = 1 THEN 'NULL' 
                                            ELSE '''' + REPLICATE(@OverrideChar, max_length) + ''''
                                          END
                                    + ' ELSE ' + name + ' END' 
                                ELSE name 
                            END
    FROM sys.columns 
    WHERE object_id = OBJECT_ID(@tableToBCP)
        AND column_id = @currentID;

    DELETE FROM @columnList WHERE columnID = @currentID;

END

SET @bcpStatement = @bcpStatement + ' FROM ' + @tableToBCP 
    + ' WITH (NOLOCK) OPTION (MAXDOP ' + @MaxDop + ');" queryOut '
    + @Directory + REPLACE(@tableToBCP, '.', '_') + '.dat -S' + @@SERVERNAME
    + ' -T -t"' + @Delimiter + '" -c -C;';

SELECT @bcpStatement;

The sample output of this would look like:

BCP "SELECT CustomerKey,GeographyKey,CASE WHEN CustomerAlternateKey = '' THEN 
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' WHEN CustomerAlternateKey LIKE '%|%' OR 
CustomerAlternateKey LIKE '%' + CHAR(9) + '%' OR CustomerAlternateKey LIKE 
'%' + CHAR(10) + '%' OR CustomerAlternateKey LIKE '%' + CHAR(13) + '%' 
THEN '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' ELSE CustomerAlternateKey END,
CASE WHEN Title = '' THEN NULL WHEN Title LIKE '%|%' OR Title LIKE '%' + CHAR(9)
 + '%' OR Title LIKE '%' + CHAR(10) + '%' OR Title LIKE '%' + CHAR(13) + '%' 
THEN NULL ELSE Title END,CASE WHEN FirstName = '' THEN NULL WHEN FirstName 
LIKE '%|%' OR FirstName LIKE '%' + CHAR(9) + '%' OR FirstName LIKE '%' + 
CHAR(10) + '%' OR FirstName LIKE '%' + CHAR(13) + '%' THEN NULL ELSE 
FirstName END,CASE WHEN MiddleName = '' THEN NULL WHEN MiddleName LIKE '%|%'
 OR MiddleName LIKE '%' + CHAR(9) + '%' OR MiddleName LIKE '%' + CHAR(10) +
 '%' OR MiddleName LIKE '%' + CHAR(13) + '%' THEN NULL ELSE MiddleName END,
CASE WHEN LastName = '' THEN NULL WHEN LastName LIKE '%|%' OR LastName LIKE
 '%' + CHAR(9) + '%' OR LastName LIKE '%' + CHAR(10) + '%' OR LastName LIKE
 '%' + CHAR(13) + '%' THEN NULL ELSE LastName END,NameStyle,BirthDate,CASE 
WHEN MaritalStatus = '' THEN NULL WHEN MaritalStatus LIKE '%|%' OR 
MaritalStatus LIKE '%' + CHAR(9) + '%' OR MaritalStatus LIKE '%' + CHAR(10) 
+ '%' OR MaritalStatus LIKE '%' + CHAR(13) + '%' THEN NULL ELSE MaritalStatus 
END,CASE WHEN Suffix = '' THEN NULL WHEN Suffix LIKE '%|%' OR Suffix LIKE '%' 
+ CHAR(9) + '%' OR Suffix LIKE '%' + CHAR(10) + '%' OR Suffix LIKE '%' + 
CHAR(13) + '%' THEN NULL ELSE Suffix END,CASE WHEN Gender = '' THEN NULL 
WHEN Gender LIKE '%|%' OR Gender LIKE '%' + CHAR(9) + '%' OR Gender LIKE '%' 
+ CHAR(10) + '%' OR Gender LIKE '%' + CHAR(13) + '%' THEN NULL ELSE Gender 
END,CASE WHEN EmailAddress = '' THEN NULL WHEN EmailAddress LIKE '%|%' OR 
EmailAddress LIKE '%' + CHAR(9) + '%' OR EmailAddress LIKE '%' + CHAR(10) + 
'%' OR EmailAddress LIKE '%' + CHAR(13) + '%' THEN NULL ELSE EmailAddress END,
YearlyIncome,TotalChildren,NumberChildrenAtHome, CASE WHEN EnglishEducation = '' 
THEN NULL WHEN EnglishEducation LIKE '%|%' OR 
EnglishEducation LIKE '%' + CHAR(9) + '%' OR EnglishEducation LIKE '%' + 
CHAR(10) + '%' OR EnglishEducation LIKE '%' 
+ CHAR(13) + '%' THEN NULL ELSE EnglishEducation END,CASE WHEN 
SpanishEducation = '' THEN NULL WHEN SpanishEducation LIKE '%|%' OR 
SpanishEducation LIKE '%' + CHAR(9) + '%' OR SpanishEducation LIKE '%' + 
CHAR(10) + '%' OR SpanishEducation LIKE '%' + CHAR(13) + '%' THEN NULL 
ELSE SpanishEducation END,CASE WHEN FrenchEducation = '' THEN NULL WHEN 
FrenchEducation LIKE '%|%' OR FrenchEducation LIKE '%' + CHAR(9) + '%' 
OR FrenchEducation LIKE '%' + CHAR(10) + '%' OR FrenchEducation LIKE '%' 
+ CHAR(13) + '%' THEN NULL ELSE FrenchEducation END,CASE WHEN 
EnglishOccupation = '' THEN NULL WHEN EnglishOccupation LIKE '%|%' OR 
EnglishOccupation LIKE '%' + CHAR(9) + '%' OR EnglishOccupation LIKE '%' 
+ CHAR(10) + '%' OR EnglishOccupation LIKE '%' + CHAR(13) + '%' THEN 
NULL ELSE EnglishOccupation END,CASE WHEN SpanishOccupation = '' THEN 
NULL WHEN SpanishOccupation LIKE '%|%' OR SpanishOccupation LIKE '%' 
+ CHAR(9) + '%' OR SpanishOccupation LIKE '%' + CHAR(10) + '%' OR 
SpanishOccupation LIKE '%' + CHAR(13) + '%' THEN NULL ELSE SpanishOccupation 
END,CASE WHEN FrenchOccupation = '' THEN NULL WHEN FrenchOccupation LIKE 
'%|%' OR FrenchOccupation LIKE '%' + CHAR(9) + '%' OR FrenchOccupation 
LIKE '%' + CHAR(10) + '%' OR FrenchOccupation LIKE '%' + CHAR(13) + '%' 
THEN NULL ELSE FrenchOccupation END,CASE WHEN HouseOwnerFlag = '' THEN 
NULL WHEN HouseOwnerFlag LIKE '%|%' OR HouseOwnerFlag LIKE '%' + CHAR(9) 
+ '%' OR HouseOwnerFlag LIKE '%' + CHAR(10) + '%' OR HouseOwnerFlag LIKE 
'%' + CHAR(13) + '%' THEN NULL ELSE HouseOwnerFlag END,NumberCarsOwned,CASE 
WHEN AddressLine1 = '' THEN NULL WHEN AddressLine1 LIKE '%|%' OR AddressLine1 
LIKE '%' + CHAR(9) + '%' OR AddressLine1 LIKE '%' + CHAR(10) + '%' OR 
AddressLine1 LIKE '%' + CHAR(13) + '%' THEN NULL ELSE AddressLine1 END,CASE 
WHEN AddressLine2 = '' THEN NULL WHEN AddressLine2 LIKE '%|%' OR AddressLine2 
LIKE '%' + CHAR(9) + '%' OR AddressLine2 LIKE '%' + CHAR(10) + '%' OR 
AddressLine2 LIKE '%' + CHAR(13) + '%' THEN NULL ELSE AddressLine2 END,CASE 
WHEN Phone = '' THEN NULL WHEN Phone LIKE '%|%' OR Phone LIKE '%' + CHAR(9) 
+ '%' OR Phone LIKE '%' + CHAR(10) + '%' OR Phone LIKE '%' + CHAR(13) + '%' 
THEN NULL ELSE Phone END,DateFirstPurchase,CASE WHEN CommuteDistance = '' 
THEN NULL WHEN CommuteDistance LIKE '%|%' OR CommuteDistance LIKE '%' + 
CHAR(9) + '%' OR CommuteDistance LIKE '%' + CHAR(10) + '%' OR CommuteDistance 
LIKE '%' + CHAR(13) + '%' THEN NULL ELSE CommuteDistance END 
FROM AdventureWorksDW2008R2.dbo.DimCustomer WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut 
C:\bcp_output\AdventureWorksDW2008R2_dbo_DimCustomer.dat -SSQLFOOL -T -t"|" -c -C;

I don’t know about you, but that would take me a bit longer to write manually than to execute the script above. :)

One note: I’ve found that copying this code into a batch file will actually render the CHAR functions, i.e. CHAR(13) will be replaced with a carriage return in the script. To avoid this, copy and paste the BCP script directly into your command window.

Not familiar with BCP? Refer to my blog post on BCP Basics to help get you started.

A question for my blog readers…

My role at work has recently changed. While I will continue to be working with SQL Server, I will also be spending a good chunk of my time working in our newly acquired Teradata environment. My plan is to blog about both as time permits, but I am trying to determine whether I should create a separate blog for the Teradata content. So my question for you, dear reader, is whether you think I should blog on this site about both SQL Server *and* Teradata, or whether I should separate the content? Any input is appreciated.

Are You Approaching Your Partition Range Limits?

In my post last week, How To Estimate Data Utilization, I said that it may be my last post for a while. Well… apparently I lied. :)

For those of you who use table partitioning, you know that you need to define a partition scheme and function prior to applying partitioning to an index. Personally, I tend to build my functions a couple of years out, and I create the ranges through the end of a calendar year. Now, if I failed to expand a partition range at the end of the year, then come January 1st, all of my data would be written to the same partition. Not the end of the world, no, but it causes all kinds of nasty performance and maintenance issues. Thus, as part of my end-of-year / maternity-leave preparations, I’m in the process of examining all partition functions to identify those that need to have their ranges expanded. For those interested, here’s the script I used:

IF OBJECT_ID('tempdb..#Results') IS NOT NULL
    DROP TABLE #Results;

CREATE TABLE #Results
(
      databaseName  NVARCHAR(128)
    , schemaName    NVARCHAR(128)
    , functionName  NVARCHAR(128)
    , data_space_id INT
    , maxRangeValue SQL_VARIANT
);

/* Grab results for each database and store in our temp table.  
   And no, I don't *need* to select from sys.indexes and perform 
   left joins, but I'm overly cautious and want to make sure 
   I'm not accidentally missing any databases. :) */
--EXECUTE master.dbo.sp_msforeachdb
INSERT INTO #Results
EXECUTE sp_foreachdb 'USE ?;
SELECT DB_NAME() AS databaseName
    , sps.name AS schemaName
    , spf.name AS functionName
    , sps.data_space_id 
    , MAX(prv.value) AS maxRangeValue
FROM sys.indexes AS i
LEFT JOIN sys.partition_schemes AS sps WITH (NOLOCK)
    ON i.data_space_id = sps.data_space_id
LEFT JOIN sys.partition_functions AS spf WITH (NOLOCK)
    ON sps.function_id = spf.function_id
LEFT JOIN sys.partition_range_values AS prv WITH (NOLOCK)
    ON spf.function_id = prv.function_id
GROUP BY sps.name
    , spf.name
    , sps.data_space_id;';

/* sp_foreachdb was written by SQL MVP Aaron Bertrand. 
   Alternatively, you can also use sys.sp_MSforeachdb. */

/* Make sure we're not missing any major databases */
SELECT * FROM sys.databases WHERE name NOT IN (SELECT databaseName FROM #Results);

/* Retrieve our results */
SELECT *
FROM #Results
ORDER BY maxRangeValue;

Example Results:

databaseName        schemaName                      functionName                          data_space_id   maxRangeValue
------------------- ------------------------------- ------------------------------------- --------------- -------------------------
HistoricalMart      dailyRangeDate_ps               dailyRangeDate_pf                     65609           2011-12-31 00:00:00.000
AdventureWorks      yearlyRangeSmallDateTime_ps     yearlyRangeSmallDateTime_pf           65605           2012-01-01 00:00:00.000
dbaTools            monthlyRangeDateTime_ps         monthlyRangeDateTime_pf               65604           2012-12-01 00:00:00.000
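Once you’ve identified a function that’s about to run out of boundaries, expanding it is just a matter of an ALTER PARTITION SCHEME / ALTER PARTITION FUNCTION pair. For example, to add a January 2013 boundary to the monthly function from the results above (assuming its next partition should land on the PRIMARY filegroup; substitute your own filegroup and boundary value):

/* Tell the partition scheme which filegroup the next partition will use... */
ALTER PARTITION SCHEME monthlyRangeDateTime_ps
    NEXT USED [PRIMARY];

/* ...then split the function to create the new boundary */
ALTER PARTITION FUNCTION monthlyRangeDateTime_pf()
    SPLIT RANGE ('2013-01-01 00:00:00.000');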