BCP Script Generator

I’m currently working on the logic migration of data marts from SQL Server to Teradata. While another team is working on the actual table migration, it’s still helpful to have some data in the tables for developing against. The easiest method I’ve found to do this is to use BCP to export some sample data. So of course, I’ve created a SQL script that will generate the BCP code for me. Because that’s what I like to do on Sunday evenings.

-- User-defined variables --
 
DECLARE @tableToBCP NVARCHAR(128)   = 'AdventureWorksDW2008R2.dbo.DimCustomer'
    , @Top          VARCHAR(10)     = NULL -- Leave NULL for all rows
    , @Delimiter    VARCHAR(4)      = '|'
    , @UseNULL      BIT             = 1
    , @OverrideChar CHAR(1)         = '~'
    , @MaxDop       CHAR(1)         = '1'
    , @Directory    VARCHAR(256)    = 'C:\bcp_output\';
 
 
-- Script-defined variables -- 
 
DECLARE @columnList TABLE (columnID INT);
 
DECLARE @bcpStatement NVARCHAR(MAX) = 'BCP "SELECT '
    , @currentID INT
    , @firstID INT;
 
INSERT INTO @columnList
SELECT column_id 
FROM sys.columns 
WHERE object_id = OBJECT_ID(@tableToBCP)
ORDER BY column_id;
 
IF @Top IS NOT NULL
    SET @bcpStatement = @bcpStatement + 'TOP (' + @Top + ') ';
 
SELECT @firstID = MIN(columnID) FROM @columnList;
 
WHILE EXISTS(SELECT * FROM @columnList)
BEGIN
 
    SELECT @currentID = MIN(columnID) FROM @columnList;
 
    IF @currentID <> @firstID
        SET @bcpStatement = @bcpStatement + ',';
 
    SELECT @bcpStatement = @bcpStatement + name
    FROM sys.columns 
    WHERE object_id = OBJECT_ID(@tableToBCP)
        AND column_id = @currentID;
 
    DELETE FROM @columnList WHERE columnID = @currentID;
 
 
END;
 
SET @bcpStatement = @bcpStatement + ' FROM ' + @tableToBCP 
    + ' WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut '
    + @Directory + REPLACE(@tableToBCP, '.', '_') + '.dat -S' + @@SERVERNAME
    + ' -T -t"' + @Delimiter + '" -c -C;'
 
SELECT @bcpStatement;

This will generate a standard BCP script:

BCP "SELECT CustomerKey,GeographyKey,CustomerAlternateKey,Title,FirstName,MiddleName,
LastName,NameStyle,BirthDate,MaritalStatus,Suffix,Gender,EmailAddress,YearlyIncome,
TotalChildren,NumberChildrenAtHome,EnglishEducation,SpanishEducation,FrenchEducation,
EnglishOccupation,SpanishOccupation,FrenchOccupation,HouseOwnerFlag,NumberCarsOwned,
AddressLine1,AddressLine2,Phone,DateFirstPurchase,CommuteDistance 
FROM AdventureWorksDW2008R2.dbo.DimCustomer WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut 
C:\bcp_output\AdventureWorksDW2008R2_dbo_DimCustomer.dat -SSQLFOOL -T -t"|" -c -C;

However, I’ve been running into some issues with the data load. See, the thing… I’m finding that some of my tables don’t… that is to say… they may possibly contain… [whisper]uncleansed data[/whisper]. I know, I know… this may come as a shock to many of you, and all I ask is that you please don’t judge me for it. ;)

What do I mean by “uncleansed data?” I mostly mean user-inputted VARCHAR columns that contain pipes (|), tabs, carriage returns, and line feeds. These types of characters tend to mess with the data import process. Also, I’ve not yet found a way to import a data file into Teradata where a non-nullable character column contains an empty string (”). Obviously, the vast majority of the data is fine, but even one of these issues can throw an error during the import process. I’ve modified the script above to handle these specific exceptions.

Since I’m only using this data for testing purposes, I found it pretty easy to simply replace the offending records with ~. I’m not sure if anyone else has a need for this particular script, but I figured you could modify it pretty easily to do whatever you need.

-- User-defined variables --
 
DECLARE @tableToBCP NVARCHAR(128)   = 'AdventureWorksDW2008R2.dbo.DimCustomer'
    , @Top          VARCHAR(10)     = NULL -- Leave NULL for all rows
    , @Delimiter    VARCHAR(4)      = '|'
    , @UseNULL      BIT             = 1
    , @OverrideChar CHAR(1)         = '~'
    , @MaxDop       CHAR(1)         = '1'
    , @Directory    VARCHAR(256)    = 'C:\bcp_output\';
 
 
-- Script-defined variables -- 
 
DECLARE @columnList TABLE (columnID INT);
 
DECLARE @bcpStatement NVARCHAR(MAX) = 'BCP "SELECT '
    , @currentID INT
    , @firstID INT;
 
INSERT INTO @columnList
SELECT column_id 
FROM sys.columns 
WHERE object_id = OBJECT_ID(@tableToBCP)
ORDER BY column_id;
 
IF @Top IS NOT NULL
    SET @bcpStatement = @bcpStatement + 'TOP (' + @Top + ') ';
 
SELECT @firstID = MIN(columnID) FROM @columnList;
 
WHILE EXISTS(SELECT * FROM @columnList)
BEGIN
 
    SELECT @currentID = MIN(columnID) FROM @columnList;
 
    IF @currentID <> @firstID
        SET @bcpStatement = @bcpStatement + ',';
 
    SELECT @bcpStatement = @bcpStatement + 
                            CASE 
                                WHEN user_type_id IN (231, 167, 175, 239) 
                                THEN 'CASE WHEN ' + name + ' = '''' THEN ' 
                                    + CASE 
                                        WHEN is_nullable = 1 THEN 'NULL' 
                                        ELSE '''' + REPLICATE(@OverrideChar, max_length) + ''''
                                      END
                                    + ' WHEN ' + name + ' LIKE ''%' + @Delimiter + '%'''
                                        + ' OR ' + name + ' LIKE ''%'' + CHAR(9) + ''%''' -- tab
                                        + ' OR ' + name + ' LIKE ''%'' + CHAR(10) + ''%''' -- line feed
                                        + ' OR ' + name + ' LIKE ''%'' + CHAR(13) + ''%''' -- carriage return
                                        + ' THEN ' 
                                        + CASE 
                                            WHEN is_nullable = 1 THEN 'NULL' 
                                            ELSE '''' + REPLICATE(@OverrideChar, max_length) + ''''
                                          END
                                    + ' ELSE ' + name + ' END' 
                                ELSE name 
                            END 
    FROM sys.columns 
    WHERE object_id = OBJECT_ID(@tableToBCP)
        AND column_id = @currentID;
 
    DELETE FROM @columnList WHERE columnID = @currentID;
 
 
END;
 
SET @bcpStatement = @bcpStatement + ' FROM ' + @tableToBCP 
    + ' WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut '
    + @Directory + REPLACE(@tableToBCP, '.', '_') + '.dat -S' + @@SERVERNAME
    + ' -T -t"' + @Delimiter + '" -c -C;'
 
SELECT @bcpStatement;

The sample output of this would look like:

BCP "SELECT CustomerKey,GeographyKey,CASE WHEN CustomerAlternateKey = '' THEN 
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' WHEN CustomerAlternateKey LIKE '%|%' OR 
CustomerAlternateKey LIKE '%' + CHAR(9) + '%' OR CustomerAlternateKey LIKE 
'%' + CHAR(10) + '%' OR CustomerAlternateKey LIKE '%' + CHAR(13) + '%' 
THEN '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' ELSE CustomerAlternateKey END,
CASE WHEN Title = '' THEN NULL WHEN Title LIKE '%|%' OR Title LIKE '%' + CHAR(9)
 + '%' OR Title LIKE '%' + CHAR(10) + '%' OR Title LIKE '%' + CHAR(13) + '%' 
THEN NULL ELSE Title END,CASE WHEN FirstName = '' THEN NULL WHEN FirstName 
LIKE '%|%' OR FirstName LIKE '%' + CHAR(9) + '%' OR FirstName LIKE '%' + 
CHAR(10) + '%' OR FirstName LIKE '%' + CHAR(13) + '%' THEN NULL ELSE 
FirstName END,CASE WHEN MiddleName = '' THEN NULL WHEN MiddleName LIKE '%|%'
 OR MiddleName LIKE '%' + CHAR(9) + '%' OR MiddleName LIKE '%' + CHAR(10) +
 '%' OR MiddleName LIKE '%' + CHAR(13) + '%' THEN NULL ELSE MiddleName END,
CASE WHEN LastName = '' THEN NULL WHEN LastName LIKE '%|%' OR LastName LIKE
 '%' + CHAR(9) + '%' OR LastName LIKE '%' + CHAR(10) + '%' OR LastName LIKE
 '%' + CHAR(13) + '%' THEN NULL ELSE LastName END,NameStyle,BirthDate,CASE 
WHEN MaritalStatus = '' THEN NULL WHEN MaritalStatus LIKE '%|%' OR 
MaritalStatus LIKE '%' + CHAR(9) + '%' OR MaritalStatus LIKE '%' + CHAR(10) 
+ '%' OR MaritalStatus LIKE '%' + CHAR(13) + '%' THEN NULL ELSE MaritalStatus 
END,CASE WHEN Suffix = '' THEN NULL WHEN Suffix LIKE '%|%' OR Suffix LIKE '%' 
+ CHAR(9) + '%' OR Suffix LIKE '%' + CHAR(10) + '%' OR Suffix LIKE '%' + 
CHAR(13) + '%' THEN NULL ELSE Suffix END,CASE WHEN Gender = '' THEN NULL 
WHEN Gender LIKE '%|%' OR Gender LIKE '%' + CHAR(9) + '%' OR Gender LIKE '%' 
+ CHAR(10) + '%' OR Gender LIKE '%' + CHAR(13) + '%' THEN NULL ELSE Gender 
END,CASE WHEN EmailAddress = '' THEN NULL WHEN EmailAddress LIKE '%|%' OR 
EmailAddress LIKE '%' + CHAR(9) + '%' OR EmailAddress LIKE '%' + CHAR(10) + 
'%' OR EmailAddress LIKE '%' + CHAR(13) + '%' THEN NULL ELSE EmailAddress END,
YearlyIncome,TotalChildren,NumberChildrenAtHome, CASE WHEN EnglishEducation = '' 
THEN NULL WHEN EnglishEducation LIKE '%|%' OR 
EnglishEducation LIKE '%' + CHAR(9) + '%' OR EnglishEducation LIKE '%' + 
CHAR(10) + '%' OR EnglishEducation LIKE '%' 
+ CHAR(13) + '%' THEN NULL ELSE EnglishEducation END,CASE WHEN 
SpanishEducation = '' THEN NULL WHEN SpanishEducation LIKE '%|%' OR 
SpanishEducation LIKE '%' + CHAR(9) + '%' OR SpanishEducation LIKE '%' + 
CHAR(10) + '%' OR SpanishEducation LIKE '%' + CHAR(13) + '%' THEN NULL 
ELSE SpanishEducation END,CASE WHEN FrenchEducation = '' THEN NULL WHEN 
FrenchEducation LIKE '%|%' OR FrenchEducation LIKE '%' + CHAR(9) + '%' 
OR FrenchEducation LIKE '%' + CHAR(10) + '%' OR FrenchEducation LIKE '%' 
+ CHAR(13) + '%' THEN NULL ELSE FrenchEducation END,CASE WHEN 
EnglishOccupation = '' THEN NULL WHEN EnglishOccupation LIKE '%|%' OR 
EnglishOccupation LIKE '%' + CHAR(9) + '%' OR EnglishOccupation LIKE '%' 
+ CHAR(10) + '%' OR EnglishOccupation LIKE '%' + CHAR(13) + '%' THEN 
NULL ELSE EnglishOccupation END,CASE WHEN SpanishOccupation = '' THEN 
NULL WHEN SpanishOccupation LIKE '%|%' OR SpanishOccupation LIKE '%' 
+ CHAR(9) + '%' OR SpanishOccupation LIKE '%' + CHAR(10) + '%' OR 
SpanishOccupation LIKE '%' + CHAR(13) + '%' THEN NULL ELSE SpanishOccupation 
END,CASE WHEN FrenchOccupation = '' THEN NULL WHEN FrenchOccupation LIKE 
'%|%' OR FrenchOccupation LIKE '%' + CHAR(9) + '%' OR FrenchOccupation 
LIKE '%' + CHAR(10) + '%' OR FrenchOccupation LIKE '%' + CHAR(13) + '%' 
THEN NULL ELSE FrenchOccupation END,CASE WHEN HouseOwnerFlag = '' THEN 
NULL WHEN HouseOwnerFlag LIKE '%|%' OR HouseOwnerFlag LIKE '%' + CHAR(9) 
+ '%' OR HouseOwnerFlag LIKE '%' + CHAR(10) + '%' OR HouseOwnerFlag LIKE 
'%' + CHAR(13) + '%' THEN NULL ELSE HouseOwnerFlag END,NumberCarsOwned,CASE 
WHEN AddressLine1 = '' THEN NULL WHEN AddressLine1 LIKE '%|%' OR AddressLine1 
LIKE '%' + CHAR(9) + '%' OR AddressLine1 LIKE '%' + CHAR(10) + '%' OR 
AddressLine1 LIKE '%' + CHAR(13) + '%' THEN NULL ELSE AddressLine1 END,CASE 
WHEN AddressLine2 = '' THEN NULL WHEN AddressLine2 LIKE '%|%' OR AddressLine2 
LIKE '%' + CHAR(9) + '%' OR AddressLine2 LIKE '%' + CHAR(10) + '%' OR 
AddressLine2 LIKE '%' + CHAR(13) + '%' THEN NULL ELSE AddressLine2 END,CASE 
WHEN Phone = '' THEN NULL WHEN Phone LIKE '%|%' OR Phone LIKE '%' + CHAR(9) 
+ '%' OR Phone LIKE '%' + CHAR(10) + '%' OR Phone LIKE '%' + CHAR(13) + '%' 
THEN NULL ELSE Phone END,DateFirstPurchase,CASE WHEN CommuteDistance = '' 
THEN NULL WHEN CommuteDistance LIKE '%|%' OR CommuteDistance LIKE '%' + 
CHAR(9) + '%' OR CommuteDistance LIKE '%' + CHAR(10) + '%' OR CommuteDistance 
LIKE '%' + CHAR(13) + '%' THEN NULL ELSE CommuteDistance END 
FROM AdventureWorksDW2008R2.dbo.DimCustomer WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut 
C:\bcp_output\AdventureWorksDW2008R2_dbo_DimCustomer.dat -SSQLFOOL -T -t"|" -c -C;

I don’t know about you, but that would take me a bit longer to write manually than to execute the script above. :)

One note: I’ve found that copying this code into a batch file will actually render the CHAR functions, i.e. CHAR(13) will be replaced with a carriage return in the script. To avoid this, copy and paste the BCP script directly into your command window.

Not familiar with BCP? Refer to my blog post on BCP Basics to help get you started.

0saves
If you enjoyed this post, please consider leaving a comment or subscribing to the RSS feed to have future articles delivered to your feed reader.
Tagged , , , , , , . Bookmark the permalink.

7 Responses to BCP Script Generator

  1. Rowland Gosling says:

    Nice to see an appreciation for BCP! Obviously you’re working with big data (like me).

    I like your solution using dynamic sql too.

  2. david hay says:

    Talk about timing! I just got handed a BCP project, this will help so much! Thank you!

  3. Todd Everett says:

    Thanks for this great script. Glad to see you are having a little more time to blog and I appreciate you sharing the scripts you develop – they are always useful. How are you finding Teradata stacks up against SQL Server for data marts? We use SQL Server for all of our data marts where I work but Teradata is always calling….

  4. @Rowland Thanks!

    @David Glad I could help! Good luck with the project. :)

    @Todd Yeah, it’s hard to find time with a 5-month old, but he’s starting to sleep a little bit better at night, which means I’m markedly less zombie-like. Which hopefully means more blogging. :) It’s still early in the data migration phase, but I can tell you that we’ve seen a huge performance improvement on the couple of marts that have already been migrated. The actual migration process is a whole other story… :)

  5. Pingback: Something for the Weekend – SQL Server Links 20/04/12 - John Sansom SQL Server DBA

  6. Zberteoc says:

    Hi Michelle, I too am a sort of SQL scripting junkie and I got over your article referenced from http://www.experts-exhange.com website. All good and such only that I hate loops of any king in SQL so I will share with you a different and simpler way to build your bcp statement without a loop. You probably know this trick but since in the article you used the loop here I go:

    Instead of this code:


    INSERT INTO @columnList
    SELECT column_id
    FROM sys.columns
    WHERE object_id = OBJECT_ID(@tableToBCP)
    ORDER BY column_id;

    IF @Top IS NOT NULL
    SET @bcpStatement = @bcpStatement + 'TOP (' + @Top + ') ';

    SELECT @firstID = MIN(columnID) FROM @columnList;

    WHILE EXISTS(SELECT * FROM @columnList)
    BEGIN

    SELECT @currentID = MIN(columnID) FROM @columnList;

    IF @currentID @firstID
    SET @bcpStatement = @bcpStatement + ',';

    SELECT @bcpStatement = @bcpStatement + name
    FROM sys.columns
    WHERE object_id = OBJECT_ID(@tableToBCP)
    AND column_id = @currentID;

    DELETE FROM @columnList WHERE columnID = @currentID;

    END;

    use this:


    -- build the columns list
    SELECT
    @bcpStatement = @bcpStatement + name + ','
    FROM sys.columns
    WHERE
    object_id = OBJECT_ID(@tableToBCP)
    order by
    column_id
    -- now get rid of the last comma
    select
    @bcpStatement=REPLACE(@bcpStatement+'#@$', ',#@$','')

  7. Teri Huntley says:

    Hi, Michelle,

    I found your post looking into an issue we’re having in using BCP in SQL 2012 that appears to be related to parallelism. I see you’re using MAXDOP 1 as an option, and this setting resolves our issue. I haven’t found anything else online about this. Do you have any additional information or references on this?

    Thanks.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>