BCP Script Generator

I’m currently working on the logic migration of data marts from SQL Server to Teradata. While another team is working on the actual table migration, it’s still helpful to have some data in the tables for developing against. The easiest method I’ve found to do this is to use BCP to export some sample data. So of course, I’ve created a SQL script that will generate the BCP code for me. Because that’s what I like to do on Sunday evenings.

-- User-defined variables --
 
DECLARE @tableToBCP NVARCHAR(128)   = 'AdventureWorksDW2008R2.dbo.DimCustomer'
    , @Top          VARCHAR(10)     = NULL -- Leave NULL for all rows
    , @Delimiter    VARCHAR(4)      = '|'
    , @UseNULL      BIT             = 1
    , @OverrideChar CHAR(1)         = '~'
    , @MaxDop       CHAR(1)         = '1'
    , @Directory    VARCHAR(256)    = 'C:\bcp_output\';
 
 
-- Script-defined variables -- 
 
DECLARE @columnList TABLE (columnID INT);
 
DECLARE @bcpStatement NVARCHAR(MAX) = 'BCP "SELECT '
    , @currentID INT
    , @firstID INT;
 
INSERT INTO @columnList
SELECT column_id 
FROM sys.columns 
WHERE object_id = OBJECT_ID(@tableToBCP)
ORDER BY column_id;
 
IF @Top IS NOT NULL
    SET @bcpStatement = @bcpStatement + 'TOP (' + @Top + ') ';
 
SELECT @firstID = MIN(columnID) FROM @columnList;
 
WHILE EXISTS(SELECT * FROM @columnList)
BEGIN
 
    SELECT @currentID = MIN(columnID) FROM @columnList;
 
    IF @currentID <> @firstID
        SET @bcpStatement = @bcpStatement + ',';
 
    SELECT @bcpStatement = @bcpStatement + name
    FROM sys.columns 
    WHERE object_id = OBJECT_ID(@tableToBCP)
        AND column_id = @currentID;
 
    DELETE FROM @columnList WHERE columnID = @currentID;
 
 
END;
 
SET @bcpStatement = @bcpStatement + ' FROM ' + @tableToBCP 
    + ' WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut '
    + @Directory + REPLACE(@tableToBCP, '.', '_') + '.dat -S' + @@SERVERNAME
    + ' -T -t"' + @Delimiter + '" -c -C;'
 
SELECT @bcpStatement;

This will generate a standard BCP script:

BCP "SELECT CustomerKey,GeographyKey,CustomerAlternateKey,Title,FirstName,MiddleName,
LastName,NameStyle,BirthDate,MaritalStatus,Suffix,Gender,EmailAddress,YearlyIncome,
TotalChildren,NumberChildrenAtHome,EnglishEducation,SpanishEducation,FrenchEducation,
EnglishOccupation,SpanishOccupation,FrenchOccupation,HouseOwnerFlag,NumberCarsOwned,
AddressLine1,AddressLine2,Phone,DateFirstPurchase,CommuteDistance 
FROM AdventureWorksDW2008R2.dbo.DimCustomer WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut 
C:\bcp_output\AdventureWorksDW2008R2_dbo_DimCustomer.dat -SSQLFOOL -T -t"|" -c -C;

However, I’ve been running into some issues with the data load. See, the thing… I’m finding that some of my tables don’t… that is to say… they may possibly contain… [whisper]uncleansed data[/whisper]. I know, I know… this may come as a shock to many of you, and all I ask is that you please don’t judge me for it. ;)

What do I mean by “uncleansed data?” I mostly mean user-inputted VARCHAR columns that contain pipes (|), tabs, carriage returns, and line feeds. These types of characters tend to mess with the data import process. Also, I’ve not yet found a way to import a data file into Teradata where a non-nullable character column contains an empty string (”). Obviously, the vast majority of the data is fine, but even one of these issues can throw an error during the import process. I’ve modified the script above to handle these specific exceptions.

Since I’m only using this data for testing purposes, I found it pretty easy to simply replace the offending records with ~. I’m not sure if anyone else has a need for this particular script, but I figured you could modify it pretty easily to do whatever you need.

-- User-defined variables --
 
DECLARE @tableToBCP NVARCHAR(128)   = 'AdventureWorksDW2008R2.dbo.DimCustomer'
    , @Top          VARCHAR(10)     = NULL -- Leave NULL for all rows
    , @Delimiter    VARCHAR(4)      = '|'
    , @UseNULL      BIT             = 1
    , @OverrideChar CHAR(1)         = '~'
    , @MaxDop       CHAR(1)         = '1'
    , @Directory    VARCHAR(256)    = 'C:\bcp_output\';
 
 
-- Script-defined variables -- 
 
DECLARE @columnList TABLE (columnID INT);
 
DECLARE @bcpStatement NVARCHAR(MAX) = 'BCP "SELECT '
    , @currentID INT
    , @firstID INT;
 
INSERT INTO @columnList
SELECT column_id 
FROM sys.columns 
WHERE object_id = OBJECT_ID(@tableToBCP)
ORDER BY column_id;
 
IF @Top IS NOT NULL
    SET @bcpStatement = @bcpStatement + 'TOP (' + @Top + ') ';
 
SELECT @firstID = MIN(columnID) FROM @columnList;
 
WHILE EXISTS(SELECT * FROM @columnList)
BEGIN
 
    SELECT @currentID = MIN(columnID) FROM @columnList;
 
    IF @currentID <> @firstID
        SET @bcpStatement = @bcpStatement + ',';
 
    SELECT @bcpStatement = @bcpStatement + 
                            CASE 
                                WHEN user_type_id IN (231, 167, 175, 239) 
                                THEN 'CASE WHEN ' + name + ' = '''' THEN ' 
                                    + CASE 
                                        WHEN is_nullable = 1 THEN 'NULL' 
                                        ELSE '''' + REPLICATE(@OverrideChar, max_length) + ''''
                                      END
                                    + ' WHEN ' + name + ' LIKE ''%' + @Delimiter + '%'''
                                        + ' OR ' + name + ' LIKE ''%'' + CHAR(9) + ''%''' -- tab
                                        + ' OR ' + name + ' LIKE ''%'' + CHAR(10) + ''%''' -- line feed
                                        + ' OR ' + name + ' LIKE ''%'' + CHAR(13) + ''%''' -- carriage return
                                        + ' THEN ' 
                                        + CASE 
                                            WHEN is_nullable = 1 THEN 'NULL' 
                                            ELSE '''' + REPLICATE(@OverrideChar, max_length) + ''''
                                          END
                                    + ' ELSE ' + name + ' END' 
                                ELSE name 
                            END 
    FROM sys.columns 
    WHERE object_id = OBJECT_ID(@tableToBCP)
        AND column_id = @currentID;
 
    DELETE FROM @columnList WHERE columnID = @currentID;
 
 
END;
 
SET @bcpStatement = @bcpStatement + ' FROM ' + @tableToBCP 
    + ' WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut '
    + @Directory + REPLACE(@tableToBCP, '.', '_') + '.dat -S' + @@SERVERNAME
    + ' -T -t"' + @Delimiter + '" -c -C;'
 
SELECT @bcpStatement;

The sample output of this would look like:

BCP "SELECT CustomerKey,GeographyKey,CASE WHEN CustomerAlternateKey = '' THEN 
'~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' WHEN CustomerAlternateKey LIKE '%|%' OR 
CustomerAlternateKey LIKE '%' + CHAR(9) + '%' OR CustomerAlternateKey LIKE 
'%' + CHAR(10) + '%' OR CustomerAlternateKey LIKE '%' + CHAR(13) + '%' 
THEN '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' ELSE CustomerAlternateKey END,
CASE WHEN Title = '' THEN NULL WHEN Title LIKE '%|%' OR Title LIKE '%' + CHAR(9)
 + '%' OR Title LIKE '%' + CHAR(10) + '%' OR Title LIKE '%' + CHAR(13) + '%' 
THEN NULL ELSE Title END,CASE WHEN FirstName = '' THEN NULL WHEN FirstName 
LIKE '%|%' OR FirstName LIKE '%' + CHAR(9) + '%' OR FirstName LIKE '%' + 
CHAR(10) + '%' OR FirstName LIKE '%' + CHAR(13) + '%' THEN NULL ELSE 
FirstName END,CASE WHEN MiddleName = '' THEN NULL WHEN MiddleName LIKE '%|%'
 OR MiddleName LIKE '%' + CHAR(9) + '%' OR MiddleName LIKE '%' + CHAR(10) +
 '%' OR MiddleName LIKE '%' + CHAR(13) + '%' THEN NULL ELSE MiddleName END,
CASE WHEN LastName = '' THEN NULL WHEN LastName LIKE '%|%' OR LastName LIKE
 '%' + CHAR(9) + '%' OR LastName LIKE '%' + CHAR(10) + '%' OR LastName LIKE
 '%' + CHAR(13) + '%' THEN NULL ELSE LastName END,NameStyle,BirthDate,CASE 
WHEN MaritalStatus = '' THEN NULL WHEN MaritalStatus LIKE '%|%' OR 
MaritalStatus LIKE '%' + CHAR(9) + '%' OR MaritalStatus LIKE '%' + CHAR(10) 
+ '%' OR MaritalStatus LIKE '%' + CHAR(13) + '%' THEN NULL ELSE MaritalStatus 
END,CASE WHEN Suffix = '' THEN NULL WHEN Suffix LIKE '%|%' OR Suffix LIKE '%' 
+ CHAR(9) + '%' OR Suffix LIKE '%' + CHAR(10) + '%' OR Suffix LIKE '%' + 
CHAR(13) + '%' THEN NULL ELSE Suffix END,CASE WHEN Gender = '' THEN NULL 
WHEN Gender LIKE '%|%' OR Gender LIKE '%' + CHAR(9) + '%' OR Gender LIKE '%' 
+ CHAR(10) + '%' OR Gender LIKE '%' + CHAR(13) + '%' THEN NULL ELSE Gender 
END,CASE WHEN EmailAddress = '' THEN NULL WHEN EmailAddress LIKE '%|%' OR 
EmailAddress LIKE '%' + CHAR(9) + '%' OR EmailAddress LIKE '%' + CHAR(10) + 
'%' OR EmailAddress LIKE '%' + CHAR(13) + '%' THEN NULL ELSE EmailAddress END,
YearlyIncome,TotalChildren,NumberChildrenAtHome, CASE WHEN EnglishEducation = '' 
THEN NULL WHEN EnglishEducation LIKE '%|%' OR 
EnglishEducation LIKE '%' + CHAR(9) + '%' OR EnglishEducation LIKE '%' + 
CHAR(10) + '%' OR EnglishEducation LIKE '%' 
+ CHAR(13) + '%' THEN NULL ELSE EnglishEducation END,CASE WHEN 
SpanishEducation = '' THEN NULL WHEN SpanishEducation LIKE '%|%' OR 
SpanishEducation LIKE '%' + CHAR(9) + '%' OR SpanishEducation LIKE '%' + 
CHAR(10) + '%' OR SpanishEducation LIKE '%' + CHAR(13) + '%' THEN NULL 
ELSE SpanishEducation END,CASE WHEN FrenchEducation = '' THEN NULL WHEN 
FrenchEducation LIKE '%|%' OR FrenchEducation LIKE '%' + CHAR(9) + '%' 
OR FrenchEducation LIKE '%' + CHAR(10) + '%' OR FrenchEducation LIKE '%' 
+ CHAR(13) + '%' THEN NULL ELSE FrenchEducation END,CASE WHEN 
EnglishOccupation = '' THEN NULL WHEN EnglishOccupation LIKE '%|%' OR 
EnglishOccupation LIKE '%' + CHAR(9) + '%' OR EnglishOccupation LIKE '%' 
+ CHAR(10) + '%' OR EnglishOccupation LIKE '%' + CHAR(13) + '%' THEN 
NULL ELSE EnglishOccupation END,CASE WHEN SpanishOccupation = '' THEN 
NULL WHEN SpanishOccupation LIKE '%|%' OR SpanishOccupation LIKE '%' 
+ CHAR(9) + '%' OR SpanishOccupation LIKE '%' + CHAR(10) + '%' OR 
SpanishOccupation LIKE '%' + CHAR(13) + '%' THEN NULL ELSE SpanishOccupation 
END,CASE WHEN FrenchOccupation = '' THEN NULL WHEN FrenchOccupation LIKE 
'%|%' OR FrenchOccupation LIKE '%' + CHAR(9) + '%' OR FrenchOccupation 
LIKE '%' + CHAR(10) + '%' OR FrenchOccupation LIKE '%' + CHAR(13) + '%' 
THEN NULL ELSE FrenchOccupation END,CASE WHEN HouseOwnerFlag = '' THEN 
NULL WHEN HouseOwnerFlag LIKE '%|%' OR HouseOwnerFlag LIKE '%' + CHAR(9) 
+ '%' OR HouseOwnerFlag LIKE '%' + CHAR(10) + '%' OR HouseOwnerFlag LIKE 
'%' + CHAR(13) + '%' THEN NULL ELSE HouseOwnerFlag END,NumberCarsOwned,CASE 
WHEN AddressLine1 = '' THEN NULL WHEN AddressLine1 LIKE '%|%' OR AddressLine1 
LIKE '%' + CHAR(9) + '%' OR AddressLine1 LIKE '%' + CHAR(10) + '%' OR 
AddressLine1 LIKE '%' + CHAR(13) + '%' THEN NULL ELSE AddressLine1 END,CASE 
WHEN AddressLine2 = '' THEN NULL WHEN AddressLine2 LIKE '%|%' OR AddressLine2 
LIKE '%' + CHAR(9) + '%' OR AddressLine2 LIKE '%' + CHAR(10) + '%' OR 
AddressLine2 LIKE '%' + CHAR(13) + '%' THEN NULL ELSE AddressLine2 END,CASE 
WHEN Phone = '' THEN NULL WHEN Phone LIKE '%|%' OR Phone LIKE '%' + CHAR(9) 
+ '%' OR Phone LIKE '%' + CHAR(10) + '%' OR Phone LIKE '%' + CHAR(13) + '%' 
THEN NULL ELSE Phone END,DateFirstPurchase,CASE WHEN CommuteDistance = '' 
THEN NULL WHEN CommuteDistance LIKE '%|%' OR CommuteDistance LIKE '%' + 
CHAR(9) + '%' OR CommuteDistance LIKE '%' + CHAR(10) + '%' OR CommuteDistance 
LIKE '%' + CHAR(13) + '%' THEN NULL ELSE CommuteDistance END 
FROM AdventureWorksDW2008R2.dbo.DimCustomer WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut 
C:\bcp_output\AdventureWorksDW2008R2_dbo_DimCustomer.dat -SSQLFOOL -T -t"|" -c -C;

I don’t know about you, but that would take me a bit longer to write manually than to execute the script above. :)

One note: I’ve found that copying this code into a batch file will actually render the CHAR functions, i.e. CHAR(13) will be replaced with a carriage return in the script. To avoid this, copy and paste the BCP script directly into your command window.

Not familiar with BCP? Refer to my blog post on BCP Basics to help get you started.

BCP Basics

In this blog post, I’m going to walk through the basics of BCP (bulk copy program). BCP is a utility that installs with SQL Server and can assist with large data transfers.

Let’s see what parameter options are available to use. From the command line on a machine with SQL Server installed, type “bcp” and press Enter.

BCP Parameters

You can find out more information on BCP parameters on Books Online: http://msdn.microsoft.com/en-us/library/ms162802.aspx

For now, we’re going to examine just the basics. The simplest syntax of a BCP command is:

bcp
databaseName.Schema.TableName *or* “Query”
in, out, *or* queryout
-S ServerName\instanceName
-U userName -P password *or* -T
-c *or* -n *or* specify storage information for each column

Let’s look at these options in a little more detail:

databaseName.Schema.TableName *or* Query
You can specify either an entire table to copy or a query. The query should be surrounded in quotations and must also include the fully qualified table name.

in, out, *or* queryout
in = import, out = full table export, queryout = query to select data for export

-U userName -P password *or* -T
You can either specify a specific account to access SQL Server, or use -T to indicate Trusted Connection (i.e. Windows Authentication)

-c *or* -n *or* specify storage information for each column
-c indicates character data type, -n indicates native data type; if neither one is specified, by default you will be prompted for the data type for each column.

Now let’s put this together and run some BCP commands. All of these examples will use the AdventureWorks 2008 sample database.

First, let’s export an entire table. To do this, we’ll use the “out” parameter.

bcp AdventureWorks.Sales.SalesOrderDetail out
C:\bcp_outputTable.txt -SYourServerName -T -c

 

Export Table with BCP

Export Table with BCP - Results

I don’t normally export an entire table… or at least, not in one process. So let’s walk through what it would look like to export the same table using a query. This will use the “queryout” parameter.

 

bcp "Select SalesOrderID, SalesOrderDetailID, OrderQty, ProductID
From AdventureWorks.Sales.SalesOrderDetail" queryout
C:\bcp_outputQuery.txt -SYourServerName -T -c

 

Export Query with BCP

Export Query with BCP - Results

You’ll notice that the total duration for the query was shorter than for the full-table export. This is because we’re only exporting a few of the columns. This is important to keep in mind when bcp’ing data: you’ll get better performance if you only export the data elements that you actually need.

Now that we’ve exported some data, let’s walk through the process of importing this data. First, let’s create a table with a constraint that will result in some errors.

Create Table dbo.testBCPLoad
(
      SalesOrderID          int      Not Null
    , SalesOrderDetailID    int      Not Null
    , OrderQty              smallint Null
    , ProductID             int      Null
 
    Constraint PK_testBCPLoad
        Primary Key Clustered
        (SalesOrderID)
);

Now execute the BCP import command:

bcp sandbox.dbo.testBCPLoad in
C:\bcp_outputQuery.txt -SYourServername -T -c

 

Load Data with BCP

Load Data with BCP - Error

You should receive a Primary Key error. When you check your results in SQL Server, you should find no results loaded into the table. This is BCP’s default behavior.

Check Destination Table

Let’s change our constraint and try the same BCP command again:

Alter Table dbo.testBCPLoad
    Drop Constraint PK_testBCPLoad;
 
Alter Table dbo.testBCPLoad
    Add Constraint PK_testBCPLoad
    Primary Key Clustered
        (SalesOrderID, SalesOrderDetailID);

 

bcp sandbox.dbo.testBCPLoad in
C:\bcp_outputQuery.txt -SYourServername -T -c

You should now have the data loaded into your SQL Server destination table:

Import Data with BCP - Results

Destination Table

So there you have it, the basics of BCP! :)

A few BCP tips:

  • BCP commands are case-sensitive!
  • If you’re accessing the data across a WAN, perhaps via a VPN connection, try to remote desktop (mstsc) to the actual SQL Server to perform the BCP. If possible, keep the operation on the same local drive or even local network as the server; the less distance data needs to travel across a network, the faster BCP will perform.
  • If you need to copy large amounts of data (i.e. >100mm rows), try breaking the data into smaller chunks. This will help if you have an error during BCP (i.e. a PK error can rollback the entire import operation by default, although there are options that can change this behavior). When working with partitioned tables, I find it very efficient to segregate the data imported/exported by partition.
  • If you’re BCP’ing data into a new table, you can minimize impact on the server by waiting to create your indexes after all the data is loaded.
  • I like to construct my queries in SSMS, then copy them to BCP. Since the command-line utility does not support copy and pasting, I create a text file with my BCP command in NotePad, then save the command as a .cmd. To execute, just call the .cmd file.