I’m currently working on the logic migration of data marts from SQL Server to Teradata. While another team is working on the actual table migration, it’s still helpful to have some data in the tables for developing against. The easiest method I’ve found to do this is to use BCP to export some sample data. So of course, I’ve created a SQL script that will generate the BCP code for me. Because that’s what I like to do on Sunday evenings.
-- User-defined variables -- DECLARE @tableToBCP NVARCHAR(128) = 'AdventureWorksDW2008R2.dbo.DimCustomer' , @Top VARCHAR(10) = NULL -- Leave NULL for all rows , @Delimiter VARCHAR(4) = '|' , @UseNULL BIT = 1 , @OverrideChar CHAR(1) = '~' , @MaxDop CHAR(1) = '1' , @Directory VARCHAR(256) = 'C:\bcp_output\'; -- Script-defined variables -- DECLARE @columnList TABLE (columnID INT); DECLARE @bcpStatement NVARCHAR(MAX) = 'BCP "SELECT ' , @currentID INT , @firstID INT; INSERT INTO @columnList SELECT column_id FROM sys.columns WHERE object_id = OBJECT_ID(@tableToBCP) ORDER BY column_id; IF @Top IS NOT NULL SET @bcpStatement = @bcpStatement + 'TOP (' + @Top + ') '; SELECT @firstID = MIN(columnID) FROM @columnList; WHILE EXISTS(SELECT * FROM @columnList) BEGIN SELECT @currentID = MIN(columnID) FROM @columnList; IF @currentID <> @firstID SET @bcpStatement = @bcpStatement + ','; SELECT @bcpStatement = @bcpStatement + name FROM sys.columns WHERE object_id = OBJECT_ID(@tableToBCP) AND column_id = @currentID; DELETE FROM @columnList WHERE columnID = @currentID; END; SET @bcpStatement = @bcpStatement + ' FROM ' + @tableToBCP + ' WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut ' + @Directory + REPLACE(@tableToBCP, '.', '_') + '.dat -S' + @@SERVERNAME + ' -T -t"' + @Delimiter + '" -c -C;' SELECT @bcpStatement; |
This will generate a standard BCP script:
BCP "SELECT CustomerKey,GeographyKey,CustomerAlternateKey,Title,FirstName,MiddleName, LastName,NameStyle,BirthDate,MaritalStatus,Suffix,Gender,EmailAddress,YearlyIncome, TotalChildren,NumberChildrenAtHome,EnglishEducation,SpanishEducation,FrenchEducation, EnglishOccupation,SpanishOccupation,FrenchOccupation,HouseOwnerFlag,NumberCarsOwned, AddressLine1,AddressLine2,Phone,DateFirstPurchase,CommuteDistance FROM AdventureWorksDW2008R2.dbo.DimCustomer WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut C:\bcp_output\AdventureWorksDW2008R2_dbo_DimCustomer.dat -SSQLFOOL -T -t"|" -c -C; |
However, I’ve been running into some issues with the data load. See, the thing… I’m finding that some of my tables don’t… that is to say… they may possibly contain… [whisper]uncleansed data[/whisper]. I know, I know… this may come as a shock to many of you, and all I ask is that you please don’t judge me for it.
What do I mean by “uncleansed data?” I mostly mean user-inputted VARCHAR columns that contain pipes (|), tabs, carriage returns, and line feeds. These types of characters tend to mess with the data import process. Also, I’ve not yet found a way to import a data file into Teradata where a non-nullable character column contains an empty string (”). Obviously, the vast majority of the data is fine, but even one of these issues can throw an error during the import process. I’ve modified the script above to handle these specific exceptions.
Since I’m only using this data for testing purposes, I found it pretty easy to simply replace the offending records with ~. I’m not sure if anyone else has a need for this particular script, but I figured you could modify it pretty easily to do whatever you need.
-- User-defined variables -- DECLARE @tableToBCP NVARCHAR(128) = 'AdventureWorksDW2008R2.dbo.DimCustomer' , @Top VARCHAR(10) = NULL -- Leave NULL for all rows , @Delimiter VARCHAR(4) = '|' , @UseNULL BIT = 1 , @OverrideChar CHAR(1) = '~' , @MaxDop CHAR(1) = '1' , @Directory VARCHAR(256) = 'C:\bcp_output\'; -- Script-defined variables -- DECLARE @columnList TABLE (columnID INT); DECLARE @bcpStatement NVARCHAR(MAX) = 'BCP "SELECT ' , @currentID INT , @firstID INT; INSERT INTO @columnList SELECT column_id FROM sys.columns WHERE object_id = OBJECT_ID(@tableToBCP) ORDER BY column_id; IF @Top IS NOT NULL SET @bcpStatement = @bcpStatement + 'TOP (' + @Top + ') '; SELECT @firstID = MIN(columnID) FROM @columnList; WHILE EXISTS(SELECT * FROM @columnList) BEGIN SELECT @currentID = MIN(columnID) FROM @columnList; IF @currentID <> @firstID SET @bcpStatement = @bcpStatement + ','; SELECT @bcpStatement = @bcpStatement + CASE WHEN user_type_id IN (231, 167, 175, 239) THEN 'CASE WHEN ' + name + ' = '''' THEN ' + CASE WHEN is_nullable = 1 THEN 'NULL' ELSE '''' + REPLICATE(@OverrideChar, max_length) + '''' END + ' WHEN ' + name + ' LIKE ''%' + @Delimiter + '%''' + ' OR ' + name + ' LIKE ''%'' + CHAR(9) + ''%''' -- tab + ' OR ' + name + ' LIKE ''%'' + CHAR(10) + ''%''' -- line feed + ' OR ' + name + ' LIKE ''%'' + CHAR(13) + ''%''' -- carriage return + ' THEN ' + CASE WHEN is_nullable = 1 THEN 'NULL' ELSE '''' + REPLICATE(@OverrideChar, max_length) + '''' END + ' ELSE ' + name + ' END' ELSE name END FROM sys.columns WHERE object_id = OBJECT_ID(@tableToBCP) AND column_id = @currentID; DELETE FROM @columnList WHERE columnID = @currentID; END; SET @bcpStatement = @bcpStatement + ' FROM ' + @tableToBCP + ' WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut ' + @Directory + REPLACE(@tableToBCP, '.', '_') + '.dat -S' + @@SERVERNAME + ' -T -t"' + @Delimiter + '" -c -C;' SELECT @bcpStatement; |
The sample output of this would look like:
BCP "SELECT CustomerKey,GeographyKey,CASE WHEN CustomerAlternateKey = '' THEN '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' WHEN CustomerAlternateKey LIKE '%|%' OR CustomerAlternateKey LIKE '%' + CHAR(9) + '%' OR CustomerAlternateKey LIKE '%' + CHAR(10) + '%' OR CustomerAlternateKey LIKE '%' + CHAR(13) + '%' THEN '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' ELSE CustomerAlternateKey END, CASE WHEN Title = '' THEN NULL WHEN Title LIKE '%|%' OR Title LIKE '%' + CHAR(9) + '%' OR Title LIKE '%' + CHAR(10) + '%' OR Title LIKE '%' + CHAR(13) + '%' THEN NULL ELSE Title END,CASE WHEN FirstName = '' THEN NULL WHEN FirstName LIKE '%|%' OR FirstName LIKE '%' + CHAR(9) + '%' OR FirstName LIKE '%' + CHAR(10) + '%' OR FirstName LIKE '%' + CHAR(13) + '%' THEN NULL ELSE FirstName END,CASE WHEN MiddleName = '' THEN NULL WHEN MiddleName LIKE '%|%' OR MiddleName LIKE '%' + CHAR(9) + '%' OR MiddleName LIKE '%' + CHAR(10) + '%' OR MiddleName LIKE '%' + CHAR(13) + '%' THEN NULL ELSE MiddleName END, CASE WHEN LastName = '' THEN NULL WHEN LastName LIKE '%|%' OR LastName LIKE '%' + CHAR(9) + '%' OR LastName LIKE '%' + CHAR(10) + '%' OR LastName LIKE '%' + CHAR(13) + '%' THEN NULL ELSE LastName END,NameStyle,BirthDate,CASE WHEN MaritalStatus = '' THEN NULL WHEN MaritalStatus LIKE '%|%' OR MaritalStatus LIKE '%' + CHAR(9) + '%' OR MaritalStatus LIKE '%' + CHAR(10) + '%' OR MaritalStatus LIKE '%' + CHAR(13) + '%' THEN NULL ELSE MaritalStatus END,CASE WHEN Suffix = '' THEN NULL WHEN Suffix LIKE '%|%' OR Suffix LIKE '%' + CHAR(9) + '%' OR Suffix LIKE '%' + CHAR(10) + '%' OR Suffix LIKE '%' + CHAR(13) + '%' THEN NULL ELSE Suffix END,CASE WHEN Gender = '' THEN NULL WHEN Gender LIKE '%|%' OR Gender LIKE '%' + CHAR(9) + '%' OR Gender LIKE '%' + CHAR(10) + '%' OR Gender LIKE '%' + CHAR(13) + '%' THEN NULL ELSE Gender END,CASE WHEN EmailAddress = '' THEN NULL WHEN EmailAddress LIKE '%|%' OR EmailAddress LIKE '%' + CHAR(9) + '%' OR EmailAddress LIKE '%' + CHAR(10) + '%' OR EmailAddress LIKE '%' + CHAR(13) + '%' THEN NULL ELSE EmailAddress END, YearlyIncome,TotalChildren,NumberChildrenAtHome, CASE WHEN EnglishEducation = '' THEN NULL WHEN EnglishEducation LIKE '%|%' OR EnglishEducation LIKE '%' + CHAR(9) + '%' OR EnglishEducation LIKE '%' + CHAR(10) + '%' OR EnglishEducation LIKE '%' + CHAR(13) + '%' THEN NULL ELSE EnglishEducation END,CASE WHEN SpanishEducation = '' THEN NULL WHEN SpanishEducation LIKE '%|%' OR SpanishEducation LIKE '%' + CHAR(9) + '%' OR SpanishEducation LIKE '%' + CHAR(10) + '%' OR SpanishEducation LIKE '%' + CHAR(13) + '%' THEN NULL ELSE SpanishEducation END,CASE WHEN FrenchEducation = '' THEN NULL WHEN FrenchEducation LIKE '%|%' OR FrenchEducation LIKE '%' + CHAR(9) + '%' OR FrenchEducation LIKE '%' + CHAR(10) + '%' OR FrenchEducation LIKE '%' + CHAR(13) + '%' THEN NULL ELSE FrenchEducation END,CASE WHEN EnglishOccupation = '' THEN NULL WHEN EnglishOccupation LIKE '%|%' OR EnglishOccupation LIKE '%' + CHAR(9) + '%' OR EnglishOccupation LIKE '%' + CHAR(10) + '%' OR EnglishOccupation LIKE '%' + CHAR(13) + '%' THEN NULL ELSE EnglishOccupation END,CASE WHEN SpanishOccupation = '' THEN NULL WHEN SpanishOccupation LIKE '%|%' OR SpanishOccupation LIKE '%' + CHAR(9) + '%' OR SpanishOccupation LIKE '%' + CHAR(10) + '%' OR SpanishOccupation LIKE '%' + CHAR(13) + '%' THEN NULL ELSE SpanishOccupation END,CASE WHEN FrenchOccupation = '' THEN NULL WHEN FrenchOccupation LIKE '%|%' OR FrenchOccupation LIKE '%' + CHAR(9) + '%' OR FrenchOccupation LIKE '%' + CHAR(10) + '%' OR FrenchOccupation LIKE '%' + CHAR(13) + '%' THEN NULL ELSE FrenchOccupation END,CASE WHEN HouseOwnerFlag = '' THEN NULL WHEN HouseOwnerFlag LIKE '%|%' OR HouseOwnerFlag LIKE '%' + CHAR(9) + '%' OR HouseOwnerFlag LIKE '%' + CHAR(10) + '%' OR HouseOwnerFlag LIKE '%' + CHAR(13) + '%' THEN NULL ELSE HouseOwnerFlag END,NumberCarsOwned,CASE WHEN AddressLine1 = '' THEN NULL WHEN AddressLine1 LIKE '%|%' OR AddressLine1 LIKE '%' + CHAR(9) + '%' OR AddressLine1 LIKE '%' + CHAR(10) + '%' OR AddressLine1 LIKE '%' + CHAR(13) + '%' THEN NULL ELSE AddressLine1 END,CASE WHEN AddressLine2 = '' THEN NULL WHEN AddressLine2 LIKE '%|%' OR AddressLine2 LIKE '%' + CHAR(9) + '%' OR AddressLine2 LIKE '%' + CHAR(10) + '%' OR AddressLine2 LIKE '%' + CHAR(13) + '%' THEN NULL ELSE AddressLine2 END,CASE WHEN Phone = '' THEN NULL WHEN Phone LIKE '%|%' OR Phone LIKE '%' + CHAR(9) + '%' OR Phone LIKE '%' + CHAR(10) + '%' OR Phone LIKE '%' + CHAR(13) + '%' THEN NULL ELSE Phone END,DateFirstPurchase,CASE WHEN CommuteDistance = '' THEN NULL WHEN CommuteDistance LIKE '%|%' OR CommuteDistance LIKE '%' + CHAR(9) + '%' OR CommuteDistance LIKE '%' + CHAR(10) + '%' OR CommuteDistance LIKE '%' + CHAR(13) + '%' THEN NULL ELSE CommuteDistance END FROM AdventureWorksDW2008R2.dbo.DimCustomer WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut C:\bcp_output\AdventureWorksDW2008R2_dbo_DimCustomer.dat -SSQLFOOL -T -t"|" -c -C; |
I don’t know about you, but that would take me a bit longer to write manually than to execute the script above.
One note: I’ve found that copying this code into a batch file will actually render the CHAR functions, i.e. CHAR(13) will be replaced with a carriage return in the script. To avoid this, copy and paste the BCP script directly into your command window.
Not familiar with BCP? Refer to my blog post on BCP Basics to help get you started.









