I’m currently working on the logic migration of data marts from SQL Server to Teradata. While another team is working on the actual table migration, it’s still helpful to have some data in the tables for developing against. The easiest method I’ve found to do this is to use BCP to export some sample data. So of course, I’ve created a SQL script that will generate the BCP code for me. Because that’s what I like to do on Sunday evenings.
-- User-defined variables -- DECLARE @tableToBCP NVARCHAR(128) = 'AdventureWorksDW2008R2.dbo.DimCustomer' , @Top VARCHAR(10) = NULL -- Leave NULL for all rows , @Delimiter VARCHAR(4) = '|' , @UseNULL BIT = 1 , @OverrideChar CHAR(1) = '~' , @MaxDop CHAR(1) = '1' , @Directory VARCHAR(256) = 'C:\bcp_output\'; -- Script-defined variables -- DECLARE @columnList TABLE (columnID INT); DECLARE @bcpStatement NVARCHAR(MAX) = 'BCP "SELECT ' , @currentID INT , @firstID INT; INSERT INTO @columnList SELECT column_id FROM sys.columns WHERE object_id = OBJECT_ID(@tableToBCP) ORDER BY column_id; IF @Top IS NOT NULL SET @bcpStatement = @bcpStatement + 'TOP (' + @Top + ') '; SELECT @firstID = MIN(columnID) FROM @columnList; WHILE EXISTS(SELECT * FROM @columnList) BEGIN SELECT @currentID = MIN(columnID) FROM @columnList; IF @currentID <> @firstID SET @bcpStatement = @bcpStatement + ','; SELECT @bcpStatement = @bcpStatement + name FROM sys.columns WHERE object_id = OBJECT_ID(@tableToBCP) AND column_id = @currentID; DELETE FROM @columnList WHERE columnID = @currentID; END; SET @bcpStatement = @bcpStatement + ' FROM ' + @tableToBCP + ' WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut ' + @Directory + REPLACE(@tableToBCP, '.', '_') + '.dat -S' + @@SERVERNAME + ' -T -t"' + @Delimiter + '" -c -C;' SELECT @bcpStatement; |
This will generate a standard BCP script:
BCP "SELECT CustomerKey,GeographyKey,CustomerAlternateKey,Title,FirstName,MiddleName, LastName,NameStyle,BirthDate,MaritalStatus,Suffix,Gender,EmailAddress,YearlyIncome, TotalChildren,NumberChildrenAtHome,EnglishEducation,SpanishEducation,FrenchEducation, EnglishOccupation,SpanishOccupation,FrenchOccupation,HouseOwnerFlag,NumberCarsOwned, AddressLine1,AddressLine2,Phone,DateFirstPurchase,CommuteDistance FROM AdventureWorksDW2008R2.dbo.DimCustomer WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut C:\bcp_output\AdventureWorksDW2008R2_dbo_DimCustomer.dat -SSQLFOOL -T -t"|" -c -C; |
However, I’ve been running into some issues with the data load. See, the thing… I’m finding that some of my tables don’t… that is to say… they may possibly contain… [whisper]uncleansed data[/whisper]. I know, I know… this may come as a shock to many of you, and all I ask is that you please don’t judge me for it. 😉
What do I mean by “uncleansed data?” I mostly mean user-inputted VARCHAR columns that contain pipes (|), tabs, carriage returns, and line feeds. These types of characters tend to mess with the data import process. Also, I’ve not yet found a way to import a data file into Teradata where a non-nullable character column contains an empty string (”). Obviously, the vast majority of the data is fine, but even one of these issues can throw an error during the import process. I’ve modified the script above to handle these specific exceptions.
Since I’m only using this data for testing purposes, I found it pretty easy to simply replace the offending records with ~. I’m not sure if anyone else has a need for this particular script, but I figured you could modify it pretty easily to do whatever you need.
-- User-defined variables -- DECLARE @tableToBCP NVARCHAR(128) = 'AdventureWorksDW2008R2.dbo.DimCustomer' , @Top VARCHAR(10) = NULL -- Leave NULL for all rows , @Delimiter VARCHAR(4) = '|' , @UseNULL BIT = 1 , @OverrideChar CHAR(1) = '~' , @MaxDop CHAR(1) = '1' , @Directory VARCHAR(256) = 'C:\bcp_output\'; -- Script-defined variables -- DECLARE @columnList TABLE (columnID INT); DECLARE @bcpStatement NVARCHAR(MAX) = 'BCP "SELECT ' , @currentID INT , @firstID INT; INSERT INTO @columnList SELECT column_id FROM sys.columns WHERE object_id = OBJECT_ID(@tableToBCP) ORDER BY column_id; IF @Top IS NOT NULL SET @bcpStatement = @bcpStatement + 'TOP (' + @Top + ') '; SELECT @firstID = MIN(columnID) FROM @columnList; WHILE EXISTS(SELECT * FROM @columnList) BEGIN SELECT @currentID = MIN(columnID) FROM @columnList; IF @currentID <> @firstID SET @bcpStatement = @bcpStatement + ','; SELECT @bcpStatement = @bcpStatement + CASE WHEN user_type_id IN (231, 167, 175, 239) THEN 'CASE WHEN ' + name + ' = '''' THEN ' + CASE WHEN is_nullable = 1 THEN 'NULL' ELSE '''' + REPLICATE(@OverrideChar, max_length) + '''' END + ' WHEN ' + name + ' LIKE ''%' + @Delimiter + '%''' + ' OR ' + name + ' LIKE ''%'' + CHAR(9) + ''%''' -- tab + ' OR ' + name + ' LIKE ''%'' + CHAR(10) + ''%''' -- line feed + ' OR ' + name + ' LIKE ''%'' + CHAR(13) + ''%''' -- carriage return + ' THEN ' + CASE WHEN is_nullable = 1 THEN 'NULL' ELSE '''' + REPLICATE(@OverrideChar, max_length) + '''' END + ' ELSE ' + name + ' END' ELSE name END FROM sys.columns WHERE object_id = OBJECT_ID(@tableToBCP) AND column_id = @currentID; DELETE FROM @columnList WHERE columnID = @currentID; END; SET @bcpStatement = @bcpStatement + ' FROM ' + @tableToBCP + ' WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut ' + @Directory + REPLACE(@tableToBCP, '.', '_') + '.dat -S' + @@SERVERNAME + ' -T -t"' + @Delimiter + '" -c -C;' SELECT @bcpStatement; |
The sample output of this would look like:
BCP "SELECT CustomerKey,GeographyKey,CASE WHEN CustomerAlternateKey = '' THEN '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' WHEN CustomerAlternateKey LIKE '%|%' OR CustomerAlternateKey LIKE '%' + CHAR(9) + '%' OR CustomerAlternateKey LIKE '%' + CHAR(10) + '%' OR CustomerAlternateKey LIKE '%' + CHAR(13) + '%' THEN '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' ELSE CustomerAlternateKey END, CASE WHEN Title = '' THEN NULL WHEN Title LIKE '%|%' OR Title LIKE '%' + CHAR(9) + '%' OR Title LIKE '%' + CHAR(10) + '%' OR Title LIKE '%' + CHAR(13) + '%' THEN NULL ELSE Title END,CASE WHEN FirstName = '' THEN NULL WHEN FirstName LIKE '%|%' OR FirstName LIKE '%' + CHAR(9) + '%' OR FirstName LIKE '%' + CHAR(10) + '%' OR FirstName LIKE '%' + CHAR(13) + '%' THEN NULL ELSE FirstName END,CASE WHEN MiddleName = '' THEN NULL WHEN MiddleName LIKE '%|%' OR MiddleName LIKE '%' + CHAR(9) + '%' OR MiddleName LIKE '%' + CHAR(10) + '%' OR MiddleName LIKE '%' + CHAR(13) + '%' THEN NULL ELSE MiddleName END, CASE WHEN LastName = '' THEN NULL WHEN LastName LIKE '%|%' OR LastName LIKE '%' + CHAR(9) + '%' OR LastName LIKE '%' + CHAR(10) + '%' OR LastName LIKE '%' + CHAR(13) + '%' THEN NULL ELSE LastName END,NameStyle,BirthDate,CASE WHEN MaritalStatus = '' THEN NULL WHEN MaritalStatus LIKE '%|%' OR MaritalStatus LIKE '%' + CHAR(9) + '%' OR MaritalStatus LIKE '%' + CHAR(10) + '%' OR MaritalStatus LIKE '%' + CHAR(13) + '%' THEN NULL ELSE MaritalStatus END,CASE WHEN Suffix = '' THEN NULL WHEN Suffix LIKE '%|%' OR Suffix LIKE '%' + CHAR(9) + '%' OR Suffix LIKE '%' + CHAR(10) + '%' OR Suffix LIKE '%' + CHAR(13) + '%' THEN NULL ELSE Suffix END,CASE WHEN Gender = '' THEN NULL WHEN Gender LIKE '%|%' OR Gender LIKE '%' + CHAR(9) + '%' OR Gender LIKE '%' + CHAR(10) + '%' OR Gender LIKE '%' + CHAR(13) + '%' THEN NULL ELSE Gender END,CASE WHEN EmailAddress = '' THEN NULL WHEN EmailAddress LIKE '%|%' OR EmailAddress LIKE '%' + CHAR(9) + '%' OR EmailAddress LIKE '%' + CHAR(10) + '%' OR EmailAddress LIKE '%' + CHAR(13) + '%' THEN NULL ELSE EmailAddress END, YearlyIncome,TotalChildren,NumberChildrenAtHome, CASE WHEN EnglishEducation = '' THEN NULL WHEN EnglishEducation LIKE '%|%' OR EnglishEducation LIKE '%' + CHAR(9) + '%' OR EnglishEducation LIKE '%' + CHAR(10) + '%' OR EnglishEducation LIKE '%' + CHAR(13) + '%' THEN NULL ELSE EnglishEducation END,CASE WHEN SpanishEducation = '' THEN NULL WHEN SpanishEducation LIKE '%|%' OR SpanishEducation LIKE '%' + CHAR(9) + '%' OR SpanishEducation LIKE '%' + CHAR(10) + '%' OR SpanishEducation LIKE '%' + CHAR(13) + '%' THEN NULL ELSE SpanishEducation END,CASE WHEN FrenchEducation = '' THEN NULL WHEN FrenchEducation LIKE '%|%' OR FrenchEducation LIKE '%' + CHAR(9) + '%' OR FrenchEducation LIKE '%' + CHAR(10) + '%' OR FrenchEducation LIKE '%' + CHAR(13) + '%' THEN NULL ELSE FrenchEducation END,CASE WHEN EnglishOccupation = '' THEN NULL WHEN EnglishOccupation LIKE '%|%' OR EnglishOccupation LIKE '%' + CHAR(9) + '%' OR EnglishOccupation LIKE '%' + CHAR(10) + '%' OR EnglishOccupation LIKE '%' + CHAR(13) + '%' THEN NULL ELSE EnglishOccupation END,CASE WHEN SpanishOccupation = '' THEN NULL WHEN SpanishOccupation LIKE '%|%' OR SpanishOccupation LIKE '%' + CHAR(9) + '%' OR SpanishOccupation LIKE '%' + CHAR(10) + '%' OR SpanishOccupation LIKE '%' + CHAR(13) + '%' THEN NULL ELSE SpanishOccupation END,CASE WHEN FrenchOccupation = '' THEN NULL WHEN FrenchOccupation LIKE '%|%' OR FrenchOccupation LIKE '%' + CHAR(9) + '%' OR FrenchOccupation LIKE '%' + CHAR(10) + '%' OR FrenchOccupation LIKE '%' + CHAR(13) + '%' THEN NULL ELSE FrenchOccupation END,CASE WHEN HouseOwnerFlag = '' THEN NULL WHEN HouseOwnerFlag LIKE '%|%' OR HouseOwnerFlag LIKE '%' + CHAR(9) + '%' OR HouseOwnerFlag LIKE '%' + CHAR(10) + '%' OR HouseOwnerFlag LIKE '%' + CHAR(13) + '%' THEN NULL ELSE HouseOwnerFlag END,NumberCarsOwned,CASE WHEN AddressLine1 = '' THEN NULL WHEN AddressLine1 LIKE '%|%' OR AddressLine1 LIKE '%' + CHAR(9) + '%' OR AddressLine1 LIKE '%' + CHAR(10) + '%' OR AddressLine1 LIKE '%' + CHAR(13) + '%' THEN NULL ELSE AddressLine1 END,CASE WHEN AddressLine2 = '' THEN NULL WHEN AddressLine2 LIKE '%|%' OR AddressLine2 LIKE '%' + CHAR(9) + '%' OR AddressLine2 LIKE '%' + CHAR(10) + '%' OR AddressLine2 LIKE '%' + CHAR(13) + '%' THEN NULL ELSE AddressLine2 END,CASE WHEN Phone = '' THEN NULL WHEN Phone LIKE '%|%' OR Phone LIKE '%' + CHAR(9) + '%' OR Phone LIKE '%' + CHAR(10) + '%' OR Phone LIKE '%' + CHAR(13) + '%' THEN NULL ELSE Phone END,DateFirstPurchase,CASE WHEN CommuteDistance = '' THEN NULL WHEN CommuteDistance LIKE '%|%' OR CommuteDistance LIKE '%' + CHAR(9) + '%' OR CommuteDistance LIKE '%' + CHAR(10) + '%' OR CommuteDistance LIKE '%' + CHAR(13) + '%' THEN NULL ELSE CommuteDistance END FROM AdventureWorksDW2008R2.dbo.DimCustomer WITH (NOLOCK) OPTION (MAXDOP 1);" queryOut C:\bcp_output\AdventureWorksDW2008R2_dbo_DimCustomer.dat -SSQLFOOL -T -t"|" -c -C; |
I don’t know about you, but that would take me a bit longer to write manually than to execute the script above. 🙂
One note: I’ve found that copying this code into a batch file will actually render the CHAR functions, i.e. CHAR(13) will be replaced with a carriage return in the script. To avoid this, copy and paste the BCP script directly into your command window.
Not familiar with BCP? Refer to my blog post on BCP Basics to help get you started.
Nice to see an appreciation for BCP! Obviously you’re working with big data (like me).
I like your solution using dynamic sql too.
Talk about timing! I just got handed a BCP project, this will help so much! Thank you!
Thanks for this great script. Glad to see you are having a little more time to blog and I appreciate you sharing the scripts you develop – they are always useful. How are you finding Teradata stacks up against SQL Server for data marts? We use SQL Server for all of our data marts where I work but Teradata is always calling….
@Rowland Thanks!
@David Glad I could help! Good luck with the project. 🙂
@Todd Yeah, it’s hard to find time with a 5-month old, but he’s starting to sleep a little bit better at night, which means I’m markedly less zombie-like. Which hopefully means more blogging. 🙂 It’s still early in the data migration phase, but I can tell you that we’ve seen a huge performance improvement on the couple of marts that have already been migrated. The actual migration process is a whole other story… 🙂
Pingback: Something for the Weekend – SQL Server Links 20/04/12 - John Sansom SQL Server DBA
Hi Michelle, I too am a sort of SQL scripting junkie and I got over your article referenced from http://www.experts-exhange.com website. All good and such only that I hate loops of any king in SQL so I will share with you a different and simpler way to build your bcp statement without a loop. You probably know this trick but since in the article you used the loop here I go:
Instead of this code:
INSERT INTO @columnList
SELECT column_id
FROM sys.columns
WHERE object_id = OBJECT_ID(@tableToBCP)
ORDER BY column_id;
IF @Top IS NOT NULL
SET @bcpStatement = @bcpStatement + 'TOP (' + @Top + ') ';
SELECT @firstID = MIN(columnID) FROM @columnList;
WHILE EXISTS(SELECT * FROM @columnList)
BEGIN
SELECT @currentID = MIN(columnID) FROM @columnList;
IF @currentID @firstID
SET @bcpStatement = @bcpStatement + ',';
SELECT @bcpStatement = @bcpStatement + name
FROM sys.columns
WHERE object_id = OBJECT_ID(@tableToBCP)
AND column_id = @currentID;
DELETE FROM @columnList WHERE columnID = @currentID;
END;
use this:
-- build the columns list
SELECT
@bcpStatement = @bcpStatement + name + ','
FROM sys.columns
WHERE
object_id = OBJECT_ID(@tableToBCP)
order by
column_id
-- now get rid of the last comma
select
@bcpStatement=REPLACE(@bcpStatement+'#@$', ',#@$','')
Hi, Michelle,
I found your post looking into an issue we’re having in using BCP in SQL 2012 that appears to be related to parallelism. I see you’re using MAXDOP 1 as an option, and this setting resolves our issue. I haven’t found anything else online about this. Do you have any additional information or references on this?
Thanks.