Have you ever needed to create a comma-delimited list of related records, grouped by a parent record? This is one of those business requirements that just happens to pop up every so often. For example, turning this:
|3||“Pride and Prejudice”|
|1||“Great Expectations”, “David Copperfield”, “Bleak House”|
|3||“Pride and Prejudice”, “Mansfield Park”|
There’s a handful of ways you can handle this, especially when dealing with small data sets. However, the problem becomes a bit more tricky when dealing with large record sets (i.e. hundreds of thousands or even millions of records). My first attempt at a solution used a CTE (common table expression). It did the job but was very slow. Looking for a better solution, I discovered the XML method.
To give you a quick example:
/* Create a table variable to play with */ Declare @myTable Table (customerID int, textData varchar(10)); /* Populate some test rescords */ Insert Into @myTable Select 1, 'abc' Union All Select 1, 'def' Union All Select 2, 'uvw' Union All Select 2, 'xyz' /* Just take a look at the data before we format it */ Select * From @myTable; /* Let's take a look at what For XML Raw will return for us */ Select textData From @myTable Where customerID = 1 Order By textData For XML Raw; /* Now consolidate the data, using the For XML Raw option to concatenate the textData column */ Select customerID , Replace( Replace( Replace( ( Select textData From @myTable As a Where a.customerID = b.customerID Order By textData For XML Raw) , '"/><row textData="', ', ') , '<row textData="', '') , '"/>', '') As 'textData' From @myTable b Group By customerID;
This has become my default method for handling this report requirement. While discussing this with a colleague, he mentioned using an approach that was similar in design but used Cross Apply on the XML. Wanting to see which performed better, I ran the two following queries in the AdventureWorks sample database (2008):
/* Method 1 */ Select ProductsOrdered , Count(*) As 'salesOrders' From ( Select SalesOrderID , Replace( Replace( Replace( ( Select Top 10 ProductID From Sales.SalesOrderDetail As sod With (NoLock) Where soh.SalesOrderID = sod.SalesOrderID Order By ProductID For XML Raw) , '"/><row ProductID="', ', ') , '<row ProductID="', '') , '"/>', '') As 'ProductsOrdered' From Sales.SalesOrderHeader As soh With (NoLock) ) x Group By ProductsOrdered Order By Count(*) Desc Option (MaxDop 1); /* Method 2 */ Select ProductsOrdered , Count(*) As 'salesOrders' From ( Select SalesOrderID , SubString(ProductsOrdered, 1, Len(ProductsOrdered) - 1) As 'ProductsOrdered' From Sales.SalesOrderHeader As soh With (NoLock) Cross Apply ( Select Top 10 Cast(ProductID As varchar(10)) + ', ' From Sales.SalesOrderDetail As sod With (NoLock) Where sod.SalesOrderID = soh.SalesOrderID Order By ProductID For XML Path('') ) X(ProductsOrdered) ) x Group By ProductsOrdered Order By Count(*) Desc;
Here’s the results: (click to enlarge)
As evidenced in the image above, the first method has slightly higher CPU and double the duration. The 2nd method had almost double the writes and significantly more reads.
I was hoping for a clear winner, but apparently each method has its benefits. I’ll probably continue to stick with my original, more resource-friendly method (Method 1), unless someone suggests a better solution. 🙂