Row Concatenation in T-SQL

Have you ever needed to create a comma-delimited list of related records, grouped by a parent record? This is one of those business requirements that just happens to pop up every so often. For example, turning this:

AuthorID    BookName
1 “Great Expectations”
1 “David Copperfield”
1 “Bleak House”
2 “Divine Comedy”
3 “Pride and Prejudice”
3 “Mansfield Park”
   

into this:

AuthorID    ListOfBooks
1 “Great Expectations”, “David Copperfield”, “Bleak House”
2 “Divine Comedy”
3 “Pride and Prejudice”, “Mansfield Park”
   

There’s a handful of ways you can handle this, especially when dealing with small data sets. However, the problem becomes a bit more tricky when dealing with large record sets (i.e. hundreds of thousands or even millions of records). My first attempt at a solution used a CTE (common table expression). It did the job but was very slow. Looking for a better solution, I discovered the XML method.

To give you a quick example:

/* Create a table variable to play with */
Declare @myTable Table 
    (customerID int, textData varchar(10));
 
/* Populate some test rescords */
Insert Into @myTable
Select 1, 'abc' Union All
Select 1, 'def' Union All
Select 2, 'uvw' Union All
Select 2, 'xyz'
 
/* Just take a look at the data
   before we format it */
Select * From @myTable;
 
/* Let's take a look at what
   For XML Raw will return 
   for us */
Select textData
From @myTable
Where customerID = 1
Order By textData
For XML Raw;
 
/* Now consolidate the data, using
   the For XML Raw option to 
   concatenate the textData column */
Select customerID
        , Replace( Replace( Replace(
            (   Select textData
                From @myTable As a
                Where a.customerID = b.customerID
                Order By textData
                For XML Raw)
                , '"/><row textData="', ', ')
                , '<row textData="', '')
                , '"/>', '')
            As 'textData'
From @myTable b
Group By customerID;

This has become my default method for handling this report requirement. While discussing this with a colleague, he mentioned using an approach that was similar in design but used Cross Apply on the XML. Wanting to see which performed better, I ran the two following queries in the AdventureWorks sample database (2008):

/* Method 1 */
Select ProductsOrdered
    , Count(*) As 'salesOrders'
From (
    Select SalesOrderID
            , Replace( Replace( Replace(
                (   Select Top 10 ProductID
                    From Sales.SalesOrderDetail As sod With (NoLock)
                    Where soh.SalesOrderID = sod.SalesOrderID
                    Order By ProductID
                    For XML Raw)
                    , '"/><row ProductID="', ', ')
                    , '<row ProductID="', '')
                    , '"/>', '')
                As 'ProductsOrdered'
    From Sales.SalesOrderHeader As soh With (NoLock)
) x
Group By ProductsOrdered
Order By Count(*) Desc
Option (MaxDop 1);
 
/* Method 2 */
Select ProductsOrdered
    , Count(*) As 'salesOrders'
From (
    Select SalesOrderID
        , SubString(ProductsOrdered, 1, Len(ProductsOrdered) - 1) 
              As 'ProductsOrdered'
    From Sales.SalesOrderHeader As soh With (NoLock)
    Cross Apply (
                    Select Top 10 
                        Cast(ProductID As varchar(10)) + ', '
                    From Sales.SalesOrderDetail As sod With (NoLock)
                    Where sod.SalesOrderID = soh.SalesOrderID
                    Order By ProductID
                    For XML Path('')
                ) X(ProductsOrdered)
) x
Group By ProductsOrdered
Order By Count(*) Desc;

Here’s the results: (click to enlarge)

Row Concatenation Comparison

Row Concatenation Comparison

As evidenced in the image above, the first method has slightly higher CPU and double the duration. The 2nd method had almost double the writes and significantly more reads.

I was hoping for a clear winner, but apparently each method has its benefits. I’ll probably continue to stick with my original, more resource-friendly method (Method 1), unless someone suggests a better solution. :)

0saves
If you enjoyed this post, please consider leaving a comment or subscribing to the RSS feed to have future articles delivered to your feed reader.
Tagged , . Bookmark the permalink.

2 Responses to Row Concatenation in T-SQL

  1. raymond lew says:

    Hello Fool, thank you this page and the method. I have been making good use of this until I hit some quotes. The quotes get transformed into " for me; not what I want. I tweaked the query a bit and came up with xml path (”) which deals with the quotes.

    Here’s my working code:
    /* Create a table variable to play with */
    DECLARE @myTable TABLE
    (customerID INT, textData VARCHAR(10));

    /* Populate some test rescords */
    INSERT INTO @myTable
    SELECT 1, ‘a”"b”c’ UNION All
    SELECT 1, ‘de”></\f’ UNION All
    SELECT 2, ‘uvw’ UNION All
    SELECT 2, ‘xyz’

    /* Just take a look at the data
    before we format it */
    –SELECT * FROM @myTable;

    /* Let’s take a look at what
    For XML Raw will return
    for us */
    SELECT textData AS ‘X’
    FROM @myTable
    WHERE customerID = 1
    ORDER BY textData
    FOR XML raw;

    SELECT textData as ‘X’,textData as ‘X’
    FROM @myTable
    WHERE customerID = 1
    ORDER BY textData
    FOR XML path(”);

    /* Now consolidate the data, using
    the For XML Raw option to
    concatenate the textData column */
    SELECT customerID
    , REPLACE( REPLACE( REPLACE(
    ( SELECT textData AS ‘X’
    FROM @myTable AS a
    WHERE a.customerID = b.customerID
    ORDER BY textData
    FOR XML path(”) )
    , ”, ‘, ‘)
    , ”, ”)
    , ”, ”)
    AS ‘textData’
    FROM @myTable b
    GROUP BY customerID;

    If you run this, you’ll see that ‘>’ and ‘<’ are still problems. I suppose a couple more replaces will deal with these. Or there may be some kind of FOR XML hack.

    Anyhow, thank you very much

    mondo

  2. Matt says:

    Thank you so much. I have been searching for a “duckies and bunnies” version of this so I could understand it. MySQL and the GROUP_CONCAT thing is a touch easier on my head. Much appreciated.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>