Ward Pond's SQL Server blog

Ruminating on issues pertinent to the design and development of sound databases and processes under Microsoft SQL Server 2008, SQL Server 2005, and SQL Server 2000 (while reserving the right to vent about anything else that's on my mind)

Database Programming: A New String Concatenation Contender for SQL Server 2008

Database Programming: A New String Concatenation Contender for SQL Server 2008

  • Comments 9
  • Likes

UPDATED 11:11p 7 September 2009: Based on my latest exchange with Adam (see the comments for this post), compound operators should be avoided for string concatenation due to the ORDER BY bug discused in KB 287515.  The XML variant below is the preferred approach for string concatenation, but the compound operator remains a viable alternative in other scenarios.

UPDATED 7 September 2009: See Adam Machanic's comments on this post for a link to a discussion of a SQL Server 2008 ORDER BY bug (KB 287515) which could influence the sequencing of your results under certain scenarios.  The XML variant might be better after all (the ORDER BY issue is something of an edge case (functions applied to ORDER BY columns) and should be repaired eventually, but the current risk is non-trivial); compound operators are still of use in scenarios where sequencing of results is not an issue (aggregates, etc.).

Back in March of last year, we had a discussion which concluded here (and was briefly revisited here) regarding string concatenation techniques.  I’ve uncovered a new SQL Server 2008-based contender which I’d like to share.

This approach is based on the new-in-SQL-Server-2008 += compound operator, which allows all order of iterative processing (copied from linked BOL article):

Operator Link to more information Action

+=

+= (Add EQUALS) (Transact-SQL)

Adds some amount to the original value and sets the original value to the result.

-=

-= (Subtract EQUALS) (Transact-SQL)

Subtracts some amount from the original value and sets the original value to the result.

*=

*= (Multiply EQUALS) (Transact-SQL)

Multiplies by an amount and sets the original value to the result.

/=

/= (Divide EQUALS) (Transact-SQL)

Divides by an amount and sets the original value to the result.

%=

%= (Modulo EQUALS) (Transact-SQL)

Divides by an amount and sets the original value to the modulo.

&=

&= (Bitwise AND EQUALS) (Transact-SQL)

Performs a bitwise AND and sets the original value to the result.

^=

^= (Bitwise Exclusive OR EQUALS) (Transact-SQL)

Performs a bitwise exclusive OR and sets the original value to the result.

|=

|= (Bitwise OR EQUALS) (Transact-SQL)

Performs a bitwise OR and sets the original value to the result.

Since the use of compound operators restricts us to SQL Server 2008, the snippet below exploits the also-new-in-SQL-Server-2008 row constructor capability (scroll to Example B in the latest BOL examples for the INSERT statement).  In my testing, nine distinct INSERT statements each consumed 6% of the queries resources; the two statements below were each 9% of the total query cost.  This is a pretty impressive economy for such a small bed of data, so those of you writing long seed scripts should definitely take this into account!

Two uses of row constructors as well as the use of the compound operator are highlighted in the code snippet below (the previous XML-based contender is also included for reference):


--  lay the groundwork
 
DECLARE @Results NVARCHAR(4000)
 
--      drop and create the tables
IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'Parent')
    DROP TABLE Parent
 
CREATE TABLE Parent
(
    ParentID        INT 
   ,ParentString    VARCHAR(100)
)
IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'Child')
    DROP TABLE Child
 
CREATE TABLE Child
(
    ChildID     INT 
   ,ParentID    INT
)
 
--      populate the tables
--      since this is SQL Server 2008 only, we can use row constructors
INSERT Parent VALUES
    (1, 'Parent 1 String'),
    (2, 'Parent 2 String'),
    (3, 'Parent 3 String')

INSERT Child VALUES
    (1, 1),
    (2, 1),
    (2, 2),
    (2, 3),
    (3, 1),
    (3, 3)
 
--      show the data
 
SELECT  Child.ChildID, ISNULL(Parent.ParentString, '')
FROM    Child
INNER   JOIN Parent
ON      Child.ParentID = Parent.ParentID
ORDER BY
        Child.ChildID, Child.ParentID
 
-- Concatenate a string via the new compound operator
-- Pivot Parent values into 1 column for 1 base row
 
SET @Results = N''
 
SELECT  @Results += ',' + Parent.ParentString
FROM    Child
JOIN    Parent
ON      Child.ParentID = Parent.ParentID
WHERE   Child.ChildID = 2  -- must specify 1 base row.  could be a correlated subquery
ORDER BY
        Child.ParentID
 
--  Two ways to display the result without the leading comma
SELECT RIGHT(@Results,LEN(@Results)-1)
SELECT STUFF(@Results,1,1,'')
 
-- Concatenate via XML
-- Pivot Parent values into 1 column for 1 base row
SELECT  STUFF(( SELECT [text()]= ',' + ISNULL(Parent.ParentString, '') + ''
FROM    Child
JOIN    Parent
ON      Child.ParentID = Parent.ParentID
WHERE   Child.ChildID = 2  -- must specify 1 base row.  could be a correlated subquery
ORDER BY
        Child.ParentID
FOR XML PATH('')), 1,1, '') AS Parent_CSV

The query plans for each query that the XML-based call includes a statistically insignificant (at this volume) UDX call for the XML instantiation:

Query Plan For Compound Operator Variant

query plan for compound operator variant

 

Query Plan For XML Variant

query plan for XML variant

For this reason, I’d expect the compound operator-based query to scale better with a large volume of data.  Once one becomes familiar with compound operators, which are commonplace in nGL languages, this construction strikes me as both more maintainable and more elegant.

On a related note, I continue to find the subtle programmability enhancements to SQL Server 2008 to be very cool.

Of course, these are my biases.  What do you think?

-wp


this copyrighted material was originally posted at http://blogs.technet.com/wardpond. 

the author and his employer are pleased to provide this content for you at that site, and via rss, free of charge and without advertising.

the author welcomes and appreciates links to and citations of his work.  however, if you are viewing the full text of this article at any other website, be aware that its author does not endorse and is not compensated by any advertising or access fees you may be subjected to outside the original web and rss sites

Comments
  • Hi Ward,

    Nothing really "new" here except the syntax. We call that the "aggregate concatenation" trick. Google and you'll see lots of stuff. Look especially for posts by Steve Kass. What you'll find is that the behavior is undefined and your results are not guaranteed, by the query processor, to comply to what you specified in the ORDER BY clause.

    The order IS, in fact, guaranteed when using FOR XML, the behavior of that operator is documented and well-defined, and in my tests they're equally fast. So, no contest if you ask me.

  • I did the Googling for you. Found a KB article on the topic, describing the ORDER BY issue:

    http://support.microsoft.com/default.aspx?scid=287515

  • ... and sorry for the three posts, but I just tested and this is certainly not fixed in SQL Server 2008, even with the new syntax... try the following with and without the ORDER BY (and with different ORDER BYs):

    DECLARE @temp table([id] int);

    INSERT INTO @temp VALUES(3);

    INSERT INTO @temp VALUES(2);

    INSERT INTO @temp VALUES(1);

    DECLARE @i varchar(50);

    SET @i = '';

    SELECT

    @i += convert(varchar, [id])

    FROM @temp

    ORDER BY

    LTRIM(RTRIM([id]));

    PRINT @i;

  • @Adam: As always, thanks for your comments, no matter how you format them.  I've updated the top of the post with what I hope is an accurate synopsis of your concerns.

    If I'm reading the KB article correctly, the bug only manifests itself when functions are used on columns in the scope of the ORDER BY statement.  This approach wouldn't be a best practice (we couldn't use the order of an index to satisfy an ORDER BY; even an index on the necessary column would need to be scanned due to the presence of the functions), and there are several workaround (in addition to the workaround discussed in the article, an index on a calculated, persisted column matching the expression would also solve the problem if that column were used in the ORDER BY).

    In the context of the original post, all of this likely mitigates in favor of use of the XML variant for string concatenation, at least until the ORDER BY issue is resolved.

    It seems to me that compound operators are still useful for scenarios where order is immaterial (aggregates, etc., although alternate syntax certainly also exists in these cases).

    Do you agree, Adam, or do you stay away from compound operators altogether?

        -wp

  • Hi Ward,

    I have no issue w/ compound operators--I think they're a useful addition to T-SQL and see no reason to avoid them (unless you have to write code that has any chance of being back-ported to 2005, etc). The aggregate concatenation trick is, in my opinion, not something that should be used unless the QP team decides to modify the query engine to fully and deterministically support it. The issue, as far as I know, is not limited to cases where there is a function used in the ORDER BY--that's just an easy way to show the problem. There is simply no guarantee that it will work.

    And will the issue indeed be fixed eventually? I don't know, but I certainly don't see that on the horizon; this "feature" has been heavily discussed since the SQL Server 7.0 time frame. I've been an MVP since 2004 and have seen numerous discussions on the topic since that time in the private MVP newsgroup--many of which included members of the QO and QP teams--and never has there been any indication that anyone was planning to fix the issue. I'm sure there were many discussions before I showed up. So this is something they've known about for 10+ years and four major versions. I believe that the primary argument against a fix is that it's not a "fix" at all since this behavior is undocumented. So there's no bug; a "fix" would actually be an enhancement. And with the supported and much more flexible FOR XML PATH workaround, plus the ability to do aggregate concatenation using SQLCLR, there is no need for such an enhancement.

  • @Adam: you've obviously got way more context around the history of this issue that I do.  Under the circimstances, I agrree that the XML variant is unambiguously preferable for ordered concatenation.

    Now..  how 'bout them row constructors? :)

        -wp

  • I know I might be resuscitating an old post however I would like to pose a problem I find to you guys. I prefer the FOR XML approach when concatenating strings. However my data sometimes contains non-xml friendly characters like &, > and < which are translated to & etc. Is there a neat way to avoid this from happening?  I would like to avoid the REPLACE (result,'&','&') as there are multiple "special" characters.

  • Answering my own question: [Promise I had googled this a zillion times]

    from: sqlblog.com/.../handling-special-characters-with-for-xml-path.aspx

    select

    stuff(

       (select ', <' + name + '>'

       from sys.databases

       where database_id > 4

       order by name

       for xml path(''), type

       ).value('(./text())[1]','varchar(max)')

     , 1, 2, '') as namelist;

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment