Ward Pond's SQL Server blog

Ruminating on issues pertinent to the design and development of sound databases and processes under Microsoft SQL Server 2008, SQL Server 2005, and SQL Server 2000 (while reserving the right to vent about anything else that's on my mind)

Database Programming: Applying APPLY -- Solving Parent Axis Access Performance Issues In XML Access In SQL Server 2005

Database Programming: Applying APPLY -- Solving Parent Axis Access Performance Issues In XML Access In SQL Server 2005

  • Comments 7
  • Likes

This little nugget will save you an incredible amount of time if you face the need to retrieve data from multiple levels of the same XML document in one SELECT statement.

 

The two methods we'll examine in this post are “parent axis access,” the “classic” method of addressing this requirement, and the new SQL Server 2005 CROSS APPLY operator (CROSS APPLY is fully described in the December, 2005 version of SQL Server 2005 Books OnLine at ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/0208b259-7129-4d9f-9204-8445a8119116.htm).

 

Consider the following script, which seeks to retrieve the data from the title elements at each level of the XML:

--  ParentAxisAccessFixDemo

--    run via Query Analyzer with Include Actual Execution Plan turned on

 

DECLARE @xml XML

 

SET @xml ='<solutionmap>

      <title>SolutionX</title>

      <level0-item><title>ScenarioGroupA</title>

            <level1-item><title>Scenario1</title>

                  <process><title>ProcessA</title></process>

            </level1-item>

            <level1-item><title>Scenario2</title>

                  <process><title>ProcessA</title></process>

                  <process><title>ProcessB</title></process>

            </level1-item>

      </level0-item>

      <level0-item><title>ScenarioGroupB</title>

            <level1-item><title>Scenario1</title>

                  <process><title>ProcessA</title></process>

            </level1-item>

            <level1-item><title>Scenario2</title>

                  <process><title>ProcessA</title></process>

                  <process><title>ProcessB</title></process>

            </level1-item>

      </level0-item>

      <title>SolutionX</title>

      <level0-item><title>ScenarioGroupA</title>

            <level1-item><title>Scenario1</title>

                  <process><title>ProcessA</title></process>

            </level1-item>

            <level1-item><title>Scenario2</title>

                  <process><title>ProcessA</title></process>

                  <process><title>ProcessB</title></process>

            </level1-item>

      </level0-item>

      <level0-item><title>ScenarioGroupB</title>

            <level1-item><title>Scenario1</title>

                  <process><title>ProcessA</title></process>

            </level1-item>

            <level1-item><title>Scenario2</title>

                  <process><title>ProcessA</title></process>

                  <process><title>ProcessB</title></process>

            </level1-item>

      </level0-item>

</solutionmap>'

 

SELECT  @xml

 

SELECT  c.value('(../../../title/text())[1]', 'nvarchar(100)') AS Solution,

        c.value('(../../title/text())[1]', 'nvarchar(100)')    AS ScenarioGroup,

        c.value('(../title/text())[1]', 'nvarchar(100)')       AS Scenario,

        c.value('(title/text())[1]', 'nvarchar(100)')          AS Capability

FROM    @xml.nodes('/solutionmap[1]/level0-item/level1-item/process')

AS      t(c)

-- the query above is 87% of total cost on my single proc system

 

SELECT  sm.sm.value('(title/text())[1]', 'nvarchar(100)')     AS Solution,

        l0.item.value('(title/text())[1]', 'nvarchar(100)')   AS ScenarioGroup,

        l1.item.value('(title/text())[1]', 'nvarchar(100)')   AS Scenario,

        p.process.value('(title/text())[1]', 'nvarchar(100)') AS Capability

FROM    @xml.nodes('/solutionmap[1]')

AS      sm(sm)

CROSS APPLY

        sm.sm.nodes('level0-item')

AS      l0(item)

CROSS APPLY

        l0.item.nodes('level1-item')

AS      l1(item)

CROSS APPLY

        l1.item.nodes('process')

AS      p(process)

-- the query above is 13% of total cost on my single proc system

 

If we examine the XML, we see that each node will have, at maximum, one title attribute.  There can be any number of level0 nodes under the root solutionmap node, any number of level1 nodes under a particular level0 node, and so on.  Each of them will have a single title attribute, and we wish to retrieve them all.

 

In SQL Server 2000, if we wanted to write a query to extract all of the title attributes, we would use the first SELECT above.  The query drills down to the deepest level of the XML for the Capability value, as noted in the NODES() reference.  The values from the higher levels (Scenario, ScenarioGroup, and Solution) are retrieved from the parent, grandparent, and great-grandparent (in this case, also the root) nodes respectively.

 

A look at the execution plan for this query shows that we’ve spawned thirteen XML Reader table valued function calls.  The results of these calls are fed to a two-level nested series of sixteen INNER JOINs (eleven nested loops, three merges, and two hash matches) to produce the results.

 

With SQL Server 2005’s new APPLY syntax, we use the second query above.  The execution plan for this query shows eight XML Reader table valued function calls, down from the original thirteen.  These are fed through a single tier of seven JOINs (five nested loops, one merge, and one hash match) to produce the results.

 

The proof is in the pudding, as they say.  When I run these two queries on SQL Server 2005 RTM on my hyperthreaded single-processor laptop, the parent axis access query consumes 87% of the resources for the batch, while the CROSS APPLY version consumes 13%.  This is an immediate 670% performance improvement with a trivial amount of data.  This performance improvement scales non-linearly with respect to the volume of XML processed -- the more XML you’ve got, the better the performance improvement is!

 

Parent axis access has been my sole remaining major performance complaint with XML access in SQL Server, and this CROSS APPLY syntax resolves it nicely.  Pardon the double negative, but when it comes to SQL Server 2005 and XML access, there’s no longer much you can’t do.
Comments
  • The validation strategy for an earlier post is revisited in light of the newly-gained insight that Query Costing functionality in SQL Server 2005 Management Studio does not provide accurate results for XQuery statements against unindexed XML.

  • there's quite a lot you can't do with xml in sql server 2005. XQUERY does not support the   xml datatype. What use is that?

    you can't use any xquery functions any any useful context either.

  • Thanks for your comments, naz.

    XQUERY runs against the XML datatype in SQL Server 2005 and above, so I'm not quite certain what you're suggesting.  I've also had great success using the xquery functions in several useful contexts.

    Can you offer specific examples of tasks you've been unable to perform to your satisfaction?

  • Hi, when using this cross apply method, the execution plan seems to be a bit off in my opinion. If i run the stored proc it comes back in less than one second but the actual execution plan shows that the subtree cost is over 14000!!!!

    Why would this be...?

  • Hi Chris..

    As discussed in this post, there are issues with query costing for XML access in SQL Server 2005.

    I hope this answers your question.  Please write back if it doesn't..

         -wp

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment