Welcome to TechNet Blogs Sign in | Join | Help

Database Programming: Set-Based Update -- Hugo's Syntax Outperforms Shaun's; Ward Tweaks Hugo's Syntax Even Further

This is turning into a good old-fashioned coding party.  I love it..

Hugo Kornelis left a wonderful comment on last night's post (thanks Hugo!) answering my challenge to outperform Shaun's syntax.  With some adjustments I've made to control for parallelism (we can argue later whether that's a good thing; when I leave parallelism in play on my sandbox (which turns out to be hyperthreaded, contrary to what I said last night) the results of the following tests are far too random), I have some results to share..

I expanded on Hugo's approach and built 5 possible approaches to this problem (if you've got a different one, bring it on..).  Let's name them as follows (along with the average of five execution times on my laptop (HP nc8430; 2gB RAM):

  • 1MRowTable (2156 ms)
  • ShaunsCTE (1546 ms)
  • HugosFourWayJoin (1123 ms)
  • HugosTwoWayJoin (1110 ms)
  • TwoWayJoinNoTOP (983 ms)

Here's a rundown on each alternative..

1MRowTable

Looking at Hugo's syntax, I wondered if we might get more efficiency from a SELECT off of a 1M row table than off of a four-way Cartesian product.  This requires a dedicated table, which we'll populate with an adaptation of Shaun's CTE:

drop table SetBuilder

go

create table SetBuilder (Id bigint PRIMARY KEY)

go

 

;with digits(i) as (

    select 1 as i union all select 2 union all select 3 union all

    select 4 union all select 5 union all select 6 union all

    select 7 union all select 8 union all select 9 union all

    select 0

),

-- generate 1M rows each with a unique row number, i

sequence(i) as (

    select d1.i + (10*d2.i) + (100*d3.i) + (1000*d4.i) + (10000*d5.i) + (100000*d6.i)

      from digits as d1,

           digits as d2,

           digits as d3,

           digits as d4,

           digits as d5,

           digits as d6

)

insert SetBuilder (Id)

select i

from sequence 

Here's the test for generating 1M GUIDs using the table we just populated:

 

drop table SomeTable;

go

checkpoint;

dbcc freeproccache;

dbcc dropcleanbuffers;

go

declare @start datetime,

          @datediff int;

set @start = getdate();

select newid() as uuid

into SomeTable

from SetBuilder

option (maxdop 1);

select @datediff = datediff(ms, @start, getdate());

select @datediff  AS [1MRowTable],

         count(*)

from SomeTable;

ShaunsCTE

Here's Shaun's CTE blended with Hugo's monitoring syntax:

drop table SomeTable;

go

checkpoint;

dbcc freeproccache;

dbcc dropcleanbuffers;

go

declare @start datetime,

          @datediff int;

set @start = getdate();

;with digits(i) as (

select 1 as i union all select 2 union all select 3 union all

select 4 union all select 5 union all select 6 union all

select 7 union all select 8 union all select 9 union all

select 0

),

-- generate 1M rows each with a unique row number, i

sequence(i) as (

select d1.i + (10*d2.i) + (100*d3.i) + (1000*d4.i) + (10000*d5.i) + (100000*d6.i)

from digits as d1,

digits as d2,

digits as d3,

digits as d4,

digits as d5,

digits as d6

)

select newid() as uuid

into SomeTable

from sequence

option (maxdop 1);

select @datediff = datediff(ms, @start, getdate());

select @datediff  AS ShaunsCTE,

         count(*)

from SomeTable;

HugosFourWayJoin

Here's Hugo's four-way JOIN proposal, appropriate for use in a newly created database:

drop table SomeTable;

go

checkpoint;

dbcc freeproccache;

dbcc dropcleanbuffers;

go

declare @start datetime,

          @datediff int;

set @start = getdate();

select top (1000000) newid() as uuid

into SomeTable

from sysobjects a, sysobjects b, sysobjects c, sysobjects d;

select @datediff = datediff(ms, @start, getdate());

select @datediff  AS HugosFourWayJoin,

         count(*)

from SomeTable

option (maxdop 1);

HugosTwoWayJoin

Here's Hugo's two-way JOIN, appropriate for use in databases with several user objects defined:

drop table SomeTable;

go

checkpoint;

dbcc freeproccache;

dbcc dropcleanbuffers;

go

declare @start datetime,

          @datediff int;

set @start = getdate();

select top (1000000) newid() as uuid

into SomeTable

from sysobjects a, sysobjects b

select @datediff = datediff(ms, @start, getdate());

select @datediff  AS HugosTwoWayJoin,

         count(*)

from SomeTable

option (maxdop 1);

WardsTwoWayJoinNoTOP

Given that the two-way JOIN outperformed the four-way JOIN, I thought we might get better performance if we built a contrived table containing 1000 rows, which would generate an exactly 1M row Cartesian product, which removes the need for the TOP clause in the SELECT:

create table SetBuilder1K (Id bigint PRIMARY KEY)

go

 

;with digits(i) as (

    select 1 as i union all select 2 union all select 3 union all

    select 4 union all select 5 union all select 6 union all

    select 7 union all select 8 union all select 9 union all

    select 0

),

-- generate 1M rows each with a unique row number, i

sequence(i) as (

    select d1.i + (10*d2.i) + (100*d3.i) + (1000*d4.i) + (10000*d5.i) + (100000*d6.i)

      from digits as d1,

           digits as d2,

           digits as d3,

           digits as d4,

           digits as d5,

           digits as d6

)

insert SetBuilder1K (Id)

select TOP 1000 i

from sequence

order by i

 

drop table SomeTable;

go

checkpoint;

dbcc freeproccache;

dbcc dropcleanbuffers;

go

declare @start datetime,

          @datediff int;

set @start = getdate();

select newid() as uuid

into SomeTable

from SetBuilder1K a, SetBuilder1K b

select @datediff = datediff(ms, @start, getdate());

select @datediff  AS WardsTwoWayJoinNoTOP,

         count(*)

from SomeTable

option (maxdop 1);

Let's Keep This Party Going!

It looks as though the precisely contrived Cartesian product is the most performant of these alternatives.  This is further demonstration, I think, that it's best to let SQL Server be SQL Server; working with tables appears to be more performant than working with CTEs.

This is definitely a work in progress, and I hope those of you who are so inclined will join the party and throw some code on the bonfire.  I'd be especially interested in knowing if anybody can get consistent results with parallelism enabled.

If you've made it this far, thanks..

     -wp

PS.  If you have made it this far, your reward is the name of Hugo's blog, which I think is so brilliant, I'm going to pass it through the SQL query parser:

SELECT  Hints,

        Tips,

        Tricks

FROM    Hugo Kornelis
WHERE   RDBMS = 'SQL Server'

Outstanding, Hugo..

Published Wednesday, September 20, 2006 11:15 PM by Ward Pond

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

# re: Database Programming: Set-Based Update -- Hugo's Syntax Outperforms Shaun's; Ward Tweaks Hugo's Syntax Even Further

Thursday, September 21, 2006 6:16 AM by Denis the SQL Menace
How about the next challenge is to return all 78498 prime numbers between 1 and 1000000?

# Ward Pond's SQL Server blog : Database Programming: Denis' Prime Number Challenge

# re: Database Programming: Set-Based Update -- Hugo's Syntax Outperforms Shaun's; Ward Tweaks Hugo's Syntax Even Further

Saturday, September 23, 2006 5:34 PM by Hugo Kornelis
Hi Ward,

What surprises me, is that the join of two 1,000-row tables outperforms the version with the single 1,000,000-row setbuilder table. I would have expected the latter to be the fastest. Have you already tried a three-way join of three 100-row tables?

(I would have tried it myself, but I can't; I'm busy writing a blog entry with my response to Denis' prime number challenge. I'll put it up shortly.)

Best, Hugo

# The prime number challenge – great waste of time!

No sane person would even consider using SQL Server to construct a list of prime numbers. So just to...

# re: Explicit does seem to beat CTE in my experience as well

Sunday, September 24, 2006 3:52 PM by Marc Brooks
I discovered this when playing with the CTE version of date-range generation.  What I ended up with is something that seems much faster and generates a very simple query plan by doing explicit date math. Check out this: http://musingmarc.blogspot.com/2006/07/need-date-range-in-sql-without-filling.html

# Ward Pond's SQL Server blog : Database Programming: Prime Number Methodology

Leave a Comment

(required) 
required 
(required) 

  
Enter Code Here: Required
 
Page view tracker