Ward Pond's SQL Server blog

Ruminating on issues pertinent to the design and development of sound databases and processes under Microsoft SQL Server 2008, SQL Server 2005, and SQL Server 2000 (while reserving the right to vent about anything else that's on my mind)

Database Programming: Mixing TOP With UNION

Database Programming: Mixing TOP With UNION

  • Comments 7
  • Likes

I'd like to share yet another interesting conversation from an internal discussion list.

Michael sent a query that was producing inconsistent results:

select TOP 35 1 AS [PRECEDANCE], W03a.CRDATTIM, W03a.RECORDCD, W03a.CRNODE, W03a.PRTY + TM.TEAMPRTY AS [TOTAL PRIORITY]
from WL4U999S TM,
WL3U999S WRKa
,
W03U999S W03a
where W03a.LOCKSTAT = ' '
and rtrim(W03a.END_QUEUE_FLG) = 'N'
and rtrim(W03a.SUSPFLAG)= 'N'
and rtrim(W03a.INXFLD03) = rtrim(TM.TEAM)
and rtrim(TM.USERID) = 'AWDSETUP'
and rtrim(W03a.QUEUECD) = rtrim(TM.QUEUECD)
and rtrim(W03a.WRKTYPE) = rtrim(WRKa.WRKTYPE)
and rtrim(WRKa.USERID) = rtrim(TM.USERID)
and rtrim(W03a.UNITCD) = rtrim(WRKa.UNITCD)
and rtrim(W03a.QUEUECD) = rtrim(WRKa.QUEUECD)

union

select top 35 2 AS [PRECEDANCE], W03b.CRDATTIM, W03b.RECORDCD, W03b.CRNODE, W03b.PRTY
AS [TOTAL PRIORITY]
from WL3U999S WRKb,
W03U999S W03b
where W03b.LOCKSTAT = ' '
and W03b.END_QUEUE_FLG = 'N'
and W03b.SUSPFLAG = 'N'
and WRKb.USERID = 'AWDSETUP'
and W03b.UNITCD = WRKb.UNITCD
and W03b.WRKTYPE = WRKb.WRKTYPE
and W03b.QUEUECD = WRKb.QUEUECD
and W03b.QUEUECD not in
( select W30.QUEUECD
from W30U999S W30 where W03b.QUEUECD = W30.QUEUECD)
order by 5 desc, 1, 2
option(force order);

Michael's intent was to produce 70 rows from this query, but he was only getting 35 back.  The issue is that the TOP from the first query is running over the entire UNIONed resultset.

Fortunately, we can resolve this issue fairly easily.  All we need to do is place parentheses around each subquery and give each subquery an alias:

select PRECEDANCE, CRDATTIM, RECORDCD, CRNODE, [TOTAL PRIORITY]

FROM (

select TOP 35 1 AS [PRECEDANCE], W03a.CRDATTIM, W03a.RECORDCD, W03a.CRNODE, W03a.PRTY + TM.TEAMPRTY AS [TOTAL PRIORITY]

from WL4U999S TM,

WL3U999S WRKa,

W03U999S W03a

where W03a.LOCKSTAT = ' '

and rtrim(W03a.END_QUEUE_FLG) = 'N'

and rtrim(W03a.SUSPFLAG)= 'N'

and rtrim(W03a.INXFLD03) = rtrim(TM.TEAM)

and rtrim(TM.USERID) = 'AWDSETUP'

and rtrim(W03a.QUEUECD) = rtrim(TM.QUEUECD)

and rtrim(W03a.WRKTYPE) = rtrim(WRKa.WRKTYPE)

and rtrim(WRKa.USERID) = rtrim(TM.USERID)

and rtrim(W03a.UNITCD) = rtrim(WRKa.UNITCD)

and rtrim(W03a.QUEUECD) = rtrim(WRKa.QUEUECD)

) a

 

union

 

select PRECEDANCE, CRDATTIM, RECORDCD, CRNODE, [TOTAL PRIORITY]

FROM (

select top 35 2 AS [PRECEDANCE], W03b.CRDATTIM, W03b.RECORDCD, W03b.CRNODE, W03b.PRTY AS [TOTAL PRIORITY]

from WL3U999S WRKb,

W03U999S W03b

where W03b.LOCKSTAT = ' '

and W03b.END_QUEUE_FLG = 'N'

and W03b.SUSPFLAG = 'N'

and WRKb.USERID = 'AWDSETUP'

and W03b.UNITCD = WRKb.UNITCD

and W03b.WRKTYPE = WRKb.WRKTYPE

and W03b.QUEUECD = WRKb.QUEUECD

and W03b.QUEUECD not in

( select W30.QUEUECD

from W30U999S W30 where W03b.QUEUECD = W30.QUEUECD)

order by 5 desc, 1, 2

option(force order)

) b

The revised query returns a 70 record resultset, as intended.

     -wp

Comments
  • IMHO the first query returns 35 rows instead of 70 because UNION and not UNION ALL operation is used.
    Each TOP applies to its subquery and after that UNION operation eliminates duplicate records and that is why the entire query return 35 records

  • Thanks for your comment, Shurik.

    I ran a quick test with your suggestion and the behavior of this query didn't change in either case.  Can you pass along a code sample to demonstrate your idea?

  • Shurik's comment is incorrect. The constants "1 AS PRECEDENCE" and "2 AS PRECEDENCE" in the two union'ed query ensure that there can never be any duplicates. Changing UNION to UNION ALL would improve performance, though, as it eliminates the step to search for the (non-existing) duplicates.

    In your revised query, you should duplicate the ORDER BY clause to the first subquery as well. In the current version, the TOP in the first subquery has no corresponding ORDER BY, meaning that you'll get a pseudo-random selection of 35 rows.

  • I agree with you on all points, Hugo.  I have an inkling that Michael's intent is to have the ORDER BY on the master query:


    select PRECEDANCE, CRDATTIM, RECORDCD, CRNODE, [TOTAL PRIORITY]
    FROM (
    select TOP 35 1 AS [PRECEDANCE], W03a.CRDATTIM, W03a.RECORDCD, W03a.CRNODE, W03a.PRTY + TM.TEAMPRTY AS [TOTAL PRIORITY]
    from WL4U999S TM,
    WL3U999S WRKa,
    W03U999S W03a
    where W03a.LOCKSTAT = ' '
    and rtrim(W03a.END_QUEUE_FLG) = 'N'
    and rtrim(W03a.SUSPFLAG)= 'N'
    and rtrim(W03a.INXFLD03) = rtrim(TM.TEAM)
    and rtrim(TM.USERID) = 'AWDSETUP'
    and rtrim(W03a.QUEUECD) = rtrim(TM.QUEUECD)
    and rtrim(W03a.WRKTYPE) = rtrim(WRKa.WRKTYPE)
    and rtrim(WRKa.USERID) = rtrim(TM.USERID)
    and rtrim(W03a.UNITCD) = rtrim(WRKa.UNITCD)
    and rtrim(W03a.QUEUECD) = rtrim(WRKa.QUEUECD)
    ) a

    union

    select PRECEDANCE, CRDATTIM, RECORDCD, CRNODE,[TOTAL PRIORITY]
    FROM (
    select top 35 2 AS [PRECEDANCE], W03b.CRDATTIM, W03b.RECORDCD, W03b.CRNODE, W03b.PRTY AS [TOTAL PRIORITY]
    from WL3U999S WRKb,
    W03U999S W03b
    where W03b.LOCKSTAT = ' '
    and W03b.END_QUEUE_FLG = 'N'
    and W03b.SUSPFLAG = 'N'
    and WRKb.USERID = 'AWDSETUP'
    and W03b.UNITCD = WRKb.UNITCD
    and W03b.WRKTYPE = WRKb.WRKTYPE
    and W03b.QUEUECD = WRKb.QUEUECD
    and W03b.QUEUECD not in
    ( select W30.QUEUECD
    from W30U999S W30 where W03b.QUEUECD = W30.QUEUECD)
    ) b
    order by 5 desc, 1, 2
    option(force order)

  • It is possible that I am missing something again, but the following query returns 70 records and I am unable to reproduce the effect when
    "...the TOP from the first query is running over the entire UNIONed resultset."

    select top 35 1 as precedence, id from sysobjects
    union
    select top 35 2 as precedence, id from sysobjects
    order by 2, 1

    select @@version
    Microsoft SQL Server  2000 - 8.00.2162 (Intel X86)


  • Thanks for writing back, Shurik.

    It appears I may have slightly mis-stated Michael's original issue.  I went back to the original email and found this, which I missed when I wrote my first post:

    -- begin excerpt

    The problem is that if you run the top query all by itself without an order by it brings back 1 set of data, if you run it with the sort from the bottom(in another query window) then you get another data set that is totally different. You can view all 35 records from each query and they aren't even close to being the same.

    Then if you run the query on the bottom you get zero rows(which you should). Run this entire thing together as is and you get another different dataset. I then moved the top query to the bottom and re-ran it, this time the sort per se affects the over all records and you get back the same data as if you ran the top query all by itself, with ORDER by. I am guessing, its either an issue, or its how the optimizer is running the query, because I don't honestly see how this query isn't working.

    -- end excerpt

    It's this issue that the new nested query resolves.

       -wp

  • What's this, you say? Useful T-SQL making a return to this blog? Yep. The first T-SQL I've posted since

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment