Ward Pond's SQL Server blog

Ruminating on issues pertinent to the design and development of sound databases and processes under Microsoft SQL Server 2008, SQL Server 2005, and SQL Server 2000 (while reserving the right to vent about anything else that's on my mind)

Database Programming: TOP Without ORDER BY?

Database Programming: TOP Without ORDER BY?

  • Comments 6
  • Likes

Egad, a technical post!  And after I'd already said goodbye for the month.  A long time ago, a friend taught me that expectations are small murders.  What does that make plans?

At any rate, Peter asked a FAQ on one of the internal discussions this morning, and since we haven't discussed it here..

I have a simple question regarding the Select Top …. clause.

When I use this clause, will the result be always the same, or does SQL 2000 (in my case here) return always a different set of data?

Bruno replied:

If you don’t define an “order by”, by standard SQL semantics, you are not guaranteed to always have the same result. Most of the time, you will, but index updates and server activity can change the records returned (even if there is no data insertion/update, which of course can change the data returned in any case and I suppose it is not your scenario)

 

That’s not a limitation of SQL 2000 or 2005, is simply part of the SQL definition: think about sets of records, they are unordered collections unless you explicitly asks for some order.

.. which led me to post the following:

Just to expand on Bruno’s point a bit..

TOP without ORDER BY doesn’t make much sense.  If one is programming a ranking function (which is basically what TOP Is), criteria by which to rank is a crucial part of the specification.  As Bruno points out, if you don’t specify ranking criteria, SQL Server will simply return the first records it finds until the count is satisfied.  There are several scenarios (parallelism primary among them) which could lead to different result sets in this scenario. 

Bruno then wrote:

There are some scenarios that TOP without ORDER BY is useful, though – getting a simple view of the table structure and sample data, process records N at a time, when you have to process all that meet some WHERE filter.

.. to which I replied:

While a simple view of the table structure can be retrieved with a TOP query, it’s more efficient to use a WHERE 1=0 construct, which will bypass indexes completely.

 

In your N records example, without an ORDER BY, there’s no guarantee you wouldn’t process the same record twice.

 

Think about the “top 10 hits” in music.  Top by what?  Volume of the bass guitar?  Number of drum hits?  The “top 10 hits” are measured by sales.  If we ordered by drum hits, we’d probably get a different list.  Since I believe that one of our jobs in building queries is to give the optimizer as much information as possible (never mind for the query to produce the same results (in context) every time it’s run), for my money TOP without ORDER BY is like IF without THEN.

While this is all obviously my two cents, I just can't see invoking a ranking function (TOP) without criteria (ORDER BY).  Furthermore, this strikes me as an instance where Pond's Tenth Law of System Design applies: your code is a communication with someone else, who will likely come after you are gone.

Would/Have you ever implemented code in production that uses TOP without ORDER BY?

     -wp

Comments
  • No, I never have and never will.  Non-determinism in programming is a sin, and TOP without ORDER BY is not deterministic.  Simple as that :)

  • Every time I write a post before I leave on vacation, you always bail me out, Adam. :)

    "Deterministic" is the word I couldn't find this afternoon.  Thanks for clarifying the discussion.

    Yes, it is simple. :)

  • I don't use TOP without ORDER BY in procedural code, but I do use it when I'm investigating unfamiliar data sources and don't want to look at more than a sample of records.  That's why TOP does not require an ORDER BY clause.

    It is worth mentioning that the SQL syntax could retain this kind of flexibility by requiring an ORDER BY clause unless the query is of the SELECT * variety.  That would help to discourage novice developers from implementing TOP without ORDER BY in procedural code.

  • I very often do use TOP w/o any sort of order by, in the very form that Bruno suggests above: In Management Studio, to see table structure and get a quick glance at some sample data.

    So what if I see the same record twice, or don't get it the second time I hit F5?

    I *never* do it in production code, though.

  • Great discussion!

    The upshot is that the syntax is flexible enough to support the retrieval of table schema and other administrative (non-production) functions, but we all (so far) agree that we'd never do this in procedural code.

    As Adam said in the first comment on the thread, "non-determinism in programming is a sin."

    Thanks..

        -wp

  • One of the comments I accidentally deleted earlier this afternoon posed the following question (paraphrased):

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment