Ward Pond's SQL Server blog

Ruminating on issues pertinent to the design and development of sound databases and processes under Microsoft SQL Server 2008, SQL Server 2005, and SQL Server 2000 (while reserving the right to vent about anything else that's on my mind)

Database Programming: Why I, Like, May Never Write Another LIKE, Like, Ever Again

Database Programming: Why I, Like, May Never Write Another LIKE, Like, Ever Again

  • Comments 5
  • Likes

[UPDATE 30 October 2007: There are significant ambiguities in this post which are addressed in the comments and in this follow-up, which I recommend reading after you've read this post and its comments.  -wp]

I've been doing some performance tuning work over the last couple of days, and I've found a new use for a technique first shown here.

One of the primary functions of our product is to examine program code for certain patterns and practices.  This is accomplished through pattern-matching queries against system tables, which take the general form:

SELECT  data

FROM    mytable

WHERE   column LIKE N'%value%'

This is among the most common performance issues we face.  Full-Text Search is a wonderful technology for addressing this issue; it unleashes the CONTAINS syntax, which is a powerful and wonderful thing.  Sometimes, though, Full-Text Search isn't an option.  In those scenarios, here's some equivalent syntax which provides a 2.5x performance improvement for every LIKE it replaces:

SELECT  data

FROM    mytable

WHERE   patindex('%value%',column COLLATE Latin1_General_BIN) > 0

If you just use the PATINDEX without the binary collation, you get performance equivalent to the LIKE.  Adding the binary collation turns the trick and unlocks the performance boost.

So, when would I ever code a LIKE in light of this discovery?  Well, the only thing I haven't figured out how to do with PATINDEX is escape a reserved character, which, as you'l recall from this post, is among the LIKE command's capabilities.

I hope this tip comes in handy for you..

     -wp

Comments
  • Hi Ward,

    Two comments:

    First of all, in my experiments I see no difference in performance between using PATINDEX with the COLLATE keyword and LIKE with the COLLATE keyword, as in:

    SELECT  data

    FROM    mytable

    WHERE   column LIKE N'%value%' COLLATE Latin1_General_BIN

    ... and since the LIKE is more explicit/readable in terms of your intentions for the query, I would personally favor it over the PATINDEX (not to mention the escape character issue).

    Second, I think it's important to point out that in most databases, LIKE without the COLLATE and PATINDEX or LIKE with the COLLATE are not logically equivalent queries.  The former, in most databases, will be a case-sensitive query, whereas the latter will be case-insensitive.  The good news is that since you're already forcing a table scan by using LIKE '%xxx%' and by using the COLLATE keyword, another operator makes little or no difference.  In my tests using UPPER did not change the performance improvement at all, but does make the query effectively case-sensitive:

    SELECT  data

    FROM    mytable

    WHERE   UPPER(column) LIKE UPPER(N'%value%') COLLATE Latin1_General_BIN

  • Hi Adam..

    Thanks very much for your comments.

    To your first comment: I agree as it applies to the code snippet presented in the post, which is simplified a bit from what I'm actually working on.  The performance boost comes from the collation, not the use of LIKE over PATINDEX.

    The application I'm actually wants to locate multiple occurences of certain strings and analyze where they occur relative to each other.  For these requirements, where I'm essentially crawling a long string, PATINDEX has been a useful solution.

    Your second point is also well-taken, although it will be moot in case-insenstive databases, which is what most folks seem to be using these days.  For what it's worth, in an extreme case I did see a slight performance impact from multiple invocations of the UPPER function; moving it outside of a large set-based construct saved me about 3% of my execution time.

    As always, Adam, thanks for reading and commenting!

        -wp

  • Hi Ward,

    I just realized that I mixed up the terminology in my post a bit.  In my comment, replace all instances of "case-sensitive" with "case-insensitive", and vice-versa.  

    Your comment that "it will be moot in case-insensitive databases," seems to be reversed as well, since the binary collation will actually make the query case-sensitive.  Most of the databases I see use default options, i.e. case-insensitive, although I have talked to a few people recently who seem to favor case-sensitivity for one reason or another.  Definitely not the norm from my point of view, though.

  • By the way, perhaps one of these will help you:

    http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/counting-occurrences-of-a-substring-within-a-string.aspx

    http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/pattern-based-split-string.aspx

  • .. but it's got to have a COLLATE with it. A couple of updates are necessary to my recent post on LIKE,

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment