Subscribe in a reader" />
Welcome to TechNet Blogs Sign in | Join | Help

Andrew Fryer's Blog

Insufficient Data

News

Using Full Text Searching SQL Server 2008

I have now got SQL Server 2008 RC0 running in my Hyper-V 64-bit virtual machine, complete with the sample adventure works database from connect. Now it’s time to have a look at querying full text indexes. One of the common ways of doing this is to use the contains function as part of a where clause and combine this with a filter on the structured e.g

SELECT    
    PM.Name AS ModelName,
    PC.Name AS Culture,
    P.Name AS ProductName,
    PD.Description,
    P.ProductNumber
FROM        
    Production.ProductModel PM
INNER JOIN  
    Production.ProductModelProductDescriptionCulture PDC
    ON PM.ProductModelID = PDC.ProductModelID
INNER JOIN
    Production.ProductDescription PD
    ON PDC.ProductDescriptionID = PD.ProductDescriptionID
INNER JOIN
    Production.Product P
    ON PM.ProductModelID = P.ProductModelID
INNER JOIN
    Production.Culture PC
    ON PDC.CultureID = PC.CultureID
WHERE    
    CONTAINS(PD.Description, 'Performance')
    AND (P.ProductNumber LIKE N'FR%')

One thing to note about this query is that it will be much faster in SQL Server 2008 as it will yield a single execution plan:

image

with a table valued function for the full text match. BTW the issues I was having getting Full text to work in CTP6 have gone away and all of this just worked after install.

The contains function can support all sorts of searching operators for proximity matching, applying different weight to different search terms as well as the ability to search from a list of columns in the same table, provided they are in the same language i.e. they have been indexed with the same language.  Check Books On line for more information here

The same syntax is also good if the column you’re searching is actually a document in Varbinary(MAX), and the new Filestream data type.  So if I index the documents table (as I did on Thursday ) then I can run a query like this…

SELECT    
FileName,
[Document]
FROM        
    Production.[Document]
WHERE
    CONTAINS(Production.[Document].[Document], 'red')

and get back this

image

of course you will want to write some code to stream that document back to the user in a more readable form, but the point is that the search has got inside the document and realised it contains the word ‘red’.

If you want to search across multiple tables for a term then your query will look more  like this (BTW I setup the product table for full text searching before running this):

DECLARE @SearchTerm Nvarchar(50) = N'road'
SELECT    
    PM.Name AS ModelName,
    PC.Name AS Culture,
    P.Name AS ProductName,
    PD.Description,
    P.ProductNumber
FROM        
    Production.ProductModel PM
INNER JOIN  
    Production.ProductModelProductDescriptionCulture PDC
    ON PM.ProductModelID = PDC.ProductModelID
INNER JOIN
    Production.ProductDescription PD
    ON PDC.ProductDescriptionID = PD.ProductDescriptionID
INNER JOIN
    Production.Product P
    ON PM.ProductModelID = P.ProductModelID
INNER JOIN
    Production.Culture PC
    ON PDC.CultureID = PC.CultureID
WHERE    
    CONTAINS(PD.Description , @SearchTerm )
OR
    CONTAINS((P.Name, P.ProductLine) , @SearchTerm )

I have highlighted the multiple column clause here in orange , as  I included both of these columns when I setup full text indexing ion the Product table.

The query plan for this statement looks like this:

image

This is not such a good performance story and books on line recommends here that you don’t have multiple contains statements because of this.  In this example we are stuffed as we are full text searching across multiple tables.  If you need to do alot of full text searching on a large index , then design your database with full text searching in mind i.e. try and keep the number of tables that you will want to combine in a text search to a minimum. If you think about this it’s no different from using T-SQL to search lots of columns for the same thing, and so this limitation is not as much of a constraint as it initially appears.

So contains is no substitute for good database design it’s just another tool, to quote Abraham Maslow…

“If the only tool you have is a hammer, then you tend to see every problem as a nail”.

Posted: Wednesday, June 11, 2008 5:09 PM by Andrew_Fryer

Comments

Richard Adams said:

Hi,  Thanks for the post.  I'm currently looking at using SQL server full text indexing  on a new project.. But...

I need to retrieve the actual text where the search terms were found.. I need excerpts from the searched text, highlighting (google style) hit counts, ranking and so forth.. Am I missing something or does SQL server not support this..

I can't see the point of all this technology if all I can get back are rows..

Maybe I need to be looking at Lucene.NET?

Thanks for any pointers..

# June 12, 2008 6:56 PM

ike said:

i am using full text search for an application for searching for specific adverts and displaying the details of the advertisers.Am i doing the right thing or i should also be looking at lucene though i am a little biased towards SQL Server.

# August 6, 2008 10:50 AM
Leave a Comment

(required) 

(required) 

(optional)

(required) 

  
Enter Code Here: Required

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Page view tracker