Insufficient data from Andrew Fryer

The place where I page to when my brain is full up of stuff about the Microsoft platform

Using Full Text Searching SQL Server 2008

Using Full Text Searching SQL Server 2008

  • Comments 6
  • Likes

I have now got SQL Server 2008 RC0 running in my Hyper-V 64-bit virtual machine, complete with the sample adventure works database from connect. Now it’s time to have a look at querying full text indexes. One of the common ways of doing this is to use the contains function as part of a where clause and combine this with a filter on the structured e.g

SELECT    
    PM.Name AS ModelName,
    PC.Name AS Culture,
    P.Name AS ProductName,
    PD.Description,
    P.ProductNumber
FROM        
    Production.ProductModel PM
INNER JOIN  
    Production.ProductModelProductDescriptionCulture PDC
    ON PM.ProductModelID = PDC.ProductModelID
INNER JOIN
    Production.ProductDescription PD
    ON PDC.ProductDescriptionID = PD.ProductDescriptionID
INNER JOIN
    Production.Product P
    ON PM.ProductModelID = P.ProductModelID
INNER JOIN
    Production.Culture PC
    ON PDC.CultureID = PC.CultureID
WHERE    
    CONTAINS(PD.Description, 'Performance')
    AND (P.ProductNumber LIKE N'FR%')

One thing to note about this query is that it will be much faster in SQL Server 2008 as it will yield a single execution plan:

image

with a table valued function for the full text match. BTW the issues I was having getting Full text to work in CTP6 have gone away and all of this just worked after install.

The contains function can support all sorts of searching operators for proximity matching, applying different weight to different search terms as well as the ability to search from a list of columns in the same table, provided they are in the same language i.e. they have been indexed with the same language.  Check Books On line for more information here

The same syntax is also good if the column you’re searching is actually a document in Varbinary(MAX), and the new Filestream data type.  So if I index the documents table (as I did on Thursday ) then I can run a query like this…

SELECT    
FileName,
[Document]
FROM        
    Production.[Document]
WHERE
    CONTAINS(Production.[Document].[Document], 'red')

and get back this

image

of course you will want to write some code to stream that document back to the user in a more readable form, but the point is that the search has got inside the document and realised it contains the word ‘red’.

If you want to search across multiple tables for a term then your query will look more  like this (BTW I setup the product table for full text searching before running this):

DECLARE @SearchTerm Nvarchar(50) = N'road'
SELECT    
    PM.Name AS ModelName,
    PC.Name AS Culture,
    P.Name AS ProductName,
    PD.Description,
    P.ProductNumber
FROM        
    Production.ProductModel PM
INNER JOIN  
    Production.ProductModelProductDescriptionCulture PDC
    ON PM.ProductModelID = PDC.ProductModelID
INNER JOIN
    Production.ProductDescription PD
    ON PDC.ProductDescriptionID = PD.ProductDescriptionID
INNER JOIN
    Production.Product P
    ON PM.ProductModelID = P.ProductModelID
INNER JOIN
    Production.Culture PC
    ON PDC.CultureID = PC.CultureID
WHERE    
    CONTAINS(PD.Description , @SearchTerm )
OR
    CONTAINS((P.Name, P.ProductLine) , @SearchTerm )

I have highlighted the multiple column clause here in orange , as  I included both of these columns when I setup full text indexing ion the Product table.

The query plan for this statement looks like this:

image

This is not such a good performance story and books on line recommends here that you don’t have multiple contains statements because of this.  In this example we are stuffed as we are full text searching across multiple tables.  If you need to do alot of full text searching on a large index , then design your database with full text searching in mind i.e. try and keep the number of tables that you will want to combine in a text search to a minimum. If you think about this it’s no different from using T-SQL to search lots of columns for the same thing, and so this limitation is not as much of a constraint as it initially appears.

So contains is no substitute for good database design it’s just another tool, to quote Abraham Maslow…

“If the only tool you have is a hammer, then you tend to see every problem as a nail”.

Comments
  • Hi,  Thanks for the post.  I'm currently looking at using SQL server full text indexing  on a new project.. But...

    I need to retrieve the actual text where the search terms were found.. I need excerpts from the searched text, highlighting (google style) hit counts, ranking and so forth.. Am I missing something or does SQL server not support this..

    I can't see the point of all this technology if all I can get back are rows..

    Maybe I need to be looking at Lucene.NET?

    Thanks for any pointers..

  • i am using full text search for an application for searching for specific adverts and displaying the details of the advertisers.Am i doing the right thing or i should also be looking at lucene though i am a little biased towards SQL Server.

  • Hello..

    I'm trying to run a full text search on PDFs but the PDF files are not showing in the item count in the catalog.  My table also includes DOCs and they are showing in the count and is being indexed/searched properly but not PDFs.  

    I'm lost and need help.  I installed the Adobe PDF iFilter for 64-bit v9.0.  I'm using SQL Server 2008.

    Why isn't PDFs being indexed?  Could it be a 64-bit or a SQL Server 2008 issue?

    Thanks in advance..

  • My only suggestion is..

    Have you got the ifilters setup for the pdf format, i.e. the ifilter is installed and there is column in your table that indicates that the document typre (ifilter to use).

    Andrew

  • Hi,

    I also have problems with the Adobe iFilter for 64-bits platform on a 64-bits SQL Server running on Win2008x64.

    Documents are stored in an image column type.

    I have checked the fulltext_document_types and added the bin path of the iFilter dll to the PATH system variable.

    Still, no PDFs are being indexed...

    The crawl logs report no errors.

    Today I uninstalled the Adobe iFilter and installed the trial version of Foxit iFilter. It indexed all the PDFs without any problem!

    I have googled around a lot but I can't find any possible explanation to why I'm not able to use the Adobe iFilter.

  • Sivert,  This is obviously down to the ifilter rather than SQL Server so a good idea to se foxit, whihc I will pass on.  I will try and find out why adobe doesn't work and get back to you.

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment