Newbie Full Text Query Question - Strange Ranking Results?

Posted on 2004-07-31
Last Modified: 2008-02-01

I am trying to speed up our very antiquated search facility on our site using FTS. I just need a few pointers.

I am struggling to produce accurate results on some multiple word seach criteria.

For speed i have a lookup table that is produced from about 10 tables that conatin all text that i want to search on. I use a simple FREETEXTTABLE(SearchTable, *, N'TextStringToSearch')  search on this, which then allows me to use the identities returned together with the rank to get the correct records ranked according to the FREETEXTTABLE results. This is fine and extremly fast. The problem is when using multiple words, for example 'New Zealand'. (I have a country name field) The result will bring back a dozen or so 'New Zealand' reults, then bring back 'New' Then 'Zealand' then continue to giving a low ranking score to 'New Zealand' again. Obviously i cannot tell when a user wants grouping and when they dont, this gets worse when you saerch for 'New Zealand Beaches' For example, is comes back with beaches in other countries first!

Is this normal behaviour? Is there a word proximity thing i can use? Can I weight the columns being searched? Am i completly off track? I dont need google accuracy, but it sure needs to be relevant and quick. Any pointers?

Question by:gravity100
  • 5
  • 4
LVL 75

Accepted Solution

Anthony Perkins earned 250 total points
ID: 11684973
Since you cannot search for phrases (I know New Zealand is not a phrase, but just bare with me ...) with FREETEXTTABLE, consider using instead CONTAINSTABLE which does support it.  You can then do something like this:
CONTAINSTABLE(SearchTable, *, N'"New Zealand"')  
LVL 75

Expert Comment

by:Anthony Perkins
ID: 11684981
Also, just a friendly reminder, of how it works here:
What are my choices?

Author Comment

ID: 11685184
Yes that is usefull, but as i see it that will exclude 'new' and 'zealand' something we dont want to do. I guess i could combine both FREETEXTTABLE  and CONTAINSTABLE, that would give a more accurate result, but it seems overly expensive and wouldnt solve the problem very well. How does the NEAR term work, it doesnt seem to regard text in different fields as near, how near is near? I could concatanate all the search into one field if you think that would work. I currently have the following

SELECT SL.Country_Code, SL.Hotel_Name, SL.Hotel_Code,SL.Currency_Rate,SL.Rate_Value, A.Rank
FROM   SearchLookup AS SL
INNER JOIN CONTAINSTABLE(SearchLookup, *, '(new) OR (zealand) OR (new NEAR zealand) OR ("new zealand")')  AS A ON SL.ID = A.[KEY]
GROUP BY  SL.Country_Code, SL.Hotel_Name, SL.Hotel_Code,SL.Currency_Rate,SL.Rate_Value, A.Rank
ORDER BY A.Rank Desc

..this sql brings up New Delhi as the highest rank...

Am i missing something?
Optimizing Cloud Backup for Low Bandwidth

With cloud storage prices going down a growing number of SMBs start to use it for backup storage. Unfortunately, business data volume rarely fits the average Internet speed. This article provides an overview of main Internet speed challenges and reveals backup best practices.


Author Comment

ID: 11685200
LVL 75

Expert Comment

by:Anthony Perkins
ID: 11685288
>>but as i see it that will exclude 'new' and 'zealand' something we dont want to do.<<
Than I guess I am not following you.  If your search has "new" on its own, and New Delhi is the highest than I can only assume that it must be in your database more times than others.  In other words the way, that I would expect the following search string to work:
'(new) OR (zealand) OR (new NEAR zealand) OR ("new zealand")'

Is to give equal importance to each of the phrases delimited by OR.  So it would seem quite plausible that New Delhi could show up as the highest rank.

As far as the NEAR is concerned, this is from BOL:
NEAR | ~
Indicates that the word or phrase on the left side of the NEAR or ~ operator should be approximately close to the word or phrase on the right side of the NEAR or ~ operator. Multiple proximity terms can be chained, for example:

This means that word or phrase a should be near word or phrase b, which should be near word or phrase c.

Microsoft® SQL Server™ ranks the distance between the left and right word or phrase. A low rank value (for example, 0) indicates a large distance between the two. If the specified words or phrases are far apart from each other, the query is considered to be satisfied; however, the query has a very low (0) rank value. However, if <contains_search_condition> consists of only one or more NEAR proximity terms, SQL Server does not return rows with a rank value of 0. For more information about ranking, see CONTAINSTABLE.

Author Comment

ID: 11685325
Yeah exactly, 'far apart' , whats that supposed to mean, no help MS. I will try concatanating all the searched fields into one field and se how that fairs...

" If the specified words or phrases are far apart from each other, the query is considered to be satisfied" - Not across fields by the looks of it..

Author Comment

ID: 11685638
I only want to use NEARs, i dont want any ORs in the search, that condition can never bring back accurate results.The OR in my example was just iterating the point. I want all results, but 'New Zealand' has to come before 'New' or 'Zealand'.

I have just concatinated all the search fields into one field, I am using a CONTAINSTABLE and a NEAR term on this table, much better results are being returned thanks, although that still does not solve the initial problem, 'New' and 'zealand' results are not being returned.

"if <contains_search_condition> consists of only one or more NEAR proximity terms, SQL Server does not return rows with a rank value of 0"

I guess i'll have to ask the user to broaden there search criteria if this happens.


Author Comment

ID: 11686860

Just to close this thread with the implemented solution:

1. Parse search phrase into words (removing noise words - still to do)
2. Build string from words (ord1 NEAR word2 NEAR word3)
3. Search using: CONTAINSTABLE and NEAR (word1 NEAR word2 NEAR word3 etc..) on whole lookup table
4. Return in rank order

This produces by far the best results

the FREETEXTTABLE does not appear to be very useful for accurate results on free form search text.

" If the specified words or phrases are far apart from each other, the query is considered to be satisfied" - Not across fields by the looks of it.. "
This is not true sorry, so much easier in the light of day...

Thanks all
LVL 75

Expert Comment

by:Anthony Perkins
ID: 11688447
You may want to re-read the EE Guidelines regarding grading standards:
What's the right grade to give?

And specifically this section:
C: Because Experts' reliability are often judged by their grading records, many Experts would like the opportunity to clarify if you have questions about their solutions. If you have given the Expert(s) ample time to respond to your clarification posts and you have responded to each of their posts providing requested information; or if the answers, after clarification, lack finality or do not completely address the issue presented, then a "C" grade is an option. You also have the option here of just asking Community Support to delete the question.
Remember, the Expert helping you today is probably going to be helping you next time you post a question. Give them a fair chance to earn an 'Excellent!' grade and they'll provide you with some amazing support. It's also true that a "C" is the lowest grade you can give, and the Experts know that -- so use it judiciously.

I wish you the best of luck.

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Why is this different from all of the other step by step guides?  Because I make a living as a DBA and not as a writer and I lived through this experience. Defining the name: When I talk to people they say different names on this subject stuff l…
Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
Via a live example, show how to extract insert data into a SQL Server database table using the Import/Export option and Bulk Insert.
Using examples as well as descriptions, and references to Books Online, show the documentation available for datatypes, explain the available data types and show how data can be passed into and out of variables.

749 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question