• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 480
  • Last Modified:

Newbie Full Text Query Question - Strange Ranking Results?


I am trying to speed up our very antiquated search facility on our site using FTS. I just need a few pointers.

I am struggling to produce accurate results on some multiple word seach criteria.

For speed i have a lookup table that is produced from about 10 tables that conatin all text that i want to search on. I use a simple FREETEXTTABLE(SearchTable, *, N'TextStringToSearch')  search on this, which then allows me to use the identities returned together with the rank to get the correct records ranked according to the FREETEXTTABLE results. This is fine and extremly fast. The problem is when using multiple words, for example 'New Zealand'. (I have a country name field) The result will bring back a dozen or so 'New Zealand' reults, then bring back 'New' Then 'Zealand' then continue to giving a low ranking score to 'New Zealand' again. Obviously i cannot tell when a user wants grouping and when they dont, this gets worse when you saerch for 'New Zealand Beaches' For example, is comes back with beaches in other countries first!

Is this normal behaviour? Is there a word proximity thing i can use? Can I weight the columns being searched? Am i completly off track? I dont need google accuracy, but it sure needs to be relevant and quick. Any pointers?

  • 5
  • 4
1 Solution
Anthony PerkinsCommented:
Since you cannot search for phrases (I know New Zealand is not a phrase, but just bare with me ...) with FREETEXTTABLE, consider using instead CONTAINSTABLE which does support it.  You can then do something like this:
CONTAINSTABLE(SearchTable, *, N'"New Zealand"')  
Anthony PerkinsCommented:
Also, just a friendly reminder, of how it works here:
What are my choices?
gravity100Author Commented:
Yes that is usefull, but as i see it that will exclude 'new' and 'zealand' something we dont want to do. I guess i could combine both FREETEXTTABLE  and CONTAINSTABLE, that would give a more accurate result, but it seems overly expensive and wouldnt solve the problem very well. How does the NEAR term work, it doesnt seem to regard text in different fields as near, how near is near? I could concatanate all the search into one field if you think that would work. I currently have the following

SELECT SL.Country_Code, SL.Hotel_Name, SL.Hotel_Code,SL.Currency_Rate,SL.Rate_Value, A.Rank
FROM   SearchLookup AS SL
INNER JOIN CONTAINSTABLE(SearchLookup, *, '(new) OR (zealand) OR (new NEAR zealand) OR ("new zealand")')  AS A ON SL.ID = A.[KEY]
GROUP BY  SL.Country_Code, SL.Hotel_Name, SL.Hotel_Code,SL.Currency_Rate,SL.Rate_Value, A.Rank
ORDER BY A.Rank Desc

..this sql brings up New Delhi as the highest rank...

Am i missing something?
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

gravity100Author Commented:
Anthony PerkinsCommented:
>>but as i see it that will exclude 'new' and 'zealand' something we dont want to do.<<
Than I guess I am not following you.  If your search has "new" on its own, and New Delhi is the highest than I can only assume that it must be in your database more times than others.  In other words the way, that I would expect the following search string to work:
'(new) OR (zealand) OR (new NEAR zealand) OR ("new zealand")'

Is to give equal importance to each of the phrases delimited by OR.  So it would seem quite plausible that New Delhi could show up as the highest rank.

As far as the NEAR is concerned, this is from BOL:
NEAR | ~
Indicates that the word or phrase on the left side of the NEAR or ~ operator should be approximately close to the word or phrase on the right side of the NEAR or ~ operator. Multiple proximity terms can be chained, for example:

This means that word or phrase a should be near word or phrase b, which should be near word or phrase c.

Microsoft® SQL Server™ ranks the distance between the left and right word or phrase. A low rank value (for example, 0) indicates a large distance between the two. If the specified words or phrases are far apart from each other, the query is considered to be satisfied; however, the query has a very low (0) rank value. However, if <contains_search_condition> consists of only one or more NEAR proximity terms, SQL Server does not return rows with a rank value of 0. For more information about ranking, see CONTAINSTABLE.
gravity100Author Commented:
Yeah exactly, 'far apart' , whats that supposed to mean, no help MS. I will try concatanating all the searched fields into one field and se how that fairs...

" If the specified words or phrases are far apart from each other, the query is considered to be satisfied" - Not across fields by the looks of it..
gravity100Author Commented:
I only want to use NEARs, i dont want any ORs in the search, that condition can never bring back accurate results.The OR in my example was just iterating the point. I want all results, but 'New Zealand' has to come before 'New' or 'Zealand'.

I have just concatinated all the search fields into one field, I am using a CONTAINSTABLE and a NEAR term on this table, much better results are being returned thanks, although that still does not solve the initial problem, 'New' and 'zealand' results are not being returned.

"if <contains_search_condition> consists of only one or more NEAR proximity terms, SQL Server does not return rows with a rank value of 0"

I guess i'll have to ask the user to broaden there search criteria if this happens.

gravity100Author Commented:

Just to close this thread with the implemented solution:

1. Parse search phrase into words (removing noise words - still to do)
2. Build string from words (ord1 NEAR word2 NEAR word3)
3. Search using: CONTAINSTABLE and NEAR (word1 NEAR word2 NEAR word3 etc..) on whole lookup table
4. Return in rank order

This produces by far the best results

the FREETEXTTABLE does not appear to be very useful for accurate results on free form search text.

" If the specified words or phrases are far apart from each other, the query is considered to be satisfied" - Not across fields by the looks of it.. "
This is not true sorry, so much easier in the light of day...

Thanks all
Anthony PerkinsCommented:
You may want to re-read the EE Guidelines regarding grading standards:
What's the right grade to give?

And specifically this section:
C: Because Experts' reliability are often judged by their grading records, many Experts would like the opportunity to clarify if you have questions about their solutions. If you have given the Expert(s) ample time to respond to your clarification posts and you have responded to each of their posts providing requested information; or if the answers, after clarification, lack finality or do not completely address the issue presented, then a "C" grade is an option. You also have the option here of just asking Community Support to delete the question.
Remember, the Expert helping you today is probably going to be helping you next time you post a question. Give them a fair chance to earn an 'Excellent!' grade and they'll provide you with some amazing support. It's also true that a "C" is the lowest grade you can give, and the Experts know that -- so use it judiciously.

I wish you the best of luck.

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 5
  • 4
Tackle projects and never again get stuck behind a technical roadblock.
Join Now