Link to home
Start Free TrialLog in
Avatar of gravity100
gravity100

asked on

Newbie Full Text Query Question - Strange Ranking Results?

Hi

I am trying to speed up our very antiquated search facility on our site using FTS. I just need a few pointers.

I am struggling to produce accurate results on some multiple word seach criteria.

For speed i have a lookup table that is produced from about 10 tables that conatin all text that i want to search on. I use a simple FREETEXTTABLE(SearchTable, *, N'TextStringToSearch')  search on this, which then allows me to use the identities returned together with the rank to get the correct records ranked according to the FREETEXTTABLE results. This is fine and extremly fast. The problem is when using multiple words, for example 'New Zealand'. (I have a country name field) The result will bring back a dozen or so 'New Zealand' reults, then bring back 'New' Then 'Zealand' then continue to giving a low ranking score to 'New Zealand' again. Obviously i cannot tell when a user wants grouping and when they dont, this gets worse when you saerch for 'New Zealand Beaches' For example, is comes back with beaches in other countries first!

Is this normal behaviour? Is there a word proximity thing i can use? Can I weight the columns being searched? Am i completly off track? I dont need google accuracy, but it sure needs to be relevant and quick. Any pointers?

Thanks
ASKER CERTIFIED SOLUTION
Avatar of Anthony Perkins
Anthony Perkins
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Also, just a friendly reminder, of how it works here:
What are my choices?
https://www.experts-exchange.com/Databases/Microsoft_SQL_Server/help.jsp#hi67
Avatar of gravity100
gravity100

ASKER

Yes that is usefull, but as i see it that will exclude 'new' and 'zealand' something we dont want to do. I guess i could combine both FREETEXTTABLE  and CONTAINSTABLE, that would give a more accurate result, but it seems overly expensive and wouldnt solve the problem very well. How does the NEAR term work, it doesnt seem to regard text in different fields as near, how near is near? I could concatanate all the search into one field if you think that would work. I currently have the following

SELECT SL.Country_Code, SL.Hotel_Name, SL.Hotel_Code,SL.Currency_Rate,SL.Rate_Value, A.Rank
FROM   SearchLookup AS SL
INNER JOIN CONTAINSTABLE(SearchLookup, *, '(new) OR (zealand) OR (new NEAR zealand) OR ("new zealand")')  AS A ON SL.ID = A.[KEY]
GROUP BY  SL.Country_Code, SL.Hotel_Name, SL.Hotel_Code,SL.Currency_Rate,SL.Rate_Value, A.Rank
ORDER BY A.Rank Desc

..this sql brings up New Delhi as the highest rank...

Am i missing something?
>>but as i see it that will exclude 'new' and 'zealand' something we dont want to do.<<
Than I guess I am not following you.  If your search has "new" on its own, and New Delhi is the highest than I can only assume that it must be in your database more times than others.  In other words the way, that I would expect the following search string to work:
'(new) OR (zealand) OR (new NEAR zealand) OR ("new zealand")'

Is to give equal importance to each of the phrases delimited by OR.  So it would seem quite plausible that New Delhi could show up as the highest rank.

As far as the NEAR is concerned, this is from BOL:
<quote>
NEAR | ~
Indicates that the word or phrase on the left side of the NEAR or ~ operator should be approximately close to the word or phrase on the right side of the NEAR or ~ operator. Multiple proximity terms can be chained, for example:
a NEAR b NEAR c

This means that word or phrase a should be near word or phrase b, which should be near word or phrase c.

Microsoft® SQL Server™ ranks the distance between the left and right word or phrase. A low rank value (for example, 0) indicates a large distance between the two. If the specified words or phrases are far apart from each other, the query is considered to be satisfied; however, the query has a very low (0) rank value. However, if <contains_search_condition> consists of only one or more NEAR proximity terms, SQL Server does not return rows with a rank value of 0. For more information about ranking, see CONTAINSTABLE.
</quote>
Yeah exactly, 'far apart' , whats that supposed to mean, no help MS. I will try concatanating all the searched fields into one field and se how that fairs...

" If the specified words or phrases are far apart from each other, the query is considered to be satisfied" - Not across fields by the looks of it..
I only want to use NEARs, i dont want any ORs in the search, that condition can never bring back accurate results.The OR in my example was just iterating the point. I want all results, but 'New Zealand' has to come before 'New' or 'Zealand'.

I have just concatinated all the search fields into one field, I am using a CONTAINSTABLE and a NEAR term on this table, much better results are being returned thanks, although that still does not solve the initial problem, 'New' and 'zealand' results are not being returned.

"if <contains_search_condition> consists of only one or more NEAR proximity terms, SQL Server does not return rows with a rank value of 0"

I guess i'll have to ask the user to broaden there search criteria if this happens.


Thx

Just to close this thread with the implemented solution:

1. Parse search phrase into words (removing noise words - still to do)
2. Build string from words (ord1 NEAR word2 NEAR word3)
3. Search using: CONTAINSTABLE and NEAR (word1 NEAR word2 NEAR word3 etc..) on whole lookup table
4. Return in rank order

This produces by far the best results

the FREETEXTTABLE does not appear to be very useful for accurate results on free form search text.

P.S
"
" If the specified words or phrases are far apart from each other, the query is considered to be satisfied" - Not across fields by the looks of it.. "
This is not true sorry, so much easier in the light of day...

Thanks all
You may want to re-read the EE Guidelines regarding grading standards:
What's the right grade to give?
https://www.experts-exchange.com/Databases/Microsoft_SQL_Server/help.jsp#hi73

And specifically this section:
<quote>
C: Because Experts' reliability are often judged by their grading records, many Experts would like the opportunity to clarify if you have questions about their solutions. If you have given the Expert(s) ample time to respond to your clarification posts and you have responded to each of their posts providing requested information; or if the answers, after clarification, lack finality or do not completely address the issue presented, then a "C" grade is an option. You also have the option here of just asking Community Support to delete the question.
Remember, the Expert helping you today is probably going to be helping you next time you post a question. Give them a fair chance to earn an 'Excellent!' grade and they'll provide you with some amazing support. It's also true that a "C" is the lowest grade you can give, and the Experts know that -- so use it judiciously.
</quote>

I wish you the best of luck.