Solved

Newbie Full Text Query Question - Strange Ranking Results?

Posted on 2004-07-31
9
417 Views
Last Modified: 2008-02-01
Hi

I am trying to speed up our very antiquated search facility on our site using FTS. I just need a few pointers.

I am struggling to produce accurate results on some multiple word seach criteria.

For speed i have a lookup table that is produced from about 10 tables that conatin all text that i want to search on. I use a simple FREETEXTTABLE(SearchTable, *, N'TextStringToSearch')  search on this, which then allows me to use the identities returned together with the rank to get the correct records ranked according to the FREETEXTTABLE results. This is fine and extremly fast. The problem is when using multiple words, for example 'New Zealand'. (I have a country name field) The result will bring back a dozen or so 'New Zealand' reults, then bring back 'New' Then 'Zealand' then continue to giving a low ranking score to 'New Zealand' again. Obviously i cannot tell when a user wants grouping and when they dont, this gets worse when you saerch for 'New Zealand Beaches' For example, is comes back with beaches in other countries first!

Is this normal behaviour? Is there a word proximity thing i can use? Can I weight the columns being searched? Am i completly off track? I dont need google accuracy, but it sure needs to be relevant and quick. Any pointers?

Thanks
0
Comment
Question by:gravity100
  • 5
  • 4
9 Comments
 
LVL 75

Accepted Solution

by:
Anthony Perkins earned 250 total points
ID: 11684973
Since you cannot search for phrases (I know New Zealand is not a phrase, but just bare with me ...) with FREETEXTTABLE, consider using instead CONTAINSTABLE which does support it.  You can then do something like this:
CONTAINSTABLE(SearchTable, *, N'"New Zealand"')  
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 11684981
Also, just a friendly reminder, of how it works here:
What are my choices?
http://www.experts-exchange.com/Databases/Microsoft_SQL_Server/help.jsp#hi67
0
 

Author Comment

by:gravity100
ID: 11685184
Yes that is usefull, but as i see it that will exclude 'new' and 'zealand' something we dont want to do. I guess i could combine both FREETEXTTABLE  and CONTAINSTABLE, that would give a more accurate result, but it seems overly expensive and wouldnt solve the problem very well. How does the NEAR term work, it doesnt seem to regard text in different fields as near, how near is near? I could concatanate all the search into one field if you think that would work. I currently have the following

SELECT SL.Country_Code, SL.Hotel_Name, SL.Hotel_Code,SL.Currency_Rate,SL.Rate_Value, A.Rank
FROM   SearchLookup AS SL
INNER JOIN CONTAINSTABLE(SearchLookup, *, '(new) OR (zealand) OR (new NEAR zealand) OR ("new zealand")')  AS A ON SL.ID = A.[KEY]
GROUP BY  SL.Country_Code, SL.Hotel_Name, SL.Hotel_Code,SL.Currency_Rate,SL.Rate_Value, A.Rank
ORDER BY A.Rank Desc

..this sql brings up New Delhi as the highest rank...

Am i missing something?
0
 

Author Comment

by:gravity100
ID: 11685200
0
Backup Your Microsoft Windows Server®

Backup all your Microsoft Windows Server – on-premises, in remote locations, in private and hybrid clouds. Your entire Windows Server will be backed up in one easy step with patented, block-level disk imaging. We achieve RTOs (recovery time objectives) as low as 15 seconds.

 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 11685288
>>but as i see it that will exclude 'new' and 'zealand' something we dont want to do.<<
Than I guess I am not following you.  If your search has "new" on its own, and New Delhi is the highest than I can only assume that it must be in your database more times than others.  In other words the way, that I would expect the following search string to work:
'(new) OR (zealand) OR (new NEAR zealand) OR ("new zealand")'

Is to give equal importance to each of the phrases delimited by OR.  So it would seem quite plausible that New Delhi could show up as the highest rank.

As far as the NEAR is concerned, this is from BOL:
<quote>
NEAR | ~
Indicates that the word or phrase on the left side of the NEAR or ~ operator should be approximately close to the word or phrase on the right side of the NEAR or ~ operator. Multiple proximity terms can be chained, for example:
a NEAR b NEAR c

This means that word or phrase a should be near word or phrase b, which should be near word or phrase c.

Microsoft® SQL Server™ ranks the distance between the left and right word or phrase. A low rank value (for example, 0) indicates a large distance between the two. If the specified words or phrases are far apart from each other, the query is considered to be satisfied; however, the query has a very low (0) rank value. However, if <contains_search_condition> consists of only one or more NEAR proximity terms, SQL Server does not return rows with a rank value of 0. For more information about ranking, see CONTAINSTABLE.
</quote>
0
 

Author Comment

by:gravity100
ID: 11685325
Yeah exactly, 'far apart' , whats that supposed to mean, no help MS. I will try concatanating all the searched fields into one field and se how that fairs...

" If the specified words or phrases are far apart from each other, the query is considered to be satisfied" - Not across fields by the looks of it..
0
 

Author Comment

by:gravity100
ID: 11685638
I only want to use NEARs, i dont want any ORs in the search, that condition can never bring back accurate results.The OR in my example was just iterating the point. I want all results, but 'New Zealand' has to come before 'New' or 'Zealand'.

I have just concatinated all the search fields into one field, I am using a CONTAINSTABLE and a NEAR term on this table, much better results are being returned thanks, although that still does not solve the initial problem, 'New' and 'zealand' results are not being returned.

"if <contains_search_condition> consists of only one or more NEAR proximity terms, SQL Server does not return rows with a rank value of 0"

I guess i'll have to ask the user to broaden there search criteria if this happens.


Thx
0
 

Author Comment

by:gravity100
ID: 11686860

Just to close this thread with the implemented solution:

1. Parse search phrase into words (removing noise words - still to do)
2. Build string from words (ord1 NEAR word2 NEAR word3)
3. Search using: CONTAINSTABLE and NEAR (word1 NEAR word2 NEAR word3 etc..) on whole lookup table
4. Return in rank order

This produces by far the best results

the FREETEXTTABLE does not appear to be very useful for accurate results on free form search text.

P.S
"
" If the specified words or phrases are far apart from each other, the query is considered to be satisfied" - Not across fields by the looks of it.. "
This is not true sorry, so much easier in the light of day...

Thanks all
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 11688447
You may want to re-read the EE Guidelines regarding grading standards:
What's the right grade to give?
http://www.experts-exchange.com/Databases/Microsoft_SQL_Server/help.jsp#hi73

And specifically this section:
<quote>
C: Because Experts' reliability are often judged by their grading records, many Experts would like the opportunity to clarify if you have questions about their solutions. If you have given the Expert(s) ample time to respond to your clarification posts and you have responded to each of their posts providing requested information; or if the answers, after clarification, lack finality or do not completely address the issue presented, then a "C" grade is an option. You also have the option here of just asking Community Support to delete the question.
Remember, the Expert helping you today is probably going to be helping you next time you post a question. Give them a fair chance to earn an 'Excellent!' grade and they'll provide you with some amazing support. It's also true that a "C" is the lowest grade you can give, and the Experts know that -- so use it judiciously.
</quote>

I wish you the best of luck.
0

Featured Post

Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

Join & Write a Comment

When you hear the word proxy, you may become apprehensive. This article will help you to understand Proxy and when it is useful. Let's talk Proxy for SQL Server. (Not in terms of Internet access.) Typically, you'll run into this type of problem w…
Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.
Viewers will learn how to use the SELECT statement in SQL to return specific rows and columns, with various degrees of sorting and limits in place.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now