Full Text Search Problems MSSQL

Hi,

I'm having a problem where I have a full-text catalogue set up on a table "candidates", where the "CVText" field is the text that is indexed. Some of our clients have noticed that, when searching on a candidate's name, no CVs are returned even though the "CVText" field clearly contains it. The text is saved from our server via the Word object as Word HTML, which sucks really, as Word doesn't do a good job of it at all.

Would the HTML be causing a problem? Here is an example of "CVText":

------------------
<div class=Section1>
<p class=MsoTitle><i style='mso-bidi-font-style:normal'><span style='font-size:
24.0pt;font-family:"Arial Black";color:black;text-decoration:none;text-underline:
none'>ANDRE SANTANA<o:p></o:p></span></i></p>
................. etc
------------------

And part of the SQL statement is:
... INNER JOIN CONTAINSTABLE(candidates, *, '("ANDRE" AND "SANTANA")') AS K ...

The HTML, as bad as it is, does contain "ANDRE" AND "SANTANA" so why is it not being returned?

Thanks.
blandyuk Asked:
 
Anthony Perkins Commented:
>>Would the HTML be causing a problem? Here is an example of "CVText":<<
That could well be the case. But first I would correct your syntax as follows (no parentheses):

INNER JOIN CONTAINSTABLE(candidates, *, '"ANDRE" AND "SANTANA"') AS K ...

If that still fails, then try this:
INNER JOIN CONTAINSTABLE(candidates, *, '">ANDRE" AND "SANTANA*"') AS K ...

Of course that could be a problem if the order of the names is reversed.
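For context, here is a minimal sketch of how the join might look as a complete statement. The thread only shows a fragment of the query, so the key column name CandidateID is an assumption made for illustration. CONTAINSTABLE returns a KEY column (the full-text key of the matching row) and a RANK column, and the join is made on KEY.

```sql
-- Sketch only: CandidateID is a hypothetical name for the table's
-- full-text key column; the original query does not show it.
SELECT C.CandidateID,
       K.[RANK]
FROM candidates AS C
INNER JOIN CONTAINSTABLE(candidates, *, '"ANDRE" AND "SANTANA*"') AS K
    ON C.CandidateID = K.[KEY]   -- KEY holds the full-text key value
ORDER BY K.[RANK] DESC;          -- best matches first
```

Because the two terms are combined with AND rather than written as a phrase, this form does not depend on the order of the names.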
 
rmacfadyen Commented:
Haven't worked with full-text catalogues... but I do seem to recall that you'll need a second column to specify the DocumentType (file extension) of the column being indexed. The DocumentType is used to pick the appropriate filter... and it's the filtering that is responsible for pulling out the actual bits of text.

This may only apply to columns of Image type.

See the "Filtering Supported File Types" page in SQL Books Online.
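As a rough sketch of what that could look like, the statement below uses the CREATE FULLTEXT INDEX syntax from SQL Server 2005 and later (SQL Server 2000 sets the same thing up through the full-text stored procedures instead). Every object name here is an assumption made for illustration: CVDocument as the binary column holding the file, CVDocType as the column holding the extension (for example '.htm'), PK_candidates as the unique key index, and CandidatesCatalog as the catalogue.

```sql
-- Sketch only: all column, index, and catalogue names are hypothetical.
CREATE FULLTEXT INDEX ON candidates
(
    -- CVDocType holds the file extension, which selects the filter
    -- used to extract the text from CVDocument.
    CVDocument TYPE COLUMN CVDocType
)
KEY INDEX PK_candidates
ON CandidatesCatalog
WITH CHANGE_TRACKING AUTO;
```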

Rob
 
blandyuk (Author) Commented:
Ah! Thanks acperkins, it's returning the record now. The key attribute there is the * which I assume is a wildcard.

What other parameters can be specified? I noticed the > in the query as well. Just looked in our MCDBA book but there was no mention of these :(

Thanks.
 
rmacfadyen Commented:
One "advantage" of switching to an Image column and using a filter might be that the index will consume considerably less space. Word produces fairly large HTML files, and the HTML filter will essentially pull out just the displayed text for indexing.

Regards,

Rob
 
Anthony Perkins Commented:
>>The key attribute there is the * which I assume is a wildcard. <<
Not exactly. Think of it as a prefix search and you won't go wrong.

>>What other parameters can be specified?<<
They are all covered in BOL, namely:
simple
prefix
generation
proximity
weighted
boolean
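To make the list concrete, here is one hedged example of each term type against the table from the question. CandidateID is a hypothetical key column name, and the search words other than the candidate's name are made up for illustration.

```sql
-- Sketch only: CandidateID is a hypothetical column name.

-- simple: an exact word or phrase
SELECT CandidateID FROM candidates WHERE CONTAINS(CVText, '"SANTANA"');

-- prefix: words beginning with the given text
SELECT CandidateID FROM candidates WHERE CONTAINS(CVText, '"SANT*"');

-- generation: inflectional forms (manage, managed, managing, ...)
SELECT CandidateID FROM candidates
WHERE CONTAINS(CVText, 'FORMSOF(INFLECTIONAL, manage)');

-- proximity: words that appear near each other
SELECT CandidateID FROM candidates
WHERE CONTAINS(CVText, '"ANDRE" NEAR "SANTANA"');

-- weighted: the weights influence RANK, so use CONTAINSTABLE
SELECT K.[KEY], K.[RANK]
FROM CONTAINSTABLE(candidates, CVText,
     'ISABOUT("ANDRE" WEIGHT(0.8), "SANTANA" WEIGHT(0.4))') AS K
ORDER BY K.[RANK] DESC;

-- boolean: combine terms with AND, OR, AND NOT
SELECT CandidateID FROM candidates
WHERE CONTAINS(CVText, '"ANDRE" AND NOT "SMITH"');
```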

>>I noticed the > in the query as well. Just looked in our MCDBA book but there was no mention of these <<
Not surprising as they have no meaning, other than they match the opening HTML tag :)
 
blandyuk (Author) Commented:
Ah! :) I see you were referring to the HTML with the > lol. I'll not be storing it as HTML from now on anyway. I didn't design the system, so I'm currently in the process of changing a lot of it.

Thanks.
Question has a verified solution.
