Link to home
Start Free TrialLog in
Avatar of Member_2_1348041
Member_2_1348041Flag for Ireland

asked on

Fulltext Search Mystery Result

I'm performing a fulltext search like so:

SELECT *
  FROM CONTAINSTABLE([dbo].[FulltextDiary], ([DocumentText]), '"Shaun*"') CTT

It returns a whole set of results, but one of them seems a bit odd. I check the [DocumentText] column in the relevant table, and, indeed, the string "Shaun" is nowhere to be found, let alone anything starting with the string "Shaun". I also cannot find anything looking remotely like Shaun, e.g. Shawn, Séan, etc etc etc.

How can I find out what it was that led to a particular row being included in the search result?
Avatar of Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3]
Flag of Luxembourg image

I think it is not mentioned in the full text index anywhere where the indexed value was found actually.
so, without having the document/file itself, impossible to reproduce the issue ...
if it is a office document, anything in the file properties?
Avatar of Member_2_1348041

ASKER

Hey Guy, long time .... :-)

No, it's plain text in the [DocumentText] column. Tis a mystery to confound....

Cheers


Pino
ASKER CERTIFIED SOLUTION
Avatar of Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3]
Flag of Luxembourg image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Ok... I think I know what the problem is. The text in the [DocumentText] column is the content of a .msg file. How it works, exactly, is irrelevant. Suffice to say that we extract the plain text from the .msg file and then store it in that column for full-text indexing.

The problem is that in our original run we extracted the COMPLETE text from the .msg. That included not only the text in the .msg file itself, but also ALL the text that was contained in ALL its attachments, and lo and behold there is the string "Shaun" in one of the attachments.

However, we since fixed the procedure and re-ran the process over all the .msg files, and now it only extracts the body text, leaving the attachments alone. So the [DocumentText] column now no longer contains the text "Shaun".

So the real issue is: why has the fulltext index not updated itself?
the full text index is not updated real-time, but only on intervals.
check up the full text indexing jobs related to the full text index(es), normally configured by the full text index catalog
https://msdn.microsoft.com/en-us/library/ms142575.aspx
I had change tracking switched on, but I found an error in the LOG, so I ran

ALTER FULLTEXT INDEX ON [dbo].[FullTextDiary] RESUME POPULATION

and for good measure I've also done

ALTER FULLTEXT INDEX ON [dbo].[FullTextDiary] SET CHANGE_TRACKING AUTO

I guess time will tell whether that will lead to it updating the index. If not, I will drop and re-create.
The only thing that I'm a little worried about is that it told me "Warning: Full-text auto propagation is currently enabled for table or indexed view" when I ran the second statement. Is that cause for concern?
Also.... [dbo].[FullTextDiary] is actually a view. Please tell me that Change Tracking still works in that situation?
Well... it seems to be doing something. When I run this

SELECT COUNT(CASE WHEN CTT.[KEY] = 5256811 THEN 1 ELSE NULL END) AS [StillThere],
       COUNT(CASE WHEN CTT.[KEY] = 5256811 THEN NULL ELSE 1 END) AS [TheRest]
  FROM CONTAINSTABLE([dbo].[FulltextDiary], ([DocumentText]), '"Shaun*"') CTT

Open in new window


(the offending row was one with key value 5256811)

The TheRest value is steadily decreasing now. I am guessing that at some point StillThere will also go back to 0.
Yup. That sorted the problem. Gee whiz.
Here's a few poinks for old time's sake :P