Solved

SQL 2012 FileTable Full Text Search Issue

Posted on 2014-04-04
15
262 Views
Last Modified: 2015-04-29
I have a FileTable that consists mostly of resumes in Word DOC/DOCX format.  For some reason, I have certain people who's e-mail address or name is not being found even though their resume exists in the FileStream Share.  I even opened their resume, copied the e-mail address directly from it and pasted it into the query (not real e-mail address):
SELECT	* 
FROM	RESUMES.dbo.ftbl_resume_files 
WHERE	CONTAINS(file_stream, 'john.smith@yahoo.com')

Open in new window

But, I can find my own resume with my e-mail address and many others.

I've tried initiating Rebuilds on all Indexes and the Catalog, but when they're finished, the same names and e-mail addresses aren't found.  I'm new to this so I don't even know where to start looking.  Any help would be appreciated.
0
Comment
Question by:wchestnut
  • 8
  • 2
15 Comments
 

Author Comment

by:wchestnut
ID: 39978376
Here's something else I noticed when trying to troubleshoot this.  In the two SELECT statements below, the first one does not work but the second one does:
SELECT	* 
FROM	RESUMES.dbo.ftbl_resume_files 
WHERE	CONTAINS(name, 'Keay')

SELECT	* 
FROM	RESUMES.dbo.ftbl_resume_files 
WHERE	LEFT(name,4) = 'Keay'

Open in new window

0
 
LVL 39

Expert Comment

by:lcohan
ID: 39978561
BTW....."I have a FileTable that consists mostly of resumes in Word DOC/DOCX format."
DOCX is not a supported default document type in SQL 2012 - try the select below to see supported attachment types and also:
http://social.msdn.microsoft.com/Forums/en-US/2346bf93-d7e9-4720-aac2-de71d866cb3e/sql-server-2012-filetables-document-types-docx?forum=sqldatabaseengine


select * from sys.fulltext_document_types
-- as you can see DOCX is not listed as defaults therefor most likely those emails are those type of documents only.



http://technet.microsoft.com/en-us/library/ms143272.aspx

"
Behavior Changes in Full-Text Search in SQL Server 2012
SQL Server 2012 installs a new version of the word breakers and stemmers for US English (LCID 1033) and UK English (LCID 2057). However you can switch to the previous version of these components if you want to retain the previous behavior. For more information, see Change the Word Breaker Used for US English and UK English.
"

http://technet.microsoft.com/en-us/library/ms187787.aspx

"
CONTAINS can search for:
•A word or phrase.

•The prefix of a word or phrase.

•A word near another word.

•A word inflectionally generated from another (for example, the word drive is the inflectional stem of drives, drove, driving, and driven).

•A word that is a synonym of another word using a thesaurus (for example, the word "metal" can have synonyms such as "aluminum" and "steel").

"

Therefor try for instance:

SELECT      *
FROM      RESUMES.dbo.ftbl_resume_files
WHERE      CONTAINS(file_stream, 'john and smith and yahoo')
0
 

Author Comment

by:wchestnut
ID: 39979202
That statement didn't work, either.  I see where you were going with this... but supposedly the new components in 2012 are supposed to include more not less results.  An e-mail address should be considered a "phrase" according to that article and they list "cat@dog.com" should be found under the new word breakers and stemmers.  And it doesn't explain why some whole e-mail addresses are found while others are not.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 39979441
An e-mail address should be considered a "phrase" according to that article and they list "cat@dog.com" should be found under the new word breakers and stemmers.
So that means you have attempted to search for email addresses as phrases?  As in:
SELECT      *
FROM      RESUMES.dbo.ftbl_resume_files
WHERE      CONTAINS(file_stream, '"john.smith@yahoo.com"')
0
 

Author Comment

by:wchestnut
ID: 39983358
Yes.  That works for most other e-mail addresses, but not for others.  It's inconsistent as if the index doesn't include them.  The same goes for "this and that" name searches as well.  Some names are found but not others yet I can see the names un the "name" column.

I'm thinking there's a problem with the index and/or catalog as if it didn't complete the rebuild but don't know how to troubleshoot them.  I didn't see anything unusual in the SQL Logs, either.
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 
LVL 39

Expert Comment

by:lcohan
ID: 39984208
Maybe the ones not found are located in DOCX documents as per your original question comments and I believe these are not supported by default SQL 2012.

http://social.msdn.microsoft.com/Forums/en-US/2346bf93-d7e9-4720-aac2-de71d866cb3e/sql-server-2012-filetables-document-types-docx?forum=sqldatabaseengine


Try run query below against your FT database:

select * from sys.fulltext_document_types

and see if DOCX is listed in the result set
0
 

Author Comment

by:wchestnut
ID: 39984419
That sounded like another great idea.  Unfortunately, some of the doc types not found are DOC's and some are DOCX's.  Your query resulted in this:ScreenshotMy existing queries are pulling up some Names and E-mail Addresses from DOCX files.  

Here's another tidbit.  Each night, we delete all of the documents in the FILESTREAM Share and copy a fresh set of files from our main File Server.  I used to use the DEL command, but that blew up my LOG file.  So, I created a query that does it better:
TRUNCATE TABLE RESUMES.dbo.ftbl_resume_files
CHECKPOINT
DBCC SHRINKFILE('RESUMES_log', 1000)
ALTER FULLTEXT CATALOG [ResumeFullTextCatalog] REBUILD

Open in new window

Then I have a separate job that copies the files over.  Do you think this could be part of the problem?
0
 

Author Comment

by:wchestnut
ID: 39994458
I haven't received a reply since Monday.  Is there anyone that can help??  Here's a real example again.  These queries work:
SELECT	* 
FROM	RESUMES.dbo.ftbl_resume_files 
WHERE	CONTAINS(file_stream, 'wchestnut@gmail.com')

SELECT	* 
FROM	RESUMES.dbo.ftbl_resume_files 
WHERE	LEFT(name,4) = 'Keay'

Open in new window

But neither of these work:
SELECT	* 
FROM	RESUMES.dbo.ftbl_resume_files 
WHERE	CONTAINS(name, 'Keay')

SELECT	* 
FROM	RESUMES.dbo.ftbl_resume_files 
WHERE	CONTAINS(file_stream, 'john_keay@madeup.com')

Open in new window

I verified the document does contain the (real) e-mail address.
0
 

Author Comment

by:wchestnut
ID: 39994599
Here's another update:

I opened one of the original Word DOC files that can't be found with CONTAINS but could with LEFT(name,4), then I saved it as a TXT (for kicks) and a DOCX file.  Then I went back and ran the CONTAINS query -- and both showed up, yet the original DOC does not and they're all in the same folder.

Then I opened the original Word DOC again, made a change, then Saved it (as DOC).  I was hoping an Index/Catalog would be updated, but it did not after running the CONTAINS query again.

So, for some reason, this DOC is not being indexed/cataloged but it is showing up in the FileTable.
0
 

Accepted Solution

by:
wchestnut earned 0 total points
ID: 40009837
UPDATE: I ended up building a new Windows 2012 Server and re-installing SQL Server 2012 Standard from scratch -- and I'm still having the same problem!

This time I only copied 11 files over -- a mix of DOC, DOCX and PDFs.  The CONTAINS only seems to show DOCX and PDF files but not DOC.

On another forum, someone suggested looking at the SQLFT*.LOG files.  This is what I see:
2014-04-18 19:06:34.82 spid36s     Informational: Full-text Full population initialized for table or indexed view '[RESUMES].[dbo].[ftbl_resume_files]' (table or indexed view ID '277576027', database ID '7'). Population sub-tasks: 1.
2014-04-18 19:06:41.79 spid17s     Error '0x8004fd02: The filter daemon MSFTEFD failed to load an IFilter interface for document, so it can't be indexed.' occurred during full-text index population for table or indexed view '[RESUMES].[dbo].[ftbl_resume_files]' (table or indexed view ID '277576027', database ID '7'), full-text key value '/220820988334460.222408990898065.1646681206/'. Attempt will be made to reindex it.
2014-04-18 19:06:41.79 spid17s     Error '0x8004fd02: The filter daemon MSFTEFD failed to load an IFilter interface for document, so it can't be indexed.' occurred during full-text index population for table or indexed view '[RESUMES].[dbo].[ftbl_resume_files]' (table or indexed view ID '277576027', database ID '7'), full-text key value '/256996619641691.50910503131551.2878327702/'. Attempt will be made to reindex it.
2014-04-18 19:06:41.79 spid17s     Error '0x8004fd02: The filter daemon MSFTEFD failed to load an IFilter interface for document, so it can't be indexed.' occurred during full-text index population for table or indexed view '[RESUMES].[dbo].[ftbl_resume_files]' (table or indexed view ID '277576027', database ID '7'), full-text key value '/161897244742993.83895979287730.2618536739/'. Attempt will be made to reindex it.
2014-04-18 19:06:41.80 spid17s     Error '0x8004fd02: The filter daemon MSFTEFD failed to load an IFilter interface for document, so it can't be indexed.' occurred during full-text index population for table or indexed view '[RESUMES].[dbo].[ftbl_resume_files]' (table or indexed view ID '277576027', database ID '7'), full-text key value '/147236074136024.42072411279640.2435451708/'. Attempt will be made to reindex it.
2014-04-18 19:06:42.72 spid17s     Informational: Full-text Full population completed for table or indexed view '[RESUMES].[dbo].[ftbl_resume_files]' (table or indexed view ID '277576027', database ID '7'). Number of documents processed: 11. Number of documents failed: 4. Number of documents that will be retried: 4.
2014-04-18 19:06:43.42 spid39s     A full-text retry pass of Full population started for table or indexed view '[RESUMES].[dbo].[ftbl_resume_files]'. Table or indexed view ID is '277576027'. Database ID is '7'. 
2014-04-18 19:06:45.32 spid39s     Informational: Full-text retry pass of Full population completed for table or indexed view '[RESUMES].[dbo].[ftbl_resume_files]' (table or indexed view ID '277576027', database ID '7'). Number of retry documents processed: 0. Number of documents failed: 0.

Open in new window


I tried researching that error, but I only found one blogger's site saying he thinks it's related to too much disk contention when new files are added.  But I only have 11 rather small files.

Does anyone have any more ideas?  If I can't get this resolved soon, my boss is probably going to force me to abandon using SQL natively for searches and that would just suck.
0
 

Assisted Solution

by:wchestnut
wchestnut earned 0 total points
ID: 40042938
I finally broke down and calling Microsoft Support and expected to pay the $259 fee, but it doesn't look like I'll will need to.

After some poking around and collecting a lot of data, the Microsoft Engineer didn't know off-hand what was causing the problem.  After consulting with his team, he found out what the problem is -- and it's not just happening to us.

Apparently if the header of a Word DOC file is not properly formatted or as expected for a 97-2003 format, it throws an error and won't index the file.  We opened the same files in Word, then used Save As and 97-2003 format, dropped it back into the Share and it got indexed.

So, we're waiting on a cure from Microsoft... but we're not holding our breath.
0

Featured Post

Do email signature updates give you a headache?

Do you feel like you are constantly making changes to email signatures? Are the images not formatting how you want them to? Want high-quality HTML signatures on all devices, including on mobiles and Macs? Then, let Exclaimer solve all your email signature problems today.

Join & Write a Comment

This article explains all about SQL Server Piecemeal Restore with examples in step by step manner.
International Data Corporation (IDC) prognosticates that before the current the year gets over disbursing on IT framework products to be sent in cloud environs will be $37.1B.
Via a live example combined with referencing Books Online, show some of the information that can be extracted from the Catalog Views in SQL Server.
Using examples as well as descriptions, and references to Books Online, show the different Recovery Models available in SQL Server and explain, as well as show how full, differential and transaction log backups are performed

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now