Solved

MS SQL FTS Index Population Error

Posted on 2008-06-22
Last Modified: 2012-08-13
I have an MS SQL Server 2005 instance running full-text search (FTS) over some PDFs.
On the whole it's working, but the indexing is reporting an error that I can't work out how to resolve:

Errors were encountered during full-text index population for table or indexed view '[****].[dbo].[PDFFiles]', database '****' (table or indexed view ID '133575514', database ID '6'). Please see full-text crawl logs for details.

I understand that I should look in this directory:
C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG
and also have a look in the event log.

The event log showed an entry like this:
Errors were encountered during full-text index population for table or indexed view '[****].[dbo].[PDFFiles]', database '****' (table or indexed view ID '133575514', database ID '6'). Please see full-text crawl logs for details.

I assume the crawl logs are the ones named like this: SQLFT....LOG

There are a few of them, with names like:

SQLFT0000600005.LOG
SQLFT0000600006.LOG
SQLFT0000600006.LOG.1
SQLFT0000600006.LOG.2
SQLFT0000600007.LOG
etc
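
The crawl log names follow the pattern SQLFT<DatabaseID><FullTextCatalogID>.LOG[n], with each ID zero-padded to five digits, so SQLFT0000600005.LOG would belong to database ID 6 and full-text catalog ID 5. A minimal sketch for working out which of the logs covers the failing table, assuming it is run in the affected database and that the table is dbo.PDFFiles as in the error message:

```sql
-- Confirm which database the '00006' part of the file name refers to.
SELECT DB_NAME(6) AS database_name;

-- Show the catalog that holds the full-text index on the table, so its ID
-- can be matched against the last five digits of the crawl log file name.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.object_id,
       c.fulltext_catalog_id,
       c.name AS catalog_name
FROM sys.fulltext_indexes AS i
    INNER JOIN sys.fulltext_catalogs AS c
        ON c.fulltext_catalog_id = i.fulltext_catalog_id
WHERE i.object_id = OBJECT_ID('dbo.PDFFiles');
```

Only the log whose catalog ID matches should need reading; the numbered suffixes (.1, .2) are additional files for the same catalog.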

Those logs contain errors like this:

2008-05-12 23:59:54.18 spid21s     The component 'AcroRdIF.dll' reported error while indexing. Component path 'C:\Program Files\Adobe\Reader 8.0\Reader\AcroRdIF.dll'.
2008-05-12 23:59:54.18 spid21s     Informational: Full-text retry pass of Full population completed for table or indexed view '[****].[dbo].[PDFFiles]' (table or indexed view ID '133575514', database ID '6'). Number of retry documents processed: 1. Number of documents failed: 0.
2008-05-12 23:59:55.18 spid21s     Error '0x80004005' occurred during full-text index population for table or indexed view '[****].[dbo].[PDFFiles]' (table or indexed view ID '133575514', database ID '6'), full-text key value 0xD491EECCA4EBD7448EBC99FFF50E26CD. Attempt will be made to reindex it.

Now, if that is the problem row, is that value (0xD491EECCA4EBD7448EBC99FFF50E26CD) a GUID, and is it the primary key of the row containing the PDF?

I need to know how to locate the PDF files that are causing the indexing errors, and work out why they are causing them.

Thanks
TheTimp
Question by:thetimp
 
LVL 14

Expert Comment

by:Jagdish Devaku
ID: 21843920
Hi,

Try the following to resolve the issue.

Download and install the Adobe PDF iFilter from Adobe, and test it with iFiltTst on a directory of PDF files to confirm that it indexes them.

I think Adobe has recently released IFilter 6.0.

Then try to reindex them in SQL Server.

If you still face the issue, run the following:

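use master
go
-- allow SQL Server to load filters and word breakers registered with the operating system
sp_fulltext_service 'load_os_resources', 1
go
-- do not require those components to be signed
sp_fulltext_service 'verify_signature', 0
go
reconfigure with override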
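
After changing those settings, it may be worth confirming which filter SQL Server has actually picked up for the .pdf extension before repopulating. A sketch, assuming the full-text index is on dbo.PDFFiles as in the error message, and run in the database that holds the table rather than in master:

```sql
-- Show the filter registered for PDF documents.
-- The path should point at the iFilter DLL you expect to be used.
SELECT document_type, path, version, manufacturer
FROM sys.fulltext_document_types
WHERE document_type = '.pdf';

-- Restart a full population so every row is passed through that filter again.
ALTER FULLTEXT INDEX ON dbo.PDFFiles START FULL POPULATION;
```

If the first query returns no row, no PDF filter is loaded at all, which would be a different problem from a few individual files failing.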

Let me know if you face any issues.

All the best.
 

Author Comment

by:thetimp
ID: 21844091
Hi,
I have the Adobe PDF Filter 6.0 installed, and it is correctly indexing 700+ PDFs.
As I see it, it is that filter that is raising an error on some of the PDFs.

I am trying to find out which PDFs have the issue, and then what the issue with them is.
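
One way to gauge the scale of the problem before hunting for individual files is to ask SQL Server how many rows the population failed to index. A sketch, assuming the table is dbo.PDFFiles as in the error message; it returns counts only, not the identity of the failing rows:

```sql
DECLARE @id int;
SET @id = OBJECT_ID('dbo.PDFFiles');

-- Full-text statistics for the table.
SELECT OBJECTPROPERTYEX(@id, 'TableFulltextDocsProcessed')  AS docs_processed,
       OBJECTPROPERTYEX(@id, 'TableFulltextItemCount')      AS rows_indexed,
       OBJECTPROPERTYEX(@id, 'TableFulltextFailCount')      AS rows_failed,
       OBJECTPROPERTYEX(@id, 'TableFulltextPopulateStatus') AS populate_status;
```

A small rows_failed value relative to the 700+ PDFs would be consistent with a handful of problem files rather than a broken filter.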

Thanks anyway.
 
LVL 14

Expert Comment

by:Jagdish Devaku
ID: 21844203
Locating the PDFs that are causing the issue is hard.

Check for the following issues:

·         Verify that the PDF file is not damaged or password protected. The PDF iFilter cannot index PDF files that cannot be opened in Acrobat. Open the PDF file in Adobe Reader 6, Adobe Acrobat 6 Standard or Adobe Acrobat 6 Professional to verify that the file is not damaged or password protected.

·         Verify that the PDF file does not contain Security Permissions prohibiting you from searching the file.  Open the file and choose File > Document Properties > Security.  Verify that the value for "Content Copying or Extraction" is "Allowed". If it is "Not Allowed", the PDF iFilter cannot index the text.

·         Verify that the PDF file contains searchable text. You can verify if there is searchable text in the PDF file by opening it in Reader or Acrobat and selecting the text with the Text Select tool.  Or, you can open the file and use the Edit > Select All command to select text.  If neither method highlights any text, it is likely that the PDF contains a source image of text which the PDF iFilter cannot index.  You can then use a tool like Acrobat's Paper Capture tool (Document > Paper Capture > Start Capture) to convert the source image of text into searchable text which PDF iFilter can index.

·         Verify that the PDF file contains text that is properly encoded.  If the file contains searchable text, yet the Acrobat Search tool (Edit > Search) cannot find the text, then the text may not be properly encoded.  Select a word using the Text Select tool, copy it to your clipboard, and paste it into a text editor (e.g. Notepad).  If the word is not legible in the text editor, the file contains text or fonts that are not properly encoded.

·         Ensure the PDF file has a ".pdf" filename extension. Clients, such as Index Services, find the PDF iFilter by looking up the filename's extension in the Windows Registry.

·         If the original document was created in a Windows 95 application and uses TrueType fonts, either reformat the text with Type 1 fonts or use Adobe PDF Printer to recreate the PDF file. In Windows 95, PDF files generated from PostScript files containing TrueType fonts include text that is not searchable.  Both the Microsoft and Adobe PostScript® printer drivers remap TrueType fonts at PostScript generation time, which disconnects the character you see on screen from that character in the font (for more information, see http://www.adobe.com/support/techdocs/12f6e.htm ).


 

Author Comment

by:thetimp
ID: 21844261
Excellent, that's what I can do once I have worked out which of the 700+ PDFs the filter is having trouble with.

The PDFs exist only in the database, so to test them I would have to extract them from the database first, and some are over 200 MB.

I need to know how to identify the specific PDF files that are causing the indexing errors by translating the error message. I am happy to extract and correct or replace those.

Thanks
TheTimp

 
LVL 14

Accepted Solution

by:
Jagdish Devaku earned 500 total points
ID: 21844754
I think the code below works.

It returns the names of the PDFs that contain the letter 'a', so all readable PDFs will be listed. From that list we need to work out which PDFs are unreadable.

SELECT FT_TBL.id
    ,FT_TBL.pdf_name
FROM <table_name> AS FT_TBL
    INNER JOIN FREETEXTTABLE(<table_name>, <pdf_file_column_name>,
        'a') AS KEY_TBL
        -- join on the table's full-text key column
        ON FT_TBL.id = KEY_TBL.[KEY];

All the best.
 

Author Comment

by:thetimp
ID: 21845091
OK, now we are getting somewhere.

The 'a' didn't work, but I found a word common to most projects, and that gave me some results. Brilliant! (See the attached code.)
It exposed some PDFs that had been secured. (I can fix that.) :-)

But it also found a PDF that is not secured and that does contain the <common word>: I can search it with Acrobat Reader, and the search returns results inside Acrobat. Perhaps I will get the PDF re-created from the original document and see if the problem persists. (FYI, the properties say it was created with Distiller 8.1.0, PDF version 1.4 for Acrobat 5.x.)

But getting back to my question: how do the logs point to those files? It seems the IDs in the logs are not too helpful.
The full-text key value in the log is not in my list of PDFFilesIDs. Is it not an ID? How does it relate?

Thanks
TheTimp
Select cast(PDFFilesID as NvarChar(50)), PDFName
from PDFFiles
Where PDFFilesID not in
(
    SELECT FT_TBL.PDFFilesID
    FROM dbo.PDFFiles AS FT_TBL
    INNER JOIN FREETEXTTABLE(PDFFiles, *, '<Common Word>') AS KEY_TBL
        ON FT_TBL.PDFFilesID = KEY_TBL.[KEY]
)
Order By cast(PDFFilesID as NvarChar(50));
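
On how the key in the log might relate to the table: if PDFFilesID is a uniqueidentifier column and is also the full-text key column (the thread confirms neither), the 16-byte value in the log may simply be the binary form of that GUID. SQL Server displays the first three groups of a GUID in reversed byte order compared with its binary form, so the hex string would not match the displayed ID character for character. A sketch of the check under that assumption:

```sql
-- Which column is the full-text key for the table?
SELECT COL_NAME(OBJECT_ID('dbo.PDFFiles'),
       CAST(OBJECTPROPERTYEX(OBJECT_ID('dbo.PDFFiles'),
            'TableFulltextKeyColumn') AS int)) AS fulltext_key_column;

-- Convert the key value from the crawl log to a GUID.
-- The cast yields CCEE91D4-EBA4-44D7-8EBC-99FFF50E26CD.
SELECT CAST(0xD491EECCA4EBD7448EBC99FFF50E26CD AS uniqueidentifier) AS key_as_guid;

-- Look for the row with that key (only meaningful if the assumption holds).
SELECT PDFFilesID, PDFName
FROM dbo.PDFFiles
WHERE PDFFilesID = CAST(0xD491EECCA4EBD7448EBC99FFF50E26CD AS uniqueidentifier);
```

If the last query returns a row, that row would be the PDF named in the crawl log; if the key column turns out to be of another type, the cast would need to change accordingly.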

 
LVL 14

Expert Comment

by:Jagdish Devaku
ID: 21852588
I think it's the ID generated by full-text search while creating the log.
 

Author Comment

by:thetimp
ID: 21852769
OK,

So there is no relation between the IDs in the log and the table.
How disappointing.

You provided a solution to the issue, so I will consider this issue closed.

Thanks
TheTimp
