• Status: Solved

Verity collection won't grab .pdf files

I have a policy search on my website http://www.fulton.cnyric.org/policies/default.cfm where you can search either by policy number or by topic.  In the Administrator I have built the collection with the .doc and .pdf extensions. From what I can see this isn't the problem, so it must be in my code.  The code for my search_policy.cfm page is below.  Please let me know if you see what's wrong.  Thanks!

<CFSEARCH
   name = "GetPolicy"
   collection = "policy"
   criteria = "#Form.Criteria_1#"
   maxRows = "1000"
   startRow = "#FORM.StartRow#">

<CFIF GetPolicy.RecordCount is 0>
   <B>No files were found.  Please try again.</B>
   <CFELSE>
   <!--- At least one file found --->
   <TABLE cellspacing=0 cellpadding=2>
   <TR bgcolor="cccccc">
      <TD><B>No</B></TD>
      <TD>&nbsp;</TD>
        <TD><B>Score</B></TD>
        <TD>&nbsp;</TD>
      <TD><B>File</B></TD>
     
   </TR>

   <CFOUTPUT query="GetPolicy" maxrows="#form.maxrows#">
   <TR bgcolor="#IIf(CurrentRow Mod 2, DE('ffffff'), DE('ffffcf'))#">

      <!--- current row information --->
      <TD>#Evaluate(Form.StartRow + CurrentRow - 1)#</TD>

      <TD>&nbsp;</TD>
        <TD>#Score#</TD>
        <TD>&nbsp;</TD>

      <!--- file name with the link returning the file --->
      <TD>
          <CFSET FileName=GetFileFromPath(Key)>
         <CFSET Ext=Right(FileName,Evaluate(Find(".", Reverse(FileName))-1))>
         <CFIF (Find(Ext,"doc,pdf") GT 0)>
            <!--- If it's a web doc, use URL returned --->
            <A target="_blank" HREF="#GetPolicy.URL#">#GetFileFromPath(Key)#</A>
         <CFELSE>
            <!--- It's not a web doc, use file path in KEY from result --->
            <A target="_blank" HREF="#Key#">#GetFileFromPath(Key)#</A>
         </CFIF>
      </TD>
     </TR>
   </CFOUTPUT>
   </TABLE>
<cfif getpolicy.recordcount GT form.maxrows>
<FORM action="search_policy.cfm" method="post">
      <CFOUTPUT>
         <INPUT type="hidden" name="Criteria_1"
          value="#Replace(Form.Criteria_1, """", "'", "ALL")#">
         <INPUT type="hidden" name="MaxRows" value="#Form.MaxRows#">
         <INPUT type="hidden" name="StartRow" value="#Evaluate(Form.StartRow + Form.MaxRows)#">
         <INPUT type="submit" value="     Next   ">
      </CFOUTPUT>
      </FORM></cfif>
 
   </CFIF>
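
A side note on the extension test above: `Find(Ext,"doc,pdf")` is a case-sensitive substring match, so a file named `0330.PDF` would fall through to the else branch, and a file name without a dot would make the `Right()` call fail. A minimal sketch of a list-based, case-insensitive check follows. It is meant as a drop-in fragment for the inside of the existing `<CFOUTPUT query="GetPolicy">` loop, and it assumes the `URL` column was populated when the collection was indexed.

```cfm
<!--- Sketch only: goes inside the existing CFOUTPUT query="GetPolicy" loop --->
<cfset FileName = GetFileFromPath(Key)>
<!--- With "." as the delimiter, ListLast returns the text after the final dot --->
<cfset Ext = ListLast(FileName, ".")>
<cfif ListFindNoCase("doc,pdf", Ext) GT 0>
   <!--- .doc or .pdf in any letter case: link to the URL column --->
   <a target="_blank" href="#GetPolicy.URL#">#FileName#</a>
<cfelse>
   <!--- Anything else: fall back to the file path in Key --->
   <a target="_blank" href="#Key#">#FileName#</a>
</cfif>
```

This does not change which files the search finds; it only makes the link logic less fragile.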
ahillman
Asked:
 
CFDevHead Commented:
Can you post the code that creates your Verity collection?
 
ahillman (Author) Commented:
I do it through the Administrator.
 
CFDevHead Commented:
Try changing this to criteria = "*#LCASE(Form.Criteria_1)#*" and see what happens.
 
ahillman (Author) Commented:
Where?
 
CFDevHead Commented:
<CFSEARCH
   name = "GetPolicy"
   collection = "policy"
   criteria = "*#LCASE(Form.Criteria_1)#*"
   maxRows = "1000"
   startRow = "#FORM.StartRow#">
 
ahillman (Author) Commented:
Tried that and it's still not happening.
 
CFDevHead Commented:
Try creating the Verity collection using code instead of the CF Administrator:
http://www.experts-exchange.com/Web/WebDevSoftware/ColdFusion/Q_21158967.html
 
ahillman (Author) Commented:
I am not using localhost; I am using IIS.  Would I set the directory path to point to the wwwroot\folder to be indexed on the server?

Here is what I have tried, and it still didn't seem to work. Every time I tried to run it I got a site-wide error.

<cflock type="exclusive" timeout="30" name="policies">

<cfindex
   collection="policy"
   action="refresh"
   type="path"
   key="d:\CFfulton\Policies"
   extensions=".doc, .pdf"
   recurse="yes"
   language="english">

</cflock>
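
One thing worth checking in a code-built index: the search page links .doc and .pdf results through `#GetPolicy.URL#`, and `cfsearch` only fills that column when the collection was indexed with a `urlpath`. The sketch below is the same `cfindex` call with that attribute added. The `urlpath` value is an assumption guessed from the site address in the question, not something confirmed in this thread, so substitute the real web path of the policies folder.

```cfm
<!--- Exclusive lock so two requests never rebuild the collection at the same time --->
<cflock name="policies" type="exclusive" timeout="30">
   <cfindex
      collection="policy"
      action="refresh"
      type="path"
      key="d:\CFfulton\Policies"
      urlpath="http://www.fulton.cnyric.org/policies/"
      extensions=".doc, .pdf"
      recurse="yes"
      language="english">
   <!--- urlpath above is a hypothetical value: it is prepended to each file name
         and returned in the URL column of later cfsearch results --->
</cflock>
```

Adding `urlpath` affects only the links in the results. It will not make a PDF that has no extractable text searchable.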
 
ahillman (Author) Commented:
OK - now I don't get an error, but it hasn't changed anything. I still can't get the PDFs to show up, and if I look up a policy such as 5303E.1, it says there isn't such a policy when I know there is.
Any ideas? Thanks!
 
ahillman (Author) Commented:
Could this have something to do with the way some PDFs were created?  If you look up some policies (e.g. 6122), the PDF shows up; others (e.g. 0330) do not.  What do you think?  And why does a search for 0330E come back with nothing?
 
CFDevHead Commented:
Can you post a link to another PDF that you know exists?
 
CFDevHead Commented:
I think the problem is that you are trying to search scanned-in documents. That cannot work, because Adobe stores a scanned page as an image, not as text.
 
CFDevHead Commented:
Also, the search seems to match whole words only, not parts of a word.
For example, search for 42 and you don't get any PDFs, but search for 4200 and you do.

Good luck,
and I hope this helps.
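
The whole-word behaviour described above can be worked around with a trailing wildcard, so that 42 also matches 4200. This is a minimal sketch, assuming the same `Form.Criteria_1` and `Form.StartRow` fields as the original page. `Term` is a hypothetical variable name introduced here, and how Verity tokenizes a value such as 5303E.1 is not verified.

```cfm
<!--- Sketch only: add a trailing wildcard unless the user already typed one --->
<cfset Term = Trim(Form.Criteria_1)>
<cfif Len(Term) AND NOT Find("*", Term)>
   <cfset Term = Term & "*">
</cfif>

<!--- LCase keeps the criteria from being mixed case --->
<cfsearch
   name = "GetPolicy"
   collection = "policy"
   criteria = "#LCase(Term)#"
   maxRows = "1000"
   startRow = "#Form.StartRow#">
```

Lowercasing the term may help with entries like 0330E, on the assumption that mixed-case criteria are treated as case-sensitive. A wildcard cannot help with scanned PDFs, since they have no text to match.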
 
ahillman (Author) Commented:
I just created a new PDF for policy 0000, and the search didn't find it when I searched for 0000.  Scanned-in docs were my thought as well, and I do think that is part of the problem. But what about the new one I created? And I still have no idea why 0330 returns results but 0330E does not.
 
CFDevHead Commented:
After you created the file, did you reindex the Verity collection?
If not, your search will not find it.
 
ahillman (Author) Commented:
Yep - almost forgot, but remembered at the last second! :)
 
ahillman (Author) Commented:
I'm going to up the points - this seems to be a bit harder than I had anticipated.
 
KoldFuzun Commented:
Hi Aimee, good to talk to you again :)

There are a couple of issues we can check out:

First, were the PDFs created on a Mac? I have had some issues with this, especially using Acrobat 5 & 5.5.

Also, can you check the permissions on the PDFs and make sure Everyone has Full Control?

Thanks
TJ
 
ahillman (Author) Commented:
No Macs here - :(   Just PCs.  Some were scanned in, so I can eliminate them from the picture (no pun intended), since I already know they won't work.  Let me take a look at the permissions.
 
ahillman (Author) Commented:
Well - I took a look, and actually I don't want people to have full control. These are policies that have security on them: only read and execute, plus printing, are allowed.  If you go under the Series # you can see 0000.pdf; it's only in the search that you don't get it.
 
KoldFuzun Commented:
Hi Aimee

In your code, let's try temporarily changing

         <CFIF (Find(Ext,"doc,pdf") GT 0)>
            <!--- If it's a web doc, use URL returned --->
            <A target="_blank" HREF="#GetPolicy.URL#">#GetFileFromPath(Key)#</A>
         <CFELSE>
            <!--- It's not a web doc, use file path in KEY from result --->
            <A target="_blank" HREF="#Key#">#GetFileFromPath(Key)#</A>
         </CFIF>

to just

#GetFileFromPath(Key)#

Are the results different?
 
ahillman (Author) Commented:
Did that - it still returns only the doc, with no PDF. Go figure - it's just not finding it for some reason.
 
ahillman (Author) Commented:
Just what you needed to spice up your day, huh TJ?  :)
 
KoldFuzun Commented:
hehe, uhhhh ya!

I am beginning to wonder if it isn't the PDFs themselves.

The ones that show up under 0000, do these come from the collection too?
 
ahillman (Author) Commented:
Well - The items on the left of the page that have a series # are just in folders and are listed out when the page is called.  The search on the right side is the collection.
 
KoldFuzun Commented:
OK, that's what I thought. For some reason, I do not believe the PDFs are being indexed. I am going to create a test here and see what I come up with. You are still using CF 5 if I remember correctly, is this so?
 
ahillman (Author) Commented:
Nope - it's CFMX. Just so ya know, I have a meeting to go to at 3:00 and prob. won't be back for the day - I will check things out from home tonight.  Maybe a little time away will make the answer blaze to the front!  Thanks for thinking on your end and I will def. get back to ya.
Much appreciated!
Aimee
 
ahillman (Author) Commented:
By the way - if they aren't being indexed, then why do some of them show up, e.g. 4200 and 6122?  Frustrating! :)
 
KoldFuzun Commented:
Aimee, I think I understand why now. The text in those files comes from a scanned image and is therefore not indexable as text. You can see this here:

http://www.sanative.net/aimee

My search term is "is". This should bring up many docs, because many of the ones I grabbed from your site contain the word "is". Only my document came up. The link is broken intentionally :)
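
The common-word test described above can be roughed out on the server itself. This is a sketch under stated assumptions: the folder path is the one from the earlier `cfindex` call, the collection is named `policy`, and `PdfFiles`, `CommonWord`, and `FoundFiles` are hypothetical names introduced here. It checks only the top-level folder, and a PDF that has real text but never contains the word "is" would be flagged as well, so treat the output as a list of suspects rather than a verdict.

```cfm
<!--- List the PDF files that exist on disk (top-level folder only) --->
<cfdirectory action="list" directory="d:\CFfulton\Policies"
             filter="*.pdf" name="PdfFiles">

<!--- Search the collection for a very common word --->
<cfsearch name="CommonWord" collection="policy" criteria="is" maxRows="1000">

<!--- Build a pipe-delimited list of the file names the search returned --->
<cfset FoundFiles = "">
<cfloop query="CommonWord">
   <cfset FoundFiles = ListAppend(FoundFiles, GetFileFromPath(CommonWord.Key), "|")>
</cfloop>

<!--- Report every PDF on disk that the search did not return --->
<cfoutput query="PdfFiles">
   <cfif ListFindNoCase(FoundFiles, PdfFiles.Name, "|") EQ 0>
      #PdfFiles.Name# was not returned; it may be a scanned image with no text<br>
   </cfif>
</cfoutput>
```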
 
KoldFuzun Commented:
The way to fix this, by the way, is to have a REALLY good OCR program render them as text and create the PDF from that, or just recreate the documents as text before converting to PDF.
 
KoldFuzun Commented:
I just noticed CFDevHead had this answer before me. I should have read this previously. Sorry CFDevHead! Aimee, you should award the points to him :)
 
ahillman (Author) Commented:
Final question - if the docs are actually images, then why are they showing up at all when a search is done? Like the 0330 one?
 
ahillman (Author) Commented:
Thanks a bunch CFDevHead for your help and you too TJ!
 
KoldFuzun Commented:
The only ones that come up in the search for me are not scanned images :)
 
ahillman (Author) Commented:
Gotcha - it is def. the way the individual did things when they created the documents.  Thanks! :)
Question has a verified solution.
