Solved

Verity collection won't grab .pdf files

Posted on 2004-10-21
Last Modified: 2013-12-24
I have a policy search on my website, http://www.fulton.cnyric.org/policies/default.cfm, where you can search by either policy number or topic. In the Administrator I built the collection with the .doc and .pdf extensions; as far as I can see that isn't the problem, so it must be in my code. The code for the search_policy.cfm page is below. Please let me know if you see what's wrong. Thanks!

<CFSEARCH
   name = "GetPolicy"
   collection = "policy"
   criteria = "#Form.Criteria_1#"
   maxRows = "1000"
   startRow = "#FORM.StartRow#">

<CFIF GetPolicy.RecordCount is 0>
   <B>No files were found.  Please try again.</B>
   <CFELSE>
   <!--- At least one file found --->
   <TABLE cellspacing=0 cellpadding=2>
   <TR bgcolor="cccccc">
      <TD><B>No</B></TD>
      <TD>&nbsp;</TD>
        <TD><B>Score</B></TD>
        <TD>&nbsp;</TD>
      <TD><B>File</B></TD>
     
   </TR>

   <CFOUTPUT query="GetPolicy" maxrows="#form.maxrows#">
   <TR bgcolor="#IIf(CurrentRow Mod 2, DE('ffffff'), DE('ffffcf'))#">

      <!--- current row information --->
      <TD>#Evaluate(Form.StartRow + CurrentRow - 1)#</TD>

      <TD>&nbsp;</TD>
        <TD>#Score#</TD>
        <TD>&nbsp;</TD>

      <!--- file name with the link returning the file --->
      <TD>
          <CFSET FileName=GetFileFromPath(Key)>
         <CFSET Ext=Right(FileName,Evaluate(Find(".", Reverse(FileName))-1))>
         <CFIF (Find(Ext,"doc,pdf") GT 0)>
            <!--- If it's a web doc, use URL returned --->
            <A target="_blank" HREF="#GetPolicy.URL#">#GetFileFromPath(Key)#</A>
         <CFELSE>
            <!--- It's not a web doc, use file path in KEY from result --->
            <A target="_blank" HREF="#Key#">#GetFileFromPath(Key)#</A>
         </CFIF>
      </TD>
     </TR>
   </CFOUTPUT>
   </TABLE>
<cfif getpolicy.recordcount GT form.maxrows>
<FORM action="search_policy.cfm" method="post">
      <CFOUTPUT>
         <INPUT type="hidden" name="Criteria_1"
          value="#Replace(Form.Criteria_1, """", "'", "ALL")#">
         <INPUT type="hidden" name="MaxRows" value="#Form.MaxRows#">
         <INPUT type="hidden" name="StartRow" value="#Evaluate(Form.StartRow + Form.MaxRows)#">
         <INPUT type="submit" value="     Next   ">
      </CFOUTPUT>
      </FORM></cfif>
 
   </CFIF>
Question by:ahillman
    35 Comments
     
    LVL 9

    Expert Comment

    by:CFDevHead
    Can you post the code that creates your Verity collection?
     

    Author Comment

    by:ahillman
    I do it through the Administrator.
     
    LVL 9

    Expert Comment

    by:CFDevHead
    Try changing the criteria attribute to criteria = "*#LCASE(Form.Criteria_1)#*" and see what happens.
     

    Author Comment

    by:ahillman
    Where?
     
    LVL 9

    Expert Comment

    by:CFDevHead
    <CFSEARCH
       name = "GetPolicy"
       collection = "policy"
       criteria = "*#LCASE(Form.Criteria_1)#*"
       maxRows = "1000"
       startRow = "#FORM.StartRow#">
     

    Author Comment

    by:ahillman
    Tried that and it's still not working.
     
    LVL 9

    Expert Comment

    by:CFDevHead
    Try creating the Verity collection using code instead of the CF Administrator:
    http://www.experts-exchange.com/Web/WebDevSoftware/ColdFusion/Q_21158967.html
     

    Author Comment

    by:ahillman
    I am not using localhost; I am using IIS. Would I set the directory path to point to the wwwroot\folder to be indexed on the server?

    Here is what I have tried, and it still didn't seem to work. Every time I tried to run it I got a site-wide error.

    <cflock  type="exclusive" timeout="30" name="policies">

    <cfindex
       collection="policy"
       action="refresh"
       type="path"
       key="d:\CFfulton\Policies"
       Extensions=".doc, .pdf"
       recurse="yes"
       language="english">

    </cflock>
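
    For comparison, a minimal sketch of building the collection entirely in code. The collection storage path and the urlpath value are assumptions, not taken from the actual server. As far as I know, urlpath is what fills the URL column that search_policy.cfm links to for .doc and .pdf hits, so leaving it out may produce empty links.

    ```cfml
    <!--- One-time setup. The path below is an assumed location for the collection files. --->
    <cfcollection
       action="create"
       collection="policy"
       path="d:\CFfulton\collections"
       language="english">

    <!--- Index the policy folder. urlpath is an assumed base URL for the indexed files. --->
    <cflock name="policies" type="exclusive" timeout="30">
       <cfindex
          collection="policy"
          action="refresh"
          type="path"
          key="d:\CFfulton\Policies"
          urlpath="http://www.fulton.cnyric.org/policies/"
          extensions=".doc, .pdf"
          recurse="yes"
          language="english">
    </cflock>
    ```

    The cfcollection step should only be run once; expect an error if a collection named policy already exists, for example one created earlier in the Administrator.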
     

    Author Comment

    by:ahillman
    OK, now I don't get an error, but nothing has changed. I still can't get the .pdfs to show up. Also, if I look up a policy such as 5303E.1, it says there is no such policy when I know there is.
    Any ideas? Thanks!
     

    Author Comment

    by:ahillman
    Could this have something to do with the way some PDFs were created? If you look up some policies (e.g. 6122), the PDF shows up; for others (e.g. 0330) it does not. What do you think? And why does looking up 0330E come back with nothing?
     
    LVL 9

    Expert Comment

    by:CFDevHead
    Can you post a link to another PDF that you know exists?
     
    LVL 9

    Accepted Solution

    by:
    I think the problem is that you are trying to search scanned-in documents. That can't work, because Adobe stores scanned pages as images, not text.
     
    LVL 9

    Expert Comment

    by:CFDevHead
    Also, the search seems to match whole words only, not parts of words.
    For example, searching for 42 returns no PDFs, but searching for 4200 does.

    Good luck,
    and I hope this helps.
     

    Author Comment

    by:ahillman
    I just created a new PDF for policy 0000 and the search for 0000 didn't find it. My thought was scanned-in docs as well, and I do think that is part of the problem. But what about the new one I created? And I still have no idea why 0330 returns results but 0330E does not.
     
    LVL 9

    Expert Comment

    by:CFDevHead
    After you created the file, did you reindex the Verity collection?
    If not, your search will not find it.
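
    A hedged sketch of that step for one new file. The file path is an assumption (0000.pdf sitting directly in the indexed folder); adjust it to wherever the PDF actually lives.

    ```cfml
    <!--- Add or refresh a single file in the existing "policy" collection --->
    <cflock name="policies" type="exclusive" timeout="30">
       <cfindex
          collection="policy"
          action="update"
          type="file"
          key="d:\CFfulton\Policies\0000.pdf"
          language="english">
    </cflock>

    <!--- Search for the policy number and dump whatever comes back --->
    <cfsearch name="Check" collection="policy" criteria="0000">
    <cfdump var="#Check#">
    ```

    If Check is still empty right after the update, that would suggest looking at the PDF itself rather than at the search page.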
     

    Author Comment

    by:ahillman
    Yep, I almost forgot, but remembered at the last second! :)
     

    Author Comment

    by:ahillman
    I'm going to up the points. This seems to be a bit harder than I had anticipated.
     
    LVL 2

    Expert Comment

    by:KoldFuzun
    Hi Aimee, good to talk to you again :)

    There are a couple issues we can check out:

    First, were the PDFs created on a Mac? I have had some issues with this, especially using Acrobat 5 & 5.5

    Also, can you check the permissions on the PDFs and make sure Everyone has Full Control?

    Thanks
    TJ
     

    Author Comment

    by:ahillman
    No Macs here :( Just PCs. Some were scanned in, so I can eliminate them from the picture (no pun intended), since I already know they won't work. Let me take a look at the permissions.
     

    Author Comment

    by:ahillman
    Well, I took a look, and actually I don't want people to have full control. These policies have security on them: only read and execute and printing are allowed. If you go under the Series # you can see 0000.pdf; it's only in the search that you don't get it.
     
    LVL 2

    Expert Comment

    by:KoldFuzun
    Hi Aimee

    In your code, let's try temporarily changing

    <CFIF (Find(Ext,"doc,pdf") GT 0)>
                <!--- If it's a web doc, use URL returned --->
                <A target="_blank" HREF="#GetPolicy.URL#">#GetFileFromPath(Key)#</A>
             <CFELSE>
                <!--- It's not a web doc, use file path in KEY from result --->
                <A target="_blank" HREF="#Key#">#GetFileFromPath(Key)#</A>
             </CFIF>

    to just


    #GetFileFromPath(Key)#

    Are the results different?
     

    Author Comment

    by:ahillman
    Did that. It still returns only the .doc, with no PDF. Go figure; it's just not finding it for some reason.
     

    Author Comment

    by:ahillman
    Just what you needed to spice up your day huh TJ.  :)
     
    LVL 2

    Expert Comment

    by:KoldFuzun
    hehe, uhhhh ya!

    I am beginning to wonder if it isn't the PDFs themselves.

    The ones that show up under 0000, do these come from the collection too?
     

    Author Comment

    by:ahillman
    Well - The items on the left of the page that have a series # are just in folders and are listed out when the page is called.  The search on the right side is the collection.
     
    LVL 2

    Expert Comment

    by:KoldFuzun
    OK, that's what I thought. I do not believe the PDFs are being indexed, for some reason. I am going to create a test here and see what I come up with. You are still using CF 5, if I remember correctly. Is that so?
     

    Author Comment

    by:ahillman
    Nope, it's CFMX. Just so ya know, I have a meeting to go to at 3:00 and prob. won't be back for the day; I will check things out from home tonight. Maybe a little time away will make the answer blaze to the front! Thanks for thinking on your end and I will def. get back to ya.
    Much appreciated!
    Aimee
     

    Author Comment

    by:ahillman
    By the way - if they aren't being indexed then why do some of them show up?  ie 4200 and 6122?  Frustrating! :)
     
    LVL 2

    Expert Comment

    by:KoldFuzun
    Aimee, I think I understand why now. The text in those files is from a scanned image and therefore not indexable as text. You can see here:

    http://www.sanative.net/aimee

    My search term is "is". This should bring up many docs because many of the ones I grabbed from your site contain the word "is". Only my document came up. The link is broken intentionally :)
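
    A rough sketch of the same test run against the real collection, for anyone who wants to repeat it. It searches for a common word and then lists the PDFs on disk that did not come back. The folder path is taken from the cfindex call earlier in the thread; the probe term and variable names are just illustrative, and only the top-level folder is checked here.

    ```cfml
    <cfset policyDir = "d:\CFfulton\Policies">
    <cfset probeTerm = "is">

    <!--- PDFs that exist on disk (top-level folder only) --->
    <cfdirectory action="list" directory="#policyDir#" filter="*.pdf" name="PdfFiles">

    <!--- Documents in the collection that contain the probe term --->
    <cfsearch name="Hits" collection="policy" criteria="#probeTerm#" maxrows="1000">

    <!--- Collect the matched file names in a pipe-delimited list --->
    <cfset foundList = "">
    <cfloop query="Hits">
       <cfset foundList = ListAppend(foundList, LCase(GetFileFromPath(Hits.Key)), "|")>
    </cfloop>

    <!--- Report every PDF on disk that the search did not return --->
    <cfoutput query="PdfFiles">
       <cfif ListFind(foundList, LCase(PdfFiles.Name), "|") EQ 0>
          #PdfFiles.Name# did not match "#probeTerm#" and may contain no indexable text.<br>
       </cfif>
    </cfoutput>
    ```

    A file showing up in this report is only a hint: it could be a scanned image, or simply a document that does not contain the probe word.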
     
    LVL 2

    Expert Comment

    by:KoldFuzun
    The way to fix this, by the way, is to have a REALLY good OCR program render them as text and create the PDF from that, or just recreate the documents as text before converting to PDF.
     
    LVL 2

    Expert Comment

    by:KoldFuzun
    I just noticed CFDevHead had this answer before me. I should have read this previously. Sorry CFDevHead! Aimee, you should award the points to him :)
     

    Author Comment

    by:ahillman
    Final question: if the docs are actually images, then why are they showing up at all when a search is done? Like the 0330 one?
     

    Author Comment

    by:ahillman
    Thanks a bunch CFDevHead for your help and you too TJ!
     
    LVL 2

    Expert Comment

    by:KoldFuzun
    The only ones that come up in the search for me are not scanned images :)
     

    Author Comment

    by:ahillman
    Gotcha - it is def. the way the individual did things when they created the documents.  Thanks! :)
