Solved

browser-based file search for a Document Imaging System

Posted on 2004-10-22
Last Modified: 2013-11-19
Hello,

We have a document imaging system where paper files are converted into searchable PDFs.  Currently, users access these files by way of a mapped drive.  Searching within the documents works very well.  Even searching between documents for content works well.  However, finding the file you're looking for in the first place is a pain.  The files are all named by case number, which is a number of 1 to 8 characters, and are further broken down into folders by the year of the case (i.e., all the 2004 cases go into a 2004 folder within the DIS).  Scrolling through hundreds of files looking for, say, 35873 is getting to be a major pain as this thing grows, more so if you don't know which year the case is from (although we have to break it up that way for archival purposes).

I would like to put a search engine on our intranet site that does nothing more than reach out to that file server and look at the titles of the PDFs.  The user should be able to enter the case number they're looking for and press submit.  They then get the result, which should be a link to the file.  They click the link and, since it's a PDF, it gets opened by Adobe Acrobat within the browser.

The users can't edit these files, so opening read-only is fine.  I'm looking for a largely turnkey solution.  I don't mind getting in and editing HTML or ASP, but I don't know enough to write it from scratch.  Most of the search engines I've looked at don't return file titles, which is all I want.  They want to open and parse the page, which would normally be ASP or HTML, and return data from within the file.  Acrobat can already do that within the PDFs.  I only want file titles.  Has anybody written, or does anybody know of, a browser-based search engine that can search for a PDF file and return its name and a link to it?

This will be installed on an IIS6 intranet site running on Windows 2003.  ASP, ASP.NET, VBScript, and JavaScript are all supported.  The file server is also Windows 2003 (actually, it's part of a 2-node DFS root, and the files in question are within the DFS root).  It's all part of a 2003 native Active Directory.  If there's an issue going from the intranet server to a different server, I'm not opposed to installing IIS on the file server itself, although I'd prefer not to.

I'm going to start this question at 250 points.  That would be for a simple search engine that can search through the entire DIS for a file name, and provide that file name as a clickable link to the file.  For 500 points, I'd also like a drop-down list that lets the user narrow the search path.  Something like this, with each line being a searchable selection:

Search entire DIS (default selection)
Central Records (this would search only the Central Records subfolder)
  2004  (this would search only the 2004 folder within Central Records)
  2003
  2002
Medical Records
  2004
  2003
Visitation Logs
  2004
  2003
etc.
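
To make the list above concrete, here is a rough sketch of how such a drop-down could be generated from the folder tree and how the submitted choice could be mapped back to a search folder. It is written in Python purely for illustration (the site itself would use ASP or ASP.NET), and the function names `build_scope_options` and `resolve_scope`, as well as the assumed two-level section/year layout, are hypothetical.

```python
from pathlib import Path


def build_scope_options(dis_root):
    """Return (label, relative_path) pairs for a drop-down, two levels deep."""
    root = Path(dis_root)
    options = [("Search entire DIS", "")]  # default selection: no restriction
    for section in sorted(p for p in root.iterdir() if p.is_dir()):
        options.append((section.name, section.name))
        # Assumption: each section contains one subfolder per year.
        years = sorted((p for p in section.iterdir() if p.is_dir()),
                       key=lambda p: p.name, reverse=True)
        for year in years:
            options.append(("  " + year.name, f"{section.name}/{year.name}"))
    return options


def resolve_scope(dis_root, relative_path):
    """Map a submitted drop-down value to a folder, refusing paths outside the DIS."""
    root = Path(dis_root).resolve()
    target = (root / relative_path).resolve()
    if target != root and root not in target.parents:
        raise ValueError("scope is outside the DIS root")
    return target
```

The folder returned by `resolve_scope` would then be used as the starting point of the file-name search, so an empty value searches the whole DIS and any other value searches only that subfolder.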

Thanks.
Question by:WerewolfTA
    9 Comments
     
    LVL 18

    Assisted Solution

    by:arantius
    This page should be useful for you

    http://planet-source-code.com/vb/scripts/ShowCode.asp?lngWId=4&txtCodeId=6210

    It says it only works with an old version of IIS, but it's also from 2000.  There's probably (hopefully) a similar, newer version available today.

    Possibly http://www.experts-exchange.com/Web/Q_21007777.html
     

    Accepted Solution

    by:
    One little tool that may be of use is "zoom search" (http://www.wrensoft.com/zoom/). It's not a real-time search indexer, but I believe you can configure it to run on a schedule. There are add-ins for searching inside documents, but it looks like you aren't after this functionality for your PDFs. It will auto-generate PHP or ASP for the search pages, which you can then customise to suit your site, and the search indexes it creates seem to work pretty quickly.

    Certainly worth a look. I use it for a couple of client sites, and it works a treat for only $99
     
    LVL 4

    Author Comment

    by:WerewolfTA
    Thanks for the posts.  They're a little more than what we were looking for.  We had taken a look at the PDF IFilter.  However, it and most of the other solutions suggested so far do more than what we want, in that they try to search through the contents of the files.  I only want to be able to search on the file names, which I imagine would be a heck of a lot faster and put less of a strain on the server(s).  That said, these are good suggestions for anyone looking to search through the files, and the zoom search looks impressive.

    I believe I need a search engine that uses the FileSystemObject (FSO), or so I gathered from the Web Wiz forums (http://www.webwizguide.info) while trying to see if I could modify their site search to do what I wanted.

    The link below and the sublinks coming off of it seem to dance around what I'm looking for, but don't come quite close enough that I'm able to use what they have to make anything workable.  Probably would if I knew a little more.  I can see where you can set the objFolder.Name and where you can have the files returned as objFile.Name, although I don't know if it searches subfolders, which I would need.  But I just don't know how to take user input and return links to the files whose objFile.Names are like the user entry or to set up a drop down list where I could input subfolder names.  I think this is the right track, but I don't know how to put it all together.  Is this the right track, and if so, can somebody put it together for me or help me put it together?
    http://www.4guysfromrolla.com/webtech/faq/FileSystemObject/faq5.shtml
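
For readers following along, the pieces described above (walk a folder and its subfolders, compare each file name with the user's entry, return links) can be sketched roughly as follows. This is a Python illustration of the logic rather than FSO/ASP code; the function name `find_case_files` and the `/dis/` URL prefix are assumptions, not anything from the linked FAQ.

```python
import os
from urllib.parse import quote


def find_case_files(search_root, case_number, url_prefix="/dis/"):
    """Walk search_root and all subfolders; return (name, link) for matching PDFs."""
    wanted = case_number.strip().lower()
    results = []
    if not wanted:
        return results  # an empty search box should not match every file
    for folder, _subfolders, files in os.walk(search_root):
        for name in files:
            stem, ext = os.path.splitext(name)
            # Match on the file name only; the PDF contents are never opened.
            if ext.lower() == ".pdf" and wanted in stem.lower():
                rel = os.path.relpath(os.path.join(folder, name), search_root)
                # Hypothetical: url_prefix is wherever the DIS is published on the intranet.
                link = url_prefix + quote(rel.replace(os.sep, "/"))
                results.append((name, link))
    return sorted(results)
```

Passing a subfolder as `search_root` is all it takes to restrict the search, although the links are then relative to that subfolder, so `url_prefix` would need to include it.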

    Thanks for the help, so far.
     
    LVL 5

    Assisted Solution

    by:crimson117
    Google Search Appliance. http://www.google.com/appliance/features.html

    You can restrict it to certain file types (pdf's) and can configure the results to show however you like.

    Only $32,000 for one that'll index up to 150,000 documents :-)


    But seriously folks...

    That said, if you have the ability to modify the DIS, I'd say that any time you scan a document and add it to the repository, you should execute a simple script that writes a record to a database containing, for example, the year and the filename or case number.  Then you could write a very simple search in ASP and MySQL.  You would need to do some tricks to fill the database with your existing records, but then you'd be on easy street.
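
A minimal sketch of that idea, using Python's built-in `sqlite3` module as a stand-in for ASP and MySQL. The table name `documents`, its columns, and the function names are all assumptions made for illustration.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) the index database; the schema is a hypothetical example."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " case_number TEXT NOT NULL,"
        " year INTEGER NOT NULL,"
        " path TEXT NOT NULL UNIQUE)"
    )
    return conn


def record_scan(conn, case_number, year, path):
    """Run once per scanned document, right after it is added to the repository."""
    conn.execute(
        "INSERT OR REPLACE INTO documents (case_number, year, path) VALUES (?, ?, ?)",
        (case_number, year, path),
    )
    conn.commit()


def search_cases(conn, case_number, year=None):
    """Look up a case number, optionally restricted to a single year."""
    sql = "SELECT case_number, year, path FROM documents WHERE case_number = ?"
    params = [case_number]
    if year is not None:
        sql += " AND year = ?"
        params.append(year)
    return conn.execute(sql + " ORDER BY year DESC, path", params).fetchall()
```

Filling the table with existing records would amount to calling `record_scan` once for every file already in the repository. Note that files moved or merged outside the scanning script would leave stale rows behind unless the index is rebuilt.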
     
    LVL 18

    Assisted Solution

    by:arantius
    Just the file names?

    In that case you just need a file list (dir /b /s) and a teeny program to read the file line by line and look for the search terms.

    Or, possibly, load the list line by line into a database and do an SQL query on it.
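
As a rough illustration of the "teeny program", here is a Python sketch that scans such a listing for a search term. It assumes the listing was produced with something like `dir /b /s *.pdf > listing.txt`; the file name `listing.txt` and the function name `search_listing` are hypothetical.

```python
import ntpath


def search_listing(lines, term):
    """Return full paths whose file name (not folder name) contains term."""
    wanted = term.strip().lower()
    matches = []
    if not wanted:
        return matches
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines in the listing
        # ntpath splits on backslashes regardless of the OS running this script.
        name = ntpath.basename(path)
        if wanted in name.lower():
            matches.append(path)
    return matches
```

It could be used as `with open("listing.txt") as f: hits = search_listing(f, "35873")`. The listing would have to be regenerated on a schedule, so results can lag behind recent moves and merges.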
     
    LVL 1

    Expert Comment

    by:CyberAdy
    Hi,

    I have something similar written in ASP.NET, and it can be customised in 2-3 days for your specific needs. Can I ask for money for this solution? (PS: I don't know if it's acceptable to ask for money on this board, so please excuse my naive question.)

    Thanks and Regards
    CyberAdy
     
    LVL 5

    Expert Comment

    by:crimson117
    No, you may not solicit business on this site.
     
    LVL 4

    Author Comment

    by:WerewolfTA
    Ok, I modified Web Wiz's Site Search to search on file title.  It returns that, I can click the link and the pdf opens in the browser.  Yeah!  Thanks to everyone who's posted so far, there've been some good ideas (searching from an index is certainly more efficient than searching through the directories every time, but we do a lot of moving and merging of files, so the realtime search probably works a little better for us; however, were that not the case, I'd like that solution better than mine).

    I'm going to leave this open a little longer because I've been unsuccessful at restricting the searches to certain subfolders.  I'll post more on what I've done to try to get that to work a little later, but for now I have to go to class.  Peace!
     
    LVL 4

    Author Comment

    by:WerewolfTA
    Alright fellas.  I got it to do what I want it to do, and that is search by file name and be able to restrict the search to subfolders.  I appreciate the help and ideas.
