browser-based file search for a Document Imaging System


We have a document imaging system (DIS) where paper files are converted into searchable PDFs.  Currently, users access these files by way of a mapped drive.  Searching within a document works very well, and even searching across documents for content works well.  However, finding the file you're looking for in the first place is a pain.  The files are all named by case number, which is a 1-8 character number, and are further broken down into folders by the year of the case (i.e., all the 2004 cases go into a 2004 folder within the DIS).  Scrolling through hundreds of files looking for, say, 35873 is getting harder as this thing grows, more so if you don't know which year the case is from (although we have to break it up that way for archival purposes).

I would like to put a search engine on our Intranet site that does nothing more than reach out to that file server and look at the file names of the PDFs.  The user should be able to put in the case number they're looking for and press submit.  Then they get the result, which should be a link to the file.  They click the link and, since it's a PDF, it gets opened by Adobe Acrobat within the browser.

The users can't edit these files, so opening read-only is fine.  I'm looking for a largely turnkey solution.  I don't mind getting in and editing HTML or ASP, but I don't know enough to write it from scratch.  Most of the search engines I've looked at don't return file names, which is all I want.  They want to open and parse the page, which would normally be ASP or HTML, and return data from within the file.  Acrobat can already do that within the PDFs.  I only want file names.  Has anybody written, or does anybody know of, a browser-based search engine that can search for and return the name of, and a link to, a PDF file?

This will be installed on an IIS6 Intranet site running on Windows 2003.  ASP, ASP.NET, VBScript, and JavaScript are all supported.  The file server is also Windows 2003 (actually, it is part of a 2-node DFS root, and the files in question are within the DFS root).  Everything is part of a 2003 native Active Directory.  If there's an issue going from the intranet server to a different server, I'm not opposed to installing IIS on the file server itself, although I'd prefer not to.

I'm going to start this question at 250 points.  That would be for a simple search engine that can search through the entire DIS for a file name, and provide that file name as a clickable link to the file.  For 500 points, I'd also like a drop-down list that lets the user narrow the search path.  Something like this, with each line being a searchable selection:

Search entire DIS (default selection)
Central Records (this would search only the Central Records subfolder)
  2004  (this would search only the 2004 folder within Central Records)
Medical Records
Visitation Logs
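
One way to think about the drop-down is as a lookup from each label to a subfolder of the DIS. The sketch below is only an illustration in Python, not the ASP the site would actually run; `DIS_ROOT`, `SCOPES`, and `resolve_scope` are hypothetical names, and the share path is an assumption because the real DFS path is not given here.

```python
from pathlib import PureWindowsPath

# Hypothetical share root; the real DFS path is not stated in the question.
DIS_ROOT = PureWindowsPath(r"\\fileserver\DIS")

# Drop-down label -> subfolder relative to DIS_ROOT ("" means the whole DIS).
SCOPES = {
    "Search entire DIS": "",
    "Central Records": "Central Records",
    "Central Records / 2004": r"Central Records\2004",
    "Medical Records": "Medical Records",
    "Visitation Logs": "Visitation Logs",
}


def resolve_scope(label: str) -> PureWindowsPath:
    """Return the folder to search for the selected drop-down label.

    Unknown labels fall back to the whole DIS, so a tampered form value
    cannot point the search at a folder outside the list above.
    """
    return DIS_ROOT / SCOPES.get(label, "")
```

Looking the label up in a fixed table, rather than passing a raw path from the form, keeps the search confined to the folders listed in the drop-down.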

catmur Commented:
One little tool that may be of use is "zoom search". It's not a real-time search indexer, but I believe you can configure it to run on a schedule. There are add-ins for searching inside documents, but it looks like you aren't after that functionality for your PDFs. It will auto-generate PHP or ASP for the search pages, which you can then customise to suit your site, and the search indexes it creates seem to work pretty quickly.

Certainly worth a look. I use it for a couple of client sites, and it works a treat for only $99.
arantius Commented:
This page should be useful for you

It says it only works with an old version of IIS, but it's also from 2000.  There's probably (hopefully) a similar, newer version available today.

WerewolfTA (Author) Commented:
Thanks for the posts.  They're a little more than what we were looking for.  We had taken a look at the PDF IFilter.  However, it and most of the other solutions suggested so far do more than what we want, in that they try to search through the contents of the files.  I only want to be able to search on the file names, which I imagine would be a heck of a lot faster and put less of a strain on the server(s).  That said, these are good suggestions for anyone looking to search through the files, and the zoom search looks impressive.

I believe I need a search engine that uses the FSO (FileSystemObject), or so I gathered from the web wiz forums, where I was trying to see if I could modify their site search to do what I wanted.

The link below and the sublinks coming off of it seem to dance around what I'm looking for, but they don't come close enough for me to turn what they have into anything workable.  I probably could if I knew a little more.  I can see where you can set the objFolder.Name and where you can have the files returned as objFile.Name, although I don't know if it searches subfolders, which I would need.  But I just don't know how to take user input and return links to the files whose objFile.Name is like the user's entry, or how to set up a drop-down list where I could input subfolder names.  I think this is the right track, but I don't know how to put it all together.  Is this the right track, and if so, can somebody put it together for me or help me put it together?

Thanks for the help, so far.
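
For readers following along, the pieces described in this post (walk the folder and its subfolders, compare each file name to the user's entry, return links) can be sketched roughly as below. This is an illustration in Python rather than the ASP/FSO code the site would use; `find_pdfs`, `as_links`, and the `base_url` virtual directory are hypothetical, and the match is a plain case-insensitive "contains" test on the file name.

```python
import os
from html import escape
from urllib.parse import quote


def find_pdfs(root, term):
    """Yield (file_name, relative_path) for PDFs under root whose name contains term."""
    term = term.strip().lower()
    if not term:
        return
    # os.walk descends into every subfolder of root.
    for folder, _subfolders, files in os.walk(root):
        for name in sorted(files):
            stem, ext = os.path.splitext(name)
            if ext.lower() == ".pdf" and term in stem.lower():
                rel = os.path.relpath(os.path.join(folder, name), root)
                yield name, rel.replace(os.sep, "/")


def as_links(matches, base_url):
    """Render matches as HTML anchors under base_url (an assumed virtual directory)."""
    base = base_url.rstrip("/")
    return [
        '<a href="{}/{}">{}</a>'.format(base, quote(rel), escape(name))
        for name, rel in matches
    ]
```

Restricting the search to one subfolder then amounts to passing that subfolder as `root` instead of the top of the DIS.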

crimson117 Commented:
Google Search Appliance.

You can restrict it to certain file types (PDFs) and can configure the results to show however you like.

Only $32,000 for one that'll index up to 150,000 documents :-)

But seriously folks...

That said, if you have the ability to modify the DIS, I'd say that any time you scan a document and add it to the repository, you should execute a simple script that writes a record to a database containing, for example, the year and the filename or case number.  Then you could write a very simple search in ASP and MySQL.  You would need to do some tricks to fill the database with your existing records, but then you'd be on easy street.
arantius Commented:
Just the file names?

In that case you just need a file list (dir /b /s) and a teeny program to read the file line by line and look for the search terms.
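
A rough sketch of that "teeny program", written in Python purely for illustration: it assumes the listing holds one full Windows path per line, as `dir /b /s` produces, and `search_listing` is a hypothetical name.

```python
import ntpath


def search_listing(lines, term):
    """Return full paths from a `dir /b /s` listing whose PDF file name contains term."""
    term = term.strip().lower()
    hits = []
    if not term:
        return hits
    for line in lines:
        path = line.strip()
        # Compare against the file name only, so folder names never match.
        stem, ext = ntpath.splitext(ntpath.basename(path))
        if ext.lower() == ".pdf" and term in stem.lower():
            hits.append(path)
    return hits
```

Because a file object iterates line by line, an open listing file can be passed in directly as `lines`. The listing is a snapshot, so it would need to be regenerated whenever files are added or moved.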

Or, possibly, load the list line by line into a database and do an SQL query on it.
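
The database variant might look something like the following sketch. It uses Python's built-in SQLite module as a stand-in for whatever database the site would really use; the `files` table and the `build_index` and `search_index` names are made up for the example.

```python
import ntpath
import sqlite3


def build_index(conn, paths):
    """(Re)create a table holding the file name and full path of every listed file."""
    conn.execute("DROP TABLE IF EXISTS files")
    conn.execute("CREATE TABLE files (name TEXT, path TEXT)")
    conn.executemany(
        "INSERT INTO files (name, path) VALUES (?, ?)",
        [(ntpath.basename(p), p) for p in paths],
    )
    conn.commit()


def search_index(conn, term, scope=""):
    """Return paths whose file name contains term, optionally under a path prefix."""
    rows = conn.execute(
        "SELECT path FROM files "
        "WHERE name LIKE '%' || ? || '%' AND path LIKE ? || '%' "
        "ORDER BY path",
        (term, scope),
    )
    return [row[0] for row in rows]
```

The search term and scope are passed as query parameters rather than pasted into the SQL string. Note that `%` and `_` typed by a user would still act as LIKE wildcards in this sketch.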

I have something similar written in ASP.NET, and it can be customised in 2-3 days for your special needs. Can I ask for money for this solution? (PS: I don't know if it's correct to ask for money on this board, so please excuse my naive question.)

Thanks and Regards
No, you may not solicit business on this site.
WerewolfTA (Author) Commented:
Ok, I modified Web Wiz's Site Search to search on file title.  It returns that, I can click the link and the pdf opens in the browser.  Yeah!  Thanks to everyone who's posted so far, there've been some good ideas (searching from an index is certainly more efficient than searching through the directories every time, but we do a lot of moving and merging of files, so the realtime search probably works a little better for us; however, were that not the case, I'd like that solution better than mine).

I'm going to leave this open a little longer because I've been unsuccessful at restricting the searches to certain subfolders.  I'll post more on what I've done to try to get that to work a little later, but for now I have to go to class.  Peace!
WerewolfTA (Author) Commented:
Alright fellas.  I got it to do what I want it to do, and that is search by file name and be able to restrict the search to subfolders.  I appreciate the help and ideas.
Question has a verified solution.