• Status: Solved

Searching large PDF files

Hello All,
I have a network share with almost 3 TB of PDF files (legal briefs, maps, scanned documents). The files average around 200 MB each, with some as large as 1 GB. All of them have Fast Web View enabled and are in the most current PDF format. The issue is that opening these files still takes quite a while, and searching them is almost impossible: it can take upwards of 10 minutes to search for a one-word term using Acrobat Pro's Find utility. What utilities or management programs would you suggest to manage these PDF files and make them searchable? We are even willing to pay for the Google Search Appliance if that would work.

Thanks, Justin
Asked by: JAPorter1983
2 Solutions
 
Dan Craciun, IT Consultant, commented:
Try dtSearch. It is not cheap, but it should work well for your needs.

HTH,
Dan
 
Joe Winograd, Fellow & MVE, Developer, commented:
Hi Justin,
For indexing and fast searching of PDF files (and any files that already have text in them), I strongly recommend dtSearch:
http://www.dtsearch.com/

I have been using it for around 20 years. It is an extraordinarily good piece of software!

When it indexes documents that are mixed binary and text files (such as Word files and PDF Searchable Image files that have been created by scanning and OCR), it has an option to filter out the binary. This makes the index much smaller than other products which also index the binary code (for no good reason). dtSearch has an interesting filtering algorithm that scans a binary file for anything that looks like text using multiple encoding detection methods. The algorithm detects sequences of text with different encodings or formats, and ignores the binary. This is perfect for PDF Searchable Image files created by OCR (and Word documents, too).
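To give a rough feel for the general idea of pulling text out of a mixed binary file, here is a minimal Python sketch. It is not dtSearch's algorithm, which is proprietary and not described in detail here; the function name `extract_text_runs`, the minimum run length of 4, and the choice of just two encodings (ASCII and UTF-16LE) are all assumptions made for illustration.

```python
import re


def extract_text_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return text-like runs found in binary data (illustrative only)."""
    runs = []

    # Runs of printable ASCII bytes (space through tilde).
    ascii_pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in ascii_pattern.finditer(data):
        runs.append(match.group().decode("ascii"))

    # Runs of UTF-16LE text: a printable ASCII byte followed by a zero byte.
    utf16_pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    for match in utf16_pattern.finditer(data):
        runs.append(match.group().decode("utf-16-le"))

    return runs


# Hypothetical sample: binary noise around one ASCII run and one UTF-16LE run.
sample = (
    b"\x00\x01\xffHello world\x00\x02\xfe"
    + "legal brief".encode("utf-16-le")
    + b"\xff\xfe\x00"
)
print(extract_text_runs(sample))
```

Only the recovered runs would be handed to an indexer, which is why skipping the binary portion keeps an index small. A real product would handle many more encodings and formats than this sketch does.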

It has built-in viewers for most common file types, including, of course, PDF (and Word — both DOC and DOCX), but can also launch an external program automatically when the hit is on a file type for which it doesn't have a viewer.

It has special handling for PDF files, allowing you either to view the PDF file in place (in dtSearch) or in a separate instance of Adobe Reader (and in both cases, hits are highlighted). Also, to improve performance, there's an option that lets you tell dtSearch to automatically open Adobe Reader for PDF files (the point is that Adobe Reader runs embedded in dtSearch and it opens PDF files much more quickly if Adobe Reader is already running separately when a PDF is opened in dtSearch).

It has extensive search options, including exact phrase, stemming, phonic, fuzzy, synonym, any words, all words, and Boolean. Here's the search request dialog:

[Image: dtSearch Search Request dialog]

dtSearch has a strong presence in the legal profession, as you can see here:
http://www.dtsearch.com/CS_legal.html
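For anyone wondering why an indexed search is so much faster than Acrobat's Find, which has to read through each file, here is a minimal Python sketch of an inverted index with "all words", "any words", and fuzzy lookups. It is only an illustration of the general technique, not how dtSearch is implemented; the function names, the sample file names, and the fuzzy cutoff of 0.8 are assumptions.

```python
import difflib
import re
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search_all(index, words):
    """Documents containing every word (Boolean AND)."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()


def search_any(index, words):
    """Documents containing at least one word (Boolean OR)."""
    return set().union(*(index.get(w.lower(), set()) for w in words))


def search_fuzzy(index, word, cutoff=0.8):
    """Documents containing a word spelled similarly to the query word."""
    close = difflib.get_close_matches(word.lower(), list(index), n=5, cutoff=cutoff)
    return search_any(index, close)


# Hypothetical extracted text for three files.
docs = {
    "brief_a.pdf": "Motion to dismiss the appeal",
    "brief_b.pdf": "Appeal of the zoning map decision",
    "map_c.pdf": "Zoning map for the county",
}
index = build_index(docs)
print(search_all(index, ["appeal", "zoning"]))
print(search_any(index, ["dismiss", "county"]))
print(search_fuzzy(index, "zonning"))
```

The index is built once, up front. After that, each query is a handful of dictionary lookups and set operations, no matter how large the original PDF files are.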

It is not an inexpensive product, but it is worth every penny. You are getting what you pay for! It is an excellent search tool.

As a disclaimer, I want to emphasize that I have no affiliation with this company and no financial interest in it whatsoever. I am simply a happy user/customer. Regards, Joe
 
JAPorter1983 (Author) commented:
Thanks for the recommendations. I am going to head over and download a trial to test it out.
 
Joe Winograd, Fellow & MVE, Developer, commented:
Justin,
Sounds good. If you need any help while you're trying it, don't hesitate to ask. I'm very familiar with the program and can probably help you through any problems. Also, their technical support is excellent. When you send an email to <tech@dtsearch.com>, you will get a prompt reply from someone who is extremely knowledgeable. Regards, Joe
 
JAPorter1983 (Author) commented:
Thanks for the help again. This looks pretty close to what I am looking for. I am still testing it out, but I wanted to give you a quick update.
 
Joe Winograd, Fellow & MVE, Developer, commented:
Justin,
Thanks for the update — it's very nice to hear back from the asker during a thread (I wish more folks did it). Regards, Joe
 
JAPorter1983 (Author) commented:
Thanks for the help. From my testing this seems like a good fit.

Justin
 
Joe Winograd, Fellow & MVE, Developer, commented:
Glad to hear it! Regards, Joe
 
Dan Craciun, IT Consultant, commented:
@justin: Thank you for the points, but I think Joe deserves the majority. I simply gave you an option; Joe went above and beyond.
 
Joe Winograd, Fellow & MVE, Developer, commented:
Dan,
As always, you are a true gentleman in the EE community!

Justin,
Although it is very kind of Dan to make that offer, don't worry about taking the time to reopen the question to change the points. I'm fine with it as-is.

One other thing about dtSearch. Dan and I both mentioned that it is not the low-priced spread, but one positive point in their pricing is the approach to technical support and product updates. Their store page says, "Technical support and product updates are free for a minimum of one year with all purchases." The "minimum of one year" statement is vague and there is no fee mentioned. Also, the dtSearch Desktop/Network Upgrades page says it is a "free upgrade", but it's not clear if these upgrades are forever free. So I wrote to dtSearch asking for a clarification of the policy and here's what they wrote back (with permission to share the answer publicly):

----- Begin dtSearch response -----

I appreciate your email, and sorry for the confusion!

Our setup licenses provide for a minimum of one year of support and upgrades on all licenses. That said, we have provided support and upgrades at no charge since Year 2000 for all end-user Desktop / Network licenses (!). Because of the higher average cost of developer support, we have been charging annually for developer (Web / Engine / Publish) upgrades and support, but again not Desktop / Network upgrades and support.

I can't always guarantee that this will be the case until the end of time, but that's why you don't find any "upgrade charge" indicators for Desktop / Network on our site currently.

----- End dtSearch response -----

Amortized over a large number of years for technical support and software upgrades/updates, the $199 license fee becomes much more reasonable. dtSearch was careful to say in the response that they "can't always guarantee" no upgrade charge, but I have been using dtSearch for around 20 years, have received technical support and product upgrades on a continuous basis (am currently running the latest release), and have never paid anything beyond the initial license fee. So it's a pretty good bet, if not a guarantee. Regards, Joe
 
JAPorter1983 (Author) commented:
Joe,
Thanks again for all your information. I have set up a few test indexes for my client, and they are testing out the dtSearch results.

Thanks for all the help.
Justin
 
Joe Winograd, Fellow & MVE, Developer, commented:
Justin,
You're welcome again. I always like to hear about "real world" results (good or bad) with my favorite products, so I hope you'll post back here with your client's results after their dtSearch testing. Thanks much, Joe