kaiseritc


Batch convert PDF to searchable format

I have many files that were scanned to PDF. I have a copy of Adobe Acrobat 7.0 that came with my scanner, and I can use its "Recognize Text Using OCR" menu option to make the files searchable in Acrobat Reader and with Google Desktop.

I have hundreds of files. Is there any way to automate or script the conversion from image PDF to searchable PDF so that I can process an entire folder at once?

Thanks
ASKER CERTIFIED SOLUTION
Ripin
Forced accept.

Computer101
Community Support Moderator
fi8224

Mr Xmas
I've recently been charged with converting thousands of PDF files to searchable PDFs on a client network. Many of these documents are 20+ MB. We purchased OmniPage Pro and are using the Batch Manager to process the files, but it is taking a crazy amount of time, and I suspect I am doing something wrong. I had a single batch of 8 files, about 23 MB each, running over the entire weekend; not even one file is complete, and it looks like some were skipped. This morning I picked one of the files to process on its own. It appears to be working, but at about one page every 15-20 seconds for a document of a few hundred pages. At this rate I will die an old man before the files are completed. What would you recommend?
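For a rough sense of scale, the figures in the post can be turned into a back-of-the-envelope estimate. This is only a sketch: the 300 pages per file is an assumption (the post says only "a few hundred"), and it assumes pages are processed strictly one at a time at the observed rate.

```python
def estimate_hours(files: int, pages_per_file: int, seconds_per_page: float) -> float:
    """Total OCR time in hours, assuming pages are processed one after another."""
    return files * pages_per_file * seconds_per_page / 3600


# 8 files from the post; 300 pages per file is an assumed stand-in for "a few hundred".
low = estimate_hours(files=8, pages_per_file=300, seconds_per_page=15)
high = estimate_hours(files=8, pages_per_file=300, seconds_per_page=20)
print(f"{low:.1f} to {high:.1f} hours")  # prints "10.0 to 13.3 hours"
```

Under those assumptions the whole batch would need roughly half a day, so a weekend run that finishes nothing suggests the batch stalled or was slowed by something other than the OCR rate seen on the single file.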
fi8224,

The OCR is what takes a long time, especially if your files were scanned at a fairly high resolution.
One thing to try is copying the files down to the local machine before processing, and having OmniPage save its output to the local machine as well.
For our process I found this makes a huge difference. In many cases, someone in our department will scan a lease (a 60-page document on average) and the file will be converted a minute or two after they return to their desk.

Hope it helps,

Jim Christmas
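The "copy local, process local, copy back" idea above could be scripted along these lines. This is a minimal sketch, not OmniPage's own mechanism: `convert` is a hypothetical placeholder for whatever OCR step you run (it is not a real OmniPage or Acrobat API), and the function name `process_locally` is made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def process_locally(source: Path, dest: Path,
                    convert: Callable[[Path, Path], None]) -> None:
    """Copy `source` to a local temp folder, convert it there, then copy the result to `dest`."""
    with tempfile.TemporaryDirectory() as tmp:
        local_in = Path(tmp) / source.name
        local_out = Path(tmp) / f"ocr_{source.name}"
        shutil.copy2(source, local_in)   # pull the scan off the network share
        convert(local_in, local_out)     # hypothetical OCR step; touches local disk only
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(local_out, dest)    # push the finished file back
```

The temporary folder is deleted when the block exits, so the network share is read once and written once per file, and the original scan is left untouched.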
Is there a way to create a batch so that the original directory structure is kept? We don't want to lose it, and it's incredibly cumbersome to run the batches one little directory at a time. The client is a law firm, and we are trying to make all of the old PDFs searchable; there are hundreds of directories nested within directories.
fi8224,

I'm not sure what version of OmniPage you're using. We've got v15 Pro. When I go into the Batch Manager and create a new job, I can tell it to include subfolders by selecting the folder and then clicking the checkbox right next to it. The checkbox itself isn't labeled "include subfolders", but the text at the top of the screen tells you that this is what the checkbox is for.

Then in the save options we use TEXT - PDF with Image on Text. I believe you can also choose to save the files with their original file names into subfolders. I've never actually tried this part myself, but the option does appear to be there on the screen.

Good luck,

Jim Christmas
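If the built-in subfolder option doesn't do what you need, the directory structure can also be preserved from a script that walks the tree and mirrors each relative path under a separate output root. This is a sketch only: `plan_outputs` and `convert_tree` are hypothetical helper names, and `convert` stands in for whatever OCR step you actually call (it is not a real OmniPage or Acrobat function).

```python
from pathlib import Path
from typing import Callable, Iterator, Tuple


def plan_outputs(source_root: Path, output_root: Path) -> Iterator[Tuple[Path, Path]]:
    """Yield (input_pdf, output_pdf) pairs that mirror the source folder tree."""
    for pdf in sorted(source_root.rglob("*")):
        # Match the extension case-insensitively so both .pdf and .PDF are picked up.
        if pdf.is_file() and pdf.suffix.lower() == ".pdf":
            yield pdf, output_root / pdf.relative_to(source_root)


def convert_tree(source_root: Path, output_root: Path,
                 convert: Callable[[Path, Path], None]) -> int:
    """Run `convert` on every PDF under `source_root`; return the number processed."""
    count = 0
    for src, dst in plan_outputs(source_root, output_root):
        dst.parent.mkdir(parents=True, exist_ok=True)  # recreate the subfolder on the output side
        convert(src, dst)                              # hypothetical OCR step
        count += 1
    return count
```

Keep the output root outside the source root; otherwise a second run would pick up the already converted files as new input.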