Batch convert PDF to searchable format

I have many files that were scanned to PDF format. I have a copy of Adobe Acrobat 7.0 that came with my scanner. I can use the menu option "Recognize Text Using OCR" to make my files searchable within Acrobat Reader and with Google Desktop.

I have hundreds of files. Is there any way to automate or script the conversion from image PDF to searchable PDF so that I can process an entire folder at once?

2 Solutions
Choose Advanced>Batch Processing
Click the New Sequence button.
Give the sequence a name.
Click Select Commands
Select all the commands that you want the batch to run
Save it, then run the batch sequence as follows:

Place all the files you wish to process in a single folder on your hard drive.
Choose Advanced>Batch Processing
Select the sequence to run (the one you just saved)
Click OK
Select the folder to process
Select the Output Folder


We're using OmniPage Pro for this function.  We originally tried a batch convert feature from Adobe.  It was expensive and required regular "dongles" to be purchased.  

OmniPage lets us do all our files without limit (We've done over 40,000 documents to date - long lease documents - perhaps 60 pages each).

--Jim Christmas
Forced accept.

Community Support Moderator

Mr Xmas
I've recently been charged with converting thousands of PDF files to searchable PDFs on a client network. Many of these documents are larger than 20 MB. We have purchased OmniPage Pro and I am using the Batch Manager to process the files, but it is taking a crazy amount of time. I suspect I am doing something wrong. I had a single batch of 8 files, about 23 MB each, running over the entire weekend; not even one file is complete, and it looks like it was skipping some. This morning I picked one of the files to process on its own, and it appears to be working, but at about 1 page every 15-20 seconds for a document of a few hundred pages. At this rate I will die an old man before the files get completed. What would you recommend?

The OCR is what takes a long time, especially if your files are scanned at a fairly high resolution.
One thing to try is copying the files down to the local machine before processing, and having OmniPage save the files to the local machine.
For our process I found this makes a huge difference. In many cases, someone in our department will scan a lease (a 60-page document on average) and the file will be converted a minute or two after they return to their desk.
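
For anyone who wants to script the "copy it local first" step, here is a minimal sketch in Python. It is not part of OmniPage; the function name `stage_pdfs` and both folder arguments are hypothetical. It assumes the local staging folder sits outside the network folder, and it keeps each file's relative path so the subfolder layout survives the copy.

```python
import shutil
from pathlib import Path
from typing import List


def stage_pdfs(source_root: Path, local_root: Path) -> List[Path]:
    """Copy every PDF under source_root into local_root, keeping relative paths.

    Assumption: local_root is NOT inside source_root, otherwise the copies
    themselves would be picked up by the directory walk.
    """
    copied = []
    for src in sorted(source_root.rglob("*")):
        # Skip folders and anything that is not a PDF (extension check only).
        if not src.is_file() or src.suffix.lower() != ".pdf":
            continue
        dest = local_root / src.relative_to(source_root)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves timestamps
        copied.append(dest)
    return copied


# Example call (both paths are placeholders, adjust to your own setup):
# stage_pdfs(Path(r"\\server\share\scans"), Path(r"C:\ocr_staging\in"))
```

Point the OCR batch job at the local staging folder afterwards, and copy the results back to the network once the job finishes.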

Hope it helps,

Jim Christmas
Is there a way to create a batch so that the original directory structure is kept? We don't want to lose that, and it's incredibly cumbersome to do the batching one little directory at a time. The client is a law firm, and we are trying to make all of the old PDFs searchable; there are hundreds of directories within directories.

I'm not sure what version of OmniPage you're using. We've got v 15 Pro. When I go into the Batch Manager and create a new job, I can tell the system to include subfolders by picking a folder, then ticking the checkbox right next to it. The checkbox itself doesn't say "include subfolders", but the text at the top of the screen tells you that the checkbox is for including subfolders.

Then in the save options we use TEXT - PDF with Image on Text.  And I believe you can choose to save the files with the original file names into subfolders.  I've never actually tried this part myself, but the option does appear to be there on the screen.
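
Since an earlier post mentioned the batch apparently skipping files, a small check after the run may help. The Python sketch below is independent of OmniPage, and the name `missing_outputs` is hypothetical. It assumes the output was saved with the original file names into matching subfolders, and it lists every input PDF that has no counterpart in the output tree.

```python
from pathlib import Path
from typing import List


def missing_outputs(input_root: Path, output_root: Path) -> List[Path]:
    """Return relative paths of input PDFs with no same-named output file.

    Assumption: the batch job mirrors the input subfolders under output_root
    and keeps the original file names.
    """
    missing = []
    for src in sorted(input_root.rglob("*")):
        if src.is_file() and src.suffix.lower() == ".pdf":
            rel = src.relative_to(input_root)
            if not (output_root / rel).is_file():
                missing.append(rel)
    return missing


# Example call (placeholder paths):
# for rel in missing_outputs(Path(r"C:\ocr_staging\in"), Path(r"C:\ocr_staging\out")):
#     print("not converted:", rel)
```

This only checks that an output file exists; it does not confirm that the file actually contains a text layer.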

Good luck,

Jim Christmas
Question has a verified solution.
