Extract the text data from 200 PDF files and insert it into an Excel spreadsheet

I have over 300 one-page PDF reports containing information on vehicles. They all share the same format; only the vehicle information differs. None of them are written on or stamped.

I have Win7, Adobe Acrobat 9.0 and Office 2013. How can I extract the data fields from each page and insert them into an Excel spreadsheet or a CSV file for Excel?

I'm thinking some Visual Basic code may help, but I'm not a VB programmer.

Thanks in advance!
Tony Giangreco Asked:
Check out the tools under the Forms menu, such as Manage Form Data, which has a few different export options. Without seeing the document you are working with, it's hard to recommend anything more specific.

Try searching Google for "batch convert pdf to text". If the text in your PDFs is real text rather than a scanned image, this will be much easier, because you can skip OCR. I think the Pro version of Acrobat may be necessary for batch processing. Below is one link to get you started.


Once the files have been converted, you have many options for populating Excel. Conversion is the first step; after that, the hope is that the text placement in each file is consistent enough to parse.
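
If the text placement does turn out to be consistent, pulling fields by position can be sketched roughly as below in Python. This is only an illustration under assumptions: the field names and the line/column positions in `LAYOUT` are made up, and you would measure the real ones from one of your exported text files.

```python
from typing import Dict, Tuple

# Hypothetical layout: field name -> (line index, start column, end column).
# These positions are placeholders, not taken from the actual reports.
LAYOUT: Dict[str, Tuple[int, int, int]] = {
    "vin": (0, 5, 22),
    "make": (1, 6, 20),
    "year": (2, 6, 10),
}


def extract_by_position(text: str, layout: Dict[str, Tuple[int, int, int]] = LAYOUT) -> Dict[str, str]:
    """Slice each field out of the text at its fixed line and column range."""
    lines = text.splitlines()
    row = {}
    for name, (line_no, start, end) in layout.items():
        # A missing line yields an empty value instead of raising an error.
        line = lines[line_no] if line_no < len(lines) else ""
        row[name] = line[start:end].strip()
    return row
```

Calling `extract_by_position` on the text of one converted file returns a dictionary with one entry per field, which maps naturally onto one spreadsheet row per PDF. Position-based slicing is fragile if the export shifts columns between files, so spot-check a few results.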

Hope this helps.
Karl Heinz Kremer Commented:
With Adobe Acrobat X and later, you can export your PDF as MS Excel file.

Acrobat's export functionality has gotten quite a bit better since Acrobat 9, but that means you would have to upgrade to a newer version to get the spreadsheet option.

As was already mentioned, with Acrobat Pro, you can use batch processing (or Actions, as this feature is called in Acrobat X and later) to process multiple documents at the same time. So, with X or XI you could load your 300 files and have them exported to MS Excel format with just a few mouse clicks.

If you do not want to (or cannot) upgrade, you can download the 30-day free evaluation version of Acrobat XI Pro and install it on a computer that does not already have a licensed version of Acrobat installed (otherwise you will have to remove both versions and install 9 from scratch again). Process your 200 documents, then remove the evaluation version again.

You can also subscribe to Adobe's online PDF export service ExportPDF (http://exportpdf.com) to convert your files. This would however be a manual process, and you would have to upload your 300 documents manually (and download the results).
Tony Giangreco Author Commented:
I'm not looking to export a table from the PDF file, just specific areas of text that appear in a specific location on each page. Since the page layout is static, I'm assuming I could map each location in Acrobat or another app and instruct it to put the data in sequential columns of the spreadsheet, page by page. When it's done, I'd expect one row per PDF file.

Is this the functionality Acrobat 11 provides? Would I need a specific version?

Depending on the format of the PDFs, I believe that this can be done with a macro. (For example, using Acrobat 8 I have a macro that pulls text from PDFs.)

Please post a sample PDF.

Using Acrobat 9 Standard
Click on File... Export... Multiple Files

Select files you want to export... click ok
Then select plain text.

This will get your PDF files into text files (assuming they are not image-only PDFs).

Now comes the hard part. It requires some knowledge of creating .csv files. If you have that ability, post back and I will help. I know of no tool or utility that will work with native pdf files to accomplish what you want.
Let me append my last comment... there may be a tool or method, but I am not aware of one to do exactly what you want.
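
For the CSV step, a minimal Python sketch might look like the following. It assumes each exported text file contains lines of the form `Label: value`; the labels in `FIELDS` are hypothetical and would need to match what is actually printed on the reports. It also assumes the text files can be read as UTF-8, which may not hold for every Acrobat export.

```python
import csv
import re
from pathlib import Path

# Hypothetical field labels; replace with the labels printed on the reports.
FIELDS = ["VIN", "Make", "Model", "Year"]


def parse_report(text):
    """Return {label: value} for each 'Label: value' line found in the text."""
    row = {}
    for label in FIELDS:
        pattern = rf"^[ \t]*{re.escape(label)}[ \t]*:[ \t]*(.+?)[ \t]*$"
        match = re.search(pattern, text, re.MULTILINE)
        # A label that is missing or has no value becomes an empty cell.
        row[label] = match.group(1) if match else ""
    return row


def build_csv(text_dir, csv_path):
    """Write one CSV row per .txt file in text_dir; return the row count."""
    files = sorted(Path(text_dir).glob("*.txt"))
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["File"] + FIELDS)
        writer.writeheader()
        for path in files:
            # Assumption: UTF-8 text; undecodable bytes are replaced.
            row = parse_report(path.read_text(encoding="utf-8", errors="replace"))
            row["File"] = path.name
            writer.writerow(row)
    return len(files)
```

Running `build_csv("exported_text", "vehicles.csv")` (both names are placeholders) would produce a file that Excel opens directly, with one row per report and a `File` column for tracing each row back to its source.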
Tony Giangreco Author Commented:
I've been on another project and have not been able to test the suggested solution. I should be able to try it out by Wednesday. I'll let you know what happens.

Sorry for the delay in response.
Tom Kincheloe Commented:
There is a tool/software I found that can parse out specific areas of the PDFs pretty quick, with a free-form selection of the area you want to extract the data from. Once you set the "rule" you just upload all the files you want to extract the data from, or if they are inbound emails delivering the attachments, you can just forward them to the software to get the same results. The solution is a pdf parser called docparser.