Tony Giangreco asked:

Extract the text data from 200 PDF files and insert it into an Excel spreadsheet

I have over 300 one-page PDF reports containing information on vehicles. They all share the same format and differ only in the vehicle information. None of them have been written on or stamped.

I have Win7, Adobe Acrobat 9.0 and Office 2013. How can I extract the data fields from each page and insert them into an Excel spreadsheet or a CSV file for Excel?

I'm thinking some Visual Basic code may help, but I'm not a VB programmer.

Thanks in advance!
ASKER CERTIFIED SOLUTION
duttcom
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Tony Giangreco (ASKER)

I'm not looking to export a table from the PDF file, just specific areas of text that appear in a fixed location on each page. Since the page layout is static, I'm assuming I could map each location in Acrobat or another app and instruct it to put the data in sequential columns of the spreadsheet, page by page. When it's done, I'd expect one row per PDF file.

Is this the functionality Acrobat 11 provides? Would I need a specific version?
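
For anyone following the thread, here is a minimal Python sketch of the "map each location to a column, one row per PDF" idea described above. It is not Acrobat's method, just an illustration under stated assumptions: each page has already been turned into a list of word boxes `(x0, y0, x1, y1, text)` in PDF points, and the `REGIONS` coordinates, field names and sample values are all made up. One way to obtain such word boxes would be PyMuPDF's `page.get_text("words")`, keeping the first five items of each tuple.

```python
import csv
import io

# Hypothetical field regions in PDF points: name -> (x0, y0, x1, y1).
# Real coordinates would have to be measured once on a sample report.
REGIONS = {
    "vin": (100, 50, 300, 70),
    "make": (100, 80, 300, 100),
    "model": (100, 110, 300, 130),
}


def text_in_region(words, region):
    """Join the words whose centre point lies inside the region."""
    rx0, ry0, rx1, ry1 = region
    hits = []
    for x0, y0, x1, y1, text in words:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if rx0 <= cx <= rx1 and ry0 <= cy <= ry1:
            # Crude reading order: top to bottom, then left to right.
            hits.append((round(y0), x0, text))
    hits.sort()
    return " ".join(text for _, _, text in hits)


def page_to_row(words, regions=REGIONS):
    """Build one spreadsheet row (one value per region) for one page."""
    return [text_in_region(words, region) for region in regions.values()]


def rows_to_csv(rows, regions=REGIONS):
    """Return CSV text with a header row followed by one row per PDF."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(regions.keys())
    writer.writerows(rows)
    return buf.getvalue()


# Made-up word boxes standing in for one extracted page.
sample_page = [
    (105, 52, 220, 66, "1HGCM82633A004352"),
    (105, 82, 150, 96, "Honda"),
    (105, 112, 160, 126, "Accord"),
    (165, 112, 185, 126, "EX"),
    (400, 300, 450, 314, "ignored"),
]

print(rows_to_csv([page_to_row(sample_page)]))
```

Looping over all the files and calling `page_to_row` once per PDF would give the one-row-per-file CSV that Excel can open directly. Because the layout is static, the regions only need to be defined once.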
SOLUTION
redmondb
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
PJL2

Let me amend my last comment: there may be a tool or method, but I am not aware of one that does exactly what you want.
I've been on another project and have not been able to test the suggested solution. I should be able to try it out by Wednesday. I'll let you know what happens.

Sorry for the delay in response.
Thanks
I found a tool that can parse specific areas of the PDFs pretty quickly, using a free-form selection of the area you want to extract the data from. Once you set the "rule", you just upload all the files you want to extract data from. If the files arrive as email attachments, you can forward the emails to the software and get the same results. The tool is a PDF parser called Docparser.