PDF Extraction to Excel

Posted on 2011-03-23
Medium Priority
Last Modified: 2012-06-21
I have a multi-page scanned PDF document that contains several one-page invoices. I need a solution that OCRs the document so the data can be extracted, and then lets me select specific fields to export to a spreadsheet. The same fields are repeated on each page.

I've looked at a couple of solutions, but they make you copy each field from every page to get the data you want, and that takes too much time.
Question by:curtconner
LVL 33

Accepted Solution

jppinto earned 400 total points
ID: 35200184
Did you try PDF2XL? Take a look at my review of this program on my blog here:


Author Comment

ID: 35200581
jppinto:  The OCR piece didn't work very well with the document that I'm scanning.  Loved the features, but the OCR failed.

Assisted Solution

InfoStranger earned 400 total points
ID: 35202192
Do you have Adobe Acrobat?

My instructions below are for Acrobat 8.0. To convert the scanned image to text using OCR:
1) Open the PDF in Acrobat
2) Select the Document menu
3) Select OCR Text Recognition
4) Select Recognize Text Using OCR...
5) Click OK

You may want to run this first and then try your extraction again. The OCR may not work as well if the document is faded or skewed.
LVL 26

Assisted Solution

redmondb earned 400 total points
ID: 35203938

I've frequently used ABBYY FineReader for tasks such as this. (My version is V8, the current is V10 -

Initially, you create a template specifying the fields that you want to extract from the invoice (a few minutes' work for a typical invoice layout) and set up a job to open, read and export the fields to Excel (another minute's work).

From then on, simply run the job, which opens the PDF, OCRs the required fields and exports them to Excel.
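
For anyone who would rather script the same template idea than buy a tool, here is a minimal Python sketch. It assumes the PDF has already been OCRed and that you have the recognized text of each page as a string; it does not do the OCR itself, and it is not a description of how FineReader works internally. The field names and regular expressions in `TEMPLATE` are hypothetical and would need adjusting to the real invoice layout.

```python
import csv
import io
import re

# Hypothetical field template: field name -> regex with one capture group.
# These patterns are assumptions for illustration; real invoices need their own.
TEMPLATE = {
    "invoice_no": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\S+)", re.IGNORECASE),
    "date": re.compile(r"Date\s*:?\s*(\d{1,2}/\d{1,2}/\d{2,4})", re.IGNORECASE),
    "total": re.compile(r"Total\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_fields(page_text):
    """Apply every pattern in TEMPLATE to one page of OCR text."""
    row = {}
    for name, pattern in TEMPLATE.items():
        match = pattern.search(page_text)
        # Leave the cell blank when OCR noise prevents a match.
        row[name] = match.group(1) if match else ""
    return row


def pages_to_csv(pages):
    """Return CSV text with a header row and one row per page (invoice)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(TEMPLATE), lineterminator="\n")
    writer.writeheader()
    for text in pages:
        writer.writerow(extract_fields(text))
    return buffer.getvalue()
```

Calling `pages_to_csv` on a list of page strings returns CSV text with one row per invoice, which Excel can open once it is saved to a `.csv` file. Fields that fail to match are left blank so they are easy to spot and fix by hand.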


Assisted Solution

jyk_aus earned 400 total points
ID: 35205871

Have you considered purchasing the full version of Adobe Acrobat (as opposed to the free Reader)? Among other things, it can convert PDF to quite a few formats, Excel included.

See here:

Best regards
LVL 21

Assisted Solution

viki2000 earned 400 total points
ID: 35419680
Try this
It is programmable with macros, has customizable areas...
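
I can't say how that product defines its areas, but the general idea of a "customizable area" can be sketched in a few lines of Python. This assumes your OCR engine reports each recognized word together with the page coordinates of its top-left corner; the `Word` and `Zone` types and the `words_in_zone` helper are hypothetical names made up for this illustration.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # assumed: left edge of the word on the page
    y: float  # assumed: top edge of the word on the page


class Zone(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def words_in_zone(words, zone):
    """Join the words whose top-left corner falls inside the zone."""
    inside = [
        w for w in words
        if zone.left <= w.x <= zone.right and zone.top <= w.y <= zone.bottom
    ]
    # Read top-to-bottom, then left-to-right.
    inside.sort(key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in inside)
```

Since the fields sit in the same place on every invoice page, one set of zones defined once can be reused for all pages.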
LVL 26

Expert Comment

ID: 35857619
Thanks, curtconner.

Hope it worked out OK in the end.


Question has a verified solution.
