groovymonkey asked:

Copying data from a large PDF document into Excel

I have a large Adobe PDF document with pages of records that I want to export into Excel while maintaining the data structure and record integrity. I have had no luck so far copying and pasting the entire document. Each page has the header row followed by rows with multiple columns of data (numbers, dates, etc.).

I exported to RTF, but it did not keep the records separated, and I am desperately hoping not to have to go through thousands of records and enter a hard carriage return...

Thanks
ASKER CERTIFIED SOLUTION
redmondb
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
My post and Brian's post crossed each other, but his comment about OCR raises a question. I'm assuming that your PDF is NOT a pure image, that is, I'm assuming it has structured data (text) in it. If so, then OCR is not needed. However, if it IS a pure image (no structured/textual data), then OCR is needed, and I agree with Brian that ABBYY FineReader is excellent OCR software, and so is Nuance's OmniPage:

http://finereader.abbyy.com/
http://nuance.com/for-individuals/by-product/omnipage/index.htm

Here are links to feature comparison charts:

http://nuance.com/ucmprod/groups/imaging/@web-enus/documents/collateral/nc_016052.pdf
http://finereader.abbyy.com/editions_comparison_chart/

But based on your original question, I'm pretty sure you don't need OCR. Regards, Joe
Joe,

Even if the PDF has the underlying text, it can still be appropriate to use an OCR program. As groovymonkey has found out the hard way, the underlying text is often badly laid out. Lots of OCR programs will give you sufficient control of the layout to correct these problems.

Regards,
Brian.
Hi Brian,
I guess we'll have to agree to disagree. If there's already text, I would not run it through OCR. I think groovymonkey's problem is probably the copying/pasting technique that he's using. My gut tells me that a top-quality PDF to Excel converter will do the job for him without resorting to OCR. Just my humble opinion. Regards, Joe
Joe,

I've had a dig around for a PDF behaving as I described.

Couldn't find one.

I bow to your superior knowledge.

Regards,
Brian.
groovymonkey (ASKER)

Great answers, guys. I am limited at work, so I could not install a program to convert the PDF back to Excel. It was laid out badly (no columns) but was not an image (there were rows of data). I ended up using a scanner (OCR) to scan the printed document back into a PDF that now has the columns and can be copied into Excel with data integrity maintained. Nothing pretty about it, but it worked. So thanks to all for your input.

Groovymonkey
Thanks, groovymonkey.
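
A note for anyone who lands on this thread with the same problem: if the PDF holds real text (not an image) and the pasted text keeps each record on its own line, a short script may be enough to rebuild the columns. The following is only a rough sketch, not what anyone above actually used. It assumes Python is available, that columns in the pasted text are separated by two or more spaces, and that the header row repeated on each page is identical to the first one. The function names `text_to_rows` and `rows_to_csv` are made up for illustration.

```python
import csv
import io
import re

# Assumption: a gap of two or more whitespace characters marks a column break.
# A single space inside a value (e.g. "New York") is left alone.
COLUMN_GAP = re.compile(r"\s{2,}")


def text_to_rows(text):
    """Split pasted PDF text into rows of cells, keeping only the first header row."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between pages
        cells = COLUMN_GAP.split(line)
        if rows and cells == rows[0]:
            continue  # header repeated at the top of a later page
        rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Return the rows as CSV text, which Excel can open directly."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up sample standing in for text copied out of the PDF.
    pasted = (
        "ID   Date        Amount\n"
        "1    2011-01-05  10.50\n"
        "\n"
        "ID   Date        Amount\n"
        "2    2011-02-01  3.00\n"
    )
    print(rows_to_csv(text_to_rows(pasted)), end="")
```

If the columns in your pasted text are separated by only a single space, this will merge them, so check a few records by eye before trusting the result.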