Solved

PDF to Excel converter

Posted on 2014-03-03
Last Modified: 2014-05-24
Hi,

I am looking for a utility (application or code) that accurately converts financial data tables in PDF files to Excel or HTML.

The problem I face with the commonly available converters is that they do not properly align the headers with the financial data in each column.

You can take a look at the PDFs in the section linked below:
http://www.bp.com/en/global/corporate/investors/results-and-reporting/annual-f-oi.html

I require only the data in the financial tables.


Regards
Question by:Hydra01
11 Comments
 
LVL 34

Assisted Solution

by:Dan Craciun
Dan Craciun earned 150 total points
ID: 39900593
I usually get good results with this: http://www.pdftoexcel.org/

But I usually convert tables that were originally made in Excel (or Word, in worst cases).

The PDF from your link was professionally made using InDesign/Quark or something similar. Trying to convert from apples into plastic is bound to give you some errors.

But, as one of my friends likes to say, that is what interns are for: manual labor :)

HTH,
Dan
 
LVL 19

Accepted Solution

by:
*** Hopeleonie *** earned 200 total points
ID: 39900628
 

Author Comment

by:Hydra01
ID: 39903198
Hi Dan/ Hope,

Thanks for replying. As you pointed out, any application might miss a thing or two.

I have now started to wonder whether it is possible to get someone to write custom code to make this work.

As I am only looking at companies' financial releases, the scope is well defined.

Could you point me to someone?
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39903210
Do those financial releases have a consistent look? Most of the ones I have looked at were fairly different, depending on the designer.

The problem is that they are intended to look good, not to transfer data. They have all sorts of flourishes, color scales, etc., all of which will interfere with the conversion.

I'm not saying it can't be done, just that it might be more work than it's worth.

Dan

 

Author Comment

by:Hydra01
ID: 39903266
These releases come out quarter on quarter and, company by company, they keep more or less the same format.

I can give the start and end points of the table I need, let's say the "income statement", and I will do this for every company I extract data for.

I was reading about the PDF coordinate system: in a table, all the figures that fall on the same vertical line can be treated as one column, much as we do when importing text into Excel.

Similarly, all the figures on the same horizontal line will be in the same row.

Do you think this can be coded?

Following this approach would also save the hassle of converting the entire PDF.
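
The row-and-column idea described above can be sketched roughly as follows. This is a minimal sketch, not a tested converter. It assumes some extractor has already produced one box per word with its text, left edge x0, right edge x1 and vertical position top (pdfplumber's extract_words() returns dictionaries with these keys, among others). The function names, the tolerances and the sample figures are all made up for illustration. Because financial figures are usually right-aligned, the sketch clusters on the right edge x1, which also lets header years line up with the numbers beneath them.

```python
import re

# Tokens that look like figures: 1,200  (45)  -3.5  12%  2013
NUMERIC = re.compile(r"^\(?-?[\d,]+(\.\d+)?\)?%?$")


def group_rows(words, y_tol=2.0):
    """Group word boxes whose 'top' values lie within y_tol of the row start."""
    rows = []
    for w in sorted(words, key=lambda w: (w["top"], w["x0"])):
        if rows and abs(w["top"] - rows[-1]["top"]) <= y_tol:
            rows[-1]["words"].append(w)
        else:
            rows.append({"top": w["top"], "words": [w]})
    # Within each row, read left to right.
    return [sorted(r["words"], key=lambda w: w["x0"]) for r in rows]


def column_anchors(words, x_tol=3.0):
    """Cluster the right edges of numeric tokens into column positions."""
    rights = sorted(w["x1"] for w in words if NUMERIC.match(w["text"]))
    clusters = []
    for x in rights:
        if clusters and x - clusters[-1][-1] <= x_tol:
            clusters[-1].append(x)
        else:
            clusters.append([x])
    return [sum(c) / len(c) for c in clusters]


def to_table(words, y_tol=2.0, x_tol=3.0):
    """Return rows of [label, col1, col2, ...] with '' for empty cells."""
    anchors = column_anchors(words, x_tol)
    table = []
    for row in group_rows(words, y_tol):
        label = " ".join(w["text"] for w in row if not NUMERIC.match(w["text"]))
        cells = [""] * len(anchors)
        for w in row:
            if NUMERIC.match(w["text"]):
                # Assign the figure to the nearest column anchor.
                idx = min(range(len(anchors)),
                          key=lambda i: abs(anchors[i] - w["x1"]))
                cells[idx] = w["text"]
        table.append([label] + cells)
    return table


# Made-up word boxes standing in for an extracted table region.
words = [
    {"text": "2013", "x0": 300, "x1": 330, "top": 100},
    {"text": "2012", "x0": 380, "x1": 410, "top": 100},
    {"text": "Revenue", "x0": 50, "x1": 95, "top": 120},
    {"text": "1,200", "x0": 295, "x1": 330, "top": 120.5},
    {"text": "1,100", "x0": 375, "x1": 410, "top": 120.5},
    {"text": "Other", "x0": 50, "x1": 85, "top": 140},
    {"text": "items", "x0": 90, "x1": 120, "top": 140},
    {"text": "(45)", "x0": 385, "x1": 410.5, "top": 140},
]
table = to_table(words)
for row in table:
    print(row)
```

Filtering the word boxes to those between the chosen start and end points (for example, the "income statement" heading and the next heading) before calling to_table would keep the rest of the page out of the clustering. The tolerances would need tuning per document, and multi-line labels or merged header cells are not handled.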
 
LVL 19

Assisted Solution

by:regmigrant
regmigrant earned 150 total points
ID: 39909487
Unfortunately, there's no guarantee that the tables in the PDF contain actual character data. Depending on how the document was created, and on what the original designer added or removed to get the effect they wanted, a PDF could be all image data. Even if only part of it uses that technique to display the information you want, the converter would have to cope with both styles. On that basis it would be much cheaper and more accurate to use one of the commercial offerings, and you would still need to do a side-by-side check to catch problems.

If you have access to OneNote, it has an OK OCR option built in and would be worth a try for a first pass, but like many others it will be easily defeated by any security elements added to the document.
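
One rough way to find out which of the two styles a given page uses is to check how much text an extractor gets out of it. The sketch below assumes you already have one extracted string per page from whatever tool you use (for instance pypdf's page.extract_text()); the function name and the 20-character threshold are arbitrary choices for illustration, not an established rule.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 0-based indices of pages with almost no extractable text."""
    flagged = []
    for i, text in enumerate(page_texts):
        # Ignore whitespace; treat a missing result (None) as empty.
        visible = "".join((text or "").split())
        if len(visible) < min_chars:
            flagged.append(i)
    return flagged


# Made-up extraction results: one text page, three pages with nothing usable.
page_texts = ["Group income statement Revenue 1,200 1,100", "", None, "  \n "]
flagged = pages_needing_ocr(page_texts)
print(flagged)
```

Pages flagged this way are likely image-only and would have to go through OCR, while the rest can be handled from their character data. A page that mixes a text layer with figures rendered as images would slip past this check, so the side-by-side comparison is still needed.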
 

Author Comment

by:Hydra01
ID: 40064850
I've requested that this question be deleted for the following reason:

The question couldn't be answered; it looks like it can't be done.
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 40064851
"It can't be done" is a perfectly valid answer.
 

Author Comment

by:Hydra01
ID: 40073551
Thanks for the links, eenookami.

I will pick "self-answered with help", as all the experts helped me reach this conclusion, and I am going to distribute the points.