
PDF to Excel converter

Posted on 2014-03-03
Last Modified: 2014-05-24
Hi,

I am looking for a utility (application or code) to accurately convert financial data tables in PDF to Excel or HTML.

The problem I face with commonly available converters is that they do not properly align the column headers with the financial data beneath them.

You can see an example PDF in the section below:
http://www.bp.com/en/global/corporate/investors/results-and-reporting/annual-f-oi.html

I only need the data in the financial tables.


Regards
Question by:Hydra01
Assisted Solution

by:Dan Craciun
Dan Craciun earned 150 total points
ID: 39900593
I usually get good results with this: http://www.pdftoexcel.org/

But I usually convert tables that were originally made in Excel (or Word, in worst cases).

The PDF from your link was professionally produced with InDesign/Quark or something similar. Trying to convert apples into plastic is bound to give you some errors.

But, as one of my friends likes to say, that is what interns are for: manual labor :)

HTH,
Dan
Accepted Solution

by: Hopeleonie
Hopeleonie earned 200 total points
ID: 39900628
 

Author Comment

by:Hydra01
ID: 39903198
Hi Dan/Hope,

Thanks for replying. As you pointed out, every application might miss a thing or two.

I have now started to wonder whether it is possible to get someone to write custom code for this.

As I am only looking at companies' financial releases, the scope is well defined.

Could you point me to someone?

Expert Comment

by:Dan Craciun
ID: 39903210
Do those financial releases have a consistent layout? Most of the ones I have looked at were fairly different, depending on the designer.

The problem is that they are designed to look good, not to transfer data. They have all sorts of flourishes, color scales, etc., all of which will interfere with the conversion.

Not saying it can't be done. Just that it might be more work than it's worth.

Dan
 

Author Comment

by:Hydra01
ID: 39903266
These releases come out quarterly, and each company keeps more or less the same format from quarter to quarter.

I can specify the start and end points of the table I need, say "income statement", and I will do this for every company I extract data for.
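A minimal sketch of the start/end-marker idea, assuming the page text has already been pulled out of the PDF as a list of lines by some extractor. The function name `slice_section` and the sample lines are hypothetical; matching is a simple case-insensitive substring test.

```python
def slice_section(lines, start_marker, end_marker):
    """Return the lines after the first line containing start_marker,
    up to (but not including) the next line containing end_marker.
    Matching is case-insensitive; returns [] if start_marker never appears."""
    start_key, end_key = start_marker.lower(), end_marker.lower()
    inside = False
    section = []
    for line in lines:
        lowered = line.lower()
        if not inside:
            # Skip everything until the start heading is seen.
            inside = start_key in lowered
            continue
        if end_key in lowered:
            break
        section.append(line)
    return section


if __name__ == "__main__":
    # Illustrative lines only, not taken from any real report.
    page = ["Cover", "Income statement", "Revenue 100 90",
            "Costs (60) (55)", "Balance sheet", "Assets 500"]
    print(slice_section(page, "income statement", "balance sheet"))
```

In a real annual report the first match may be a table-of-contents entry, so in practice one might start the search only after the contents pages.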

I was reading about the PDF coordinate system. In a table, all figures that fall on the same vertical line could be treated as one column, just as we do when importing text into Excel.

Similarly, all figures on the same horizontal line would be in the same row.
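The coordinate approach described above can be sketched roughly as follows, under stated assumptions: each word arrives as a hypothetical `(text, x0, x1, top)` tuple in PDF points (with the pdfplumber library, for example, `page.extract_words()` returns dictionaries carrying `text`, `x0`, `x1` and `top` keys that could be mapped to this shape). Rows are grouped by `top` within a tolerance; columns are found by clustering the right edges of numeric words, on the assumption that figures and year headers are right-aligned. The tolerances `y_tol` and `x_tol` are guesses that would need tuning per report.

```python
import csv
import io
import re

# Figures such as 1,234  (567)  -12.5  2013; parentheses usually mark negatives.
NUMBER = re.compile(r"^\(?-?\d[\d,]*(\.\d+)?\)?$")


def cluster(values, tol):
    """Group 1-D coordinates whose sorted neighbours lie within tol and
    return the mean of each group."""
    centres, group = [], []
    for v in sorted(values):
        if group and v - group[-1] > tol:
            centres.append(sum(group) / len(group))
            group = []
        group.append(v)
    if group:
        centres.append(sum(group) / len(group))
    return centres


def words_to_table(words, y_tol=3.0, x_tol=5.0):
    """Turn positioned words into rows of [label, cell1, cell2, ...].

    words: iterable of (text, x0, x1, top) tuples (hypothetical format)."""
    words = list(words)
    # Rows: a word joins the current row if its top is within y_tol
    # of that row's first word.
    rows = []
    for w in sorted(words, key=lambda w: (w[3], w[1])):
        if rows and abs(w[3] - rows[-1][0][3]) <= y_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    # Columns: cluster the right edges (x1) of all numeric words.
    anchors = cluster([w[2] for w in words if NUMBER.match(w[0])], x_tol)
    table = []
    for row in rows:
        label, cells = [], [""] * len(anchors)
        for text, x0, x1, top in sorted(row, key=lambda w: w[1]):
            if NUMBER.match(text):
                # Place the figure in the column whose anchor is nearest its right edge.
                col = min(range(len(anchors)), key=lambda i: abs(anchors[i] - x1))
                cells[col] = text
            else:
                label.append(text)
        table.append([" ".join(label)] + cells)
    return table


def to_csv(table):
    """Serialise the table as CSV text, which Excel can open directly."""
    buf = io.StringIO()
    csv.writer(buf).writerows(table)
    return buf.getvalue()


if __name__ == "__main__":
    # Illustrative coordinates and figures only.
    sample = [("Revenue", 50, 90, 100), ("1,200", 300, 330, 100.5),
              ("1,100", 380, 410, 100), ("2013", 310, 330, 80),
              ("2012", 390, 410, 80.8), ("Costs", 50, 80, 115),
              ("(700)", 298, 331, 115), ("(650)", 378, 411, 115.4)]
    print(to_csv(words_to_table(sample)))
```

Because the year headers are matched to the same right-edge anchors as the figures, they line up with their columns. Centred headers, dashes used for nil values, or note numbers inside labels would need extra rules.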

Do you think this can be coded?

Following this approach would also save the hassle of converting the entire PDF.

Assisted Solution

by:regmigrant
regmigrant earned 150 total points
ID: 39909487
Unfortunately, there is no guarantee that the tables in the PDF contain actual character data. Depending on how the document was created and what the original designer added or removed to get the desired effect, a PDF could be entirely image data, and if even part of the information you want is displayed that way, the converter would have to cope with both styles. On that basis, it would be much cheaper and more accurate to use one of the commercial offerings, and you will still need to do a side-by-side check to catch problems.

If you have access to OneNote, it has a reasonable built-in OCR option that would be worth a try for a first pass, but like many others it will be easily defeated by any security elements added to the document.
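To decide which pages can be parsed as text and which need OCR or manual entry, a rough heuristic is to flag pages whose extracted text is suspiciously sparse. This sketch assumes the per-page text has already been extracted by some tool; the function name `pages_needing_review` and the 50-character threshold are hypothetical choices, not a tested rule.

```python
def pages_needing_review(page_texts, min_chars=50):
    """Return 1-based page numbers whose extracted text has fewer than
    min_chars non-whitespace characters.

    A page that visibly shows a table but yields almost no characters was
    probably drawn as an image or as vector outlines, so it needs OCR or
    a manual check."""
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        if sum(1 for ch in text if not ch.isspace()) < min_chars:
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    pages = ["Revenue 1,200 1,100 " * 5, "", "   \n  "]
    print(pages_needing_review(pages))
```

This only catches pages with little or no text layer; it cannot detect a page where the text layer exists but is scrambled, which is why a side-by-side check remains necessary.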

Author Comment

by:Hydra01
ID: 40064850
I've requested that this question be deleted for the following reason:

The question couldn't be answered; it looks like it can't be done.

Expert Comment

by:Dan Craciun
ID: 40064851
"It can't be done" is a perfectly valid answer.

Author Comment

by:Hydra01
ID: 40073551
Thanks for the links, eenookami.

I will pick "self answered with help", as all the experts helped me reach this conclusion, and I am going to distribute the points.