Solved

PDF to Excel converter

Posted on 2014-03-03
Last Modified: 2014-05-24
Hi,

I am looking for a utility (application or code) that accurately converts financial data tables in PDF files to Excel or HTML.

The problem I face with the commonly available converters is that they do not properly align the headers with the financial data in the columns.

You can take a look at the PDFs in the section below:
http://www.bp.com/en/global/corporate/investors/results-and-reporting/annual-f-oi.html

I require only the data in the financial tables.


Regards
Question by:Hydra01
11 Comments
 
LVL 35

Assisted Solution

by:Dan Craciun
Dan Craciun earned 150 total points
ID: 39900593
I usually get good results with this: http://www.pdftoexcel.org/

But I usually convert tables that were originally made in Excel (or Word, in worst cases).

That PDF from your link is professionally made using InDesign/Quark or something similar. Trying to convert from apples into plastic is bound to give you some errors.

But, as one of my friends likes to say, that is what interns are for: manual labor :)

HTH,
Dan
 
LVL 19

Accepted Solution

by:
*** Hopeleonie *** earned 200 total points
ID: 39900628
 

Author Comment

by:Hydra01
ID: 39903198
Hi Dan/Hope,

Thanks for replying. As you pointed out, any application might miss a thing or two.

I have now started to wonder whether it is possible to get someone to write custom code for this.

As I am only looking at the financial releases of companies, the scope is well defined.

Could you point me to someone?

 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39903210
Do those financial releases have a consistent look? Most of the ones I looked at were fairly different, depending on the designer.

The problem is that they are intended to look good, not to transfer data. They have all sorts of flourishes, color scales, etc., all of which will interfere with the conversion.

Not saying it can't be done. Just that it might be more work than it's worth.

Dan
 

Author Comment

by:Hydra01
ID: 39903266
These releases come out quarter on quarter and, company by company, they keep more or less the same format.

Now, I can give a start and end point for the table I need, let's say the "income statement", and I will do this for every company I extract data for.

I was reading about the PDF coordinate system. In a table, all figures that fall on the same vertical line can be treated as a column, much as we do when importing text into Excel.

Similarly, all figures on the same horizontal line will be in the same row.

Do you think this can be coded?

Also, following this approach would save the hassle of converting the entire PDF.
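
To make the coordinate idea concrete, here is a minimal Python sketch. It assumes the words of one page have already been pulled out by some PDF text-extraction library, together with their positions; the `Word` record, the tolerance values and the sample data are all hypothetical and only serve the illustration. It also assumes the figures are right-aligned, which is common in financial statements but not guaranteed, so the columns are found by clustering the right edges of the number-like words.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """One extracted word and its position (hypothetical record, units in points)."""
    text: str
    x0: float   # left edge
    x1: float   # right edge
    top: float  # distance from the top of the page


# Matches figures such as 1,200  (45)  -3.5  12%
NUMBER = re.compile(r"^\(?-?[\d,]+(?:\.\d+)?\)?%?$")


def is_number(text):
    return bool(NUMBER.match(text))


def group_rows(words, y_tol=3.0):
    """Put words whose vertical positions are within y_tol into the same row."""
    rows = []
    for w in sorted(words, key=lambda w: (w.top, w.x0)):
        if rows and abs(w.top - rows[-1][0].top) <= y_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    return [sorted(r, key=lambda w: w.x0) for r in rows]


def column_anchors(words, x_tol=5.0):
    """Cluster the right edges of number-like words; one anchor per column."""
    xs = sorted(w.x1 for w in words if is_number(w.text))
    clusters = []
    for x in xs:
        if clusters and x - clusters[-1][-1] <= x_tol:
            clusters[-1].append(x)
        else:
            clusters.append([x])
    return [sum(c) / len(c) for c in clusters]


def to_table(words):
    """Return rows of [label, cell, cell, ...] with figures placed by column."""
    anchors = column_anchors(words)
    table = []
    for row in group_rows(words):
        label = " ".join(w.text for w in row if not is_number(w.text))
        cells = [""] * len(anchors)
        for w in row:
            if is_number(w.text):
                # Assign the figure to the nearest column anchor.
                idx = min(range(len(anchors)), key=lambda i: abs(anchors[i] - w.x1))
                cells[idx] = w.text
        table.append([label] + cells)
    return table


# Made-up sample: a header row and two data rows.
sample = [
    Word("2013", 280, 300, 50), Word("2012", 360, 380, 50),
    Word("Sales", 40, 70, 70), Word("1,200", 272, 300, 70.5), Word("1,100", 352, 380, 70),
    Word("Net", 40, 58, 90), Word("income", 61, 95, 90), Word("(45)", 358, 380, 90),
]
for line in to_table(sample):
    print(line)
```

The start and end points of the table could be handled by keeping only the words that lie between the two marker headings before calling `to_table`. Note that this only works when the PDF holds real character data; scanned or image-only pages would need OCR first.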
 
LVL 19

Assisted Solution

by:regmigrant
regmigrant earned 150 total points
ID: 39909487
Unfortunately, there's no guarantee that the tables in the PDF contain actual character data. Depending on how the document was created, and what the original designer added or subtracted to get the effect they wanted, a PDF could be all image data. Even if only part of it uses that technique to display the information you want, the converter would have to cope with both styles. On that basis it would be much cheaper and more accurate to use one of the commercial offerings, and you would still need to do a side-by-side check to catch problems.

If you have access to OneNote, it has an OK OCR option built in and would be worth a try for a first pass, but like many others it will be easily defeated by any security elements added to the document.
 

Author Comment

by:Hydra01
ID: 40064850
I've requested that this question be deleted for the following reason:

The question couldn't be answered; it looks like it can't be done.
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 40064851
"It can't be done" is a perfectly valid answer.
 

Author Comment

by:Hydra01
ID: 40073551
Thanks for the links, eenookami.

I will pick "self answered with help", as all the experts helped me reach this conclusion, and I am going to distribute the points.
Question has a verified solution.