PDF to Excel converter

Posted on 2014-03-03
Last Modified: 2014-05-24
Hi,

I am looking for a utility (application or code) to accurately convert financial data tables in PDF to Excel or HTML.

The problem I face with commonly available converters is that they do not properly align the headers with the financial data in each column.

You can take a look at the PDFs in the section below:
http://www.bp.com/en/global/corporate/investors/results-and-reporting/annual-f-oi.html

I only need the data in the financial tables.


Regards
Question by: Hydra01

Assisted Solution

by: Dan Craciun
Dan Craciun earned 150 total points
ID: 39900593
I usually get good results with this: http://www.pdftoexcel.org/

But I usually convert tables that were originally made in Excel (or Word, in worst cases).

The PDF from your link was professionally made with InDesign, Quark or something similar. Trying to convert from apples into plastic is bound to give you some errors.

But, as one of my friends likes to say, that is what interns are for: manual labor :)

HTH,
Dan

Accepted Solution

by: *** Hopeleonie ***
*** Hopeleonie *** earned 200 total points
ID: 39900628

Author Comment

by: Hydra01
ID: 39903198
Hi Dan / Hope,

Thanks for replying. As you pointed out, any application might miss a thing or two.

I have now started to wonder: is it possible to get someone to write custom code to do this?

As I am only looking at companies' financial releases, the scope is well defined.

Can you point me to someone?

Expert Comment

by: Dan Craciun
ID: 39903210
Do those financial releases have a consistent look? Most of the ones I looked at were fairly different, depending on the designer.

The problem is that those documents are intended to look good, not to transfer data. They have all sorts of flourishes, color scales, etc., all of which will interfere with the conversion.

I'm not saying it can't be done, just that it might be more work than it's worth.

Dan

Author Comment

by: Hydra01
ID: 39903266
These releases come out quarter on quarter, and each company keeps more or less the same format from one release to the next.

I can give a start and end point for the table I need, let's say "income statement", and will do this for every company I extract data for.
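As a rough sketch of the "start and end point" idea, assuming the page text has already been extracted line by line by some PDF library, the hypothetical helper `slice_section` below keeps only the lines from the first one containing a start marker up to the next one containing an end marker. The marker strings and the sample text are placeholders, not taken from the BP reports; the real headings in each company's release would need to be checked.

```python
from typing import List


def slice_section(lines: List[str], start_marker: str, end_marker: str) -> List[str]:
    """Return the lines from the first line containing start_marker up to,
    but not including, the next line containing end_marker.

    Matching is case-insensitive. Returns an empty list if start_marker
    never appears; runs to the end of the input if end_marker never appears.
    """
    start = start_marker.lower()
    end = end_marker.lower()
    section: List[str] = []
    inside = False
    for line in lines:
        lowered = line.lower()
        if not inside:
            # Skip everything before the start heading.
            if start in lowered:
                inside = True
                section.append(line)
            continue
        # Stop at the first heading that marks the end of the table.
        if end in lowered:
            break
        section.append(line)
    return section


# Hypothetical extracted text, for illustration only.
page_text = """Highlights
Group income statement
Sales 1,000 900
Profit for the year 100 90
Group balance sheet
Total assets 5,000 4,800"""

print(slice_section(page_text.splitlines(), "income statement", "balance sheet"))
# ['Group income statement', 'Sales 1,000 900', 'Profit for the year 100 90']
```

Because this works on plain text lines it discards the x/y positions, so it only narrows down the region; splitting that region into columns still needs the coordinates.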

I was reading about the PDF coordinate system: in a table, all figures that fall on the same vertical line can be separated into columns, just as we do when we import text into Excel.

Similarly, all figures on the same horizontal line will be in the same row.

Do you think this can be coded?

Also, this approach would save the hassle of converting the entire PDF.
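The coordinate idea can be sketched roughly as follows. This is a minimal illustration, not a tested converter: it assumes each word has already been extracted with its bounding box (for example, the pdfplumber library's `page.extract_words()` returns dictionaries with `x0`, `x1`, `top` and `text` keys that could be mapped onto the `Word` tuple here), and the tolerances and the `label_right` boundary are hypothetical values that would need tuning for each company's layout. Because financial figures are usually right-aligned, it groups columns by the right edge (`x1`) of each figure rather than the left edge, which is also what lets year headers line up over their columns.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    x0: float   # left edge of the word
    x1: float   # right edge of the word
    top: float  # distance from the top of the page
    text: str


def group_rows(words: List[Word], y_tol: float) -> List[List[Word]]:
    """Put words whose top coordinates are within y_tol of each other in one row."""
    rows: List[List[Word]] = []
    for w in sorted(words, key=lambda w: (w.top, w.x0)):
        if rows and abs(w.top - rows[-1][0].top) <= y_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    return [sorted(row, key=lambda w: w.x0) for row in rows]


def cluster(values: List[float], tol: float) -> List[float]:
    """Merge sorted values lying within tol of their neighbour; return cluster means."""
    groups: List[List[float]] = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return [sum(g) / len(g) for g in groups]


def to_table(words: List[Word], label_right: float,
             y_tol: float = 2.0, x_tol: float = 5.0) -> List[List[str]]:
    """Turn positioned words into rows of [label, col1, col2, ...].

    Words ending at or left of label_right are treated as the row label;
    every other word goes to the column whose right edge is nearest.
    """
    rows = group_rows(words, y_tol)
    anchors = cluster([w.x1 for row in rows for w in row if w.x1 > label_right], x_tol)
    table: List[List[str]] = []
    for row in rows:
        cells = [""] * (1 + len(anchors))
        for w in row:
            if w.x1 <= label_right:
                # Multi-word labels such as "Profit for the year" are joined back up.
                cells[0] = (cells[0] + " " + w.text).strip()
            else:
                col = min(range(len(anchors)), key=lambda k: abs(anchors[k] - w.x1))
                # Append rather than overwrite, in case two fragments land in one column.
                cells[1 + col] = (cells[1 + col] + " " + w.text).strip()
        table.append(cells)
    return table


# Hypothetical positions for a tiny two-year table, for illustration only.
words = [
    Word(300, 330, 100, "2013"), Word(380, 410, 100, "2012"),
    Word(50, 80, 120, "Sales"), Word(295, 330, 120.5, "1,000"), Word(390, 410, 121, "900"),
    Word(50, 75, 140, "Profit"), Word(79, 92, 140, "for"), Word(96, 110, 140, "the"),
    Word(114, 135, 140, "year"), Word(310, 330, 140, "100"), Word(395, 409, 140, "90"),
]

for row in to_table(words, label_right=200):
    print(row)
```

To mirror the "import text into Excel" step, the resulting rows could be written out with Python's standard `csv` module (`csv.writer(f).writerows(table)`) and opened in Excel. This only helps when the table is real text with a stable layout; an image-only table would need OCR first.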

Assisted Solution

by: regmigrant
regmigrant earned 150 total points
ID: 39909487
Unfortunately, there's no guarantee that the tables in the PDF contain actual character data. Depending on how the document was created, and what the original designer added or removed to get the effect they wanted, a PDF could be all image data. Even if only part of the information you want is displayed that way, the converter would have to cope with both styles. On that basis, it would be much cheaper and more accurate to use one of the commercial offerings, and you would still need to do a side-by-side check to catch problems.

If you have access to OneNote, it has an OK OCR option built in and would be worth a try as a first pass, but like many others it will easily be defeated by any security elements added to the document.
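Part of that side-by-side check can be automated once a table has been extracted. The sketch below is an illustration under stated assumptions, not a feature of any converter mentioned in this thread: the hypothetical `parse_figure` handles common annual-report notation (thousands separators, parentheses for negatives, a dash for nil), and `check_total` tests whether line items add up to a stated total, which tends to flag figures that were misread or dropped. The default tolerance of 1 is an assumed allowance for rounding in published figures.

```python
import re
from typing import List, Optional

# Digits with an optional leading minus and optional decimal part,
# checked after thousands separators have been removed.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")
# Characters commonly printed for "nil": hyphen, en dash, em dash.
_NIL = {"-", "\u2013", "\u2014"}


def parse_figure(text: str) -> Optional[float]:
    """Parse a figure as printed in a financial table.

    '1,234' -> 1234.0, '(56)' -> -56.0, '-' -> 0.0; returns None for
    anything that is not a number (for example a row label).
    """
    t = text.strip().replace("\u2212", "-")  # Unicode minus sign to ASCII hyphen
    if t in _NIL:
        return 0.0
    negative = len(t) > 2 and t.startswith("(") and t.endswith(")")
    if negative:
        t = t[1:-1]
    t = t.replace(",", "")
    if not _NUMBER.fullmatch(t):
        return None
    value = float(t)
    return -value if negative else value


def check_total(components: List[str], total: str, tolerance: float = 1.0) -> bool:
    """Return True if the parsed components sum to the parsed total within tolerance."""
    values = [parse_figure(c) for c in components]
    expected = parse_figure(total)
    if expected is None or any(v is None for v in values):
        return False  # something did not parse, so flag it for a manual look
    return abs(sum(values) - expected) <= tolerance


print(parse_figure("(1,234)"))                          # -1234.0
print(check_total(["1,000", "(200)", "50"], "850"))     # True
print(check_total(["1,000", "(200)"], "850"))           # False
```

A failed check does not say which figure is wrong, so it only narrows down where the manual comparison should focus.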

Author Comment

by: Hydra01
ID: 40064850
I've requested that this question be deleted for the following reason:

The question couldn't be answered; it looks like it can't be done.

Expert Comment

by: Dan Craciun
ID: 40064851
"It can't be done" is a perfectly valid answer.

Author Comment

by: Hydra01
ID: 40073551
Thanks for the links, eenookami.

I will pick "self answered with help", as all the experts helped me reach this conclusion, and I am going to distribute the points.