Solved

PDF to Excel converter

Posted on 2014-03-03
Last Modified: 2014-05-24
Hi,

I am looking for a utility (application or code) to accurately convert financial data tables in PDF to Excel or HTML.

The problem I face with commonly available converters is that they do not properly align the headers with the financial data in the columns.

You can take a look at the PDFs in the section below:
http://www.bp.com/en/global/corporate/investors/results-and-reporting/annual-f-oi.html

I require only the data in the financial tables.


Regards
Question by:Hydra01
11 Comments
 
LVL 34

Assisted Solution

by:Dan Craciun
Dan Craciun earned 150 total points
ID: 39900593
I usually get good results with this: http://www.pdftoexcel.org/

But I usually convert tables that were originally made in Excel (or Word, in worst cases).

The PDF from your link is professionally made using InDesign/Quark or something similar. Trying to convert from apples into plastic is bound to give you some errors.

But, as one of my friends likes to say, that is what interns are for: manual labor :)

HTH,
Dan
0
 
LVL 19

Accepted Solution

by:
*** Hopeleonie *** earned 200 total points
ID: 39900628
0
 

Author Comment

by:Hydra01
ID: 39903198
Hi Dan/ Hope,

Thanks for replying. As you pointed out, any application might miss a thing or two.

I have now started to wonder whether it is possible to get someone to write custom code for this.

As I am only looking at companies' financial releases, the scope is well defined.

Could you point me to someone?
0
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39903210
Do those financial releases have a consistent look? Most of the ones I looked at were fairly different, depending on the designer.

The problem is that those are intended to look good, not to transfer data. They have all sorts of flourishes, color scales, etc., all of which will interfere with the conversion.

Not saying it can't be done. Just that it might be more work than it's worth.

Dan
0
 

Author Comment

by:Hydra01
ID: 39903266
These releases come out quarter on quarter, and they have more or less the same format on a company-by-company basis.

I can give the start and end point of the table I need, let's say the "income statement", and will do this for every company I extract data for.

I was reading about the PDF coordinate system: in a table, all figures that fall on the same vertical line can be identified as a column, much as we do when we import text into Excel.

Similarly, all figures on the same horizontal line will be in the same row.

Do you think this can be coded?

Following this approach would also save the hassle of converting the entire PDF.
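
To make the idea concrete, here is a rough Python sketch of the row/column grouping. It is only an illustration under assumptions: it takes word boxes that have already been extracted from the PDF (dicts with "text", "x0", "x1" and "top" keys, the kind of output a text-extraction library such as pdfplumber reports), it assumes figures are right-aligned so that right edges line up, and the tolerances, function names and sample figures are all made up.

```python
import re

# Matches figures such as 2013, 1,200, 12.5, -40 or (150).
NUMERIC = re.compile(r"\(?-?\d[\d,]*(?:\.\d+)?\)?")


def is_number(text):
    return NUMERIC.fullmatch(text) is not None


def group_rows(words, y_tol=3.0):
    """Put words whose 'top' lies within y_tol of a row's first word in that row."""
    rows = []
    for w in sorted(words, key=lambda w: (w["top"], w["x0"])):
        if rows and abs(w["top"] - rows[-1][0]["top"]) <= y_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    # Read each row left to right.
    return [sorted(r, key=lambda w: w["x0"]) for r in rows]


def column_anchors(words, x_tol=5.0):
    """Cluster the right edges of numeric words; each cluster mean is one column."""
    clusters = []
    for x in sorted(w["x1"] for w in words if is_number(w["text"])):
        if clusters and x - clusters[-1][-1] <= x_tol:
            clusters[-1].append(x)
        else:
            clusters.append([x])
    return [sum(c) / len(c) for c in clusters]


def build_table(words, y_tol=3.0, x_tol=5.0):
    """Return rows of [label, col1, col2, ...] with figures placed by x position."""
    anchors = column_anchors(words, x_tol)
    table = []
    for row in group_rows(words, y_tol):
        label = " ".join(w["text"] for w in row if not is_number(w["text"]))
        cells = [""] * len(anchors)
        for w in row:
            if is_number(w["text"]):
                # Assign the figure to the nearest column anchor.
                idx = min(range(len(anchors)), key=lambda i: abs(anchors[i] - w["x1"]))
                cells[idx] = w["text"]
        table.append([label] + cells)
    return table


# Made-up word boxes standing in for one small table (not real data).
sample = [
    {"text": "2013", "x0": 276.0, "x1": 300.0, "top": 100.0},
    {"text": "2012", "x0": 376.0, "x1": 400.0, "top": 100.5},
    {"text": "Sales", "x0": 50.0, "x1": 80.0, "top": 120.0},
    {"text": "revenues", "x0": 84.0, "x1": 130.0, "top": 120.0},
    {"text": "1,200", "x0": 258.0, "x1": 300.0, "top": 120.2},
    {"text": "1,100", "x0": 358.0, "x1": 400.0, "top": 120.2},
    {"text": "Tax", "x0": 50.0, "x1": 70.0, "top": 140.0},
    {"text": "(150)", "x0": 262.0, "x1": 301.0, "top": 140.0},
    {"text": "(140)", "x0": 362.0, "x1": 401.0, "top": 140.0},
]

table = build_table(sample)
for row in table:
    print(row)
```

Each row comes back as a label followed by one cell per detected column, so it can be written straight to CSV and opened in Excel. A real release would probably need more care than this (labels that wrap over two lines, footnote markers, numeric words inside labels), and it only works if the PDF actually contains text rather than images.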
0
 
LVL 19

Assisted Solution

by:regmigrant
regmigrant earned 150 total points
ID: 39909487
Unfortunately, there's no guarantee that the tables in the PDF contain actual character data. Depending on how the document was created, and what the original designer added or removed to get the effect they wanted, a PDF could be all image data. Even if only part of it uses that technique to display the information you want, the converter would have to cope with both styles. On that basis it would be much cheaper and more accurate to use one of the commercial offerings, and you will still need to do a side-by-side check to catch problems.

If you have access to OneNote, it has an OK OCR option built in that would be worth a try for a first pass, but like many others it will be easily defeated by any security elements added to the document.
0
 

Author Comment

by:Hydra01
ID: 40064850
I've requested that this question be deleted for the following reason:

The question couldn't be answered; it looks like it can't be done.
0
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 40064851
"It can't be done" is a perfectly valid answer.
0
 

Author Comment

by:Hydra01
ID: 40073551
Thanks for the links, eenookami.

I will pick "self answered with help", as all the experts helped me reach this conclusion, and I am going to distribute the points.
0


Question has a verified solution.
