Solved

PDF to HTML - best method

Posted on 2004-08-28
Last Modified: 2010-04-09
I get dozens of PDF files from a Printer.  I'm a Web Developer and need to convert the PDF files into HTML and post on the client's web site.

EXAMPLES:
WEB PAGE MADE FROM PDF FILE
http://www.netafim-usa-mining.com/Mining/p-galaxy-disc-kleen.php

ACTUAL PDF FILE
http://www.netafim-usa-mining.com/galaxy.pdf

Is there any easy way to do this?  I've tried lots of methods - including PDF to HTML software - but nothing works.  The images never appear clear and/or the text is in the wrong order.

I don't think there is an easy way to do this - but I wanted to check, as I'm charging the client for adding the information to the web using a different source (namely EPS files).

Maybe I should be asking this question under Graphics category.

I'm kind of lost . . . can you give me some help?

Thanks, April

P.S.  We do offer the complete PDF file as a download - but still need the data in an HTML file.
Question by:aprillougheed
6 Comments
 
LVL 53

Assisted Solution

by:COBOLdinosaur
COBOLdinosaur earned 125 total points
ID: 11923516
 

Author Comment

by:aprillougheed
ID: 11923617
Hi.  I tried Magellan and I should have noted this in my question.  All I got was garbage - no pictures and text not in the correct places.  Although in their sales literature they say this should not happen.

I'm going to award you some points though because my goal in posting this question is to find out if I'm missing something obvious.

I'd like to keep the question open just a bit longer - couple of days - to see if anyone else has some ideas.

Also - triggered by your response - I've emailed BCL Magellan's customer support to see if I just didn't use their product correctly.

Thanks, April

P.S.  You've answered a lot of questions for me.   GOOD JOB!!
 
LVL 53

Expert Comment

by:COBOLdinosaur
ID: 11923726
Keeping it open is a good idea.  I don't work much with PDFs, but I'm sure some of the other experts in the TA have had to deal with the same situation you have, and they might have something that we can both learn from.

Thanks for the kind words.

Cd&
 
LVL 5

Accepted Solution

by:pmsyyz
pmsyyz earned 375 total points
ID: 11924054
There is no easy way to do it.  PDFs are a final output format.  PDFs do not have an internal structure that is easily changed to HTML.  Every single thing in a PDF has a certain position assigned to it.

Converting a PDF to HTML is best done by hand.  I use the xpdf command line tools to dump PDF content.  http://www.foolabs.com/xpdf/
Win32 command line tools: ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.00-win32.zip
pdftotext will dump the text of the PDF and pdfimages will dump the images.
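
If you have dozens of PDFs, the two xpdf commands can be scripted.  Below is a minimal Python sketch, assuming pdftotext and pdfimages are on your PATH.  The function names and the output folder layout are my own invention for illustration, not part of xpdf; the -layout and -j switches are standard xpdf options.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path, txt_path, keep_layout=False):
    """Build the argument list for xpdf's pdftotext (hypothetical helper)."""
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical page layout as best it can
        cmd.append("-layout")
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def pdfimages_command(pdf_path, image_root, as_jpeg=True):
    """Build the argument list for xpdf's pdfimages (hypothetical helper)."""
    cmd = ["pdfimages"]
    if as_jpeg:
        # -j saves JPEG (DCT) images as .jpg files instead of PPM/PBM
        cmd.append("-j")
    cmd += [str(pdf_path), str(image_root)]
    return cmd


def dump_pdf(pdf_path, out_dir):
    """Dump the text and images of one PDF into out_dir (assumed layout)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(pdf_path).stem
    commands = [
        pdftotext_command(pdf_path, out_dir / f"{stem}.txt"),
        pdfimages_command(pdf_path, out_dir / stem),
    ]
    for cmd in commands:
        # Fail early with a clear message if the xpdf tools are not installed
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)
    return commands
```

For example, dump_pdf("galaxy.pdf", "galaxy_dump") should leave galaxy.txt plus the extracted image files in the galaxy_dump folder.  You still have to build the page by hand from those pieces.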

You can look at what Google does when it tries to represent a PDF as HTML.
http://64.233.167.104/search?q=cache:uscis.gov/graphics/formsfee/forms/files/i-9.pdf
It doesn't come out too well.  Every line is absolutely positioned.  Increase your browser's text size to see what I mean: the lines start to overlap.
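
The alternative to absolute positioning is ordinary flowing markup, which the browser can re-wrap at any text size.  Here is a rough Python sketch of that idea, assuming you already have plain text dumped from the PDF (for example with pdftotext) and that blank lines separate the paragraphs.  The function name is made up for illustration, and real PDFs will need hand cleanup afterwards.

```python
import html
import re


def text_to_html_paragraphs(text):
    """Turn blank-line-separated text blocks into <p> elements in normal flow."""
    # pdftotext ends each page with a form feed; treat it as a paragraph break
    text = text.replace("\f", "\n\n")
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Join hard-wrapped lines so the browser decides where lines break
        joined = " ".join(
            line.strip() for line in block.splitlines() if line.strip()
        )
        if joined:
            # Escape &, < and > so the text cannot break the markup
            paragraphs.append(f"<p>{html.escape(joined)}</p>")
    return "\n".join(paragraphs)
```

This only handles running text.  Columns, tables and captions come out of pdftotext in whatever order the PDF stored them, so those parts still need to be rearranged by hand.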

Of course, if you can get the original content that the PDFs were created from, it would probably be much easier.
 
LVL 53

Expert Comment

by:COBOLdinosaur
ID: 11926006
Thanks April.  Maybe we will be able to do better on the next one.  Thanks for the A. :^)

Cd&
 

Author Comment

by:aprillougheed
ID: 11972984
FYI - for those that come after me . . . .

BCL Magellan answered my question - they were very helpful.  Basically, they explained that since my PDF file used many fonts that are not available in HTML - the results would not look exactly like the PDF file.

Since my clients want the page to look exactly like the PDF - I'll be doing it manually.

Thanks to all.  I love EE.

April

Question has a verified solution.
