Solved

PDF to HTML - best method

Posted on 2004-08-28
Last Modified: 2010-04-09
I get dozens of PDF files from a Printer.  I'm a Web Developer and need to convert the PDF files into HTML and post on the client's web site.

EXAMPLES:
WEB PAGE MADE FROM PDF FILE
http://www.netafim-usa-mining.com/Mining/p-galaxy-disc-kleen.php

ACTUAL PDF FILE
http://www.netafim-usa-mining.com/galaxy.pdf

Is there any easy way to do this?  I've tried lots of methods - including PDF to HTML software - but nothing works.  The images never appear clear and/or the text is in the wrong order.

I don't think there is an easy way to do this - but I wanted to check, as I'm charging the client for adding the information to the web using a different source (namely EPS files).

Maybe I should be asking this question under Graphics category.

I'm kind of lost . . . can you give me some help?

Thanks, April

P.S.  We do offer the complete PDF file as a download - but still need the data in an HTML file.
Question by:aprillougheed
6 Comments
 
LVL 53

Assisted Solution

by:COBOLdinosaur
COBOLdinosaur earned 500 total points
ID: 11923516

Author Comment

by:aprillougheed
ID: 11923617
Hi.  I tried Magellan and I should have noted this in my question.  All I got was garbage - no pictures, and the text was not in the correct places - although their sales literature says this should not happen.

I'm going to award you some points though because my goal in posting this question is to find out if I'm missing something obvious.

I'd like to keep the question open just a bit longer - couple of days - to see if anyone else has some ideas.

Also - triggered by your response - I've emailed BCL Magellan's customer support to see if I just didn't use their product correctly.

Thanks, April

P.S.  You've answered a lot of questions for me.   GOOD JOB!!
LVL 53

Expert Comment

by:COBOLdinosaur
ID: 11923726
Keeping it open is a good idea.  I don't work much with PDFs, but I'm sure some of the other experts in the TA have had to deal with the same situation you have, and they might have something that we can both learn from.

Thanks for the kind words.

Cd&
LVL 5

Accepted Solution

by:pmsyyz
pmsyyz earned 1500 total points
ID: 11924054
There is no easy way to do it.  PDFs are a final output format.  PDFs do not have an internal structure that is easily changed to HTML.  Every single thing in a PDF has a certain position assigned to it.

Converting a PDF to HTML is best done by hand.  I use the xpdf command line tools to dump PDF content.  http://www.foolabs.com/xpdf/
Win32 command line tools: ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.00-win32.zip
pdftotext will dump the text of the PDF and pdfimages will dump the images.
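
A minimal sketch of scripting those two tools from Python, assuming the xpdf binaries pdftotext and pdfimages are on the PATH.  The helper names and the galaxy.pdf / out names in the usage note are hypothetical, not part of xpdf.  The -layout option asks pdftotext to keep the physical page layout (drop it to get reading-order text instead), and -j asks pdfimages to write JPEG images as .jpg files.

```python
import subprocess
from pathlib import Path


def build_commands(pdf_path, out_dir):
    """Return the pdftotext and pdfimages command lines for one PDF.

    Hypothetical helper: it only builds argument lists, it runs nothing.
    """
    pdf = Path(pdf_path)
    out = Path(out_dir)
    text_file = out / (pdf.stem + ".txt")  # e.g. out/galaxy.txt
    image_root = out / pdf.stem            # pdfimages appends -000.jpg, -001.ppm, ...
    return [
        ["pdftotext", "-layout", str(pdf), str(text_file)],
        ["pdfimages", "-j", str(pdf), str(image_root)],
    ]


def dump_pdf(pdf_path, out_dir):
    """Run both tools; raises CalledProcessError if either one fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(pdf_path, out_dir):
        subprocess.run(cmd, check=True)
```

Calling dump_pdf("galaxy.pdf", "out") would leave the text and the extracted images in the out folder, ready to be laid out by hand.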

You can look at what Google does when they try to represent a PDF as HTML.
http://64.233.167.104/search?q=cache:uscis.gov/graphics/formsfee/forms/files/i-9.pdf
It doesn't come out too well.  Every line is absolutely positioned.  Increase your browser's text size to see what I mean: the lines start to overlap.
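
Markup that lets the text flow avoids that overlap.  As a rough sketch of a starting point for the manual work, assuming the text has already been dumped with pdftotext and that paragraphs in the dump are separated by blank lines (often only approximately true), something like this wraps each block in a plain <p> element.  The function name is hypothetical, and the result still needs editing by hand.

```python
import html
import re


def text_to_paragraphs(text):
    """Turn blank-line-separated text into flowing <p> elements."""
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Rejoin the hard-wrapped lines of one block into a single line.
        joined = " ".join(
            line.strip() for line in block.splitlines() if line.strip()
        )
        if joined:
            # Escape &, < and > so the text is safe to place inside HTML.
            paragraphs.append("<p>" + html.escape(joined) + "</p>")
    return "\n".join(paragraphs)
```

Because nothing here is positioned, the paragraphs simply get taller when the browser's text size is increased.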

Of course, if you can get the original content that the PDFs were created from, it would probably be much easier.
LVL 53

Expert Comment

by:COBOLdinosaur
ID: 11926006
Thanks April.  Maybe we will be able to do better on the next one.  Thanks for the A. :^)

Cd&

Author Comment

by:aprillougheed
ID: 11972984
FYI - for those that come after me . . . .

BCL Magellan answered my question - they were very helpful.  Basically, they explained that since my PDF file used many fonts that are not available in HTML - the results would not look exactly like the PDF file.

Since my clients want the page to look exactly like the PDF - I'll be doing it manually.

Thanks to all.  I love EE.

April
Question has a verified solution.
