Solved

PDF to HTML - best method

Posted on 2004-08-28
Last Modified: 2010-04-09
I get dozens of PDF files from a Printer.  I'm a Web Developer and need to convert the PDF files into HTML and post on the client's web site.

EXAMPLES:
WEB PAGE MADE FROM PDF FILE
http://www.netafim-usa-mining.com/Mining/p-galaxy-disc-kleen.php

ACTUAL PDF FILE
http://www.netafim-usa-mining.com/galaxy.pdf

Is there any easy way to do this?  I've tried lots of methods - including PDF to HTML software - but nothing works.  The images never appear clear and/or the text is in the wrong order.

I don't think there is an easy way to do this - but I wanted to check, as I'm charging the client for adding the information to the web using a different source (namely EPS files).

Maybe I should be asking this question under Graphics category.

I'm kind of lost . . . can you give me some help?

Thanks, April

P.S.  We do offer the complete PDF file as a download - but still need the data in an HTML file.
Question by:aprillougheed
6 Comments
 
LVL 53

Assisted Solution

by:COBOLdinosaur
COBOLdinosaur earned 500 total points
ID: 11923516
 

Author Comment

by:aprillougheed
ID: 11923617
Hi.  I tried Magellan and I should have noted this in my question.  All I got was garbage - no pictures, and the text was not in the correct places - although their sales literature says this should not happen.

I'm going to award you some points though because my goal in posting this question is to find out if I'm missing something obvious.

I'd like to keep the question open just a bit longer - couple of days - to see if anyone else has some ideas.

Also - triggered by your response - I've emailed BCL Magellan's customer support to see if I just didn't use their product correctly.

Thanks, April

P.S.  You've answered a lot of questions for me.   GOOD JOB!!
 
LVL 53

Expert Comment

by:COBOLdinosaur
ID: 11923726
Keeping it open is a good idea.  I don't work much with PDFs, but I'm sure some of the other experts in the TA have had to deal with the same situation you have, and they might have something that we can both learn from.

Thanks for the kind words.

Cd&
LVL 5

Accepted Solution

by:
pmsyyz earned 1500 total points
ID: 11924054
There is no easy way to do it.  PDFs are a final output format.  PDFs do not have an internal structure that is easily changed to HTML.  Every single thing in a PDF has a certain position assigned to it.

Converting a PDF to HTML is best done by hand.  I use the xpdf command line tools to dump PDF content.  http://www.foolabs.com/xpdf/
Win32 command line tools: ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.00-win32.zip
pdftotext will dump the text of the PDF and pdfimages will dump the images.
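
For dozens of files, the two dumps could be scripted.  The following is only a rough sketch, assuming the xpdf tools are on your PATH; the function names `build_commands` and `dump_pdf` and the output folder layout are made up for illustration, and the `-layout` and `-j` options are optional choices rather than something required.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(pdf_path, out_dir):
    """Return the pdftotext and pdfimages command lines for one PDF.

    Hypothetical helper: the output naming scheme is an assumption.
    """
    pdf = Path(pdf_path)
    out = Path(out_dir)
    text_file = out / (pdf.stem + ".txt")
    image_root = out / pdf.stem  # pdfimages appends -000, -001, ... plus an extension
    return [
        # -layout asks pdftotext to keep the physical page layout as best it can
        ["pdftotext", "-layout", str(pdf), str(text_file)],
        # -j saves DCT (JPEG) images as .jpg instead of converting them to PPM
        ["pdfimages", "-j", str(pdf), str(image_root)],
    ]


def dump_pdf(pdf_path, out_dir):
    """Run both tools; raises if a tool is missing or exits with an error."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(pdf_path, out_dir):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(cmd[0] + " not found on PATH")
        subprocess.run(cmd, check=True)
```

Calling `dump_pdf("galaxy.pdf", "out")` would then leave the text in `out/galaxy.txt` and the images in files starting with `out/galaxy-`.  The text will still need to be checked and reordered by hand.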

You can look at what Google does when they try to represent a PDF as HTML.
http://64.233.167.104/search?q=cache:uscis.gov/graphics/formsfee/forms/files/i-9.pdf
It doesn't come out too well.  Every line is absolutely positioned.  Increase your browser's text size to see what I mean: the lines start to overlap.
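
One way around the overlap problem is to put the dumped text into ordinary flowing paragraphs rather than positioned lines.  Here is a minimal sketch of that idea, assuming the text dump has blank lines between paragraphs (not every PDF will come out that way); `text_to_html` is a made-up helper name, and the result is only a starting point for hand editing.

```python
import html
import re


def text_to_html(text):
    """Turn plain text into <p> elements, one per blank-line-separated block."""
    # pdftotext separates pages with form feeds; treat them as paragraph breaks
    text = text.replace("\f", "\n\n")
    blocks = re.split(r"\n\s*\n", text)
    paragraphs = []
    for block in blocks:
        # Join the hard-wrapped lines of a block into one flowing line
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if lines:
            # Escape <, > and & so the text is safe to place in HTML
            paragraphs.append("<p>" + html.escape(" ".join(lines)) + "</p>")
    return "\n".join(paragraphs)
```

Because the paragraphs are not pinned to fixed coordinates, they reflow when the visitor changes the text size.  The trade-off is that the page will not match the PDF layout exactly.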

Of course, if you can get the original content that the PDFs were created from, it would probably be much easier.
 
LVL 53

Expert Comment

by:COBOLdinosaur
ID: 11926006
Thanks April.  Maybe we will be able to do better on the next one.  Thanks for the A. :^)

Cd&
 

Author Comment

by:aprillougheed
ID: 11972984
FYI - for those that come after me . . . .

BCL Magellan answered my question - they were very helpful.  Basically, they explained that since my PDF file used many fonts that are not available in HTML - the results would not look exactly like the PDF file.

Since my clients want the page to look exactly like the PDF - I'll be doing it manually.

Thanks to all.  I love EE.

April