PDF to HTML - best method

Posted on 2004-08-28
Last Modified: 2010-04-09
I get dozens of PDF files from a printer.  I'm a web developer and need to convert the PDF files into HTML and post them on the client's web site.

EXAMPLES:
WEB PAGE MADE FROM PDF FILE
http://www.netafim-usa-mining.com/Mining/p-galaxy-disc-kleen.php

ACTUAL PDF FILE
http://www.netafim-usa-mining.com/galaxy.pdf

Is there any easy way to do this?  I've tried lots of methods - including PDF to HTML software - but nothing works.  The images never appear clear and/or the text is in the wrong order.

I don't think there is an easy way to do this, but I wanted to check, since I'm charging the client for adding the information to the web using a different source (namely EPS files).

Maybe I should be asking this question under the Graphics category.

I'm kind of lost . . . can you give me some help?

Thanks, April

P.S.  We do offer the complete PDF file as a download - but still need the data in an HTML file.
Question by:aprillougheed
 
LVL 53

Assisted Solution

by:COBOLdinosaur
COBOLdinosaur earned 500 total points
ID: 11923516

Author Comment

by:aprillougheed
ID: 11923617
Hi.  I tried Magellan, and I should have noted this in my question.  All I got was garbage: no pictures, and text not in the correct places, even though their sales literature says this should not happen.

I'm going to award you some points though because my goal in posting this question is to find out if I'm missing something obvious.

I'd like to keep the question open just a bit longer - couple of days - to see if anyone else has some ideas.

Also - triggered by your response - I've emailed BCL Magellan's customer support to see if I just didn't use their product correctly.

Thanks, April

P.S.  You've answered a lot of questions for me.   GOOD JOB!!
LVL 53

Expert Comment

by:COBOLdinosaur
ID: 11923726
Keeping it open is a good idea.  I don't work much with PDFs, but I'm sure some of the other experts in the TA have had to deal with the same situation you have, and they might have something that we can both learn from.

Thanks for the kind words.

Cd&
LVL 5

Accepted Solution

by:pmsyyz
pmsyyz earned 1500 total points
ID: 11924054
There is no easy way to do it.  PDFs are a final output format.  PDFs do not have an internal structure that is easily changed to HTML.  Every single thing in a PDF has a fixed position assigned to it.

Converting a PDF to HTML is best done by hand.  I use the xpdf command line tools to dump PDF content.  http://www.foolabs.com/xpdf/
Win32 command line tools: ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.00-win32.zip
pdftotext will dump the text of the PDF and pdfimages will dump the images.
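A minimal sketch of that dump step in Python, assuming the xpdf tools are installed and on the PATH. The helper names `xpdf_commands` and `dump_pdf` and the output-folder layout are hypothetical; `-layout` (pdftotext) and `-j` (pdfimages) are standard xpdf options, but check them against the version you install.

```python
import subprocess
from pathlib import Path
from typing import List


def xpdf_commands(pdf_path: str, out_dir: str) -> List[List[str]]:
    """Build the pdftotext and pdfimages command lines for one PDF (hypothetical helper)."""
    pdf = Path(pdf_path)
    out = Path(out_dir)
    text_file = out / (pdf.stem + ".txt")
    # pdfimages treats this as a prefix and writes numbered files after it,
    # e.g. galaxy-000.jpg, galaxy-001.ppm, ...
    image_root = out / pdf.stem
    return [
        # -layout tries to keep the physical layout of each page,
        # which makes it easier to put the text back in order by hand.
        ["pdftotext", "-layout", str(pdf), str(text_file)],
        # -j saves JPEG-encoded images as .jpg; other images still come
        # out as .ppm/.pbm files that need converting for the web.
        ["pdfimages", "-j", str(pdf), str(image_root)],
    ]


def dump_pdf(pdf_path: str, out_dir: str) -> None:
    """Run both tools, creating the output folder first (hypothetical helper)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in xpdf_commands(pdf_path, out_dir):
        # check=True raises CalledProcessError if a tool fails.
        subprocess.run(cmd, check=True)
```

For example, `dump_pdf("galaxy.pdf", "galaxy_dump")` would leave `galaxy.txt` plus numbered image files in `galaxy_dump/` as raw material for building the page by hand.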

You can look at what Google does when it tries to represent a PDF as HTML:
http://64.233.167.104/search?q=cache:uscis.gov/graphics/formsfee/forms/files/i-9.pdf
It doesn't come out too well.  Every line is absolutely positioned.  Increase your browser's text size to see what I mean: the lines start to overlap.
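The overlap comes from giving every line its own fixed position. When rebuilding a page by hand, one way to avoid it is to let the text flow in ordinary paragraphs. A minimal sketch, assuming plain text (such as pdftotext output) in which blank lines separate paragraphs, which is not always true; `text_to_html_paragraphs` is a hypothetical helper, and it does not fix text that was extracted in the wrong order.

```python
import html
import re


def text_to_html_paragraphs(text: str) -> str:
    """Wrap plain text in flowing <p> elements instead of positioned lines."""
    # pdftotext separates pages with a form feed; treat it as a paragraph break.
    text = text.replace("\f", "\n\n")
    paragraphs = []
    # A blank (or whitespace-only) line marks the end of a paragraph.
    for block in re.split(r"\n\s*\n", text):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if lines:
            # Join the lines with spaces so the browser can reflow them,
            # and escape <, > and & so they display as literal text.
            paragraphs.append("<p>" + html.escape(" ".join(lines)) + "</p>")
    return "\n".join(paragraphs)


print(text_to_html_paragraphs("Disc filters\nfor mining\n\nLow <maintenance>"))
# <p>Disc filters for mining</p>
# <p>Low &lt;maintenance&gt;</p>
```

Because the paragraphs are not positioned, increasing the browser's text size just rewraps them instead of making lines collide; styling and images still have to be added by hand.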

Of course, if you can get the original content that the PDFs were created from, it would probably be much easier.
LVL 53

Expert Comment

by:COBOLdinosaur
ID: 11926006
Thanks April.  Maybe we'll be able to do better on the next one.  Thanks for the A. :^)

Cd&

Author Comment

by:aprillougheed
ID: 11972984
FYI - for those that come after me . . . .

BCL Magellan answered my question, and they were very helpful.  Basically, they explained that since my PDF file used many fonts that are not available in HTML, the results would not look exactly like the PDF file.

Since my clients want the page to look exactly like the PDF, I'll be doing it manually.

Thanks to all.  I love EE.

April