Solved

PDF to HTML - best method

Posted on 2004-08-28
Medium Priority
231 Views
Last Modified: 2010-04-09
I get dozens of PDF files from a printer.  I'm a web developer and need to convert the PDF files into HTML and post them on the client's web site.

EXAMPLES:
WEB PAGE MADE FROM PDF FILE
http://www.netafim-usa-mining.com/Mining/p-galaxy-disc-kleen.php

ACTUAL PDF FILE
http://www.netafim-usa-mining.com/galaxy.pdf

Is there any easy way to do this?  I've tried lots of methods - including PDF to HTML software - but nothing works.  The images never come out clear and/or the text is in the wrong order.

I don't think there is an easy way to do this - but I wanted to check, since I'm charging the client for adding the information to the web from a different source (namely EPS files).

Maybe I should be asking this question under the Graphics category.

I'm kind of lost . . . can you give me some help?

Thanks, April

P.S.  We do offer the complete PDF file as a download - but still need the data in an HTML file.
Question by:aprillougheed
6 Comments
 
LVL 53

Assisted Solution

by:COBOLdinosaur
COBOLdinosaur earned 500 total points
ID: 11923516

Author Comment

by:aprillougheed
ID: 11923617
Hi.  I tried Magellan and I should have noted this in my question.  All I got was garbage - no pictures and text not in the correct places, even though their sales literature says this should not happen.

I'm going to award you some points though because my goal in posting this question is to find out if I'm missing something obvious.

I'd like to keep the question open just a bit longer - a couple of days - to see if anyone else has some ideas.

Also - triggered by your response - I've emailed BCL Magellan's customer support to see if I just didn't use their product correctly.

Thanks, April

P.S.  You've answered a lot of questions for me.   GOOD JOB!!
LVL 53

Expert Comment

by:COBOLdinosaur
ID: 11923726
Keeping it open is a good idea.  I don't work much with PDFs, but I'm sure some of the other experts in the TA have had to deal with the same situation you have, and they might have something that we can both learn from.

Thanks for the kind words.

Cd&
LVL 5

Accepted Solution

by:pmsyyz
pmsyyz earned 1500 total points
ID: 11924054
There is no easy way to do it.  PDFs are a final output format.  They do not have an internal structure that is easily changed to HTML: every single thing in a PDF has a fixed position assigned to it.
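To make the positioning point concrete, here is a minimal Python sketch that reads a tiny, hypothetical page content stream. It assumes an uncompressed stream in which each text object (BT ... ET) places one string with a single Td and draws it with Tj; real PDFs are usually compressed and use many more operators, so treat this as an illustration rather than a parser. The names SAMPLE_STREAM, positioned_strings and naive_reading_order are made up for this example.

```python
import re

# Hypothetical, uncompressed content stream for a two-column page.
# Assumption: each BT ... ET text object positions its string with one
# "x y Td" and draws it with "(text) Tj". Real PDFs are usually compressed
# and also use Tm, TJ, T* and other operators.
SAMPLE_STREAM = """
BT /F1 10 Tf 300 700 Td (right column, line 1) Tj ET
BT /F1 10 Tf 72 700 Td (left column, line 1) Tj ET
BT /F1 10 Tf 72 688 Td (left column, line 2) Tj ET
"""

# Matches "x y Td (text) Tj"; escaped parentheses in strings are not handled.
TEXT_OBJECT = re.compile(r"([\d.]+)\s+([\d.]+)\s+Td\s+\((.*?)\)\s+Tj")


def positioned_strings(stream: str) -> list[tuple[float, float, str]]:
    """Return (x, y, text) for every Td/Tj pair, in the order they are drawn."""
    return [(float(x), float(y), text) for x, y, text in TEXT_OBJECT.findall(stream)]


def naive_reading_order(items: list[tuple[float, float, str]]) -> list[str]:
    """Sort top-to-bottom (PDF y grows upward), then left-to-right."""
    return [text for x, y, text in sorted(items, key=lambda it: (-it[1], it[0]))]


if __name__ == "__main__":
    items = positioned_strings(SAMPLE_STREAM)
    for x, y, text in items:
        print(f"x={x:5.0f} y={y:5.0f}  {text}")
    print(naive_reading_order(items))
```

With two columns, neither the drawing order (right column first) nor a simple top-to-bottom, left-to-right sort recovers the reading order: the sort interleaves the columns as left line 1, right line 1, left line 2. That is one likely source of the "text in the wrong order" results.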

Converting a PDF to HTML is best done by hand.  I use the xpdf command line tools to dump PDF content.  http://www.foolabs.com/xpdf/
Win32 command line tools: ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.00-win32.zip
pdftotext will dump the text of the PDF and pdfimages will dump the images.
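If you have many PDFs to dump, a small Python wrapper can run both tools on each file. This is a sketch: xpdf_commands and dump_pdf are hypothetical helper names, and it assumes pdftotext and pdfimages from the xpdf package are on your PATH. The argument order (pdftotext <PDF-file> [<text-file>], pdfimages [options] <PDF-file> <image-root>) and the -j option (save DCT images as JPEG files) come from the xpdf tools; check the man pages of the version you install.

```python
import subprocess
from pathlib import Path


def xpdf_commands(pdf: str, out_dir: str) -> list[list[str]]:
    """Build the pdftotext and pdfimages command lines for one PDF (hypothetical helper)."""
    stem = Path(pdf).stem
    out = Path(out_dir)
    return [
        # pdftotext <PDF-file> <text-file>: writes the extracted text
        ["pdftotext", pdf, str(out / f"{stem}.txt")],
        # pdfimages -j <PDF-file> <image-root>: JPEG (DCT) images are saved
        # as .jpg, other images as PBM/PPM files named from the image root
        ["pdfimages", "-j", pdf, str(out / stem)],
    ]


def dump_pdf(pdf: str, out_dir: str) -> None:
    """Create out_dir and run both xpdf tools; raises if either command fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in xpdf_commands(pdf, out_dir):
        subprocess.run(cmd, check=True)
```

For example, dump_pdf("galaxy.pdf", "out") should leave out/galaxy.txt plus numbered image files named from the out/galaxy root. Building the commands separately from running them lets you check the argument lists without the tools installed.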

You can look at what Google does when they try to represent a PDF as HTML.
http://64.233.167.104/search?q=cache:uscis.gov/graphics/formsfee/forms/files/i-9.pdf
It doesn't come out too well: every line is absolutely positioned.  Increase your browser's text size to see what I mean; the lines start to overlap.
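The alternative to absolute positioning is to let the text flow. As a rough starting point for a manual conversion, the following Python sketch (text_to_paragraphs is a hypothetical name) turns a plain-text dump, such as pdftotext output, into <p> elements. It assumes blank lines mark paragraph breaks and escapes HTML special characters; it keeps neither fonts nor positions, so layout and styling still have to be done by hand.

```python
import html
import re


def text_to_paragraphs(text: str) -> str:
    """Turn a plain-text dump into flowing <p> elements.

    Blank lines separate paragraphs; line breaks inside a paragraph are
    joined with spaces so the browser can rewrap the text at any font
    size instead of drawing fixed-position lines that can overlap.
    """
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        words = " ".join(line.strip() for line in block.splitlines() if line.strip())
        if words:
            paragraphs.append(f"<p>{html.escape(words)}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = "First paragraph, line one\nline two\n\nSecond paragraph <with> & symbols"
    print(text_to_paragraphs(sample))
```

Run on the sample, this prints two paragraphs, with the first two lines joined into one and the <, > and & characters escaped.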

Of course, if you can get the original content that the PDFs were created from, it would probably be much easier.
LVL 53

Expert Comment

by:COBOLdinosaur
ID: 11926006
Thanks April.  Maybe we'll be able to do better on the next one.  Thanks for the A. :^)

Cd&

Author Comment

by:aprillougheed
ID: 11972984
FYI - for those that come after me . . . .

BCL Magellan answered my question - they were very helpful.  Basically, they explained that since my PDF file used many fonts that are not available in HTML, the results would not look exactly like the PDF file.

Since my clients want the page to look exactly like the PDF - I'll be doing it manually.

Thanks to all.  I love EE.

April