Solved

pdf to html?

Posted on 2006-06-19
Last Modified: 2013-12-20
I'm running ColdFusion MX 7 with the PHP extensions installed.

What I want to do is dynamically create an HTML document from a PDF file. I am curious whether there is a public API for this type of conversion. I know that Google employs this technology in Gmail for their "view as html" links. I have a standalone copy of the open-source pdftohtml project, which relies on Ghostscript, but I want to eliminate that step. I have more than 250 ever-changing PDF files that I want to display in my site on request instead of using an embedded PDF. There is a reason: the print button points to a print version of the PDF, whereas the 'web' version has more content (links, reference materials, etc.) that I want to display as HTML only, without giving the user the ability to print.

So the question is: are there any server-side APIs that can be used to convert PDFs to HTML on demand? Examples of use?

Most people want to go the other way, from HTML -> PDF (and I love the cfdocument feature for this)... I just want to go the opposite way.



Question by:RussoMA
LVL 36

Expert Comment

by:SidFishes
ID: 16937216
you might be able to use this

http://www.adobe.com/products/acrobat/access_onlinetools.html

in combination with cfhttp

(you'd have to check whether adobe's licensing would allow this tho...)
 
LVL 36

Accepted Solution

by:
SidFishes earned 250 total points
ID: 16937240
you also might get this (has a basic free version) to work with cfexecute...

http://www.pdf-to-html.com/details.html
 

Author Comment

by:RussoMA
ID: 16937713
i like the idea of using cfexecute, here's the problem i am having with it, i send this:

<cfexecute name="c:\pdf\pdftohtml.exe" arguments="-noframes -c c:\inetpub\wwwroot\ordersheet\#rtrim(ordersheet)#.pdf c:\inetpub\wwwroot\ordersheet\#rtrim(ordersheet)#.html"></cfexecute>

The HTML file is created, but it does not invoke Ghostscript (http://sourceforge.net/projects/ghostscript) to create the background image.

pdftohtml 0.36 (http://pdftohtml.sourceforge.net/) uses Ghostscript to create the image used for the background of the single page when the -c flag is given (complex output, instead of text only).

It seems not to invoke that part of pdftohtml, because no PNG file is created.

Is there something I need to do to declare the path of the Ghostscript executable so that CF can call it while executing pdftohtml?

I logged the output and it looks the same as it does when it uses Ghostscript successfully (no error messages), just

Page-1

which it sends for each page that it converts.

suggestions?
LVL 18

Assisted Solution

by:Plucka
Plucka earned 250 total points
ID: 16938994
RussoMA,

Does it work from the command line?

Regards
Plucka
 

Author Comment

by:RussoMA
ID: 16938998
Yes, I can run it from the command line, and it creates a PNG file as the background image.
 
LVL 18

Expert Comment

by:Plucka
ID: 16939020
Ok,

You will probably find it's to do with paths etc.

Try creating a batch file (.bat) that you can run from the command line with no parameters and that works.

Then try running this from CFEXECUTE.

The benefit of this is that you can change directories etc. within the batch file.
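A minimal sketch of that suggestion, under assumptions: the batch file name (c:\pdf\convert.bat) and the cmd.exe location are hypothetical and will differ per server. The batch file itself would change to the working directory, add the Ghostscript folder to PATH, and then call pdftohtml.exe. A nonzero timeout is used so that ColdFusion waits for the program and can capture its output in a variable for troubleshooting.

```cfm
<!--- Sketch only: convert.bat and the cmd.exe path are hypothetical. --->
<!--- cmd.exe /c runs the batch file and then exits. --->
<cfexecute name="c:\windows\system32\cmd.exe"
           arguments="/c c:\pdf\convert.bat"
           timeout="60"
           variable="batchOutput"></cfexecute>

<!--- Show whatever the batch file printed, escaped for safe display. --->
<cfoutput><pre>#HTMLEditFormat(batchOutput)#</pre></cfoutput>
```

If the captured output differs from what the same batch file prints at a command prompt, that difference usually points at the path or account the ColdFusion service runs under.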
 

Author Comment

by:RussoMA
ID: 16939175
OK, I have gotten it to work -
I had not formally installed Ghostscript on that machine (the command-line question from Plucka led me to this realization). I had only copied the files to a directory on the web server, so the installer never registered whatever it is that now makes it work. [A black box that I don't care to dive into today.]

But on some of the files, it doesn't show the PNG and instead shows a missing-background-image icon (IE is the browser used to test). If I refresh the page seconds later, the image is shown.

Is there a way to make cfexecute pause before displaying the rest of the page? Or is this a server issue, where the file creation is not recognized fast enough for the new image to be ready to serve on a cfinclude?
 
LVL 18

Expert Comment

by:Plucka
ID: 16939234
This will make ColdFusion sleep for a bit:

<cfset thread = CreateObject("java", "java.lang.Thread") />
<cfset thread.sleep(5000) />

Put these two lines after the <CFEXECUTE> tag and before the display.

This is 5000 milliseconds, thus 5 seconds, so you can change it to whatever you like.

Requires CF MX or later.
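As a hedged alternative to a fixed pause, the page could wait on the conversion itself and then poll briefly for the image. This is a sketch under assumptions, not a tested fix: it assumes the PNG is written next to the HTML file and that its name starts with the order sheet name, and the variable names (outDir, baseName, pngFiles, sleeper) are made up for illustration. As far as I know, cfexecute's timeout defaults to 0, which returns without waiting for the program to finish, so a nonzero timeout may remove most of the delay on its own.

```cfm
<!--- Paths are taken from the thread; variable names are illustrative. --->
<cfset outDir   = "c:\inetpub\wwwroot\ordersheet">
<cfset baseName = rtrim(ordersheet)>

<!--- A nonzero timeout makes ColdFusion wait (up to 30 s) for pdftohtml. --->
<cfexecute name="c:\pdf\pdftohtml.exe"
           arguments="-noframes -c #outDir#\#baseName#.pdf #outDir#\#baseName#.html"
           timeout="30"
           variable="convertLog"></cfexecute>

<!--- Poll up to 20 times, 250 ms apart, for a PNG matching the base name. --->
<cfset sleeper  = CreateObject("java", "java.lang.Thread")>
<cfset pngReady = false>
<cfloop index="attempt" from="1" to="20">
    <cfdirectory action="list" directory="#outDir#"
                 filter="#baseName#*.png" name="pngFiles">
    <cfif pngFiles.recordCount gt 0>
        <cfset pngReady = true>
        <cfbreak>
    </cfif>
    <cfset sleeper.sleep(JavaCast("long", 250))>
</cfloop>

<cfif pngReady>
    <cfinclude template="#baseName#.html">
<cfelse>
    <p>The converted page is not ready yet. Please refresh in a moment.</p>
</cfif>
```

The cfinclude path here assumes the calling template lives in the same ordersheet directory. Polling returns as soon as the file appears, so the page waits at most about 5 seconds instead of always waiting the full pause.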
 

Author Comment

by:RussoMA
ID: 16939281
That wraps this up. Thanks to the help, I got my answer, and I think this question will help others who want to convert PDF to HTML on the fly. I am splitting the points because SidFishes led me to use cfexecute, and Plucka helped me get it running and supplied the pause (thanks).
