
pdf to html?

Posted on 2006-06-19
Last Modified: 2013-12-20
Running ColdFusion MX 7, with PHP extensions installed.

What I want to do is dynamically create an HTML document from a PDF file. I am curious whether there is a public API for this type of conversion. I know that Google uses this technology in Gmail for its "View as HTML" links. I have a standalone copy of the pdftohtml open-source project, which uses Ghostscript, but I want to eliminate that separate step. I have more than 250 constantly changing PDF files that I want to display on my site on request, instead of using an embedded PDF. There is a reason for this: the print button points to a print version of the PDF, whereas the "web" version has more content (links, reference materials, etc.) that I want to display as HTML without giving the user the ability to print it.

So the question is: are there any server-side APIs that can be used to convert PDFs to HTML on demand? Examples of use?

Most people want to go the other way, from HTML to PDF (and I love the cfdocument feature for this). I just want to go the opposite way.



Question by:RussoMA
 
LVL 36

Expert Comment

by:SidFishes
ID: 16937216
You might be able to use this:

http://www.adobe.com/products/acrobat/access_onlinetools.html

in combination with cfhttp.

(You'd have to check whether Adobe's licensing would allow this, though.)
 
LVL 36

Accepted Solution

by:SidFishes
SidFishes earned 250 total points
ID: 16937240
You might also get this (it has a basic free version) to work with cfexecute:

http://www.pdf-to-html.com/details.html
 

Author Comment

by:RussoMA
ID: 16937713
I like the idea of using cfexecute. Here's the problem I am having with it. I send this:

<cfexecute name="c:\pdf\pdftohtml.exe" arguments="-noframes -c c:\inetpub\wwwroot\ordersheet\#rtrim(ordersheet)#.pdf c:\inetpub\wwwroot\ordersheet\#rtrim(ordersheet)#.html"></cfexecute>

The HTML file is created, but pdftohtml does not invoke Ghostscript (http://sourceforge.net/projects/ghostscript) to create the background image.

pdftohtml 0.36 (http://pdftohtml.sourceforge.net/) uses Ghostscript to create the image used as the page background when the -c option is given (complex output, instead of text only).

It seems not to invoke that part of pdftohtml, because no PNG file is created.

Is there something I need to do to declare the path of the Ghostscript executable so that CF can call on it while executing pdftohtml?

I logged the output, and it looks the same as when it uses Ghostscript successfully (no error messages), just

Page-1

which it prints for each page that it converts.

Suggestions?
 
LVL 18

Assisted Solution

by:Plucka
Plucka earned 250 total points
ID: 16938994
RussoMA,

Does it work from the command line?

Regards
Plucka

Author Comment

by:RussoMA
ID: 16938998
Yes, I can run it from the command line, and it creates a PNG file as the background image.
 
LVL 18

Expert Comment

by:Plucka
ID: 16939020
OK,

You will probably find it's to do with paths, etc.

Try creating a batch file (.bat) that you can run from the command line with no parameters and that works.

Then try running it from CFEXECUTE.

The benefit of this is that you can change directories, etc., within the batch file.
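A minimal sketch of that suggestion, assuming a hypothetical c:\pdf\convert.bat (the batch file name, its contents, and the Ghostscript folder are illustrative, not from the thread). It runs the batch file through cmd.exe /c, passes the order sheet path as %1, and sets timeout so ColdFusion waits for the batch file to finish; the variable attribute captures the console output so it can be compared with a command-line run.

```cfml
<!---
  Hypothetical c:\pdf\convert.bat (not from the thread), for example:
      cd /d c:\pdf
      set PATH=C:\path\to\gs\bin;%PATH%
      pdftohtml.exe -noframes -c %1.pdf %1.html
  The PATH line is an assumption: point it at wherever Ghostscript is installed.
--->
<cfexecute name="c:\windows\system32\cmd.exe"
           arguments="/c c:\pdf\convert.bat c:\inetpub\wwwroot\ordersheet\#rtrim(ordersheet)#"
           timeout="60"
           variable="convertOutput">
</cfexecute>

<!--- Log whatever the batch file printed, to compare with a manual command-line run --->
<cflog text="convert.bat output: #convertOutput#">
```

Keeping the cd and PATH setup inside the batch file means pdftohtml gets the same working directory and PATH you tested at the command prompt, regardless of the ColdFusion service's own environment.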
 

Author Comment

by:RussoMA
ID: 16939175
OK, I have gotten it to work.
I had not formally installed Ghostscript on that machine (Plucka's command-line question led me to this realization). I had only copied the files to the directory on the web server, without letting the installer register whatever it is that now makes it work [a black box I don't care to dive into today].

But for some of the files, the page doesn't show the PNG and instead shows a missing-image icon where the background should be (IE is the browser used to test). If I refresh the page a few seconds later, the image is shown.

Is there a way to make cfexecute pause before displaying the rest of the page? Or is this a server issue, where the server may not be recognizing the newly created files fast enough to acknowledge that the new image is ready to be served on a cfinclude?



 
LVL 18

Expert Comment

by:Plucka
ID: 16939234
This will make ColdFusion sleep for a bit:

<cfset thread = CreateObject("java", "java.lang.Thread") />
<cfset thread.sleep(5000) />

Put these two lines after the <CFEXECUTE> tag and before the display.

This is 5000 milliseconds, i.e. 5 seconds, so you can change it to whatever you like.

Requires CF MX or later.
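A fixed sleep is a guess: too short and the image is still missing, too long and every request pays for it. A hedged alternative sketch polls for the file with a deadline instead. The expected PNG name (#rtrim(ordersheet)#001.png) is an assumption about pdftohtml's output naming; check what actually appears in your output folder. Separately, cfexecute's timeout attribute defaults to 0, in which case ColdFusion starts the program and returns immediately without waiting for it; if the <cfexecute> call has no timeout, giving it one (for example timeout="60") may remove the need for either approach.

```cfml
<!--- Assumption: the background image pdftohtml -c writes for page 1; verify the real file name --->
<cfset expectedFile = "c:\inetpub\wwwroot\ordersheet\#rtrim(ordersheet)#001.png">
<cfset maxWaitMs = 10000>   <!--- give up after 10 seconds --->
<cfset pollMs = 250>        <!--- check four times a second --->
<cfset thread = CreateObject("java", "java.lang.Thread")>
<cfset startTick = GetTickCount()>

<!--- Sleep in short steps until the file appears or the deadline passes --->
<cfloop condition="NOT FileExists(expectedFile) AND GetTickCount() - startTick LT maxWaitMs">
    <cfset thread.sleep(pollMs)>
</cfloop>

<cfif NOT FileExists(expectedFile)>
    <!--- Still missing: log it (and, for example, show a text-only page instead of a broken image) --->
    <cflog text="Background image not ready after #maxWaitMs# ms: #expectedFile#">
</cfif>
```

A file can exist before pdftohtml has finished writing it, so on a busy server a short extra pause after the loop exits may still be worthwhile.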
 

Author Comment

by:RussoMA
ID: 16939281
That wraps this up. Thanks to the help, I got my answer, and I think this question will help others who want to convert PDF to HTML on the fly. I am splitting the points because SidFishes led me to use cfexecute, and Plucka helped me get it running as well as with the pause (thanks).