Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

pdf to html?

Posted on 2006-06-19
9
Medium Priority
?
1,569 Views
Last Modified: 2013-12-20
Running coldfusion mx7, plus php extensions installed.

What I want to do is dynamically create an html document from a pdf file.  I am curious if there is a public API for this type of conversion.  I know that google employs this technology in Gmail for their "view as html" links.  I have a stand alone copy of the pdftohtml opensource project that utilizes ghostwriter, but i want to eliminate that step.  I have > 250 ever changing pdf files that i want to be able to display in my site upon request in the stead of using a embedded pdf (there is a reason, the print button points to a print version of the pdf, whereas the 'web' version has more content [links, reference materials, etc] that i want to just display as html and not give the user the ability to print.  

So the question is, are there any server-side API's that can be used for conversion of pdf's to html on-demand?  examples of use?

most people want to go the other way from html -> pdf (and i love the cfdocument feature for this)... i just want to go the opposite way.



0
Comment
Question by:RussoMA
  • 4
  • 3
  • 2
9 Comments
 
LVL 36

Expert Comment

by:SidFishes
ID: 16937216
you might be able to use this

http://www.adobe.com/products/acrobat/access_onlinetools.html

in combination with cfhttp

(you'd have to check whether adobe's licensing would allow this tho...)
0
 
LVL 36

Accepted Solution

by:
SidFishes earned 1000 total points
ID: 16937240
you also might get this (has a basic free version) to work with cfexecute...

http://www.pdf-to-html.com/details.html
0
 

Author Comment

by:RussoMA
ID: 16937713
i like the idea of using cfexecute, here's the problem i am having with it, i send this:

<cfexecute name="c:\pdf\pdftohtml.exe" arguments="-noframes -c c:\inetpub\wwwroot\ordersheet\#rtrim(ordersheet)#.pdf c:\inetpub\wwwroot\ordersheet\#rtrim(ordersheet)#.html"></cfexecute>

the html file is created, but it does not invoke the ghostscript (http://sourceforge.net/projects/ghostscript) to create the image background.

pdftohtml 0.36 (http://pdftohtml.sourceforge.net/) uses ghostscript to create the image used for the background of the single page using the -c tag (complex, instead of text only).

it seems to not invoke that part of pdftohtml, because no png file is created.  

is there something i need to do to declare the path of the ghostscript executable so that CF can call upon that while executing pdftohtml?

i logged the output and it looks the same as it does when it uses ghostscript successfullt (no error messages) just

Page-1

which it sends for each page that it converts.

suggestions?
0
NFR key for Veeam Agent for Linux

Veeam is happy to provide a free NFR license for one year.  It allows for the non‑production use and valid for five workstations and two servers. Veeam Agent for Linux is a simple backup tool for your Linux installations, both on‑premises and in the public cloud.

 
LVL 18

Assisted Solution

by:Plucka
Plucka earned 1000 total points
ID: 16938994
RussoMA,

Does it work from the command line?

Regards
Plucka
0
 

Author Comment

by:RussoMA
ID: 16938998
yes, i can run it from comand line and it creates a png file as the background image
0
 
LVL 18

Expert Comment

by:Plucka
ID: 16939020
Ok,

You will probably find it's to do with paths etc.

Try creating a batch file .bat that takes you can run from the command line with no paramaters and will work.

Then try running this from CFEXECUTE

the benefit of this, is you can change directories etc, within the batch file.
0
 

Author Comment

by:RussoMA
ID: 16939175
ok, i have gotten it to work -
i had not formally installed the ghostscript on that machine (the command line question from plucka lead me down this road of realization), I had only copied the files to the directory on the webserver, and not had the install register whatever it is that now made it work.  [black box that i dont care to dive into today]

but on some of the files, it doesnt show the png and instead shows a background image missing icon (IE is the browser used to test).  if i refresh the page seconds later, the image is shown.

is there a way to make cfexecute pause before displaying the rest of the page?  or is this a server issue that may not be recognizing the file creations fast enough to acknowledge the new image is indeed ready to be served on a cfinclude?



0
 
LVL 18

Expert Comment

by:Plucka
ID: 16939234
This will make coldfusion sleep for a bit

<cfset thread = CreateObject("java", "java.lang.Thread") />
<cfset thread.sleep(5000) />

put these two lines after the <CFEXECUTE and before the display

This is 5000 miliseconds, thus 5 seconds, so you can change it to whatever you like.

CF MX or later.
0
 

Author Comment

by:RussoMA
ID: 16939281
that wraps this up, thanks to the help, i got my answer and i think this question will help out others that seek to convert pdf to html on the fly.  i am splitting the points because sidfishes lead me to use cfexecute and plucka helped me to get it running as well as the pause (thanks)
0

Featured Post

Free learning courses: Active Directory Deep Dive

Get a firm grasp on your IT environment when you learn Active Directory best practices with Veeam! Watch all, or choose any amount, of this three-part webinar series to improve your skills. From the basics to virtualization and backup, we got you covered.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Microsoft Office Picture Manager is not included in Office 2013. This comes as a shock to users upgrading from earlier versions of Office, such as 2007 and 2010, where Picture Manager was included as a standard application. This article explains how…
Microsoft Office Picture Manager was included in Office 2003, 2007, and 2010, but not in Office 2013. Users had hopes that it would be in Office 2016/Office 365, but it is not. Fortunately, the same zero-cost technique that works to install it with …
In this fourth video of the Xpdf series, we discuss and demonstrate the PDFinfo utility, which retrieves the contents of a PDF's Info Dictionary, as well as some other information, including the page count. We show how to isolate the page count in a…
This video Micro Tutorial shows how to password-protect PDF files with free software. Many software products can do this, such as Adobe Acrobat (but not Adobe Reader), Nuance PaperPort, and Nuance Power PDF, but they are not free products. This vide…

916 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question