Solved

pdf to html?

Posted on 2006-06-19
Last Modified: 2013-12-20
Running ColdFusion MX 7, with PHP extensions installed.

What I want to do is dynamically create an HTML document from a PDF file. I am curious whether there is a public API for this type of conversion. I know that Google employs this technology in Gmail for its "view as html" links. I have a standalone copy of the open-source pdftohtml project, which uses Ghostscript, but I want to eliminate that step. I have more than 250 ever-changing PDF files that I want to display in my site on request instead of using an embedded PDF. (There is a reason: the print button points to a print version of the PDF, whereas the 'web' version has more content [links, reference materials, etc.] that I want to display as HTML without giving the user the ability to print.)

So the question is: are there any server-side APIs that can be used to convert PDFs to HTML on demand? Examples of use?

Most people want to go the other way, from HTML -> PDF (and I love the cfdocument feature for this). I just want to go the opposite way.



Question by:RussoMA
9 Comments
 
LVL 36

Expert Comment

by:SidFishes
ID: 16937216
You might be able to use this

http://www.adobe.com/products/acrobat/access_onlinetools.html

in combination with cfhttp.

(You'd have to check whether Adobe's licensing allows this, though.)
 
LVL 36

Accepted Solution

by:
SidFishes earned 250 total points
ID: 16937240
You might also get this (it has a basic free version) to work with cfexecute:

http://www.pdf-to-html.com/details.html
 

Author Comment

by:RussoMA
ID: 16937713
I like the idea of using cfexecute. Here's the problem I am having with it. I send this:

<cfexecute name="c:\pdf\pdftohtml.exe" arguments="-noframes -c c:\inetpub\wwwroot\ordersheet\#rtrim(ordersheet)#.pdf c:\inetpub\wwwroot\ordersheet\#rtrim(ordersheet)#.html"></cfexecute>

The HTML file is created, but it does not invoke Ghostscript (http://sourceforge.net/projects/ghostscript) to create the background image.

pdftohtml 0.36 (http://pdftohtml.sourceforge.net/) uses Ghostscript to create the image used as the background of the single page when run with the -c option (complex output, instead of text only).

That part of pdftohtml does not seem to run, because no PNG file is created.

Is there something I need to do to declare the path of the Ghostscript executable so that CF can call it while executing pdftohtml?

I logged the output and it looks the same as it does when it uses Ghostscript successfully (no error messages), just

Page-1

which it sends for each page that it converts.

suggestions?

 
LVL 18

Assisted Solution

by:Plucka
Plucka earned 250 total points
ID: 16938994
RussoMA,

Does it work from the command line?

Regards
Plucka
 

Author Comment

by:RussoMA
ID: 16938998
Yes, I can run it from the command line, and it creates a PNG file as the background image.
 
LVL 18

Expert Comment

by:Plucka
ID: 16939020
Ok,

You will probably find it has to do with paths, etc.

Try creating a batch file (.bat) that you can run from the command line with no parameters and that works.

Then try running it from CFEXECUTE.

The benefit of this is that you can change directories, etc., within the batch file.
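
One way to wire up the batch-file suggestion, as a rough sketch only. The wrapper path c:\pdf\convert.bat is hypothetical, and the batch file itself is assumed to change to the pdftohtml directory, put the Ghostscript folder on PATH, and run the same pdftohtml command on the name passed as its first parameter. Calling it through cmd.exe /c, and the cmd.exe location, are assumptions about the Windows setup rather than anything confirmed in this thread.

```cfml
<!--- Hypothetical wrapper: c:\pdf\convert.bat is assumed to cd into c:\pdf,
      set PATH for Ghostscript, and run pdftohtml on the name passed to it --->
<cfset sheetName = rtrim(ordersheet)>

<!--- Run the batch file through the Windows command interpreter.
      timeout is in seconds; a non-zero value makes CF wait for the process. --->
<cfexecute name="c:\windows\system32\cmd.exe"
           arguments="/c c:\pdf\convert.bat #sheetName#"
           timeout="30"
           variable="convertOutput"></cfexecute>

<!--- Show whatever the batch file printed, for debugging --->
<cfoutput><pre>#HTMLEditFormat(convertOutput)#</pre></cfoutput>
```

Keeping the directory change and PATH setup inside the batch file means the same file can be tested from a command prompt first and then run unchanged from ColdFusion.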
 

Author Comment

by:RussoMA
ID: 16939175
OK, I have gotten it to work.
I had not formally installed Ghostscript on that machine (the command-line question from Plucka led me to this realization). I had only copied the files to a directory on the web server and had not let the installer register whatever it is that now makes it work. [A black box that I don't care to dive into today.]

But on some of the files, it doesn't show the PNG and instead shows a missing-background-image icon (IE is the browser used to test). If I refresh the page seconds later, the image is shown.

Is there a way to make cfexecute pause before displaying the rest of the page? Or is this a server issue, where the server may not recognize the newly created file fast enough to serve the new image on a cfinclude?



 
LVL 18

Expert Comment

by:Plucka
ID: 16939234
This will make ColdFusion sleep for a bit:

<cfset thread = CreateObject("java", "java.lang.Thread") />
<cfset thread.sleep(5000) />

Put these two lines after the <cfexecute> tag and before the display.

This is 5000 milliseconds, i.e. 5 seconds, so you can change it to whatever you like.

Requires CF MX or later.
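
A possible alternative to a fixed sleep, sketched under the assumption that the missing image comes from cfexecute returning before pdftohtml has finished. As far as I know, the timeout attribute of cfexecute defaults to 0, which starts the program without waiting for it, while a non-zero value (in seconds) makes ColdFusion wait for the program to end. The 30-second value and the sheetPath and convertLog names are illustrative, not from this thread.

```cfml
<cfset sheetPath = "c:\inetpub\wwwroot\ordersheet\" & rtrim(ordersheet)>

<!--- timeout="30": wait up to 30 seconds for pdftohtml (and Ghostscript) to finish
      before the rest of the page runs; the value is a guess, tune it to your files --->
<cfexecute name="c:\pdf\pdftohtml.exe"
           arguments="-noframes -c #sheetPath#.pdf #sheetPath#.html"
           timeout="30"
           variable="convertLog"></cfexecute>

<!--- Only go on to display the page if the HTML file was actually written --->
<cfif FileExists(sheetPath & ".html")>
    <!--- display the converted page here, e.g. with the existing cfinclude --->
<cfelse>
    <cfoutput>Conversion did not produce #HTMLEditFormat(sheetPath)#.html</cfoutput>
</cfif>
```

Waiting on the process itself avoids guessing how long a given PDF takes, whereas a fixed sleep can be too short for large files and needlessly long for small ones.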
 

Author Comment

by:RussoMA
ID: 16939281
That wraps this up. Thanks to the help, I got my answer, and I think this question will help others who seek to convert PDF to HTML on the fly. I am splitting the points because SidFishes led me to use cfexecute and Plucka helped me get it running, as well as with the pause (thanks).