Solved

pdf to html?

Posted on 2006-06-19
Last Modified: 2013-12-20
Running ColdFusion MX 7, with PHP extensions installed.

What I want to do is dynamically create an HTML document from a PDF file. I am curious whether there is a public API for this type of conversion. I know that Google uses this technology in Gmail for its "view as html" links. I have a standalone copy of the open-source pdftohtml project, which uses Ghostscript, but I want to eliminate that step. I have more than 250 ever-changing PDF files that I want to display on my site on request instead of using an embedded PDF. There is a reason: the print button points to a print version of the PDF, whereas the 'web' version has more content (links, reference materials, etc.) that I want to display as HTML without giving the user the ability to print.

So the question is: are there any server-side APIs that can be used to convert PDFs to HTML on demand? Any examples of use?

Most people want to go the other way, from HTML -> PDF (and I love the cfdocument feature for this)... I just want to go the opposite way.



Question by:RussoMA
9 Comments
 
LVL 36

Expert Comment

by:SidFishes
ID: 16937216
you might be able to use this

http://www.adobe.com/products/acrobat/access_onlinetools.html

in combination with cfhttp

(you'd have to check whether adobe's licensing would allow this tho...)
 
LVL 36

Accepted Solution

by:
SidFishes earned 250 total points
ID: 16937240
you also might get this (has a basic free version) to work with cfexecute...

http://www.pdf-to-html.com/details.html
 

Author Comment

by:RussoMA
ID: 16937713
I like the idea of using cfexecute. Here's the problem I am having with it. I send this:

<cfexecute name="c:\pdf\pdftohtml.exe" arguments="-noframes -c c:\inetpub\wwwroot\ordersheet\#rtrim(ordersheet)#.pdf c:\inetpub\wwwroot\ordersheet\#rtrim(ordersheet)#.html"></cfexecute>

The HTML file is created, but it does not invoke Ghostscript (http://sourceforge.net/projects/ghostscript) to create the background image.

pdftohtml 0.36 (http://pdftohtml.sourceforge.net/) uses Ghostscript to create the image used for the background of the single page when the -c option is given (complex output, instead of text only).

It seems that part of pdftohtml is not being invoked, because no PNG file is created.

Is there something I need to do to declare the path of the Ghostscript executable so that CF can call it while executing pdftohtml?

I logged the output and it looks the same as it does when it uses Ghostscript successfully (no error messages), just

Page-1

which it sends for each page that it converts.

suggestions?
 
LVL 18

Assisted Solution

by:Plucka
Plucka earned 250 total points
ID: 16938994
RussoMA,

Does it work from the command line?

Regards
Plucka
 

Author Comment

by:RussoMA
ID: 16938998
Yes, I can run it from the command line, and it creates a PNG file as the background image.
 
LVL 18

Expert Comment

by:Plucka
ID: 16939020
Ok,

You will probably find it has to do with paths etc.

Try creating a batch file (.bat) that you can run from the command line with no parameters and that works.

Then try running this from CFEXECUTE.

The benefit of this is that you can change directories etc. within the batch file.
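
For anyone following along, the batch-file suggestion might look roughly like this on the ColdFusion side. This is only a sketch under assumptions: `c:\pdf\convert.bat` is a hypothetical batch file (one that changes into the pdftohtml directory and runs the same conversion command), the `cmd.exe` location depends on the Windows install, and the 60-second timeout is an arbitrary value. Capturing the output in a variable makes it easier to see what the batch file actually did.

```cfm
<!--- Sketch only: c:\pdf\convert.bat is a hypothetical batch file, and the
      cmd.exe path and timeout value are assumptions to adjust for your server. --->
<cfexecute name="c:\windows\system32\cmd.exe"
           arguments="/c c:\pdf\convert.bat"
           timeout="60"
           variable="batchOutput"></cfexecute>

<!--- Show whatever the batch file printed, escaped for safe display. --->
<cfoutput><pre>#HTMLEditFormat(batchOutput)#</pre></cfoutput>
```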
 

Author Comment

by:RussoMA
ID: 16939175
OK, I have gotten it to work.
I had not formally installed Ghostscript on that machine (the command-line question from Plucka led me to this realization). I had only copied the files to a directory on the web server, so the installer had never registered whatever it is that now makes it work. [a black box that I don't care to dive into today]

But on some of the files, it doesn't show the PNG and instead shows a missing-background-image icon (IE is the browser used to test). If I refresh the page seconds later, the image is shown.

Is there a way to make cfexecute pause before displaying the rest of the page? Or is this a server issue, where the server does not recognize the newly created file fast enough for the image to be ready to serve on a cfinclude?



 
LVL 18

Expert Comment

by:Plucka
ID: 16939234
This will make ColdFusion sleep for a bit:

<cfset thread = CreateObject("java", "java.lang.Thread") />
<cfset thread.sleep(5000) />

Put these two lines after the <CFEXECUTE> tag and before the display.

This is 5000 milliseconds, thus 5 seconds, so you can change it to whatever you like.

Requires CF MX or later.
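
A possible alternative to a fixed sleep, sketched under some assumptions: cfexecute's timeout attribute defaults to 0, which is non-blocking, so the page can carry on while pdftohtml is still writing files. Giving it a non-zero timeout (in seconds) makes ColdFusion wait for the program, and a short polling loop can then cover any remaining delay. Note that `pngPath` below is a hypothetical file name (check what your pdftohtml version actually writes), and the timeout and wait values are arbitrary.

```cfm
<!--- Sketch only. Assumptions: pngPath is a hypothetical name for the
      background image, and the timeout and wait values are arbitrary. --->
<cfset baseName = "c:\inetpub\wwwroot\ordersheet\#rtrim(ordersheet)#">
<cfset pngPath = "#baseName#.png">

<!--- A non-zero timeout (in seconds) makes cfexecute wait for pdftohtml
      instead of returning immediately. --->
<cfexecute name="c:\pdf\pdftohtml.exe"
           arguments="-noframes -c #baseName#.pdf #baseName#.html"
           timeout="60"></cfexecute>

<!--- Poll for the image in short steps, giving up after maxWaitMs. --->
<cfset sleeper = CreateObject("java", "java.lang.Thread")>
<cfset maxWaitMs = 5000>
<cfset stepMs = 250>
<cfset waitedMs = 0>
<cfloop condition="NOT FileExists(pngPath) AND waitedMs LT maxWaitMs">
    <cfset sleeper.sleep(JavaCast("long", stepMs))>
    <cfset waitedMs = waitedMs + stepMs>
</cfloop>
```

The loop stops as soon as the file appears, so the page only waits as long as it has to. A file that exists is not necessarily fully written yet, so treat this as a starting point rather than a guarantee.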
 

Author Comment

by:RussoMA
ID: 16939281
That wraps this up. Thanks to the help, I got my answer, and I think this question will help others who want to convert PDF to HTML on the fly. I am splitting the points because SidFishes led me to use cfexecute, and Plucka helped me get it running and add the pause (thanks).