Script/module for Perl that will let us dynamically generate PDF files on our server

Hello,

We have been struggling with a few different Perl modules that are supposed to let us output PDF on our server.  We want our users to be able to generate content with lots of formatting and pictures and then convert it to a downloadable PDF.  So far, everything we have tried seems to have major problems handling any sort of formatting or graphics.  I've found quite a few commercial products out there, but we are looking for something free that we can configure ourselves.  Any suggestions?

Thanks!
Asked by: nhtahoe
 
Dennis Maeder commented:
With some configuration/mods this should work - http://www.rustyparts.com/pdf.php
This would be installable on your server.
D
 
ozo commented:
Adobe is the main source of commercial products for PDF.
You might output in, say, HTML or PostScript and use Adobe Acrobat to convert it to PDF.
 
nhtahoe (author) commented:
Adobe is the main source of commercial products, but I know it's not the only way to convert HTML or PostScript to PDF.  For instance, PDFCreator is available for free on SourceForge (http://sourceforge.net/projects/pdfcreator/) and it lets me print to PDF files on my local machine.

I'm not looking for a desktop solution; rather, I'm looking for something I can integrate directly with Perl on our server.  I want our script to grab HTML a user has generated and convert it to a PDF file.  I imagine it would work something like this:
1. User enters formatted text and attaches a few pictures in a form.
2. User submits form.
3. User wants PDF version of what they just submitted.
4. Our system takes the HTML and, using our Perl script, converts it into a PDF.  Maybe it will need to convert it to PostScript first, maybe not, as long as the end result is a typical PDF that keeps the original formatting.
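Whatever tool ends up doing the conversion in step 4, the last leg of this flow is handing the finished file to the browser as a download. A minimal sketch of that leg only, assuming the PDF has already been written to disk; `pdf_download_headers`, `send_pdf` and the path in the usage note are illustrative names, not part of any module mentioned in this thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the HTTP header block for a PDF download.
# $filename is the name the browser suggests in its "Save as" dialog.
sub pdf_download_headers {
    my ($filename, $length) = @_;
    $filename =~ s/[^A-Za-z0-9._-]/_/g;    # keep the header value simple and safe
    return "Content-Type: application/pdf\r\n"
         . "Content-Disposition: attachment; filename=\"$filename\"\r\n"
         . "Content-Length: $length\r\n"
         . "\r\n";
}

# Stream an already generated PDF file to the browser.
sub send_pdf {
    my ($path, $filename) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    binmode $in;
    binmode STDOUT;
    print pdf_download_headers($filename, -s $path);
    local $/ = \8192;                      # read in 8 KB chunks
    print while <$in>;
    close $in;
}
```

From the CGI script, the call would look like `send_pdf('/path/to/generated.pdf', 'page.pdf');` where the path is a placeholder for wherever the conversion step wrote its output. Note that the header is `application/pdf`, not `text/html`, and nothing else may be printed before it.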
 
Dennis Maeder commented:
Something like this?
http://html2pdf.seven49.net/
D
 
nhtahoe (author) commented:
http://html2pdf.seven49.net can output PDF files in its demo, so it's a good start.  But it's $499 for the most basic version, and the price goes up as you render more PDF files.  That's not good.  Like I said, we are looking for a free open source way to generate PDFs using Perl and some module(s).
 
nhtahoe (author) commented:
We are currently trying to use PDF::API2 to generate PDFs but are running into difficulties.

Does anybody have PDF::API2 sample code that actually works?
 
Perl_Diver commented:
 
Dennis Maeder commented:
http://www.rustyparts.com/pdf.php contains a working demo and is free!
D
 
nhtahoe (author) commented:
Dennis: Thanks for the suggestions, but rustyparts is written in PHP. We want a script in Perl.

Perl_Diver: We'll take a look at the links you sent in more detail later today.

Everyone else: I still haven't gotten any obvious solutions here.  Is there simply no good free way to generate PDF files?
 
ahoffmann commented:
>  Like I said, we are looking for a free open source way to generate PDFs
You need to drop one of these requirements:
   free
   PDF


> We want a script in PERL.
http://www.pdflib.de/
http://www.pdflib.com/de/download/pdflib-familie/pdflib-7/
http://search.cpan.org/search?query=pdflib&mode=all

 
Dennis Maeder commented:
I suppose you need denature then
http://sourceforge.net/projects/denature/
"denature is a perl program to convert HTML files to PDF files. It does this through a transformation to XSL-FO which is then passed to the FOP program, from xml.apache.org. "

D
 
ahoffmann commented:
> .. everything we have tried out seems to have major problems handling any sort of formatting or graphics.
dennis_maeder, name at least one product, whether free or commercial or whatever, which has *no* problems with graphics and/or styles :-/
BTW: FOP is Java (and some more), not Perl.
 
Dennis Maeder commented:
EasySoftware HTMLDOC - commercial ($69 one time)
http://www.easysw.com/htmldoc/
For instructions, see
http://www.suite101.com/article.cfm/perl/108695
Basically this lets you drive it from Perl through the HTML::HTMLDoc module
with a simple call.

See
http://www.easysw.com/htmldoc/pdf-o-matic.php
for a pretty flawless HTML to PDF demo.

D
 
nhtahoe (author) commented:
If you look at: http://www.easysw.com/htmldoc/pdf-o-matic.php
it bombs pretty badly on our site at www.paintscratch.com.

Also, this module: http://search.cpan.org/~audreyt/PDF-FromHTML-0.20/lib/PDF/FromHTML.pm
is pretty unstable. Even a <br> in an HTML doc causes it to fail.

What I really want is a working PDF::API2 code sample that lays out a whole page.
I can get this to work:

##############################
#!/usr/bin/perl
use strict;
use warnings;
use PDF::API2;

print "Content-type: text/html\n\n";

my $pdf = PDF::API2->new;
# Use one of the built-in core fonts
my $fnt = $pdf->corefont('Helvetica-Bold');
# Add an A4 page
my $page = $pdf->page;
$page->mediabox('A4');
# Draw a single line of 10pt text at x=100, y=700 (points from the bottom left)
my $str = qq~ here is some stuff Some sample text  ~;
my $gfx = $page->gfx;
$gfx->textlabel(100, 700, $fnt, 10, $str);

$pdf->saveas('/usr/local/etc/httpd/htdocs/you/pdf/sample.pdf');
$pdf->end;
#################################################

So I'm just trying to figure out how to take a page of HTML and render it. The program above only prints a single line of text.
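Getting from one line to a paragraph in PDF::API2 means doing the line breaking yourself, since the module places text exactly where it is told and does no layout. A rough sketch, assuming a greedy word wrap is good enough; `wrap_words`, the margins, the leading and the output filename are illustrative choices, and it does not handle running off the bottom of the page:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use PDF::API2;

# Greedy word wrap. $measure is a callback that returns the rendered
# width of a string, so the wrapping logic stays independent of the font.
sub wrap_words {
    my ($string, $max_width, $measure) = @_;
    my @lines;
    my $line = '';
    for my $word (split ' ', $string) {
        my $candidate = length($line) ? "$line $word" : $word;
        if (length($line) && $measure->($candidate) > $max_width) {
            push @lines, $line;      # current line is full, start a new one
            $line = $word;
        }
        else {
            $line = $candidate;
        }
    }
    push @lines, $line if length($line);
    return @lines;
}

my $pdf  = PDF::API2->new;
my $font = $pdf->corefont('Helvetica');
my $page = $pdf->page;
$page->mediabox('A4');                        # 595 x 842 points

my $text = $page->text;
$text->font($font, 10);

my $body = join ' ', ('Some sample text that is long enough to need wrapping.') x 12;
my ($x, $y, $leading) = (72, 770, 14);        # 72pt (1 inch) left margin
my $width = 595 - 2 * 72;                     # text column between the margins

for my $line (wrap_words($body, $width, sub { $text->advancewidth($_[0]) })) {
    $text->translate($x, $y);                 # move to the start of this line
    $text->text($line);
    $y -= $leading;
}

$pdf->saveas('sample.pdf');                   # hypothetical output path
```

This still only handles plain text. Anything that starts as HTML has to be parsed and turned into calls like these, which is the part PDF::API2 does not do.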
 
Dennis Maeder commented:
The reason your page renders badly on easysw is that it hits a noscript tag, and that's the way it looks in a browser with scripting disabled. Render google.com to see a relatively simple scenario with a graphic.
The underlying problem with PDF::API2 is that although it may be a good PDF writer, it has no HTML interpretation functionality. For that you need some sort of rendering engine like Gecko, which interprets the HTML and builds a representation of the page that can in turn be streamed to a PDF writer. One possibility is PDF::FromHTML.
This works for me for file input, but it cannot take a URL directly:
D


#!/usr/bin/perl
use strict;
use warnings;
use PDF::API2;
use PDF::FromHTML;
use CGI;
use CGI::Carp qw(fatalsToBrowser);   # show fatal errors in the browser while debugging

print "Content-type: text/html\n\n";
print "HTML2PDF<br>";
print "<a href='/tmp/sample.pdf'>view output</a>";

my $pdf = PDF::FromHTML->new( encoding => 'utf-8' );
# Passing the URL directly did not work, so load a local file instead
#$pdf->load_file('http://www.paintscratch.com/index.html');
$pdf->load_file('/var/www/htdocs/index.php');
$pdf->convert(
    # With PDF::API2, font names such as 'traditional' also work
    #Font       => 'font.ttf',
    Font        => 'traditional',
    LineHeight  => 10,
    Landscape   => 1,
);
$pdf->write_file('/var/www/htdocs/tmp/sample.pdf');
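One way around the URL limitation might be to fetch the page first and hand the HTML over as a string. This is only a sketch under two assumptions: that LWP::Simple is installed, and that your PDF::FromHTML version accepts a scalar reference in `load_file` (the module's synopsis shows this for later releases; I have not checked 0.20). `url_to_pdf` is an illustrative name:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);
use PDF::FromHTML;

# Fetch a page over HTTP and convert the returned HTML to a PDF file.
sub url_to_pdf {
    my ($url, $out_file) = @_;

    my $html = get($url);                    # returns undef on failure
    die "Could not fetch $url\n" unless defined $html;

    my $pdf = PDF::FromHTML->new( encoding => 'utf-8' );
    $pdf->load_file(\$html);                 # scalar reference instead of a filename (assumed supported)
    $pdf->convert(
        Font       => 'traditional',
        LineHeight => 10,
        Landscape  => 1,
    );
    $pdf->write_file($out_file);
    return $out_file;
}

# Command-line use: perl url2pdf.pl URL OUTPUT.pdf
url_to_pdf(@ARGV) if @ARGV == 2;
```

This only changes where the HTML comes from. The conversion itself is the same, so it will not render anything the file-based version cannot, and images referenced by relative URLs are unlikely to resolve.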

 
nhtahoe (author) commented:
I got PDF::FromHTML working, but it can only take very simple HTML. Very little formatting.
For example, if I try the code above (after changing the file paths as shown below), I get this in my error logs and the file is never even created:
Illegal division by zero at /usr/lib/perl5/site_perl/5.8.0/PDF/FromHTML/Twig.pm line 583.
Even if I strip out most of the formatting of a page, things like <br> still cause it to crash.

#!/usr/bin/perl
use strict;
use warnings;
use PDF::API2;
use PDF::FromHTML;
use CGI;
use CGI::Carp qw(fatalsToBrowser);   # show fatal errors in the browser while debugging

print "Content-type: text/html\n\n";
print "HTML2PDF<br>";
print "<a href='http://www.paintscratch.com/you/pdf/sample.pdf'>view output</a>";

my $pdf = PDF::FromHTML->new( encoding => 'utf-8' );
#$pdf->load_file('http://www.paintscratch.com/index.html');
$pdf->load_file('/usr/local/etc/httpd/htdocs/index.html');
$pdf->convert(
    # With PDF::API2, font names such as 'traditional' also work
    #Font       => 'font.ttf',
    Font        => 'traditional',
    LineHeight  => 10,
    Landscape   => 1,
);
$pdf->write_file('/usr/local/etc/httpd/htdocs/you/pdf/sample.pdf');
 
Dennis Maeder commented:
C'est la vie!
I'll follow up if there's any further light that dawns on my horizon.
D
 
ahoffmann commented:
> .. but it can only take very simple html. Very little formatting.
Didn't I tell you exactly that?

The following tools do a more or less good job, depending on your input file and on your requirements for the output quality:
  FOP 0.20
  FOP 0.92beta
  XSL Formatter http://www.antennahouse.com/
  XEP http://www.renderx.com/tools/xep.html
  PDFlib http://www.pdflib.de/

None is a Perl solution.

My experience is that none of the tools manages hyphenation and/or graphics and/or full CSS.
If you feed them exactly the same input, you get different output. You can improve the output by fiddling around with the XSLT files, which need to be adapted separately for each tool.

IMHO, the best compromise (quality, performance, cost) is FOP 0.20, but that's also the one which is trickiest to use, and it ignores most standards.
 
Dennis Maeder commented:
nhtahoe,

In practice, Adobe Acrobat does a brilliant job of exporting from IE and handles even www.paintscratch.com flawlessly, including form fields and hyperlinks. Again, not Perl and not free, but if you are not setting up an online factory, it's a good way to produce a PDF record.

D
 
nhtahoe (author) commented:
We ended up using http://www.rustyparts.com/pdf.php and got in touch with Jason Rust, the author, who was able to adapt his software to our needs.

I like using PDF::FromHTML for some of our simpler tasks, but I cannot figure out how to add a left margin. I don't think it has the capability.
 
Dennis Maeder commented:
For the record:
Doing it the Rust way involves a two-step process, which can be piped together
and uses two components: html2ps (which is all Perl)
    http://user.it.uu.se/~jan/html2ps.html
and ps2pdf, which is part of Ghostscript
    http://www.cs.wisc.edu/~ghost/

To make it one step:
    html2ps input.html  | ps2pdf - -  > output.pdf
where each filename should be fully qualified, as you don't have a user environment (like PATH) under CGI.

I have tested this with
    html2ps http://www.paintscratch.com/index.html   | ps2pdf - -  > /var/www/html/tmp/paint.pdf
which gives the anticipated noscript view. This behavior can be subverted, and margins adjusted, by hacking the html2ps Perl script.
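Driving those two steps from a Perl CGI script could look roughly like the sketch below. It assumes html2ps and ps2pdf live in /usr/bin (check your own paths) and uses a temporary PostScript file instead of the pipe, so that each command can be run without a shell. `conversion_commands` and `html_to_pdf` are illustrative names:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumed install locations; adjust to wherever the tools are on your server.
my $HTML2PS = '/usr/bin/html2ps';
my $PS2PDF  = '/usr/bin/ps2pdf';

# Return the two commands as argument lists, so no shell ever parses
# a user-supplied filename or URL.
sub conversion_commands {
    my ($source, $ps_file, $pdf_file) = @_;
    return (
        [ $HTML2PS, '-o', $ps_file, $source ],   # HTML (file or URL) -> PostScript
        [ $PS2PDF, $ps_file, $pdf_file ],        # PostScript -> PDF
    );
}

# Run both steps; dies if either tool reports a failure.
sub html_to_pdf {
    my ($source, $pdf_file) = @_;
    local $ENV{PATH} = '/usr/bin:/bin';          # CGI has no user PATH; ps2pdf must find gs
    my ($fh, $ps_file) = tempfile(SUFFIX => '.ps', UNLINK => 1);
    close $fh;
    for my $cmd (conversion_commands($source, $ps_file, $pdf_file)) {
        system(@$cmd) == 0
            or die "$cmd->[0] failed (status " . ($? >> 8) . ")\n";
    }
    return $pdf_file;
}

# Command-line use: perl html2pdf.pl INPUT.html OUTPUT.pdf
html_to_pdf(@ARGV) if @ARGV == 2;
```

The list form of `system` is the main design choice here: if the source is a URL or filename that came from a form, it is passed as a single argument rather than interpolated into a shell command line.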

D
 
Dennis Maeder commented:
Here's a more than useful alternative:

For Gecko rendering, download xulrunner from Mozilla
    http://developer.mozilla.org/en/docs/XULRunner
    http://wiki.mozilla.org/XUL:Home_Page
    http://ftp.mozilla.org/pub/mozilla.org/xulrunner/releases/1.8.0.4/linux-i686/en-US/xulrunner-1.8.0.4.en-US.linux-i686.tar.gz
Install using
    ./xulrunner --register-global
then download mozilla2ps from
    http://michele.pupazzo.org/mozilla2ps/download/mozilla2ps-0.3.xulapp
Follow the installation instructions at
    http://michele.pupazzo.org/mozilla2ps/
i.e.
    ./xulrunner --install-app mozilla2ps-0.3.xulapp
Alternatively, download mozilla2ps-0.3.xulapp into a convenient location, e.g. mozilla2ps, and
    unzip mozilla2ps-0.3.xulapp
as it is actually just a zip file.

To use it, run from that directory
     path_to/xulrunner --app application.ini  http://www.paintscratch.com/index.html  /tmp/paint.ps  ; \
    ps2pdf /tmp/paint.ps /var/www/html/tmp/paint.pdf;
or similar.

The output is just like the print preview of Firefox: a complete, full-graphics page.

D
