
Script/module for Perl that will let us dynamically generate PDF files on our server

Posted on 2006-11-15
Last Modified: 2013-12-20
Hello,

We have been struggling with a few different Perl modules that are supposed to let us output PDF on our server. We want our users to be able to generate content with lots of formatting and pictures and then convert it to a downloadable PDF. So far, everything we have tried seems to have major problems handling any sort of formatting or graphics. I've found quite a few commercial products, but we are looking for something free that we can configure ourselves. Any suggestions?

Thanks!
Question by:nhtahoe

Expert Comment

by:ozo
ID: 17954098
Adobe is the main source of commercial products for PDF.
You might output in, say, HTML or PostScript and use Adobe Acrobat to convert it to PDF.

Author Comment

by:nhtahoe
ID: 17954122
Adobe is the main source of commercial products, but I know it's not the only way to convert HTML or PostScript to PDF. For instance, PDFCreator is available for free on SourceForge (http://sourceforge.net/projects/pdfcreator/), and it allows me to print PDF files on my local machine.

I'm not looking for a desktop solution; rather, I'm looking for something I can integrate directly with Perl on our server. I want our script to be able to grab HTML a user has generated and convert it to a PDF file. I imagine it would work something like this:
1. User enters formatted text and attaches a few pictures in a form.
2. User submits the form.
3. User wants a PDF version of what they just submitted.
4. Our system takes the HTML and, using our Perl script, converts it into a PDF. Maybe it will need to convert it to PostScript first, maybe not, as long as the end result is a typical PDF that keeps the original formatting.
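A minimal sketch of steps 1 to 4, assuming PDF::FromHTML and CGI.pm are installed and that the user's HTML arrives in a form field named `html` (a hypothetical name). It converts entirely in memory by passing scalar references to `load_file()` and `write_file()`, then streams the result as a download; the filename `submission.pdf` is also just an example. As later replies in this thread show, PDF::FromHTML only copes with fairly simple markup, so treat this as the shape of the pipeline rather than a finished converter.

```perl
#!/usr/bin/perl
# Sketch: form HTML in, downloadable PDF out.
# Assumes CGI.pm and PDF::FromHTML are installed; the form field "html"
# and the download name "submission.pdf" are hypothetical.
use strict;
use warnings;
use CGI;
use PDF::FromHTML;

# Convert an HTML string to PDF bytes without touching the disk.
sub html_to_pdf_bytes {
    my ($html) = @_;
    my $pdf = PDF::FromHTML->new( encoding => 'utf-8' );
    $pdf->load_file( \$html );          # scalar ref: the HTML is in this string
    $pdf->convert( LineHeight => 10 );
    my $out = '';
    $pdf->write_file( \$out );          # scalar ref: receives the PDF bytes
    return $out;
}

# Only act as a CGI script when a web server invoked us.
if ( $ENV{GATEWAY_INTERFACE} ) {
    my $q    = CGI->new;
    my $html = $q->param('html');
    $html = '' unless defined $html;
    my $bytes = html_to_pdf_bytes($html);
    binmode STDOUT;
    print $q->header(
        -type       => 'application/pdf',
        -attachment => 'submission.pdf',
    );
    print $bytes;
}
```

Keeping the conversion in a subroutine means the same code can later be pointed at a different converter without touching the CGI part.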

Assisted Solution

by:dennis_maeder
dennis_maeder earned 500 total points
ID: 17958077
Something like this?
http://html2pdf.seven49.net/
D

Accepted Solution

by:dennis_maeder
ID: 17958155
With some configuration/mods this should work - http://www.rustyparts.com/pdf.php
This would be installable on your server.
D

Author Comment

by:nhtahoe
ID: 17958318
http://html2pdf.seven49.net can output PDF files in its demo, so it's a good start. But it's $499 for the most basic version, and the price goes up as you render more PDF files. That's not good. Like I said, we are looking for a free, open source way to generate PDFs using Perl and some module(s).

Author Comment

by:nhtahoe
ID: 17958338
We are currently trying to use PDF::API2 to generate PDFs but are running into difficulties.

Does anybody have PDF::API2 sample code that actually works?

Expert Comment

by:Perl_Diver
ID: 17958469

Assisted Solution

by:dennis_maeder
ID: 17958535
http://www.rustyparts.com/pdf.php contains a working demo and is free!
D

Author Comment

by:nhtahoe
ID: 17959660
Dennis: Thanks for the suggestions, but rustyparts is written in PHP. We want a script in Perl.

Perl_Diver: We'll take a look at the links you sent in more detail later today.

Everyone else: I still haven't gotten any obvious solutions here. Is there simply no good, free way to generate PDF files?

Expert Comment

by:ahoffmann
ID: 17960578
>  Like I said, we are looking for a free open source way to generate pdf's
you need to get rid of one of these requirements:
   free
   pdf


> We want a script in PERL.
http://www.pdflib.de/
http://www.pdflib.com/de/download/pdflib-familie/pdflib-7/
http://search.cpan.org/search?query=pdflib&mode=all


Assisted Solution

by:dennis_maeder
ID: 17960990
I suppose you need denature then
http://sourceforge.net/projects/denature/
"denature is a perl program to convert HTML files to PDF files. It does this through a transformation to XSL-FO which is then passed to the FOP program, from xml.apache.org. "

D

Expert Comment

by:ahoffmann
ID: 17961036
> .. everything we have tried out seems to have major problems handling any sort of formatting or graphics.
dennis_maeder, name at least one product, whether free or commercial or whatever, that has *no* problems with graphics and/or styles :-/
BTW: FOP is Java (and some more), not Perl.

Assisted Solution

by:dennis_maeder
ID: 17961272
EasySoftware - commercial ($69 one time)
http://www.easysw.com/htmldoc/
See the following for instructions:
http://www.suite101.com/article.cfm/perl/108695
Basically, this allows use from Perl via HTML::HTMLDoc
with a simple call.
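For reference, a minimal HTML::HTMLDoc call looks roughly like this. It is a sketch assuming both the module and the `htmldoc` program it drives are installed; the HTML string and the output path `/tmp/htmldoc-sample.pdf` are placeholders.

```perl
#!/usr/bin/perl
# Sketch only: assumes HTML::HTMLDoc and the htmldoc program are installed.
# The HTML and the output path are placeholders.
use strict;
use warnings;
use HTML::HTMLDoc;

my $htmldoc = HTML::HTMLDoc->new();

# Feed HTML directly; set_input_file() is the alternative for a file on disk.
$htmldoc->set_html_content(
    '<html><body><h1>Sample</h1><p>Some sample text</p></body></html>'
);

my $pdf = $htmldoc->generate_pdf();    # runs htmldoc and captures the PDF
die "htmldoc produced no PDF\n" unless $pdf;
$pdf->to_file('/tmp/htmldoc-sample.pdf');
```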

See
http://www.easysw.com/htmldoc/pdf-o-matic.php
for a pretty flawless html to pdf demo

D

Author Comment

by:nhtahoe
ID: 17961423
If you look at: http://www.easysw.com/htmldoc/pdf-o-matic.php
it bombs pretty well on our site at www.paintscratch.com.

Also, this module: http://search.cpan.org/~audreyt/PDF-FromHTML-0.20/lib/PDF/FromHTML.pm
is pretty unstable. Even a <br> in an HTML document causes it to fail.

What I really want is a working PDF::API2 code sample that makes a page.
I can get this to work:

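#################################################
#!/usr/bin/perl
use strict;
use warnings;
use PDF::API2;

print "Content-type: text/html\n\n";

    my $pdf = PDF::API2->new;
    # One of the 14 standard PDF fonts, so no font file is needed
    my $fnt = $pdf->corefont('Helvetica-Bold');
    # A single A4 page
    my $page = $pdf->page;
    $page->mediabox('A4');
    # textlabel(x, y, font, size, text) places one line of text; units are points
    my $str = qq~ here is some stuff Some sample text  ~;
    my $gfx = $page->gfx;
    $gfx->textlabel(100, 700, $fnt, 10, $str);

    # saveas() writes the file; end() frees PDF::API2's internal structures
    $pdf->saveas('/usr/local/etc/httpd/htdocs/you/pdf/sample.pdf');
    $pdf->end;

print "PDF written<br>\n";
#################################################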

So I'm just trying to figure out how to take a page of HTML and make it work. The program above only prints a single line of text.
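Because PDF::API2 is only a PDF writer, getting more than one line means positioning each line yourself. A minimal sketch, assuming PDF::API2 is installed; the sample lines, the 72-point left margin, the 14-point line spacing and the output path are arbitrary choices for illustration:

```perl
#!/usr/bin/perl
# Sketch: several lines of text on one A4 page with PDF::API2.
# The text, margin, spacing and output path are illustrative placeholders.
use strict;
use warnings;
use PDF::API2;

my @lines = (
    'Here is some stuff.',
    'Some sample text on a second line.',
    'And a third line, 14 points further down.',
);

my $pdf  = PDF::API2->new;
my $font = $pdf->corefont('Helvetica');
my $page = $pdf->page;
$page->mediabox('A4');                 # 595 x 842 points

my $text = $page->text;                # text content object for this page
$text->font( $font, 10 );

my ( $x, $y ) = ( 72, 770 );           # 1 inch left margin, near the top
for my $line (@lines) {
    $text->translate( $x, $y );        # move to the start of this line
    $text->text($line);
    $y -= 14;                          # fixed line spacing, in points
}

$pdf->saveas('/tmp/api2-lines.pdf');
```

Pictures can be placed the same way through a graphics object, but wrapping, flow and styling still have to be computed by hand, which is why arbitrary user HTML needs a renderer in front of the PDF writer.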

Assisted Solution

by:dennis_maeder
ID: 17962110
The reason your page renders badly on easysw is that it hits a noscript tag, and that's how it looks in a browser with scripting disabled. Render google.com to see a relatively simple scenario with graphics.
The underlying problem with PDF::API2 is that although it may be a good PDF writer, it has no HTML interpretation functionality. For that you need some sort of rendering engine like Gecko, which interprets HTML and builds a representation of the page that can in turn be streamed to a PDF writer. One possibility is PDF::FromHTML;
this works for me with file input but cannot take a URL directly:
D


#!/usr/bin/perl
use strict;
use warnings;
use PDF::FromHTML;                   # uses PDF::API2 underneath
use CGI::Carp qw(fatalsToBrowser);   # show fatal errors in the browser

print "Content-type: text/html\n\n";
print "HTML2PDF<br>";
print "<a href='/tmp/sample.pdf'>view output</a>";

    my $pdf = PDF::FromHTML->new( encoding => 'utf-8' );
    # load_file() takes a local file name (or a scalar ref), not a URL:
    #$pdf->load_file('http://www.paintscratch.com/index.html');
    # Note: a .php file is read here as raw source, not run through PHP
    $pdf->load_file('/var/www/htdocs/index.php');
    $pdf->convert(
        # With PDF::API2, font names such as 'traditional' also work
        #        Font        => 'font.ttf',
        Font        => 'traditional',
        LineHeight  => 10,
        Landscape   => 1,
    );
    $pdf->write_file('/var/www/htdocs/tmp/sample.pdf');
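Since `load_file()` accepts either a file name or a reference to a string holding HTML, one way around the URL limitation is to fetch the page first. A sketch assuming LWP::Simple and PDF::FromHTML are installed; `url_to_pdf` is a hypothetical helper name, not part of either module:

```perl
#!/usr/bin/perl
# Sketch: fetch a page, then hand the HTML to PDF::FromHTML as a string.
# Assumes LWP::Simple and PDF::FromHTML are installed; url_to_pdf() is a
# hypothetical helper.
use strict;
use warnings;
use LWP::Simple qw(get);
use PDF::FromHTML;

sub url_to_pdf {
    my ( $url, $outfile ) = @_;
    my $html = get($url);                 # undef on any fetch error
    die "Could not fetch $url\n" unless defined $html;

    my $pdf = PDF::FromHTML->new( encoding => 'utf-8' );
    $pdf->load_file( \$html );            # scalar ref: HTML held in memory
    $pdf->convert( LineHeight => 10 );
    $pdf->write_file($outfile);
    return $outfile;
}

# Command-line use: perl url2pdf.pl URL OUTPUT.pdf
url_to_pdf(@ARGV) if @ARGV == 2;
```

For example, `perl url2pdf.pl http://www.paintscratch.com/index.html /var/www/htdocs/tmp/sample.pdf` (the script name is hypothetical). The fetched page still has to be simple enough for PDF::FromHTML, and no JavaScript is run.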


Author Comment

by:nhtahoe
ID: 17962173
I got PDF::FromHTML working, but it can only take very simple HTML with very little formatting.
For example, if I try the code above (after changing the file paths as shown below), I get this in my error logs, and the file is never even created.
Even if I strip out most of the formatting of a page, things like <br> still cause it to crash.
Illegal division by zero at /usr/lib/perl5/site_perl/5.8.0/PDF/FromHTML/Twig.pm line 583.

#!/usr/bin/perl
use strict;
use warnings;
use PDF::FromHTML;                   # uses PDF::API2 underneath
use CGI::Carp qw(fatalsToBrowser);   # show fatal errors in the browser

print "Content-type: text/html\n\n";
print "HTML2PDF<br>";
print "<a href='http://www.paintscratch.com/you/pdf/sample.pdf'>view output</a>";

    my $pdf = PDF::FromHTML->new( encoding => 'utf-8' );
    # load_file() takes a local file name (or a scalar ref), not a URL:
    #$pdf->load_file('http://www.paintscratch.com/index.html');
    $pdf->load_file('/usr/local/etc/httpd/htdocs/index.html');
    $pdf->convert(
        # With PDF::API2, font names such as 'traditional' also work
        #        Font        => 'font.ttf',
        Font        => 'traditional',
        LineHeight  => 10,
        Landscape   => 1,
    );
    $pdf->write_file('/usr/local/etc/httpd/htdocs/you/pdf/sample.pdf');

Assisted Solution

by:dennis_maeder
ID: 17962486
C'est la vie!
I'll follow up if there's any further light that dawns on my horizon.
D

Expert Comment

by:ahoffmann
ID: 17967532
> .. but it can only take very simple html. Very little formatting.
Didn't I tell you exactly that?

The following tools do a more or less good job, depending on your input file and on your requirements for output quality:
  FOP 0.20
  FOP 0.92beta
  XSL Formatter http://www.antennahouse.com/
  XEP http://www.renderx.com/tools/xep.html
  PDFlib http://www.pdflib.de/

None of them is a Perl solution.

My experience is that none of the tools handles hyphenation, graphics, and full CSS completely.
If you feed them exactly the same input, you get different output. You can improve the output by fiddling around with the XSLT files, which need to be adapted separately for each tool.

IMHO, the best compromise (quality, performance, costs) is FOP 0.20, but that's also the one that is trickiest to use, and it ignores most standards.

Assisted Solution

by:dennis_maeder
ID: 17968133
nhtahoe,

In practice, Adobe Acrobat does a brilliant job of exporting from IE and handles even www.paintscratch.com flawlessly, including fields and hyperlinks. Again, it's not Perl and not free, but if you are not setting up an online factory, it's a good way to produce a PDF record.

D

Author Comment

by:nhtahoe
ID: 18007857
We ended up using http://www.rustyparts.com/pdf.php and got in touch with Jason Rust, the author, and he was able to adapt his software to our needs.

I like using PDF::FromHTML for some of our simpler tasks, but I cannot figure out how to put a left margin in. I think it doesn't have the capability.

Assisted Solution

by:dennis_maeder
ID: 18019535
For the record:
Doing it the Rust way involves a two-step process, which can be piped together,
using two components: html2ps (which is all Perl)
    http://user.it.uu.se/~jan/html2ps.html
and ps2pdf, which is part of Ghostscript
    http://www.cs.wisc.edu/~ghost/

To make it one step:
    html2ps input.html | ps2pdf - - > output.pdf
where each path should be fully qualified, as you don't have the user environment (such as PATH) under CGI.

I have tested this as
    html2ps http://www.paintscratch.com/index.html | ps2pdf - - > /var/www/html/tmp/paint.pdf
with the anticipated noscript view. This behavior can be worked around, and the margins adjusted, by hacking the html2ps Perl script.
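The same two steps can be driven from a Perl CGI script. A sketch, assuming `html2ps` and `ps2pdf` live in `/usr/bin` (adjust for your server) and that html2ps's `-o` option writes the PostScript to the named file. It uses a temporary `.ps` file instead of a pipe so that each step's exit status can be checked on its own:

```perl
#!/usr/bin/perl
# Sketch: html2ps -> ps2pdf with full program paths, since CGI has no login
# PATH. The /usr/bin locations are assumptions; adjust them for your server.
use strict;
use warnings;
use File::Temp qw(tempfile);

my $HTML2PS = '/usr/bin/html2ps';
my $PS2PDF  = '/usr/bin/ps2pdf';

# Build both command lines as lists, so no shell quoting is involved.
sub conversion_commands {
    my ( $input, $ps_file, $pdf_file ) = @_;
    return (
        [ $HTML2PS, '-o', $ps_file, $input ],   # HTML file or URL -> PostScript
        [ $PS2PDF, $ps_file, $pdf_file ],       # PostScript -> PDF
    );
}

sub html_to_pdf {
    my ( $input, $pdf_file ) = @_;
    # Absolute temp path in the system temp dir, deleted at exit.
    my ( $fh, $ps_file ) = tempfile( SUFFIX => '.ps', UNLINK => 1 );
    close $fh;
    for my $cmd ( conversion_commands( $input, $ps_file, $pdf_file ) ) {
        system(@$cmd) == 0
            or die "Command failed ($?): @$cmd\n";
    }
    return $pdf_file;
}

# Command-line use: perl html2pdf.pl INPUT OUTPUT.pdf
html_to_pdf(@ARGV) if @ARGV == 2;
```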

D

Assisted Solution

by:dennis_maeder
ID: 18021733
Here's a more than useful alternative:

For a Gecko renderer, download XULRunner from Mozilla:
    http://developer.mozilla.org/en/docs/XULRunner
    http://wiki.mozilla.org/XUL:Home_Page
    http://ftp.mozilla.org/pub/mozilla.org/xulrunner/releases/1.8.0.4/linux-i686/en-US/xulrunner-1.8.0.4.en-US.linux-i686.tar.gz
Install using
    ./xulrunner --register-global
then download mozilla2ps from
    http://michele.pupazzo.org/mozilla2ps/download/mozilla2ps-0.3.xulapp
Follow the installation instructions at
    http://michele.pupazzo.org/mozilla2ps/
i.e.
    ./xulrunner --install-app mozilla2ps-0.3.xulapp
Alternatively, download mozilla2ps-0.3.xulapp into a convenient location, e.g. mozilla2ps, and
    unzip mozilla2ps-0.3.xulapp
as it is actually just a zip file.

To use it, run from that directory:
    path_to/xulrunner --app application.ini http://www.paintscratch.com/index.html /tmp/paint.ps ; \
    ps2pdf /tmp/paint.ps /var/www/html/tmp/paint.pdf;
or similar.

The output is just like the print preview of firefox - a complete full graphics page.
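A comparable Perl sketch for the XULRunner route, wrapping the two commands above. The install locations `/opt/xulrunner`, `/opt/mozilla2ps` and `/usr/bin/ps2pdf` are hypothetical placeholders, and the output path should be absolute because the script first changes into the mozilla2ps directory, as the instructions above do:

```perl
#!/usr/bin/perl
# Sketch: XULRunner + mozilla2ps renders the page to PostScript, then
# ps2pdf converts it. All three install paths are hypothetical placeholders.
use strict;
use warnings;
use File::Temp qw(tempfile);

my $APP_DIR   = '/opt/mozilla2ps';            # where the .xulapp was unzipped
my $XULRUNNER = '/opt/xulrunner/xulrunner';
my $PS2PDF    = '/usr/bin/ps2pdf';

# Both command lines as lists, so no shell quoting is needed.
sub render_commands {
    my ( $url, $ps_file, $pdf_file ) = @_;
    return (
        [ $XULRUNNER, '--app', "$APP_DIR/application.ini", $url, $ps_file ],
        [ $PS2PDF, $ps_file, $pdf_file ],
    );
}

sub url_to_pdf {
    my ( $url, $pdf_file ) = @_;              # $pdf_file must be absolute
    my ( $fh, $ps_file ) = tempfile( SUFFIX => '.ps', UNLINK => 1 );
    close $fh;
    chdir $APP_DIR or die "Cannot chdir to $APP_DIR: $!\n";
    for my $cmd ( render_commands( $url, $ps_file, $pdf_file ) ) {
        system(@$cmd) == 0 or die "Command failed ($?): @$cmd\n";
    }
    return $pdf_file;
}

# Command-line use: perl moz2pdf.pl URL /absolute/path/OUTPUT.pdf
url_to_pdf(@ARGV) if @ARGV == 2;
```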

D
