Stefan Lennerbrant


Professional and secure conversion of html to pdf, running on backoffice server

I'm creating large volumes of PDF form documents, by using one single PDF template including form fields and then, for each resulting document, creating a small FDF file with data content and finally running the tool "pdftk" to populate the template with content.
This works very well, when the template is "static" (all field sizes are determined in advance) and when all data is more or less text-only.
The PDF template can, of course, be very complex in layout, initially created as "print to pdf" from any application (like MSWord) and then "prepared" by putting form fields into it with Adobe Pro
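The FDF-plus-pdftk step described above could be sketched roughly like this in Python. This is only an illustration, not my actual application code: the function names, field names and file paths are made up, and the optional "flatten" step is an assumption about what one might want in the output.

```python
import subprocess
from pathlib import Path


def fdf_escape(value: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_fdf(fields: dict[str, str]) -> str:
    """Return a minimal FDF document assigning a value to each form field."""
    entries = "\n".join(
        f"<< /T ({fdf_escape(name)}) /V ({fdf_escape(value)}) >>"
        for name, value in fields.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        f"<< /FDF << /Fields [\n{entries}\n] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )


def pdftk_command(template: str, fdf: str, output: str, flatten: bool = False) -> list[str]:
    """Build the pdftk call that merges the FDF data into the template."""
    command = ["pdftk", template, "fill_form", fdf, "output", output]
    if flatten:
        # Assumption: flattening (making the fields non-editable) is optional.
        command.append("flatten")
    return command


def fill_template(template: str, fields: dict[str, str], output: str) -> None:
    """Write the FDF next to the output file and run pdftk (must be on PATH)."""
    fdf_path = Path(output).with_suffix(".fdf")
    # Latin-1 keeps the sketch simple; other characters would need more care.
    fdf_path.write_text(build_fdf(fields), encoding="latin-1")
    subprocess.run(pdftk_command(template, str(fdf_path), output), check=True)
```

A call would look something like fill_template("template.pdf", {"customer_name": "Test"}, "result.pdf"), where both file names and the field name are hypothetical.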

However, now I would like to insert more complex data into the document, like graphs and images. Also, it might be interesting to add "free-flowing text" where the field length is not known in advance and the document layout should change to accommodate the text.

Thus, I'm looking at creating my template in HTML instead, and then running an HTML-to-PDF converter to create the resulting PDF document. That means no longer using any PDF "template file" at all, and no FDF files either.

I now write this, partly to ask if anyone has got any experience with these tools, and partly to report on my findings.
I will be testing a number of tools, and try to find one that is secure and efficient.

The system will run in a "production environment" (Windows 2008), and it is absolutely necessary that there is no hassle with large/strange frameworks (like php, for instance) or complex third-party dependencies.
Or, God forbid, any dependency on client software to be installed on the server, like MSOffice or browsers, etc. Not even speaking about adware :-)
Of course, it is quite OK if the tools cost money :-)

I'm planning to create html files from my application, and then run the tool stand-alone (command-line), creating the resulting PDF document.
It is, however, perfectly OK (and maybe a nice feature) to run the tool as a library as well, and create my own "standalone converter program" to perform the actual conversion. That way, it might be possible to handle limitations in the tool and solve any oddities that may be triggered by strange html input (just looking at MSWord HTML output, for instance, it's obvious that not all renderers can cope with the markup)

My initial list of tools is the following, below.
If anyone has got any recommendation or ideas, or warnings, any such input would be greatly appreciated.
If any of these tools are known to be "no good" (in your experience) for this use, it would be great to hear about it - and then not needing to test it...

Any tools that strike you as "really good" or "not working like this"?
Thanks :-)
quizwedge

I've used WebSupergoo ABCpdf in a .NET environment and it works well enough. We used the library and from what I remember, there was a bit of setup, but it's been working well for years. I haven't used a recent version; this is from a project with few updates since 2007 or so.

In a linux environment, I've used wkhtmltopdf. It's a bit quirky - I've found in setting it up, it would just fail because the document went out of margins or the command line parameters were out of order. It's working extremely well now and was the only thing that worked well for us in our linux / PHP environment.

My impression of HTML to PDF is that there's going to be setup / pain in getting it up and running. The trick is to find something that is solid once you get it set up.
Stefan Lennerbrant


My first impressions are the following. Please don't hesitate to add any comments :-)

To me, "wkhtmltopdf" seems to be the best choice, with "winnovative" as possible alternative
Unfortunately, "websupergoo/abcpdf" couldn't be tested
  Seems to build on wkhtmltopdf, identical results (regarding conversion to PDF)
  OpenSource, based on WebKit
  Almost acceptable results, lots of configuration settings, seems to be an active community/development
  Ugly kerning in fonts (probably handled with better settings for font selections)
  Library, tested with its demo-app GUI
  Good font use
  Wrong margins/output size  as default (probably possible to handle with settings)
  Library, tested with its demo-app GUI
  Good font use
  Quite slow conversion
  Problems with "fatal errors" during conversion, worked once but not repeated testing
  Library, tested with its demo-app GUI
  Very slow in conversion
  Lots of layout errors in output, elements get completely wrong heights etc, not acceptable results
  Seems to be completely identical to winnovative-software
  Not tested, needs JRE in the production environment
  Was "the best" years ago, but has been sleeping since 2006 up to recently
  Seems to lack support for many html features
  Layout is without styling, not acceptable results
  Library, tested with its demo-app GUI
  Good fonts etc
  Problems with element heights
  Installed as a COM/ActiveX component, seems to be aimed at ASP and similar environments
  Not tested due to COM base. Also no demo supplied
  Not tested. No demo supplied
I would say your thoughts on wkhtmltopdf are correct. Once you've setup your options, the one thing we found in our linux environment was that the process would stay in memory and we had to come around every few minutes and kill the process. I don't know that that is normal and I don't know if it would happen on Windows. Switching to a faster server with an SSD drive dramatically increased performance of building the PDFs.
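For the "process stays in memory" problem, one possible approach is to launch the converter with a time limit so that a hung process gets killed automatically instead of by hand. A rough Python sketch follows. It assumes wkhtmltopdf is on the PATH; the function names and the 60-second limit are made up for illustration. Options are placed before the input and output files, since argument order seemed to matter.

```python
import subprocess


def wkhtmltopdf_command(html_path: str, pdf_path: str, page_size: str = "A4") -> list[str]:
    """Build the command line: options first, then input file, then output file."""
    return ["wkhtmltopdf", "--quiet", "--page-size", page_size, html_path, pdf_path]


def run_with_timeout(command: list[str], timeout_seconds: float = 60.0) -> bool:
    """Run the command; return True on exit code 0, False on failure or timeout."""
    try:
        result = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception,
        # so no converter process is left behind.
        return False
    return result.returncode == 0
```

Something like run_with_timeout(wkhtmltopdf_command("in.html", "out.pdf")) would then either produce the PDF or give up after the limit. Whether the hang happens on Windows at all is still unknown.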
Some more testing today on wkhtmltopdf identified font selection as a problem area.
I'm sure it can be fixed, but I need to find out what fonts should be specified in the html input, to make wkhtmltopdf really happy.

  font-family:"Courier New"
does not work very well at all. It's converted to slightly different fonts and very different font sizes.
(currently, Arial/Courier New are specified in the MSWord html output, as these two Windows fonts are commonly used in the original document)

Any idea on what sansserif and serif fonts would make life easier for wkhtmltopdf? Especially in a Windows environment?
I just found out that "Verdana" makes kerning work much better in this environment, than with "Arial"
However, font sizes are still too big in the wkhtmltopdf output

Comparing the font use in Word with the corresponding Word "print to PDF writer (Adobe)" output and the wkhtmltopdf output (text "properties" checked with Adobe Pro in both PDFs), I get the following.

Word: Arial (bold) 8pt
Word-PDF: Arial,Bold 7.98pt
wk-PDF original: ArialBold 10.08pt (using style='font-family: "Arial"; font-size: 8.0pt')
wk-PDF modif: VerdanaBold 9.36pt (using style='font-family: "Verdana"; font-size: 8.0pt')

Word: Arial (normal) 10pt
Word-PDF: Arial 10.02pt
wk-PDF original: ArialNormal 11.52pt (using style='font-family: "Arial"; font-size: 10.0pt')
wk-PDF modif: VerdanaNormal 11.52pt (using style='font-family: "Verdana"; font-size: 10.0pt')

Word: Courier New (normal) 8pt
Word-PDF: CourierNew 7.98pt
wk-PDF original: CourierNewNormal 10.08pt (using style='font-family:"Courier New"; font-size: 8.0pt')
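
To put numbers on the difference, here is a small Python sketch that computes the enlargement factor for each measurement listed above. The helper names are made up. The check against 0.72pt steps is just an observation on these few values (they all happen to be whole multiples of 0.72pt), and reading anything into that about how sizes get rounded is only a guess.

```python
# (css_pt, pdf_pt) pairs copied from the wkhtmltopdf measurements above.
measurements = {
    "Arial bold 8pt": (8.0, 10.08),
    "Verdana bold 8pt": (8.0, 9.36),
    "Arial 10pt": (10.0, 11.52),
    "Verdana 10pt": (10.0, 11.52),
    "Courier New 8pt": (8.0, 10.08),
}


def enlargement(css_pt: float, pdf_pt: float) -> float:
    """Factor by which the rendered size exceeds the requested size."""
    return pdf_pt / css_pt


def units_of(pdf_pt: float, step: float = 0.72) -> float:
    """Express a PDF font size as a number of `step`-sized units."""
    return round(pdf_pt / step, 3)


for label, (css_pt, pdf_pt) in measurements.items():
    factor = enlargement(css_pt, pdf_pt)
    print(f"{label}: x{factor:.3f}, {units_of(pdf_pt)} steps of 0.72pt")
```

The factors come out between about 1.15 and 1.26, so there is no single constant that explains all rows.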

Hm, how to get smaller (correct) font size?
I'm struggling to find better fonts (or font solutions) to get the same result with wkhtmltopdf as I get with "save as PDF" from MSWord.

The original (in MSWord) uses Arial 10pt just because this is considered the best setup.
Using Arial really makes kerning go berserk when rendering to PDF with wkhtmltopdf. And changing to Verdana is not good either - even though the fonts are pretty, the size of Arial and Verdana differs (as well as "look and feel") too much.
And compensating with "decrease with x% using css" seems really complicated, considering all the possible styles used in different documents.
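
For reference, the "decrease with x%" idea could be done as a preprocessing step on the HTML before conversion, roughly as in this Python sketch. The function name and the factor are hypothetical. It only touches "font-size" declarations given in pt or px, so sizes set in other ways (keywords, em, legacy font tags) would be missed, which is part of why this gets complicated.

```python
import re

# Matches declarations such as "font-size: 8.0pt" or "font-size:12px".
FONT_SIZE = re.compile(r"(font-size\s*:\s*)(\d+(?:\.\d+)?)(pt|px)", re.IGNORECASE)


def scale_font_sizes(html: str, factor: float) -> str:
    """Multiply every pt/px font-size value in the markup by `factor`."""

    def shrink(match: re.Match) -> str:
        prefix, number, unit = match.groups()
        return f"{prefix}{float(number) * factor:.1f}{unit}"

    return FONT_SIZE.sub(shrink, html)
```

For example, scale_font_sizes(html, 0.8) would turn "font-size: 10.0pt" into "font-size: 8.0pt". A single factor is unlikely to hit the target sizes exactly.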

In addition, all fonts get like 20% larger (regardless of font) when comparing MSWord and wkhtmltopdf output. All MSWord 10pt texts are rendered with about 10pt in the MSWord Adobe/PDF output, but about 12pt in the wkhtmltopdf output.
All other "sizes" (layout boxes etc) are properly rendered. It's "only" the fonts that get too large.

I'll continue investigating and searching the internet - but any input from wkhtmltopdf-experienced people is appreciated
I've been struggling to find info on the internet, but to no real avail.

Testing shows that (in a wkhtmltopdf Windows environment)
- a css "Verdana" font is rendered in PDF as VerdanaRegular with font size conversions like:
    all css sizes 6.4 to 7.8pt converts to PDF 8.64pt
    all css sizes 7.9 to 8.6pt converts to PDF 9.36pt, etc etc for other sizes (about 20% too large)
- same for Arial, CourierNew, TimesNewRoman etc, but slightly different "ranges" and conversions
- same (too large) conversion happens when using css "px" sizes instead of "pt"
- MSWord "print to PDF using Adobe drivers" uses different fonts and gives much, much better sizes (10pt results in 10.02pt)
Unfortunately, I'm guessing it's not an issue of finding the perfect font. wkhtmltopdf makes fonts bigger. While there's a lot out there on wkhtmltopdf, I've found that it's hard to find the specific answer you happen to be looking for at the time. If having it work exactly like your current setup is critical, you may want to look at a different solution.

I found that ABCpdf does offer a 30 day demo: You may want to try that.
I haven't tested ABCpdf as I hadn't time to build a demo myself, but reading the details I see that it contains a quite limited HTML parsing engine itself, and otherwise requires a browser (InternetExplorer or Firefox) to be installed on the server, using the browser to render the HTML page.
Perhaps not a very suitable and secure environment on production servers:-)
Thanks for all the input.
I went for wkhtmltopdf. A bit quirky and font handling is so-so, but it works.
Peter Jhon

I found the link really useful, but I ran into problems with the conversion when the document has tables and graphs. I created a document for use at a conference, but something went wrong and my source document was lost, and I could not extract the tables as they were designed.