Professional and secure conversion of HTML to PDF, running on a back-office server

Posted on 2014-11-17
Last Modified: 2016-01-13
I'm creating large volumes of PDF form documents, by using one single PDF template including form fields and then, for each resulting document, creating a small FDF file with data content and finally running the tool "pdftk" to populate the template with content.
This works very well, when the template is "static" (all field sizes are determined in advance) and when all data is more or less text-only.
The PDF template can, of course, be very complex in layout, initially created as "print to pdf" from any application (like MSWord) and then "prepared" by putting form fields into it with Adobe Pro.
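
For readers unfamiliar with this workflow, here is a minimal sketch of it in Python. It assumes pdftk is on the PATH and that all field values fit in Latin-1; the function names and file paths are hypothetical, not taken from the actual system.

```python
import subprocess
from pathlib import Path


def fdf_escape(value: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return (value.replace("\\", "\\\\")
                 .replace("(", "\\(")
                 .replace(")", "\\)"))


def build_fdf(fields: dict) -> str:
    """Return a minimal FDF document with one /T (name) /V (value) pair per field."""
    entries = "\n".join(
        f"<< /T ({fdf_escape(name)}) /V ({fdf_escape(value)}) >>"
        for name, value in fields.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        "<< /FDF << /Fields [\n"
        f"{entries}\n"
        "] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )


def fill_template(template_pdf: str, fields: dict, output_pdf: str) -> None:
    """Write the FDF next to the output file and let pdftk merge it into the template."""
    fdf_path = Path(output_pdf).with_suffix(".fdf")
    # Assumption: values are Latin-1 only; other text needs a different encoding strategy.
    fdf_path.write_text(build_fdf(fields), encoding="latin-1")
    subprocess.run(
        ["pdftk", template_pdf, "fill_form", str(fdf_path), "output", output_pdf],
        check=True,
    )
```

A call such as `fill_template("template.pdf", {"name": "Smith"}, "out.pdf")` would produce one populated document per data set.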

However, now I would like to insert more complex data into the document, like graphs and images. Also, it might be interesting to add "free-flowing text" where the field length is not known in advance and the document layout should change to accommodate the text.

Thus, I'm looking at creating my template in HTML instead, and then running an HTML-to-PDF converter to create the resulting PDF document. That means no longer using any PDF "template file" at all, and no FDF files either.
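
As an illustration of the HTML-template idea, here is a minimal sketch using only the Python standard library. The template text, the placeholder names and the `render_page` function are hypothetical; a real template would be a full HTML file on disk.

```python
from html import escape
from string import Template

# Hypothetical template with three placeholders: a heading, free-flowing text, an image.
PAGE = Template(
    "<html><body>"
    "<h1>$title</h1>"
    "<p>$body</p>"
    "<img src=\"$chart\" alt=\"chart\">"
    "</body></html>"
)


def render_page(title: str, body: str, chart_path: str) -> str:
    """Fill the template, escaping every value so the data cannot inject markup."""
    return PAGE.substitute(
        title=escape(title),
        # Free-flowing text of any length; line breaks become <br> elements.
        body=escape(body).replace("\n", "<br>"),
        chart=escape(chart_path, quote=True),
    )
```

Because the text is not tied to a fixed-size form field, the converter's layout engine decides how much space it takes.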

I'm writing this partly to ask whether anyone has experience with these tools, and partly to report my findings.
I will be testing a number of tools to find one that is secure and efficient.

The system will run in a production environment (Windows 2008), and it is absolutely necessary that there is no hassle with large/strange frameworks (like PHP, for instance) or complex third-party dependencies.
Or, God forbid, any dependency on client software to be installed on the server, like MSOffice or browsers, etc. Not even speaking about adware :-)
Of course, it is quite OK if the tools cost money :-)

I'm planning to create html files from my application, and then run the tool stand-alone (command-line), creating the resulting PDF document.
It is, however, perfectly OK (and maybe a nice feature) to run the tool as a library as well, and create my own "standalone converter program" to perform the actual conversion. That way, it might be possible to handle limitations in the tool and solve any oddities that may be triggered by strange html input (just looking at MSWord HTML output, for instance, it's obvious that not all renderers can cope with the markup).

My initial list of tools is the following, below.
If anyone has got any recommendation or ideas, or warnings, any such input would be greatly appreciated.
If any of these tools are known to be "no good" (in your experience) for this use, it would be great to hear about it, so that I don't need to test them...

Any tools that strike you as "really good" or "not working like this"?
Thanks :-)
Question by:stefanlennerbrant
LVL 14

Expert Comment

ID: 40447663
I've used WebSupergoo ABCpdf in a .NET environment and it works well enough. We used the library and, from what I remember, there was a bit of setup, but it's been working well for years. I haven't used a recent version; this is from a project with few updates since 2007 or so.

In a Linux environment, I've used wkhtmltopdf. It's a bit quirky - I found when setting it up that it would just fail because the document went out of margins or the command line parameters were out of order. It's working extremely well now and was the only thing that worked well for us in our Linux / PHP environment.
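
To make the point about parameter order concrete, here is a minimal sketch of a wrapper that always puts the options first, then the input file, then the output file. It assumes wkhtmltopdf is on the PATH; the function names, the A4 page size and the 15mm margins are illustrative values, not settings from our project.

```python
import subprocess


def build_command(html_path: str, pdf_path: str, margin_mm: int = 15) -> list:
    """Build the argument list: options first, then the input, then the output file."""
    margin = f"{margin_mm}mm"
    return [
        "wkhtmltopdf",
        "--page-size", "A4",
        "--margin-top", margin,
        "--margin-bottom", margin,
        "--margin-left", margin,
        "--margin-right", margin,
        html_path,
        pdf_path,
    ]


def convert(html_path: str, pdf_path: str) -> None:
    """Run the converter and raise CalledProcessError if it reports a failure."""
    subprocess.run(build_command(html_path, pdf_path), check=True)
```

Keeping the command in one function means the argument order is fixed in a single place instead of being assembled ad hoc at each call site.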

My impression of HTML to PDF is that there's going to be setup / pain in getting it up and running. The trick is to find something that is solid once you get it set up.

Author Comment

ID: 40449435
My first impressions are the following. Please don't hesitate to add any comments :-)

To me, "wkhtmltopdf" seems to be the best choice, with "winnovative" as possible alternative
Unfortunately, "websupergoo/abcpdf" couldn't be tested
  Seems to build on wkhtmltopdf, identical results (regarding conversion to PDF)
  OpenSource, based on WebKit
  Almost acceptable results, lots of configuration settings, seems to be an active community/development
  Ugly kerning in fonts (probably handled with better settings for font selections)
  Library, tested with its demo-app GUI
  Good font use
  Wrong margins/output size by default (probably possible to handle with settings)
  Library, tested with its demo-app GUI
  Good font use
  Quite slow conversion
  Problems with "fatal errors" during conversion, worked once but not in repeated testing
  Library, tested with its demo-app GUI
  Very slow in conversion
  Lots of layout errors in output, elements get completely wrong heights etc, not acceptable results
  Seems to be completely identical to winnovative-software
  Not tested, needs JRE in the production environment
  Was "the best" years ago, but has been sleeping since 2006 up to recently
  Seems to lack support for many html features
  Layout is without styling, not acceptable results
  Library, tested with its demo-app GUI
  Good fonts etc
  Problems with element heights
  Installed as a COM/ActiveX component, seems to be aimed at ASP and similar environments
  Not tested due to COM base. Also no demo supplied
  Not tested. No demo supplied
LVL 14

Expert Comment

ID: 40450189
I would say your thoughts on wkhtmltopdf are correct. Once you've set up your options, the one thing we found in our Linux environment was that the process would stay in memory and we had to come around every few minutes and kill the process. I don't know that that is normal and I don't know if it would happen on Windows. Switching to a faster server with an SSD drive dramatically increased performance of building the PDFs.
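
One way to avoid a separate clean-up job is to start the converter with a timeout, so a hung process is killed by the caller. This is a minimal sketch using Python's standard `subprocess` module, not what we actually ran; the function name and the timeout value are hypothetical, and the example call uses a trivial Python child as a stand-in for the converter.

```python
import subprocess
import sys


def run_with_timeout(cmd: list, timeout_s: float) -> bool:
    """Run cmd; return True on exit code 0, False on failure or timeout.

    subprocess.run kills the child itself when the timeout expires, so a
    hung converter does not stay in memory.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    # Stand-in for a converter call: a Python child that exits immediately.
    print(run_with_timeout([sys.executable, "-c", "pass"], timeout_s=30))
```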

Author Comment

ID: 40450571
Some more testing today on wkhtmltopdf identified font selection as a problem area.
I'm sure it can be fixed, but I need to find out what fonts should be specified in the html input, to make wkhtmltopdf really happy.

  font-family:"Courier New"
does not work very well at all. It's converted to slightly different fonts and very different font sizes.
(currently, Arial/Courier New are specified in the MSWord html output, as these two Windows fonts are commonly used in the original document)

Any idea on what sans-serif and serif fonts would make life easier for wkhtmltopdf? Especially in a Windows environment?

Author Comment

ID: 40450665
I just found out that "Verdana" makes kerning work much better in this environment than "Arial" does.
However, font sizes are still too big in the wkhtmltopdf output.

Comparing the font use in Word with the corresponding Word "print to PDF writer (Adobe)" output and with the wkhtmltopdf output (text "properties" checked with Adobe Pro in both PDFs), I get the following.

Word: Arial (bold) 8pt
Word-PDF: Arial,Bold 7.98pt
wk-PDF original: ArialBold 10.08pt (using style='font-family: "Arial"; font-size: 8.0pt')
wk-PDF modif: VerdanaBold 9.36pt (using style='font-family: "Verdana"; font-size: 8.0pt')

Word: Arial (normal) 10pt
Word-PDF: Arial 10.02pt
wk-PDF original: ArialNormal 11.52pt (using style='font-family: "Arial"; font-size: 10.0pt')
wk-PDF modif: VerdanaNormal 11.52pt (using style='font-family: "Verdana"; font-size: 10.0pt')

Word: Courier New (normal) 8pt
Word-PDF: CourierNew 7.98pt
wk-PDF original: CourierNewNormal 10.08pt (using style='font-family:"Courier New"; font-size: 8.0pt')

Hm, how to get smaller (correct) font size?
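
To put numbers on the enlargement, here is a small calculation using only the measurements listed above (the labels and the function name are just for illustration).

```python
# (label, CSS size in pt, size reported in the wkhtmltopdf PDF in pt),
# copied from the measurements above.
MEASUREMENTS = [
    ("Arial bold", 8.0, 10.08),
    ("Verdana bold", 8.0, 9.36),
    ("Arial normal", 10.0, 11.52),
    ("Verdana normal", 10.0, 11.52),
    ("Courier New normal", 8.0, 10.08),
]


def enlargement_percent(css_pt: float, pdf_pt: float) -> float:
    """Return how much larger the PDF size is than the CSS size, in percent."""
    return round((pdf_pt / css_pt - 1.0) * 100.0, 1)


for label, css_pt, pdf_pt in MEASUREMENTS:
    print(f"{label}: {css_pt}pt -> {pdf_pt}pt (+{enlargement_percent(css_pt, pdf_pt)}%)")
```

The enlargement is not constant: it ranges from about 15% to 26% across these samples, so a single correction factor would only be approximate.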
LVL 14

Accepted Solution

quizwedge earned 500 total points
ID: 40450809
Ah, yes. That was one other downfall of wkhtmltopdf. Since our HTML was just for conversion to PDF, we scaled the fonts in CSS as needed to make it look good in PDF. If you need both formats, you may need to pass the page a URL parameter from wkhtmltopdf which alerts you to load a different CSS file with the new font sizes.

For fonts, we used some of Google's fonts and made sure they were installed on the server.
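
If the HTML is generated by an application rather than written by hand, the scaling could also be applied to the markup before conversion. The following is a minimal sketch, assuming sizes are given as `font-size: Npt` declarations (as in the MSWord HTML quoted earlier); the function name is hypothetical and the factor has to be tuned by testing.

```python
import re

# Matches declarations such as "font-size: 8.0pt" or "font-size:10pt".
FONT_SIZE_PT = re.compile(r"(font-size\s*:\s*)(\d+(?:\.\d+)?)(pt)", re.IGNORECASE)


def scale_font_sizes(html: str, factor: float) -> str:
    """Multiply every pt-based font-size in the HTML by factor."""
    def _scaled(match):
        new_size = round(float(match.group(2)) * factor, 2)
        return f"{match.group(1)}{new_size}{match.group(3)}"
    return FONT_SIZE_PT.sub(_scaled, html)
```

For example, `scale_font_sizes(html, 0.8)` turns `font-size: 10.0pt` into `font-size: 8.0pt` and leaves sizes in other units untouched.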

Author Comment

ID: 40452415
I'm struggling to find better fonts (or font solutions) to get the same result with wkhtmltopdf as I get with "save as PDF" from MSWord.

The original (in MSWord) uses Arial 10pt just because this is considered the best setup.
Using Arial really makes kerning go berserk when rendering to PDF with wkhtmltopdf. And changing to Verdana is not good either - even though the fonts are pretty, the size of Arial and Verdana differs (as well as "look and feel") too much.
And compensating with "decrease with x% using css" seems really complicated, considering all the possible styles used in different documents.

In addition, all fonts get like 20% larger (regardless of font) when comparing MSWord and wkhtmltopdf output. All MSWord 10pt texts are rendered with about 10pt in the MSWord Adobe/PDF output, but about 12pt in the wkhtmltopdf output.
All other "sizes" (layout boxes etc) are properly rendered. It's "only" the fonts that get too large.

I'll continue investigating and searching the internet - but any input from wkhtmltopdf-experienced people is appreciated

Author Comment

ID: 40455357
I've been struggling to find info on the internet, but to no real avail.

Testing shows that (in a wkhtmltopdf Windows environment)
- a css "Verdana" font is rendered in PDF as VerdanaRegular with font size conversions like:
    all css sizes 6.4 to 7.8pt converts to PDF 8.64pt
    all css sizes 7.9 to 8.6pt converts to PDF 9.36pt, etc etc for other sizes (about 20% too large)
- same for Arial, CourierNew, TimesNewRoman etc, but slightly different "ranges" and conversions
- same (too large) conversion happens when using css "px" sizes instead of "pt"
- using MSWord "print to PDF using Adobe drivers" uses different fonts and gives much, much better sizes (10pt results in 10.02pt)
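
The stepwise ranges suggest that the output sizes are quantized. A quick check, using only the PDF sizes reported in this thread, shows that each one is a whole multiple of 0.72pt. This is an observation about the numbers, not a confirmed explanation of what wkhtmltopdf does; the step size was picked by inspection.

```python
# PDF font sizes (pt) reported for wkhtmltopdf output in this thread.
REPORTED_PDF_SIZES = [8.64, 9.36, 10.08, 11.52]
STEP_PT = 0.72  # candidate step size, chosen by inspection of the numbers


def steps(size_pt: float, step_pt: float = STEP_PT) -> float:
    """Return size_pt expressed as a multiple of step_pt, rounded to 3 decimals."""
    return round(size_pt / step_pt, 3)


for size in REPORTED_PDF_SIZES:
    print(f"{size}pt = {steps(size)} x {STEP_PT}pt")
```

The four sizes come out as 12, 13, 14 and 16 steps, which would fit a range of css sizes collapsing onto one output size.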
LVL 14

Expert Comment

ID: 40456737
Unfortunately, I'm guessing it's not an issue of finding the perfect font. wkhtmltopdf makes fonts bigger. While there's a lot out there on wkhtmltopdf, I've found that it's hard to find the specific answer you happen to be looking for at the time. If having it work exactly like your current setup is critical, you may want to look at a different solution.

I found that ABCpdf does offer a 30-day demo. You may want to try that.

Author Comment

ID: 40466493
I haven't tested ABCpdf, as I didn't have time to build a demo myself, but reading the details I see that it contains a quite limited HTML parsing engine of its own and otherwise requires a browser (Internet Explorer or Firefox) to be installed on the server, using the browser to render the HTML page.
Perhaps not a very suitable and secure environment on production servers:-)

Author Closing Comment

ID: 40606684
Thanks for all the input.
I went for wkhtmltopdf. A bit quirky and font handling is so-so, but it works.

Expert Comment

by:Peter Jhon
ID: 41411844
I found the link useful, but I ran into conversion problems when the document has tables and graphs. I created a document to use at a conference, but due to a mistake my source document was lost, and I could not extract the tables as they were designed.


Question has a verified solution.
