Professional and secure conversion of html to pdf, running on backoffice server

Posted on 2014-11-17
Medium Priority
Last Modified: 2016-01-13
I'm creating large volumes of PDF form documents, by using one single PDF template including form fields and then, for each resulting document, creating a small FDF file with data content and finally running the tool "pdftk" to populate the template with content.
This works very well, when the template is "static" (all field sizes are determined in advance) and when all data is more or less text-only.
The PDF template can, of course, be very complex in layout. It is initially created with "print to pdf" from any application (like MSWord) and then "prepared" by putting form fields into it with Adobe Pro.
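For readers unfamiliar with this workflow, here is a minimal sketch in Python of the FDF-plus-pdftk step described above. It assumes pdftk is on the PATH and that field values are plain ASCII/Latin-1 text. The function and file names are only illustrative and are not taken from the actual application.

```python
import subprocess

def escape_fdf(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

def build_fdf(fields):
    """Return minimal FDF content for a dict of field name -> value."""
    entries = "\n".join(
        "<< /T (%s) /V (%s) >>" % (escape_fdf(name), escape_fdf(value))
        for name, value in fields.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        "<< /FDF << /Fields [\n" + entries + "\n] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )

def fill_template(template_pdf, fdf_path, output_pdf):
    """Populate the form fields of the template by running pdftk."""
    subprocess.run(
        ["pdftk", template_pdf, "fill_form", fdf_path, "output", output_pdf],
        check=True,  # raise CalledProcessError if pdftk reports a failure
    )
```

A caller would write the result of build_fdf() to a file (for example with encoding "latin-1") and then call fill_template() once per document.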

However, now I would like to insert more complex data into the document, like graphs and images. Also, it might be interesting to add "free-flowing text" where the field length is not known in advance and the document layout should change to accommodate the text.

Thus, I'm looking at creating my template in HTML instead, and then running an HTML-to-PDF converter to create the resulting PDF document. That means no longer using any PDF "template file" at all, and no FDF files either.
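As a rough illustration of the HTML-template idea, the sketch below fills a small HTML page with data using only the Python standard library. The template content, the placeholder names and the render() helper are assumptions made for this example. A real template would be far more elaborate.

```python
import html
from string import Template

# Hypothetical template; $title, $body and $chart are placeholders.
PAGE = Template("""<html>
<body>
<h1>$title</h1>
<p>$body</p>
<img src="$chart" alt="chart">
</body>
</html>
""")

def render(title, body, chart_path):
    """Return the HTML page with all values escaped for safe inclusion."""
    return PAGE.substitute(
        title=html.escape(title),
        # Free-flowing text: keep the line breaks of the input.
        body=html.escape(body).replace("\n", "<br>\n"),
        chart=html.escape(chart_path, quote=True),
    )
```

The resulting string can be written to an .html file, which then becomes the input for whichever converter is chosen.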

I now write this, partly to ask if anyone has got any experience with these tools, and partly to report on my findings.
I will be testing a number of tools, and try to find one that is secure and efficient.

The system will run in a "production environment" (Windows 2008), and it is absolutely necessary that there is no hassle with large/strange frameworks (like php, for instance) or complex third-party dependencies.
Or, God forbid, any dependency on client software to be installed on the server, like MSOffice or browsers, etc. Not even speaking about adware :-)
Of course, it is quite OK if the tools cost money :-)

I'm planning to create html files from my application, and then run the tool stand-alone (command-line), creating the resulting PDF document.
It is, however, perfectly OK (and maybe a nice feature) to run the tool as a library as well, and create my own "standalone converter program" to perform the actual conversion. That way, it might be possible to handle limitations in the tool and solve any oddities that may be triggered by strange html input (just looking at MSWord HTML output, for instance, it's obvious that not all renderers can cope with the markup).

My initial list of tools is the following, below.
If anyone has got any recommendation or ideas, or warnings, any such input would be greatly appreciated.
If any of these tools are known to be "no good" (in your experience) for this use, it would be great to hear about it, so that I don't need to test them...


Any tools that strike you as "really good" or "not working like this"?
Thanks :-)
Question by:stefanlennerbrant
LVL 14

Expert Comment

ID: 40447663
I've used WebSupergoo ABCpdf in a .NET environment and it works well enough. We used the library and, from what I remember, there was a bit of setup, but it's been working well for years. I haven't used a recent version; this is from a project with few updates since 2007 or so.

In a Linux environment, I've used wkhtmltopdf. It's a bit quirky: while setting it up, I found that it would just fail because the document went out of margins or the command line parameters were out of order. It's working extremely well now and was the only thing that worked well for us in our Linux / PHP environment.
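Regarding parameter order: wkhtmltopdf expects its options first, then the input page, then the output file. A small sketch of building the command line in that order from Python follows. The helper name and the default values are just assumptions for this example.

```python
def wkhtmltopdf_command(html_path, pdf_path, page_size="A4", margin="10mm"):
    """Return the argument list: options first, then input, then output."""
    return [
        "wkhtmltopdf",
        "--page-size", page_size,
        "--margin-top", margin,
        "--margin-bottom", margin,
        "--margin-left", margin,
        "--margin-right", margin,
        html_path,   # input page comes after all options
        pdf_path,    # output file is always the last argument
    ]
```

The list can be passed directly to subprocess.run(cmd, check=True), which avoids quoting problems with paths that contain spaces.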

My impression of HTML to PDF is that there's going to be setup / pain in getting it up and running. The trick is to find something that is solid once you get it set up.

Author Comment

ID: 40449435
My first impressions are the following. Please don't hesitate to add any comments :-)

To me, "wkhtmltopdf" seems to be the best choice, with "winnovative" as a possible alternative.
Unfortunately, "websupergoo/abcpdf" couldn't be tested.

  Seems to build on wkhtmltopdf, identical results (regarding conversion to PDF)

  OpenSource, based on WebKit
  Almost acceptable results, lots of configuration settings, seems to be an active community/development
  Ugly kerning in fonts (probably handled with better settings for font selections)

  Library, tested with its demo-app GUI
  Good font use
  Wrong margins/output size by default (probably possible to handle with settings)

  Library, tested with its demo-app GUI
  Good font use
  Quite slow conversion
  Problems with "fatal errors" during conversion: it worked once, but not in repeated testing

  Library, tested with its demo-app GUI
  Very slow in conversion
  Lots of layout errors in output, elements get completely wrong heights etc, not acceptable results

  Seems to be completely identical to winnovative-software

  Not tested, needs JRE in the production environment

  Was "the best" years ago, but has been sleeping since 2006 up to recently
  Seems to lack support for many html features
  Layout is without styling, not acceptable results

  Library, tested with its demo-app GUI
  Good fonts etc
  Problems with element heights

  Installed as a COM/ActiveX component, seems to be aimed at ASP and similar environments
  Not tested due to COM base. Also no demo supplied

  Not tested. No demo supplied
LVL 14

Expert Comment

ID: 40450189
I would say your thoughts on wkhtmltopdf are correct. Once we had set up our options, the one problem we found in our Linux environment was that the process would stay in memory, and we had to come around every few minutes and kill it. I don't know whether that is normal, and I don't know if it would happen on Windows. Switching to a faster server with an SSD drive dramatically increased the performance of building the PDFs.
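One way to guard against a converter process that never exits is to start it with a timeout, so that it is killed automatically instead of by hand. The sketch below uses Python's subprocess module. The helper name and the 60-second default are assumptions, and it has not been verified against the specific hang described here.

```python
import subprocess

def run_converter(cmd, timeout_s=60):
    """Run cmd; return True on success, False on failure or timeout.

    subprocess.run() kills the child process if the timeout expires.
    """
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

Here cmd would be the converter's argument list, for example ["wkhtmltopdf", "in.html", "out.pdf"]. A False result can be logged and the document retried.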


Author Comment

ID: 40450571
Some more testing today on wkhtmltopdf identified font selection as a problem area.
I'm sure it can be fixed, but I need to find out what fonts should be specified in the html input, to make wkhtmltopdf really happy.

  font-family:"Courier New"
does not work very well at all. It's converted to slightly different fonts and very different font sizes.
(currently, Arial/Courier New are specified in the MSWord html output, as these two Windows fonts are commonly used in the original document)

Any idea on what sans-serif and serif fonts would make life easier for wkhtmltopdf? Especially in a Windows environment?

Author Comment

ID: 40450665
I just found out that "Verdana" makes kerning work much better in this environment than "Arial" does.
However, font sizes are still too big in the wkhtmltopdf output.

Below I compare the font used in Word, the font in the corresponding PDF from Word's "print to PDF writer (Adobe)", and the font in the wkhtmltopdf output. For both PDFs, I checked the text "properties" with Adobe Pro.

Word: Arial (bold) 8pt
Word-PDF: Arial,Bold 7.98pt
wk-PDF original: ArialBold 10.08pt (using style='font-family: "Arial"; font-size: 8.0pt')
wk-PDF modif: VerdanaBold 9.36pt (using style='font-family: "Verdana"; font-size: 8.0pt')

Word: Arial (normal) 10pt
Word-PDF: Arial 10.02pt
wk-PDF original: ArialNormal 11.52pt (using style='font-family: "Arial"; font-size: 10.0pt')
wk-PDF modif: VerdanaNormal 11.52pt (using style='font-family: "Verdana"; font-size: 10.0pt')

Word: Courier New (normal) 8pt
Word-PDF: CourierNew 7.98pt
wk-PDF original: CourierNewNormal 10.08pt (using style='font-family:"Courier New"; font-size: 8.0pt')

Hm, how to get smaller (correct) font size?
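To make the size difference easier to see, the measurements above can be turned into enlargement factors. This is only arithmetic on the numbers already listed; the variable and function names are made up for the example.

```python
measurements = [
    # (font, CSS size in pt, size reported in the wkhtmltopdf PDF in pt)
    ("Arial bold", 8.0, 10.08),
    ("Verdana bold", 8.0, 9.36),
    ("Arial", 10.0, 11.52),
    ("Verdana", 10.0, 11.52),
    ("Courier New", 8.0, 10.08),
]

def enlargement(css_pt, pdf_pt):
    """Return how many times larger the PDF text is than the CSS size."""
    return pdf_pt / css_pt

for font, css_pt, pdf_pt in measurements:
    factor = enlargement(css_pt, pdf_pt)
    print("%-13s %5.2fpt -> %5.2fpt  (x%.3f)" % (font, css_pt, pdf_pt, factor))
```

The factors range from about 1.15 to 1.26 depending on font and size, so a single correction percentage would not match every case exactly.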
LVL 14

Accepted Solution

quizwedge earned 2000 total points
ID: 40450809
Ah, yes. That was one other downside of wkhtmltopdf. Since our HTML was just for conversion to PDF, we scaled the fonts in CSS as needed to make it look good in PDF. If you need both formats, you may need to pass the page a URL parameter from wkhtmltopdf which tells the page to load a different CSS file with the new font sizes.
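The URL-parameter idea could look roughly like this on the page-generating side. This is a sketch in Python. The parameter name "target" and the file names "pdf.css" and "screen.css" are hypothetical and not from our actual setup.

```python
from urllib.parse import urlparse, parse_qs

def stylesheet_for(url):
    """Pick the CSS file depending on a (hypothetical) ?target=pdf parameter."""
    query = parse_qs(urlparse(url).query)
    if query.get("target") == ["pdf"]:
        return "pdf.css"      # stylesheet with font sizes adjusted for PDF output
    return "screen.css"       # normal stylesheet for browsers
```

wkhtmltopdf would then be pointed at the page URL with ?target=pdf appended, while normal visitors get the regular stylesheet.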

For fonts, we used some of Google's fonts and made sure they were installed on the server.

Author Comment

ID: 40452415
I'm struggling to find better fonts (or font solutions) to get the same result with wkhtmltopdf as I get with "save as PDF" from MSWord.

The original (in MSWord) uses Arial 10pt just because this is considered the best setup.
Using Arial really makes kerning go berserk when rendering to PDF with wkhtmltopdf. And changing to Verdana is not good either: even though the fonts are pretty, the size (as well as the "look and feel") of Arial and Verdana differs too much.
And compensating with "decrease with x% using css" seems really complicated, considering all the possible styles used in different documents.

In addition, all fonts get like 20% larger (regardless of font) when comparing MSWord and wkhtmltopdf output. All MSWord 10pt texts are rendered with about 10pt in the MSWord Adobe/PDF output, but about 12pt in the wkhtmltopdf output.
All other "sizes" (layout boxes etc) are properly rendered. It's "only" the fonts that get too large.
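If compensation were done by preprocessing the HTML instead of editing every style by hand, it could be sketched as below in Python. This only handles "font-size: Npt" declarations. The function name and the factor are assumptions, and since the enlargement seems to vary by font and size, a single factor may not give exact results.

```python
import re

# Matches declarations such as "font-size: 10.0pt" (pt units only).
FONT_SIZE = re.compile(r"(font-size:\s*)(\d+(?:\.\d+)?)(pt)", re.IGNORECASE)

def scale_font_sizes(html_text, factor):
    """Multiply every pt font size in the HTML text by factor."""
    def repl(match):
        scaled = float(match.group(2)) * factor
        return "%s%.2f%s" % (match.group(1), scaled, match.group(3))
    return FONT_SIZE.sub(repl, html_text)
```

For example, scale_font_sizes(html_text, 0.8) would shrink all pt font sizes by 20% while leaving box sizes and other declarations untouched.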

I'll continue investigating and searching the internet - but any input from wkhtmltopdf-experienced people is appreciated

Author Comment

ID: 40455357
I've been struggling to find info on the internet, but to no real avail.

Testing shows that (in a wkhtmltopdf Windows environment):
- a CSS "Verdana" font is rendered in the PDF as VerdanaRegular, with font size conversions like:
    all CSS sizes from 6.4 to 7.8pt convert to 8.64pt in the PDF
    all CSS sizes from 7.9 to 8.6pt convert to 9.36pt in the PDF, and so on for other sizes (about 20% too large)
- the same happens for Arial, CourierNew, TimesNewRoman etc., but with slightly different "ranges" and conversions
- the same (too large) conversion happens when using CSS "px" sizes instead of "pt"
- MSWord "print to PDF using Adobe drivers" uses different fonts and gives much, much better sizes (10pt results in 10.02pt)
LVL 14

Expert Comment

ID: 40456737
Unfortunately, I'm guessing it's not an issue of finding the perfect font. wkhtmltopdf makes fonts bigger. While there's a lot out there on wkhtmltopdf, I've found that it's hard to find the specific answer you happen to be looking for at the time. If having it work exactly like your current setup is critical, you may want to look at a different solution.

I found that ABCpdf does offer a 30 day demo: http://www.websupergoo.com/download.htm#pd You may want to try that.

Author Comment

ID: 40466493
I haven't tested ABCpdf as I didn't have time to build a demo myself, but reading the details I see that it contains a quite limited HTML parsing engine of its own, and otherwise requires a browser (Internet Explorer or Firefox) to be installed on the server, using the browser to render the HTML page.
Perhaps not a very suitable and secure environment on production servers:-)

Author Closing Comment

ID: 40606684
Thanks for all the input.
I went for wkhtmltopdf. A bit quirky and font handling is so-so, but it works.

Expert Comment

by:Peter Jhon
ID: 41411844
I found http://www.html-to-pdf.net/html-to-pdf-converter.aspx to be a useful link, but I ran into problems with the conversion when a document has tables and graphs. I created one document for http://www.dianaboluk.co.uk to be used at a conference, but something went wrong: my source document was lost and I could not extract the tables as they had been designed.

Question has a verified solution.
