Solved

Professional and secure conversion of html to pdf, running on backoffice server

Posted on 2014-11-17
Last Modified: 2016-01-13
I'm creating large volumes of PDF form documents, by using one single PDF template including form fields and then, for each resulting document, creating a small FDF file with data content and finally running the tool "pdftk" to populate the template with content.
This works very well, when the template is "static" (all field sizes are determined in advance) and when all data is more or less text-only.
The PDF template can, of course, be very complex in layout, initially created as "print to pdf" from any application (like MSWord) and then "prepared" by putting form fields into it with Adobe Pro
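
For readers unfamiliar with this workflow, here is a rough Python sketch of the FDF-plus-pdftk step described above. The function names, file names and field names are made up for illustration. Only the pdftk "fill_form ... output ... flatten" operation and the basic FDF layout are taken as given, and the sketch assumes plain Latin-1 text values.

```python
import subprocess

def fdf_escape(value):
    """Escape backslashes and parentheses, which delimit FDF literal strings."""
    return value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

def build_fdf(fields):
    """Return a minimal FDF document with one /T (name) /V (value) pair per field."""
    pairs = "\n".join(
        f"<< /T ({fdf_escape(name)}) /V ({fdf_escape(value)}) >>"
        for name, value in fields.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        "<< /FDF << /Fields [\n" + pairs + "\n] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )

def pdftk_fill_command(template_pdf, fdf_path, output_pdf):
    """Build the pdftk argument list; 'flatten' merges the values into the page."""
    return ["pdftk", template_pdf, "fill_form", fdf_path,
            "output", output_pdf, "flatten"]

def fill_template(template_pdf, fields, fdf_path, output_pdf):
    """Write the FDF file (Latin-1 assumed) and run pdftk on it."""
    with open(fdf_path, "w", encoding="latin-1") as handle:
        handle.write(build_fdf(fields))
    subprocess.run(pdftk_fill_command(template_pdf, fdf_path, output_pdf), check=True)
```

A call such as fill_template("template.pdf", {"customer": "ACME"}, "data.fdf", "out.pdf") would then produce one populated document per FDF file, provided pdftk is on the PATH.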

However, now I would like to insert more complex data into the document, like graphs and images. Also, it might be interesting to add "free-flowing text" where the field length is not known in advance and the document layout should change to accommodate the text.

Thus, I'm looking at creating my template in HTML instead, and then running an HTML-to-PDF converter to create the resulting PDF document. That means no longer using any PDF "template file" at all, and no FDF files either.

I now write this, partly to ask if anyone has got any experience with these tools, and partly to report on my findings.
I will be testing a number of tools, trying to find one that is secure and efficient.

The system will run in a "production environment" (Windows 2008) and it is absolutely necessary that there is no hassle with large/strange frameworks (like PHP, for instance) or complex third-party dependencies.
Or, God forbid, any dependency on client software to be installed on the server, like MSOffice or browsers, etc. Not even speaking about adware :-)
Of course, it is quite OK if the tools cost money :-)


I'm planning to create html files from my application, and then run the tool stand-alone (command-line), creating the resulting PDF document.
It is, however, perfectly OK (and maybe a nice feature) to run the tool as a library as well, and create my own "standalone converter program" to perform the actual conversion. That way, it might be possible to handle limitations in the tool and solve any oddities that may be triggered by strange html input (just looking at MSWord HTML output, for instance, it's obvious that not all renderers can cope with the markup)
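
As an illustration of the stand-alone, command-line approach, here is a minimal Python sketch of a wrapper around one of the candidate tools, wkhtmltopdf. The function names and default values are hypothetical. The options used (--quiet, --page-size, --margin-*) are standard wkhtmltopdf options, and the sketch assumes the executable is on the PATH. Options go first, then the input file, then the output file.

```python
import subprocess

def wkhtmltopdf_command(html_path, pdf_path, page_size="A4", margin_mm=15,
                        exe="wkhtmltopdf"):
    """Build the argument list: options first, then input, then output."""
    margin = f"{margin_mm}mm"
    return [
        exe,
        "--quiet",
        "--page-size", page_size,
        "--margin-top", margin,
        "--margin-bottom", margin,
        "--margin-left", margin,
        "--margin-right", margin,
        html_path,
        pdf_path,
    ]

def convert(html_path, pdf_path, **options):
    """Run the converter; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(wkhtmltopdf_command(html_path, pdf_path, **options), check=True)
```

Keeping the command construction in its own function makes it easy to log or inspect exactly what was run when a conversion fails.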

My initial list of tools is below.
If anyone has got any recommendation or ideas, or warnings, any such input would be greatly appreciated.
If any of these tools are known to be "no good" (in your experience) for this use, it would be great to hear about it, so that I don't need to test them...

http://www.html-to-pdf.net/html-to-pdf-converter.aspx
http://www.sautinsoft.com/products/pdf-metamorphosis/index.php
http://www.evopdf.com/
http://www.pdfreactor.com/product/doc_html/
http://wkhtmltopdf.org/
http://www.msweet.org/projects.php?Z1
http://www.princexml.com/
http://www.colorpilot.com/html2pdfaddon.html
http://www.verypdf.com/app/html-to-any/index.html
http://www.websupergoo.com/abcpdf-1.htm
http://www.winnovative-software.com/html-to-pdf-converter.aspx

Any tools that strike you as "really good" or "not working like this"?
Thanks :-)
Question by:stefanlennerbrant
12 Comments
 

Expert Comment

by:quizwedge
ID: 40447663
I've used WebSupergoo ABCpdf in a .NET environment and it works well enough. We used the library and, from what I remember, there was a bit of setup, but it's been working well for years. I haven't used a recent version; this is from a project with few updates since 2007 or so.

In a Linux environment, I've used wkhtmltopdf. It's a bit quirky: while setting it up, I found it would just fail because the document went outside the margins or the command line parameters were out of order. It's working extremely well now and was the only thing that worked well for us in our Linux / PHP environment.

My impression of HTML to PDF is that there's going to be setup / pain in getting it up and running. The trick is to find something that is solid once you get it set up.
 

Author Comment

by:stefanlennerbrant
ID: 40449435
My first impressions are the following. Please don't hesitate to add any comments :-)

To me, "wkhtmltopdf" seems to be the best choice, with "winnovative" as possible alternative
Unfortunately, "websupergoo/abcpdf" couldn't be tested


http://www.verypdf.com/app/html-to-any/index.html
  Seems to build on wkhtmltopdf, identical results (regarding conversion to PDF)

http://wkhtmltopdf.org
  OpenSource, based on WebKit
  Almost acceptable results, lots of configuration settings, seems to be an active community/development
  Ugly kerning in fonts (probably handled with better settings for font selections)

http://www.winnovative-software.com/html-to-pdf-converter.aspx
  Library, tested with its demo-app GUI
  Good font use
  Wrong margins/output size by default (probably possible to handle with settings)

http://www.html-to-pdf.net/html-to-pdf-converter.aspx
  Library, tested with its demo-app GUI
  Good font use
  Quite slow conversion
  Problems with "fatal errors" during conversion; it worked once but not in repeated testing

http://www.sautinsoft.com/products/pdf-metamorphosis/index.php
  Library, tested with its demo-app GUI
  Very slow in conversion
  Lots of layout errors in output, elements get completely wrong heights etc, not acceptable results

http://www.evopdf.com
  Seems to be completely identical to winnovative-software

http://www.pdfreactor.com/product/doc_html
  Not tested, needs JRE in the production environment

http://www.msweet.org/projects.php?Z1
  Was "the best" years ago, but was dormant from 2006 until recently
  Seems to lack support for many html features
  Layout is without styling, not acceptable results

http://www.princexml.com
  Library, tested with its demo-app GUI
  Good fonts etc
  Problems with element heights

http://www.colorpilot.com/html2pdfaddon.html
  Installed as a COM/ActiveX component, seems to be aimed at ASP and similar environments
  Not tested due to COM base. Also no demo supplied

http://www.websupergoo.com/abcpdf-1.htm
  Not tested. No demo supplied
 

Expert Comment

by:quizwedge
ID: 40450189
I would say your thoughts on wkhtmltopdf are correct. The one thing we found in our Linux environment, once the options were set up, was that the process would stay in memory, and we had to come around every few minutes and kill it. I don't know whether that is normal, or whether it would happen on Windows. Switching to a faster server with an SSD drive dramatically improved the performance of building the PDFs.
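
One way to avoid killing lingering processes by hand, sketched here in Python under the assumption that the converter is started from your own wrapper program: give each conversion a time limit. The function name and the limits are hypothetical. subprocess.run itself kills the child process when the timeout expires.

```python
import subprocess
import sys

def run_with_timeout(command, timeout_seconds):
    """Run command; return True if it exited with code 0 within the limit.

    When the limit is exceeded, subprocess.run kills the child process
    before raising TimeoutExpired, so nothing is left behind in memory.
    """
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return False
    return completed.returncode == 0

if __name__ == "__main__":
    # Stand-in for a hung converter: a child Python that sleeps past the limit.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(hung, timeout_seconds=1))
```

In real use the command would be the converter's argument list instead of the sleeping stand-in, with a limit chosen from how long a normal conversion takes.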
 

Author Comment

by:stefanlennerbrant
ID: 40450571
Some more testing today on wkhtmltopdf identified font selection as a problem area.
I'm sure it can be fixed, but I need to find out what fonts should be specified in the html input, to make wkhtmltopdf really happy.

Obviously
  font-family:"Arial","sans-serif"
  font-family:"Courier New"
does not work very well at all. It's converted to slightly different fonts and very different font sizes.
(currently, Arial/Courier New are specified in the MSWord html output, as these two Windows fonts are commonly used in the original document)

Any idea on what sans-serif and serif fonts would make life easier for wkhtmltopdf? Especially in a Windows environment?
 

Author Comment

by:stefanlennerbrant
ID: 40450665
I just found out that "Verdana" makes kerning work much better in this environment than "Arial" does.
However, font sizes are still too big in the wkhtmltopdf output.

Comparing the font use in Word, in the corresponding Word "print to PDF writer (Adobe)" output, and in the wkhtmltopdf output (checking the text "properties" of both PDFs with Adobe Pro), I get the following:

Word: Arial (bold) 8pt
Word-PDF: Arial,Bold 7.98pt
wk-PDF original: ArialBold 10.08pt (using style='font-family: "Arial"; font-size: 8.0pt')
wk-PDF modif: VerdanaBold 9.36pt (using style='font-family: "Verdana"; font-size: 8.0pt')

Word: Arial (normal) 10pt
Word-PDF: Arial 10.02pt
wk-PDF original: ArialNormal 11.52pt (using style='font-family: "Arial"; font-size: 10.0pt')
wk-PDF modif: VerdanaNormal 11.52pt (using style='font-family: "Verdana"; font-size: 10.0pt')

Word: Courier New (normal) 8pt
Word-PDF: CourierNew 7.98pt
wk-PDF original: CourierNewNormal 10.08pt (using style='font-family:"Courier New"; font-size: 8.0pt')

Hm, how to get smaller (correct) font size?
 

Accepted Solution

by:
quizwedge earned 500 total points
ID: 40450809
Ah, yes. That was one other downside of wkhtmltopdf. Since our HTML was just for conversion to PDF, we scaled the fonts in CSS as needed to make it look good in PDF. If you need both formats, you may need to pass the page a URL parameter from wkhtmltopdf, which tells the page to load a different CSS file with the new font sizes.

For fonts, we used some of Google's fonts and made sure they were installed on the server.
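
A rough Python sketch of what "scaling the fonts in CSS" could look like when the HTML is generated only for PDF conversion. The function name is hypothetical, and the regular expression only handles plain "font-size: Npt" declarations like the ones quoted earlier in this thread. The factor is an assumption derived from the measurement above (10pt requested, 11.52pt rendered), so treat it as a starting point to tune, not a known constant.

```python
import re

# Matches declarations such as "font-size: 8.0pt" (pt units only).
_FONT_SIZE = re.compile(r"(font-size\s*:\s*)(\d+(?:\.\d+)?)(pt)", re.IGNORECASE)

def scale_font_sizes(markup, factor):
    """Return markup with every pt font-size multiplied by factor."""
    def shrink(match):
        scaled = float(match.group(2)) * factor
        return f"{match.group(1)}{scaled:.2f}{match.group(3)}"
    return _FONT_SIZE.sub(shrink, markup)

# Assumed compensation factor: requested size divided by measured size.
PDF_FACTOR = 10.0 / 11.52

if __name__ == "__main__":
    style = 'font-family: "Arial"; font-size: 10.0pt'
    print(scale_font_sizes(style, PDF_FACTOR))
```

Because the measurements in this thread suggest the rendered sizes move in steps, a single factor may not hit every target size exactly, and some per-size tuning may still be needed.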

Author Comment

by:stefanlennerbrant
ID: 40452415
I'm struggling to find better fonts (or font solutions) to get the same result with wkhtmltopdf as I get with "save as PDF" from MSWord.

The original (in MSWord) uses Arial 10pt just because this is considered the best setup.
Using Arial really makes kerning go berserk when rendering to PDF with wkhtmltopdf. And changing to Verdana is not good either: even though the fonts are pretty, the size of Arial and Verdana differs (as does the "look and feel") too much.
And compensating with "decrease with x% using css" seems really complicated, considering all the possible styles used in different documents.

In addition, all fonts get like 20% larger (regardless of font) when comparing MSWord and wkhtmltopdf output. All MSWord 10pt texts are rendered with about 10pt in the MSWord Adobe/PDF output, but about 12pt in the wkhtmltopdf output.
All other "sizes" (layout boxes etc) are properly rendered. It's "only" the fonts that get too large.

I'll continue investigating and searching the internet - but any input from wkhtmltopdf-experienced people is appreciated
 

Author Comment

by:stefanlennerbrant
ID: 40455357
I've been struggling to find info on the internet, but to no real avail.

Testing shows that (in a wkhtmltopdf Windows environment)
- a css "Verdana" font is rendered in PDF as VerdanaRegular with font size conversions like:
    all css sizes 6.4 to 7.8pt convert to PDF 8.64pt
    all css sizes 7.9 to 8.6pt convert to PDF 9.36pt, etc etc for other sizes (about 20% too large)
- same for Arial, CourierNew, TimesNewRoman etc, but slightly different "ranges" and conversions
- same (too large) conversion happens when using css "px" sizes instead of "pt"
- MSWord "print to PDF using Adobe drivers" uses different fonts and gives much, much better sizes (10pt results in 10.02pt)
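
A small Python check of the figures posted in this thread, in case it helps anyone spot the pattern. The only inputs are the sizes reported above. The 0.72pt step is simply what the numbers divide evenly by (0.72pt is 1/100 inch). Reading that as "sizes are rounded to whole units at some fixed resolution" is a guess, not something confirmed here.

```python
# (requested CSS size in pt, size Adobe Pro reported for the wkhtmltopdf
# output in pt), copied from the figures posted in this thread.
MEASUREMENTS = [
    (8.0, 10.08),   # Arial bold and Courier New
    (8.0, 9.36),    # Verdana bold
    (10.0, 11.52),  # Arial and Verdana normal
]

STEP_PT = 0.72  # 1/100 inch, since 72 pt make one inch

def enlargement(requested_pt, rendered_pt):
    """Ratio of rendered size to requested size (1.0 would mean no change)."""
    return rendered_pt / requested_pt

def steps(rendered_pt, step_pt=STEP_PT):
    """How many 0.72 pt steps the rendered size corresponds to."""
    return rendered_pt / step_pt

for requested, rendered in MEASUREMENTS:
    print(f"{requested:5.2f}pt -> {rendered:5.2f}pt  "
          f"x{enlargement(requested, rendered):.3f}  "
          f"{steps(rendered):.2f} steps")
```

The rendered sizes 8.64, 9.36, 10.08 and 11.52pt come out as 12, 13, 14 and 16 steps, while the enlargement varies between roughly 1.15 and 1.26. That fits the observation that a whole range of CSS sizes maps to a single PDF size.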
 

Expert Comment

by:quizwedge
ID: 40456737
Unfortunately, I'm guessing it's not an issue of finding the perfect font. wkhtmltopdf makes fonts bigger. While there's a lot out there on wkhtmltopdf, I've found that it's hard to find the specific answer you happen to be looking for at the time. If having it work exactly like your current setup is critical, you may want to look at a different solution.

I found that ABCpdf does offer a 30 day demo: http://www.websupergoo.com/download.htm#pd You may want to try that.
 

Author Comment

by:stefanlennerbrant
ID: 40466493
I haven't tested ABCpdf as I haven't had time to build a demo myself, but reading the details I see that it contains a quite limited HTML parsing engine of its own, and otherwise requires a browser (Internet Explorer or Firefox) to be installed on the server, using the browser to render the HTML page.
Perhaps not a very suitable and secure environment on production servers:-)
 

Author Closing Comment

by:stefanlennerbrant
ID: 40606684
Thanks for all the input.
I went for wkhtmltopdf. A bit quirky and font handling is so-so, but it works.
 

Expert Comment

by:Peter Jhon
ID: 41411844
I found http://www.html-to-pdf.net/html-to-pdf-converter.aspx useful, but I ran into conversion problems when the document contained tables and graphs. I created a document for http://www.dianaboluk.co.uk that was to be used at a conference, but due to a mistake my source document was lost, and I could not extract the tables as they were designed.