Solved

C# HTML to PDF

Posted on 2013-11-11
Last Modified: 2013-12-12
Hello experts,
I have HTML that needs to be converted to PDF.
The HTML contains a table populated from a recordset.
Please check the code below. It works, except that for some reason the PDF cuts off the last row of the table.
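        // Namespaces needed by the two methods below:
        // using System.IO;
        // using System.Web;
        // using iTextSharp.text;
        // using iTextSharp.text.pdf;
        // using iTextSharp.text.html.simpleparser;

        private void GenerateReport(string Html, HttpContext context)
        {
            MemoryStream stream = createPDF(Html);

            // Send the PDF to the browser as a download
            context.Response.ContentType = "application/pdf";
            context.Response.AddHeader("Content-Disposition", "attachment; filename=\"Report.pdf\"");
            context.Response.BinaryWrite(stream.ToArray());
        }

        private MemoryStream createPDF(string html)
        {
            MemoryStream msOutput = new MemoryStream();
            TextReader reader = new StringReader(html);

            // Margins are left, right, top, bottom (the bottom margin is 0 here)
            Document document = new Document(PageSize.A4, 10f, 10f, 10f, 0f);

            // The writer streams the document into msOutput
            PdfWriter writer = PdfWriter.GetInstance(document, msOutput);

            HTMLWorker worker = new HTMLWorker(document);

            document.Open();
            worker.StartDocument();

            // Parse the HTML string into the PDF document
            worker.Parse(reader);
            worker.EndDocument();
            worker.Close();
            document.Close();

            return msOutput;
        }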



If I render just the HTML, all rows are displayed, but the generated PDF document does not include the last row.
Please help.
Thank you.
Question by:kqureshi321

Expert Comment

by:Robberbaron (robr)
ID: 39674535
1. Are you sure your HTML is terminated correctly? Many browsers fix bad HTML for you.

2. Can you paste the last part of the HTML as displayed in 'view-source'?

3. What library are you using? I had problems with iTextSharp and ended up using the WebKit command line (wkhtmltopdf) on a separate thread, which gives excellent, consistent results.

     /// <summary>
        /// Runs WebKit PDF command line convertor
        /// http://code.google.com/p/wkhtmltopdf/
        /// </summary>
        /// <param name="sRawUrl"></param>
        /// <returns></returns>
        ///
 

Author Comment

by:kqureshi321
ID: 39675103
The HTML page is not displayed. When someone clicks on the link
<a id="Summary" title="List" href="/handlers/file.ashx" target="_blank">Click</a>


the handler doesn't render an HTML page; it generates an HTML string and then converts it into a PDF (see the code above). How can I check the source of the generated HTML?
 

Author Comment

by:kqureshi321
ID: 39675155
We do use iTextSharp.

Expert Comment

by:Robberbaron (robr)
ID: 39676053
I do exactly the same process.
Create the HTML as a string and then convert it to PDF. You could write the string to the console or a text file to test.
As I said, I tried iTextSharp but gave up, as it couldn't handle my formatted HTML with CSS.
So I write the string to a temp file and then send that file to WebKit for PDF output. It works well but needs the external WebKit files to be available.
That's OK for me, as my app is intranet only.
 

Author Comment

by:kqureshi321
ID: 39692978
We actually convert the HTML string into a PDF, and this is on our website. Also, our website is based on iParts, so we will have to stay with iTextSharp, at least for now.
However, your idea of writing the string to the console or a text file might work. It could show us what is wrong.
Can you please show me an example of how to do this? I would really appreciate it.
Thanks!

Expert Comment

by:Robberbaron (robr)
ID: 39694785
This includes my calls to WebKit on a separate thread, but it also writes the incoming HTMLCode to a temporary file. You can use whatever path/filename you want.

                StreamWriter sWriter = File.CreateText(myPathFile);
                sWriter.WriteLine(HTMLCode);
                sWriter.Close();


        #region WebKit
        // Note: TemporaryFile, RG_Utils.StringManip, WbKitFileLocation, ThreadStart and the
        // _thread*, _PDFName and _DocInfo_title members are defined elsewhere and not shown here.
        private string _WebKitFiles = "DocMan_Files";

        /// <summary>
        /// Runs the WebKit PDF command-line converter
        /// http://code.google.com/p/wkhtmltopdf/
        /// </summary>
        /// <param name="HTMLCode">HTML to convert; a blank placeholder page is used if it is empty.</param>
        private void ConvertHTMLToPDF_Wk(string HTMLCode)
        {
            string sFileName = ""; //GetNewName();
            string sPage = sFileName + ".html";
            //docman_files

            if (string.IsNullOrEmpty(HTMLCode))
            {
                HTMLCode = "<HTML><HEAD><title>Blank data</title></head><body><h1>Blank document</h1></body></html>";
            }
            // wkhtmltopdf long options take a double dash
            string GlobOptions = "--orientation Portrait --page-size A4 --title " + _DocInfo_title;
            StringWriter sw = new StringWriter();

            //Server.Execute(sUrlVirtual, sw);
            using (TemporaryFile htmlfile = new TemporaryFile(false, "HTML"))
            {
                // Write the incoming HTML to the temporary file for wkhtmltopdf to read
                using (StreamWriter sWriter = File.CreateText(htmlfile.Path))
                {
                    sWriter.WriteLine(HTMLCode);
                }

                _threadArgs = RG_Utils.StringManip.Quoted(htmlfile.Path) + " " + RG_Utils.StringManip.Quoted(_PDFName);
                _threadWorkingDir = RG_Utils.StringManip.Quoted(System.AppDomain.CurrentDomain.BaseDirectory + WbKitFileLocation);
                _threadApp = RG_Utils.StringManip.Quoted(System.AppDomain.CurrentDomain.BaseDirectory + WbKitFileLocation + @"\" + "wkhtmltopdf.exe");

                System.Threading.ThreadStart job = new System.Threading.ThreadStart(ThreadStart);
                System.Threading.Thread thread = new System.Threading.Thread(job);
                thread.Start();

                // Wait for the new thread to terminate.
                thread.Join();
            }
        }
        #endregion


Author Comment

by:kqureshi321
ID: 39696466
Great. I'll try and will let you know how it goes.
Thanks!
 

Author Comment

by:kqureshi321
ID: 39708863
Hi,
We tested, and the generated HTML contains all records. So the issue is probably in the conversion to PDF.
Please, any ideas? The code is above.

Expert Comment

by:Robberbaron (robr)
ID: 39710557
As before, look very carefully at the end of the HTML. Can you post the last 2 rows?
Are all rows properly terminated?

1. Try pasting your HTML into http://validator.w3.org/#validate_by_input

2. Try a small set of your data through iTextSharp. As I said, I had problems with it parsing HTML.
 

Author Comment

by:kqureshi321
ID: 39711736
Here is the generated HTML string. As you can see, there are 3 rows containing 5 records.
But when this string is converted into a PDF, the last row is missing from the PDF document.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<body style="font-family: Arial, Helvetica, sans-serif; font-size: 9px; line-height: 1.1em;" bgcolor="FFFFFF#" link="CC3300#" vlink="333300"  leftmargin="0" topmargin="5" marginwidth="0" marginheight="0" alink="333300">
<p align="center" style="color:003366;font-size:16px;font-weight:bold;font-family: Arial, Helvetica, sans-serif;">
NEW YORK CITY<br />
<em>Committee List</em></p>
<br />
<p align="center" style="color:003366;font-size:12px;font-weight:bold;font-family: Arial, Helvetica, sans-serif;">AIDS Committee (5)</p>
<div style="clear:both;"></div><br />
<div>
<table>
<tr>
	<td valign="top">
	<b><em>Chair</em></b>- <b><em>10-28-2011</em></b><br />
	Ly Neuer, Esq.<br />
	Safe Law Proj<br />
	150 Court St<br />Rm 1600<br />Brooklyn, NY &nbsp; 10001<br />
	Phone: (555) 555-55555<br />Fax: (555) 555-55555<br />
	Email: <a style="color:blue;" href="mailto:ler@san.org">ler@san.org</a><br />
	</td>
	<td valign="top">
	<b><em>Member</em></b>- <b><em>05-01-2012</em></b><br />
	Alt Ren, Esq.<br />
	860 E 63rd St<br />
	New York, NY &nbsp; 10011<br />
	Phone: (555) 555-55555<br />
	Fax: (555) 555-55555<br />
	Email: <a style="color:blue;" href="mailto:albertrchen@gmail.com">an@gmail.com</a><br />
	</td>
</tr>
<tr>
	<td valign="top">
	<b><em>Member</em></b>- <b><em>05-01-2012</em></b><br />
	Doy Chr, Esq.<br />The Bronx Defenders<br />
	1760 Ave<br />Bronx, NY &nbsp; 10651<br />
	Phone: (555) 555-55555<br />Fax: (555) 555-55555<br />
	Email: <a style="color:blue;" href="mailto:chy@yahoo.com">chy@yahoo.com</a><br />
	</td>
	<td valign="top">
	<b><em>Member</em></b>- <b><em>06-25-2013</em></b><br />
	Last Join<br />42 east 46th Street<br />New York, NY &nbsp; 10011<br />
	United States<br />Email: <a style="color:blue;" href="mailto:as@as.com">as@as.com</a><br />
	</td>
</tr>
<tr>
	<td valign="top"><b><em>Member</em></b>- <b><em>03-06-2013</em></b><br />
	Chott Ho, Esq.<br />1180 Heat St<br />New York, NY &nbsp; 10011<br />
	United States<br />Phone: (555) 555-55555<br />
	Fax: (555) 555-55555<br />
	Email: <a style="color:blue;" href="mailto:sgl@nyc.org">sgl@nyc.org</a><br />
	</td>
</tr>
</table>
</div>
</body>
</html>



I can't find anything wrong. Am I missing something?
Thank you.

Accepted Solution

by:
Robberbaron (robr) earned 500 total points
ID: 39713123
I ran it through the validator.

There are lots of warnings, but the one that stands out is that the last row of the table has only one cell, while all the others have 2, and there is no colspan specified.

iTextSharp is probably very picky about this. Try adding a second cell to the last row, like this:


<tr>
	<td valign="top"><b><em>Member</em></b>- <b><em>03-06-2013</em></b><br />
	Chott Ho, Esq.<br />1180 Heat St<br />New York, NY &nbsp; 10011<br />
	United States<br />Phone: (555) 555-55555<br />
	Fax: (555) 555-55555<br />
	Email: <a style="color:blue;" href="mailto:sgl@nyc.org">sgl@nyc.org</a><br />
	</td>
       <td> ************** Test **********</td>
</tr>
</table>
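
For anyone who wants to catch this kind of mismatch without the validator, here is a rough sketch of a check that counts the cells in each row of an HTML string. The class and method names (`TableRowCheck`, `CellsPerRow`, `RowsAreUniform`) are made up for illustration and are not part of iTextSharp. It is regex-based and deliberately naive: it ignores colspan and nested tables, so treat it as a debugging aid only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of iTextSharp.
public static class TableRowCheck
{
    // Returns the number of <td>/<th> cells found in each <tr> of the HTML.
    // Naive: ignores colspan and does not handle nested tables.
    public static List<int> CellsPerRow(string html)
    {
        var counts = new List<int>();
        MatchCollection rows = Regex.Matches(html, @"<tr\b[^>]*>(.*?)</tr>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        foreach (Match row in rows)
        {
            int cells = Regex.Matches(row.Groups[1].Value, @"<t[dh]\b",
                RegexOptions.IgnoreCase).Count;
            counts.Add(cells);
        }
        return counts;
    }

    // True when every row has the same number of cells (or there are no rows).
    public static bool RowsAreUniform(string html)
    {
        List<int> counts = CellsPerRow(html);
        foreach (int count in counts)
        {
            if (count != counts[0]) return false;
        }
        return true;
    }
}
```

Calling `TableRowCheck.RowsAreUniform(html)` just before the PDF conversion and logging the result of `CellsPerRow` when it returns false would point straight at a short last row like the one in the posted HTML.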


 

Author Closing Comment

by:kqureshi321
ID: 39714228
Great! Thank you so much!
When generating the HTML, I have a variable that keeps track of how many columns the current row has.
So I added the following at the end, and everything is working.
            if (columnCounter == 2)
            {
                // The last row is already full
                html += "</tr></table></div></body></html>";
            }
            else
            {
                // Pad the last row with an empty cell so it matches the other rows
                html += "<td></td></tr></table></div></body></html>";
            }
Thank you!
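
The padding fix above can also be folded into the code that builds the table, so the last row is always filled regardless of the column count. This is only a minimal sketch under assumptions: it supposes each record has already been rendered to a string of cell HTML, and `TableBuilder` and `BuildTable` are hypothetical names that do not come from the original handler.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper; the names are illustrative only.
public static class TableBuilder
{
    // Lays out one cell per record, `columns` cells per row, and pads the
    // final row with empty cells so every row has the same cell count.
    public static string BuildTable(IList<string> cellHtml, int columns)
    {
        if (columns < 1) throw new ArgumentOutOfRangeException("columns");

        var sb = new StringBuilder("<table>");
        for (int i = 0; i < cellHtml.Count; i += columns)
        {
            sb.Append("<tr>");
            for (int c = 0; c < columns; c++)
            {
                int index = i + c;
                sb.Append("<td valign=\"top\">");
                // Past the last record, the cell is left empty (padding)
                if (index < cellHtml.Count) sb.Append(cellHtml[index]);
                sb.Append("</td>");
            }
            sb.Append("</tr>");
        }
        sb.Append("</table>");
        return sb.ToString();
    }
}
```

With 5 records and 2 columns, this produces 3 rows of 2 cells each, the last cell being empty, which is the same shape the manual `<td></td>` fix yields.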


Question has a verified solution.
