• Status: Solved

C# HTML to PDF

Hello experts,
I have HTML that needs to be converted to PDF.
The HTML contains a table populated from a recordset.
Please check the code below. It works, except that the last row of the table is missing from the PDF.
        private void GenerateReport(string Html, HttpContext context)
        {
            // Convert the HTML string to a PDF held in memory.
            MemoryStream stream = createPDF(Html);

            // Send the PDF to the browser as a file download.
            context.Response.ContentType = "application/pdf";
            context.Response.AddHeader("Content-Disposition", "attachment; filename=\"Report.pdf\"");
            context.Response.BinaryWrite(stream.ToArray());
        }

        private MemoryStream createPDF(string html)
        {
            MemoryStream msOutput = new MemoryStream();
            TextReader reader = new StringReader(html);

            // A4 page; margins are left, right, top, bottom.
            Document document = new Document(PageSize.A4, 10f, 10f, 10f, 0f);

            PdfWriter writer = PdfWriter.GetInstance(document, msOutput);

            HTMLWorker worker = new HTMLWorker(document);

            document.Open();
            worker.StartDocument();

            // Parse the HTML and add the resulting elements to the document.
            worker.Parse(reader);
            worker.EndDocument();
            worker.Close();
            document.Close();

            return msOutput;
        }

If I render just the HTML, all rows are displayed, but the generated PDF does not include the last row.
Please help.
Thank you.
Asked by: Galina Besselyanova

Robberbaron (robr) Commented:
1. Are you sure your HTML is terminated correctly? Many browsers silently fix bad HTML for you.

2. Can you paste the last part of the HTML as displayed in 'view-source'?

3. What library are you using? I had problems with iTextSharp and ended up running the WebKit command-line converter (wkhtmltopdf, http://code.google.com/p/wkhtmltopdf/) on a separate thread, which gave excellent and consistent results.

 
Galina Besselyanova (Senior Software Developer/Engineer, Author) Commented:
The HTML page is never displayed. When someone clicks on the link
<a id="Summary" title="List" href="/handlers/file.ashx" target="_blank">Click</a>

the handler does not render an HTML page; it builds an HTML string and then converts it to PDF (see the code above). How can I check the source of the generated HTML?
 
Galina Besselyanova (Senior Software Developer/Engineer, Author) Commented:
We do use iTextSharp.
 
Robberbaron (robr) Commented:
I do exactly the same thing.
I create the HTML as a string and then convert it to PDF. You could write the string to the console or to a text file to test it.
As I said, I tried iTextSharp but gave up because it couldn't handle my CSS-formatted HTML.
So I write the string to a temp file and then send that file to WebKit for PDF output. It works well but needs the external WebKit files to be available.
That is OK for me because my app is intranet only.
 
Galina Besselyanova (Senior Software Developer/Engineer, Author) Commented:
We convert the HTML string into PDF on our public website. Also, our website is based on iParts, so we will have to stay with iTextSharp, at least for now.
However, your idea of writing the string to the console or a text file might work. It could show us what is wrong.
Can you please show me an example of how to do this? I would really appreciate it.
Thanks!
 
Robberbaron (robr)Commented:
this includes my calls to WebKit on a separate thread but it writes the incoming HTMLCode to a temporary file.  You can use whatever path/filename  you want.

                StreamWriter sWriter = File.CreateText(myPathFile);
                sWriter.WriteLine(HTMLCode);
                sWriter.Close();
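
For debugging inside a handler, the same idea can be wrapped in a small helper. This is only a minimal sketch: `HtmlDebugDump` and `DumpHtml` are hypothetical names, and it assumes the account running the web app is allowed to write to the temp folder.

```csharp
using System;
using System.IO;
using System.Text;

public static class HtmlDebugDump
{
    // Writes the generated HTML to a uniquely named file in the temp folder
    // and returns the full path so the file can be opened in a browser or editor.
    // (Hypothetical helper, not part of iTextSharp or the original code.)
    public static string DumpHtml(string html)
    {
        string fileName = "report_" + Guid.NewGuid().ToString("N") + ".html";
        string path = Path.Combine(Path.GetTempPath(), fileName);
        File.WriteAllText(path, html ?? string.Empty, Encoding.UTF8);
        return path;
    }
}
```

Calling `HtmlDebugDump.DumpHtml(Html)` just before the PDF conversion gives you the exact string that the converter receives, which you can then open in a browser or paste into a validator.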


        #region WebKit
        // Note: TemporaryFile, RG_Utils.StringManip.Quoted, WbKitFileLocation, _DocInfo_title,
        // _PDFName, the _thread* fields and the ThreadStart method are defined elsewhere (not shown).
        private string _WebKitFiles = "DocMan_Files";

        /// <summary>
        /// Runs WebKit PDF command line convertor
        /// http://code.google.com/p/wkhtmltopdf/
        /// </summary>
        /// <param name="HTMLCode">HTML to convert; a placeholder page is used when it is empty.</param>
        private void ConvertHTMLToPDF_Wk(string HTMLCode)
        {
            string sFileName = ""; //GetNewName();
            string sPage = sFileName + ".html";
            //docman_files

            if (string.IsNullOrEmpty(HTMLCode))
            {
                HTMLCode = "<HTML><HEAD><title>Blank data</title></head><body><h1>Blank document</h1></body></html>";
            }
            // wkhtmltopdf long options take two dashes. GlobOptions is not used in the code shown here.
            string GlobOptions = "--orientation Portrait --page-size A4 --title " + _DocInfo_title;
            StringWriter sw = new StringWriter();

            //Server.Execute(sUrlVirtual, sw);
            using (TemporaryFile htmlfile = new TemporaryFile(false, "HTML"))
            {
                // Write the HTML to the temporary file; 'using' closes the writer even on error.
                using (StreamWriter sWriter = File.CreateText(htmlfile.Path))
                {
                    sWriter.WriteLine(HTMLCode);
                }

                // Arguments for wkhtmltopdf: quoted input HTML path, then quoted output PDF path.
                _threadArgs = RG_Utils.StringManip.Quoted(htmlfile.Path) + " " + RG_Utils.StringManip.Quoted(_PDFName);
                _threadWorkingDir = RG_Utils.StringManip.Quoted(System.AppDomain.CurrentDomain.BaseDirectory + WbKitFileLocation);
                _threadApp = RG_Utils.StringManip.Quoted(System.AppDomain.CurrentDomain.BaseDirectory + WbKitFileLocation + @"\" + "wkhtmltopdf.exe");

                // Run the converter on a separate thread.
                System.Threading.ThreadStart job = new System.Threading.ThreadStart(ThreadStart);
                System.Threading.Thread thread = new System.Threading.Thread(job);
                thread.Start();

                // Wait for the new thread to terminate before the temporary file is disposed.
                thread.Join();
            }

        }
        #endregion
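
The `ThreadStart` method that actually launches the converter is not shown above. One way such a launch could look is sketched below using `System.Diagnostics.Process`; this is an assumption, not the code from this post. `WkHtmlToPdfRunner`, `BuildStartInfo` and `Run` are hypothetical names, and the sketch relies only on the basic `wkhtmltopdf <input> <output>` command-line form.

```csharp
using System;
using System.Diagnostics;

public static class WkHtmlToPdfRunner
{
    // Builds the start info for: wkhtmltopdf.exe "<input.html>" "<output.pdf>"
    // exePath is wherever wkhtmltopdf.exe is installed (an assumption of this sketch).
    public static ProcessStartInfo BuildStartInfo(string exePath, string htmlPath, string pdfPath)
    {
        return new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = Quote(htmlPath) + " " + Quote(pdfPath),
            UseShellExecute = false,   // start the executable directly, no shell
            CreateNoWindow = true      // no console window on the server
        };
    }

    // Starts the converter and waits up to timeoutMs milliseconds.
    // Returns true only when the process finished in time with exit code 0.
    public static bool Run(ProcessStartInfo startInfo, int timeoutMs)
    {
        using (Process process = Process.Start(startInfo))
        {
            if (process == null)
            {
                return false;
            }
            if (!process.WaitForExit(timeoutMs))
            {
                process.Kill();        // do not leave a hung converter behind
                return false;
            }
            return process.ExitCode == 0;
        }
    }

    // Wraps a path in double quotes so that spaces in it survive argument parsing.
    private static string Quote(string value)
    {
        return "\"" + value + "\"";
    }
}
```

Waiting with a timeout plays the same role as `thread.Join()` above: the temporary HTML file must not be deleted before the converter has finished reading it.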

 
Galina Besselyanova (Senior Software Developer/Engineer, Author) Commented:
Great. I'll try it and let you know how it goes.
Thanks!
 
Galina Besselyanova (Senior Software Developer/Engineer, Author) Commented:
Hi,
We tested the HTML, and the generated HTML contains all records. So the issue is probably in the conversion to PDF.
Any ideas, please? The code is above.
 
Robberbaron (robr) Commented:
As before, look very carefully at the end of the HTML. Can you post the last 2 rows?
Are all rows properly terminated?

1. Try pasting your HTML into http://validator.w3.org/#validate_by_input

2. Try running a small set of your data through iTextSharp. As I said, I had problems with it parsing HTML.
 
Galina Besselyanova (Senior Software Developer/Engineer, Author) Commented:
Here is the generated HTML string. As you can see, there are 3 rows holding 5 records.
But when this string is converted to PDF, the last row is missing from the PDF document.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<body style="font-family: Arial, Helvetica, sans-serif; font-size: 9px; line-height: 1.1em;" bgcolor="FFFFFF#" link="CC3300#" vlink="333300"  leftmargin="0" topmargin="5" marginwidth="0" marginheight="0" alink="333300">
<p align="center" style="color:003366;font-size:16px;font-weight:bold;font-family: Arial, Helvetica, sans-serif;">
NEW YORK CITY<br />
<em>Committee List</em></p>
<br />
<p align="center" style="color:003366;font-size:12px;font-weight:bold;font-family: Arial, Helvetica, sans-serif;">AIDS Committee (5)</p>
<div style="clear:both;"></div><br />
<div>
<table>
<tr>
	<td valign="top">
	<b><em>Chair</em></b>- <b><em>10-28-2011</em></b><br />
	Ly Neuer, Esq.<br />
	Safe Law Proj<br />
	150 Court St<br />Rm 1600<br />Brooklyn, NY &nbsp; 10001<br />
	Phone: (555) 555-55555<br />Fax: (555) 555-55555<br />
	Email: <a style="color:blue;" href="mailto:ler@san.org">ler@san.org</a><br />
	</td>
	<td valign="top">
	<b><em>Member</em></b>- <b><em>05-01-2012</em></b><br />
	Alt Ren, Esq.<br />
	860 E 63rd St<br />
	New York, NY &nbsp; 10011<br />
	Phone: (555) 555-55555<br />
	Fax: (555) 555-55555<br />
	Email: <a style="color:blue;" href="mailto:albertrchen@gmail.com">an@gmail.com</a><br />
	</td>
</tr>
<tr>
	<td valign="top">
	<b><em>Member</em></b>- <b><em>05-01-2012</em></b><br />
	Doy Chr, Esq.<br />The Bronx Defenders<br />
	1760 Ave<br />Bronx, NY &nbsp; 10651<br />
	Phone: (555) 555-55555<br />Fax: (555) 555-55555<br />
	Email: <a style="color:blue;" href="mailto:chy@yahoo.com">chy@yahoo.com</a><br />
	</td>
	<td valign="top">
	<b><em>Member</em></b>- <b><em>06-25-2013</em></b><br />
	Last Join<br />42 east 46th Street<br />New York, NY &nbsp; 10011<br />
	United States<br />Email: <a style="color:blue;" href="mailto:as@as.com">as@as.com</a><br />
	</td>
</tr>
<tr>
	<td valign="top"><b><em>Member</em></b>- <b><em>03-06-2013</em></b><br />
	Chott Ho, Esq.<br />1180 Heat St<br />New York, NY &nbsp; 10011<br />
	United States<br />Phone: (555) 555-55555<br />
	Fax: (555) 555-55555<br />
	Email: <a style="color:blue;" href="mailto:sgl@nyc.org">sgl@nyc.org</a><br />
	</td>
</tr>
</table>
</div>
</body>
</html>

I can't find anything wrong. Am I missing something?
Thank you.
 
Robberbaron (robr) Commented:
I ran it through the validator.

There are lots of warnings, but the one that stands out is that the last row of the table has only one cell while all the others have 2, and no colspan is specified.

iTextSharp is probably very picky about this. For example, with a second cell added to the last row:


<tr>
	<td valign="top"><b><em>Member</em></b>- <b><em>03-06-2013</em></b><br />
	Chott Ho, Esq.<br />1180 Heat St<br />New York, NY &nbsp; 10011<br />
	United States<br />Phone: (555) 555-55555<br />
	Fax: (555) 555-55555<br />
	Email: <a style="color:blue;" href="mailto:sgl@nyc.org">sgl@nyc.org</a><br />
	</td>
       <td> ************** Test **********</td>
</tr>
</table>
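
A mismatch like this can also be caught before the HTML ever reaches the PDF converter. The following is a rough sketch only: `TableRowCheck` and `CountCellsPerRow` are hypothetical names, and the regular expressions assume a single, non-nested table without colspan attributes, which is the shape of the HTML in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TableRowCheck
{
    // Returns the number of <td>/<th> cells found in each <tr>, in document order.
    // Assumes one non-nested table and no colspan attributes (see the note above).
    public static List<int> CountCellsPerRow(string html)
    {
        var counts = new List<int>();
        MatchCollection rows = Regex.Matches(
            html,
            @"<tr\b[^>]*>(.*?)</tr>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        foreach (Match row in rows)
        {
            // Count the opening <td ...> and <th ...> tags inside this row.
            int cells = Regex.Matches(row.Groups[1].Value, @"<t[dh]\b", RegexOptions.IgnoreCase).Count;
            counts.Add(cells);
        }
        return counts;
    }
}
```

If the returned counts are not all equal, at least one row is short and needs padding (or a colspan) before conversion.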

 
Galina Besselyanova (Senior Software Developer/Engineer, Author) Commented:
Great! Thank you so much!
When generating the HTML, I have a variable that keeps track of the number of columns.
So I added the following at the end, and everything is working.
            if (columnCounter == 2)
            {
                // The last row already has 2 cells; just close the open tags.
                html += "</tr></table></div></body></html>";
            }
            else
            {
                // Pad the last row with an empty cell so every row has 2 cells.
                html += "<td></td></tr></table></div></body></html>";
            }
Thank you!
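
For anyone with a similar table, the same padding can be done while the rows are built, for any number of columns. This is a minimal sketch under stated assumptions: `TableBuilder` and `BuildTable` are hypothetical names, each string is the inner HTML of one cell, and a `StringBuilder` stands in for the `html +=` concatenation above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TableBuilder
{
    // Lays the cells out left to right in rows of 'columns' cells and pads the
    // final row with empty cells so that every row has the same cell count.
    public static string BuildTable(IList<string> cells, int columns)
    {
        if (columns < 1)
        {
            throw new ArgumentOutOfRangeException("columns");
        }

        var sb = new StringBuilder("<table>");
        for (int rowStart = 0; rowStart < cells.Count; rowStart += columns)
        {
            sb.Append("<tr>");
            for (int c = 0; c < columns; c++)
            {
                int index = rowStart + c;
                sb.Append("<td valign=\"top\">");
                if (index < cells.Count)
                {
                    sb.Append(cells[index]);   // real cell content
                }
                sb.Append("</td>");            // an empty cell when the data has run out
            }
            sb.Append("</tr>");
        }
        sb.Append("</table>");
        return sb.ToString();
    }
}
```

With 5 records and 2 columns, as in this question, this yields 3 rows where the third row holds one record and one empty cell.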
Question has a verified solution.
