Solved

C# HTML to PDF

Posted on 2013-11-11
Last Modified: 2013-12-12
Hello experts,
I have HTML that needs to be converted to PDF.
The HTML contains a table populated from a recordset.
Please check the code below. It works, except that the generated PDF is missing the last row of the table.
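// Requires: using System.IO; using System.Web;
//           using iTextSharp.text; using iTextSharp.text.pdf;
//           using iTextSharp.text.html.simpleparser;

// Sends the generated PDF to the browser as a download.
private void GenerateReport(string Html, HttpContext context)
{
    MemoryStream stream = createPDF(Html);

    context.Response.ContentType = "application/pdf";
    context.Response.AddHeader("Content-Disposition", "attachment; filename=\"Report.pdf\"");
    context.Response.BinaryWrite(stream.ToArray());
}

// Parses the HTML string with iTextSharp's HTMLWorker and returns the PDF in a stream.
private MemoryStream createPDF(string html)
{
    MemoryStream msOutput = new MemoryStream();
    TextReader reader = new StringReader(html);

    // A4 page; margins are left, right, top, bottom.
    Document document = new Document(PageSize.A4, 10f, 10f, 10f, 0f);

    // Creating the writer binds the document to msOutput.
    PdfWriter writer = PdfWriter.GetInstance(document, msOutput);

    HTMLWorker worker = new HTMLWorker(document);

    document.Open();
    worker.StartDocument();

    worker.Parse(reader);
    worker.EndDocument();
    worker.Close();
    document.Close();

    return msOutput;
}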

If I render just the HTML, all rows are displayed, but the generated PDF document does not include the last row.
Please help.
Thank you.
Question by:kqureshi321
14 Comments
 
LVL 32

Expert Comment

by:Robberbaron (robr)
ID: 39674535
1. Are you sure your HTML is terminated correctly? Many browsers silently fix bad HTML for you.

2. Can you paste the last part of the HTML as displayed in 'view-source'?

3. What library are you using? I had problems with iTextSharp and ended up running the WebKit command-line converter (wkhtmltopdf) on a separate thread, which gave excellent and consistent results.

     /// <summary>
        /// Runs WebKit PDF command line convertor
        /// http://code.google.com/p/wkhtmltopdf/
        /// </summary>
        /// <param name="sRawUrl"></param>
        /// <returns></returns>
        ///
 

Author Comment

by:kqureshi321
ID: 39675103
The HTML page is not displayed. When someone clicks on the link
<a id="Summary" title="List" href="/handlers/file.ashx" target="_blank">Click</a>


it doesn't generate an HTML page; it generates an HTML string and then converts it into a PDF (see the code above). How can I check the source of the generated HTML?
 

Author Comment

by:kqureshi321
ID: 39675155
We do use iTextSharp.

 
LVL 32

Expert Comment

by:Robberbaron (robr)
ID: 39676053
I use exactly the same process:
create the HTML as a string and then convert it to PDF. You could write the string to the console or to a text file to test it.
As I said, I tried iTextSharp but gave up because it couldn't handle my CSS-formatted HTML.
So I write the string to a temp file and then send that file to WebKit for PDF output. It works well but needs the external WebKit files to be available.
That's OK for me, as my app is intranet-only.
 

Author Comment

by:kqureshi321
ID: 39692978
We actually convert an HTML string into a PDF, and this is on our website. Also, our website is based on iParts, so we will have to stay with iTextSharp, at least for now.
However, your idea of writing the string to the console or a text file might work. It could show us what is wrong.
Can you please show me an example of how to do this? I would really appreciate it.
Thanks!
 
LVL 32

Expert Comment

by:Robberbaron (robr)
ID: 39694785
This includes my calls to WebKit on a separate thread, but the relevant part writes the incoming HTMLCode to a temporary file. You can use whatever path and filename you want.

                // 'using' flushes and closes the file even if the write throws.
                using (StreamWriter sWriter = File.CreateText(myPathFile))
                {
                    sWriter.WriteLine(HTMLCode);
                }


        #region WebKit
        /// <summary>
        /// Runs WebKit PDF command line convertor
        /// http://code.google.com/p/wkhtmltopdf/
        /// </summary>
        /// <param name="HTMLCode">HTML markup to convert</param>
        // _DocInfo_title, _PDFName, _threadArgs, _threadWorkingDir, _threadApp, WbKitFileLocation,
        // TemporaryFile, RG_Utils and ThreadStart() are defined elsewhere (not shown).
        private string _WebKitFiles = "DocMan_Files";
        private void ConvertHTMLToPDF_Wk(string HTMLCode)
        {
            string sFileName = ""; //GetNewName();
            string sPage = sFileName + ".html";
            //docman_files

            if (string.IsNullOrEmpty(HTMLCode))
            {
                HTMLCode = "<HTML><HEAD><title>Blank data</title></head><body><h1>Blank document</h1></body></html>";
            }
            // wkhtmltopdf long options take two dashes; this string is not passed on in this excerpt.
            string GlobOptions = "--orientation Portrait --page-size A4 --title " + _DocInfo_title;
            StringWriter sw = new StringWriter();

            //Server.Execute(sUrlVirtual, sw);
            using (TemporaryFile htmlfile = new TemporaryFile(false, "HTML"))
            {
                // Write the HTML to a temporary file for wkhtmltopdf to read.
                using (StreamWriter sWriter = File.CreateText(htmlfile.Path))
                {
                    sWriter.WriteLine(HTMLCode);
                }

                _threadArgs = RG_Utils.StringManip.Quoted(htmlfile.Path) + " " + RG_Utils.StringManip.Quoted(_PDFName);
                _threadWorkingDir = RG_Utils.StringManip.Quoted(System.AppDomain.CurrentDomain.BaseDirectory + WbKitFileLocation);
                _threadApp = RG_Utils.StringManip.Quoted(System.AppDomain.CurrentDomain.BaseDirectory + WbKitFileLocation + @"\" + "wkhtmltopdf.exe");

                System.Threading.ThreadStart job = new System.Threading.ThreadStart(ThreadStart);
                System.Threading.Thread thread = new System.Threading.Thread(job);
                thread.Start();

                // Wait for NewThread to terminate.
                thread.Join();
            }

        }
        #endregion

 

Author Comment

by:kqureshi321
ID: 39696466
Great. I'll try it and let you know how it goes.
Thanks!
 

Author Comment

by:kqureshi321
ID: 39708863
Hi,
We tested the HTML, and the generated HTML contains all records. So the issue probably occurs when it is converted into a PDF.
Any ideas, please? The code is above.
 
LVL 32

Expert Comment

by:Robberbaron (robr)
ID: 39710557
As before, look very carefully at the end of the HTML. Can you post the last two rows?
Are all rows properly terminated?

1. Try pasting your HTML into http://validator.w3.org/#validate_by_input

2. Try a small subset of your data through iTextSharp. As I said, I had problems with it parsing HTML.
 

Author Comment

by:kqureshi321
ID: 39711736
Here is the generated HTML string. As you can see, there are 3 rows containing 5 records.
But when this string is converted into a PDF, the last row is missing from the PDF document.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<body style="font-family: Arial, Helvetica, sans-serif; font-size: 9px; line-height: 1.1em;" bgcolor="FFFFFF#" link="CC3300#" vlink="333300"  leftmargin="0" topmargin="5" marginwidth="0" marginheight="0" alink="333300">
<p align="center" style="color:003366;font-size:16px;font-weight:bold;font-family: Arial, Helvetica, sans-serif;">
NEW YORK CITY<br />
<em>Committee List</em></p>
<br />
<p align="center" style="color:003366;font-size:12px;font-weight:bold;font-family: Arial, Helvetica, sans-serif;">AIDS Committee (5)</p>
<div style="clear:both;"></div><br />
<div>
<table>
<tr>
	<td valign="top">
	<b><em>Chair</em></b>- <b><em>10-28-2011</em></b><br />
	Ly Neuer, Esq.<br />
	Safe Law Proj<br />
	150 Court St<br />Rm 1600<br />Brooklyn, NY &nbsp; 10001<br />
	Phone: (555) 555-55555<br />Fax: (555) 555-55555<br />
	Email: <a style="color:blue;" href="mailto:ler@san.org">ler@san.org</a><br />
	</td>
	<td valign="top">
	<b><em>Member</em></b>- <b><em>05-01-2012</em></b><br />
	Alt Ren, Esq.<br />
	860 E 63rd St<br />
	New York, NY &nbsp; 10011<br />
	Phone: (555) 555-55555<br />
	Fax: (555) 555-55555<br />
	Email: <a style="color:blue;" href="mailto:albertrchen@gmail.com">an@gmail.com</a><br />
	</td>
</tr>
<tr>
	<td valign="top">
	<b><em>Member</em></b>- <b><em>05-01-2012</em></b><br />
	Doy Chr, Esq.<br />The Bronx Defenders<br />
	1760 Ave<br />Bronx, NY &nbsp; 10651<br />
	Phone: (555) 555-55555<br />Fax: (555) 555-55555<br />
	Email: <a style="color:blue;" href="mailto:chy@yahoo.com">chy@yahoo.com</a><br />
	</td>
	<td valign="top">
	<b><em>Member</em></b>- <b><em>06-25-2013</em></b><br />
	Last Join<br />42 east 46th Street<br />New York, NY &nbsp; 10011<br />
	United States<br />Email: <a style="color:blue;" href="mailto:as@as.com">as@as.com</a><br />
	</td>
</tr>
<tr>
	<td valign="top"><b><em>Member</em></b>- <b><em>03-06-2013</em></b><br />
	Chott Ho, Esq.<br />1180 Heat St<br />New York, NY &nbsp; 10011<br />
	United States<br />Phone: (555) 555-55555<br />
	Fax: (555) 555-55555<br />
	Email: <a style="color:blue;" href="mailto:sgl@nyc.org">sgl@nyc.org</a><br />
	</td>
</tr>
</table>
</div>
</body>
</html>



I can't find anything wrong. Am I missing something?
Thank you.
 
LVL 32

Accepted Solution

by:
Robberbaron (robr) earned 500 total points
ID: 39713123
I ran it through the validator.

There are lots of warnings, but the one that stands out is that the last row of the table has only one cell while all the others have two, and no colspan is specified.

iTextSharp is probably very picky about this. Try giving the last row a second cell:


<tr>
	<td valign="top"><b><em>Member</em></b>- <b><em>03-06-2013</em></b><br />
	Chott Ho, Esq.<br />1180 Heat St<br />New York, NY &nbsp; 10011<br />
	United States<br />Phone: (555) 555-55555<br />
	Fax: (555) 555-55555<br />
	Email: <a style="color:blue;" href="mailto:sgl@nyc.org">sgl@nyc.org</a><br />
	</td>
       <td> ************** Test **********</td>
</tr>
</table>
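
A quick way to spot this kind of mismatch before handing the string to the PDF converter is to count the cells in each row. The following is only a rough sketch: `TableShapeCheck` and `CellsPerRow` are made-up names, and the regular expressions assume simple, non-nested tables without `colspan` or `rowspan`, like the one posted above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class TableShapeCheck
{
    // Returns the number of <td>/<th> cells found in each <tr> of the HTML string.
    // Assumption: simple, non-nested tables with no colspan/rowspan.
    public static int[] CellsPerRow(string html)
    {
        return Regex.Matches(html, @"<tr\b.*?</tr>",
                             RegexOptions.IgnoreCase | RegexOptions.Singleline)
            .Cast<Match>()
            .Select(row => Regex.Matches(row.Value, @"<t[dh]\b", RegexOptions.IgnoreCase).Count)
            .ToArray();
    }

    static void Main()
    {
        // Two-column table whose last row has only one cell.
        string html = "<table><tr><td>a</td><td>b</td></tr><tr><td>c</td></tr></table>";

        int[] counts = CellsPerRow(html);
        Console.WriteLine(string.Join(",", counts.Select(c => c.ToString())));  // prints "2,1"

        // Rows with differing cell counts are the ones to look at.
        if (counts.Distinct().Count() > 1)
        {
            Console.WriteLine("Rows have different cell counts");
        }
    }
}
```

If the counts differ from row to row, the short rows are the first candidates for padding or a `colspan`.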

 

Author Closing Comment

by:kqureshi321
ID: 39714228
Great! Thank you so much!
When generating the HTML, I have a variable that keeps track of the column count.
So I added the following at the end, and everything is working:
if (columnCounter == 2)
{
    html += "</tr></table></div></body></html>";
}
else
{
    // The last row is short: add an empty cell so it matches the other rows.
    html += "<td></td></tr></table></div></body></html>";
}
Thank you!
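
For tables with more than two columns, the same fix could be generalized along these lines. This is a minimal sketch, not the code from the site: `TableRowPadding`, `CloseLastRow` and its parameters are hypothetical names, and it assumes the generator knows how many cells it has already written to the open row.

```csharp
using System;
using System.Text;

static class TableRowPadding
{
    // Closes the currently open <tr>, first appending empty cells until the row
    // holds columnsPerRow cells. cellsInLastRow is the number already written.
    public static string CloseLastRow(int cellsInLastRow, int columnsPerRow)
    {
        var sb = new StringBuilder();
        for (int i = cellsInLastRow; i < columnsPerRow; i++)
        {
            sb.Append("<td></td>");
        }
        sb.Append("</tr>");
        return sb.ToString();
    }

    static void Main()
    {
        // Two-column table whose last row so far contains a single cell.
        string html = "<table><tr><td>A</td><td>B</td></tr><tr><td>C</td>";
        html += CloseLastRow(1, 2) + "</table>";
        Console.WriteLine(html);
        // prints "<table><tr><td>A</td><td>B</td></tr><tr><td>C</td><td></td></tr></table>"
    }
}
```

Padding with empty cells keeps every row the same width, which is what resolved the missing row here; a `colspan` on the last cell would be another option, but it was not tried in this thread.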