andy gehox

Parsing HTML in C#

Hello,

I am trying to parse the inbox of a disposable email account.

Using a GET request, I receive HTML that I need to parse:

            HttpWebRequest Request = (HttpWebRequest)WebRequest.Create("https://10minutemail.net/");

            //some code

            // Dispose the response, stream, and reader once the body has been read.
            string responseFromServer;
            using (HttpWebResponse _Response = (HttpWebResponse)Request.GetResponse())
            using (Stream dataStream = _Response.GetResponseStream())
            using (StreamReader reader = new StreamReader(dataStream))
            {
                responseFromServer = reader.ReadToEnd();
            }


So the variable responseFromServer holds the HTML that needs to be parsed.

The HTML that I want to parse looks like this:

<div id="mailbox" class="div-w div-m-0">
	<h2 class="h-line">InBox</h2>
	<div id="mailbox-table">
		<table id="maillist">
			<tr>
				<th>
					From
				</th>
				<th>
					Subject
				</th>
				<th>
					Date
				</th>
			</tr>
			<tr onclick="location='readmail.html?mid=welcome'" style="font-weight: bold;">
				<td>
					no-reply@10minutemail.net
				</td>
				<td>
					<a href="readmail.html?mid=welcome">
						Hi, Welcome to 10 Minute Mail
					</a>
				</td>
				<td>
					<span title="2016-02-15 05:44:52 UTC">
						just now
					</span>
				</td>
			</tr>
		</table>
	</div>
</div>


I want to parse it into this class:
    class ClassMailBox
    {
        public string From { get; set; }          // actual e-mail address of the sender
        public string LinkToMail { get; set; }    // content of onclick
    }


Can someone provide me with the best solution? I need a code snippet that I can easily maintain and possibly extend later on.

Thank you.
Karrtik Iyer

Hi Andy,

I have found the HTML Agility Pack useful for this job (HTML Agility Pack on CodePlex).
Code samples can be found here: HTML Agility Pack Code Samples
andy gehox

ASKER

Hello,

thank you for replying.

My first problem is with the HtmlDocument type and its namespace. I checked the documentation and it is available in System.Windows.Browser and System.Windows.Forms, but I am doing this in a console application. Is this a problem?

Second, from the example posted in your link:

 foreach (HtmlNode link in doc.DocumentElement.SelectNodes("//a[@href]"))

How do I navigate to my node?

Thank you!
That shouldn't be an issue. Have you added a reference to System.Windows.Forms.dll?
OK, and what about the second question?

How do I navigate to my node?

 foreach (HtmlNode link in doc.DocumentElement.SelectNodes("//a[@href]"))
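
One way to think about "navigating to my node" is as an XPath expression that selects the message rows of the maillist table. The following is a minimal sketch using only the standard library (System.Xml.XPath on an XDocument), not the HTML Agility Pack itself; it assumes the markup is well-formed, as the fragment in the question is. The class name XPathNavigationSketch, the FirstRow helper, and the trimmed SampleHtml string are illustrative names, not part of any library. Since HtmlNode.SelectNodes also takes XPath expressions, the same expression should carry over, though that is untested here.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathNavigationSketch
{
    // Trimmed copy of the inbox table from the question (well-formed, so it loads as XML).
    public const string SampleHtml = @"<div id=""mailbox"">
  <table id=""maillist"">
    <tr><th>From</th><th>Subject</th><th>Date</th></tr>
    <tr onclick=""location='readmail.html?mid=welcome'"">
      <td>no-reply@10minutemail.net</td>
      <td><a href=""readmail.html?mid=welcome"">Hi, Welcome to 10 Minute Mail</a></td>
      <td><span title=""2016-02-15 05:44:52 UTC"">just now</span></td>
    </tr>
  </table>
</div>";

    // Returns "from|onclick" for the first message row, or null if there is none.
    public static string FirstRow(string html)
    {
        XDocument doc = XDocument.Parse(html);

        // Rows of the table with id 'maillist' that contain <td> cells;
        // the [td] predicate skips the <th> header row.
        XElement row = doc.XPathSelectElement("//table[@id='maillist']/tr[td]");
        if (row == null) return null;

        string from = row.XPathSelectElement("td[1]").Value.Trim();
        string onclick = (string)row.Attribute("onclick"); // null if the attribute is missing
        return from + "|" + onclick;
    }

    public static void Main()
    {
        Console.WriteLine(FirstRow(SampleHtml));
    }
}
```

The predicate `tr[td]` is what separates data rows from the header row, so no index arithmetic on the rows is needed.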

SOLUTION
Karrtik Iyer

This solution is only available to members.
SOLUTION
Fernando Soto

This solution is only available to members.
Hello Fernando Soto,

I like the solution you posted; unfortunately, I am getting an error when parsing the HTML:

An unhandled exception of type 'System.Xml.XmlException' occurred in System.Xml.dll

Additional information: 'type' is an unexpected token. The expected token is '='. Line 58, position 15.


I think it is a problem with parsing some special characters?
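
The message "'type' is an unexpected token. The expected token is '='" is what an XML parser reports when an attribute name is not followed by a value. The actual content of line 58 of the page is not shown in this thread, so the following is only a sketch of one plausible cause: a valueless HTML attribute such as `async`, which is legal HTML but not well-formed XML. The page string, the class name StrictXmlSketch, and both helpers are hypothetical. The sketch also shows one possible workaround: cutting out just the maillist table before handing it to XElement.Parse.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class StrictXmlSketch
{
    // True when the XML parser accepts the markup, false when it throws XmlException.
    public static bool IsWellFormed(string markup)
    {
        try { XElement.Parse(markup); return true; }
        catch (XmlException) { return false; }
    }

    // Cuts out the <table id="maillist"> ... </table> fragment, or returns null if absent.
    // Assumes the table contains no nested tables.
    public static string ExtractMailTable(string page)
    {
        int start = page.IndexOf("<table id=\"maillist\"", StringComparison.Ordinal);
        if (start < 0) return null;

        const string close = "</table>";
        int end = page.IndexOf(close, start, StringComparison.Ordinal);
        if (end < 0) return null;

        return page.Substring(start, end - start + close.Length);
    }

    public static void Main()
    {
        // Hypothetical page: the valueless 'async' attribute is legal HTML but not XML.
        string page =
            "<html><head><script async type=\"text/javascript\"></script></head><body>" +
            "<table id=\"maillist\"><tr><td>no-reply@10minutemail.net</td></tr></table>" +
            "</body></html>";

        Console.WriteLine(IsWellFormed(page));                   // whole page
        Console.WriteLine(IsWellFormed(ExtractMailTable(page))); // table fragment only
    }
}
```

String slicing like this is fragile if the page layout changes; an HTML-tolerant parser such as the HTML Agility Pack suggested earlier in the thread avoids the well-formedness requirement altogether.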
ASKER CERTIFIED SOLUTION
This solution is only available to members.
Hi Andy,

I just tried Fernando's solution; it works fine for me and produces the output below.
BTW, did you try the HTML Agility Pack solution that I suggested?
  public class ClassMailBox
    {
        public string From { get; set; }          //actual e-mail address from sender
        public string LinkToMail { get; set; }    //content of onclick

    }

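            // Requires: using System; using System.IO; using System.Linq; using System.Xml.Linq;
            string responseFromServer = File.ReadAllText(@"E:\tsip\codeblocks_examples\expertsexchange\TotalEESolution\OrderXMLTest\ResponsefromServer.txt");
            // XElement.Parse requires well-formed XML, so the saved HTML must be XML-clean.
            XElement html = XElement.Parse(responseFromServer);

            // Take the first <tr> whose first child is not a <th> header cell.
            ClassMailBox cmb = (from node in html.Descendants("tr")
                                where node.Elements().ElementAt(0).Name != "th"
                                select new ClassMailBox()
                                {
                                    From = node.Elements().ElementAt(0).Value.Trim(),
                                    LinkToMail = node.Elements().ElementAt(1).Element("a").Attribute("href").Value
                                }).FirstOrDefault();

            // FirstOrDefault returns null when the inbox has no message rows.
            if (cmb != null)
            {
                Console.WriteLine(cmb.From);
                Console.WriteLine(cmb.LinkToMail);
            }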
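
The query above returns only the first message and reads the link's href, whereas the original class comment asks for the content of onclick. As a hedged variation, the sketch below collects every message row into a list and reads the row's onclick attribute, falling back to an empty string when it is missing. It assumes the input is the well-formed table fragment; InboxParserSketch, ParseRows, and SampleTable are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class ClassMailBox
{
    public string From { get; set; }          // e-mail address of the sender
    public string LinkToMail { get; set; }    // content of onclick
}

public static class InboxParserSketch
{
    // Trimmed, well-formed copy of the table from the question.
    public const string SampleTable =
        "<table id=\"maillist\">" +
        "<tr><th>From</th><th>Subject</th><th>Date</th></tr>" +
        "<tr onclick=\"location='readmail.html?mid=welcome'\">" +
        "<td> no-reply@10minutemail.net </td>" +
        "<td><a href=\"readmail.html?mid=welcome\">Hi, Welcome to 10 Minute Mail</a></td>" +
        "<td><span title=\"2016-02-15 05:44:52 UTC\">just now</span></td>" +
        "</tr></table>";

    // Maps every <tr> that has at least one <td> to a ClassMailBox.
    public static List<ClassMailBox> ParseRows(string tableXml)
    {
        XElement table = XElement.Parse(tableXml);
        return table.Descendants("tr")
            .Where(tr => tr.Elements("td").Any())   // skips the <th> header row
            .Select(tr => new ClassMailBox
            {
                From = tr.Elements("td").First().Value.Trim(),
                // The cast yields null when the attribute is absent.
                LinkToMail = (string)tr.Attribute("onclick") ?? string.Empty
            })
            .ToList();
    }

    public static void Main()
    {
        foreach (ClassMailBox mail in ParseRows(SampleTable))
        {
            Console.WriteLine(mail.From + " -> " + mail.LinkToMail);
        }
    }
}
```

Returning a list keeps the caller free of null checks: an empty inbox simply yields an empty list.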

(Image: output of the code above)
As always, the problem was at my end. Everything is working as it should.

Thank you for all your help and the time invested.

Best regards!