andy gehox
asked on
Parsing HTML in C#
Hello,
I am trying to parse the inbox of a disposable email account.
With a GET request I receive HTML which I need to parse:
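// requires: using System.IO; using System.Net;
HttpWebRequest Request = (HttpWebRequest)WebRequest.Create("https://10minutemail.net/");
//some code
string responseFromServer;
// using blocks dispose the response, stream and reader even if reading fails
using (HttpWebResponse _Response = (HttpWebResponse)Request.GetResponse())
using (Stream dataStream = _Response.GetResponseStream())
using (StreamReader reader = new StreamReader(dataStream))
{
    responseFromServer = reader.ReadToEnd();
}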
So the variable responseFromServer holds the HTML that needs to be parsed.
The HTML that I want to parse looks like this:
<div id="mailbox" class="div-w div-m-0">
<h2 class="h-line">InBox</h2>
<div id="mailbox-table">
<table id="maillist">
<tr>
<th>
From
</th>
<th>
Subject
</th>
<th>
Date
</th>
</tr>
<tr onclick="location='readmail.html?mid=welcome'" style="font-weight: bold;">
<td>
no-reply@10minutemail.net
</td>
<td>
<a href="readmail.html?mid=welcome">
Hi, Welcome to 10 Minute Mail
</a>
</td>
<td>
<span title="2016-02-15 05:44:52 UTC">
just now
</span>
</td>
</tr>
</table>
</div>
</div>
I want to parse it into this class:
class ClassMailBox
{
string From { get; set; } //actual e-mail address from sender
string LinkToMail { get; set; } //content of onclick
}
Can someone provide me with the best solution? I need a code snippet that I can easily maintain and possibly upgrade later on.
Thank you.
ASKER
Hello,
Thank you for replying.
The first problem I have is with the HtmlDocument namespace.
I checked the documentation and it is available in System.Windows.Browser and System.Windows.Forms, but I am doing this in a console application.
Is this a problem?
Second:
From the example posted in your link:
foreach(HtmlNode link in doc.DocumentElement.SelectNodes("//a[@href]"))
How do I navigate to my node?
Thank you!
That shouldn't be an issue. Have you added a reference to System.Windows.Forms.dll?
ASKER
OK, and what about the second question?
How do I navigate to my node?
foreach(HtmlNode link in doc.DocumentElement.SelectNodes("//a[@href]"))
ASKER
Hello Fernando Soto,
I like the solution you posted; unfortunately I am getting an error when parsing the HTML:
An unhandled exception of type 'System.Xml.XmlException' occurred in System.Xml.dll
Additional information: 'type' is an unexpected token. The expected token is '='. Line 58, position 15.
I think it's a problem with parsing some special characters?
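A note on the error above: this message most likely means the XML parser met an attribute with no value directly before a type attribute, for example a tag such as <script async type="text/javascript">. That is valid HTML but not well-formed XML, and XElement.Parse accepts only well-formed XML. Below is a minimal sketch of one possible workaround, assuming the inbox table itself is well-formed (as in the HTML from the question) and contains no nested tables: cut the table out of the page and parse only that fragment. The names TableFragment, ExtractMailTable and IsWellFormedXml are hypothetical and introduced only for illustration.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class TableFragment
{
    // Hypothetical helper: returns the <table id="maillist">...</table> substring,
    // or null if it cannot be found. Assumes the table has no nested tables,
    // so the first </table> after the opening tag closes it.
    public static string ExtractMailTable(string html)
    {
        const string openTag = "<table id=\"maillist\"";
        const string closeTag = "</table>";

        int start = html.IndexOf(openTag, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;

        int end = html.IndexOf(closeTag, start, StringComparison.OrdinalIgnoreCase);
        if (end < 0) return null;

        return html.Substring(start, end - start + closeTag.Length);
    }

    // Returns true if XElement.Parse accepts the text as well-formed XML.
    public static bool IsWellFormedXml(string text)
    {
        try
        {
            XElement.Parse(text);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

With this, XElement.Parse(TableFragment.ExtractMailTable(responseFromServer)) would see only the table. The fragment can still fail to parse if it contains HTML-only constructs such as &nbsp; or an unescaped & in a link, so an HTML parser remains the more robust choice.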
Hi Andy,
I just tried Fernando's solution; it works fine for me and produces the output below.
BTW, did you try the HTML Agility Pack solution that I suggested?
// requires: using System; using System.IO; using System.Linq; using System.Xml.Linq;
public class ClassMailBox
{
    public string From { get; set; } //actual e-mail address from sender
    public string LinkToMail { get; set; } //content of onclick
}

string responseFromServer = File.ReadAllText(@"E:\tsip\codeblocks_examples\expertsexchange\TotalEESolution\OrderXMLTest\ResponsefromServer.txt");
// XElement.Parse requires well-formed XML, so this works only when the HTML is also valid XML
XElement html = XElement.Parse(responseFromServer);
// Skip the header row (its first cell is a <th>) and map the first data row
ClassMailBox cmb = (from node in html.Descendants("tr")
                    where node.Elements().ElementAt(0).Name != "th"
                    select new ClassMailBox()
                    {
                        From = node.Elements().ElementAt(0).Value.Trim(),
                        LinkToMail = node.Elements().ElementAt(1).Element("a").Attribute("href").Value
                    }).FirstOrDefault();
// FirstOrDefault returns null when the table has no data rows
if (cmb != null)
{
    Console.WriteLine(cmb.From);
    Console.WriteLine(cmb.LinkToMail);
}
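The query above keeps only the first message and would throw a NullReferenceException for a row whose second cell has no link. As a rough sketch of a variant that returns every row and tolerates a missing <a> element, still assuming the input is well-formed XML: the names MailRow (standing in for ClassMailBox so the block is self-contained), InboxParser and ParseRows are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Stand-in for ClassMailBox from the thread.
public class MailRow
{
    public string From { get; set; }
    public string LinkToMail { get; set; }
}

public static class InboxParser
{
    // Maps every data row (a <tr> that contains <td> cells) to a MailRow.
    // Header rows contain only <th> cells and are skipped.
    public static List<MailRow> ParseRows(string tableXml)
    {
        XElement table = XElement.Parse(tableXml);

        return table.Descendants("tr")
            .Where(tr => tr.Elements("td").Any())
            .Select(tr =>
            {
                List<XElement> cells = tr.Elements("td").ToList();
                // The link lives in the second cell; it may be absent.
                XElement link = cells.Count > 1 ? cells[1].Element("a") : null;
                return new MailRow
                {
                    From = cells[0].Value.Trim(),
                    // Casting a null XAttribute to string yields null instead of throwing.
                    LinkToMail = (string)link?.Attribute("href")
                };
            })
            .ToList();
    }
}
```

Returning a list rather than a single object means an inbox with several messages needs no further changes, and callers can check LinkToMail for null before following it.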
ASKER
As always, the problem was at my end. Everything is working as it should.
Thank you for all your help and the time invested.
Best regards!
I have found HTML Agility Pack useful for this job. (HTML Agility Pack on CodePlex)
The code samples can be found below.
HTML Agility Code Samples
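For the inbox table from the question, a minimal sketch with HTML Agility Pack might look like the following. It assumes the HtmlAgilityPack NuGet package is referenced (it is a third-party library, not part of the .NET base class library) and that the table keeps the id "maillist". The names InboxEntry and AgilityInboxParser are hypothetical. Unlike XElement.Parse, this parser is designed to cope with HTML that is not well-formed XML.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package (assumed to be installed)

// Stand-in for ClassMailBox from the thread.
public class InboxEntry
{
    public string From { get; set; }
    public string LinkToMail { get; set; }
}

public static class AgilityInboxParser
{
    public static List<InboxEntry> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var entries = new List<InboxEntry>();

        // Select only rows that contain <td> cells, which skips the <th> header row.
        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection rows =
            doc.DocumentNode.SelectNodes("//table[@id='maillist']//tr[td]");
        if (rows == null) return entries;

        foreach (HtmlNode row in rows)
        {
            HtmlNode from = row.SelectSingleNode("./td[1]");
            HtmlNode link = row.SelectSingleNode("./td[2]/a[@href]");

            entries.Add(new InboxEntry
            {
                // DeEntitize turns entities such as &amp; back into plain characters.
                From = HtmlEntity.DeEntitize(from.InnerText).Trim(),
                LinkToMail = link != null ? link.GetAttributeValue("href", "") : null
            });
        }
        return entries;
    }
}
```

Calling AgilityInboxParser.Parse(responseFromServer) on the full page should then give one entry per message. The XPath expression is the only part tied to the page layout, so it is the place to adjust if the site changes its markup.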