DColin asked:

Parse HTML tags from text file.

Hi Experts,

If I wanted to retrieve the <TR> tags from a web page I would do it like this:

Dim tagCollection As HtmlElementCollection
tagCollection = WebBrowser1.Document.Body.Document.GetElementsByTagName("tr")

How do I retrieve the tags from the text of an HTML file, with something like this?

tagCollection = htmlFileText.GetElementsByTagName("tr")
ASKER CERTIFIED SOLUTION
Jens Fiederer

ASKER

jensfiederer,

The text file will be an HTML file.

Jens Fiederer

Yes, but from the point of view of .NET, it is just a sequence of characters (that just HAPPEN to satisfy HTML syntax).

ASP.NET needs to GENERATE HTML, but it doesn't usually need to read it; that's the browser's job. That's why libraries like the HTML Agility Pack exist.

ASKER

jensfiederer,

I was thinking that the WebBrowser control uses the HtmlDocument class to hold the HTML text. So how do I load the HTML text into an HtmlDocument object without having to use a WebBrowser control?

Jens Fiederer

Like I said, you can use the HTML Agility Pack. It's free, it's available at the URI I provided, and it supports code like:

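// Requires a reference to the HTML Agility Pack (using HtmlAgilityPack;).
HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// SelectNodes returns null when nothing matches, so check before iterating.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        HtmlAttribute att = link.Attributes["href"];
        att.Value = FixLink(att);  // FixLink stands for your own link-rewriting routine
    }
}
doc.Save("file.htm");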
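
For the original question about <tr> elements, the same library can also parse HTML that is already held in a string. What follows is only a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the sample markup and the class name TableRowDemo are made up for illustration. Note that HtmlAgilityPack.HtmlDocument is a different class from the System.Windows.Forms.HtmlDocument that the WebBrowser control exposes, so you get a sequence of HtmlNode objects rather than an HtmlElementCollection.

```csharp
using System;
using HtmlAgilityPack; // third-party package, not part of the .NET base class library

class TableRowDemo
{
    static void Main()
    {
        // Stand-in for the contents of the HTML text file; in practice this
        // could come from System.IO.File.ReadAllText(path).
        string htmlFileText =
            "<html><body><table>" +
            "<tr><td>first</td></tr>" +
            "<tr><td>second</td></tr>" +
            "</table></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(htmlFileText); // parse from a string; no WebBrowser control involved

        // Descendants("tr") enumerates every <tr> element below the document root.
        foreach (HtmlNode row in doc.DocumentNode.Descendants("tr"))
        {
            Console.WriteLine(row.InnerText.Trim());
        }
    }
}
```

If the text lives in a file rather than a string, doc.Load(path) reads it directly, as in the example above.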


Note: I'm not involved in any way with the HTML Agility project, except that I needed to parse HTML files at one point a year or two ago and that is what I ended up using.