
Parse HTML tags from text file.

Hi Experts,

If I wanted to retrieve the <TR> tags from a web page I would do it like this:

Dim tagCollection As HtmlElementCollection
tagCollection = WebBrowser1.Document.Body.Document.GetElementsByTagName("tr")

How do I retrieve the tags from the text of an HTML file, with something like this?

tagCollection = htmlFileText.GetElementsByTagName("tr")

ASKER CERTIFIED SOLUTION

Jens Fiederer


DColin

ASKER

jensfiederer,

The text file will be an HTML file.

Jens Fiederer

Yes, but from the point of view of .NET, it is just a sequence of characters (that just HAPPEN to satisfy HTML syntax).

ASP.NET needs to GENERATE HTML, but it doesn't usually need to read it - that's the browser's job. That's why libraries like the HTML Agility Pack exist.

DColin

ASKER

jensfiederer,

I was thinking that the WebBrowser control uses the HtmlDocument class to hold the html text. So how do I load the html text into an HtmlDocument object without having to use a WebBrowser control?

Jens Fiederer

Like I said, you can use the HTML Agility Pack. It's free, it's available at the URI I provided, and it supports code like:

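HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// Note: SelectNodes returns null when nothing matches the XPath expression.
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    HtmlAttribute att = link.Attributes["href"];
    att.Value = FixLink(att); // FixLink is your own method, not part of the library
}
doc.Save("file.htm");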


Note: I'm not involved in any way with the HTML Agility project, except that I needed to parse HTML files at one point a year or two ago and that is what I ended up using.
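
Applied to the original question (getting the <TR> elements out of HTML text without a WebBrowser control), the same approach might look roughly like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced; the class name TableRowReader and the method GetRows are made up for illustration and are not part of the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party package, not part of the .NET Framework

public static class TableRowReader
{
    // Hypothetical helper: returns every <tr> node found in the given HTML text.
    public static List<HtmlNode> GetRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses a string; doc.Load(path) would read a file instead

        var rows = new List<HtmlNode>();
        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//tr");
        if (nodes != null)
        {
            foreach (HtmlNode tr in nodes)
            {
                rows.Add(tr);
            }
        }
        return rows;
    }

    public static void Main()
    {
        // Sample markup standing in for the contents of the HTML file.
        string html = "<table><tr><td>a</td></tr><tr><td>b</td></tr></table>";
        foreach (HtmlNode row in GetRows(html))
        {
            Console.WriteLine(row.OuterHtml);
        }
    }
}
```

To work from a file rather than a string, the text could be read first (for example with File.ReadAllText) and passed to GetRows, or doc.Load could be used with the file path as in the snippet above.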