DColin
asked on
Parse HTML tags from text file.
Hi Experts,
If I wanted to retrieve the <TR> tags from a web page I would do it like this:
Dim tagCollection As HtmlElementCollection
tagCollection = WebBrowser1.Document.Body.Document.GetElementsByTagName("tr")
How do I retrieve the tags from the text of an HTML file, with something like this?
tagCollection = hmtlFileText.GetElementsByTagName("tr")
Yes, but from the point of view of .NET, it is just a sequence of characters (that just HAPPEN to satisfy HTML syntax).
ASP.NET needs to GENERATE HTML, but it doesn't usually need to read it; that's the browser's job. That's why libraries like HTML Agility Pack exist.
ASKER
jensfiederer,
I was thinking that the WebBrowser control uses the HtmlDocument class to hold the html text. So how do I load the html text into an HtmlDocument object without having to use a WebBrowser control?
Like I said, you can use the HTML Agility Pack. It's free, it's available at the URI I provided, and it supports code like:
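// Requires a reference to the HtmlAgilityPack assembly (using HtmlAgilityPack;).
HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// Select every <a> element that has an href attribute.
// Recent releases expose the root as DocumentNode; SelectNodes returns null when nothing matches.
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    HtmlAttribute att = link.Attributes["href"];
    att.Value = FixLink(att); // FixLink is your own helper that returns the corrected URL
}
doc.Save("file.htm");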
Note: I'm not involved in any way with the HTML Agility project, except that I needed to parse HTML files at one point a year or two ago and that is what I ended up using.
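For the <TR> case in the original question, a rough sketch along the same lines might look like the following. It assumes the HtmlAgilityPack assembly is referenced in the project. The htmlFileText string and its contents are made up for illustration, and LoadHtml is used here because the HTML is already in a string (Load, as above, reads from a file). The same calls should be usable from VB.NET.

```csharp
using System;
using HtmlAgilityPack;

class TrTagExample
{
    static void Main()
    {
        // Hypothetical sample text; in practice it could come from File.ReadAllText.
        string htmlFileText =
            "<html><body><table>" +
            "<tr><td>one</td></tr><tr><td>two</td></tr>" +
            "</table></body></html>";

        // Parse the string directly; no WebBrowser control is involved.
        var doc = new HtmlDocument();
        doc.LoadHtml(htmlFileText);

        // "//tr" is an XPath query that matches every <tr> element in the document.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null)
        {
            // SelectNodes returns null rather than an empty collection when nothing matches.
            Console.WriteLine("No <tr> elements found.");
            return;
        }

        foreach (HtmlNode row in rows)
        {
            // OuterHtml is the markup of the row itself, including its <tr> tags.
            Console.WriteLine(row.OuterHtml);
        }
    }
}
```

Note that the HtmlDocument here is HtmlAgilityPack.HtmlDocument, not the System.Windows.Forms.HtmlDocument that the WebBrowser control exposes, so the result is an HtmlNodeCollection rather than an HtmlElementCollection.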
ASKER
The text file will be an HTML file.