  • Status: Solved
  • Priority: Medium
  • Security: Public

Parse HTML tags from text file.

Hi Experts,

If I wanted to retrieve the <TR> tags from a web page I would do it like this:

Dim tagCollection As HtmlElementCollection
tagCollection = WebBrowser1.Document.Body.Document.GetElementsByTagName("tr")

How do I retrieve the tags from the text of an HTML file, with something like this?

tagCollection = htmlFileText.GetElementsByTagName("tr")
DColin
Asked:
1 Solution
 
Jens Fiederer Commented:
Arbitrary files are not structured as HTML.  It is not hard to find all instances of "<TR>" doing simple text searches, or maybe "<TR" if you don't want to miss TR elements with attributes.  But to really structure it you need to have it parsed.
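
A minimal sketch of that plain text search, assuming the file contents are already in a string. The class and method names (TrTextSearch, FindTrOffsets) are made up for illustration. The pattern looks for "<tr" followed by whitespace, ">" or "/", so attributes are allowed and "<track>" is skipped. It only locates opening tags; it will also match inside comments or script blocks, which is why real parsing is needed for anything structural.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TrTextSearch
{
    // "<tr" followed by whitespace, ">" or "/" (case-insensitive),
    // so "<TR>", "<tr class=...>" match but "<track>" does not.
    private static readonly Regex TrOpenTag =
        new Regex(@"<tr(?=[\s>/])", RegexOptions.IgnoreCase);

    // Returns the character offset of every opening <tr> tag in the text.
    public static List<int> FindTrOffsets(string html)
    {
        var offsets = new List<int>();
        foreach (Match m in TrOpenTag.Matches(html))
        {
            offsets.Add(m.Index);
        }
        return offsets;
    }
}
```

Typical use would be TrTextSearch.FindTrOffsets(File.ReadAllText("file.htm")), where "file.htm" stands in for your own path.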

If you are fortunate enough to be using XHTML, you can use .NET XML parsing functions.
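
For the XHTML case, something along these lines should work with LINQ to XML, which ships with .NET. This is a sketch under assumptions: the input must be well-formed XML, and entities such as &nbsp; that are only defined in the XHTML DTD may make parsing fail. XhtmlRows and GetRows are illustrative names. Matching on the local name avoids having to spell out the XHTML namespace.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XhtmlRows
{
    // Parses well-formed XHTML text and returns every <tr> element.
    // Comparing LocalName ignores the XHTML namespace if one is declared.
    public static List<XElement> GetRows(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);
        return doc.Descendants()
                  .Where(e => e.Name.LocalName == "tr")
                  .ToList();
    }
}
```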

Otherwise you'll probably need a 3rd party library like HTML Agility Pack.

See http://htmlagilitypack.codeplex.com/
 
DColin (Author) Commented:
jensfiederer,

The text file will be an HTML file.
 
Jens Fiederer Commented:
Yes, but from the point of view of .NET, it is just a sequence of characters (that just HAPPEN to satisfy HTML syntax).  

ASP.NET needs to GENERATE HTML, but it doesn't usually need to read it - that's the browser's job.  That's why libraries like HTML Agility Pack exist.
 
DColin (Author) Commented:
jensfiederer,

I was thinking that the WebBrowser control uses the HtmlDocument class to hold the html text. So how do I load the html text into an HtmlDocument object without having to use a WebBrowser control?
 
Jens Fiederer Commented:
Like I said, you can use HTML Agility Pack.  It's free, it's available at the URI I provided, and it supports code like:

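// Needs a reference to HtmlAgilityPack and "using HtmlAgilityPack;"
HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// SelectNodes returns null when nothing matches, so check before looping
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        HtmlAttribute att = link.Attributes["href"];
        att.Value = FixLink(att);   // FixLink is your own helper, not part of the library
    }
}
doc.Save("file.htm");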


Note: I'm not involved in any way with the HTML Agility project, except that I needed to parse HTML files at one point a year or two ago and that is what I ended up using.
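
For the original question (getting the TR elements out of HTML text without a WebBrowser control), a sketch with the same library might look like the following. It assumes the HtmlAgilityPack package is referenced; RowExtractor and GetRows are illustrative names, and htmlFileText is the string holding the file contents, as in the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party package, not part of .NET itself

public static class RowExtractor
{
    // Parses HTML held in a string and returns every <tr> element.
    public static List<HtmlNode> GetRows(string htmlFileText)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(htmlFileText);            // parse from text instead of a file path
        return doc.DocumentNode
                  .Descendants("tr")           // empty sequence (not null) when there are no rows
                  .ToList();
    }
}
```

Each returned HtmlNode exposes InnerText, InnerHtml and Attributes, which roughly covers what the HtmlElement objects from the WebBrowser control would have given you.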