HTML Parser in VB .NET

I have to write a program in VB .NET that looks at HTML and text files, parses out specific information, and then stores it in a database. This would not be a problem if it were something distinctive like an e-mail address, where you could look for the @ sign. However, not all the pages look exactly the same or share the same format, and the data I am looking for is just numbers. An example would be finding a person's salary on the page.

As a human, I would look around the page for references to "salary" and then locate the number from there.

Any ideas or information?

Where to start?

Regex maybe?
Asked by: waterzap
 
Bob Learned commented:
If it is simple, you might also be able to get the HTML text and use simple regular expressions to parse it. The HTML Document class is a fairly hefty chunk of real estate; using it here is like squirrel hunting with an elephant rifle.
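
As a rough illustration of the regular-expression idea, here is a minimal sketch that strips tags crudely, looks for a keyword such as "salary", and captures the first number that follows it. The module name `SalaryExtractor`, the helper `FindNumberNear`, the 40-character search window, and the sample markup are all assumptions made up for this example, not anything from the original question. It only uses `Regex.Replace`, `Regex.Escape`, and `Regex.Match` from `System.Text.RegularExpressions`.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SalaryExtractor

    ' Hypothetical helper: returns the first number found shortly after
    ' the keyword, or Nothing if there is no match.
    Function FindNumberNear(ByVal html As String, ByVal keyword As String) As String
        ' Crude tag stripping; adequate for simple pages, not a real HTML parser.
        Dim plainText As String = Regex.Replace(html, "<[^>]+>", " ")

        ' Keyword, then up to 40 non-digit characters (an assumed window),
        ' then a number such as 52000, 52,000 or 52000.50.
        Dim pattern As String = Regex.Escape(keyword) &
            "\D{0,40}?(\d+(?:,\d{3})*(?:\.\d+)?)"

        Dim m As Match = Regex.Match(plainText, pattern, RegexOptions.IgnoreCase)
        If m.Success Then
            Return m.Groups(1).Value
        End If
        Return Nothing
    End Function

    Sub Main()
        ' Made-up sample markup for demonstration only.
        Dim sample As String = "<tr><td><b>Salary:</b></td><td>$52,000</td></tr>"
        Console.WriteLine(FindNumberNear(sample, "salary"))   ' prints 52,000
    End Sub

End Module
```

Because the pages vary in format, a pattern like this will likely need tuning per source (different keywords, a wider or narrower window), and the captured string still needs its commas removed before it is parsed as a number for the database.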

Bob
 
armoghan commented:
If you need to find MSHTML... It's not 2005
Add Reference -> .NET -> Microsoft MSHTML -> Select
 
armoghan commented:
Oops... sorry, I wrote that in the wrong window.