Using VB and the MSHTML Object Model to parse data from an HTML document

Asked by pique_tech in Visual Basic Programming

Tags: mshtml, vb

I receive periodic HTML emails detailing real estate property listings that have recently been added or changed.  These emails include several relevant data elements, such as address and price, and because they are system-generated, their layout is standardized.  They take the following form:
<HTML>
<STYLE>
...
</STYLE>
<BODY>
<TABLE>                                       A table used primarily for formatting appearance, nothing important in it
...
</TABLE>
<TABLE>                                       A table used primarily for formatting appearance, nothing important in it
...
</TABLE>
<TABLE>                                         These are the headings for the data table columns
<TR>
     <TD>Listing #</TD>
     <TD>Status</TD>
     <TD>Price</TD>
     <TD>Address</TD>
     <TD>Cross Street</TD>
     <TD>Area</TD>
     <TD>Type</TD>
     <TD>BD</TD>
     <TD>BA</TD>
     <TD>Sq Ft</TD>
</TR>
<TR>                                               This is the data; there may be one or many <TR> elements like this, one per listing
     <TD>123456</TD>
     <TD>Active</TD>
     <TD>100000</TD>
     <TD>123 Main St</TD>
     <TD>Market</TD>
     <TD>Downtown</TD>
     <TD>MFM2-4</TD>
     <TD>2</TD>
     <TD>1.5</TD>
     <TD>2560</TD>
</TR>
</TABLE>
<TABLE>                                       A table used primarily for formatting appearance, nothing important in it
...
</TABLE>
<TABLE>                                       A table used primarily for formatting appearance, nothing important in it
...
</TABLE>
</BODY>
</HTML>

So the question is:  how can I use the MSHTML Object Model to extract the data from the table that actually contains it?  I cannot figure out how to identify and work with a particular table through the object library.

I've come up with a pretty kludgy approach:  since these emails are automatically generated, the first N lines before the first data table are always the same, and the last M lines after the last data table are also always the same.  So, using the FileSystemObject TextStream object, I can skip the first N lines and drop the last M lines, which leaves exactly the HTML data tables I care about.  I can then use that shortened HTML to instantiate a much smaller object, which I think I know how to manipulate.  But I'm accustomed to the MSXML object model, where you can actually walk the nodes and find what you want, and I'm hoping that MSHTML offers something similar that I just haven't found yet.
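The line-trimming workaround described above could be sketched roughly as follows in VB6. This is only an illustration under stated assumptions: the function name ExtractDataLines and its parameters are hypothetical, the header and footer line counts must be supplied by the caller, and the project is assumed to reference "Microsoft Scripting Runtime".

```vb
' Sketch only. Requires a project reference to "Microsoft Scripting Runtime".
' ExtractDataLines, headerLines (N) and footerLines (M) are hypothetical names.
Public Function ExtractDataLines(ByVal path As String, _
                                 ByVal headerLines As Long, _
                                 ByVal footerLines As Long) As String
    Dim fso As Scripting.FileSystemObject
    Dim ts As Scripting.TextStream
    Dim lines() As String
    Dim i As Long
    Dim result As String

    Set fso = New Scripting.FileSystemObject
    Set ts = fso.OpenTextFile(path, ForReading)

    ' Normalize CRLF to LF so Split handles either line ending.
    lines = Split(Replace(ts.ReadAll, vbCrLf, vbLf), vbLf)
    ts.Close

    ' The array is 0-based, so index headerLines is line N+1 of the file,
    ' and index UBound(lines) - footerLines is the last line before the footer.
    For i = headerLines To UBound(lines) - footerLines
        result = result & lines(i) & vbCrLf
    Next i

    ExtractDataLines = result
End Function
```

Note that a file ending in a line break produces one extra empty element at the end of the array, so the footer count may need to be one higher than expected.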

Any advice about how to do this using only the object model would be greatly appreciated.  If it is not possible, confirmation of that would be great.  If anyone has an alternative approach, whether through the object model or through parsing, I'm all (virtual) ears.

The goal is to create a searchable database of this data.  I have over 10 months' worth of emails, and finding a particular property that I know I saw listed sometime in the first quarter is currently a pretty tedious (and not always successful) process.
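For reference, the node-walking idea asked about above might look something like the following in VB6. This is a minimal sketch, not the accepted solution (which is not shown on this page). It assumes a project reference to "Microsoft HTML Object Library", assumes the data table can be recognized by its first heading cell ("Listing #", as in the sample), and assumes the body of a newly created HTMLDocument can be filled through innerHTML. FindListingTable and DumpListings are hypothetical helper names.

```vb
' Sketch only. Requires a project reference to "Microsoft HTML Object Library".
' FindListingTable and DumpListings are hypothetical helper names.

' Returns the first table whose first cell reads "Listing #", or Nothing.
Private Function FindListingTable(ByVal doc As MSHTML.HTMLDocument) As MSHTML.HTMLTable
    Dim tables As MSHTML.IHTMLElementCollection
    Dim tbl As MSHTML.HTMLTable
    Dim firstRow As MSHTML.HTMLTableRow
    Dim firstCell As MSHTML.HTMLTableCell
    Dim i As Long

    Set tables = doc.getElementsByTagName("TABLE")
    For i = 0 To tables.length - 1
        Set tbl = tables.Item(i)
        If tbl.rows.length > 0 Then
            Set firstRow = tbl.rows.Item(0)
            If firstRow.cells.length > 0 Then
                Set firstCell = firstRow.cells.Item(0)
                If Trim$(firstCell.innerText) = "Listing #" Then
                    Set FindListingTable = tbl
                    Exit Function
                End If
            End If
        End If
    Next i
End Function

' Prints each data row of the listing table as tab-separated text.
Public Sub DumpListings(ByVal html As String)
    Dim doc As MSHTML.HTMLDocument
    Dim tbl As MSHTML.HTMLTable
    Dim row As MSHTML.HTMLTableRow
    Dim cell As MSHTML.HTMLTableCell
    Dim r As Long, c As Long
    Dim lineText As String

    Set doc = New MSHTML.HTMLDocument
    ' Assumption: the body exists on a fresh document and accepts the markup;
    ' the outer <HTML>/<BODY> tags of the email are discarded by the parser.
    doc.body.innerHTML = html

    Set tbl = FindListingTable(doc)
    If tbl Is Nothing Then Exit Sub

    For r = 1 To tbl.rows.length - 1      ' row 0 holds the column headings
        Set row = tbl.rows.Item(r)
        lineText = ""
        For c = 0 To row.cells.length - 1
            Set cell = row.cells.Item(c)
            lineText = lineText & Trim$(cell.innerText) & vbTab
        Next c
        Debug.Print lineText
    Next r
End Sub
```

Matching on the heading text rather than on the table's position means the sketch does not depend on how many formatting tables precede the data. The same loop could write each row to a database table instead of printing it.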
06/17/05 08:49 AM, ID: 14242172, Accepted Solution

Solution Provided By: wesbird
Participating Experts: 2
Solution Grade: B
 
06/20/05 03:23 PM, ID: 14261275, Assisted Solution

09/15/05 01:29 PM, ID: 14893180, Author Comment
