I receive periodic HTML emails listing real estate properties that have recently been added or changed. The emails are system-generated, so their layout is standardized, and each listing carries several data elements such as address and price. They take the following form:
<HTML>
<STYLE>
...
</STYLE>
<BODY>
<TABLE> A table used primarily for formatting appearance, nothing important in it
...
</TABLE>
<TABLE> A table used primarily for formatting appearance, nothing important in it
...
</TABLE>
<TABLE> The first row of this table holds the headings for the data columns
<TR>
<TD>Listing #</TD>
<TD>Status</TD>
<TD>Price</TD>
<TD>Address</TD>
<TD>Cross Street</TD>
<TD>Area</TD>
<TD>Type</TD>
<TD>BD</TD>
<TD>BA</TD>
<TD>Sq Ft</TD>
</TR>
<TR> This is the data; there may be one or many <TR> elements like this, one per listing
<TD>123456</TD>
<TD>Active</TD>
<TD>100000</TD>
<TD>123 Main St</TD>
<TD>Market</TD>
<TD>Downtown</TD>
<TD>MFM2-4</TD>
<TD>2</TD>
<TD>1.5</TD>
<TD>2560</TD>
</TR>
</TABLE>
<TABLE> A table used primarily for formatting appearance, nothing important in it
...
</TABLE>
<TABLE> A table used primarily for formatting appearance, nothing important in it
...
</TABLE>
</BODY>
</HTML>
So, the issue is: how can I use the MSHTML object model to extract the data from the one table that actually holds it? I cannot figure out how to identify and work with a particular table through the object library.
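To make the question concrete, here is roughly the walk I have in mind. It is sketched in Python with the standard-library html.parser rather than MSHTML, since the MSHTML equivalent is exactly what I have not found, so it only illustrates the idea. The names TableCollector and find_data_table are made up for this sketch. It assumes the tables are not nested and that the data table is the one whose first cell reads "Listing #".

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every table as a list of rows; each row is a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None   # row currently being filled, if any
        self._cell = None  # text fragments of the cell currently open, if any

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TABLE> arrives as "table".
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
            self.tables[-1].append(self._row)
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a <TD> (layout whitespace, stray notes) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            self._row = None


def find_data_table(html, first_heading="Listing #"):
    """Return the listings as dicts keyed by column heading (assumed layout)."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    for table in parser.tables:
        if table and table[0] and table[0][0] == first_heading:
            headings, *rows = table
            return [dict(zip(headings, row)) for row in rows]
    return []


sample = """<HTML><BODY>
<TABLE><TR><TD>layout only</TD></TR></TABLE>
<TABLE>
<TR><TD>Listing #</TD><TD>Status</TD><TD>Price</TD><TD>Address</TD></TR>
<TR><TD>123456</TD><TD>Active</TD><TD>100000</TD><TD>123 Main St</TD></TR>
</TABLE>
<TABLE><TR><TD>layout only</TD></TR></TABLE>
</BODY></HTML>"""

listings = find_data_table(sample)
print(listings)
```

Keying on the heading text rather than on the table's position would keep this working if the number of formatting tables ever changes.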
I've come up with a pretty kludgey approach: since these emails are automatically generated, the first N lines before the first data table are always the same, and the last M lines after the last data table are also always the same. So, using the FileSystemObject TextStream object, I can skip the first N lines and drop the last M (that is, keep lines N+1 through T-M of a T-line file), which leaves exactly the HTML data tables I care about. I can then use that shortened HTML to instantiate a much smaller document object, which I think I know how to manipulate. But I'm accustomed to the MSXML object model, where you can actually walk the nodes and find what you want, and I'm hoping that MSHTML offers something similar that I just haven't found yet.
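For reference, the line-slicing kludge amounts to the following. It is written in Python here only to pin down the arithmetic, and the real thing would use the TextStream object. The function name slice_data_lines and the counts in the example are hypothetical, not the actual values from my emails.

```python
def slice_data_lines(lines, n_header, m_footer):
    """Drop the first n_header and last m_footer lines; keep what lies between."""
    if n_header < 0 or m_footer < 0:
        raise ValueError("line counts must be non-negative")
    end = len(lines) - m_footer  # index just past the last line to keep
    return lines[n_header:end] if end > n_header else []


# Hypothetical 7-line email with 2 fixed header lines and 2 fixed footer lines.
email_lines = [
    "<HTML>", "<BODY>",
    "<TABLE>", "<TR><TD>123456</TD></TR>", "</TABLE>",
    "</BODY>", "</HTML>",
]
data_lines = slice_data_lines(email_lines, n_header=2, m_footer=2)
print("\n".join(data_lines))
```

The obvious weakness is that any change to the fixed header or footer silently shifts the slice, which is why I would rather locate the table through the object model.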
Any advice on how to do this using only the object model would be greatly appreciated. If it is not possible, confirmation of that would be great too. And if anyone has an alternative to either the object-model route or my line-parsing kludge, I'm all (virtual) ears.
The goal is to create a searchable database for this data. I have over 10 months' worth of emails and finding a particular property I know I saw listed sometime in the first quarter is currently a pretty tedious (and not always successful) process.
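As a rough picture of where I want to end up, here is a minimal sketch using Python's built-in sqlite3 module. The schema, the email_date column, the function names, and the date in the example are all assumptions made for illustration. Only the ten listing columns come from the emails themselves.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the listings database; the schema is an assumed one."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS listings (
               listing_no TEXT, status TEXT, price INTEGER, address TEXT,
               cross_street TEXT, area TEXT, type TEXT,
               bd REAL, ba REAL, sq_ft INTEGER,
               email_date TEXT  -- ISO date of the email (not in the table itself)
           )"""
    )
    return conn


def add_listing(conn, row, email_date):
    """Insert one listing row (the ten cell values, in column order)."""
    conn.execute(
        "INSERT INTO listings VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (*row, email_date),
    )


def find_by_address(conn, fragment, start_date, end_date):
    """Listings whose address contains fragment, seen between two ISO dates."""
    cur = conn.execute(
        "SELECT listing_no, address, price, email_date FROM listings "
        "WHERE address LIKE ? AND email_date BETWEEN ? AND ? "
        "ORDER BY email_date",
        (f"%{fragment}%", start_date, end_date),
    )
    return cur.fetchall()


conn = open_db()
# Cell values from the sample row above; the email date is invented.
add_listing(
    conn,
    ("123456", "Active", "100000", "123 Main St", "Market",
     "Downtown", "MFM2-4", "2", "1.5", "2560"),
    email_date="2011-02-14",
)
matches = find_by_address(conn, "Main St", "2011-01-01", "2011-03-31")
print(matches)
```

With the email date stored as an ISO string, "sometime in the first quarter" becomes a simple BETWEEN filter instead of a manual scroll through months of mail.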