Link to home
Start Free TrialLog in
Avatar of bs329
bs329Flag for United States of America

asked on

Parsing HTML tables using VB and placing data into excel

I have the following html tabular data and would like to create a macro to parse and put the data in a row of cells:

Data:
<div class="f2"><div class="f"><table class="t1"><tr class="rs2"><th colspan="4">Earnings Per Share - Quarterly Results</th></tr><tr class="r0"><th>&nbsp;</th><th>FY (09/09)</th><th>FY (09/08)</th><th>FY (09/07)</th></tr>
<tr><td>1st Qtr</td><td align="right">$1.81</td><td align="right">$1.81</td><td align="right">$1.17</td></tr>
<tr><td>2nd Qtr</td><td align="right">$1.35</td><td align="right">$1.19</td><td align="right">$0.89</td></tr>
<tr><td>3rd Qtr</td><td align="right">$1.38</td><td align="right">$1.21</td><td align="right">$0.94</td></tr>
<tr><td>4th Qtr</td><td align="right">$1.85</td><td align="right">$1.28</td><td align="right">$1.04</td></tr>
<tr><td>Total</td><td align="right">$6.39</td><td align="right">$5.49</td><td align="right">$4.04</td></tr>
</table></div></div>

Result in Excel in this order(each number is in its own cell):
$1.17 $0.89 $0.94 $1.04 $1.81 $1.19 $1.21 $1.28 $1.81 $1.35 $1.38 $1.85
Avatar of Dave
Dave
Flag of Australia image

Pls see sample file
 

Sub Extraxt()
    Dim regEx, RegMC, RegM
    Dim i As Long
    Dim MyStr As String
    MyStr = "<div class='f2><div class='f'><table class='t1'><tr class='rs2'><th colspan='4'>Earnings Per Share - Quarterly Results</th></tr><tr class='r0'><th>&nbsp;</th><th>FY (09/09)</th><th>FY (09/08)</th><th>FY (09/07)</th></tr><tr><td>1st Qtr</td><td align='right'>$1.81</td><td align='right'>$1.81</td><td align='right'>$1.17</td></tr><tr><td>2nd Qtr</td><td align='right'>$1.35</td><td align='right'>$1.19</td><td align='right'>$0.89</td></tr><tr><td>3rd Qtr</td><td align='right'>$1.38</td><td align='right'>$1.21</td><td align='right'>$0.94</td></tr><tr><td>4th Qtr</td><td align='right'>$1.85</td><td align='right'>$1.28</td><td align='right'>$1.04</td></tr><tr><td>Total</td><td align='right'>$6.39</td><td align='right'>$5.49</td><td align='right'>$4.04</td></tr></table></div></div>"
    Set regEx = CreateObject("vbscript.regexp")
    With regEx
        .Global = True
        .Pattern = "(\$.+?)<"
        Set RegMC = .Execute(MyStr)
        For Each RegM In RegMC
            i = i + 1
            If i Mod 3 = 0 Then Cells(i / 3, 1) = RegM.submatches(0)
            If i Mod 3 = 1 Then Cells((i + 2) / 3, 3) = RegM.submatches(0)
            If i Mod 3 = 2 Then Cells((i + 1) / 3, 2) = RegM.submatches(0)
        Next
    End With
End Sub

Open in new window

Avatar of bs329

ASKER

Thanks brettdj, that gets me mostly there.  Can you help me to put this in the following order (I'm going to have 1000's of pieces of data like this and need to get it in row format in the following order):

$1.17 $0.89 $0.94 $1.04 $1.81 $1.19 $1.21 $1.28 $1.81 $1.35 $1.38 $1.85
Avatar of bs329

ASKER

Hi brettdj,
I tried tweaking your code to work by directly opening the url (http://moneycentral.msn.com/investor/invsub/results/hilite.asp?symbol=aapl).  I created HTTP objects and assigned the webpage to a string, but I'm not getting anything back when I run what I have.  Is there something in particular I have to do if I'm trying to open a webpage and do this directly?

Thanks,
Avatar of bs329

ASKER

brettdj,

I altered what you gave me to open the url directly.  However, its not coming out so clean and still trying to get it in the row format as I mentioned above:

Sub Extraxt()
    Dim regEx, RegMC, RegM
    Dim i As Long
    Dim MyStr As String
    Set objHTTP = CreateObject("MSXML2.XMLHTTP")
    objHTTP.Open "GET", "http://moneycentral.msn.com/investor/invsub/results/hilite.asp?symbol=aapl", False
    objHTTP.Send
    strPageText = objHTTP.responseText
    intPos = InStr(strPageText, "<th colspan='4'>Earnings Per Share - Quarterly Results</th>")
    strPageText = Mid(strPageText, intPos + Len("<td>1st Qtr</td>"))
    MyStr = strPageText
    Set regEx = CreateObject("vbscript.regexp")
    With regEx
        .Global = True
        .Pattern = "(\$.+?)<"
        Set RegMC = .Execute(MyStr)
        For Each RegM In RegMC
            i = i + 1
            If i Mod 3 = 0 Then Cells(i / 3, 1) = RegM.submatches(0)
            If i Mod 3 = 1 Then Cells((i + 2) / 3, 3) = RegM.submatches(0)
            If i Mod 3 = 2 Then Cells((i + 1) / 3, 2) = RegM.submatches(0)
        Next
    End With
    Set objHTTP = Nothing
End Sub
Avatar of bs329

ASKER

brettdj,

I got everything working except for capturing NA values.  I am able to get -$.xx and $.xx values.  How can I change the regex to capture all 3 just mentioned:

.Pattern = "(\$.+?)<"
very good :)

Can you please post a snippet of the strings that include the NA values so that I can tailor the regexp to suit

thx

Dave
Avatar of bs329

ASKER

For example,  in reference to the string I gave above one of the <td> tags could have NA instead of a value.  e.g. .....<td align="right">NA</td>......
ASKER CERTIFIED SOLUTION
Avatar of Dave
Dave
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial