bs329
asked on
Parsing HTML tables using VB and placing data into excel
I have the following html tabular data and would like to create a macro to parse and put the data in a row of cells:
Data:
<div class="f2"><div class="f"><table class="t1"><tr class="rs2"><th colspan="4">Earnings Per Share - Quarterly Results</th></tr><tr class="r0"><th> </th> <th>FY (09/09)</th><th>FY (09/08)</th><th>FY (09/07)</th></tr>
<tr><td>1st Qtr</td><td align="right">$1.81</td><t d align="right">$1.81</td><t d align="right">$1.17</td></ tr>
<tr><td>2nd Qtr</td><td align="right">$1.35</td><t d align="right">$1.19</td><t d align="right">$0.89</td></ tr>
<tr><td>3rd Qtr</td><td align="right">$1.38</td><t d align="right">$1.21</td><t d align="right">$0.94</td></ tr>
<tr><td>4th Qtr</td><td align="right">$1.85</td><t d align="right">$1.28</td><t d align="right">$1.04</td></ tr>
<tr><td>Total</td><td align="right">$6.39</td><t d align="right">$5.49</td><t d align="right">$4.04</td></ tr>
</table></div></div>
Result in Excel in this order(each number is in its own cell):
$1.17 $0.89 $0.94 $1.04 $1.81 $1.19 $1.21 $1.28 $1.81 $1.35 $1.38 $1.85
Data:
<div class="f2"><div class="f"><table class="t1"><tr class="rs2"><th colspan="4">Earnings Per Share - Quarterly Results</th></tr><tr class="r0"><th> </th>
<tr><td>1st Qtr</td><td align="right">$1.81</td><t
<tr><td>2nd Qtr</td><td align="right">$1.35</td><t
<tr><td>3rd Qtr</td><td align="right">$1.38</td><t
<tr><td>4th Qtr</td><td align="right">$1.85</td><t
<tr><td>Total</td><td align="right">$6.39</td><t
</table></div></div>
Result in Excel in this order(each number is in its own cell):
$1.17 $0.89 $0.94 $1.04 $1.81 $1.19 $1.21 $1.28 $1.81 $1.35 $1.38 $1.85
ASKER
Thanks brettdj, that gets me mostly there. Can you help me to put this in the following order (I'm going to have 1000's of pieces of data like this and need to get it in row format in the following order):
$1.17 $0.89 $0.94 $1.04 $1.81 $1.19 $1.21 $1.28 $1.81 $1.35 $1.38 $1.85
$1.17 $0.89 $0.94 $1.04 $1.81 $1.19 $1.21 $1.28 $1.81 $1.35 $1.38 $1.85
ASKER
Hi brettdj,
I tried tweaking your code to work by directly opening the url (http://moneycentral.msn.com/investor/invsub/results/hilite.asp?symbol=aapl). I created HTTP objects and assigned the webpage to a string, but I'm not getting anything back when I run what I have. Is there something in particular I have to do if I'm trying to open a webpage and do this directly?
Thanks,
I tried tweaking your code to work by directly opening the url (http://moneycentral.msn.com/investor/invsub/results/hilite.asp?symbol=aapl). I created HTTP objects and assigned the webpage to a string, but I'm not getting anything back when I run what I have. Is there something in particular I have to do if I'm trying to open a webpage and do this directly?
Thanks,
ASKER
brettdj,
I altered what you gave me to open the url directly. However, its not coming out so clean and still trying to get it in the row format as I mentioned above:
Sub Extraxt()
Dim regEx, RegMC, RegM
Dim i As Long
Dim MyStr As String
Set objHTTP = CreateObject("MSXML2.XMLHT TP")
objHTTP.Open "GET", "http://moneycentral.msn.com/investor/invsub/results/hilite.asp?symbol=aapl", False
objHTTP.Send
strPageText = objHTTP.responseText
intPos = InStr(strPageText, "<th colspan='4'>Earnings Per Share - Quarterly Results</th>")
strPageText = Mid(strPageText, intPos + Len("<td>1st Qtr</td>"))
MyStr = strPageText
Set regEx = CreateObject("vbscript.reg exp")
With regEx
.Global = True
.Pattern = "(\$.+?)<"
Set RegMC = .Execute(MyStr)
For Each RegM In RegMC
i = i + 1
If i Mod 3 = 0 Then Cells(i / 3, 1) = RegM.submatches(0)
If i Mod 3 = 1 Then Cells((i + 2) / 3, 3) = RegM.submatches(0)
If i Mod 3 = 2 Then Cells((i + 1) / 3, 2) = RegM.submatches(0)
Next
End With
Set objHTTP = Nothing
End Sub
I altered what you gave me to open the url directly. However, its not coming out so clean and still trying to get it in the row format as I mentioned above:
Sub Extraxt()
Dim regEx, RegMC, RegM
Dim i As Long
Dim MyStr As String
Set objHTTP = CreateObject("MSXML2.XMLHT
objHTTP.Open "GET", "http://moneycentral.msn.com/investor/invsub/results/hilite.asp?symbol=aapl", False
objHTTP.Send
strPageText = objHTTP.responseText
intPos = InStr(strPageText, "<th colspan='4'>Earnings Per Share - Quarterly Results</th>")
strPageText = Mid(strPageText, intPos + Len("<td>1st Qtr</td>"))
MyStr = strPageText
Set regEx = CreateObject("vbscript.reg
With regEx
.Global = True
.Pattern = "(\$.+?)<"
Set RegMC = .Execute(MyStr)
For Each RegM In RegMC
i = i + 1
If i Mod 3 = 0 Then Cells(i / 3, 1) = RegM.submatches(0)
If i Mod 3 = 1 Then Cells((i + 2) / 3, 3) = RegM.submatches(0)
If i Mod 3 = 2 Then Cells((i + 1) / 3, 2) = RegM.submatches(0)
Next
End With
Set objHTTP = Nothing
End Sub
ASKER
brettdj,
I got everything working except for capturing NA values. I am able to get -$.xx and $.xx values. How can I change the regex to capture all 3 just mentioned:
.Pattern = "(\$.+?)<"
I got everything working except for capturing NA values. I am able to get -$.xx and $.xx values. How can I change the regex to capture all 3 just mentioned:
.Pattern = "(\$.+?)<"
very good :)
Can you please post a snippet of the strings that include the NA values so that I can tailor the regexp to suit
thx
Dave
Can you please post a snippet of the strings that include the NA values so that I can tailor the regexp to suit
thx
Dave
ASKER
For example, in reference to the string I gave above one of the <td> tags could have NA instead of a value. e.g. .....<td align="right">NA</td>..... .
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Open in new window