chocs666
asked on
Retrieve text data from a website and insert into a recordset
Hi,
I'm trying to pull down some data from a website. I used to do it using a web query in excel, but I found that that wasn't very reliable. Does anyone know is there a way using vb to get this information into a recordset (or whatever) ?
Cheers.
I'm trying to pull down some data from a website. I used to do it using a web query in excel, but I found that that wasn't very reliable. Does anyone know is there a way using vb to get this information into a recordset (or whatever) ?
Cheers.
Since you mention a web query I'm assuming that the data is in an HTML table on a web page. If that's correct, then you can use the WebBrowser control to access the page, then parse the Document property of the WebBrowser to find the data, and finally insert it into your recordset. Something like:
Dim objDocument As HTMLDocument
'Open the page
WebBrowser1.Navigate2 "www.mypage.com"
'Wait for the page to finish loading
While WebBrowser1.ReadyState = READYSTATE_LOADING
Wend
'Grab a copy of the loaded page
Set objDocument = WebBrowser1.Document
You can now parse objDocument, find the data, and load it into the recordset. It'd makes this easy if the data is in a table and the table has a name.
Dim objDocument As HTMLDocument
'Open the page
WebBrowser1.Navigate2 "www.mypage.com"
'Wait for the page to finish loading
While WebBrowser1.ReadyState = READYSTATE_LOADING
Wend
'Grab a copy of the loaded page
Set objDocument = WebBrowser1.Document
You can now parse objDocument, find the data, and load it into the recordset. It'd makes this easy if the data is in a table and the table has a name.
ASKER
BlueDevilFan,
I think that a webbrowser is the way to go for me because I am trying to get the information from a table on a website. Would you mind telling me how to go about parsing the objDocument to get the information from the table? Also I need to login to this website before I can get the info. What's the easiest way to do this. Finally, the while loop in your code seems to freeze as if it goes into an infinite loop.
Thanks for the help.
I think that a webbrowser is the way to go for me because I am trying to get the information from a table on a website. Would you mind telling me how to go about parsing the objDocument to get the information from the table? Also I need to login to this website before I can get the info. What's the easiest way to do this. Finally, the while loop in your code seems to freeze as if it goes into an infinite loop.
Thanks for the help.
Okay, here you go. This handles everything except the logging in portion. How you handle that depends on whether you'll use this program interactively or whether it needs to run on its own. If the former, then you can load the page and login just as you would if you were doing this from IE. Once you've logged in and the page with the table has loaded, you'd click a button and it'd suck the data out of the table. On the other hand, if this needs to run on its own, then you'd need to stuff the login information into the correct fields on the page, programmatically click the login button, wait for the subsequent page to load, and then fire the portion that gets the data.
You need to edit two items in the code below. Replace "www.somesite.com" with the URL of the page with the data, and replace "MyTable" with the title of the table the data is in. If the table doesn't have a title, then we have a problem. We'll have to find a way to distinguish that one table from any other tables on the page.
Private Sub Command1_Click()
Dim objDocument As Object, _
objElements As Object, _
objItem As Object, _
objTable As Object, _
objRows As Object, _
objRow As Object, _
objCols As Object, _
objCol As Object
'Load the web page
Browser.Navigate2 "www.somesite.com"
'Wait for the page to load
While Browser.readyState <> READYSTATE_COMPLETE
DoEvents
Wend
Set objDocument = Browser.Document
Set objElements = objDocument.All
For Each objItem In objElements
If objItem.Title = "MyTable" Then
Set objTable = objItem.childNodes.Item
Set objRows = objTable.childNodes
For Each objRow In objRows
Set objCols = objRow.children
For Each objCol In objCols
'This is where you'd get the data. I put the Debug.Print in to demonstrate that it works.
Debug.Print objCol.innerText
Next
Next
Exit For
End If
Next
End
End Sub
You need to edit two items in the code below. Replace "www.somesite.com" with the URL of the page with the data, and replace "MyTable" with the title of the table the data is in. If the table doesn't have a title, then we have a problem. We'll have to find a way to distinguish that one table from any other tables on the page.
Private Sub Command1_Click()
Dim objDocument As Object, _
objElements As Object, _
objItem As Object, _
objTable As Object, _
objRows As Object, _
objRow As Object, _
objCols As Object, _
objCol As Object
'Load the web page
Browser.Navigate2 "www.somesite.com"
'Wait for the page to load
While Browser.readyState <> READYSTATE_COMPLETE
DoEvents
Wend
Set objDocument = Browser.Document
Set objElements = objDocument.All
For Each objItem In objElements
If objItem.Title = "MyTable" Then
Set objTable = objItem.childNodes.Item
Set objRows = objTable.childNodes
For Each objRow In objRows
Set objCols = objRow.children
For Each objCol In objCols
'This is where you'd get the data. I put the Debug.Print in to demonstrate that it works.
Debug.Print objCol.innerText
Next
Next
Exit For
End If
Next
End
End Sub
ASKER
I got the login part working as this just needs to be done once interactively after the program is opened.
The next part could be a problem as there dosen't seem to be a title for the table. Would it help that there is only one table on the page?
The next part could be a problem as there dosen't seem to be a title for the table. Would it help that there is only one table on the page?
Yes, it'd help if there's only one table on the page. Problem is most pages depend heavily on tables for formatting and there are lots of them on a page. If there is truly just one table, then we can mod the code to find it. If there's even one other table though, then we'll have to find some means of picking it out from the others.
ASKER
I'm almost sure that there is only one table. I could post some of the source code if that would help.
Yeah, if you could post the HTML for the page in question that'd help.
ASKER
Here goes:
<br><script language="JavaScript">if(p arent.main )if(parent .upper.frm ){parent.u pper.frm.s ubtitle.va lue = '- Market prices';}</script><html><h ead><title >Market prices</title><meta http-equiv="refresh" content="10"><link rel="stylesheet" type="text/css" href="style.aspx"><script language="JavaScript">func tion SectionOver(row){row.style .backgroun dColor = '#AAAAAA';}
function SectionOut(row){row.style. background Color = '';}</script><script language="JavaScript">var sstr = '';function selectKeyDown(s){var c, i;var x = false;if(event.keyCode==13 ){s.blur;r eturn;}c = unescape('%' + event.keyCode.toString(16) );sstr = sstr + c;for(i = 0;i<s.length;i++){if(s.opt ions[i].te xt.slice(0 ,sstr.leng th).toLowe rCase() == sstr){s.selectedIndex = i;x = true;break;}}if(!x){sstr=' ';s.value= sstr}}</sc ript></hea d><body border=0 leftmargin=0 topmargin=0 bottommargin=0 rightmargin=0 marginheight=0 marginwidth=0><center><scr ipt language="JavaScript">func tion GroupJSRefreshPageGroup(){ var pagename = location.pathname.substrin g(location .pathname. lastIndexO f("/")+1);
var selectedgroup = document.frm.groupid.selec tedIndex;
var selectedgroupid = document.frm.groupid.optio ns[selecte dgroup].va lue;
if (location.href.indexOf('?' ) == -1){
location.href = location.href + '?groupid=' + selectedgroupid;
}else{
var newhref = location.href;
if (location.href.indexOf('gr oupid') != -1){
var newhref = location.href.substring(0, location.href.indexOf('gro upid'));
if (location.href.indexOf('&' , location.href.indexOf('gro upid') + 1) != -1){
newhref += location.href.substring(lo cation.hre f.indexOf( '&', location.href.indexOf('gro upid') + 1), location.href.length);
}
}
location.href = newhref + '&groupid=' + selectedgroupid;
}
}
</script><script language="JavaScript">func tion GroupJSRefreshPageSubgroup (groupid){
var pagename = location.pathname.substrin g(location .pathname. lastIndexO f("/")+1);
if (document.frm.subgroupid.t ype == "hidden"){
var selectedsubgroupid = document.frm.subgroupid.va lue;
} else {
var selectedsubgroup = document.frm.subgroupid.se lectedInde x;
var selectedsubgroupid = document.frm.subgroupid.op tions[sele ctedsubgro up].value;
}
if (location.href.indexOf('?' ) == -1){
location.href = location.href + '?subgroupid=' + selectedsubgroupid + '&groupid=' + groupid
}else{
var newhref = location.href;
if (location.href.indexOf('su bgroupid') != -1){
var newhref = location.href.substring(0, location.href.indexOf('sub groupid')) ;
if (location.href.indexOf('&' , location.href.indexOf('sub groupid') + 1) != -1){
newhref += location.href.substring(lo cation.hre f.indexOf( '&', location.href.indexOf('sub groupid') + 1), location.href.length);
}
}
location.href = newhref + '&subgroupid=' + selectedsubgroupid;
}
}
</script><form method="post" action="#" name=frm><table cellspacing=2 cellpadding=1 border=0><tr class="title"><td colspan="7"> </td></t r><tr><td colspan="3"><b> </b>< /td><td align="left"><b>Group</b>< /td><td><s elect onkeypress="selectKeyDown( this);retu rn false;" onblur="sstr=''" onchange="GroupJSRefreshPa geGroup(2) " name="groupid"><option value="0" selected>-- select a group --</option><option value="2" selected>Currencies</optio n><option value="1">Indices - European</option><option value="9">Indices - US</option><option value="3">Irish Shares</option><option value="11">UK Shares</option><option value="5">US Shares</option><option value="13">Commodities</op tion><opti on value="4">Popular Markets</option></select>< /td><td align="left"><b>Subgroup</ b></td><td >A-Z<input type="hidden" name="subgroupid" value="2"></td></tr><tr class="title"><td align="left"><b>Market</b> </td><td colspan="2"><b>Live Prices</b></td><td colspan="2"><b>Reuters</b> </td><td align="left"><b>High</b></ td><td align="left"><b>Low</b></t d></tr><tr class="title"><td align="left"><b> </b> </td><td align="left"><b>Bid</b></t d><td align="left"><b>Ask</b></t d><td align="left"><b>Bid</b></t d><td align="left"><b>Ask</b></t d><td align="left"><b> </b> </td><td align="left"><b> </b> </td></tr> <tr bgcolor="#CFCFCF" onmouseover="SectionOver(t his);" onmouseOut="SectionOut(thi s);"><td>< a onMouseOver="self.status = 'DOLLAR CHF DEC'; return true;" onmouseout="self.status='' " href="/admin/core/market_u pdate.aspx ?update=tr ue&id=2752 &subgroupi d=2&groupi d=2">DOLLA R CHF DEC</a></td><td align="right">12,466</td>< td align="right">12,478</td>< td align="right">1</td><td align="right">1</td><td align="right">12,494</td>< td align="right">12,426</td>< /tr><tr bgcolor="#dddddd" onmouseover="SectionOver(t his);" onmouseOut="SectionOut(thi s);"><td>< a onMouseOver="self.status = 'DOLLAR CHF MAR'; return true;" onmouseout="self.status='' " href="/admin/core/market_u pdate.aspx ?update=tr ue&id=2970 &subgroupi d=2&groupi d=2">DOLLA R CHF MAR</a></td><td align="right">12,422</td>< td align="right">12,434</td>< td align="right">1</td><td align="right">1</td><td align="right">12,450</td>< td align="right">12,381</td>< /tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(t his);" onmouseOut="SectionOut(thi s);"><td>< a onMouseOver="self.status = 'DOLLAR YEN DEC'; return true;" onmouseout="self.status='' " href="/admin/core/market_u pdate.aspx ?update=tr ue&id=2749 &subgroupi d=2&groupi d=2">DOLLA R YEN DEC</a></td><td align="right">10,997</td>< td align="right">11,009</td>< td align="right">110</td><td align="right">110</td><td align="right">11,026</td>< td align="right">10,967</td>< /tr><tr bgcolor="#dddddd" onmouseover="SectionOver(t his);" onmouseOut="SectionOut(thi s);"><td>< a onMouseOver="self.status = 'DOLLAR YEN MAR'; return true;" onmouseout="self.status='' " href="/admin/core/market_u pdate.aspx ?update=tr ue&id=2967 &subgroupi d=2&groupi d=2">DOLLA R YEN MAR</a></td><td align="right">10,942</td>< td align="right">10,954</td>< td align="right">110</td><td align="right">110</td><td align="right">10,971</td>< td align="right">10,912</td>< /tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(t his);" onmouseOut="SectionOut(thi s);"><td>< a onMouseOver="self.status = 'EURO CHF DEC'; return true;" onmouseout="self.status='' " href="/admin/core/market_u pdate.aspx ?update=tr ue&id=2756 &subgroupi d=2&groupi d=2">EURO CHF DEC</a></td><td align="right">15,467</td>< td align="right">15,483</td>< td align="right">2</td><td align="right">2</td><td align="right">15,486</td>< td align="right">15,424</td>< /tr><tr bgcolor="#dddddd" onmouseover="SectionOver(t his);" onmouseOut="SectionOut(thi s);"><td>< a onMouseOver="self.status = 'EURO CHF MAR'; return true;" onmouseout="self.status='' " href="/admin/core/market_u pdate.aspx ?update=tr ue&id=2974 &subgroupi d=2&groupi d=2">EURO CHF MAR</a></td><td align="right">15,405</td>< td align="right">15,421</td>< td align="right">2</td><td align="right">2</td><td align="right">15,424</td>< td align="right">15,362</td>< /tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(t his);" onmouseOut="SectionOut(thi s);"><td>< a onMouseOver="self.status = 'EURO DOLLAR DEC'; return true;" onmouseout="self.status='' " href="/admin/core/market_u pdate.aspx ?update=tr ue&id=2748 &subgroupi d=2&groupi d=2">EURO DOLLAR DEC</a></td><td align="right">12,397</td>< td align="right">12,407</td>< td align="right">1</td><td align="right">1</td><td align="right">12,426</td>< td align="right">12,376</td>< /tr><tr bgcolor="#dddddd" onmouseover="SectionOver(t his);" onmouseOut="SectionOut(thi s);"><td>< a onMouseOver="self.status = 'EURO DOLLAR MAR'; return true;" onmouseout="self.status='' " href="/admin/core/market_u pdate.aspx ?update=tr ue&id=2966 &subgroupi d=2&groupi d=2">EURO DOLLAR MAR</a></td><td align="right">12,393</td>< td align="right">12,403</td>< td align="right">1</td><td align="right">1</td><td align="right">12,423</td>< td align="right">12,373</td>< /tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(t his);" onmouseOut="SectionOut(thi s);"><td>< a onMouseOver="self.status = 'EURO YEN DEC'; return true;" onmouseout="self.status='' " href="/admin/core/market_u pdate.aspx ?update=tr ue&id=2753 &subgroupi d=2&groupi d=2">EURO YEN DEC</a></td><td align="right">13,637</td>< td align="right">13,653</td>< td align="right">137</td><td align="right">137</td><td align="right">13,674</td>< td align="right">13,601</td>< /tr><tr bgcolor="#dddddd" onmouseover="SectionOver(t his);" onmouseOut="SectionOut(thi s);"><td>< a onMouseOver="self.status = 'EURO YEN MAR'; return true;" onmouseout="self.status='' " href="/admin/core/market_u pdate.aspx ?update=tr ue&id=2971 &subgroupi d=2&groupi d=2">EURO YEN MAR</a></td><td align="right">13,563</td>< td align="right">13,579</td>< td align="right">137</td><td align="right">137</td><td align="right">13,599</td>< td align="right">13,526</td>< /tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(t his);" onmouseOut="SectionOut(thi s);"><td>< a onMouseOver="self.status = 'GBP CHF DEC'; return true;" onmouseout="self.status='' " href="/admin/core/market_u pdate.aspx ?update=tr ue&id=2755 &subgroupi d=2&groupi d=2">GBP CHF DEC</a></td><td align="right">22,288</td>< td align="right">22,312</td>< td align="right">2</td><td align="right">2</td><td align="right">22,419</td>< td align="right">22,199</td>< /tr><tr bgcolor="#dddddd" onmouseover="SectionOver(t his);" onmouseOut="SectionOut(thi s);"><td>< a onMouseOver="self.status = 'GBP CHF MAR'; return true;" onmouseout="self.status='' " href="/admin/core/market_u pdate.aspx ?update=tr ue&id=2973 &subgroupi d=2&groupi d=2">GBP CHF MAR</a></td><td align="right">22,032</td>< td align="right">22,056</td>< td align="right">2</td><td align="right">2</td><td align="right">22,162</td>< td align="right">21,944</td>< /tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(t his);" onmouseOut="SectionOut(thi s);"><td>< a onMouseOver="self.status = 'GBP DOLLAR DEC'; return true;" onmouseout="self.status='' " href="/admin/core/market_u pdate.aspx ?update=tr ue&id=2750 &subgroupi d=2&groupi d=2">GBP DOLLAR DEC</a></td><td align="right">17,872</td>< td align="right">17,886</td>< td align="right">2</td><td align="right">2</td><td align="right">18,003</td>< td align="right">17,818</td>< /tr><tr bgcolor="#dddddd" onmouseover="SectionOver(t his);" onmouseOut="SectionOut(thi s);"><td>< a onMouseOver="self.status = 'GBP DOLLAR MAR'; return true;" onmouseout="self.status='' " href="/admin/core/market_u pdate.aspx ?update=tr ue&id=2968 &subgroupi d=2&groupi d=2">GBP DOLLAR MAR</a></td><td align="right">17,731</td>< td align="right">17,745</td>< td align="right">2</td><td align="right">2</td><td align="right">17,861</td>< td align="right">17,678</td>< /tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(t his);" onmouseOut="SectionOut(thi s);"><td>< a onMouseOver="self.status = 'GBP EURO DEC'; return true;" onmouseout="self.status='' " href="/admin/core/market_u pdate.aspx ?update=tr ue&id=2751 &subgroupi d=2&groupi d=2">GBP EURO DEC</a></td><td align="right">14,414</td>< td align="right">14,428</td>< td align="right">1</td><td align="right">1</td><td align="right">14,514</td>< td align="right">14,376</td>< /tr><tr bgcolor="#dddddd" onmouseover="SectionOver(t his);" onmouseOut="SectionOut(thi s);"><td>< a onMouseOver="self.status = 'GBP EURO MAR'; return true;" onmouseout="self.status='' " href="/admin/core/market_u pdate.aspx ?update=tr ue&id=2969 &subgroupi d=2&groupi d=2">GBP EURO MAR</a></td><td align="right">14,302</td>< td align="right">14,316</td>< td align="right">1</td><td align="right">1</td><td align="right">14,401</td>< td align="right">14,264</td>< /tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(t his);" onmouseOut="SectionOut(thi s);"><td>< a onMouseOver="self.status = 'GBP YEN DEC'; return true;" onmouseout="self.status='' " href="/admin/core/market_u pdate.aspx ?update=tr ue&id=2754 &subgroupi d=2&groupi d=2">GBP YEN DEC</a></td><td align="right">19,658</td>< td align="right">19,682</td>< td align="right">199</td><td align="right">199</td><td align="right">19,792</td>< td align="right">19,566</td>< /tr><tr bgcolor="#dddddd" onmouseover="SectionOver(t his);" onmouseOut="SectionOut(thi s);"><td>< a onMouseOver="self.status = 'GBP YEN MAR'; return true;" onmouseout="self.status='' " href="/admin/core/market_u pdate.aspx ?update=tr ue&id=2972 &subgroupi d=2&groupi d=2">GBP YEN MAR</a></td><td align="right">19,402</td>< td align="right">19,426</td>< td align="right">199</td><td align="right">199</td><td align="right">19,535</td>< td align="right">19,312</td>< /tr><tr class="title"><td colspan="7"><b> </b>< /td></tr>< /table><br >[<a onMouseOver="self.status = 'Add group'; return true;" onmouseout="self.status='' " href="/admin/core/group_up date.aspx" >Add group</a>] - [<a onMouseOver="self.status = 'Add market'; return true;" onmouseout="self.status='' " href="/admin/core/market_u pdate.aspx ">Add market</a>] - [<a onMouseOver="self.status = 'Find markets'; return true;" onmouseout="self.status='' " href="/admin/core/market_f ind.aspx"> Find markets</a>] - [<a onMouseOver="self.status = 'Add market sector'; return true;" onmouseout="self.status='' " href="/admin/core/marketse ctor_updat e.aspx">Ad d market sector</a>] - [<a onMouseOver="self.status = 'List market sector'; return true;" onmouseout="self.status='' " href="/admin/core/marketse ctor_list. aspx">List market sector</a>] - [<a onMouseOver="self.status = 'List market errors'; return true;" onmouseout="self.status='' " href="/admin/core/market_e rror_list. aspx">List market errors</a>] - [<a onMouseOver="self.status = 'List expired markets'; return true;" onmouseout="self.status='' " href="/admin/core/market_e xpired_lis t.aspx">Li st expired markets</a>] - [<a onMouseOver="self.status = 'Manage groups'; return true;" onmouseout="self.status='' " href="/admin/core/group_ma nagement.a spx">Manag e groups</a>]</form><script language="JavaScript">var element, tmpElement;for(var i=0; i < document.forms[0].elements .length; i++){tmpElement = document.forms[0].elements [i];if((tm pElement.t ype != 'hidden') && (tmpElement.disabled != true) && (tmpElement.readOnly != true)){tmpElement.focus(); break;}}</script></center> <br></body ></html>
<br><script language="JavaScript">if(p
function SectionOut(row){row.style.
var selectedgroup = document.frm.groupid.selec
var selectedgroupid = document.frm.groupid.optio
if (location.href.indexOf('?'
location.href = location.href + '?groupid=' + selectedgroupid;
}else{
var newhref = location.href;
if (location.href.indexOf('gr
var newhref = location.href.substring(0,
if (location.href.indexOf('&'
newhref += location.href.substring(lo
}
}
location.href = newhref + '&groupid=' + selectedgroupid;
}
}
</script><script language="JavaScript">func
var pagename = location.pathname.substrin
if (document.frm.subgroupid.t
var selectedsubgroupid = document.frm.subgroupid.va
} else {
var selectedsubgroup = document.frm.subgroupid.se
var selectedsubgroupid = document.frm.subgroupid.op
}
if (location.href.indexOf('?'
location.href = location.href + '?subgroupid=' + selectedsubgroupid + '&groupid=' + groupid
}else{
var newhref = location.href;
if (location.href.indexOf('su
var newhref = location.href.substring(0,
if (location.href.indexOf('&'
newhref += location.href.substring(lo
}
}
location.href = newhref + '&subgroupid=' + selectedsubgroupid;
}
}
</script><form method="post" action="#" name=frm><table cellspacing=2 cellpadding=1 border=0><tr class="title"><td colspan="7"> </td></t
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
That worked perfectly! I was trying to read in the page as text and parse it using left() and right() functions but that was going to take me forever. Cheers again.
http://www.vbip.com/wininet/index.asp