Link to home
Start Free TrialLog in
Avatar of chocs666
chocs666

asked on

Retrieve text data from a website and insert into a recordset

Hi,

I'm trying to pull down some data from a website. I used to do it using a web query in excel, but I found that that wasn't very reliable. Does anyone know is there a way using vb to get this information into a recordset (or whatever) ?

Cheers.
Avatar of Amritpal Singh
Amritpal Singh
Flag of India image

if i got ur question right ,u can find ur answer at the following link
http://www.vbip.com/wininet/index.asp
Avatar of David Lee
Since you mention a web query I'm assuming that the data is in an HTML table on a web page.  If that's correct, then you can use the WebBrowser control to access the page, then parse the Document property of the WebBrowser to find the data, and finally insert it into your recordset.  Something like:

    Dim objDocument As HTMLDocument
    'Open the page
    WebBrowser1.Navigate2 "www.mypage.com"
    'Wait for the page to finish loading
    While WebBrowser1.ReadyState = READYSTATE_LOADING
    Wend
    'Grab a copy of the loaded page
    Set objDocument = WebBrowser1.Document

You can now parse objDocument, find the data, and load it into the recordset.  It'd makes this easy if the data is in a table and the table has a name.

Avatar of chocs666
chocs666

ASKER

BlueDevilFan,

I think that a webbrowser is the way to go for me because I am trying to get the information from a table on a website. Would you mind telling me how to go about parsing the objDocument to get the information from the table? Also I need to login to this website before I can get the info. What's the easiest way to do this. Finally, the while loop in your code seems to freeze as if it goes into an infinite loop.

Thanks for the help.
Okay, here you go.  This handles everything except the logging in portion.  How you handle that depends on whether you'll use this program interactively or whether it needs to run on its own.  If the former, then you can load the page and login just as you would if you were doing this from IE.  Once you've logged in and the page with the table has loaded, you'd click a button and it'd suck the data out of the table.  On the other hand, if this needs to run on its own, then you'd need to stuff the login information into the correct fields on the page, programmatically click the login button, wait for the subsequent page to load, and then fire the portion that gets the data.

You need to edit two items in the code below.  Replace "www.somesite.com" with the URL of the page with the data, and replace "MyTable" with the title of the table the data is in.  If the table doesn't have a title, then we have a problem.  We'll have to find a way to distinguish that one table from any other tables on the page.

Private Sub Command1_Click()
    Dim objDocument As Object, _
        objElements As Object, _
        objItem As Object, _
        objTable As Object, _
        objRows As Object, _
        objRow As Object, _
        objCols As Object, _
        objCol As Object
    'Load the web page
    Browser.Navigate2 "www.somesite.com"
    'Wait for the page to load
    While Browser.readyState <> READYSTATE_COMPLETE
        DoEvents
    Wend
    Set objDocument = Browser.Document
    Set objElements = objDocument.All
    For Each objItem In objElements
        If objItem.Title = "MyTable" Then
            Set objTable = objItem.childNodes.Item
            Set objRows = objTable.childNodes
            For Each objRow In objRows
                Set objCols = objRow.children
                For Each objCol In objCols
                    'This is where you'd get the data.  I put the Debug.Print in to demonstrate that it works.
                    Debug.Print objCol.innerText
                Next
            Next
            Exit For
        End If
    Next
    End
End Sub
I got the login part working as this just needs to be done once interactively after the program is opened.

The next part could be a problem as there dosen't seem to be a title for the table. Would it help that there is only one table on the page?
Yes, it'd help if there's only one table on the page.  Problem is most pages depend heavily on tables for formatting and there are lots of them on a page.  If there is truly just one table, then we can mod the code to find it.  If there's even one other table though, then we'll have to find some means of picking it out from the others.  
I'm almost sure that there is only one table. I could post some of the source code if that would help.
Yeah, if you could post the HTML for the page in question that'd help.
Here goes:

<br><script language="JavaScript">if(parent.main)if(parent.upper.frm){parent.upper.frm.subtitle.value = '- Market prices';}</script><html><head><title>Market prices</title><meta http-equiv="refresh" content="10"><link rel="stylesheet" type="text/css" href="style.aspx"><script language="JavaScript">function SectionOver(row){row.style.backgroundColor = '#AAAAAA';}
function SectionOut(row){row.style.backgroundColor = '';}</script><script language="JavaScript">var sstr = '';function selectKeyDown(s){var c, i;var x = false;if(event.keyCode==13){s.blur;return;}c = unescape('%' + event.keyCode.toString(16));sstr = sstr + c;for(i = 0;i<s.length;i++){if(s.options[i].text.slice(0,sstr.length).toLowerCase() == sstr){s.selectedIndex = i;x = true;break;}}if(!x){sstr='';s.value=sstr}}</script></head><body border=0 leftmargin=0 topmargin=0 bottommargin=0 rightmargin=0 marginheight=0 marginwidth=0><center><script language="JavaScript">function GroupJSRefreshPageGroup(){var pagename = location.pathname.substring(location.pathname.lastIndexOf("/")+1);
var selectedgroup = document.frm.groupid.selectedIndex;
var selectedgroupid = document.frm.groupid.options[selectedgroup].value;
if (location.href.indexOf('?') == -1){
      location.href = location.href + '?groupid=' + selectedgroupid;
}else{
      var newhref = location.href;
      if (location.href.indexOf('groupid') != -1){
            var newhref = location.href.substring(0, location.href.indexOf('groupid'));
            if (location.href.indexOf('&', location.href.indexOf('groupid') + 1) != -1){
                  newhref += location.href.substring(location.href.indexOf('&', location.href.indexOf('groupid') + 1), location.href.length);
            }
      }
      location.href = newhref + '&groupid=' + selectedgroupid;
}
}
</script><script language="JavaScript">function GroupJSRefreshPageSubgroup(groupid){
var pagename = location.pathname.substring(location.pathname.lastIndexOf("/")+1);
if (document.frm.subgroupid.type == "hidden"){
      var selectedsubgroupid = document.frm.subgroupid.value;
} else {
      var selectedsubgroup = document.frm.subgroupid.selectedIndex;
      var selectedsubgroupid = document.frm.subgroupid.options[selectedsubgroup].value;
}
if (location.href.indexOf('?') == -1){
      location.href = location.href + '?subgroupid=' + selectedsubgroupid + '&groupid=' + groupid
}else{
      var newhref = location.href;
      if (location.href.indexOf('subgroupid') != -1){
            var newhref = location.href.substring(0, location.href.indexOf('subgroupid'));
            if (location.href.indexOf('&', location.href.indexOf('subgroupid') + 1) != -1){
                  newhref += location.href.substring(location.href.indexOf('&', location.href.indexOf('subgroupid') + 1), location.href.length);
            }
      }
      location.href = newhref + '&subgroupid=' + selectedsubgroupid;
}
}
</script><form method="post" action="#" name=frm><table cellspacing=2 cellpadding=1 border=0><tr class="title"><td colspan="7">&nbsp;</td></tr><tr><td colspan="3"><b>&nbsp;</b></td><td align="left"><b>Group</b></td><td><select onkeypress="selectKeyDown(this);return false;" onblur="sstr=''" onchange="GroupJSRefreshPageGroup(2)" name="groupid"><option value="0" selected>-- select a group --</option><option value="2" selected>Currencies</option><option value="1">Indices - European</option><option value="9">Indices - US</option><option value="3">Irish Shares</option><option value="11">UK Shares</option><option value="5">US Shares</option><option value="13">Commodities</option><option value="4">Popular Markets</option></select></td><td align="left"><b>Subgroup</b></td><td>A-Z<input type="hidden" name="subgroupid" value="2"></td></tr><tr class="title"><td align="left"><b>Market</b></td><td colspan="2"><b>Live Prices</b></td><td colspan="2"><b>Reuters</b></td><td align="left"><b>High</b></td><td align="left"><b>Low</b></td></tr><tr class="title"><td align="left"><b>&nbsp;</b></td><td align="left"><b>Bid</b></td><td align="left"><b>Ask</b></td><td align="left"><b>Bid</b></td><td align="left"><b>Ask</b></td><td align="left"><b>&nbsp;</b></td><td align="left"><b>&nbsp;</b></td></tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'DOLLAR CHF DEC'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2752&subgroupid=2&groupid=2">DOLLAR CHF DEC</a></td><td align="right">12,466</td><td align="right">12,478</td><td align="right">1</td><td align="right">1</td><td align="right">12,494</td><td align="right">12,426</td></tr><tr bgcolor="#dddddd" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'DOLLAR CHF MAR'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2970&subgroupid=2&groupid=2">DOLLAR CHF MAR</a></td><td align="right">12,422</td><td align="right">12,434</td><td align="right">1</td><td align="right">1</td><td align="right">12,450</td><td align="right">12,381</td></tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'DOLLAR YEN DEC'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2749&subgroupid=2&groupid=2">DOLLAR YEN DEC</a></td><td align="right">10,997</td><td align="right">11,009</td><td align="right">110</td><td align="right">110</td><td align="right">11,026</td><td align="right">10,967</td></tr><tr bgcolor="#dddddd" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'DOLLAR YEN MAR'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2967&subgroupid=2&groupid=2">DOLLAR YEN MAR</a></td><td align="right">10,942</td><td align="right">10,954</td><td align="right">110</td><td align="right">110</td><td align="right">10,971</td><td align="right">10,912</td></tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'EURO CHF DEC'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2756&subgroupid=2&groupid=2">EURO CHF DEC</a></td><td align="right">15,467</td><td align="right">15,483</td><td align="right">2</td><td align="right">2</td><td align="right">15,486</td><td align="right">15,424</td></tr><tr bgcolor="#dddddd" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'EURO CHF MAR'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2974&subgroupid=2&groupid=2">EURO CHF MAR</a></td><td align="right">15,405</td><td align="right">15,421</td><td align="right">2</td><td align="right">2</td><td align="right">15,424</td><td align="right">15,362</td></tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'EURO DOLLAR DEC'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2748&subgroupid=2&groupid=2">EURO DOLLAR DEC</a></td><td align="right">12,397</td><td align="right">12,407</td><td align="right">1</td><td align="right">1</td><td align="right">12,426</td><td align="right">12,376</td></tr><tr bgcolor="#dddddd" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'EURO DOLLAR MAR'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2966&subgroupid=2&groupid=2">EURO DOLLAR MAR</a></td><td align="right">12,393</td><td align="right">12,403</td><td align="right">1</td><td align="right">1</td><td align="right">12,423</td><td align="right">12,373</td></tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'EURO YEN DEC'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2753&subgroupid=2&groupid=2">EURO YEN DEC</a></td><td align="right">13,637</td><td align="right">13,653</td><td align="right">137</td><td align="right">137</td><td align="right">13,674</td><td align="right">13,601</td></tr><tr bgcolor="#dddddd" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'EURO YEN MAR'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2971&subgroupid=2&groupid=2">EURO YEN MAR</a></td><td align="right">13,563</td><td align="right">13,579</td><td align="right">137</td><td align="right">137</td><td align="right">13,599</td><td align="right">13,526</td></tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'GBP CHF DEC'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2755&subgroupid=2&groupid=2">GBP CHF DEC</a></td><td align="right">22,288</td><td align="right">22,312</td><td align="right">2</td><td align="right">2</td><td align="right">22,419</td><td align="right">22,199</td></tr><tr bgcolor="#dddddd" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'GBP CHF MAR'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2973&subgroupid=2&groupid=2">GBP CHF MAR</a></td><td align="right">22,032</td><td align="right">22,056</td><td align="right">2</td><td align="right">2</td><td align="right">22,162</td><td align="right">21,944</td></tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'GBP DOLLAR DEC'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2750&subgroupid=2&groupid=2">GBP DOLLAR DEC</a></td><td align="right">17,872</td><td align="right">17,886</td><td align="right">2</td><td align="right">2</td><td align="right">18,003</td><td align="right">17,818</td></tr><tr bgcolor="#dddddd" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'GBP DOLLAR MAR'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2968&subgroupid=2&groupid=2">GBP DOLLAR MAR</a></td><td align="right">17,731</td><td align="right">17,745</td><td align="right">2</td><td align="right">2</td><td align="right">17,861</td><td align="right">17,678</td></tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'GBP EURO DEC'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2751&subgroupid=2&groupid=2">GBP EURO DEC</a></td><td align="right">14,414</td><td align="right">14,428</td><td align="right">1</td><td align="right">1</td><td align="right">14,514</td><td align="right">14,376</td></tr><tr bgcolor="#dddddd" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'GBP EURO MAR'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2969&subgroupid=2&groupid=2">GBP EURO MAR</a></td><td align="right">14,302</td><td align="right">14,316</td><td align="right">1</td><td align="right">1</td><td align="right">14,401</td><td align="right">14,264</td></tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'GBP YEN DEC'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2754&subgroupid=2&groupid=2">GBP YEN DEC</a></td><td align="right">19,658</td><td align="right">19,682</td><td align="right">199</td><td align="right">199</td><td align="right">19,792</td><td align="right">19,566</td></tr><tr bgcolor="#dddddd" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'GBP YEN MAR'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2972&subgroupid=2&groupid=2">GBP YEN MAR</a></td><td align="right">19,402</td><td align="right">19,426</td><td align="right">199</td><td align="right">199</td><td align="right">19,535</td><td align="right">19,312</td></tr><tr class="title"><td colspan="7"><b>&nbsp;</b></td></tr></table><br>[<a onMouseOver="self.status = 'Add group'; return true;" onmouseout="self.status=''" href="/admin/core/group_update.aspx">Add group</a>] - [<a onMouseOver="self.status = 'Add market'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx">Add market</a>] - [<a onMouseOver="self.status = 'Find markets'; return true;" onmouseout="self.status=''" href="/admin/core/market_find.aspx">Find markets</a>] - [<a onMouseOver="self.status = 'Add market sector'; return true;" onmouseout="self.status=''" href="/admin/core/marketsector_update.aspx">Add market sector</a>] - [<a onMouseOver="self.status = 'List market sector'; return true;" onmouseout="self.status=''" href="/admin/core/marketsector_list.aspx">List market sector</a>] - [<a onMouseOver="self.status = 'List market errors'; return true;" onmouseout="self.status=''" href="/admin/core/market_error_list.aspx">List market errors</a>] - [<a onMouseOver="self.status = 'List expired markets'; return true;" onmouseout="self.status=''" href="/admin/core/market_expired_list.aspx">List expired markets</a>] - [<a onMouseOver="self.status = 'Manage groups'; return true;" onmouseout="self.status=''" href="/admin/core/group_management.aspx">Manage groups</a>]</form><script language="JavaScript">var element, tmpElement;for(var i=0; i < document.forms[0].elements.length; i++){tmpElement = document.forms[0].elements[i];if((tmpElement.type != 'hidden') && (tmpElement.disabled != true) && (tmpElement.readOnly != true)){tmpElement.focus(); break;}}</script></center><br></body></html>
ASKER CERTIFIED SOLUTION
Avatar of David Lee
David Lee
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
That worked perfectly! I was trying to read in the page as text and parse it using left() and right() functions but that was going to take me forever. Cheers again.