Solved

Retrieve text data from a website and insert into a recordset

Posted on 2004-09-28
11
146 Views
Last Modified: 2010-05-02
Hi,

I'm trying to pull down some data from a website. I used to do it using a web query in excel, but I found that that wasn't very reliable. Does anyone know is there a way using vb to get this information into a recordset (or whatever) ?

Cheers.
0
Comment
Question by:chocs666
  • 5
  • 5
11 Comments
 
LVL 6

Expert Comment

by:Amritpal Singh
ID: 12169187
if i got ur question right ,u can find ur answer at the following link
http://www.vbip.com/wininet/index.asp
0
 
LVL 76

Expert Comment

by:David Lee
ID: 12170868
Since you mention a web query I'm assuming that the data is in an HTML table on a web page.  If that's correct, then you can use the WebBrowser control to access the page, then parse the Document property of the WebBrowser to find the data, and finally insert it into your recordset.  Something like:

    Dim objDocument As HTMLDocument
    'Open the page
    WebBrowser1.Navigate2 "www.mypage.com"
    'Wait for the page to finish loading
    While WebBrowser1.ReadyState = READYSTATE_LOADING
    Wend
    'Grab a copy of the loaded page
    Set objDocument = WebBrowser1.Document

You can now parse objDocument, find the data, and load it into the recordset.  It'd makes this easy if the data is in a table and the table has a name.

0
 

Author Comment

by:chocs666
ID: 12174753
BlueDevilFan,

I think that a webbrowser is the way to go for me because I am trying to get the information from a table on a website. Would you mind telling me how to go about parsing the objDocument to get the information from the table? Also I need to login to this website before I can get the info. What's the easiest way to do this. Finally, the while loop in your code seems to freeze as if it goes into an infinite loop.

Thanks for the help.
0
 
LVL 76

Expert Comment

by:David Lee
ID: 12175943
Okay, here you go.  This handles everything except the logging in portion.  How you handle that depends on whether you'll use this program interactively or whether it needs to run on its own.  If the former, then you can load the page and login just as you would if you were doing this from IE.  Once you've logged in and the page with the table has loaded, you'd click a button and it'd suck the data out of the table.  On the other hand, if this needs to run on its own, then you'd need to stuff the login information into the correct fields on the page, programmatically click the login button, wait for the subsequent page to load, and then fire the portion that gets the data.

You need to edit two items in the code below.  Replace "www.somesite.com" with the URL of the page with the data, and replace "MyTable" with the title of the table the data is in.  If the table doesn't have a title, then we have a problem.  We'll have to find a way to distinguish that one table from any other tables on the page.

Private Sub Command1_Click()
    Dim objDocument As Object, _
        objElements As Object, _
        objItem As Object, _
        objTable As Object, _
        objRows As Object, _
        objRow As Object, _
        objCols As Object, _
        objCol As Object
    'Load the web page
    Browser.Navigate2 "www.somesite.com"
    'Wait for the page to load
    While Browser.readyState <> READYSTATE_COMPLETE
        DoEvents
    Wend
    Set objDocument = Browser.Document
    Set objElements = objDocument.All
    For Each objItem In objElements
        If objItem.Title = "MyTable" Then
            Set objTable = objItem.childNodes.Item
            Set objRows = objTable.childNodes
            For Each objRow In objRows
                Set objCols = objRow.children
                For Each objCol In objCols
                    'This is where you'd get the data.  I put the Debug.Print in to demonstrate that it works.
                    Debug.Print objCol.innerText
                Next
            Next
            Exit For
        End If
    Next
    End
End Sub
0
 

Author Comment

by:chocs666
ID: 12206711
I got the login part working as this just needs to be done once interactively after the program is opened.

The next part could be a problem as there dosen't seem to be a title for the table. Would it help that there is only one table on the page?
0
What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

 
LVL 76

Expert Comment

by:David Lee
ID: 12206871
Yes, it'd help if there's only one table on the page.  Problem is most pages depend heavily on tables for formatting and there are lots of them on a page.  If there is truly just one table, then we can mod the code to find it.  If there's even one other table though, then we'll have to find some means of picking it out from the others.  
0
 

Author Comment

by:chocs666
ID: 12207850
I'm almost sure that there is only one table. I could post some of the source code if that would help.
0
 
LVL 76

Expert Comment

by:David Lee
ID: 12208143
Yeah, if you could post the HTML for the page in question that'd help.
0
 

Author Comment

by:chocs666
ID: 12208526
Here goes:

<br><script language="JavaScript">if(parent.main)if(parent.upper.frm){parent.upper.frm.subtitle.value = '- Market prices';}</script><html><head><title>Market prices</title><meta http-equiv="refresh" content="10"><link rel="stylesheet" type="text/css" href="style.aspx"><script language="JavaScript">function SectionOver(row){row.style.backgroundColor = '#AAAAAA';}
function SectionOut(row){row.style.backgroundColor = '';}</script><script language="JavaScript">var sstr = '';function selectKeyDown(s){var c, i;var x = false;if(event.keyCode==13){s.blur;return;}c = unescape('%' + event.keyCode.toString(16));sstr = sstr + c;for(i = 0;i<s.length;i++){if(s.options[i].text.slice(0,sstr.length).toLowerCase() == sstr){s.selectedIndex = i;x = true;break;}}if(!x){sstr='';s.value=sstr}}</script></head><body border=0 leftmargin=0 topmargin=0 bottommargin=0 rightmargin=0 marginheight=0 marginwidth=0><center><script language="JavaScript">function GroupJSRefreshPageGroup(){var pagename = location.pathname.substring(location.pathname.lastIndexOf("/")+1);
var selectedgroup = document.frm.groupid.selectedIndex;
var selectedgroupid = document.frm.groupid.options[selectedgroup].value;
if (location.href.indexOf('?') == -1){
      location.href = location.href + '?groupid=' + selectedgroupid;
}else{
      var newhref = location.href;
      if (location.href.indexOf('groupid') != -1){
            var newhref = location.href.substring(0, location.href.indexOf('groupid'));
            if (location.href.indexOf('&', location.href.indexOf('groupid') + 1) != -1){
                  newhref += location.href.substring(location.href.indexOf('&', location.href.indexOf('groupid') + 1), location.href.length);
            }
      }
      location.href = newhref + '&groupid=' + selectedgroupid;
}
}
</script><script language="JavaScript">function GroupJSRefreshPageSubgroup(groupid){
var pagename = location.pathname.substring(location.pathname.lastIndexOf("/")+1);
if (document.frm.subgroupid.type == "hidden"){
      var selectedsubgroupid = document.frm.subgroupid.value;
} else {
      var selectedsubgroup = document.frm.subgroupid.selectedIndex;
      var selectedsubgroupid = document.frm.subgroupid.options[selectedsubgroup].value;
}
if (location.href.indexOf('?') == -1){
      location.href = location.href + '?subgroupid=' + selectedsubgroupid + '&groupid=' + groupid
}else{
      var newhref = location.href;
      if (location.href.indexOf('subgroupid') != -1){
            var newhref = location.href.substring(0, location.href.indexOf('subgroupid'));
            if (location.href.indexOf('&', location.href.indexOf('subgroupid') + 1) != -1){
                  newhref += location.href.substring(location.href.indexOf('&', location.href.indexOf('subgroupid') + 1), location.href.length);
            }
      }
      location.href = newhref + '&subgroupid=' + selectedsubgroupid;
}
}
</script><form method="post" action="#" name=frm><table cellspacing=2 cellpadding=1 border=0><tr class="title"><td colspan="7">&nbsp;</td></tr><tr><td colspan="3"><b>&nbsp;</b></td><td align="left"><b>Group</b></td><td><select onkeypress="selectKeyDown(this);return false;" onblur="sstr=''" onchange="GroupJSRefreshPageGroup(2)" name="groupid"><option value="0" selected>-- select a group --</option><option value="2" selected>Currencies</option><option value="1">Indices - European</option><option value="9">Indices - US</option><option value="3">Irish Shares</option><option value="11">UK Shares</option><option value="5">US Shares</option><option value="13">Commodities</option><option value="4">Popular Markets</option></select></td><td align="left"><b>Subgroup</b></td><td>A-Z<input type="hidden" name="subgroupid" value="2"></td></tr><tr class="title"><td align="left"><b>Market</b></td><td colspan="2"><b>Live Prices</b></td><td colspan="2"><b>Reuters</b></td><td align="left"><b>High</b></td><td align="left"><b>Low</b></td></tr><tr class="title"><td align="left"><b>&nbsp;</b></td><td align="left"><b>Bid</b></td><td align="left"><b>Ask</b></td><td align="left"><b>Bid</b></td><td align="left"><b>Ask</b></td><td align="left"><b>&nbsp;</b></td><td align="left"><b>&nbsp;</b></td></tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'DOLLAR CHF DEC'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2752&subgroupid=2&groupid=2">DOLLAR CHF DEC</a></td><td align="right">12,466</td><td align="right">12,478</td><td align="right">1</td><td align="right">1</td><td align="right">12,494</td><td align="right">12,426</td></tr><tr bgcolor="#dddddd" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'DOLLAR CHF MAR'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2970&subgroupid=2&groupid=2">DOLLAR CHF MAR</a></td><td align="right">12,422</td><td align="right">12,434</td><td align="right">1</td><td align="right">1</td><td align="right">12,450</td><td align="right">12,381</td></tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'DOLLAR YEN DEC'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2749&subgroupid=2&groupid=2">DOLLAR YEN DEC</a></td><td align="right">10,997</td><td align="right">11,009</td><td align="right">110</td><td align="right">110</td><td align="right">11,026</td><td align="right">10,967</td></tr><tr bgcolor="#dddddd" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'DOLLAR YEN MAR'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2967&subgroupid=2&groupid=2">DOLLAR YEN MAR</a></td><td align="right">10,942</td><td align="right">10,954</td><td align="right">110</td><td align="right">110</td><td align="right">10,971</td><td align="right">10,912</td></tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'EURO CHF DEC'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2756&subgroupid=2&groupid=2">EURO CHF DEC</a></td><td align="right">15,467</td><td align="right">15,483</td><td align="right">2</td><td align="right">2</td><td align="right">15,486</td><td align="right">15,424</td></tr><tr bgcolor="#dddddd" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'EURO CHF MAR'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2974&subgroupid=2&groupid=2">EURO CHF MAR</a></td><td align="right">15,405</td><td align="right">15,421</td><td align="right">2</td><td align="right">2</td><td align="right">15,424</td><td align="right">15,362</td></tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'EURO DOLLAR DEC'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2748&subgroupid=2&groupid=2">EURO DOLLAR DEC</a></td><td align="right">12,397</td><td align="right">12,407</td><td align="right">1</td><td align="right">1</td><td align="right">12,426</td><td align="right">12,376</td></tr><tr bgcolor="#dddddd" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'EURO DOLLAR MAR'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2966&subgroupid=2&groupid=2">EURO DOLLAR MAR</a></td><td align="right">12,393</td><td align="right">12,403</td><td align="right">1</td><td align="right">1</td><td align="right">12,423</td><td align="right">12,373</td></tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'EURO YEN DEC'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2753&subgroupid=2&groupid=2">EURO YEN DEC</a></td><td align="right">13,637</td><td align="right">13,653</td><td align="right">137</td><td align="right">137</td><td align="right">13,674</td><td align="right">13,601</td></tr><tr bgcolor="#dddddd" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'EURO YEN MAR'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2971&subgroupid=2&groupid=2">EURO YEN MAR</a></td><td align="right">13,563</td><td align="right">13,579</td><td align="right">137</td><td align="right">137</td><td align="right">13,599</td><td align="right">13,526</td></tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'GBP CHF DEC'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2755&subgroupid=2&groupid=2">GBP CHF DEC</a></td><td align="right">22,288</td><td align="right">22,312</td><td align="right">2</td><td align="right">2</td><td align="right">22,419</td><td align="right">22,199</td></tr><tr bgcolor="#dddddd" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'GBP CHF MAR'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2973&subgroupid=2&groupid=2">GBP CHF MAR</a></td><td align="right">22,032</td><td align="right">22,056</td><td align="right">2</td><td align="right">2</td><td align="right">22,162</td><td align="right">21,944</td></tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'GBP DOLLAR DEC'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2750&subgroupid=2&groupid=2">GBP DOLLAR DEC</a></td><td align="right">17,872</td><td align="right">17,886</td><td align="right">2</td><td align="right">2</td><td align="right">18,003</td><td align="right">17,818</td></tr><tr bgcolor="#dddddd" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'GBP DOLLAR MAR'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2968&subgroupid=2&groupid=2">GBP DOLLAR MAR</a></td><td align="right">17,731</td><td align="right">17,745</td><td align="right">2</td><td align="right">2</td><td align="right">17,861</td><td align="right">17,678</td></tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'GBP EURO DEC'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2751&subgroupid=2&groupid=2">GBP EURO DEC</a></td><td align="right">14,414</td><td align="right">14,428</td><td align="right">1</td><td align="right">1</td><td align="right">14,514</td><td align="right">14,376</td></tr><tr bgcolor="#dddddd" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'GBP EURO MAR'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2969&subgroupid=2&groupid=2">GBP EURO MAR</a></td><td align="right">14,302</td><td align="right">14,316</td><td align="right">1</td><td align="right">1</td><td align="right">14,401</td><td align="right">14,264</td></tr><tr bgcolor="#CFCFCF" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'GBP YEN DEC'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2754&subgroupid=2&groupid=2">GBP YEN DEC</a></td><td align="right">19,658</td><td align="right">19,682</td><td align="right">199</td><td align="right">199</td><td align="right">19,792</td><td align="right">19,566</td></tr><tr bgcolor="#dddddd" onmouseover="SectionOver(this);" onmouseOut="SectionOut(this);"><td><a onMouseOver="self.status = 'GBP YEN MAR'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx?update=true&id=2972&subgroupid=2&groupid=2">GBP YEN MAR</a></td><td align="right">19,402</td><td align="right">19,426</td><td align="right">199</td><td align="right">199</td><td align="right">19,535</td><td align="right">19,312</td></tr><tr class="title"><td colspan="7"><b>&nbsp;</b></td></tr></table><br>[<a onMouseOver="self.status = 'Add group'; return true;" onmouseout="self.status=''" href="/admin/core/group_update.aspx">Add group</a>] - [<a onMouseOver="self.status = 'Add market'; return true;" onmouseout="self.status=''" href="/admin/core/market_update.aspx">Add market</a>] - [<a onMouseOver="self.status = 'Find markets'; return true;" onmouseout="self.status=''" href="/admin/core/market_find.aspx">Find markets</a>] - [<a onMouseOver="self.status = 'Add market sector'; return true;" onmouseout="self.status=''" href="/admin/core/marketsector_update.aspx">Add market sector</a>] - [<a onMouseOver="self.status = 'List market sector'; return true;" onmouseout="self.status=''" href="/admin/core/marketsector_list.aspx">List market sector</a>] - [<a onMouseOver="self.status = 'List market errors'; return true;" onmouseout="self.status=''" href="/admin/core/market_error_list.aspx">List market errors</a>] - [<a onMouseOver="self.status = 'List expired markets'; return true;" onmouseout="self.status=''" href="/admin/core/market_expired_list.aspx">List expired markets</a>] - [<a onMouseOver="self.status = 'Manage groups'; return true;" onmouseout="self.status=''" href="/admin/core/group_management.aspx">Manage groups</a>]</form><script language="JavaScript">var element, tmpElement;for(var i=0; i < document.forms[0].elements.length; i++){tmpElement = document.forms[0].elements[i];if((tmpElement.type != 'hidden') && (tmpElement.disabled != true) && (tmpElement.readOnly != true)){tmpElement.focus(); break;}}</script></center><br></body></html>
0
 
LVL 76

Accepted Solution

by:
David Lee earned 125 total points
ID: 12232850
I've modified the code to parse the HTML you supplied above.  The data you want from the table begins in the 5th row.  I set the code to skip over the first contents of rows 1-4.  Obviously if the table layout changes, then the code would have to change to accomodate the difference.  Otherwise, it'd parse incorrectly.  The code also presumes that the table with the data is the first table on the page.  There can be other tables after it and the code will still work okay.  But if there are any tables before it, then the data won't be parsed out as expected.  I don't know what you want to do with the data, so the code below employs a pair of Debug.Print statements to display the parsed cells and space them out.  This is for demonstration only.  The Debug.Prints should both be removed or commented out from any production use of this code.

Cheers!


Private Sub Command1_Click()
    Dim objDocument As Object, _
        objElements As Object, _
        objItem As Object, _
        objTable As Object, _
        objRows As Object, _
        objRow As Object, _
        objCols As Object, _
        objCol As Object, _
        intRow As Integer
    Browser.Navigate2 "www.somesite.com"
    While Browser.readyState <> READYSTATE_COMPLETE
        DoEvents
    Wend
    Set objDocument = Browser.document
    Set objElements = objDocument.All
    For Each objItem In objElements
        If LCase(objItem.nodeName) = "table" Then
            Set objTable = objItem.childNodes.Item
            Set objRows = objTable.childNodes
            For Each objRow In objRows
                intRow = intRow + 1
                Set objCols = objRow.children
                For Each objCol In objCols
                    If intRow >= 5 Then
                        'Insert commands here to do something with the data in each cell
                        'The Debug.Print below is for demonstration purposes only.
                        Debug.Print objCol.innerText
                    End If
                Next
                'The Debug.Print below is used to space out the output for demo purposes.
                'Remove it from the production version.
                Debug.Print ""
            Next
            Exit For
        End If
    Next
    End
End Sub
0
 

Author Comment

by:chocs666
ID: 12252372
That worked perfectly! I was trying to read in the page as text and parse it using left() and right() functions but that was going to take me forever. Cheers again.
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Introduction I needed to skip over some file processing within a For...Next loop in some old production code and wished that VB (classic) had a statement that would drop down to the end of the current iteration, bypassing the statements that were c…
When trying to find the cause of a problem in VBA or VB6 it's often valuable to know what procedures were executed prior to the error. You can use the Call Stack for that but it is often inadequate because it may show procedures you aren't intereste…
Get people started with the process of using Access VBA to control Outlook using automation, Microsoft Access can control other applications. An example is the ability to programmatically talk to Microsoft Outlook. Using automation, an Access applic…
Get people started with the process of using Access VBA to control Excel using automation, Microsoft Access can control other applications. An example is the ability to programmatically talk to Excel. Using automation, an Access application can laun…

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now