Solved

Best approaches for reading and parsing from a webpage

Posted on 2006-05-16
Last Modified: 2010-04-07
Hi-

I've never done this before and don't have any idea where to begin.  There are some websites that have information on them that I want to read and parse and populate a database with.  I am familiar with VB6.

What are the concepts / approaches I should explore given my VB6 bias?
Question by:SAbboushi
4 Comments
 
LVL 20

Accepted Solution

by:
hes earned 1000 total points
ID: 16691751
Set a reference to Microsoft Internet Controls (Project > References).


Option Explicit
Dim comp2 As Boolean
Dim sHtml As String
Dim WithEvents Web1 As InternetExplorer

Private Sub Form_Load()
  Set Web1 = New InternetExplorer
  Web1.Visible = True
  Web1.Navigate "http://www.experts-exchange.com/"

  comp2 = False
  ' wait until the page is fully loaded
  Do Until comp2 = True
    DoEvents
  Loop

  sHtml = Web1.Document.documentElement.innerHTML
'sHtml now contains the contents of the web page
' then just parse it to find what you want
End Sub
Private Sub Web1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
  comp2 = True
End Sub
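
As a rough illustration of the "just parse it" step, here is a minimal sketch that pulls out the text between two markers with InStr and Mid$. ExtractBetween is a hypothetical helper name, not part of any library, and it assumes the markers appear literally in sHtml. Real pages may need something more tolerant of whitespace and attributes.

' Hypothetical helper: returns the text between the first occurrence of
' sStart (searching from lFrom) and the next occurrence of sEnd.
' Returns "" if either marker is missing.
Public Function ExtractBetween(ByVal sText As String, ByVal sStart As String, _
  ByVal sEnd As String, Optional ByVal lFrom As Long = 1) As String
  Dim lBegin As Long, lFinish As Long
  lBegin = InStr(lFrom, sText, sStart, vbTextCompare)
  If lBegin = 0 Then Exit Function
  lBegin = lBegin + Len(sStart)   ' skip past the start marker
  lFinish = InStr(lBegin, sText, sEnd, vbTextCompare)
  If lFinish = 0 Then Exit Function
  ExtractBetween = Mid$(sText, lBegin, lFinish - lBegin)
End Function

For example, Debug.Print ExtractBetween("<title>Hello</title>", "<title>", "</title>") prints Hello. With the page above you would pass sHtml as the first argument.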
 
LVL 35

Assisted Solution

by:mvidas
mvidas earned 1000 total points
ID: 16692010
Just to give you a couple alternatives to using internet controls (though there really isn't anything wrong with it), I usually use msxml2 or the wininet api.  You could also use the IE object, though it is just about the same as using internet controls.


msxml:

Function GetWebPage(ByVal vWebSite As String) As String
 Dim oXMLHTTP As Object, vWebText As String, i As Long
 Set oXMLHTTP = CreateObject("msxml2.xmlhttp")
 oXMLHTTP.Open "GET", vWebSite, False
 oXMLHTTP.send
 If (oXMLHTTP.readyState = 4) And (oXMLHTTP.Status = 200) Then
  vWebText = oXMLHTTP.ResponseText
  vWebText = Replace(vWebText, "&quot;", Chr(34))
  vWebText = Replace(vWebText, "&lt;", Chr(60))
  vWebText = Replace(vWebText, "&gt;", Chr(62))
  vWebText = Replace(vWebText, "&amp;", Chr(38))
  vWebText = Replace(vWebText, "&nbsp;", Chr(32))
  For i = 1 To 255
   vWebText = Replace(vWebText, "&#" & i & ";", Chr(i))
  Next
 End If
 GetWebPage = vWebText
 Set oXMLHTTP = Nothing
End Function


API:

Private Const INTERNET_OPEN_TYPE_PRECONFIG = 0
Private Const INTERNET_OPEN_TYPE_DIRECT = 1
Private Const INTERNET_OPEN_TYPE_PROXY = 3
Private Const scUserAgent = "VB Project"
Private Const INTERNET_FLAG_RELOAD = &H80000000
Private Declare Function InternetOpen Lib "wininet.dll" Alias "InternetOpenA" (ByVal _
 sAgent As String, ByVal lAccessType As Long, ByVal sProxyName As String, ByVal _
 sProxyBypass As String, ByVal lFlags As Long) As Long
Private Declare Function InternetOpenUrl Lib "wininet.dll" Alias "InternetOpenUrlA" _
 (ByVal hOpen As Long, ByVal sUrl As String, ByVal sHeaders As String, ByVal lLength _
 As Long, ByVal lFlags As Long, ByVal lContext As Long) As Long
Private Declare Function InternetReadFile Lib "wininet.dll" (ByVal hFile As Long, ByVal _
 sBuffer As String, ByVal lNumBytesToRead As Long, lNumberOfBytesRead As Long) As Long
Private Declare Function InternetCloseHandle Lib "wininet.dll" (ByVal hInet As Long) _
 As Long
Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" ( _
 ByVal pCaller As Long, ByVal szURL As String, ByVal szFilename As String, ByVal _
 dwReserved As Long, ByVal lpfnCB As Long) As Long
Function OpenURL(ByVal sUrl As String) As String
 Dim hOpen As Long, hOpenUrl As Long, lNumberOfBytesRead As Long, i As Long
 Dim bDoLoop As Boolean, bRet As Boolean
 Dim sReadBuffer As String * 2048, sBuffer As String
 hOpen = InternetOpen(scUserAgent, INTERNET_OPEN_TYPE_PRECONFIG, vbNullString, _
  vbNullString, 0)
 hOpenUrl = InternetOpenUrl(hOpen, sUrl, vbNullString, 0, INTERNET_FLAG_RELOAD, 0)
 bDoLoop = True
 While bDoLoop
  sReadBuffer = vbNullString
  bRet = InternetReadFile(hOpenUrl, sReadBuffer, Len(sReadBuffer), lNumberOfBytesRead)
  sBuffer = sBuffer & Left$(sReadBuffer, lNumberOfBytesRead)
  If Not CBool(lNumberOfBytesRead) Then bDoLoop = False
 Wend
   
 If hOpenUrl <> 0 Then InternetCloseHandle (hOpenUrl)
 If hOpen <> 0 Then InternetCloseHandle (hOpen)
 
  sBuffer = Replace(sBuffer, "&quot;", Chr(34))
  sBuffer = Replace(sBuffer, "&lt;", Chr(60))
  sBuffer = Replace(sBuffer, "&gt;", Chr(62))
  sBuffer = Replace(sBuffer, "&amp;", Chr(38))
  sBuffer = Replace(sBuffer, "&nbsp;", Chr(32))
  For i = 1 To 255
   sBuffer = Replace(sBuffer, "&#" & i & ";", Chr(i))
  Next
 
 OpenURL = sBuffer

End Function


IE:

Function GetWebIE(ByVal vWebSite As String) As String
 Dim IE As Object
 Set IE = CreateObject("internetexplorer.application")
 IE.Navigate2 vWebSite
 Do While IE.readyState <> 4 'READYSTATE_COMPLETE
  DoEvents
 Loop
 GetWebIE = IE.Document.Body.InnerHTML 'could also be .InnerText
 IE.Quit 'close the hidden browser instance so it does not linger
 Set IE = Nothing
End Function


I would say to run some speed tests to see which works best for you and the site(s) you'll be parsing.
Matt
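
A minimal sketch of such a speed test, assuming the three functions above (GetWebPage, OpenURL, GetWebIE) sit in the same project. CompareFetchSpeed is a hypothetical name, and GetTickCount only gives millisecond-level resolution, so treat the numbers as rough. The Declare line has to go at the top of the module, before any procedures.

Private Declare Function GetTickCount Lib "kernel32" () As Long

' Hypothetical harness: fetches the same URL with each approach and
' prints the elapsed time and page length to the Immediate window.
Public Sub CompareFetchSpeed(ByVal sUrl As String)
  Dim lStart As Long, sPage As String

  lStart = GetTickCount
  sPage = GetWebPage(sUrl)
  Debug.Print "msxml:   "; GetTickCount - lStart; " ms, "; Len(sPage); " chars"

  lStart = GetTickCount
  sPage = OpenURL(sUrl)
  Debug.Print "wininet: "; GetTickCount - lStart; " ms, "; Len(sPage); " chars"

  lStart = GetTickCount
  sPage = GetWebIE(sUrl)
  Debug.Print "IE:      "; GetTickCount - lStart; " ms, "; Len(sPage); " chars"
End Sub

Caching and network variation can favor whichever call runs later, so it is probably worth running it several times and changing the order before drawing conclusions.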
 
LVL 17

Expert Comment

by:inthedark
ID: 16699729
You have two problems to solve: (a) how to get the data, and (b) how to decode the response.

You already have some examples for problem (a), but here is a simple way. Add the Microsoft Internet Transfer Control to your project (Project > Components) and place it on a form; the code below assumes the control is named Inet1.

Function GetWebPage(psURL As String) As String
On Error Resume Next
GetWebPage = Inet1.OpenURL(psURL)
End Function

For problem (b) I would decode the data as a byte array rather than with string functions. Comparing numeric byte values is usually much faster than repeated string operations, especially on large pages.

Your decode function needs to be a little clever, since tags can be nested.  Here is an extract from a class I created to decode XML files into a sort of recordset object.

To convert from a string to a byte array:

bytData = StrConv(psXML, vbFromUnicode)

Then just loop through the array looking for control characters such as <, > and &. Set them up as numeric values first:

mlLT = Asc("<")
mlGT = Asc(">")

Select Case bytData(lCount)
    Case mlLT
        ' handle less-than
    ' etc.
End Select
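
To make the byte-array idea concrete, here is a minimal sketch that copies only the bytes outside <...> tags. StripTags and its local variable names are hypothetical, not taken from the class mentioned above. It assumes the page text survives conversion to the ANSI code page, and it makes no attempt to handle ">" inside attribute values, comments or script blocks.

' Hypothetical sketch: returns the text outside <...> tags by scanning bytes.
Public Function StripTags(ByVal psHtml As String) As String
  Dim bytData() As Byte, bytOut() As Byte
  Dim lCount As Long, lOut As Long, bInTag As Boolean
  Dim lLT As Long, lGT As Long

  If Len(psHtml) = 0 Then Exit Function
  lLT = Asc("<")
  lGT = Asc(">")
  bytData = StrConv(psHtml, vbFromUnicode)   ' string -> ANSI bytes
  ReDim bytOut(0 To UBound(bytData))         ' output can never be longer

  For lCount = 0 To UBound(bytData)
    Select Case bytData(lCount)
      Case lLT
        bInTag = True                        ' entering a tag
      Case lGT
        bInTag = False                       ' leaving a tag
      Case Else
        If Not bInTag Then
          bytOut(lOut) = bytData(lCount)     ' keep text between tags
          lOut = lOut + 1
        End If
    End Select
  Next

  If lOut = 0 Then Exit Function
  ReDim Preserve bytOut(0 To lOut - 1)
  StripTags = StrConv(bytOut, vbUnicode)     ' ANSI bytes -> string
End Function

For example, StripTags("<b>Hi</b> there") returns "Hi there". A real decoder would track the tag names as well, so that nesting can be followed.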


 

Author Comment

by:SAbboushi
ID: 16725408
Hi folks - thanks for the posts.  I will review and get back to you-