Link to home
Start Free TrialLog in
Avatar of Fordraiders
FordraidersFlag for United States of America

asked on

Pulling data from webpage and placing in multiple columns and loop procedure

Excel 2010 vba:
Pulling data in from website and placing on sheet.

What I have:
Code that does a single pass on the html data and places it on the worksheet.

I have this url string in ColumnA on the worksheet.
http://www.grainger.com/search?searchQuery=CLAW HAMMER 16

Once the website is reached and results returned on the webpage.
The code will mak a single pass and place the data on the sheet. ColumnB


What I need:
I have in an array several metatags that i need placed on the sheet starting at column B to Column D


Problems:
I may have several results to return to the sheet.

So i need to loop through the html page and grab all necessary data.
and loop through the tags and place them on the sheet.


Example:
Right now with a sinple pass the result is returning.


Single Pass
Column A                                                                                                     Column B
http://www.grainger.com/search?searchQuery=CLAW HAMMER 16               4RY57


Make a Pass through ALL the html data:
Example:
Column

  B             C          D                                  E
6R252 STANLEY 51-616     Claw Hammer, 16 Oz, Polished, Hickory
6XV65 STANLEY 51-942     Rip Claw Hammer, 16 Oz, Smooth, Steel
 

etc.....

Option Explicit

Sub TestScreenScraping()
  Dim webpage As String
  Dim i As Long, j As Long
  Dim sFind As String, SKU As String
  Dim cel As Range, rg As Range
  Dim arr
  Dim TAG As String
  
  arr = Array("data-sku=""", "class=""productBrand"">", "class=""productMFR""> | Mfr. Model # <span class=""productInfoValueList"">", "class=""productLink"" title=""")

  
  With Worksheets("Sheet1")
    Set rg = .Range("A1")       'First URL to check
    If rg.Offset(1, 0) <> "" Then Set rg = Range(rg, rg.End(xlDown))    'All the URLs in that column
  End With
     
    
 ' =======================    MULTI PAGE RESULT   ===============================
  
    
       ' FINDS JUST A SKU FROM A MULTIPAGE RESULT
    sFind = "data-sku="""
   ' FINDS MFGNAME FROM A MULTIPAGE RESULT WITH GARBAGE
' sFind = "class=""productBrand"">"
 ' FINDS MFGNUMBER FROM A MULTIPAGE RESULT WITH GARBAGE
' sFind = "class=""productMFR""> | Mfr. Model # <span class=""productInfoValueList"">"
    ' FINDS DESCRIPTION FROM A MULTIPAGE RESULT
'     sFind = "class=""productLink"" title="""
    
    ' FINDS pagenumber FROM A MULTIPAGE RESULT
  '  sFind = "class=""productCatalog""> Catalog Page #"
    
    
    
    
 ' =======================    MULTI PAGE RESULT   ===============================
  For Each cel In rg.Cells
    If cel.Value <> "" Then
      webpage = ""
      i = 0
      j = 0
      SKU = ""
      webpage = GetWebpage(cel.Value)
      
      i = InStr(1, webpage, sFind)
      If i > 0 Then
        i = i + Len(sFind)
        j = InStr(i, webpage, """")
        If j > 0 Then
          SKU = Mid$(webpage, i, j - i)
        End If
      End If
    End If
    cel.Offset(0, 1).Value = SKU
  Next
End Sub
Function GetWebpage(url As String, Optional fileName As String) As String
 
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
 
  Set xml = GetMSXML
 
  ' grab webpage
  With xml
    .Open "GET", url, False
    .send
  End With
 
  GetWebpage = xml.responseText
 
  ' write to file?
  If Len(fileName) > 0 Then
    If Not FileExists(fileName) Then
      Call CreateFile(fileName, GetWebpage)
    Else ' file exists
      If MsgBox("File already exists, overwrite?", vbYesNo) = vbYes Then
        Call CreateFile(fileName, GetWebpage)
      End If
    End If
  End If
 
End Function

Function FileExists(fileName As String) As Boolean
  FileExists = (Len(Dir(fileName)) > 0)
End Function

Function GetMSXML() As Object  '  MSXML2.XMLHTTP60
 On Error Resume Next
  Set GetMSXML = CreateObject("MSXML2.XMLHTTP.6.0")
End Function

Function CreateFile(fileName As String, contents As String) As String
' creates file from string contents
 
Dim tempFile As String
Dim nextFileNum As Long
 
  nextFileNum = FreeFile
  tempFile = fileName
 
  Open tempFile For Output As #nextFileNum
  Print #nextFileNum, contents
  Close #nextFileNum
 
  CreateFile = tempFile
End Function

Open in new window


Thanks
fordraiders
ASKER CERTIFIED SOLUTION
Avatar of aikimark
aikimark
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Fordraiders

ASKER

aikimark,
Should this be passing the url as a variable to the routine ?

set oMatches = oRE.Execute(GetWebpage) ?
set oMatches = oRE.Execute(GetWebpage)  getting error "Argument Not Optional"
ok, Nevermind.
I just did this...

Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
 
  Set xml = GetMSXML
 
  ' grab webpage
  With xml
    .Open "GET", url, False
    .send
  End With
 
  str = xml.responseText

Set oMatches = oRE.Execute(str)

Now getting it into a workbook or listbox.
I assume that GetWebpage will actually be GetWebpage(cel.Value) if executed outside of the parsing routine.
set oMatches = oRE.Execute(GetWebpage(cel.Value))

Open in new window


If done inside the GetWebPage() function, the statement will probably look like this:
set oMatches = oRE.Execute(xml.responseText)

Open in new window

ok...
Thanks


here is thew completed project...
adding items to listbox.
Function testEERegex()
Dim oRE
Dim oMatches
Dim oMatch
Dim Description, Mfg, GrangerID, MfgID
Dim str As String
Dim url
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
Dim x As Integer





url = "http://www.grainger.com/search?searchQuery=RIP CLAW HAMMER 16"

Set oRE = CreateObject("vbscript.regexp")
oRE.Global = True
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>"

 
  Set xml = GetMSXML
 
  ' grab webpage
  With xml
    .Open "GET", url, False
    .send
  End With
 
  str = xml.responseText

Set oMatches = oRE.Execute(str)

Sheet1.ListBox1.Clear

x = 0
For Each oMatch In oMatches
    
    Description = Trim(oMatch.Submatches(0))
    Mfg = Trim(oMatch.Submatches(1))
    GrangerID = Trim(oMatch.Submatches(2))
    MfgID = Trim(oMatch.Submatches(3))
    
    ' now add the stuff to the listbox
    With Sheet1.ListBox1
    .ColumnCount = 4
    .ColumnWidths = "100;60;60;60"
    .AddItem
    .List(x, 0) = Description
    .List(x, 1) = Mfg
    .List(x, 2) = GrangerID
    .List(x, 3) = MfgID
    x = x + 1
End With

Open in new window

Beautiful ,


AND thanks very much !!
You might also populate some range in a (probably hidden) worksheet and use the listbox .Rowsource property to connect the listbox to the data.  Consider this change if you (your users) experience poor performance.
Yes, Thanks
its fairly quick.
But again...Thanks very much..
may have more questions later.

by the way:
did u piece this together or pull from the source code in the page ?

oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>"
Since you are populating the columns in the same order as the regexp submatches, you might also eliminate the use of the variables.
Dim lngCol as Long    'new variable

With Sheet1.ListBox1
    .ColumnCount = 4
    .ColumnWidths = "100;60;60;60"
End With

For Each oMatch In oMatches
    ' now add the stuff to the listbox
    With Sheet1.ListBox1
       .AddItem
       For lngCol=0 to 3
          .List(x, lngCol) = Trim(oMatch.Submatches(lngCol))
       Next
       x = x + 1
    End With
Next

Open in new window

did u piece this together or pull from the source code in the page ?
I looked at the web page source (view source).  I copied a couple of the result sections to a tool that allows me to do regexp parsing with the vbscript.regexp object.

My first attempt just used a pattern that involved the productInfoValueList text.  When it returned two matches per item, it actually made my job easier.  I repeated the pattern and added capture groups for the other two items.

I tested with a pattern that began with ">(.*?)</a>".  It didn't parse correctly, so I it to begin with "<a href.*>(.*?)</a>".  I then wrote the vb code.  I used variables to make the code easier to understand.

===========
Don't forget about adding &perPage=100 to your URL string.
aikimark, Trying to add another tag.

<p class="productPrice"><span class="priceLabel">Price:</span> $39.10

oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>(.*?)</span>.*?<span class=""priceLabel"">Price:>(.*?)</span> "


this is  not working.

Thanks
fordraiders
please open a new question and post a link to the new question in this thread.
@fordraiders

Have you opened a new question?  If so, where is the link to the new question?
View your question here: http:Q_28366259.html