Solved

Pulling data from webpage and placing in multiple columns and loop procedure

Posted on 2014-02-13
15
486 Views
Last Modified: 2014-02-16
Excel 2010 vba:
Pulling data in from website and placing on sheet.

What I have:
Code that does a single pass on the html data and places it on the worksheet.

I have this url string in ColumnA on the worksheet.
http://www.grainger.com/search?searchQuery=CLAW HAMMER 16

Once the website is reached and results returned on the webpage.
The code will mak a single pass and place the data on the sheet. ColumnB


What I need:
I have in an array several metatags that i need placed on the sheet starting at column B to Column D


Problems:
I may have several results to return to the sheet.

So i need to loop through the html page and grab all necessary data.
and loop through the tags and place them on the sheet.


Example:
Right now with a sinple pass the result is returning.


Single Pass
Column A                                                                                                     Column B
http://www.grainger.com/search?searchQuery=CLAW HAMMER 16               4RY57


Make a Pass through ALL the html data:
Example:
Column

  B             C          D                                  E
6R252 STANLEY 51-616     Claw Hammer, 16 Oz, Polished, Hickory
6XV65 STANLEY 51-942     Rip Claw Hammer, 16 Oz, Smooth, Steel
 

etc.....

Option Explicit

Sub TestScreenScraping()
  Dim webpage As String
  Dim i As Long, j As Long
  Dim sFind As String, SKU As String
  Dim cel As Range, rg As Range
  Dim arr
  Dim TAG As String
  
  arr = Array("data-sku=""", "class=""productBrand"">", "class=""productMFR""> | Mfr. Model # <span class=""productInfoValueList"">", "class=""productLink"" title=""")

  
  With Worksheets("Sheet1")
    Set rg = .Range("A1")       'First URL to check
    If rg.Offset(1, 0) <> "" Then Set rg = Range(rg, rg.End(xlDown))    'All the URLs in that column
  End With
     
    
 ' =======================    MULTI PAGE RESULT   ===============================
  
    
       ' FINDS JUST A SKU FROM A MULTIPAGE RESULT
    sFind = "data-sku="""
   ' FINDS MFGNAME FROM A MULTIPAGE RESULT WITH GARBAGE
' sFind = "class=""productBrand"">"
 ' FINDS MFGNUMBER FROM A MULTIPAGE RESULT WITH GARBAGE
' sFind = "class=""productMFR""> | Mfr. Model # <span class=""productInfoValueList"">"
    ' FINDS DESCRIPTION FROM A MULTIPAGE RESULT
'     sFind = "class=""productLink"" title="""
    
    ' FINDS pagenumber FROM A MULTIPAGE RESULT
  '  sFind = "class=""productCatalog""> Catalog Page #"
    
    
    
    
 ' =======================    MULTI PAGE RESULT   ===============================
  For Each cel In rg.Cells
    If cel.Value <> "" Then
      webpage = ""
      i = 0
      j = 0
      SKU = ""
      webpage = GetWebpage(cel.Value)
      
      i = InStr(1, webpage, sFind)
      If i > 0 Then
        i = i + Len(sFind)
        j = InStr(i, webpage, """")
        If j > 0 Then
          SKU = Mid$(webpage, i, j - i)
        End If
      End If
    End If
    cel.Offset(0, 1).Value = SKU
  Next
End Sub
Function GetWebpage(url As String, Optional fileName As String) As String
 
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
 
  Set xml = GetMSXML
 
  ' grab webpage
  With xml
    .Open "GET", url, False
    .send
  End With
 
  GetWebpage = xml.responseText
 
  ' write to file?
  If Len(fileName) > 0 Then
    If Not FileExists(fileName) Then
      Call CreateFile(fileName, GetWebpage)
    Else ' file exists
      If MsgBox("File already exists, overwrite?", vbYesNo) = vbYes Then
        Call CreateFile(fileName, GetWebpage)
      End If
    End If
  End If
 
End Function

Function FileExists(fileName As String) As Boolean
  FileExists = (Len(Dir(fileName)) > 0)
End Function

Function GetMSXML() As Object  '  MSXML2.XMLHTTP60
 On Error Resume Next
  Set GetMSXML = CreateObject("MSXML2.XMLHTTP.6.0")
End Function

Function CreateFile(fileName As String, contents As String) As String
' creates file from string contents
 
Dim tempFile As String
Dim nextFileNum As Long
 
  nextFileNum = FreeFile
  tempFile = fileName
 
  Open tempFile For Output As #nextFileNum
  Print #nextFileNum, contents
  Close #nextFileNum
 
  CreateFile = tempFile
End Function

Open in new window


Thanks
fordraiders
0
Comment
Question by:fordraiders
  • 8
  • 7
15 Comments
 
LVL 45

Accepted Solution

by:
aikimark earned 500 total points
ID: 39859219
1. You can improve your performance if you add "&perPage=100" to your URL string.  This will minimize the number of web pages that must be iterated. (with the &requestedPage=# part of the URL)

2. Your parsing code will be much simpler if you use regular expressions.
dim oRE
dim oMatches
dim oMatch
Dim Description , Mfg , GrangerID , MfgID
set oRE = CreateObject("vbscript.regexp")
oRE.Global = True
oRE.Pattern = "<a href.*>(.*?)</a></p><p class="productBrand">(.*?)</p>.*?<span class="productInfoValueList">(.*?)</span>.*?<span class="productInfoValueList">(.*?)</span>"
set oMatches = oRE.Execute(GetWebpage)
For Each oMatch In oMatches
    Description = Trim(oMatch.Submatches(0))
    Mfg = Trim(oMatch.Submatches(1))
    GrangerID = Trim(oMatch.Submatches(2))
    MfgID = Trim(oMatch.Submatches(3))
    'move the data to your workbook
Next

Open in new window

3. If you transfer the data from the submatches collection into an array, you will be able to push the data into your worksheet with one statement.  See my "Fast Data Push to Excel" article: http://www.experts-exchange.com/A_2253.html
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39859757
aikimark,
Should this be passing the url as a variable to the routine ?

set oMatches = oRE.Execute(GetWebpage) ?
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39859768
set oMatches = oRE.Execute(GetWebpage)  getting error "Argument Not Optional"
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39859925
ok, Nevermind.
I just did this...

Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
 
  Set xml = GetMSXML
 
  ' grab webpage
  With xml
    .Open "GET", url, False
    .send
  End With
 
  str = xml.responseText

Set oMatches = oRE.Execute(str)

Now getting it into a workbook or listbox.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39859940
I assume that GetWebpage will actually be GetWebpage(cel.Value) if executed outside of the parsing routine.
set oMatches = oRE.Execute(GetWebpage(cel.Value))

Open in new window


If done inside the GetWebPage() function, the statement will probably look like this:
set oMatches = oRE.Execute(xml.responseText)

Open in new window

0
 
LVL 3

Author Comment

by:fordraiders
ID: 39860084
ok...
Thanks


here is thew completed project...
adding items to listbox.
Function testEERegex()
Dim oRE
Dim oMatches
Dim oMatch
Dim Description, Mfg, GrangerID, MfgID
Dim str As String
Dim url
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
Dim x As Integer





url = "http://www.grainger.com/search?searchQuery=RIP CLAW HAMMER 16"

Set oRE = CreateObject("vbscript.regexp")
oRE.Global = True
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>"

 
  Set xml = GetMSXML
 
  ' grab webpage
  With xml
    .Open "GET", url, False
    .send
  End With
 
  str = xml.responseText

Set oMatches = oRE.Execute(str)

Sheet1.ListBox1.Clear

x = 0
For Each oMatch In oMatches
    
    Description = Trim(oMatch.Submatches(0))
    Mfg = Trim(oMatch.Submatches(1))
    GrangerID = Trim(oMatch.Submatches(2))
    MfgID = Trim(oMatch.Submatches(3))
    
    ' now add the stuff to the listbox
    With Sheet1.ListBox1
    .ColumnCount = 4
    .ColumnWidths = "100;60;60;60"
    .AddItem
    .List(x, 0) = Description
    .List(x, 1) = Mfg
    .List(x, 2) = GrangerID
    .List(x, 3) = MfgID
    x = x + 1
End With

Open in new window

0
 
LVL 3

Author Closing Comment

by:fordraiders
ID: 39860086
Beautiful ,


AND thanks very much !!
0
Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

 
LVL 45

Expert Comment

by:aikimark
ID: 39860108
You might also populate some range in a (probably hidden) worksheet and use the listbox .Rowsource property to connect the listbox to the data.  Consider this change if you (your users) experience poor performance.
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39860120
Yes, Thanks
its fairly quick.
But again...Thanks very much..
may have more questions later.

by the way:
did u piece this together or pull from the source code in the page ?

oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>"
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39860131
Since you are populating the columns in the same order as the regexp submatches, you might also eliminate the use of the variables.
Dim lngCol as Long    'new variable

With Sheet1.ListBox1
    .ColumnCount = 4
    .ColumnWidths = "100;60;60;60"
End With

For Each oMatch In oMatches
    ' now add the stuff to the listbox
    With Sheet1.ListBox1
       .AddItem
       For lngCol=0 to 3
          .List(x, lngCol) = Trim(oMatch.Submatches(lngCol))
       Next
       x = x + 1
    End With
Next

Open in new window

0
 
LVL 45

Expert Comment

by:aikimark
ID: 39860165
did u piece this together or pull from the source code in the page ?
I looked at the web page source (view source).  I copied a couple of the result sections to a tool that allows me to do regexp parsing with the vbscript.regexp object.

My first attempt just used a pattern that involved the productInfoValueList text.  When it returned two matches per item, it actually made my job easier.  I repeated the pattern and added capture groups for the other two items.

I tested with a pattern that began with ">(.*?)</a>".  It didn't parse correctly, so I it to begin with "<a href.*>(.*?)</a>".  I then wrote the vb code.  I used variables to make the code easier to understand.

===========
Don't forget about adding &perPage=100 to your URL string.
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39862120
aikimark, Trying to add another tag.

<p class="productPrice"><span class="priceLabel">Price:</span> $39.10

oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>(.*?)</span>.*?<span class=""priceLabel"">Price:>(.*?)</span> "


this is  not working.

Thanks
fordraiders
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39862768
please open a new question and post a link to the new question in this thread.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39864020
@fordraiders

Have you opened a new question?  If so, where is the link to the new question?
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39864023
View your question here: http:Q_28366259.html
0

Featured Post

6 Surprising Benefits of Threat Intelligence

All sorts of threat intelligence is available on the web. Intelligence you can learn from, and use to anticipate and prepare for future attacks.

Join & Write a Comment

Today, still in the boom of Apple, PC's and products, nearly 50% of the computer users use Windows as graphical operating systems. If you are among those users who love windows, but are grappling to keep the system's hard drive optimized, then you s…
This article explains how to prepare an HTML email signature template file containing dynamic placeholders for users' Azure AD data. Furthermore, it explains how to use this file to remotely set up a department-wide email signature policy in Office …
This Micro Tutorial demonstrates in Microsoft Excel how to consolidate your marketing data by creating an interactive charts using form controls. This creates cool drop-downs for viewers of your chart to choose from.
This Micro Tutorial will demonstrate how to use a scrolling table in Microsoft Excel using the INDEX function.

743 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now