Fordraiders
asked on
Pulling data from webpage and placing in multiple columns and loop procedure
Excel 2010 vba:
Pulling data in from website and placing on sheet.
What I have:
Code that does a single pass on the html data and places it on the worksheet.
I have this url string in ColumnA on the worksheet.
http://www.grainger.com/search?searchQuery=CLAW HAMMER 16
Once the website is reached and results returned on the webpage.
The code will mak a single pass and place the data on the sheet. ColumnB
What I need:
I have in an array several metatags that i need placed on the sheet starting at column B to Column D
Problems:
I may have several results to return to the sheet.
So i need to loop through the html page and grab all necessary data.
and loop through the tags and place them on the sheet.
Example:
Right now with a sinple pass the result is returning.
Single Pass
Column A Column B
http://www.grainger.com/search?searchQuery=CLAW HAMMER 16 4RY57
Make a Pass through ALL the html data:
Example:
Column
B C D E
6R252 STANLEY 51-616 Claw Hammer, 16 Oz, Polished, Hickory
6XV65 STANLEY 51-942 Rip Claw Hammer, 16 Oz, Smooth, Steel
etc.....
Thanks
fordraiders
Pulling data in from website and placing on sheet.
What I have:
Code that does a single pass on the html data and places it on the worksheet.
I have this url string in ColumnA on the worksheet.
http://www.grainger.com/search?searchQuery=CLAW HAMMER 16
Once the website is reached and results returned on the webpage.
The code will mak a single pass and place the data on the sheet. ColumnB
What I need:
I have in an array several metatags that i need placed on the sheet starting at column B to Column D
Problems:
I may have several results to return to the sheet.
So i need to loop through the html page and grab all necessary data.
and loop through the tags and place them on the sheet.
Example:
Right now with a sinple pass the result is returning.
Single Pass
Column A Column B
http://www.grainger.com/search?searchQuery=CLAW HAMMER 16 4RY57
Make a Pass through ALL the html data:
Example:
Column
B C D E
6R252 STANLEY 51-616 Claw Hammer, 16 Oz, Polished, Hickory
6XV65 STANLEY 51-942 Rip Claw Hammer, 16 Oz, Smooth, Steel
etc.....
Option Explicit
Sub TestScreenScraping()
Dim webpage As String
Dim i As Long, j As Long
Dim sFind As String, SKU As String
Dim cel As Range, rg As Range
Dim arr
Dim TAG As String
arr = Array("data-sku=""", "class=""productBrand"">", "class=""productMFR""> | Mfr. Model # <span class=""productInfoValueList"">", "class=""productLink"" title=""")
With Worksheets("Sheet1")
Set rg = .Range("A1") 'First URL to check
If rg.Offset(1, 0) <> "" Then Set rg = Range(rg, rg.End(xlDown)) 'All the URLs in that column
End With
' ======================= MULTI PAGE RESULT ===============================
' FINDS JUST A SKU FROM A MULTIPAGE RESULT
sFind = "data-sku="""
' FINDS MFGNAME FROM A MULTIPAGE RESULT WITH GARBAGE
' sFind = "class=""productBrand"">"
' FINDS MFGNUMBER FROM A MULTIPAGE RESULT WITH GARBAGE
' sFind = "class=""productMFR""> | Mfr. Model # <span class=""productInfoValueList"">"
' FINDS DESCRIPTION FROM A MULTIPAGE RESULT
' sFind = "class=""productLink"" title="""
' FINDS pagenumber FROM A MULTIPAGE RESULT
' sFind = "class=""productCatalog""> Catalog Page #"
' ======================= MULTI PAGE RESULT ===============================
For Each cel In rg.Cells
If cel.Value <> "" Then
webpage = ""
i = 0
j = 0
SKU = ""
webpage = GetWebpage(cel.Value)
i = InStr(1, webpage, sFind)
If i > 0 Then
i = i + Len(sFind)
j = InStr(i, webpage, """")
If j > 0 Then
SKU = Mid$(webpage, i, j - i)
End If
End If
End If
cel.Offset(0, 1).Value = SKU
Next
End Sub
Function GetWebpage(url As String, Optional fileName As String) As String
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
Set xml = GetMSXML
' grab webpage
With xml
.Open "GET", url, False
.send
End With
GetWebpage = xml.responseText
' write to file?
If Len(fileName) > 0 Then
If Not FileExists(fileName) Then
Call CreateFile(fileName, GetWebpage)
Else ' file exists
If MsgBox("File already exists, overwrite?", vbYesNo) = vbYes Then
Call CreateFile(fileName, GetWebpage)
End If
End If
End If
End Function
Function FileExists(fileName As String) As Boolean
FileExists = (Len(Dir(fileName)) > 0)
End Function
Function GetMSXML() As Object ' MSXML2.XMLHTTP60
On Error Resume Next
Set GetMSXML = CreateObject("MSXML2.XMLHTTP.6.0")
End Function
Function CreateFile(fileName As String, contents As String) As String
' creates file from string contents
Dim tempFile As String
Dim nextFileNum As Long
nextFileNum = FreeFile
tempFile = fileName
Open tempFile For Output As #nextFileNum
Print #nextFileNum, contents
Close #nextFileNum
CreateFile = tempFile
End Function
Thanks
fordraiders
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
set oMatches = oRE.Execute(GetWebpage) getting error "Argument Not Optional"
ASKER
ok, Nevermind.
I just did this...
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
Set xml = GetMSXML
' grab webpage
With xml
.Open "GET", url, False
.send
End With
str = xml.responseText
Set oMatches = oRE.Execute(str)
Now getting it into a workbook or listbox.
I just did this...
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
Set xml = GetMSXML
' grab webpage
With xml
.Open "GET", url, False
.send
End With
str = xml.responseText
Set oMatches = oRE.Execute(str)
Now getting it into a workbook or listbox.
I assume that GetWebpage will actually be GetWebpage(cel.Value) if executed outside of the parsing routine.
If done inside the GetWebPage() function, the statement will probably look like this:
set oMatches = oRE.Execute(GetWebpage(cel.Value))
If done inside the GetWebPage() function, the statement will probably look like this:
set oMatches = oRE.Execute(xml.responseText)
ASKER
ok...
Thanks
here is thew completed project...
adding items to listbox.
Thanks
here is thew completed project...
adding items to listbox.
Function testEERegex()
Dim oRE
Dim oMatches
Dim oMatch
Dim Description, Mfg, GrangerID, MfgID
Dim str As String
Dim url
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
Dim x As Integer
url = "http://www.grainger.com/search?searchQuery=RIP CLAW HAMMER 16"
Set oRE = CreateObject("vbscript.regexp")
oRE.Global = True
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>"
Set xml = GetMSXML
' grab webpage
With xml
.Open "GET", url, False
.send
End With
str = xml.responseText
Set oMatches = oRE.Execute(str)
Sheet1.ListBox1.Clear
x = 0
For Each oMatch In oMatches
Description = Trim(oMatch.Submatches(0))
Mfg = Trim(oMatch.Submatches(1))
GrangerID = Trim(oMatch.Submatches(2))
MfgID = Trim(oMatch.Submatches(3))
' now add the stuff to the listbox
With Sheet1.ListBox1
.ColumnCount = 4
.ColumnWidths = "100;60;60;60"
.AddItem
.List(x, 0) = Description
.List(x, 1) = Mfg
.List(x, 2) = GrangerID
.List(x, 3) = MfgID
x = x + 1
End With
ASKER
Beautiful ,
AND thanks very much !!
AND thanks very much !!
You might also populate some range in a (probably hidden) worksheet and use the listbox .Rowsource property to connect the listbox to the data. Consider this change if you (your users) experience poor performance.
ASKER
Yes, Thanks
its fairly quick.
But again...Thanks very much..
may have more questions later.
by the way:
did u piece this together or pull from the source code in the page ?
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.* ?)</p>.*?< span class=""productInfoValueLi st"">(.*?) </span>.*? <span class=""productInfoValueLi st"">(.*?) </span>"
its fairly quick.
But again...Thanks very much..
may have more questions later.
by the way:
did u piece this together or pull from the source code in the page ?
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*
Since you are populating the columns in the same order as the regexp submatches, you might also eliminate the use of the variables.
Dim lngCol as Long 'new variable
With Sheet1.ListBox1
.ColumnCount = 4
.ColumnWidths = "100;60;60;60"
End With
For Each oMatch In oMatches
' now add the stuff to the listbox
With Sheet1.ListBox1
.AddItem
For lngCol=0 to 3
.List(x, lngCol) = Trim(oMatch.Submatches(lngCol))
Next
x = x + 1
End With
Next
did u piece this together or pull from the source code in the page ?I looked at the web page source (view source). I copied a couple of the result sections to a tool that allows me to do regexp parsing with the vbscript.regexp object.
My first attempt just used a pattern that involved the productInfoValueList text. When it returned two matches per item, it actually made my job easier. I repeated the pattern and added capture groups for the other two items.
I tested with a pattern that began with ">(.*?)</a>". It didn't parse correctly, so I it to begin with "<a href.*>(.*?)</a>". I then wrote the vb code. I used variables to make the code easier to understand.
===========
Don't forget about adding &perPage=100 to your URL string.
ASKER
aikimark, Trying to add another tag.
<p class="productPrice"><span class="priceLabel">Price:< /span> $39.10
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.* ?)</p>.*?< span class=""productInfoValueLi st"">(.*?) </span>.*? <span class=""productInfoValueLi st"">(.*?) </span>(.* ?)</span>. *?<span class=""priceLabel"">Price :>(.*?)</s pan> "
this is not working.
Thanks
fordraiders
<p class="productPrice"><span
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*
this is not working.
Thanks
fordraiders
please open a new question and post a link to the new question in this thread.
@fordraiders
Have you opened a new question? If so, where is the link to the new question?
Have you opened a new question? If so, where is the link to the new question?
ASKER
View your question here: http:Q_28366259.html
ASKER
Should this be passing the url as a variable to the routine ?
set oMatches = oRE.Execute(GetWebpage) ?