Solved

Extract Data from a website

Posted on 2013-11-24
12
298 Views
Last Modified: 2013-12-05
Excel vba 2010.


Thanks to byundt,

I have code directing me to a website.
Function GetIE() As Object
Dim IE As Object, oShApp As Object, oWin As Object
Set oShApp = CreateObject("Shell.Application")
For Each oWin In oShApp.Windows
    If TypeName(oWin.Document) = "HTMLDocument" Then
        Set IE = oWin
        Set oShApp = Nothing
        Set oWin = Nothing
        Exit For
    End If
Next
If IE Is Nothing Then
    Set IE = CreateObject("InternetExplorer.Application")
End If
Set GetIE = IE
End Function

Sub test()
Dim IE As Object
Dim t As String
Set IE = GetIE()
t = "11509957"
IE.Navigate ("http://www.fastenal.com/web/products/detail.ex?sku=" & t & " ")
End Sub 

Open in new window


Once i'am at a website:
I need to know if there is way to extract data from a webpage...if i know what html code to look for..

in this case: but not the website above..
mboxCreate("hybris_idp", "communityName=Anonymous", "SKU=4YP37", "db_audience="

The vaue i'm wanting is the value after the

"SKU=<alphanumeric number>",


say to a textbox 2 on my userform...

Thanks
fordraiders
0
Comment
Question by:fordraiders
  • 7
  • 5
12 Comments
 
LVL 81

Expert Comment

by:byundt
ID: 39673621
Could you please post an URL for a web page that has the desired information?
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39673772
0
 
LVL 81

Expert Comment

by:byundt
ID: 39673789
When I used the View...Source menu item on that web  page in Internet Explorer, I found the following text on statement 506:
mboxCreate("hybris_idp", "communityName=Anonymous", "SKU=4YP37", "db_audience=" + data.audience, "db_industry=" + data.industry);

Would you be wanting the code to return the following text?
"db_audience=" + data.audience, "db_industry=" + data.industry);
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39674640
bynudt, No just the "4YP37"
Thanks
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39695583
byundt, when u get a chance...any more suggestions
0
 
LVL 81

Expert Comment

by:byundt
ID: 39695758
I'm not understanding your overall problem.

I asked you for a link that contained the desired information. You gave me:
http://www.grainger.com/search?searchQuery=4YP37 

I then asked what information you wanted to get from that web page. You responded "just the 4YP37"

Since the desired information is part of the link, I am definitely not understanding what you want.
0
Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

 
LVL 3

Author Comment

by:fordraiders
ID: 39695914
ok...
i'll rephrase my repsonse.
Thanks
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39696737
byundt, If a person goes to this website
http://www.grainger.com
and then say finds a product  and click on it to view details...such as "4YP37"

Then i need to pull that sku # from the webpage.

does this help..
Thanks
fordraiders
0
 
LVL 81

Accepted Solution

by:
byundt earned 500 total points
ID: 39697239
I still don't understand your workflow, but assume that you can modify the following sub to suit your purpose. As written, it searches the webpages returned by a set of URLs starting in cell A1, and puts the SKU in the adjacent cells. Each webpage is returned as a very long string. The sub then searches that string for the text you specified: mboxCreate("hybris_idp", "communityName=Anonymous", "SKU=        If found, the sub returns the SKU that follows that text.
Sub TestScreenScraping()
  Dim webpage As String
  Dim i As Long, j As Long
  Dim sFind As String, SKU As String
  Dim cel As Range, rg As Range
  With Worksheets("Sheet1")
    Set rg = .Range("A1")       'First URL to check
    If rg.Offset(1, 0) <> "" Then Set rg = Range(rg, rg.End(xlDown))    'All the URLs in that column
  End With
  sFind = "mboxCreate(""hybris_idp"", ""communityName=Anonymous"", ""SKU="
    
  For Each cel In rg.Cells
    If cel.Value <> "" Then
      webpage = ""
      i = 0
      j = 0
      SKU = ""
      webpage = GetWebpage(cel.Value)
      i = InStr(1, webpage, sFind)
      If i > 0 Then
        i = i + Len(sFind)
        j = InStr(i, webpage, """")
        If j > 0 Then
          SKU = Mid$(webpage, i, j - i)
        End If
      End If
    End If
    cel.Offset(0, 1).Value = SKU
  Next
End Sub

Open in new window

The code uses a number of functions written by Jimmy Pena, and posted on his website "Screen Scraping 101 with VBA" http://www.jpsoftwaretech.com/screen-scraping-101-with-vba/. Supplemental code is at his page "MSXML Object Library Routines" http://www.jpsoftwaretech.com/vba/msxml-object-library-routines/#classmod
'Screen scraping functions written by Jimmy Pena _
    http://www.jpsoftwaretech.com/screen-scraping-101-with-vba/ _
    http://www.jpsoftwaretech.com/vba/msxml-object-library-routines/#classmod
Function GetIE() As Object
Dim IE As Object, oShApp As Object, oWin As Object
Set oShApp = CreateObject("Shell.Application")
For Each oWin In oShApp.Windows
    If TypeName(oWin.Document) = "HTMLDocument" Then
        Set IE = oWin
        Set oShApp = Nothing
        Set oWin = Nothing
        Exit For
    End If
Next
If IE Is Nothing Then
    Set IE = CreateObject("InternetExplorer.Application")
End If
Set GetIE = IE
End Function

Function GetWebpage(url As String, Optional fileName As String) As String
 
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
 
  Set xml = GetMSXML
 
  ' grab webpage
  With xml
    .Open "GET", url, False
    .send
  End With
 
  GetWebpage = xml.responseText
 
  ' write to file?
  If Len(fileName) > 0 Then
    If Not FileExists(fileName) Then
      Call CreateFile(fileName, GetWebpage)
    Else ' file exists
      If MsgBox("File already exists, overwrite?", vbYesNo) = vbYes Then
        Call CreateFile(fileName, GetWebpage)
      End If
    End If
  End If
 
End Function

Function FileExists(fileName As String) As Boolean
  FileExists = (Len(Dir(fileName)) > 0)
End Function

Function GetMSXML() As Object  '  MSXML2.XMLHTTP60
 On Error Resume Next
  Set GetMSXML = CreateObject("MSXML2.XMLHTTP.6.0")
End Function

Function CreateFile(fileName As String, contents As String) As String
' creates file from string contents
 
Dim tempFile As String
Dim nextFileNum As Long
 
  nextFileNum = FreeFile
  tempFile = fileName
 
  Open tempFile For Output As #nextFileNum
  Print #nextFileNum, contents
  Close #nextFileNum
 
  CreateFile = tempFile
End Function

Open in new window

ScreenscrapingQ28302714.xlsm
0
 
LVL 3

Author Closing Comment

by:fordraiders
ID: 39699622
Perfect, Will take it from here...
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39699995
byundt, I guess i expected  the opposite to happen.

No information on the Worksheet will predetermine where to go on this website.

A person goes to this site looking for product. Once they find a product.

They press the button on the sheet... the code will go to the current page a person has open on this website while in product view.

say at this particular moment they are looking at a product number "3UU59"

the code will look for this part of the html code:
mboxCreate("hybris_idp", "communityName=GIS", "SKU=3UU59");

and return me the  
3UU59 to the cell in A1
TO THE SHEET.
0
 
LVL 81

Expert Comment

by:byundt
ID: 39700071
I am stuck trying to figure out which tab is active if you have more than one in your Internet Explorer session. If there is only one, then you can capture the SKU using this sub (along with the previous code). If there are more than one tab, the code checks only the first one.
Sub GetSKUonFirstTab()
Dim IE As Object
Dim webpage As String
Dim i As Long, j As Long
Dim sFind As String, SKU As String, URL As String

sFind = "mboxCreate(""hybris_idp"", ""communityName=Anonymous"", ""SKU="
Set IE = GetIE()

URL = IE.LocationURL
webpage = ""
i = 0
j = 0
SKU = ""
webpage = GetWebpage(URL)
i = InStr(1, webpage, sFind)
If i > 0 Then
    i = i + Len(sFind)
    j = InStr(i, webpage, """")
    If j > 0 Then
        SKU = Mid$(webpage, i, j - i)
    End If
End If
End Sub

Open in new window

0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Improved? Move/Copy Add-in Replacement - How to avoid the annoying, “A formula or sheet you want to move or copy contains the name XXX, which already exists on the destination worksheet.” David Miller (dlmille)  It was one of those days… I wa…
Freeze panes is an option within all variants of Excel to enable parts of a sheet to remain stationary when the cursor is in another part of the sheet. This is a very useful feature which is overlooked or under used.
The viewer will learn how to create a normally distributed random variable in Excel, use a normal distribution to simulate the return on an investment over a period of years, Create a Monte Carlo simulation using a normal random variable, and calcul…
This Micro Tutorial demonstrate the bugs in Microsoft Excel for Mac with Pivot Charts.

943 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now