?
Solved

Extract Data from a website

Posted on 2013-11-24
12
Medium Priority
?
332 Views
Last Modified: 2013-12-05
Excel vba 2010.


Thanks to byundt,

I have code directing me to a website.
Function GetIE() As Object
Dim IE As Object, oShApp As Object, oWin As Object
Set oShApp = CreateObject("Shell.Application")
For Each oWin In oShApp.Windows
    If TypeName(oWin.Document) = "HTMLDocument" Then
        Set IE = oWin
        Set oShApp = Nothing
        Set oWin = Nothing
        Exit For
    End If
Next
If IE Is Nothing Then
    Set IE = CreateObject("InternetExplorer.Application")
End If
Set GetIE = IE
End Function

Sub test()
Dim IE As Object
Dim t As String
Set IE = GetIE()
t = "11509957"
IE.Navigate ("http://www.fastenal.com/web/products/detail.ex?sku=" & t & " ")
End Sub 

Open in new window


Once i'am at a website:
I need to know if there is way to extract data from a webpage...if i know what html code to look for..

in this case: but not the website above..
mboxCreate("hybris_idp", "communityName=Anonymous", "SKU=4YP37", "db_audience="

The vaue i'm wanting is the value after the

"SKU=<alphanumeric number>",


say to a textbox 2 on my userform...

Thanks
fordraiders
0
Comment
Question by:fordraiders
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 7
  • 5
12 Comments
 
LVL 81

Expert Comment

by:byundt
ID: 39673621
Could you please post an URL for a web page that has the desired information?
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39673772
0
 
LVL 81

Expert Comment

by:byundt
ID: 39673789
When I used the View...Source menu item on that web  page in Internet Explorer, I found the following text on statement 506:
mboxCreate("hybris_idp", "communityName=Anonymous", "SKU=4YP37", "db_audience=" + data.audience, "db_industry=" + data.industry);

Would you be wanting the code to return the following text?
"db_audience=" + data.audience, "db_industry=" + data.industry);
0
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

 
LVL 3

Author Comment

by:fordraiders
ID: 39674640
bynudt, No just the "4YP37"
Thanks
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39695583
byundt, when u get a chance...any more suggestions
0
 
LVL 81

Expert Comment

by:byundt
ID: 39695758
I'm not understanding your overall problem.

I asked you for a link that contained the desired information. You gave me:
http://www.grainger.com/search?searchQuery=4YP37 

I then asked what information you wanted to get from that web page. You responded "just the 4YP37"

Since the desired information is part of the link, I am definitely not understanding what you want.
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39695914
ok...
i'll rephrase my repsonse.
Thanks
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39696737
byundt, If a person goes to this website
http://www.grainger.com
and then say finds a product  and click on it to view details...such as "4YP37"

Then i need to pull that sku # from the webpage.

does this help..
Thanks
fordraiders
0
 
LVL 81

Accepted Solution

by:
byundt earned 2000 total points
ID: 39697239
I still don't understand your workflow, but assume that you can modify the following sub to suit your purpose. As written, it searches the webpages returned by a set of URLs starting in cell A1, and puts the SKU in the adjacent cells. Each webpage is returned as a very long string. The sub then searches that string for the text you specified: mboxCreate("hybris_idp", "communityName=Anonymous", "SKU=        If found, the sub returns the SKU that follows that text.
Sub TestScreenScraping()
  Dim webpage As String
  Dim i As Long, j As Long
  Dim sFind As String, SKU As String
  Dim cel As Range, rg As Range
  With Worksheets("Sheet1")
    Set rg = .Range("A1")       'First URL to check
    If rg.Offset(1, 0) <> "" Then Set rg = Range(rg, rg.End(xlDown))    'All the URLs in that column
  End With
  sFind = "mboxCreate(""hybris_idp"", ""communityName=Anonymous"", ""SKU="
    
  For Each cel In rg.Cells
    If cel.Value <> "" Then
      webpage = ""
      i = 0
      j = 0
      SKU = ""
      webpage = GetWebpage(cel.Value)
      i = InStr(1, webpage, sFind)
      If i > 0 Then
        i = i + Len(sFind)
        j = InStr(i, webpage, """")
        If j > 0 Then
          SKU = Mid$(webpage, i, j - i)
        End If
      End If
    End If
    cel.Offset(0, 1).Value = SKU
  Next
End Sub

Open in new window

The code uses a number of functions written by Jimmy Pena, and posted on his website "Screen Scraping 101 with VBA" http://www.jpsoftwaretech.com/screen-scraping-101-with-vba/. Supplemental code is at his page "MSXML Object Library Routines" http://www.jpsoftwaretech.com/vba/msxml-object-library-routines/#classmod
'Screen scraping functions written by Jimmy Pena _
    http://www.jpsoftwaretech.com/screen-scraping-101-with-vba/ _
    http://www.jpsoftwaretech.com/vba/msxml-object-library-routines/#classmod
Function GetIE() As Object
Dim IE As Object, oShApp As Object, oWin As Object
Set oShApp = CreateObject("Shell.Application")
For Each oWin In oShApp.Windows
    If TypeName(oWin.Document) = "HTMLDocument" Then
        Set IE = oWin
        Set oShApp = Nothing
        Set oWin = Nothing
        Exit For
    End If
Next
If IE Is Nothing Then
    Set IE = CreateObject("InternetExplorer.Application")
End If
Set GetIE = IE
End Function

Function GetWebpage(url As String, Optional fileName As String) As String
 
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
 
  Set xml = GetMSXML
 
  ' grab webpage
  With xml
    .Open "GET", url, False
    .send
  End With
 
  GetWebpage = xml.responseText
 
  ' write to file?
  If Len(fileName) > 0 Then
    If Not FileExists(fileName) Then
      Call CreateFile(fileName, GetWebpage)
    Else ' file exists
      If MsgBox("File already exists, overwrite?", vbYesNo) = vbYes Then
        Call CreateFile(fileName, GetWebpage)
      End If
    End If
  End If
 
End Function

Function FileExists(fileName As String) As Boolean
  FileExists = (Len(Dir(fileName)) > 0)
End Function

Function GetMSXML() As Object  '  MSXML2.XMLHTTP60
 On Error Resume Next
  Set GetMSXML = CreateObject("MSXML2.XMLHTTP.6.0")
End Function

Function CreateFile(fileName As String, contents As String) As String
' creates file from string contents
 
Dim tempFile As String
Dim nextFileNum As Long
 
  nextFileNum = FreeFile
  tempFile = fileName
 
  Open tempFile For Output As #nextFileNum
  Print #nextFileNum, contents
  Close #nextFileNum
 
  CreateFile = tempFile
End Function

Open in new window

ScreenscrapingQ28302714.xlsm
0
 
LVL 3

Author Closing Comment

by:fordraiders
ID: 39699622
Perfect, Will take it from here...
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39699995
byundt, I guess i expected  the opposite to happen.

No information on the Worksheet will predetermine where to go on this website.

A person goes to this site looking for product. Once they find a product.

They press the button on the sheet... the code will go to the current page a person has open on this website while in product view.

say at this particular moment they are looking at a product number "3UU59"

the code will look for this part of the html code:
mboxCreate("hybris_idp", "communityName=GIS", "SKU=3UU59");

and return me the  
3UU59 to the cell in A1
TO THE SHEET.
0
 
LVL 81

Expert Comment

by:byundt
ID: 39700071
I am stuck trying to figure out which tab is active if you have more than one in your Internet Explorer session. If there is only one, then you can capture the SKU using this sub (along with the previous code). If there are more than one tab, the code checks only the first one.
Sub GetSKUonFirstTab()
Dim IE As Object
Dim webpage As String
Dim i As Long, j As Long
Dim sFind As String, SKU As String, URL As String

sFind = "mboxCreate(""hybris_idp"", ""communityName=Anonymous"", ""SKU="
Set IE = GetIE()

URL = IE.LocationURL
webpage = ""
i = 0
j = 0
SKU = ""
webpage = GetWebpage(URL)
i = InStr(1, webpage, sFind)
If i > 0 Then
    i = i + Len(sFind)
    j = InStr(i, webpage, """")
    If j > 0 Then
        SKU = Mid$(webpage, i, j - i)
    End If
End If
End Sub

Open in new window

0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

A little background as to how I came to I design this code: Around 5 years ago I designed an add-in that formatted Excel files to a corporate standard, applying different cell colours and font type depending on whether the cells contained inputs,…
When you see single cell contains number and text, and you have to get any date out of it seems like cracking our heads.
This Micro Tutorial will demonstrate in Google Sheets how to use the HYPERLINK function to create live links inside your spreadsheet.
Excel styles will make formatting consistent and let you apply and change formatting faster. In this tutorial, you'll learn how to use Excel's built-in styles, how to modify styles, and how to create your own. You'll also learn how to use your custo…

650 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question