Solved

Extract Data from a website

Posted on 2013-11-24
12
294 Views
Last Modified: 2013-12-05
Excel vba 2010.


Thanks to byundt,

I have code directing me to a website.
Function GetIE() As Object
Dim IE As Object, oShApp As Object, oWin As Object
Set oShApp = CreateObject("Shell.Application")
For Each oWin In oShApp.Windows
    If TypeName(oWin.Document) = "HTMLDocument" Then
        Set IE = oWin
        Set oShApp = Nothing
        Set oWin = Nothing
        Exit For
    End If
Next
If IE Is Nothing Then
    Set IE = CreateObject("InternetExplorer.Application")
End If
Set GetIE = IE
End Function

Sub test()
Dim IE As Object
Dim t As String
Set IE = GetIE()
t = "11509957"
IE.Navigate ("http://www.fastenal.com/web/products/detail.ex?sku=" & t & " ")
End Sub 

Open in new window


Once i'am at a website:
I need to know if there is way to extract data from a webpage...if i know what html code to look for..

in this case: but not the website above..
mboxCreate("hybris_idp", "communityName=Anonymous", "SKU=4YP37", "db_audience="

The vaue i'm wanting is the value after the

"SKU=<alphanumeric number>",


say to a textbox 2 on my userform...

Thanks
fordraiders
0
Comment
Question by:fordraiders
  • 7
  • 5
12 Comments
 
LVL 80

Expert Comment

by:byundt
ID: 39673621
Could you please post an URL for a web page that has the desired information?
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39673772
0
 
LVL 80

Expert Comment

by:byundt
ID: 39673789
When I used the View...Source menu item on that web  page in Internet Explorer, I found the following text on statement 506:
mboxCreate("hybris_idp", "communityName=Anonymous", "SKU=4YP37", "db_audience=" + data.audience, "db_industry=" + data.industry);

Would you be wanting the code to return the following text?
"db_audience=" + data.audience, "db_industry=" + data.industry);
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39674640
bynudt, No just the "4YP37"
Thanks
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39695583
byundt, when u get a chance...any more suggestions
0
 
LVL 80

Expert Comment

by:byundt
ID: 39695758
I'm not understanding your overall problem.

I asked you for a link that contained the desired information. You gave me:
http://www.grainger.com/search?searchQuery=4YP37

I then asked what information you wanted to get from that web page. You responded "just the 4YP37"

Since the desired information is part of the link, I am definitely not understanding what you want.
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 3

Author Comment

by:fordraiders
ID: 39695914
ok...
i'll rephrase my repsonse.
Thanks
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39696737
byundt, If a person goes to this website
http://www.grainger.com
and then say finds a product  and click on it to view details...such as "4YP37"

Then i need to pull that sku # from the webpage.

does this help..
Thanks
fordraiders
0
 
LVL 80

Accepted Solution

by:
byundt earned 500 total points
ID: 39697239
I still don't understand your workflow, but assume that you can modify the following sub to suit your purpose. As written, it searches the webpages returned by a set of URLs starting in cell A1, and puts the SKU in the adjacent cells. Each webpage is returned as a very long string. The sub then searches that string for the text you specified: mboxCreate("hybris_idp", "communityName=Anonymous", "SKU=        If found, the sub returns the SKU that follows that text.
Sub TestScreenScraping()
  Dim webpage As String
  Dim i As Long, j As Long
  Dim sFind As String, SKU As String
  Dim cel As Range, rg As Range
  With Worksheets("Sheet1")
    Set rg = .Range("A1")       'First URL to check
    If rg.Offset(1, 0) <> "" Then Set rg = Range(rg, rg.End(xlDown))    'All the URLs in that column
  End With
  sFind = "mboxCreate(""hybris_idp"", ""communityName=Anonymous"", ""SKU="
    
  For Each cel In rg.Cells
    If cel.Value <> "" Then
      webpage = ""
      i = 0
      j = 0
      SKU = ""
      webpage = GetWebpage(cel.Value)
      i = InStr(1, webpage, sFind)
      If i > 0 Then
        i = i + Len(sFind)
        j = InStr(i, webpage, """")
        If j > 0 Then
          SKU = Mid$(webpage, i, j - i)
        End If
      End If
    End If
    cel.Offset(0, 1).Value = SKU
  Next
End Sub

Open in new window

The code uses a number of functions written by Jimmy Pena, and posted on his website "Screen Scraping 101 with VBA" http://www.jpsoftwaretech.com/screen-scraping-101-with-vba/. Supplemental code is at his page "MSXML Object Library Routines" http://www.jpsoftwaretech.com/vba/msxml-object-library-routines/#classmod
'Screen scraping functions written by Jimmy Pena _
    http://www.jpsoftwaretech.com/screen-scraping-101-with-vba/ _
    http://www.jpsoftwaretech.com/vba/msxml-object-library-routines/#classmod
Function GetIE() As Object
Dim IE As Object, oShApp As Object, oWin As Object
Set oShApp = CreateObject("Shell.Application")
For Each oWin In oShApp.Windows
    If TypeName(oWin.Document) = "HTMLDocument" Then
        Set IE = oWin
        Set oShApp = Nothing
        Set oWin = Nothing
        Exit For
    End If
Next
If IE Is Nothing Then
    Set IE = CreateObject("InternetExplorer.Application")
End If
Set GetIE = IE
End Function

Function GetWebpage(url As String, Optional fileName As String) As String
 
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
 
  Set xml = GetMSXML
 
  ' grab webpage
  With xml
    .Open "GET", url, False
    .send
  End With
 
  GetWebpage = xml.responseText
 
  ' write to file?
  If Len(fileName) > 0 Then
    If Not FileExists(fileName) Then
      Call CreateFile(fileName, GetWebpage)
    Else ' file exists
      If MsgBox("File already exists, overwrite?", vbYesNo) = vbYes Then
        Call CreateFile(fileName, GetWebpage)
      End If
    End If
  End If
 
End Function

Function FileExists(fileName As String) As Boolean
  FileExists = (Len(Dir(fileName)) > 0)
End Function

Function GetMSXML() As Object  '  MSXML2.XMLHTTP60
 On Error Resume Next
  Set GetMSXML = CreateObject("MSXML2.XMLHTTP.6.0")
End Function

Function CreateFile(fileName As String, contents As String) As String
' creates file from string contents
 
Dim tempFile As String
Dim nextFileNum As Long
 
  nextFileNum = FreeFile
  tempFile = fileName
 
  Open tempFile For Output As #nextFileNum
  Print #nextFileNum, contents
  Close #nextFileNum
 
  CreateFile = tempFile
End Function

Open in new window

ScreenscrapingQ28302714.xlsm
0
 
LVL 3

Author Closing Comment

by:fordraiders
ID: 39699622
Perfect, Will take it from here...
0
 
LVL 3

Author Comment

by:fordraiders
ID: 39699995
byundt, I guess i expected  the opposite to happen.

No information on the Worksheet will predetermine where to go on this website.

A person goes to this site looking for product. Once they find a product.

They press the button on the sheet... the code will go to the current page a person has open on this website while in product view.

say at this particular moment they are looking at a product number "3UU59"

the code will look for this part of the html code:
mboxCreate("hybris_idp", "communityName=GIS", "SKU=3UU59");

and return me the  
3UU59 to the cell in A1
TO THE SHEET.
0
 
LVL 80

Expert Comment

by:byundt
ID: 39700071
I am stuck trying to figure out which tab is active if you have more than one in your Internet Explorer session. If there is only one, then you can capture the SKU using this sub (along with the previous code). If there are more than one tab, the code checks only the first one.
Sub GetSKUonFirstTab()
Dim IE As Object
Dim webpage As String
Dim i As Long, j As Long
Dim sFind As String, SKU As String, URL As String

sFind = "mboxCreate(""hybris_idp"", ""communityName=Anonymous"", ""SKU="
Set IE = GetIE()

URL = IE.LocationURL
webpage = ""
i = 0
j = 0
SKU = ""
webpage = GetWebpage(URL)
i = InStr(1, webpage, sFind)
If i > 0 Then
    i = i + Len(sFind)
    j = InStr(i, webpage, """")
    If j > 0 Then
        SKU = Mid$(webpage, i, j - i)
    End If
End If
End Sub

Open in new window

0

Featured Post

How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

Join & Write a Comment

Drop Down List with Unique/Distinct Values (Part II - ComboBox or ListBox and Data Validation List Bonus!) David Miller (dlmille) Intro This article focuses on delivering unique, sorted lists to list objects (e.g., ComboBox, ListBox) and Dat…
Introduction This Article briefly covers methods of calculating the NPV and IRR variants in Excel as well as the limitations in calculating and interpreting IRR results. Paraphrasing Richard Shockley, author of my favourite finance reference tex…
The viewer will learn how to simulate a series of coin tosses with the rand() function and learn how to make these “tosses” depend on a predetermined probability. Flipping Coins in Excel: Enter =RAND() into cell A2: Recalculate the random variable…
The viewer will learn how to use a discrete random variable to simulate the return on an investment over a period of years, create a Monte Carlo simulation using the discrete random variable, and create a graph to represent the possible returns over…

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now