Solved

adding data To regex expression for pulling web data

Posted on 2014-02-16
10
248 Views
Last Modified: 2014-02-17
What I need:
previous question:
http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/Q_28364510.html#a39862768


To Add additional information to regex expression.

the html tag...is.
<p class="productPrice"><span class="priceLabel">Price:</span> $39.10


from the same url as indicated below I need to obtain the price for each item in the row.

Answer to previous question:
Function testEERegex()
Dim oRE
Dim oMatches
Dim oMatch
Dim Description, Mfg, GrangerID, MfgID
Dim str As String
Dim url
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
Dim x As Integer





url = "http://www.grainger.com/search?searchQuery=RIP CLAW HAMMER 16"

Set oRE = CreateObject("vbscript.regexp")
oRE.Global = True
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>"

 
  Set xml = GetMSXML
 
  ' grab webpage
  With xml
    .Open "GET", url, False
    .send
  End With
 
  str = xml.responseText

Set oMatches = oRE.Execute(str)

Sheet1.ListBox1.Clear

x = 0
For Each oMatch In oMatches
    
    Description = Trim(oMatch.Submatches(0))
    Mfg = Trim(oMatch.Submatches(1))
    GrangerID = Trim(oMatch.Submatches(2))
    MfgID = Trim(oMatch.Submatches(3))
    
    ' now add the stuff to the listbox
    With Sheet1.ListBox1
    .ColumnCount = 4
    .ColumnWidths = "100;60;60;60"
    .AddItem
    .List(x, 0) = Description
    .List(x, 1) = Mfg
    .List(x, 2) = GrangerID
    .List(x, 3) = MfgID
    x = x + 1
End With

Open in new window



Make a Pass through ALL the html data:
Example:
Column

  B             C          D                                  E                                                                 F
6R252 STANLEY 51-616     Claw Hammer, 16 Oz, Polished, Hickory                    $36.10
6XV65 STANLEY 51-942     Rip Claw Hammer, 16 Oz, Smooth, Steel                    $54.23

Thanks
 fordraiders
0
Comment
Question by:fordraiders
  • 4
  • 3
  • 2
10 Comments
 
LVL 35

Expert Comment

by:Terry Woods
ID: 39863409
Try this adjustment to your code:

Function testEERegex()
Dim oRE
Dim oMatches
Dim oMatch
Dim Description, Mfg, GrangerID, MfgID, Price
Dim str As String
Dim url
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
Dim x As Integer





url = "http://www.grainger.com/search?searchQuery=RIP CLAW HAMMER 16"

Set oRE = CreateObject("vbscript.regexp")
oRE.Global = True
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class="priceLabel">Price:</span>[\s""]*(\$[\d,.]+)"
 
  Set xml = GetMSXML
 
  ' grab webpage
  With xml
    .Open "GET", url, False
    .send
  End With
 
  str = xml.responseText

Set oMatches = oRE.Execute(str)

Sheet1.ListBox1.Clear

x = 0
For Each oMatch In oMatches
    
    Description = Trim(oMatch.Submatches(0))
    Mfg = Trim(oMatch.Submatches(1))
    GrangerID = Trim(oMatch.Submatches(2))
    MfgID = Trim(oMatch.Submatches(3))
    Price = Trim(oMatch.Submatches(4))
    
    ' now add the stuff to the listbox
    With Sheet1.ListBox1
    .ColumnCount = 5
    .ColumnWidths = "100;60;60;60;60"
    .AddItem
    .List(x, 0) = Description
    .List(x, 1) = Mfg
    .List(x, 2) = GrangerID
    .List(x, 3) = MfgID
    .List(x, 4) = Price
    x = x + 1
End With

Open in new window

0
 
LVL 3

Author Comment

by:fordraiders
ID: 39863653
no luck terry,

this is my try andf no luck also.
and the end of the regex string.

.*?<p class=""productPrice""><span class=""priceLabel"">Price:(.*?)</span>
0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 39863710
Try this. The price is after the </span> tag rather than before, like your pattern uses. I've run this through a regex tester and it seems to work.
.*?<span class=""priceLabel"">Price:</span>\s*(\$[\d,.]+)

Open in new window

0
Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

 
LVL 3

Author Comment

by:fordraiders
ID: 39863998
terry, sorry still not returning for me..
0
 
LVL 45

Accepted Solution

by:
aikimark earned 500 total points
ID: 39864074
This is your regex pattern:
<a href.*>(.*?)</a></p><p class="productBrand">(.*?)</p>.*?<span class="productInfoValueList">(.*?)</span>.*?<span class="productInfoValueList">(.*?)</span>(?:.|\n)*?<span class="priceLabel">Price:</span>\s*(\$[0-9.]+)

Open in new window

So, your line 20 should read:
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>(?:.|\n)*?<span class=""priceLabel"">Price:</span>\s*(\$[0-9.]+)"

Open in new window

0
 
LVL 3

Author Comment

by:fordraiders
ID: 39864906
Worked great

aikimark, what program r u using to test the expression. If I can ask please?
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39865017
what program r u using...
A VBScript-powered HTA that I wrote.
I also used Notepad++ to look at the actual text in the HTML source.

In this particular case, I also played with different patterns at http://www.myregextester.com
0
 
LVL 3

Author Closing Comment

by:fordraiders
ID: 39865069
Thanks, Very much !!
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39865155
You're welcome.

You might want to try this pattern.  It runs faster than the one I posted earlier.
<a href=.*?>([^<]*)</a></p><p class="productBrand">([^<]*)</p>.*?<span class="productInfoValueList">([^<]*)</span>.*?<span class="productInfoValueList">([^<]*)</span>(?:.|\n)*?<span class="priceLabel">Price:</span>\s*(\$[0-9.]+)

Open in new window

0

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

A little background as to how I came to I design this code: Around 5 years ago I designed an add-in that formatted Excel files to a corporate standard, applying different cell colours and font type depending on whether the cells contained inputs,…
Workbook link problems after copying tabs to a new workbook? David Miller (dlmille) Intro Have you either copied sheets to a new workbook, and after having saved and opened that workbook, you find that there are links back to the original sou…
Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…
This Micro Tutorial demonstrates how to create Excel charts: column, area, line, bar, and scatter charts. Formatting tips are provided as well.

792 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question