Solved

adding data To regex expression for pulling web data

Posted on 2014-02-16
10
251 Views
Last Modified: 2014-02-17
What I need:
previous question:
http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/Q_28364510.html#a39862768


To Add additional information to regex expression.

the html tag...is.
<p class="productPrice"><span class="priceLabel">Price:</span> $39.10


from the same url as indicated below I need to obtain the price for each item in the row.

Answer to previous question:
Function testEERegex()
Dim oRE
Dim oMatches
Dim oMatch
Dim Description, Mfg, GrangerID, MfgID
Dim str As String
Dim url
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
Dim x As Integer





url = "http://www.grainger.com/search?searchQuery=RIP CLAW HAMMER 16"

Set oRE = CreateObject("vbscript.regexp")
oRE.Global = True
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>"

 
  Set xml = GetMSXML
 
  ' grab webpage
  With xml
    .Open "GET", url, False
    .send
  End With
 
  str = xml.responseText

Set oMatches = oRE.Execute(str)

Sheet1.ListBox1.Clear

x = 0
For Each oMatch In oMatches
    
    Description = Trim(oMatch.Submatches(0))
    Mfg = Trim(oMatch.Submatches(1))
    GrangerID = Trim(oMatch.Submatches(2))
    MfgID = Trim(oMatch.Submatches(3))
    
    ' now add the stuff to the listbox
    With Sheet1.ListBox1
    .ColumnCount = 4
    .ColumnWidths = "100;60;60;60"
    .AddItem
    .List(x, 0) = Description
    .List(x, 1) = Mfg
    .List(x, 2) = GrangerID
    .List(x, 3) = MfgID
    x = x + 1
End With

Open in new window



Make a Pass through ALL the html data:
Example:
Column

  B             C          D                                  E                                                                 F
6R252 STANLEY 51-616     Claw Hammer, 16 Oz, Polished, Hickory                    $36.10
6XV65 STANLEY 51-942     Rip Claw Hammer, 16 Oz, Smooth, Steel                    $54.23

Thanks
 fordraiders
0
Comment
Question by:fordraiders
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
  • 2
10 Comments
 
LVL 35

Expert Comment

by:Terry Woods
ID: 39863409
Try this adjustment to your code:

Function testEERegex()
Dim oRE
Dim oMatches
Dim oMatch
Dim Description, Mfg, GrangerID, MfgID, Price
Dim str As String
Dim url
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
Dim x As Integer





url = "http://www.grainger.com/search?searchQuery=RIP CLAW HAMMER 16"

Set oRE = CreateObject("vbscript.regexp")
oRE.Global = True
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class="priceLabel">Price:</span>[\s""]*(\$[\d,.]+)"
 
  Set xml = GetMSXML
 
  ' grab webpage
  With xml
    .Open "GET", url, False
    .send
  End With
 
  str = xml.responseText

Set oMatches = oRE.Execute(str)

Sheet1.ListBox1.Clear

x = 0
For Each oMatch In oMatches
    
    Description = Trim(oMatch.Submatches(0))
    Mfg = Trim(oMatch.Submatches(1))
    GrangerID = Trim(oMatch.Submatches(2))
    MfgID = Trim(oMatch.Submatches(3))
    Price = Trim(oMatch.Submatches(4))
    
    ' now add the stuff to the listbox
    With Sheet1.ListBox1
    .ColumnCount = 5
    .ColumnWidths = "100;60;60;60;60"
    .AddItem
    .List(x, 0) = Description
    .List(x, 1) = Mfg
    .List(x, 2) = GrangerID
    .List(x, 3) = MfgID
    .List(x, 4) = Price
    x = x + 1
End With

Open in new window

0
 
LVL 3

Author Comment

by:fordraiders
ID: 39863653
no luck terry,

this is my try andf no luck also.
and the end of the regex string.

.*?<p class=""productPrice""><span class=""priceLabel"">Price:(.*?)</span>
0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 39863710
Try this. The price is after the </span> tag rather than before, like your pattern uses. I've run this through a regex tester and it seems to work.
.*?<span class=""priceLabel"">Price:</span>\s*(\$[\d,.]+)

Open in new window

0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 3

Author Comment

by:fordraiders
ID: 39863998
terry, sorry still not returning for me..
0
 
LVL 45

Accepted Solution

by:
aikimark earned 500 total points
ID: 39864074
This is your regex pattern:
<a href.*>(.*?)</a></p><p class="productBrand">(.*?)</p>.*?<span class="productInfoValueList">(.*?)</span>.*?<span class="productInfoValueList">(.*?)</span>(?:.|\n)*?<span class="priceLabel">Price:</span>\s*(\$[0-9.]+)

Open in new window

So, your line 20 should read:
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>(?:.|\n)*?<span class=""priceLabel"">Price:</span>\s*(\$[0-9.]+)"

Open in new window

0
 
LVL 3

Author Comment

by:fordraiders
ID: 39864906
Worked great

aikimark, what program r u using to test the expression. If I can ask please?
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39865017
what program r u using...
A VBScript-powered HTA that I wrote.
I also used Notepad++ to look at the actual text in the HTML source.

In this particular case, I also played with different patterns at http://www.myregextester.com
0
 
LVL 3

Author Closing Comment

by:fordraiders
ID: 39865069
Thanks, Very much !!
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39865155
You're welcome.

You might want to try this pattern.  It runs faster than the one I posted earlier.
<a href=.*?>([^<]*)</a></p><p class="productBrand">([^<]*)</p>.*?<span class="productInfoValueList">([^<]*)</span>.*?<span class="productInfoValueList">([^<]*)</span>(?:.|\n)*?<span class="priceLabel">Price:</span>\s*(\$[0-9.]+)

Open in new window

0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article will guide you to convert a grid from a picture into Excel format using Microsoft OneNote and no other 3rd party application.
Excel can be a tricky bit of software to get your head around. Whilst you’ll be able to eventually get to grips with the basic understanding of how to get by, there are a few Excel tips that not everybody will even know about let alone know how to d…
Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…
Finds all prime numbers in a range requested and places them in a public primes() array. I've demostrated a template size of 30 (2 * 3 * 5) but larger templates can be built such 210  (2 * 3 * 5 * 7) or 2310  (2 * 3 * 5 * 7 * 11). The larger templa…

737 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question