Solved

Adding data to a regex expression for pulling web data

Posted on 2014-02-16
Last Modified: 2014-02-17
Previous question:
http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/Q_28364510.html#a39862768

What I need:
To add one more piece of information to the regex expression.

The HTML tag is:
<p class="productPrice"><span class="priceLabel">Price:</span> $39.10

From the same URL as shown below, I need to obtain the price for each item in the row.

Answer to previous question:
Function testEERegex()
    Dim oRE
    Dim oMatches
    Dim oMatch
    Dim Description, Mfg, GrangerID, MfgID
    Dim str As String
    Dim url
    Dim xml As Object ' MSXML2.XMLHTTP
    Dim result As String
    Dim x As Integer

    url = "http://www.grainger.com/search?searchQuery=RIP CLAW HAMMER 16"

    Set oRE = CreateObject("vbscript.regexp")
    oRE.Global = True
    oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>"

    ' GetMSXML is not defined in this snippet; presumably it is the helper from
    ' the previous question that returns an MSXML2.XMLHTTP object
    Set xml = GetMSXML

    ' grab webpage
    With xml
        .Open "GET", url, False
        .send
    End With

    str = xml.responseText

    Set oMatches = oRE.Execute(str)

    Sheet1.ListBox1.Clear

    x = 0
    For Each oMatch In oMatches

        Description = Trim(oMatch.Submatches(0))
        Mfg = Trim(oMatch.Submatches(1))
        GrangerID = Trim(oMatch.Submatches(2))
        MfgID = Trim(oMatch.Submatches(3))

        ' now add the stuff to the listbox
        With Sheet1.ListBox1
            .ColumnCount = 4
            .ColumnWidths = "100;60;60;60"
            .AddItem
            .List(x, 0) = Description
            .List(x, 1) = Mfg
            .List(x, 2) = GrangerID
            .List(x, 3) = MfgID
            x = x + 1
        End With
    Next oMatch
End Function




Make a pass through ALL the HTML data.
Example:

Column B   Column C   Column D   Column E                                 Column F
6R252      STANLEY    51-616     Claw Hammer, 16 Oz, Polished, Hickory    $36.10
6XV65      STANLEY    51-942     Rip Claw Hammer, 16 Oz, Smooth, Steel    $54.23

Thanks
 fordraiders
Question by:fordraiders
LVL 35

Expert Comment

by:Terry Woods
ID: 39863409
Try this adjustment to your code:

Function testEERegex()
    Dim oRE
    Dim oMatches
    Dim oMatch
    Dim Description, Mfg, GrangerID, MfgID, Price
    Dim str As String
    Dim url
    Dim xml As Object ' MSXML2.XMLHTTP
    Dim result As String
    Dim x As Integer

    url = "http://www.grainger.com/search?searchQuery=RIP CLAW HAMMER 16"

    Set oRE = CreateObject("vbscript.regexp")
    oRE.Global = True
    ' every quote inside the VBA string literal must be doubled, including those around priceLabel
    oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""priceLabel"">Price:</span>[\s""]*(\$[\d,.]+)"

    ' GetMSXML is not defined in this snippet; presumably it is the helper from
    ' the previous question that returns an MSXML2.XMLHTTP object
    Set xml = GetMSXML

    ' grab webpage
    With xml
        .Open "GET", url, False
        .send
    End With

    str = xml.responseText

    Set oMatches = oRE.Execute(str)

    Sheet1.ListBox1.Clear

    x = 0
    For Each oMatch In oMatches

        Description = Trim(oMatch.Submatches(0))
        Mfg = Trim(oMatch.Submatches(1))
        GrangerID = Trim(oMatch.Submatches(2))
        MfgID = Trim(oMatch.Submatches(3))
        Price = Trim(oMatch.Submatches(4))

        ' now add the stuff to the listbox
        With Sheet1.ListBox1
            .ColumnCount = 5
            .ColumnWidths = "100;60;60;60;60"
            .AddItem
            .List(x, 0) = Description
            .List(x, 1) = Mfg
            .List(x, 2) = GrangerID
            .List(x, 3) = MfgID
            .List(x, 4) = Price
            x = x + 1
        End With
    Next oMatch
End Function

 
LVL 3

Author Comment

by:fordraiders
ID: 39863653
No luck, Terry.

This is my own attempt, which did not work either.
I added it at the end of the regex string:

.*?<p class=""productPrice""><span class=""priceLabel"">Price:(.*?)</span>
 
LVL 35

Expert Comment

by:Terry Woods
ID: 39863710
Try this. The price comes after the </span> tag rather than before it, which is where your pattern looks for it. I've run this through a regex tester and it seems to work.
.*?<span class=""priceLabel"">Price:</span>\s*(\$[\d,.]+)
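
To see the difference between the two fragments, here is a minimal sketch that runs both against the single-line HTML sample from the question. FirstSubmatch is a hypothetical helper written for this illustration; it is not part of the code posted in the thread.

```vba
' Hypothetical helper for illustration; not part of the thread's code.
' Returns the first capture group of the first match, or "(no match)".
Function FirstSubmatch(ByVal pattern As String, ByVal source As String) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = False
    re.Pattern = pattern

    Set matches = re.Execute(source)
    If matches.Count > 0 Then
        FirstSubmatch = matches(0).SubMatches(0)
    Else
        FirstSubmatch = "(no match)"
    End If
End Function

Sub ComparePriceFragments()
    Dim html As String
    ' The sample tag quoted in the question, on one line
    html = "<p class=""productPrice""><span class=""priceLabel"">Price:</span> $39.10"

    ' Capture group placed before </span>: it matches, but captures empty text
    Debug.Print "[" & FirstSubmatch("<span class=""priceLabel"">Price:(.*?)</span>", html) & "]"   ' []

    ' Capture group placed after </span>: it captures the price
    Debug.Print "[" & FirstSubmatch("<span class=""priceLabel"">Price:</span>\s*(\$[\d,.]+)", html) & "]"   ' [$39.10]
End Sub
```

Run ComparePriceFragments from the VBA editor and read the result in the Immediate window. This only checks the price fragment on a one-line sample; it says nothing about how the full pattern behaves on the real page.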


 
LVL 3

Author Comment

by:fordraiders
ID: 39863998
Terry, sorry, it is still not returning anything for me.
 
LVL 46

Accepted Solution

by:aikimark (earned 2000 total points)
ID: 39864074
This is your regex pattern:
<a href.*>(.*?)</a></p><p class="productBrand">(.*?)</p>.*?<span class="productInfoValueList">(.*?)</span>.*?<span class="productInfoValueList">(.*?)</span>(?:.|\n)*?<span class="priceLabel">Price:</span>\s*(\$[0-9.]+)


So the oRE.Pattern assignment (line 20 of your code) should read:
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>(?:.|\n)*?<span class=""priceLabel"">Price:</span>\s*(\$[0-9.]+)"
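
The only change before the price label is that .*? became (?:.|\n)*?. In VBScript.RegExp the dot does not match a line feed, so .*? cannot reach a price that sits on a later line of the page source, while (?:.|\n)*? can. The sketch below shows this on a made-up two-line sample. The sample text and the HasMatch helper are assumptions for illustration; the real page layout is not shown in this thread.

```vba
' Hypothetical helper for illustration; not part of the thread's code.
Function HasMatch(ByVal pattern As String, ByVal source As String) As Boolean
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = pattern
    HasMatch = re.Test(source)
End Function

Sub DotVersusLineFeed()
    Dim html As String

    ' Made-up sample: the price label is on the line after the part number
    html = "<span class=""productInfoValueList"">51-616</span>" & vbLf & _
           "<span class=""priceLabel"">Price:</span> $36.10"

    ' .*? stops at the line feed, so no match is expected (False)
    Debug.Print HasMatch("51-616</span>.*?Price:</span>\s*(\$[0-9.]+)", html)

    ' (?:.|\n)*? also consumes line feeds, so a match is expected (True)
    Debug.Print HasMatch("51-616</span>(?:.|\n)*?Price:</span>\s*(\$[0-9.]+)", html)
End Sub
```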

 
LVL 3

Author Comment

by:fordraiders
ID: 39864906
Worked great.

aikimark, what program are you using to test the expression, if I may ask?
 
LVL 46

Expert Comment

by:aikimark
ID: 39865017
what program r u using...
A VBScript-powered HTA that I wrote.
I also used Notepad++ to look at the actual text in the HTML source.

In this particular case, I also played with different patterns at http://www.myregextester.com
 
LVL 3

Author Closing Comment

by:fordraiders
ID: 39865069
Thanks very much!
 
LVL 46

Expert Comment

by:aikimark
ID: 39865155
You're welcome.

You might want to try this pattern.  It runs faster than the one I posted earlier.
<a href=.*?>([^<]*)</a></p><p class="productBrand">([^<]*)</p>.*?<span class="productInfoValueList">([^<]*)</span>.*?<span class="productInfoValueList">([^<]*)</span>(?:.|\n)*?<span class="priceLabel">Price:</span>\s*(\$[0-9.]+)
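
The thread does not say why this version is faster. A likely reason is that the lazy <a href=.*?> and the negated class [^<]* cannot run past the next tag, so the engine backtracks less than with the greedy <a href.*> and (.*?). The sketch below does not measure speed; it only shows how the two anchor fragments behave on a made-up line with two links. CountMatches is a hypothetical helper written for this illustration.

```vba
' Hypothetical helper for illustration; not part of the thread's code.
Function CountMatches(ByVal pattern As String, ByVal source As String) As Long
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = pattern
    CountMatches = re.Execute(source).Count
End Function

Sub GreedyVersusNegatedClass()
    Dim html As String

    ' Made-up sample: two links on the same line
    html = "<a href=""/a"">First</a> <a href=""/b"">Second</a>"

    ' Greedy .* runs to the end of the line and backtracks, swallowing both links in one match
    Debug.Print CountMatches("<a href.*>(.*?)</a>", html)       ' 1

    ' Lazy .*? and [^<]* stop at the first > and the next <, so each link matches separately
    Debug.Print CountMatches("<a href=.*?>([^<]*)</a>", html)   ' 2
End Sub
```

On the real page the two full patterns may well return the same rows; the point is only that the second form gives the engine less text to scan and retry.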
