Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

adding data To regex expression for pulling web data

Posted on 2014-02-16
10
Medium Priority
?
259 Views
Last Modified: 2014-02-17
What I need:
previous question:
http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/Q_28364510.html#a39862768


To Add additional information to regex expression.

the html tag...is.
<p class="productPrice"><span class="priceLabel">Price:</span> $39.10


from the same url as indicated below I need to obtain the price for each item in the row.

Answer to previous question:
Function testEERegex()
Dim oRE
Dim oMatches
Dim oMatch
Dim Description, Mfg, GrangerID, MfgID
Dim str As String
Dim url
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
Dim x As Integer





url = "http://www.grainger.com/search?searchQuery=RIP CLAW HAMMER 16"

Set oRE = CreateObject("vbscript.regexp")
oRE.Global = True
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>"

 
  Set xml = GetMSXML
 
  ' grab webpage
  With xml
    .Open "GET", url, False
    .send
  End With
 
  str = xml.responseText

Set oMatches = oRE.Execute(str)

Sheet1.ListBox1.Clear

x = 0
For Each oMatch In oMatches
    
    Description = Trim(oMatch.Submatches(0))
    Mfg = Trim(oMatch.Submatches(1))
    GrangerID = Trim(oMatch.Submatches(2))
    MfgID = Trim(oMatch.Submatches(3))
    
    ' now add the stuff to the listbox
    With Sheet1.ListBox1
    .ColumnCount = 4
    .ColumnWidths = "100;60;60;60"
    .AddItem
    .List(x, 0) = Description
    .List(x, 1) = Mfg
    .List(x, 2) = GrangerID
    .List(x, 3) = MfgID
    x = x + 1
End With

Open in new window



Make a Pass through ALL the html data:
Example:
Column

  B             C          D                                  E                                                                 F
6R252 STANLEY 51-616     Claw Hammer, 16 Oz, Polished, Hickory                    $36.10
6XV65 STANLEY 51-942     Rip Claw Hammer, 16 Oz, Smooth, Steel                    $54.23

Thanks
 fordraiders
0
Comment
Question by:fordraiders
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
  • 2
10 Comments
 
LVL 35

Expert Comment

by:Terry Woods
ID: 39863409
Try this adjustment to your code:

Function testEERegex()
Dim oRE
Dim oMatches
Dim oMatch
Dim Description, Mfg, GrangerID, MfgID, Price
Dim str As String
Dim url
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
Dim x As Integer





url = "http://www.grainger.com/search?searchQuery=RIP CLAW HAMMER 16"

Set oRE = CreateObject("vbscript.regexp")
oRE.Global = True
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class="priceLabel">Price:</span>[\s""]*(\$[\d,.]+)"
 
  Set xml = GetMSXML
 
  ' grab webpage
  With xml
    .Open "GET", url, False
    .send
  End With
 
  str = xml.responseText

Set oMatches = oRE.Execute(str)

Sheet1.ListBox1.Clear

x = 0
For Each oMatch In oMatches
    
    Description = Trim(oMatch.Submatches(0))
    Mfg = Trim(oMatch.Submatches(1))
    GrangerID = Trim(oMatch.Submatches(2))
    MfgID = Trim(oMatch.Submatches(3))
    Price = Trim(oMatch.Submatches(4))
    
    ' now add the stuff to the listbox
    With Sheet1.ListBox1
    .ColumnCount = 5
    .ColumnWidths = "100;60;60;60;60"
    .AddItem
    .List(x, 0) = Description
    .List(x, 1) = Mfg
    .List(x, 2) = GrangerID
    .List(x, 3) = MfgID
    .List(x, 4) = Price
    x = x + 1
End With

Open in new window

0
 
LVL 3

Author Comment

by:fordraiders
ID: 39863653
no luck terry,

this is my try andf no luck also.
and the end of the regex string.

.*?<p class=""productPrice""><span class=""priceLabel"">Price:(.*?)</span>
0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 39863710
Try this. The price is after the </span> tag rather than before, like your pattern uses. I've run this through a regex tester and it seems to work.
.*?<span class=""priceLabel"">Price:</span>\s*(\$[\d,.]+)

Open in new window

0
Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

 
LVL 3

Author Comment

by:fordraiders
ID: 39863998
terry, sorry still not returning for me..
0
 
LVL 46

Accepted Solution

by:
aikimark earned 2000 total points
ID: 39864074
This is your regex pattern:
<a href.*>(.*?)</a></p><p class="productBrand">(.*?)</p>.*?<span class="productInfoValueList">(.*?)</span>.*?<span class="productInfoValueList">(.*?)</span>(?:.|\n)*?<span class="priceLabel">Price:</span>\s*(\$[0-9.]+)

Open in new window

So, your line 20 should read:
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>(?:.|\n)*?<span class=""priceLabel"">Price:</span>\s*(\$[0-9.]+)"

Open in new window

0
 
LVL 3

Author Comment

by:fordraiders
ID: 39864906
Worked great

aikimark, what program r u using to test the expression. If I can ask please?
0
 
LVL 46

Expert Comment

by:aikimark
ID: 39865017
what program r u using...
A VBScript-powered HTA that I wrote.
I also used Notepad++ to look at the actual text in the HTML source.

In this particular case, I also played with different patterns at http://www.myregextester.com
0
 
LVL 3

Author Closing Comment

by:fordraiders
ID: 39865069
Thanks, Very much !!
0
 
LVL 46

Expert Comment

by:aikimark
ID: 39865155
You're welcome.

You might want to try this pattern.  It runs faster than the one I posted earlier.
<a href=.*?>([^<]*)</a></p><p class="productBrand">([^<]*)</p>.*?<span class="productInfoValueList">([^<]*)</span>.*?<span class="productInfoValueList">([^<]*)</span>(?:.|\n)*?<span class="priceLabel">Price:</span>\s*(\$[0-9.]+)

Open in new window

0

Featured Post

Tech or Treat!

Submit an article about your scariest tech experience—and the solution—and you’ll be automatically entered to win one of 4 fantastic tech gadgets.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Freeze panes is an option within all variants of Excel to enable parts of a sheet to remain stationary when the cursor is in another part of the sheet. This is a very useful feature which is overlooked or under used.
This article describes how to use a set of graphical playing cards to create a Draw Poker game in Excel or VB6.
The viewer will learn how to create two correlated normally distributed random variables in Excel, use a normal distribution to simulate the return on different levels of investment in each of the two funds over a period of ten years, and, create a …
Finds all prime numbers in a range requested and places them in a public primes() array. I've demostrated a template size of 30 (2 * 3 * 5) but larger templates can be built such 210  (2 * 3 * 5 * 7) or 2310  (2 * 3 * 5 * 7 * 11). The larger templa…

609 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question