Solved

Adding data to a regex expression for pulling web data

Posted on 2014-02-16
Last Modified: 2014-02-17
What I need: to extend the regex from my previous question so that it also captures the price.

Previous question:
http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/Q_28364510.html#a39862768

The HTML for the price looks like this:
<p class="productPrice"><span class="priceLabel">Price:</span> $39.10

From the same URL used in the code below, I need to obtain the price for each item in the row.

Answer to previous question:
Function testEERegex()
Dim oRE
Dim oMatches
Dim oMatch
Dim Description, Mfg, GrangerID, MfgID
Dim str As String
Dim url
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
Dim x As Integer





url = "http://www.grainger.com/search?searchQuery=RIP CLAW HAMMER 16"

Set oRE = CreateObject("vbscript.regexp")
oRE.Global = True
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>"

 
  Set xml = GetMSXML
 
  ' grab webpage
  With xml
    .Open "GET", url, False
    .send
  End With
 
  str = xml.responseText

Set oMatches = oRE.Execute(str)

Sheet1.ListBox1.Clear

x = 0
For Each oMatch In oMatches
    
    Description = Trim(oMatch.Submatches(0))
    Mfg = Trim(oMatch.Submatches(1))
    GrangerID = Trim(oMatch.Submatches(2))
    MfgID = Trim(oMatch.Submatches(3))
    
    ' now add the stuff to the listbox
    With Sheet1.ListBox1
    .ColumnCount = 4
    .ColumnWidths = "100;60;60;60"
    .AddItem
    .List(x, 0) = Description
    .List(x, 1) = Mfg
    .List(x, 2) = GrangerID
    .List(x, 3) = MfgID
    x = x + 1
    End With
Next

End Function



Make a pass through ALL the HTML data.
Example:

Column B   C         D        E                                        F
6R252      STANLEY   51-616   Claw Hammer, 16 Oz, Polished, Hickory    $36.10
6XV65      STANLEY   51-942   Rip Claw Hammer, 16 Oz, Smooth, Steel    $54.23

Thanks
 fordraiders
Question by:fordraiders
 
LVL 35

Expert Comment

by:Terry Woods
ID: 39863409
Try this adjustment to your code:

Function testEERegex()
Dim oRE
Dim oMatches
Dim oMatch
Dim Description, Mfg, GrangerID, MfgID, Price
Dim str As String
Dim url
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
Dim x As Integer





url = "http://www.grainger.com/search?searchQuery=RIP CLAW HAMMER 16"

Set oRE = CreateObject("vbscript.regexp")
oRE.Global = True
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""priceLabel"">Price:</span>[\s""]*(\$[\d,.]+)"
 
  Set xml = GetMSXML
 
  ' grab webpage
  With xml
    .Open "GET", url, False
    .send
  End With
 
  str = xml.responseText

Set oMatches = oRE.Execute(str)

Sheet1.ListBox1.Clear

x = 0
For Each oMatch In oMatches
    
    Description = Trim(oMatch.Submatches(0))
    Mfg = Trim(oMatch.Submatches(1))
    GrangerID = Trim(oMatch.Submatches(2))
    MfgID = Trim(oMatch.Submatches(3))
    Price = Trim(oMatch.Submatches(4))
    
    ' now add the stuff to the listbox
    With Sheet1.ListBox1
    .ColumnCount = 5
    .ColumnWidths = "100;60;60;60;60"
    .AddItem
    .List(x, 0) = Description
    .List(x, 1) = Mfg
    .List(x, 2) = GrangerID
    .List(x, 3) = MfgID
    .List(x, 4) = Price
    x = x + 1
    End With
Next

End Function
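
Note that GetMSXML is not defined anywhere in this thread; it presumably comes from the earlier question. As a minimal, hypothetical stand-in, assuming it only has to return an XMLHTTP object (as the ' MSXML2.XMLHTTP comment suggests), something like this would let the function run:

```vb
' Hypothetical stand-in for GetMSXML. The original helper is not shown
' in this thread, so this is an assumption about what it does.
Function GetMSXML() As Object
    On Error Resume Next
    ' Prefer the MSXML 6.0 object when it is installed
    Set GetMSXML = CreateObject("MSXML2.XMLHTTP.6.0")
    If GetMSXML Is Nothing Then
        ' Fall back to the version-independent ProgID
        Set GetMSXML = CreateObject("MSXML2.XMLHTTP")
    End If
    On Error GoTo 0
End Function
```

If neither object can be created, the function returns Nothing and the later .Open call will raise an error, so the caller may want to check for that.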

 
LVL 3

Author Comment

by:fordraiders
ID: 39863653
No luck, Terry.

This is my attempt, added at the end of the regex string, and it doesn't work either:

.*?<p class=""productPrice""><span class=""priceLabel"">Price:(.*?)</span>
 
LVL 35

Expert Comment

by:Terry Woods
ID: 39863710
Try this. The price comes after the </span> tag, not before it as your pattern assumes. I've run this through a regex tester and it seems to work.
.*?<span class=""priceLabel"">Price:</span>\s*(\$[\d,.]+)
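
One way to check the price fragment on its own is to run it against the single-line sample from the question. This is only a sketch: the Sub name TestPriceFragment is made up, and the sample string is the one line of HTML quoted in the question, not the real page.

```vb
' Sketch only: tests the price fragment against the one-line sample
' from the question. TestPriceFragment is a hypothetical name.
Sub TestPriceFragment()
    Dim oRE As Object
    Dim oMatches As Object
    Dim sample As String

    ' Sample HTML copied from the question (a single line, no line breaks)
    sample = "<p class=""productPrice""><span class=""priceLabel"">Price:</span> $39.10"

    Set oRE = CreateObject("vbscript.regexp")
    oRE.Global = True
    oRE.Pattern = "<span class=""priceLabel"">Price:</span>\s*(\$[\d,.]+)"

    Set oMatches = oRE.Execute(sample)
    If oMatches.Count > 0 Then
        Debug.Print oMatches(0).SubMatches(0)   ' expected: $39.10
    Else
        Debug.Print "no match"
    End If
End Sub
```

If the fragment matches here but the full pattern still returns nothing on the live page, the problem is probably in how the pieces are joined, for example line breaks in the page source, which "." does not match in VBScript regular expressions.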

 
LVL 3

Author Comment

by:fordraiders
ID: 39863998
terry, sorry still not returning for me..
 
LVL 45

Accepted Solution

by:aikimark (earned 500 total points)
ID: 39864074
This is your regex pattern:
<a href.*>(.*?)</a></p><p class="productBrand">(.*?)</p>.*?<span class="productInfoValueList">(.*?)</span>.*?<span class="productInfoValueList">(.*?)</span>(?:.|\n)*?<span class="priceLabel">Price:</span>\s*(\$[0-9.]+)

So the line that sets oRE.Pattern (line 20 of your snippet) should read:
oRE.Pattern = "<a href.*>(.*?)</a></p><p class=""productBrand"">(.*?)</p>.*?<span class=""productInfoValueList"">(.*?)</span>.*?<span class=""productInfoValueList"">(.*?)</span>(?:.|\n)*?<span class=""priceLabel"">Price:</span>\s*(\$[0-9.]+)"
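
The key change is (?:.|\n)*? in place of .*? before the price span. In VBScript regular expressions "." does not match a line feed, so .*? cannot cross a line break in the page source. The sketch below illustrates the difference; the Sub name CompareDotAndNewline and the two-line sample string are made up for illustration and are not taken from the real page.

```vb
' Sketch only: shows why .*? can fail across line breaks in VBScript regex.
' CompareDotAndNewline and the sample string are hypothetical.
Sub CompareDotAndNewline()
    Dim oRE As Object
    Dim sample As String

    ' Two tags separated by a line feed, as they might be in the page source
    sample = "<span>51-616</span>" & vbLf & _
             "<span class=""priceLabel"">Price:</span> $36.10"

    Set oRE = CreateObject("vbscript.regexp")

    ' "." does not match a line feed, so this cannot reach the price
    oRE.Pattern = "</span>.*?<span class=""priceLabel"">Price:</span>\s*(\$[0-9.]+)"
    Debug.Print oRE.Test(sample)    ' False

    ' (?:.|\n) matches any character, including a line feed
    oRE.Pattern = "</span>(?:.|\n)*?<span class=""priceLabel"">Price:</span>\s*(\$[0-9.]+)"
    Debug.Print oRE.Test(sample)    ' True
End Sub
```

[\s\S]*? is a commonly used alternative to (?:.|\n)*? for the same purpose.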

 
LVL 3

Author Comment

by:fordraiders
ID: 39864906
Worked great.

aikimark, what program are you using to test the expression, if I may ask?
 
LVL 45

Expert Comment

by:aikimark
ID: 39865017
what program r u using...
A VBScript-powered HTA that I wrote.
I also used Notepad++ to look at the actual text in the HTML source.

In this particular case, I also played with different patterns at http://www.myregextester.com
 
LVL 3

Author Closing Comment

by:fordraiders
ID: 39865069
Thanks very much!
 
LVL 45

Expert Comment

by:aikimark
ID: 39865155
You're welcome.

You might want to try this pattern.  It runs faster than the one I posted earlier.
<a href=.*?>([^<]*)</a></p><p class="productBrand">([^<]*)</p>.*?<span class="productInfoValueList">([^<]*)</span>.*?<span class="productInfoValueList">([^<]*)</span>(?:.|\n)*?<span class="priceLabel">Price:</span>\s*(\$[0-9.]+)
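
The likely reason for the speed-up is that each (.*?) capture has been replaced with ([^<]*), and the greedy <a href.*> with the lazy <a href=.*?>. A negated class such as [^<]* simply consumes characters until the next "<", whereas a lazy .*? advances one character at a time and retries the rest of the pattern at each step. The sketch below shows that both forms capture the same text on a simple sample; the Sub name CompareCaptureStyles and the sample string are assumptions for illustration only.

```vb
' Sketch only: both capture styles return the same text on a simple sample.
' CompareCaptureStyles and the sample string are hypothetical.
Sub CompareCaptureStyles()
    Dim oRE As Object
    Dim oMatches As Object
    Dim sample As String

    sample = "<p class=""productBrand"">STANLEY</p>"

    Set oRE = CreateObject("vbscript.regexp")

    ' Lazy dot: expands one character at a time until </p> can match
    oRE.Pattern = "<p class=""productBrand"">(.*?)</p>"
    Set oMatches = oRE.Execute(sample)
    Debug.Print oMatches(0).SubMatches(0)   ' STANLEY

    ' Negated class: consumes everything up to the next "<" in one run
    oRE.Pattern = "<p class=""productBrand"">([^<]*)</p>"
    Set oMatches = oRE.Execute(sample)
    Debug.Print oMatches(0).SubMatches(0)   ' STANLEY
End Sub
```

One behavioural difference to keep in mind: [^<]* also matches line feeds, while .*? does not, so the two forms can differ when the captured text spans more than one line.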
