Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Regular expression to extract a string

Posted on 2014-09-12
15
Medium Priority
?
211 Views
Last Modified: 2014-09-17
I need to extract 1.50 in this string from the source code of the web page, or the entire line and then I'll parse it down to the 1.50.
<\/span>\n\n\n"},"MNN$":{"9303565":"$1.50"},"WARRANTY":{}}),

I am using Excel VBA.

Here is the function

Sub ExtractData()

Dim regEx
Dim i As Long
Dim pattern As String
Set regEx = CreateObject("VBScript.RegExp")
regEx.IgnoreCase = True
regEx.Global = True
regEx.pattern = "[what do I put in here?]"


End Sub
0
Comment
Question by:rrhandle8
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 8
  • 7
15 Comments
 

Author Comment

by:rrhandle8
ID: 40319693
I should have add this clue. The string below is unique in the document

"MNN$":{"9303565":"$1.50"}
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40319764
include the quotes in the expression.  You will need to double the quotes inside a string.
"$(\d+\.\d\d)"

Open in new window

0
 

Author Comment

by:rrhandle8
ID: 40319789
First I need to extract the entire line from the html document.
0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
LVL 46

Expert Comment

by:aikimark
ID: 40319812
no, you don't
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40319816
If you need just that one item, among many similar items, use this pattern
{"9303565":"$(\d+\.\d\d)"}

Open in new window

0
 

Author Comment

by:rrhandle8
ID: 40319866
OK. Thanks. I will try it.
0
 

Author Comment

by:rrhandle8
ID: 40320007
regEx.pattern = "{"9303565":"$(\d+\.\d\d)"}"

Expect end of statement error
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40320039
As I stated earlier, in order for a string literal to contain quote characters, the internal quote characters need to be doubled.
regEx.pattern = "{""9303565"":""$(\d+\.\d\d)""}"

Open in new window

0
 

Author Comment

by:rrhandle8
ID: 40320063
That didn't work either.  
The 9303565 is a unique number, so I can understand why it does work on the items I an feeding.
The line would be like "MNN$":{"9303565":"$1.50"} where 9303565 changes, then the MNN$":{ is unique within the document.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40320083
That didn't work either.
Are you still getting an error message?
What string are you using for your pattern matching?  Are you sure it contains the 9303565 data?
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40320104
I'm stepping away from the keyboard for a while.
0
 

Author Comment

by:rrhandle8
ID: 40320131
9303565 is a unique number that changes on each web page.  I am looking for the $1.50 that is to the right of it.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40320332
the 1.50 will be in submatches(0)

If the number changes, then I should be able to alter the pattern to work in the general case if you tell me more about those values.
0
 

Author Comment

by:rrhandle8
ID: 40321186
aikimark,  Thanks for the help.  I solved the problem using a different technique, but I would like to know how to do this with regular expressions.

There is only one line in the html that contains MNN$":{"9303565":"$1.50"}
The only thing that changes is the long number in the middle and the price.
There is always a long number in the middle, and a price at the end.
I have discovered that I need to extract long number in the middle and the price.
The regular expression should return (in this case) 9303565 and 1.50.
0
 
LVL 46

Accepted Solution

by:
aikimark earned 2000 total points
ID: 40321388
If it is the only price (currently formatted) value in these web pages, you can use the original pattern I posted. http:#a40319764


If you need to expand the match out to the curly brackets, you can use a modified version of what I posted in this comment: http:#a40320039
regEx.pattern = "{""\d+"":""$(\d+\.\d\d)""}"

Open in new window

In both cases, Submatches(0) contains the data you seek.
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Freeze panes is an option within all variants of Excel to enable parts of a sheet to remain stationary when the cursor is in another part of the sheet. This is a very useful feature which is overlooked or under used.
Excel can be a tricky bit of software to get your head around. Whilst you’ll be able to eventually get to grips with the basic understanding of how to get by, there are a few Excel tips that not everybody will even know about let alone know how to d…
This Micro Tutorial demonstrates how to create Excel charts: column, area, line, bar, and scatter charts. Formatting tips are provided as well.
Excel styles will make formatting consistent and let you apply and change formatting faster. In this tutorial, you'll learn how to use Excel's built-in styles, how to modify styles, and how to create your own. You'll also learn how to use your custo…

721 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question