Solved

Regex pattern not quite right...

Posted on 2003-10-23
Last Modified: 2010-04-17
I have a block of text like this:
"
just a test advert phone <!-- param1: 999 param2: 3  -->&lt;<a href="javascript:;">Secure&nbsp;Number</a>&gt; or <!-- param1: 123  param2: 3  -->&lt;&nbsp;<a href="javascript:;">Secure&nbsp;Number</a>&gt;
"

I'm attempting to grab 2 matches here, isolating 2 parameters in each at the same time: param1 and param2.
Here's my pattern:
<!-- param1:\s*(?<phone>.*)param2:\s*(?<expiry>.*)\s*-->.*Secure&nbsp;Number(</a>)?&gt;

My first match isn't ending at the first &gt;. It is ending at the last one; for example, the 1st match is:
<!-- param1: 999 param2: 3  -->&lt;<a href="javascript:;">Secure&nbsp;Number</a>&gt; or <!-- param1: 123  param2: 3  -->&lt;&nbsp;<a href="javascript:;">Secure&nbsp;Number</a>&gt;

when it should be:
<!-- param1: 999 param2: 3  -->&lt;<a href="javascript:;">Secure&nbsp;Number</a>&gt;

When I place a carriage return between the 2 parts, it works fine. How can I make the first match stop at the first &gt;?
Thanks
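A minimal Python sketch reproducing the problem, assuming the sample text is a single line. Python's `re` module spells named groups `(?P<name>...)` rather than the .NET-style `(?<name>...)` used above, so only the group syntax is translated; the pattern is otherwise unchanged.

```python
import re

# Sample text from the question, kept on a single line.
TEXT = ('just a test advert phone <!-- param1: 999 param2: 3  -->&lt;'
        '<a href="javascript:;">Secure&nbsp;Number</a>&gt; or '
        '<!-- param1: 123  param2: 3  -->&lt;&nbsp;'
        '<a href="javascript:;">Secure&nbsp;Number</a>&gt;')

# The question's pattern, with (?<name>...) rewritten as Python's (?P<name>...).
GREEDY = re.compile(r'<!-- param1:\s*(?P<phone>.*)param2:\s*(?P<expiry>.*)\s*-->'
                    r'.*Secure&nbsp;Number(</a>)?&gt;')

matches = list(GREEDY.finditer(TEXT))
print(len(matches))  # 1: a single match covering both adverts
print(repr(matches[0].group('phone')))
```

The `phone` group also runs past the first `param2:`, because each `.*` first grabs the rest of the line and then backs off only as far as it must for the remainder of the pattern to match.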
8 Comments
 
LVL 9

Expert Comment

by:malharone
ID: 9609496
First of all, you'll find http://www.codeproject.com/dotnet/Expresso.asp useful
 
LVL 9

Expert Comment

by:malharone
ID: 9609803
And second of all:
(</?a>)? (<!-- \s* (param\d+: \s* (\d+) \s*)+ --\> &lt?;?)+ .*? Secure? &nbsp;Number(</?a>)? &gt; (\s* or \s*)?


Hope this helps.
 
LVL 2

Author Comment

by:joegass
ID: 9641321
Sorry for my delay in getting back to you!
Thanks for the link to Expresso.
I'm still not having any luck with that expression. I need to name 2 groups in my regex: one called phone, the other called expiry.

<!-- param1:\s*(?<phone>.*)param2:\s*(?<expiry>.*)\s*-->.*Secure&nbsp;Number(</a>)?&gt;

The one above works OK when it is the only match on the line, but if there isn't a line break between the patterns in the text, it fails to end at the first &gt; and instead seems to skip to the 2nd one.

Thanks for your help
 
LVL 2

Author Comment

by:joegass
ID: 9641390
I think the crux of the problem is my use of .* in the middle of the expression.

Trying a very simple example:
string = just a test advert phone Secure&nbsp;Number</a>1&gt; or Secure&nbsp;Number</a>2&gt;
pattern = (Secure&nbsp;Number){1,1}(</a>)?.*&gt;

Still returns a match of
Secure&nbsp;Number</a>1&gt; or Secure&nbsp;Number</a>2&gt;

When I'm expecting it to contain "secure number" only once.
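A quick Python check of this simpler case, with the string and pattern copied from the post. It also suggests that `{1,1}` has no effect here: it only says the group occurs exactly once at that position, which is already the default, and it places no limit on what the following `.*` consumes.

```python
import re

s = ('just a test advert phone Secure&nbsp;Number</a>1&gt; '
     'or Secure&nbsp;Number</a>2&gt;')

# Pattern from the post: {1,1} quantifies only the group it follows.
with_quant = re.search(r'(Secure&nbsp;Number){1,1}(</a>)?.*&gt;', s)
# Dropping {1,1} gives the same match, since "exactly once" is the default.
without_quant = re.search(r'(Secure&nbsp;Number)(</a>)?.*&gt;', s)

print(with_quant.group())
print(with_quant.group().count('Secure'))  # 2: .* ran on to the last &gt;
```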

I'm trying to use some form of wildcard in the middle because the "<a href="javascript:;">" part of my first example may include other attributes, e.g. "<a href="javascript:;" onclick="window.open('/test/test/secureNumbers.htm','test','toolbar=no,location=no,status=no,menubar=no,scrollbars=yes,resizable=yes,width=400,height=450')" class="yacLink">"
These attributes are variable and optional.
Looks like I'm not being specific enough.
 
LVL 16

Accepted Solution

by:_nn_ (earned 250 total points)
ID: 9641463
From malharone's contribution, I infer that making the * meta character non-greedy by appending a ? is supported. So maybe the following would work:

<!-- param1:\s*(?<phone>.*?)\s*param2:\s*(?<expiry>.*?)\s*-->.*?Secure&nbsp;Number(</a>)?&gt;
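A Python sketch of this pattern run against the question's sample text, again assuming Python's `(?P<name>...)` spelling for the named groups. `finditer` yields each non-overlapping match in turn, so both adverts come back separately.

```python
import re

TEXT = ('just a test advert phone <!-- param1: 999 param2: 3  -->&lt;'
        '<a href="javascript:;">Secure&nbsp;Number</a>&gt; or '
        '<!-- param1: 123  param2: 3  -->&lt;&nbsp;'
        '<a href="javascript:;">Secure&nbsp;Number</a>&gt;')

# The suggested pattern: every wildcard is now the lazy .*? form.
LAZY = re.compile(r'<!-- param1:\s*(?P<phone>.*?)\s*param2:\s*(?P<expiry>.*?)\s*-->'
                  r'.*?Secure&nbsp;Number(</a>)?&gt;')

results = [(m.group('phone'), m.group('expiry')) for m in LAZY.finditer(TEXT)]
print(results)  # [('999', '3'), ('123', '3')]
```

The extra `\s*` before `param2:` keeps trailing spaces out of the `phone` group.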

I guess the reason the original one works when there's a line break is that in standard regexp syntax the dot does not match end-of-line characters, which forced the pattern matcher to find a match within the first line only.
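This explanation can be checked in Python. With a newline between the two adverts, the original greedy pattern (named groups again spelled `(?P<name>...)`) finds one match per line. Compiling it with `re.DOTALL`, which lets `.` match newlines too, brings back the single over-long match. The splitting of the sample text into two lines is an assumption for illustration.

```python
import re

AD1 = ('just a test advert phone <!-- param1: 999 param2: 3  -->&lt;'
       '<a href="javascript:;">Secure&nbsp;Number</a>&gt; or')
AD2 = ('<!-- param1: 123  param2: 3  -->&lt;&nbsp;'
       '<a href="javascript:;">Secure&nbsp;Number</a>&gt;')
TEXT_NL = AD1 + '\n' + AD2  # line break between the two adverts

ORIGINAL = (r'<!-- param1:\s*(?P<phone>.*)param2:\s*(?P<expiry>.*)\s*-->'
            r'.*Secure&nbsp;Number(</a>)?&gt;')

# Default flags: '.' stops at '\n', so each match is confined to one line.
per_line = [(m.group('phone'), m.group('expiry'))
            for m in re.finditer(ORIGINAL, TEXT_NL)]
print(per_line)  # [('999 ', '3  '), ('123  ', '3  ')]

# re.DOTALL lets '.' match '\n', so the greedy .* spans both lines again.
print(len(re.findall(ORIGINAL, TEXT_NL, flags=re.DOTALL)))  # 1
```

Note that the captured values keep their trailing spaces, since the original pattern has no `\s*` before `param2:` and its greedy `(?P<expiry>.*)` claims the spaces before `\s*-->` can.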
 
LVL 2

Author Comment

by:joegass
ID: 9641847
That did the trick. Excellent, thank you!
Not too sure if I follow this non-greedy match, but it works great

Thanks to malharone for your help too
 
LVL 16

Expert Comment

by:_nn_
ID: 9642051
>> Not too sure if I follow this non-greedy match, but it works great

The standard behavior for the * meta character is to try to "fit as much as it can". What it can consume depends on the preceding character (or, more precisely, character class). Examples will show this better, I think:

regexp : "start(.*)stop"

string : "this is a start"
matched : (no match)

string : "this is a start and this is a stop"
matched : " and this is a "
(quite normal)

string : "this is a start and this is a stop and there, just for fun, another stop"
matched : " and this is a stop and there, just for fun, another "
(the matcher took as much as it could)

string : "this is a start and this is a stop and there\n, just for fun, another stop"
matched : " and this is a "
(the matcher could not get past the \n end-of-line marker because the (.) class does not match an eol character)
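The greedy examples above can be replayed with Python's `re` module. This is a sketch: `search` returns `None` when there is no match, which is what the first case prints.

```python
import re

GREEDY = re.compile(r'start(.*)stop')

def captured(text):
    """Return group 1 of the first match, or None when nothing matches."""
    m = GREEDY.search(text)
    return m.group(1) if m else None

cases = [
    "this is a start",
    "this is a start and this is a stop",
    "this is a start and this is a stop and there, just for fun, another stop",
    "this is a start and this is a stop and there\n, just for fun, another stop",
]
for text in cases:
    print(repr(captured(text)))
```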

Now we change to :

regexp : "start(.*?)stop"

string : "this is a start and this is a stop and there, just for fun, another stop"
matched : " and this is a "
(because we specified *? instead of * alone, the matcher stopped at the first occurrence of "stop")
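The same check in Python, plus a line holding two start...stop pairs, which mirrors the original problem of two adverts on one line. The second test string is made up for illustration.

```python
import re

text = "this is a start and this is a stop and there, just for fun, another stop"
print(repr(re.search(r'start(.*?)stop', text).group(1)))  # ' and this is a '

# Two pairs on one line: the lazy form captures each pair separately, while
# the greedy form returns one capture from the first start to the last stop.
pairs = "start a stop start b stop"
lazy_caps = re.findall(r'start(.*?)stop', pairs)
greedy_caps = re.findall(r'start(.*)stop', pairs)
print(lazy_caps)    # [' a ', ' b ']
print(greedy_caps)  # [' a stop start b ']
```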

Hope this explains.
 
LVL 2

Author Comment

by:joegass
ID: 9642115
Right, makes more sense when I think of it as you describe: "fit as much as it can".
I was unaware that adding a ? makes it stop at the first occurrence.
I'll add this to my (slowly) growing regex knowledge
Thanks very much for all your time