regex pattern not quite right....

I have a block of text like this:
"
just a test advert phone <!-- param1: 999 param2: 3  -->&lt;<a href="javascript:;">Secure&nbsp;Number</a>&gt; or <!-- param1: 123  param2: 3  -->&lt;&nbsp;<a href="javascript:;">Secure&nbsp;Number</a>&gt;
"

I'm attempting to grab 2 matches here, capturing 2 parameters from each one: param1 and param2.
Here's my pattern:
<!-- param1:\s*(?<phone>.*)param2:\s*(?<expiry>.*)\s*-->.*Secure&nbsp;Number(</a>)?&gt;

My first match isn't ending at the first &gt;. It runs on to the last one. For example, the 1st match is:
<!-- param1: 999 param2: 3  -->&lt;<a href="javascript:;">Secure&nbsp;Number</a>&gt; or <!-- param1: 123  param2: 3  -->&lt;&nbsp;<a href="javascript:;">Secure&nbsp;Number</a>&gt;

when it should be:
<!-- param1: 999 param2: 3  -->&lt;<a href="javascript:;">Secure&nbsp;Number</a>&gt;

When I place a carriage return between the 2 parts it works fine.  How can I make the first match stop at the first &gt; ?
Thanks
joegassAsked:
 
_nn_Commented:
From malharone's contribution, I infer that making the * metacharacter non-greedy by appending a ? is supported. So maybe the following would work:

<!-- param1:\s*(?<phone>.*?)\s*param2:\s*(?<expiry>.*?)\s*-->.*?Secure&nbsp;Number(</a>)?&gt;

I guess the reason the original one works when there's a line break is that, by default, the dot does not match end-of-line characters, so the pattern matcher was forced to find a match within the first line only.
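
For anyone who wants to check this, here is a minimal Python sketch comparing the original pattern with the non-greedy one on the sample text from the question. It is an illustration under assumptions, not the asker's actual code: Python's re module spells named groups (?P<name>...) instead of (?<name>...), and the variable names (text, greedy, lazy and so on) are made up for this example.

```python
import re

# Sample text from the question, kept on a single line (no line break).
text = (
    'just a test advert phone <!-- param1: 999 param2: 3  -->&lt;'
    '<a href="javascript:;">Secure&nbsp;Number</a>&gt; or '
    '<!-- param1: 123  param2: 3  -->&lt;&nbsp;'
    '<a href="javascript:;">Secure&nbsp;Number</a>&gt;'
)

# Original pattern: every .* is greedy.
# Assumption: named groups rewritten in Python's (?P<name>...) spelling.
greedy = re.compile(
    r'<!-- param1:\s*(?P<phone>.*)param2:\s*(?P<expiry>.*)\s*-->'
    r'.*Secure&nbsp;Number(</a>)?&gt;'
)

# Suggested pattern: every .* replaced by the non-greedy .*?
lazy = re.compile(
    r'<!-- param1:\s*(?P<phone>.*?)\s*param2:\s*(?P<expiry>.*?)\s*-->'
    r'.*?Secure&nbsp;Number(</a>)?&gt;'
)

greedy_matches = [m.group(0) for m in greedy.finditer(text)]
lazy_matches = [m.group(0) for m in lazy.finditer(text)]
lazy_pairs = [(m.group('phone'), m.group('expiry')) for m in lazy.finditer(text)]

print(len(greedy_matches))  # 1: a single match running to the last &gt;
print(len(lazy_matches))    # 2: one match per comment/link pair
print(lazy_pairs)           # [('999', '3'), ('123', '3')]
```

The greedy version returns one match that swallows both blocks, while the non-greedy version stops each match at the nearest &gt; and gives clean phone and expiry values.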
 
malharoneCommented:
First of all, you'll find http://www.codeproject.com/dotnet/Expresso.asp useful.
 
malharoneCommented:
and second of all ....
(</?a>)? (<!-- \s* (param\d+: \s* (\d+) \s*)+ --\> &lt?;?)+ .*? Secure? &nbsp;Number(</?a>)? &gt; (\s* or \s*)?


hope this helps
 
joegassAuthor Commented:
Sorry for my delay in getting back to you!
Thanks for the link to Expresso.
I'm still not having any luck with that expression. I need to name 2 groups in my regex, one called phone and the other called expiry:

<!-- param1:\s*(?<phone>.*)param2:\s*(?<expiry>.*)\s*-->.*Secure&nbsp;Number(</a>)?&gt;

The one above works OK when it is the only match on the line, but if there isn't a line break between the patterns in the text, it fails to end at the first &gt; and seems to skip to the 2nd one.

Thanks for your help
 
joegassAuthor Commented:
I think the crux of the problem is my use of .* in the middle of the expression.

Trying a very simple example:
string = just a test advert phone Secure&nbsp;Number</a>1&gt; or Secure&nbsp;Number</a>2&gt;
pattern = (Secure&nbsp;Number){1,1}(</a>)?.*&gt;

Still returns a match of
Secure&nbsp;Number</a>1&gt; or Secure&nbsp;Number</a>2&gt;

when I'm expecting each match to contain "Secure Number" only once.

I'm trying to use some form of wildcard in the middle because the "<a href="javascript:;">" part of my first example may include other attributes, e.g. "<a href="javascript:;" onclick="window.open('/test/test/secureNumbers.htm','test','toolbar=no,location=no,status=no,menubar=no,scrollbars=yes,resizable=yes,width=400,height=450')" class="yacLink">"
These attributes are variable and optional.
Looks like I'm not being specific enough.
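
The simple example above can be reproduced in a few lines. This is a hedged Python sketch, not the original code: it runs the pattern as posted and then a variant where the wildcard is written as the non-greedy .*? (the variant and the names simple_text, greedy_simple and lazy_simple are assumptions added for illustration).

```python
import re

# The simple test string from the post above.
simple_text = (
    'just a test advert phone Secure&nbsp;Number</a>1&gt; or '
    'Secure&nbsp;Number</a>2&gt;'
)

# Pattern as posted: the greedy .* runs on to the LAST &gt; on the line.
greedy_simple = re.compile(r'(Secure&nbsp;Number){1,1}(</a>)?.*&gt;')

# Same pattern with a non-greedy wildcard: stops at the nearest &gt;.
lazy_simple = re.compile(r'(Secure&nbsp;Number){1,1}(</a>)?.*?&gt;')

greedy_simple_matches = [m.group(0) for m in greedy_simple.finditer(simple_text)]
lazy_simple_matches = [m.group(0) for m in lazy_simple.finditer(simple_text)]

print(greedy_simple_matches)
# ['Secure&nbsp;Number</a>1&gt; or Secure&nbsp;Number</a>2&gt;']
print(lazy_simple_matches)
# ['Secure&nbsp;Number</a>1&gt;', 'Secure&nbsp;Number</a>2&gt;']
```

So the wildcard itself is fine for skipping the optional attributes. What matters is whether it is allowed to grab as much as possible or as little as possible.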
 
joegassAuthor Commented:
That did the trick, excellent. Thank you!
Not too sure if I follow this non-greedy match, but it works great

Thanks to malharone for your help too
 
_nn_Commented:
>> Not too sure if I follow this non-greedy match, but it works great

The default behavior of the * metacharacter is to try to "fit as much as it can". What it can consume depends on the preceding character (or, more precisely, class). Examples will show it better, I think:

regexp : "start(.*)stop"

string : "this is a start"
matched : (nothing)

string : "this is a start and this is a stop"
matched : " and this is a "
(quite normal)

string : "this is a start and this is a stop and there, just for fun, another stop"
matched : " and this is a stop and there, just for fun, another "
(the matcher took as much as it could)

string : "this is a start and this is a stop and there\n, just for fun, another stop"
matched : " and this is a "
(the matcher could not get past the \n end-of-line marker because, by default, the (.) class does not match an eol character)

Now we change to :

regexp : "start(.*?)stop"

string : "this is a start and this is a stop and there, just for fun, another stop"
matched : " and this is a "
(because we specified *? instead of * alone, the matcher stopped at the first occurrence of "stop")
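
The examples above can be run as a small Python sketch. The helper captured() and the variable names are hypothetical, introduced only for this illustration. The last line goes slightly beyond the examples: in Python, the re.DOTALL flag lets the dot match newlines too, so the greedy pattern runs past the \n again.

```python
import re

greedy = re.compile(r'start(.*)stop')
lazy = re.compile(r'start(.*?)stop')

def captured(pattern, s):
    """Return the text captured by group 1, or None if nothing matches."""
    m = pattern.search(s)
    return m.group(1) if m else None

no_stop = 'this is a start'
one_stop = 'this is a start and this is a stop'
two_stops = 'this is a start and this is a stop and there, just for fun, another stop'
with_newline = 'this is a start and this is a stop and there\n, just for fun, another stop'

print(repr(captured(greedy, no_stop)))       # None
print(repr(captured(greedy, one_stop)))      # ' and this is a '
print(repr(captured(greedy, two_stops)))     # ' and this is a stop and there, just for fun, another '
print(repr(captured(greedy, with_newline)))  # ' and this is a '
print(repr(captured(lazy, two_stops)))       # ' and this is a '

# With re.DOTALL the dot also matches '\n', so greedy runs to the last "stop".
greedy_dotall = re.compile(r'start(.*)stop', re.DOTALL)
print(repr(captured(greedy_dotall, with_newline)))
```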

Hope this explains.
 
joegassAuthor Commented:
Right - makes more sense when I think of it as you describe - "fit as much as it can"
I was unaware that adding a ? makes it stop at the first occurrence.
I'll add this to my (slowly) growing regex knowledge
Thanks very much for all your time
Question has a verified solution.