Regex pattern not quite right

Posted on 2003-10-23
Last Modified: 2010-04-17
I have a block of text like this:
"
just a test advert phone <!-- param1: 999 param2: 3  -->&lt;<a href="javascript:;">Secure&nbsp;Number</a>&gt; or <!-- param1: 123  param2: 3  -->&lt;&nbsp;<a href="javascript:;">Secure&nbsp;Number</a>&gt;
"

I'm attempting to get two matches here, each capturing two parameters at the same time: param1 and param2.
Here's my pattern:
<!-- param1:\s*(?<phone>.*)param2:\s*(?<expiry>.*)\s*-->.*Secure&nbsp;Number(</a>)?&gt;

My first match doesn't end at the first &gt; but at the last one. For example, the first match is:
<!-- param1: 999 param2: 3  -->&lt;<a href="javascript:;">Secure&nbsp;Number</a>&gt; or <!-- param1: 123  param2: 3  -->&lt;&nbsp;<a href="javascript:;">Secure&nbsp;Number</a>&gt;

when it should be:
<!-- param1: 999 param2: 3  -->&lt;<a href="javascript:;">Secure&nbsp;Number</a>&gt;

When I put a carriage return between the two parts, it works fine. How can I make the first match stop at the first &gt;?
Thanks
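For reference, here is a minimal Python sketch that reproduces the behaviour described above. It assumes Python's `re` module handles `.*` and `.` the same way as the .NET engine the thread appears to use; the only change to the pattern is Python's `(?P<name>...)` spelling for named groups. `SAMPLE` and `GREEDY` are illustrative names, not from the thread.

```python
import re

# The question's sample text, kept on a single line.
SAMPLE = (
    'just a test advert phone <!-- param1: 999 param2: 3  -->&lt;'
    '<a href="javascript:;">Secure&nbsp;Number</a>&gt; or '
    '<!-- param1: 123  param2: 3  -->&lt;&nbsp;'
    '<a href="javascript:;">Secure&nbsp;Number</a>&gt;'
)

# The original pattern; only the named-group syntax differs from .NET.
GREEDY = re.compile(
    r'<!-- param1:\s*(?P<phone>.*)param2:\s*(?P<expiry>.*)\s*-->'
    r'.*Secure&nbsp;Number(</a>)?&gt;'
)

matches = list(GREEDY.finditer(SAMPLE))
print(len(matches))  # 1

# Each greedy .* first runs to the end of the line and gives characters back
# only until the rest of the pattern fits, so the single match ends at the
# LAST &gt; in the text.
print(matches[0].group(0) == SAMPLE[SAMPLE.index('<!--'):])  # True

# The phone group has swallowed the first "param2" and everything up to the
# second comment's "param2:".
print(matches[0].group('phone').startswith('999 param2'))  # True
```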
Question by:joegass

Expert Comment

by:malharone
ID: 9609496
First of all, you'll find Expresso (http://www.codeproject.com/dotnet/Expresso.asp) useful.

Expert Comment

by:malharone
ID: 9609803
And second of all:
(</?a>)? (<!-- \s* (param\d+: \s* (\d+) \s*)+ --\> &lt?;?)+ .*? Secure? &nbsp;Number(</?a>)? &gt; (\s* or \s*)?

Hope this helps.
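A rough Python check of the pattern above, assuming its embedded spaces are meant to be ignored (as with .NET's `RegexOptions.IgnorePatternWhitespace`, which Python's `re.VERBOSE` approximates). In this sketch each match does stop at the first `&gt;`, but the numbers are captured by one repeated group, and Python keeps only that group's last repetition, so `999` and `123` cannot be read back from a single group. (.NET additionally exposes earlier repetitions through `Group.Captures`.) `SAMPLE` and `MALHARONE` are illustrative names.

```python
import re

SAMPLE = (
    'just a test advert phone <!-- param1: 999 param2: 3  -->&lt;'
    '<a href="javascript:;">Secure&nbsp;Number</a>&gt; or '
    '<!-- param1: 123  param2: 3  -->&lt;&nbsp;'
    '<a href="javascript:;">Secure&nbsp;Number</a>&gt;'
)

# malharone's pattern, copied unchanged; re.VERBOSE makes its spaces
# insignificant (an assumption about how it was meant to be run).
MALHARONE = re.compile(
    r'(</?a>)? (<!-- \s* (param\d+: \s* (\d+) \s*)+ --\> &lt?;?)+ .*? '
    r'Secure? &nbsp;Number(</?a>)? &gt; (\s* or \s*)?',
    re.VERBOSE,
)

ms = list(MALHARONE.finditer(SAMPLE))
for m in ms:
    # group(0) is the whole match; group(4) is the (\d+) group, which holds
    # only the value from the LAST repetition of (param\d+: ...)+
    print(repr(m.group(0)), m.group(4))
```

Both lines print `3` as the captured number, which is why a pattern with separately named groups is still needed.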

Author Comment

by:joegass
ID: 9641321
Sorry for the delay in getting back to you!
Thanks for the link to Expresso.
I'm still not having any luck with that expression. I need two named groups in my regex, one called phone and the other called expiry:

<!-- param1:\s*(?<phone>.*)param2:\s*(?<expiry>.*)\s*-->.*Secure&nbsp;Number(</a>)?&gt;

The one above works when it is the only match on the line, but if there isn't a line break between the patterns in the text, it fails to stop at the first &gt; and seems to skip to the second one.

Thanks for your help

Author Comment

by:joegass
ID: 9641390
I think the crux of the problem is my use of .* in the middle of the expression.

Trying a very simple example:
string = just a test advert phone Secure&nbsp;Number</a>1&gt; or Secure&nbsp;Number</a>2&gt;
pattern = (Secure&nbsp;Number){1,1}(</a>)?.*&gt;

It still returns a match of:
Secure&nbsp;Number</a>1&gt; or Secure&nbsp;Number</a>2&gt;

But I'm expecting each match to contain "Secure&nbsp;Number" only once.

I'm trying to use some form of wildcard in the middle because the bit in my first example, "<a href="javascript:;">", may include other attributes, e.g. "<a href="javascript:;" onclick="window.open('/test/test/secureNumbers.htm','test','toolbar=no,location=no,status=no,menubar=no,scrollbars=yes,resizable=yes,width=400,height=450')" class="yacLink">".
These attributes are variable and optional.
It looks like I'm not being specific enough.
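The simplified case can be checked with a short Python sketch, assuming Python's `re` treats `.*`, `.*?` and `{1,1}` the way the .NET engine does. It runs the simplified pattern above as written, then the same pattern with the wildcard made lazy (`.*?`), the change the accepted answer makes. `all_matches` is a hypothetical helper name.

```python
import re

TEXT = ('just a test advert phone Secure&nbsp;Number</a>1&gt; '
        'or Secure&nbsp;Number</a>2&gt;')

def all_matches(pattern, text):
    """Return the full text of every non-overlapping match."""
    return [m.group(0) for m in re.finditer(pattern, text)]

# {1,1} means "exactly once", which is already the default, so it does not
# limit how far the greedy .* that follows can run.
greedy = all_matches(r'(Secure&nbsp;Number){1,1}(</a>)?.*&gt;', TEXT)

# A lazy .*? stops at the first &gt; after each "Number".
lazy = all_matches(r'(Secure&nbsp;Number){1,1}(</a>)?.*?&gt;', TEXT)

print(greedy)  # ['Secure&nbsp;Number</a>1&gt; or Secure&nbsp;Number</a>2&gt;']
print(lazy)    # ['Secure&nbsp;Number</a>1&gt;', 'Secure&nbsp;Number</a>2&gt;']
```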

Accepted Solution

by:_nn_
ID: 9641463
From malharone's contribution, I infer that making the * metacharacter non-greedy by appending a ? is supported. So maybe the following would work:

<!-- param1:\s*(?<phone>.*?)\s*param2:\s*(?<expiry>.*?)\s*-->.*?Secure&nbsp;Number(</a>)?&gt;

I guess the reason the original one works when there's a line break is that in standard regexp the dot does not match end-of-line characters, so the pattern matcher is forced to find a match within the first line only.
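A minimal Python check of the accepted pattern and of the line-break explanation, under the same assumption that Python's `re` behaves like .NET here (in both, `.` excludes `\n` by default). `LONG_TAG` is a shortened, hypothetical version of the longer anchor tag quoted earlier in the thread, and `pairs` is an illustrative helper.

```python
import re

SAMPLE = (
    'just a test advert phone <!-- param1: 999 param2: 3  -->&lt;'
    '<a href="javascript:;">Secure&nbsp;Number</a>&gt; or '
    '<!-- param1: 123  param2: 3  -->&lt;&nbsp;'
    '<a href="javascript:;">Secure&nbsp;Number</a>&gt;'
)

# The accepted pattern (lazy quantifiers), in Python's (?P<name>...) syntax.
LAZY = re.compile(
    r'<!-- param1:\s*(?P<phone>.*?)\s*param2:\s*(?P<expiry>.*?)\s*-->'
    r'.*?Secure&nbsp;Number(</a>)?&gt;'
)
# The original greedy pattern, for comparison.
GREEDY = re.compile(
    r'<!-- param1:\s*(?P<phone>.*)param2:\s*(?P<expiry>.*)\s*-->'
    r'.*Secure&nbsp;Number(</a>)?&gt;'
)

def pairs(pattern, text):
    """Return (phone, expiry) for every match."""
    return [(m['phone'], m['expiry']) for m in pattern.finditer(text)]

print(pairs(LAZY, SAMPLE))  # [('999', '3'), ('123', '3')]

# Hypothetical, shortened version of the longer <a ...> tag: the lazy .*?
# simply steps over the extra attributes.
LONG_TAG = '<a href="javascript:;" onclick="window.open(\'/test/test/secureNumbers.htm\')" class="yacLink">'
WITH_ATTRS = SAMPLE.replace('<a href="javascript:;">', LONG_TAG, 1)
print(pairs(LAZY, WITH_ATTRS))  # [('999', '3'), ('123', '3')]

# "." does not match "\n", so with a newline before " or " even the greedy
# pattern cannot run past the end of the first line.
TWO_LINES = SAMPLE.replace(' or ', '\n or ', 1)
print(pairs(GREEDY, TWO_LINES))  # [('999 ', '3  '), ('123  ', '3  ')]
```

The last line also happens to show why the accepted version adds `\s*` before `param2:` and makes the groups lazy: the greedy groups keep the trailing spaces.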

Author Comment

by:joegass
ID: 9641847
That did the trick, excellent, thank you!
Not too sure if I follow this non-greedy match, but it works great.

Thanks to malharone for your help too.

Expert Comment

by:_nn_
ID: 9642051
>> Not too sure if I follow this non-greedy match, but it works great

The standard behavior of the * metacharacter is to try to "fit as much as it can". What it can consume depends on the preceding character (or, more precisely, character class). Examples will show it better, I think:

regexp : "start(.*)stop"

string : "this is a start"
matched : (nothing)

string : "this is a start and this is a stop"
matched : " and this is a "
(quite normal)

string : "this is a start and this is a stop and there, just for fun, another stop"
matched : " and this is a stop and there, just for fun, another "
(the matcher took as much as it could)

string : "this is a start and this is a stop and there\n, just for fun, another stop"
matched : " and this is a "
(the matcher could not get past the \n end-of-line marker because the (.) class does not match an eol character)

Now we change to:

regexp : "start(.*?)stop"

string : "this is a start and this is a stop and there, just for fun, another stop"
matched : " and this is a "
(because we specified *? instead of * alone, the matcher stopped at the first occurrence of "stop")

Hope this explains it.
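The examples above can be reproduced with a short Python sketch, assuming Python's `re` treats `.`, `*` and `*?` the same way as the regex flavour discussed in the thread. The last line adds `re.DOTALL`, a Python flag not mentioned in the thread, which lets `.` match newlines too; .NET's counterpart is `RegexOptions.Singleline`. `captured` is an illustrative helper name.

```python
import re

GREEDY = re.compile(r'start(.*)stop')
LAZY = re.compile(r'start(.*?)stop')
# re.DOTALL makes "." match "\n" as well.
GREEDY_DOTALL = re.compile(r'start(.*)stop', re.DOTALL)

def captured(pattern, text):
    """Return group 1 of the first match, or None when nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None

s1 = 'this is a start'
s2 = 'this is a start and this is a stop'
s3 = 'this is a start and this is a stop and there, just for fun, another stop'
s4 = 'this is a start and this is a stop and there\n, just for fun, another stop'

print(captured(GREEDY, s1))                # None
print(repr(captured(GREEDY, s2)))          # ' and this is a '
print(repr(captured(GREEDY, s3)))          # ' and this is a stop and there, just for fun, another '
print(repr(captured(GREEDY, s4)))          # ' and this is a '
print(repr(captured(LAZY, s3)))            # ' and this is a '
print(repr(captured(GREEDY_DOTALL, s4)))   # ' and this is a stop and there\n, just for fun, another '
```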

Author Comment

by:joegass
ID: 9642115
Right, it makes more sense when I think of it as you describe it: "fit as much as it can".
I was unaware that adding a ? makes it stop at the first occurrence.
I'll add this to my (slowly) growing regex knowledge.
Thanks very much for all your time.