Solved

regex pattern not quite right....

Posted on 2003-10-23
Last Modified: 2010-04-17
I have a block of text like this:
"
just a test advert phone <!-- param1: 999 param2: 3  -->&lt;<a href="javascript:;">Secure&nbsp;Number</a>&gt; or <!-- param1: 123  param2: 3  -->&lt;&nbsp;<a href="javascript:;">Secure&nbsp;Number</a>&gt;
"

I'm attempting to grab 2 matches here, isolating 2 parameters (param1 and param2) in each match.
Here's my pattern:
<!-- param1:\s*(?<phone>.*)param2:\s*(?<expiry>.*)\s*-->.*Secure&nbsp;Number(</a>)?&gt;

My first match isn't ending at the first &gt;, it is ending at the last one. For example, the 1st match is:
<!-- param1: 999 param2: 3  -->&lt;<a href="javascript:;">Secure&nbsp;Number</a>&gt; or <!-- param1: 123  param2: 3  -->&lt;&nbsp;<a href="javascript:;">Secure&nbsp;Number</a>&gt;

when it should be:
<!-- param1: 999 param2: 3  -->&lt;<a href="javascript:;">Secure&nbsp;Number</a>&gt;

When I place a carriage return between the 2 parts it works fine.  How can I make the first match stop at the first &gt; ?
Thanks
Question by:joegass
 
LVL 9

Expert Comment

by:malharone
ID: 9609496
First of all, you'll find http://www.codeproject.com/dotnet/Expresso.asp useful
 
LVL 9

Expert Comment

by:malharone
ID: 9609803
and second of all ....
(</?a>)? (<!-- \s* (param\d+: \s* (\d+) \s*)+ --\> &lt?;?)+ .*? Secure? &nbsp;Number(</?a>)? &gt; (\s* or \s*)?


hope this helps
 
LVL 2

Author Comment

by:joegass
ID: 9641321
Sorry for my delay in getting back to you!
Thanks for the link to expresso
I'm still not having any luck with that expression, I need to name 2 groups in my regex, 1 called phone the other called expiry

<!-- param1:\s*(?<phone>.*)param2:\s*(?<expiry>.*)\s*-->.*Secure&nbsp;Number(</a>)?&gt;

The one above works OK when it is the only match on the line, but if there isn't a line break between the patterns in the text, it fails to end at the first &gt; and seems to skip to the 2nd one

Thanks for your help
 
LVL 2

Author Comment

by:joegass
ID: 9641390
I think the crux of the problem is my use of .* in the middle of the expression

trying a very simple example
string = just a test advert phone Secure&nbsp;Number</a>1&gt; or Secure&nbsp;Number</a>2&gt;
pattern = (Secure&nbsp;Number){1,1}(</a>)?.*&gt;

Still returns a match of
Secure&nbsp;Number</a>1&gt; or Secure&nbsp;Number</a>2&gt;

When I'm expecting it to have "secure number" only once

I'm trying to use some form of wild card in the middle as the bit in my first example "<a href="javascript:;">" may include other parameters in it e.g. "<a href="javascript:;" onclick="window.open('/test/test/secureNumbers.htm','test','toolbar=no,location=no,status=no,menubar=no,scrollbars=yes,resizable=yes,width=400,height=450')" class="yacLink">"
These parameters being variable and optional
Looks like I'm not being specific enough
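
The simplified case above can be reproduced with a short script. This is only a sketch and it assumes Python's `re` module, which the thread itself does not use (the named-group syntax in the question suggests a .NET-style engine), but the greedy behaviour of `.*` is the same: it runs to the end of the line and then backs up only as far as the last &gt;.

```python
import re

# Test string and pattern copied from the simplified example above.
text = 'just a test advert phone Secure&nbsp;Number</a>1&gt; or Secure&nbsp;Number</a>2&gt;'
pattern = r'(Secure&nbsp;Number){1,1}(</a>)?.*&gt;'

match = re.search(pattern, text)
matched_text = match.group(0)

# Count how many times the marker text appears inside the single match.
marker_count = matched_text.count('Secure&nbsp;Number')

print(matched_text)
print(marker_count)  # 2: the one match swallowed both occurrences
```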
 
LVL 16

Accepted Solution

by:
_nn_ earned 250 total points
ID: 9641463
From malharone's contribution, I infer that making the * metacharacter non-greedy by appending a ? is supported. So maybe the following would work:

<!-- param1:\s*(?<phone>.*?)\s*param2:\s*(?<expiry>.*?)\s*-->.*?Secure&nbsp;Number(</a>)?&gt;

I guess the reason the original one works when there's a line break is that, by default, the dot does not match end-of-line characters, so the pattern matcher was forced to find a match within the first line only.
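
As a rough check of the pattern above against the sample text from the question, here is a sketch in Python. Two assumptions: the thread's engine is not Python, and Python's `re` module spells named groups `(?P<name>...)` rather than `(?<name>...)`, so the group syntax is adjusted accordingly. Everything else is copied from the thread.

```python
import re

# Sample text from the question, kept on a single line (no line break).
text = (
    'just a test advert phone <!-- param1: 999 param2: 3  -->&lt;'
    '<a href="javascript:;">Secure&nbsp;Number</a>&gt; or '
    '<!-- param1: 123  param2: 3  -->&lt;&nbsp;'
    '<a href="javascript:;">Secure&nbsp;Number</a>&gt;'
)

# Original (greedy) pattern, with named groups in Python's (?P<name>...) form.
greedy = re.compile(
    r'<!-- param1:\s*(?P<phone>.*)param2:\s*(?P<expiry>.*)\s*-->'
    r'.*Secure&nbsp;Number(</a>)?&gt;'
)

# Non-greedy version: every .* becomes .*? so each part stops as early as it can.
lazy = re.compile(
    r'<!-- param1:\s*(?P<phone>.*?)\s*param2:\s*(?P<expiry>.*?)\s*-->'
    r'.*?Secure&nbsp;Number(</a>)?&gt;'
)

greedy_count = len(greedy.findall(text))
results = [(m.group('phone'), m.group('expiry')) for m in lazy.finditer(text)]

print(greedy_count)  # 1: a single match spanning both adverts
print(results)       # [('999', '3'), ('123', '3')]
```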
 
LVL 2

Author Comment

by:joegass
ID: 9641847
That did the trick excellent - thank you
Not too sure if I follow this non-greedy match, but it works great

Thanks to malharone for your help too
 
LVL 16

Expert Comment

by:_nn_
ID: 9642051
>> Not too sure if I follow this non-greedy match, but it works great

The standard behavior of the * metacharacter is to try to "fit as much as it can". What it can consume depends on the preceding character (or, more precisely, class). Examples will show this better, I think:

regexp : "start(.*)stop"

string : "this is a start"
matched : (nothing)

string : "this is a start and this is a stop"
matched : " and this is a "
(quite normal)

string : "this is a start and this is a stop and there, just for fun, another stop"
matched : " and this is a stop and there, just for fun, another "
(the matcher took as much as it could)

string : "this is a start and this is a stop and there\n, just for fun, another stop"
matched : " and this is a "
(the matcher could not get past the \n end-of-line marker because the (.) class does not match an eol character)

Now we change to :

regexp : "start(.*?)stop"

string : "this is a start and this is a stop and there, just for fun, another stop"
matched : " and this is a "
(because we specified *? instead of * alone, the matcher stopped at the first occurrence of "stop")
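
The examples above can be run as a small script. This sketch assumes Python's `re` module rather than whatever engine the thread uses; the helper name `captured` is made up for illustration. The last line uses Python's `re.DOTALL` flag, which lets the dot match newlines, to show that the line-break case behaves differently only because of the dot.

```python
import re

greedy = re.compile(r'start(.*)stop')
lazy = re.compile(r'start(.*?)stop')

def captured(pattern, s):
    """Return the text captured between 'start' and 'stop', or None if no match."""
    m = pattern.search(s)
    return m.group(1) if m else None

no_stop = 'this is a start'
one_stop = 'this is a start and this is a stop'
two_stops = 'this is a start and this is a stop and there, just for fun, another stop'
with_newline = 'this is a start and this is a stop and there\n, just for fun, another stop'

print(captured(greedy, no_stop))             # None
print(repr(captured(greedy, one_stop)))      # ' and this is a '
print(repr(captured(greedy, two_stops)))     # runs on to the last "stop"
print(repr(captured(greedy, with_newline)))  # ' and this is a ' (dot stops at \n)
print(repr(captured(lazy, two_stops)))       # ' and this is a '

# With DOTALL the dot also matches \n, so the greedy match crosses the line break.
dotall_capture = captured(re.compile(r'start(.*)stop', re.DOTALL), with_newline)
print(repr(dotall_capture))
```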

Hope this explains.
 
LVL 2

Author Comment

by:joegass
ID: 9642115
Right - makes more sense when I think of it as you describe - "fit as much as it can"
I was unaware that adding a ? makes it stop at the first occurrence
I'll add this to my (slowly) growing regex knowledge
Thanks very much for all your time
