Regular expression required.

I have the following regexp ...

'\<a href\=\"/scan\.asp\?page\=title\&r\=R2\&title\=(\d*)\".*\>(.*)\</a\>',

This retrieves the title (which is a number) and the name (which is the link text, not the URL).

This is fine.

How do I modify this so that it only matches when the &r parameter is NOT R2? It can be MANY other things, may be empty, and may be longer than 2 characters.

Richard.


 
Richard Quadling (Senior Software Developer, Author) commented:
I know I could ...

'\<a href\=\"/scan\.asp\?page\=title\&r\=(.*)\&title\=(\d*)\".*\>(.*)\</a\>',

But that would not meet the requirement. It would return a single array with all the information for both R2s and non-R2s in it.


 
Richard Quadling (Senior Software Developer, Author) commented:
Ha.

'\<a href\=\"/scan\.asp\?page\=title\&r\=(?!R2)(.*)(.*)\&title\=(\d*)\".*\>(.*)\</a\>',


seems to do the trick.

Any comments, explanations (I was just trying everything I could think of), better ways?

Free points!!!
 
Richard Quadling (Senior Software Developer, Author) commented:
'\<a href\=\"/scan\.asp\?page\=title\&r\=(?!R2)(.*)\&title\=(\d*)\".*\>(.*)\</a\>',

Oops. Cut and paste overload!
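
To see what the (?!R2) version accepts and rejects, here is a minimal sketch in Python. The language and the sample lines are assumptions for illustration only (the thread never names the host language, and no real input is shown); the pattern is the one above with the unneeded backslashes dropped. A negative lookahead consumes no characters: it only checks that the text right after "&r=" does not begin with "R2", so values that merely start with R2 (such as R25) are rejected as well.

```python
import re

# The pattern from the post, minus escapes that are not required.
PATTERN = re.compile(
    r'<a href="/scan\.asp\?page=title&r=(?!R2)(.*)&title=(\d*)".*>(.*)</a>'
)

# Hypothetical sample lines, invented for this illustration.
SAMPLES = [
    '<a href="/scan.asp?page=title&r=R2&title=101">Alpha</a>',
    '<a href="/scan.asp?page=title&r=XYZ&title=102">Beta</a>',
    '<a href="/scan.asp?page=title&r=&title=103">Gamma</a>',
    '<a href="/scan.asp?page=title&r=R25&title=104">Delta</a>',
]


def extract(lines):
    """Return (r, title, link_text) for every line the pattern accepts."""
    results = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            results.append(m.groups())
    return results


matches = extract(SAMPLES)
for r, title, text in matches:
    print(repr(r), title, text)
```

Only the XYZ and empty-r lines come back; the R2 line and the R25 line are both filtered out by the lookahead.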
 
holli commented:
'\<a href\=\"/scan\.asp\?page\=title\&r\=(.*)\&title\=(\d*)\".*\>(.*)\</a\>'

will cause the regex engine to backtrack, because the .* first consumes the rest of the string and then backtracks until it finds the & before title.

avoid this using:
'\<a href\=\"/scan\.asp\?page\=title\&r\=(.*)\&title\=(\d*)\".*\>(.*)\</a\>'

or just better:
'\<a href\=\"/scan\.asp\?page\=title\&r\=[^&]\&title\=(\d*)\".*\>(.*)\</a\>'


Lookaheads and lookbehinds such as (?!R2) also add some matching cost and should be avoided when they are not necessary.


holli
 
holli commented:
avoid this using:
'\<a href\=\"/scan\.asp\?page\=title\&r\=(.*?)\&title\=(\d*)\".*\>(.*)\</a\>'

sorry, just got up.
 
holli commented:
or just better:
'\<a href\=\"/scan\.asp\?page\=title\&r\=[^&]*\&title\=(\d*)\".*\>(.*)\</a\>'
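
The difference between the greedy (.*), the lazy (.*?) and the negated class [^&]* can be sketched as follows. This is an illustration in Python under stated assumptions: only the r-parameter fragment of the pattern is used, and the input line (two links on one line) is hypothetical. The greedy form runs to the end of the text and backs up to the last "&title=", the lazy form stops at the first one, and the negated class cannot cross an "&" at all.

```python
import re

# Only the r-parameter fragment of the pattern, in the three forms discussed.
GREEDY = re.compile(r'&r=(.*)&title=(\d*)"')
LAZY = re.compile(r'&r=(.*?)&title=(\d*)"')
NEGATED = re.compile(r'&r=([^&]*)&title=(\d*)"')

# Hypothetical input with two links on the same line.
LINE = (
    '<a href="/scan.asp?page=title&r=A&title=1">One</a> '
    '<a href="/scan.asp?page=title&r=B&title=2">Two</a>'
)


def first_match(pattern, text):
    """Return the captured groups of the first match, or None."""
    m = pattern.search(text)
    return m.groups() if m else None


greedy = first_match(GREEDY, LINE)    # r runs on into the second link
lazy = first_match(LAZY, LINE)        # r stops at the first "&title="
negated = first_match(NEGATED, LINE)  # r can never contain "&"

print(greedy)
print(lazy)
print(negated)
```

With one link per line all three capture the same values; the forms only diverge when a later "&title=" is reachable, which is exactly when the greedy version has the most backtracking to do.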
 
Richard Quadling (Senior Software Developer, Author) commented:
How do the examples you give reject &r=R2?

 
holli commented:
You can capture the value of the parameter with parentheses and then check the captured value, e.g.:

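if ( $line =~ m'\<a href\=\"/scan\.asp\?page\=title\&r\=([^&]*)\&title\=(\d*)\".*\>(.*)\</a\>' )
{
  # $1 = value of r, $2 = title number, $3 = link text.
  # The * sits inside the group so that $1 holds the whole value, not just its last character.
  my ($r, $title, $link) = ($1, $2, $3);
  if ($r eq "R2") { ... } else { ... }
}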

This is also more readable.

However, I did not notice the "not R2" part of your question (as I said, I just got up). Thinking about it again, the lookahead is suitable here if you do not want to use a construct like the one above.
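
The two ideas can also be combined. The sketch below, again in Python with hypothetical sample lines, pairs the negated class with a lookahead that includes the closing "&". It assumes that "NOT R2" means the value is not exactly R2, which the thread does not confirm; under that assumption a value such as R25 is kept, whereas a bare (?!R2) would drop it.

```python
import re

# (?!R2&) rejects only the exact value R2; [^&]* then captures the value
# without ever running past the next "&".
PATTERN = re.compile(
    r'<a href="/scan\.asp\?page=title&r=(?!R2&)([^&]*)&title=(\d*)".*>(.*)</a>'
)


def parse(line):
    """Return (r, title, link_text), or None if r is exactly R2 or no link matches."""
    m = PATTERN.search(line)
    return m.groups() if m else None


# Hypothetical sample lines, invented for this illustration.
r2_line = '<a href="/scan.asp?page=title&r=R2&title=101">Alpha</a>'
r25_line = '<a href="/scan.asp?page=title&r=R25&title=104">Delta</a>'
empty_line = '<a href="/scan.asp?page=title&r=&title=103">Gamma</a>'

print(parse(r2_line))
print(parse(r25_line))
print(parse(empty_line))
```

If values that merely begin with R2 should be excluded too, the shorter (?!R2) is the form to keep.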

 
Richard Quadling (Senior Software Developer, Author) commented:
I'm happy with the "look ahead" (i.e. it works), but I'm not totally sure what is happening with it.

But it works.

Thanks for your comments.

Richard.