Solved

Regular expression required.

Posted on 2003-03-20
Last Modified: 2010-03-05
I have the following regexp ...

'\<a href\=\"/scan\.asp\?page\=title\&r\=R2\&title\=(\d*)\".*\>(.*)\</a\>',

This retrieves the title (which is a number) and the name (which is the link text, not the URL).

This is fine.

How do I modify this so that the &r parameter is NOT R2? It can be MANY other things, may be empty, and may be longer than 2 characters.

Richard.


Question by:Richard Quadling
LVL 40

Author Comment

by:Richard Quadling
ID: 8174597
I know I could ...

'\<a href\=\"/scan\.asp\?page\=title\&r\=(.*)\&title\=(\d*)\".*\>(.*)\</a\>',

But that would not meet the requirement. It would result in a single array holding the information for both R2s and non-R2s.


0
 
LVL 40

Author Comment

by:Richard Quadling
ID: 8174652
Ha.

'\<a href\=\"/scan\.asp\?page\=title\&r\=(?!R2)(.*)(.*)\&title\=(\d*)\".*\>(.*)\</a\>',


seems to do the trick.

Any comments, explanations (I was just trying everything I could think of), better ways?

Free points!!!
0
 
LVL 40

Author Comment

by:Richard Quadling
ID: 8174659
'\<a href\=\"/scan\.asp\?page\=title\&r\=(?!R2)(.*)\&title\=(\d*)\".*\>(.*)\</a\>',

Oops. Cut and paste overload!
0

 
LVL 6

Expert Comment

by:holli
ID: 8179284
'\<a href\=\"/scan\.asp\?page\=title\&r\=(.*)\&title\=(\d*)\".*\>(.*)\</a\>'

will cause the regex-machine to do backtracking because the .* will read in the whole string to the end and then backtrack to find the & before title.

avoid this using:
'\<a href\=\"/scan\.asp\?page\=title\&r\=(.*)\&title\=(\d*)\".*\>(.*)\</a\>'

or just better:
'\<a href\=\"/scan\.asp\?page\=title\&r\=[^&]\&title\=(\d*)\".*\>(.*)\</a\>'


lookaheads/lookbehinds such as (?!R2) are also somewhat time consuming and should be avoided when they are not necessary.


holli
0
 
LVL 6

Expert Comment

by:holli
ID: 8179285
avoid this using:
'\<a href\=\"/scan\.asp\?page\=title\&r\=(.*?)\&title\=(\d*)\".*\>(.*)\</a\>'

sorry, just got up.
0
 
LVL 6

Expert Comment

by:holli
ID: 8179288
or just better:
'\<a href\=\"/scan\.asp\?page\=title\&r\=[^&]*\&title\=(\d*)\".*\>(.*)\</a\>'
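
To make the difference between the three variants concrete, here is a minimal Perl sketch. The sample line and the helper name capture_r are made up for illustration, and the pattern is cut short at the closing quote of the href, so treat it as a demonstration under those assumptions rather than a drop-in replacement.

```perl
use strict;
use warnings;

# Hypothetical sample: two links on ONE line, which is where greedy .* goes wrong.
my $line = '<a href="/scan.asp?page=title&r=R2&title=11">First</a> '
         . '<a href="/scan.asp?page=title&r=XY&title=22">Second</a>';

# The three ways of matching the r value discussed in the thread.
my %r_pattern = (
    greedy  => '(.*)',      # runs to the end of the line, then backtracks
    lazy    => '(.*?)',     # grows one character at a time
    negated => '([^&]*)',   # can never run past the next "&"
);

# Returns (r value, title) for the first match, or an empty list.
sub capture_r {
    my ($r_subpattern, $text) = @_;
    my $re = qr{<a href="/scan\.asp\?page=title&r=$r_subpattern&title=(\d*)"};
    return $text =~ $re ? ($1, $2) : ();
}

for my $name (qw(greedy lazy negated)) {
    my ($r, $title) = capture_r($r_pattern{$name}, $line);
    printf "%-8s r=[%s] title=[%s]\n", $name, $r, $title;
}
```

With this sample, the greedy version captures everything from the first r value through to the second link's r value and reports title 22. The lazy and negated-class versions both capture R2 and 11. The negated class also refuses to let the r value contain an "&" at all, which the lazy version would still allow if that were the only way to reach "&title=".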
0
 
LVL 40

Author Comment

by:Richard Quadling
ID: 8179513
How do the examples you give reject &r=R2?

0
 
LVL 6

Accepted Solution

by:
holli earned 200 total points
ID: 8180202
you can capture the value of the parameter with parentheses and then check the captured value, e.g.:

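# the * must sit inside the parentheses, otherwise $1 holds only the last character of r
if ( $line =~ m'\<a href\=\"/scan\.asp\?page\=title\&r\=([^&]*)\&title\=(\d*)\".*\>(.*)\</a\>' )
{
  # $1 = r value, $2 = title number, $3 = link text
  my ($r, $title, $link) = ($1, $2, $3);
  if ($r eq "R2") { ... } else { ... };
}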

this is also more readable.

however, i did not notice the "not R2" part of your question (as i said, just got up). thinking about it again, the "look ahead" is suitable here if you do not want to use a construct like the one above.
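
For anyone unsure what the look ahead does: (?!R2) is a zero-width negative lookahead. At the position right after "r=" it checks that the next two characters are not "R2", consumes nothing, and the rest of the pattern then carries on from that same position. One consequence the thread does not mention is that it also rejects values that merely start with R2, such as R20. The Perl sketch below assumes that only the exact value R2 should be excluded and adds the "&" to the lookahead for that purpose. The helper names and sample links are hypothetical, and [^&]* and [^>]* are swapped in for the thread's .* so that a match stays inside one link.

```perl
use strict;
use warnings;

# The thread's lookahead: fails whenever the r value STARTS with "R2".
my $reject_prefix = qr{<a href="/scan\.asp\?page=title&r=(?!R2)([^&]*)&title=(\d*)"[^>]*>(.*?)</a>};

# Assumed refinement: including the "&" rejects only the exact value "R2".
my $reject_exact  = qr{<a href="/scan\.asp\?page=title&r=(?!R2&)([^&]*)&title=(\d*)"[^>]*>(.*?)</a>};

# Hypothetical helper that builds a sample link for a given r value.
sub sample_link {
    my ($r) = @_;
    return qq{<a href="/scan.asp?page=title&r=$r&title=42">Some title</a>};
}

# Returns 1 if the pattern accepts the link, 0 otherwise.
sub accepts {
    my ($re, $html) = @_;
    return $html =~ $re ? 1 : 0;
}

for my $r ('R2', 'R20', 'XY', '') {
    my $html = sample_link($r);
    printf "r=[%s]  prefix check: %d  exact check: %d\n",
        $r, accepts($reject_prefix, $html), accepts($reject_exact, $html);
}
```

Both patterns reject R2 and accept XY and the empty value. They differ only on R20, which the original lookahead rejects and the "R2&" variant accepts. Which behaviour is wanted depends on the data, so this is a choice for the asker rather than a correction.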

0
 
LVL 40

Author Comment

by:Richard Quadling
ID: 8180233
I'm happy with the "look ahead" (i.e. it works), but I'm not totally sure what is happening with it.

But it works.

Thanks for your comments.

Richard.
0
