Regular expression required.

I have the following regexp ...

'\<a href\=\"/scan\.asp\?page\=title\&r\=R2\&title\=(\d*)\".*\>(.*)\</a\>',

This retrieves the title (which is a number) and the name (which is the link text, not the URL).

This is fine.

How do I modify this so that the &r parameter is NOT R2? It can be MANY other things, may be empty, and may be longer than 2 characters.

Richard.

Richard Quadling

ASKER

I know I could ...

'\<a href\=\"/scan\.asp\?page\=title\&r\=(.*)\&title\=(\d*)\".*\>(.*)\</a\>',

But that would not meet the requirement. It would result in a single array with all the information for both R2s and non-R2s in it.

Richard Quadling

ASKER

Ha.

'\<a href\=\"/scan\.asp\?page\=title\&r\=(?!R2)(.*)(.*)\&title\=(\d*)\".*\>(.*)\</a\>',

seems to do the trick.

Any comments, explanations (I was just trying everything I could think of), better ways?

Free points!!!

Richard Quadling

ASKER

'\<a href\=\"/scan\.asp\?page\=title\&r\=(?!R2)(.*)\&title\=(\d*)\".*\>(.*)\</a\>',

Oops. Cut and paste overload!
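
For anyone wondering what the (?!R2) is doing: it is a zero-width negative lookahead, so it consumes nothing and simply refuses to continue if the next two characters are R2. Below is a minimal sketch in Python (the thread never says which language the pattern runs in, so Python's re module is an assumption, and the sample links are invented for illustration). The unnecessary backslashes from the original pattern are dropped.

```python
import re

# The pattern from the post above, rewritten for Python's re module.
# (?!R2) is a zero-width negative lookahead: it fails the match when the
# text right after "r=" starts with "R2", and consumes no characters.
PATTERN = re.compile(
    r'<a href="/scan\.asp\?page=title&r=(?!R2)(.*)&title=(\d*)".*>(.*)</a>'
)

# Hypothetical sample links, not taken from the real page.
SAMPLES = [
    '<a href="/scan.asp?page=title&r=R2&title=101">Alpha</a>',
    '<a href="/scan.asp?page=title&r=XY7&title=102">Beta</a>',
    '<a href="/scan.asp?page=title&r=&title=103">Gamma</a>',
    '<a href="/scan.asp?page=title&r=R25&title=104">Delta</a>',
]


def extract(link):
    """Return (r, title, name) for a matching link, or None."""
    match = PATTERN.search(link)
    return match.groups() if match else None


results = [extract(link) for link in SAMPLES]
for link, result in zip(SAMPLES, results):
    print(result, "<-", link)
```

Note that the R25 link is rejected as well, because the lookahead only checks that the value does not start with R2. Whether that matters depends on what values r can take.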

holli

'\<a href\=\"/scan\.asp\?page\=title\&r\=(.*)\&title\=(\d*)\".*\>(.*)\</a\>'

This will cause the regex engine to backtrack, because the .* reads in the whole string to the end and then backtracks to find the & before title.

avoid this using:
'\<a href\=\"/scan\.asp\?page\=title\&r\=(.*)\&title\=(\d*)\".*\>(.*)\</a\>'

or just better:
'\<a href\=\"/scan\.asp\?page\=title\&r\=[^&]\&title\=(\d*)\".*\>(.*)\</a\>'

Lookaheads/lookbehinds such as (?!R2) are also somewhat time-consuming and should be avoided if not necessary.

holli

holli

avoid this using:
'\<a href\=\"/scan\.asp\?page\=title\&r\=(.*?)\&title\=(\d*)\".*\>(.*)\</a\>'

sorry, just got up.

holli

or just better:
'\<a href\=\"/scan\.asp\?page\=title\&r\=[^&]*\&title\=(\d*)\".*\>(.*)\</a\>'
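
The difference between the three forms is easiest to see when a line holds more than one link. The sketch below uses Python's re module (an assumption, the thread does not name a language) and a shortened pattern covering only the r and title parameters, so that the trailing .* in the full pattern does not muddy the comparison. The sample line is invented, and the parentheses around [^&]* are added here only so the value can be inspected.

```python
import re

# Shortened, illustrative patterns: only the r= and title= parts.
GREEDY = re.compile(r'&r=(.*)&title=(\d*)"')      # .* runs to the end, then backtracks
LAZY = re.compile(r'&r=(.*?)&title=(\d*)"')       # .*? grows one character at a time
NEGATED = re.compile(r'&r=([^&]*)&title=(\d*)"')  # can never run past the next &

# Hypothetical line containing two links.
LINE = (
    '<a href="/scan.asp?page=title&r=A&title=1">One</a> '
    '<a href="/scan.asp?page=title&r=B&title=2">Two</a>'
)

greedy_groups = GREEDY.search(LINE).groups()
lazy_groups = LAZY.search(LINE).groups()
negated_groups = NEGATED.search(LINE).groups()

print("greedy :", greedy_groups)   # r swallows everything up to the LAST &title=
print("lazy   :", lazy_groups)
print("negated:", negated_groups)
```

The greedy version pairs the first r= with the last &title= on the line, while the lazy and negated-class versions both stop at the first one. The negated class has the extra advantage that it cannot cross an & at all, so there is nothing for the engine to give back.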

Richard Quadling

ASKER

How does the example you give reject &r=R2?
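
As far as I can tell it doesn't: [^&]* on its own accepts any value, R2 included. A quick check in Python (assumed language, invented sample links) is sketched below, together with one possible combination of the lookahead and the negated class. The (?!R2&) form is my own variation, not something posted above; including the & makes it reject exactly R2 while still allowing longer values such as R25.

```python
import re

# The negated-class pattern as posted: no lookahead, so nothing excludes R2.
NO_FILTER = re.compile(
    r'<a href="/scan\.asp\?page=title&r=[^&]*&title=(\d*)".*>(.*)</a>'
)

# Assumed combination: the lookahead rejects exactly "R2" (followed by &),
# then the negated class consumes the value without over-running.
FILTERED = re.compile(
    r'<a href="/scan\.asp\?page=title&r=(?!R2&)[^&]*&title=(\d*)".*>(.*)</a>'
)

# Hypothetical sample links.
R2_LINK = '<a href="/scan.asp?page=title&r=R2&title=101">Alpha</a>'
R25_LINK = '<a href="/scan.asp?page=title&r=R25&title=104">Delta</a>'

unfiltered_r2 = NO_FILTER.search(R2_LINK)
filtered_r2 = FILTERED.search(R2_LINK)
filtered_r25 = FILTERED.search(R25_LINK)

print("no filter, R2 :", unfiltered_r2.groups() if unfiltered_r2 else None)
print("filtered, R2  :", filtered_r2.groups() if filtered_r2 else None)
print("filtered, R25 :", filtered_r25.groups() if filtered_r25 else None)
```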

ASKER CERTIFIED SOLUTION

holli

Richard Quadling

ASKER

I'm happy with the "look ahead" (i.e. it works), but I'm not totally sure what is happening with it.

But it works.

Thanks for your comments.

Richard.