Richard Quadling
asked on
Regular expression required.
I have the following regexp ...
'\<a href\=\"/scan\.asp\?page\=title\&r\=R2\&title\=(\d*)\".*\>(.*)\</a\>',
This retrieves the title (which is a number) and the name (which is the link, not the URL).
This is fine.
How do I modify this so that the &r parameter is NOT R2? It can be MANY other things; it may be empty and may be longer than 2 characters.
Richard.
ASKER
Ha.
'\<a href\=\"/scan\.asp\?page\=title\&r\=(?!R2)(.*)(.*)\&title\=(\d*)\".*\>(.*)\</a\>',
seems to do the trick.
Any comments, explanations (I was just trying everything I could think of), better ways?
Free points!!!
ASKER
'\<a href\=\"/scan\.asp\?page\=title\&r\=(?!R2)(.*)\&title\=(\d*)\".*\>(.*)\</a\>',
Oops. Cut and paste overload!
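On the request for an explanation: `(?!R2)` is a negative lookahead. It consumes no characters; it only asserts that the text immediately after `r=` does not begin with `R2`. Here is a minimal Python sketch of that behaviour. The sample links and variable names are made up for illustration, and the pattern is written with fewer backslashes than the original. One side effect it appears to show: values that merely start with `R2` (such as `R2D2`) are rejected as well. Writing `(?!R2&)` instead limits the rejection to exactly `R2`.

```python
import re

# Hypothetical sample links, modelled on the URL shape in the question.
links = [
    '<a href="/scan.asp?page=title&r=R2&title=101" class="x">Alpha</a>',
    '<a href="/scan.asp?page=title&r=R5&title=102" class="x">Beta</a>',
    '<a href="/scan.asp?page=title&r=&title=103" class="x">Gamma</a>',
    '<a href="/scan.asp?page=title&r=R2D2&title=104" class="x">Delta</a>',
]

prefix = r'<a href="/scan\.asp\?page=title&r='
suffix = r'&title=(\d*)".*>(.*)</a>'

# (?!R2) fails whenever the r value *starts with* R2.
starts_with = re.compile(prefix + r'(?!R2)(.*)' + suffix)
# (?!R2&) fails only when the r value is exactly R2.
exactly = re.compile(prefix + r'(?!R2&)(.*)' + suffix)

def titles(pattern):
    """Return the captured title number of every link the pattern accepts."""
    return [m.group(2) for m in map(pattern.search, links) if m]

print(titles(starts_with))
print(titles(exactly))
```

Group 1 is the `r` value, group 2 the title number and group 3 the link text, so adding the capture around `r` shifts the group numbers of the original pattern by one.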
'\<a href\=\"/scan\.asp\?page\=title\&r\=(.*)\&title\=(\d*)\".*\>(.*)\</a\>'
will cause the regex engine to backtrack: the greedy .* first reads the whole string to the end and then backs up until it finds the & before title.
avoid this using:
'\<a href\=\"/scan\.asp\?page\=title\&r\=(.*)\&title\=(\d*)\".*\>(.*)\</a\>'
or just better:
'\<a href\=\"/scan\.asp\?page\=title\&r\=[^&]\&title\=(\d*)\".*\>(.*)\</a\>'
Lookaheads and lookbehinds such as (?!R2) are also somewhat time-consuming and should be avoided if not necessary.
holli
avoid this using:
'\<a href\=\"/scan\.asp\?page\=title\&r\=(.*?)\&title\=(\d*)\".*\>(.*)\</a\>'
sorry, just got up.
or just better:
'\<a href\=\"/scan\.asp\?page\=title\&r\=[^&]*\&title\=(\d*)\".*\>(.*)\</a\>'
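To make the greedy, lazy and negated-class variants concrete, here is a small Python sketch. The two-link sample string and the variable names are assumptions for illustration, and only the part of the pattern up to the closing quote is used. When two links sit on one line, the greedy `(.*)` runs on to the last `&title=`, while `(.*?)` and `[^&]*` both stop at the first one.

```python
import re

# Hypothetical input: two links on the same line.
html = ('<a href="/scan.asp?page=title&r=R5&title=102">Beta</a> '
        '<a href="/scan.asp?page=title&r=R7&title=105">Eta</a>')

prefix = r'<a href="/scan\.asp\?page=title&r='
tail = r'&title=(\d*)"'

greedy = re.search(prefix + r'(.*)' + tail, html)      # backtracks from the end
lazy = re.search(prefix + r'(.*?)' + tail, html)       # expands one char at a time
negated = re.search(prefix + r'([^&]*)' + tail, html)  # can never cross an &

print(greedy.group(1))   # spans both links
print(lazy.group(1))     # just the first r value
print(negated.group(1))  # just the first r value
```

The negated class is usually the tidiest of the three here, since it states directly that an `r` value cannot contain `&`.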
ASKER
How do the examples you give reject &r=R2?
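A quick check in Python suggests they do not: the negated-class pattern on its own has nothing in it that excludes `R2`, so the lookahead still has to be kept in front of it. The sample links and names below are hypothetical, and `(?!R2&)` is one possible way of combining the two ideas, not something posted in this thread.

```python
import re

# Hypothetical sample links.
r2_link = '<a href="/scan.asp?page=title&r=R2&title=101">Alpha</a>'
r5_link = '<a href="/scan.asp?page=title&r=R5&title=102">Beta</a>'

prefix = r'<a href="/scan\.asp\?page=title&r='
tail = r'&title=(\d*)".*>(.*)</a>'

# Negated class only: cheaper matching, but no rejection of R2.
plain = re.compile(prefix + r'[^&]*' + tail)
# Lookahead plus negated class: rejects exactly R2, accepts everything else.
guarded = re.compile(prefix + r'(?!R2&)[^&]*' + tail)

plain_hits_r2 = plain.search(r2_link) is not None
guarded_hits_r2 = guarded.search(r2_link) is not None
guarded_hits_r5 = guarded.search(r5_link) is not None

print(plain_hits_r2, guarded_hits_r2, guarded_hits_r5)
```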
ASKER CERTIFIED SOLUTION
ASKER
I'm happy with the "look ahead" (i.e. it works), but I'm not totally sure what is happening with it.
But it works.
Thanks for your comments.
Richard.
ASKER
But that would not be the requirement. This would result in a single array with all the information for R2s and non R2s in it.