Asked by curiouswebster

Questions about a RegEx used to analyze URLs

Question about a RegEx:

 @"[&|?](" + "myDomain.com" + ")=(.*?[^&]+)?";

What does this part require or prevent before the domain?

[&|?]

And what does this part require or prevent after the domain?

(.*?[^&]+)?

Thanks.
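
For anyone reading along, here is a minimal C# sketch of what the part before the name requires. It assumes the pattern is used unanchored with Regex.IsMatch; the class name PrefixDemo and the sample inputs are made up for illustration. Inside square brackets, &, | and ? are all literal characters, so the pattern needs one of those three immediately before "myDomain.com" and an = immediately after it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixDemo
{
    // Same pattern as in the question, assembled the same way.
    public const string Pattern = @"[&|?](" + "myDomain.com" + ")=(.*?[^&]+)?";

    // True when the pattern is found anywhere in the input.
    public static bool Matches(string input) => Regex.IsMatch(input, Pattern);

    public static void Main()
    {
        string[] samples =
        {
            "page?myDomain.com=abc",      // '?' right before the name: True
            "page?x=1&myDomain.com=abc",  // '&' right before the name: True
            "page|myDomain.com=abc",      // literal '|' right before the name: True
            "page/myDomain.com=abc",      // none of the three: False
        };

        foreach (var s in samples)
            Console.WriteLine($"{s} -> {Matches(s)}");
    }
}
```

Two side effects worth knowing: the match is case-sensitive unless RegexOptions.IgnoreCase is passed, and the unescaped dot in "myDomain.com" matches any character, not only a literal dot.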
SOLUTION
Dr. Klahn

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
curiouswebster (ASKER)

Ah, I was tied up thinking it was

& OR ?

The person who wrote this was focused on the query string parameters.

I have seen & and ? used with query string params, but I do not recall seeing the | sign used with them.

Then comes a capture group:

(.*?[^&]+)?

It looks like any number of characters, NOT containing a &.

Am I seeing that right?

And what does the trailing ? mean?

And the ? after .* means "lazy", but I am not sure what that means.
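
A small C# sketch may help with the lazy and trailing ? questions. It compares the posted capture group with a greedy variant (the greedy version is my own addition for contrast, and the class and method names are hypothetical). Lazy means .*? takes as few characters as it can and only grows when the rest of the pattern would otherwise fail; the trailing ? makes the whole group optional.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Posted pattern (lazy .*?) and a greedy variant for comparison.
    public const string Lazy = @"[&|?](myDomain.com)=(.*?[^&]+)?";
    public const string Greedy = @"[&|?](myDomain.com)=(.*[^&]+)?";

    // Returns what group 2 captured, or "" when the overall match fails
    // or the optional group did not take part in the match.
    public static string Value(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[2].Value : "";
    }

    public static void Main()
    {
        // Lazy: .*? takes nothing, [^&]+ stops at the next '&'.
        Console.WriteLine(Value(Lazy, "?myDomain.com=abc&x=1"));    // abc

        // Greedy: .* runs to the end, then gives back one character.
        Console.WriteLine(Value(Greedy, "?myDomain.com=abc&x=1"));  // abc&x=1

        // Empty value: .*? is forced to swallow the '&', because '.' matches it.
        Console.WriteLine(Value(Lazy, "?myDomain.com=&x=1"));       // &x=1

        // Nothing after '=': the optional group is skipped, the match still succeeds.
        Console.WriteLine(Regex.IsMatch("?myDomain.com=", Lazy));   // True
    }
}
```

So the group does usually capture "characters up to the next &", but the .*? in front lets it run past an & when the value is empty. If that is not wanted, ([^&]*) alone would be the stricter choice.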
ASKER CERTIFIED SOLUTION
This solution is only available to members.
SOLUTION
This solution is only available to members.
Awesome tool! I am still trying to get my arms around it, but this is THE BEST RegEx site I have seen!

What's the best flavor for me to use, given that my target platform is C#?
SOLUTION
This solution is only available to members.
Well, funny you should ask.

I had this super RegEx working to enforce that the domain was on a whitelist:

string testRegEx = @"^https?://(" + whitelistedRedirects + @")[^.].*/?((goto|returnurl)=https?://(" + whitelistedRedirects + @")[:|/].*)?";

but it enforced that sub-domains must also be white-listed. The whitelist was to look thusly:

whitelistedRedirects = "mydomain.org|sso.mydomain.org";
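
One thing to watch with a whitelist string like this: the dots are regex wildcards, so "mydomain.org" also matches "mydomainXorg". A minimal C# sketch of building the alternation with Regex.Escape follows. The helper names BuildAlternation and IsListed are hypothetical, and anchoring with ^ and $ plus RegexOptions.IgnoreCase are my assumptions for the demo.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WhitelistDemo
{
    // Escapes each host and joins them with '|', for example
    // "mydomain\.org|sso\.mydomain\.org".
    public static string BuildAlternation(params string[] hosts) =>
        string.Join("|", hosts.Select(h => Regex.Escape(h)));

    // True when the whole host string equals one of the alternatives.
    public static bool IsListed(string alternation, string host) =>
        Regex.IsMatch(host, "^(?:" + alternation + ")$", RegexOptions.IgnoreCase);

    public static void Main()
    {
        string raw = "mydomain.org|sso.mydomain.org";
        string escaped = BuildAlternation("mydomain.org", "sso.mydomain.org");

        Console.WriteLine(escaped);                               // mydomain\.org|sso\.mydomain\.org
        Console.WriteLine(IsListed(raw, "mydomainXorg"));         // True  (dot matched 'X')
        Console.WriteLine(IsListed(escaped, "mydomainXorg"));     // False
        Console.WriteLine(IsListed(escaped, "sso.mydomain.org")); // True
    }
}
```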

But I wanted to have a version that mandated only that "mydomain.org" was in the whitelist, when it was part of the ReturnURL. (Is this risky? Or does it add no value to force ALL domains to be in the whitelist?)

Another developer on the team came up with that other one I posted up top, but I did not understand it like the above one, since mine was created via multiple posts on EE, and I actually understand it (for the most part).

I feel better being more expressive, to make the RegEx more readable. For example, if goto or returnurl is always in a return URL, then it helps me to see it there. Brevity is confusing when reading both hieroglyphics AND RegEx.

Plus, I have never gotten the other guy's to return True, which normally means I am dead in the water. Mine returns true when expected, so I can take baby steps to bring it to the next level of functionality.

I am fine updating my latest RegEx, but it needs to no longer have the requirement that sub-domains be listed on the whitelist.

It seems the "https?://" part needs to be replaced with a wildcard for any number of characters that could make up a sub-domain.


Also, I added "[^.].*"

to prevent a hacker from making my domain into a sub-domain on HIS domain, thusly

mydomain.org.EVILSITE.COM

and having my RegEx think it was a success.
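
To make the sub-domain question concrete, here is a hedged C# sketch. PostedPrefix is only the first half of the pattern above with the whitelist inlined; SubdomainTolerant is a hypothetical alternative of mine, not something from this thread, and RegexOptions.IgnoreCase is an assumption. The idea is to allow any number of sub-domain labels in front of the escaped base domain, and then require a port, a delimiter, or the end of the string right after it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RedirectDemo
{
    // First half of the posted pattern, with the whitelist inlined.
    public const string PostedPrefix =
        @"^https?://(mydomain.org|sso.mydomain.org)[^.].*";

    // Hypothetical alternative: zero or more sub-domain labels, the escaped
    // base domain, an optional port, then '/', '?', '#' or end of input.
    public const string SubdomainTolerant =
        @"^https?://(?:[a-z0-9-]+\.)*mydomain\.org(?::\d+)?(?:[/?#]|$)";

    public static bool PostedPrefixMatches(string url) =>
        Regex.IsMatch(url, PostedPrefix, RegexOptions.IgnoreCase);

    public static bool IsAllowed(string url) =>
        Regex.IsMatch(url, SubdomainTolerant, RegexOptions.IgnoreCase);

    public static void Main()
    {
        string[] urls =
        {
            "https://mydomain.org/login",         // posted=True   tolerant=True
            "https://api.mydomain.org/x",         // posted=False  tolerant=True
            "https://mydomain.org.EVILSITE.COM/", // posted=False  tolerant=False
            "https://mydomain.orgevil.com/",      // posted=True   tolerant=False
        };

        foreach (var u in urls)
            Console.WriteLine($"{u}  posted={PostedPrefixMatches(u)}  tolerant={IsAllowed(u)}");
    }
}
```

The last sample shows the gap in "[^.]": it blocks a dot after the domain, but any other character passes, so "mydomain.orgevil.com" gets through. Requiring an explicit delimiter after the domain closes that hole.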
SOLUTION
This solution is only available to members.
That still returns false... But we are getting snow and I gotta head out until Tuesday AM. I can leave this issue open until then.

Cheers.
SOLUTION
This solution is only available to members.
thanks