curiouswebster

Questions about a RegEx used to analyze URLs

Question about a RegEx:

 @"[&|?](" + "myDomain.com" + ")=(.*?[^&]+)?";

what do these require or prevent before the domain?

[&|?]

and what does this require or prevent after the domain?

(.*?[^&]+)?

Thanks.
Regular Expressions, C#, Web Languages and Standards, Scripting Languages

Last Comment: Ben Personick (Previously QCubed), 8/22/2022 - Mon
SOLUTION
Dr. Klahn

curiouswebster

ASKER
Ah, I was tied up thinking it was

& OR ?

The person who wrote this was focused on the query string parameters.

I have seen & AND ? with query string params, but do not recall seeing the | sign being used with query string params.
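On the | sign: inside square brackets, | is not alternation. A character class such as [&|?] matches exactly one character that is a literal &, a literal |, or a literal ?. A minimal sketch, assuming .NET's System.Text.RegularExpressions (the class and method names here are made up for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Inside [...], '|' is a literal pipe character, not the "or" operator.
    public static bool IsSeparator(string s) => Regex.IsMatch(s, @"^[&|?]\z");

    // Listing only the two characters that normally precede a query parameter.
    public static bool IsQuerySeparator(string s) => Regex.IsMatch(s, @"^[&?]\z");

    public static void Main()
    {
        foreach (var s in new[] { "&", "?", "|", "=" })
        {
            Console.WriteLine($"{s}: [&|?] -> {IsSeparator(s)}, [&?] -> {IsQuerySeparator(s)}");
        }
    }
}
```

So the original pattern also accepts a pipe in front of the parameter name. If only & and ? are intended, [&?] would be the narrower class.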
curiouswebster

ASKER
Then comes a capture group:

(.*?[^&]+)?

It looks like any number of characters, NOT containing a &

am I seeing that right?

And what does the trailing ? mean?

and the ? after .* means "lazy" but I am not sure what that means.
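One way to explore these three questions is to run the pattern from the question against a few inputs. The sketch below assumes .NET's System.Text.RegularExpressions; the helper names and sample strings are hypothetical. It shows that a lazy quantifier (.*?) takes as few characters as it can, that the trailing ? makes the whole group optional, and that the group can still start with & because . matches any character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QueryValueDemo
{
    // The pattern from the question, unchanged. The '.' in the name is unescaped,
    // so it matches any character, not just a dot.
    private static readonly Regex Pattern =
        new Regex(@"[&|?](" + "myDomain.com" + ")=(.*?[^&]+)?");

    // Returns true when the pattern matches at all.
    // 'value' is null when the optional group captured nothing.
    public static bool TryGetValue(string input, out string value)
    {
        Match m = Pattern.Match(input);
        value = (m.Success && m.Groups[2].Success) ? m.Groups[2].Value : null;
        return m.Success;
    }

    // Lazy versus greedy on the same input.
    public static string Lazy(string input) => Regex.Match(input, "a.*?b").Value;
    public static string Greedy(string input) => Regex.Match(input, "a.*b").Value;

    public static void Main()
    {
        Console.WriteLine(Lazy("aXbXb"));   // aXb   (stops at the first 'b')
        Console.WriteLine(Greedy("aXbXb")); // aXbXb (runs to the last 'b')

        string[] samples =
        {
            "?myDomain.com=abc&x=1", // value stops before the next '&'
            "?myDomain.com=&abc",    // '.*?' absorbs the leading '&'
            "?myDomain.com=",        // group is optional, so this still matches
        };
        foreach (string s in samples)
        {
            bool ok = TryGetValue(s, out string v);
            Console.WriteLine($"{s} -> match={ok}, value={(v ?? "(none)")}");
        }
    }
}
```

In the first sample the lazy .*? never needs to take any characters, so [^&]+ does all the work and the value is abc. It only expands when [^&]+ cannot match at the current position, which is what happens in the second sample.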
ASKER CERTIFIED SOLUTION
Ben Personick (Previously QCubed)

curiouswebster

ASKER
Awesome tool! I am still trying to get my arms around it, but this is THE BEST RegEx site I have seen!

What's the best Flavor for me to use, given my target platform is C#?
curiouswebster

ASKER
Well, funny you should ask.

I had this super RegEx working to enforce the domain was on a white list:

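            // Every literal piece is a verbatim string (@"...") so the regex escapes compile and reach the engine unchanged.
            string testRegEx = @"^https?:\/\/(" + whitelistedRedirects + @")[^.].*\/?((goto|returnurl)=https?:\/\/(" + whitelistedRedirects + @")[:|\/].*)?";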

but it enforced that sub-domains must also be white-listed. The whitelist was to look thusly:

whitelistedRedirects = "mydomain.org|sso.mydomain.org";

But I wanted to have a version that mandated only that "mydomain.org" was in the whitelist, when it was part of the ReturnURL. (is this risky? Or does it add no value to force ALL domains to be in the whitelist?)

Another developer on the team came up with that other one I posted up top, but I did not understand it like the above one, since mine was created via multiple posts on EE, and I actually understand it (for the most part)

I feel better being more expressive, to make the RegEx more readable. For example, if goto or returnurl is always in a return url, then it helps me to see it there. Brevity is confusing when reading both hieroglyphics AND RegEx.

Plus, I have never gotten the other guy's to return True, which normally means I am dead in the water. Mine returns true, when expected, so I can take baby steps to bring it to the next level of functionality.

I am fine updating my latest RegEx, but it needs to no longer have the requirement that sub-domains be listed on the whitelist.

It seems the following "https?://" needs to be followed by a wildcard of any number of characters which could make up a sub-domain.


Also, I added "[^.].*"

to prevent a hacker from making my domain into a sub-domain on HIS domain, thusly

mydomain.org.EVILSITE.COM

and having my RegEx think it was a success.
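One possible shape for the "any sub-domain of a whitelisted domain" check is sketched below. It is an illustration under stated assumptions, not the pattern from this thread: the class name, method names, and test URLs are hypothetical, and it validates a single URL (for example the value of returnurl after it has been extracted and decoded). It escapes each domain with Regex.Escape so the dots are literal, allows zero or more sub-domain labels in front, and requires the host to end right after the whitelisted domain, which is what rejects mydomain.org.EVILSITE.COM.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RedirectWhitelistDemo
{
    // Builds a regex that accepts a whitelisted domain or any sub-domain of it.
    public static Regex Build(params string[] domains)
    {
        // Regex.Escape turns "mydomain.org" into "mydomain\.org" so '.' is literal.
        string alternation = string.Join("|", domains.Select(Regex.Escape));

        return new Regex(
            @"^https?://" +
            @"(?:[a-z0-9-]+\.)*" +          // zero or more sub-domain labels
            @"(?:" + alternation + @")" +   // a whitelisted base domain
            @"(?::\d+)?" +                  // optional numeric port
            @"(?=[/?#]|\z)",                // host must end here
            RegexOptions.IgnoreCase);
    }

    public static bool IsAllowed(Regex whitelist, string url) => whitelist.IsMatch(url);

    public static void Main()
    {
        Regex whitelist = Build("mydomain.org");
        string[] urls =
        {
            "https://mydomain.org/page",
            "https://sso.mydomain.org/login",
            "https://mydomain.org.EVILSITE.COM/",
            "https://evilmydomain.org/",
            "https://mydomain.org:x@evilsite.com/",
        };
        foreach (string url in urls)
        {
            Console.WriteLine($"{url} -> {IsAllowed(whitelist, url)}");
        }
    }
}
```

Requiring a numeric port followed by /, ?, # or the end of the string is a deliberate choice here: accepting a bare : after the domain would let a URL with user info, such as https://mydomain.org:x@evilsite.com/, slip through. Parsing the URL with System.Uri and comparing its Host against the whitelist may be a sturdier alternative to a regex for this kind of check.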
curiouswebster

ASKER
That still returns false...But we are getting snow and I gotta head out until Tuesday AM. I can leave this issue open....

until then.

Cheers.
curiouswebster

ASKER
thanks
Ben Personick (Previously QCubed)

Glad to help :)