curiouswebster

RegEx: Asserting a valid ReturnURL

RegEx: Need to assert that only a specific ReturnURL exists in the browser path.

I have a "white list" of approved domains for use as the ReturnURL. Now I need a RegEx to be sure no errant domains were inserted into the URL.

If my white listed domain is:
MyDom.com

then a valid URL would be:
https://sso.MyDom.com/?goto=http://MyDom.com

Could someone provide me a RegEx which can assert that MyDom.com is neither preceded (after goto) nor followed by an errant domain?

Thanks
aikimark

Please give a representative sample of your white list, as well as some examples of URLs this validation should accept and reject.
SOLUTION
arnold

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.

SOLUTION
This solution is only available to members.

SOLUTION
This solution is only available to members.

curiouswebster

ASKER

Thanks for the feedback. I am about to test your suggestions, but wanted to first answer the question about my whitelist.

For now, in C#, it is as easy as:

string redirectWhitelist = "abc.com;xyz.com;a.def.com";

Obviously, MyDom.com fails this one.

I have C# code that parses the string and iterates over each delimited domain in the whitelist.
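
One way the semicolon-delimited whitelist could be turned into a single regex alternation is sketched below. This is only an illustration: `WhitelistPattern` and `BuildAlternation` are hypothetical names that do not come from the thread. `Regex.Escape` is applied so that the dots in each domain match a literal dot rather than any character.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WhitelistPattern
{
    // Hypothetical helper (not from the thread):
    // turns "abc.com;xyz.com" into "(?:abc\.com|xyz\.com)".
    public static string BuildAlternation(string delimitedWhitelist)
    {
        var escaped = delimitedWhitelist
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(d => d.Trim())
            .Where(d => d.Length > 0)          // skip whitespace-only entries
            .Select(d => Regex.Escape(d));     // make '.' match a literal dot
        return "(?:" + string.Join("|", escaped) + ")";
    }
}
```

With the whitelist string shown above, `BuildAlternation(redirectWhitelist)` would produce a group that can be dropped into a larger pattern after the protocol prefix.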
SOLUTION
This solution is only available to members.

SOLUTION
This solution is only available to members.

ASKER CERTIFIED SOLUTION
This solution is only available to members.

I tried both RegExes, with different results.

One returns false when I think it should return true. The other threw an exception.

Here's my code:

            string testURL = "https://sso.mydoc.com/?goto=http%3a%2f%2fmydoc.com";

            string regEx1 = @"https://sso\.mydoc\.com/\?goto=http://(mydoc.com)$";
            string regEx2 = @"^[A - Za - z0 - 9.+ -] *://(?:[^@]*@)?(?:[^.:/?]+\.)*(?<mydoc.com>[^.:/?]+\.[^.:/?]+)(?!\.)";

            bool match1 = Regex.IsMatch(testURL, regEx1);
            bool match2 = Regex.IsMatch(testURL, regEx2);


The regEx1 returned false and regEx2 threw the following exception:

Message = "parsing \"^[A - Za - z0 - 9.+ -] *://(?:[^@]*@)?(?:[^.:/?]+\\.)*(?<mydoc.com>[^.:/?]+\\.[^.:/?]+)(?!\\.)\" - Invalid group name: Group names must begin with a word character."

I assume I missed a character on regEx2, but have no idea why regEx1 failed to return true.


Also, even the following failed to match my testURL:
            string regEx3 = @"https://sso\.mydoc\.com/$";
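
A likely explanation for the two false results: testURL carries the goto value percent-encoded (`http%3a%2f%2f`), while regEx1 expects a literal `http://`, and regEx3 ends with `$` right after the `/` although testURL continues with `?goto=...`. The exception from regEx2 appears to come from the group name `mydoc.com`, since a .NET group name cannot contain a dot. The sketch below shows two ways to make the first check pass, either by decoding the URL before matching or by matching the encoded form directly. The class and member names are hypothetical, and decoding the whole URL up front is a simplification for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EncodedUrlDemo
{
    public const string TestUrl = "https://sso.mydoc.com/?goto=http%3a%2f%2fmydoc.com";

    // Same idea as regEx1, with ^ added and the dot in mydoc.com escaped.
    public const string DecodedPattern =
        @"^https://sso\.mydoc\.com/\?goto=http://(mydoc\.com)$";

    // Alternative: match the percent-encoded form as it appears in testURL.
    public const string EncodedPattern =
        @"^https://sso\.mydoc\.com/\?goto=http%3a%2f%2f(mydoc\.com)$";

    // Decode %3a and %2f first, then match against the literal pattern.
    public static bool MatchesAfterDecoding(string url) =>
        Regex.IsMatch(Uri.UnescapeDataString(url), DecodedPattern);

    // IgnoreCase lets %3A / %2F match as well as %3a / %2f.
    public static bool MatchesEncoded(string url) =>
        Regex.IsMatch(url, EncodedPattern, RegexOptions.IgnoreCase);
}
```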
SOLUTION
This solution is only available to members.

I like the idea of using the pipe as the delimiter, but am missing the context to run your test.

Using C#, do I need any prefix to:
\?goto=https?%3a%2f%2f(?:abc.com|xyz.com|a.def.com)

like the following?

string myRegEx = @"\?goto=https?%3a%2f%2f(?:abc.com|xyz.com|a.def.com)";

Also, if I do not include the https:// prefix, how can I be sure to block a hacker who inserts a malicious domain BEFORE a white listed URL?
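
One possible way to guard both sides of the white listed domain is to anchor the pattern at the start of the URL and to require that the domain is followed only by a path separator, another query parameter, or the end of the string. The sketch below assumes the sso.mydoc.com prefix and the percent-encoded goto format from the test URL; `ReturnUrlCheck` and `BuildPattern` are hypothetical names, and the set of allowed trailing tokens is an assumption that may need adjusting.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ReturnUrlCheck
{
    // Hypothetical helper: builds an anchored pattern from the allowed hosts.
    public static Regex BuildPattern(params string[] allowedHosts)
    {
        // Escape each host so '.' is literal, then join with the pipe.
        string hosts = string.Join("|", allowedHosts.Select(h => Regex.Escape(h)));

        string pattern =
            @"^https://sso\.mydoc\.com/\?goto=https?%3a%2f%2f" + // nothing may precede the host
            "(?:" + hosts + ")" +
            @"(?:%2f|/|&|$)";                                    // nothing may extend the host

        return new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }
}
```

With this shape, a URL such as `...goto=http%3a%2f%2fevil.com%2fmydoc.com` is rejected because the host must come immediately after the encoded `//`, and `...goto=http%3a%2f%2fmydoc.com.evil.com` is rejected because the text after the host is not one of the allowed tokens.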
SOLUTION
This solution is only available to members.

Thanks. And can I assume that if I remove "?:" at the start of the white list, this will ensure white list items immediately follow the protocol prefix?
SOLUTION
This solution is only available to members.

SOLUTION
This solution is only available to members.

louisfr, your idea is interesting, but please include a testURL that it matches. I cannot use a RegEx when I have no working RegEx as a starting point.

For example, the following returns true
           string regEx4 = @"\?goto=https?%3a%2f%2f(?:abc.com|xyz.com|a.def.com|mydoc.com)";

when testing the string:
            string testURL = "https://sso.mydoc.com/?goto=http%3a%2f%2fmydoc.com";

so I can modify it pretty easily, with small enhancements.

Thanks.
SOLUTION
This solution is only available to members.

And removing the ?: will force the developer to explicitly list all sub-domains in that white list, which is ideal.
For flexibility, I encourage you to use an external file or DB for the whitelist information versus hardcoding it into the code.

With a file, the change is effective on the next request.
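
A rough sketch of the file-based approach follows, under the assumption that the file holds one domain per line and that lines starting with `#` are comments. The file format, `WhitelistFile`, and `Load` are all hypothetical. Reading the file on each request is what makes an edit take effect on the next request.

```csharp
using System;
using System.IO;
using System.Linq;

public static class WhitelistFile
{
    // Hypothetical loader: one domain per line; blank lines and
    // lines starting with '#' are ignored (assumed file format).
    public static string[] Load(string path)
    {
        return File.ReadAllLines(path)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0 &&
                           !line.StartsWith("#", StringComparison.Ordinal))
            .ToArray();
    }
}
```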

Yes, hard coding was a hack for quick testing. It'll go in a config file. Thanks.