RegEx: Asserting a valid ReturnURL

RegEx: Need to assert that only a specific ReturnURL exists in the browser path.

I have a "white list" of approved domains for use as the ReturnURL. Now I need a RegEx to be sure no errant domains were inserted into the URL.

If my white listed domain is:

Could someone provide a RegEx that asserts this domain is not preceded (after goto) or followed by an errant domain?

newbieweb (Sr. Software Engineer) asked:

Please give a representative sample of your white list, as well as some examples of what this validation should accept and reject.
It is not clear under what circumstances you are trying to validate, or the context in which this is being enforced.
I think I understand: a regex on your goto variable such as http://.*\.mydom\.com.* is how you would detect it.

I'm not sure, but you can look at Match.
Regex-match the goto variable, extracting the host portion with http://([A-Za-z0-9\-\.]+)/
Then compare the captured host against your white list, potentially using a second regex to strip the leading "somehost." part before comparing.
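The two steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names (GotoHost, ExtractHost, IsHostInDomain) are hypothetical, the goto value is assumed to be URL-decoded already, and the subdomain check uses a plain suffix comparison instead of the second regex suggested above. The pattern also requires the host to be followed by an optional port and then "/", "?", "#" or the end of the string, so a value such as http://mydom.com@evil.test/ does not yield mydom.com.

```csharp
using System;
using System.Text.RegularExpressions;

static class GotoHost
{
    // Host must directly follow the scheme and be terminated by an optional
    // port and then '/', '?', '#' or end of input (assumption: http/https only).
    static readonly Regex HostPattern = new Regex(
        @"^https?://([A-Za-z0-9\-.]+)(?::\d+)?(?:[/?#]|$)",
        RegexOptions.IgnoreCase);

    // Returns the host of an already-decoded goto value, or null if none is found.
    public static string ExtractHost(string gotoValue)
    {
        Match m = HostPattern.Match(gotoValue);
        return m.Success ? m.Groups[1].Value : null;
    }

    // True when host is exactly the domain, or a subdomain of it.
    public static bool IsHostInDomain(string host, string domain)
    {
        return host.Equals(domain, StringComparison.OrdinalIgnoreCase)
            || host.EndsWith("." + domain, StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, ExtractHost("http://somehost.mydom.com/path") gives "somehost.mydom.com", which IsHostInDomain accepts for "mydom.com", while a host such as "mydom.com.evil.test" is extracted but then rejected by the suffix check.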
You didn't specify if it was limited to http, so I made a regex allowing any scheme, and also userinfo.



pls try


newbieweb (Sr. Software Engineer), Author, commented:
Thanks for the feedback. I am about to test your suggestions, but wanted to first answer the question about my whitelist.

For now, in C#, it is as easy as:

string redirectWhitelist = ";;";

obviously, fails this one.

I have code in C# which parses the string and iterates for each delimited domain in the whitelist.
Then replace it with your list separated by the pipe character (|), which is the "or" operator in regex.
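As a rough sketch of the pipe idea, the delimited whitelist could be turned into a regex alternation at run time. Everything named here is an assumption for illustration: the WhitelistPattern class, the example.com and evil.test hosts, and the choice to match the URL-encoded form (%3a%2f%2f) of the goto value. Each entry is passed through Regex.Escape so the dots are treated literally, and the host must be followed by %2f, & or the end of the string so nothing can be appended to it.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class WhitelistPattern
{
    // Builds a pattern of the form:
    //   \?goto=https?%3a%2f%2f(?:host1|host2)(?:%2f|&|$)
    public static Regex Build(string delimitedWhitelist)
    {
        string[] hosts = delimitedWhitelist
            .Split(';')
            .Select(h => h.Trim())
            .Where(h => h.Length > 0)          // an empty alternative would match anything
            .Select(h => Regex.Escape(h))      // treat '.' and other metacharacters literally
            .ToArray();

        if (hosts.Length == 0)
            throw new ArgumentException("The whitelist contains no hosts.");

        string alternation = string.Join("|", hosts);
        return new Regex(
            @"\?goto=https?%3a%2f%2f(?:" + alternation + @")(?:%2f|&|$)",
            RegexOptions.IgnoreCase);
    }
}
```

With a hypothetical whitelist of "www.example.com;app.example.com", the resulting pattern matches a URL ending in ?goto=https%3a%2f%2fwww.example.com%2fhome, but not one where the host is www.example.com.evil.test or where evil.test comes first.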
The first thing is to extract the domain name portion from the goto= variable. Depending on what you are using, you could keep the whitelist in a DB-like structure or a hash that you can then query: if a record is returned, the domain is whitelisted; no record means it is not.
With few domains, a hard-coded variable is fine. As the whitelist grows, repeatedly modifying the code becomes a burden, and an external flat file that is read in every time this page loads to build the current whitelist becomes a better, more flexible approach.
An example of what Rgonzo recommended.  I simplified the pattern and added during my testing.


If you don't really need to capture anything, then you can just test this pattern against the string.  If you do need to capture the redirect domain, then drop the ?:

You should make the pattern case-insensitive.  Otherwise, you will need to make sure the string matches the case of your pattern.
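To illustrate the difference, here is a small sketch with a test-only method that uses a non-capturing group and a second method that drops the ?: to capture the redirect host. The host names and the CaptureDemo class are placeholders, not taken from the thread, and both patterns use RegexOptions.IgnoreCase as suggested above.

```csharp
using System;
using System.Text.RegularExpressions;

static class CaptureDemo
{
    // Hypothetical white-list alternation, already regex-escaped.
    const string Hosts = @"www\.example\.com|app\.example\.com";

    // Non-capturing group: enough when only a yes/no answer is needed.
    public static bool IsAllowed(string url)
    {
        return Regex.IsMatch(
            url,
            @"\?goto=https?%3a%2f%2f(?:" + Hosts + @")(?:%2f|&|$)",
            RegexOptions.IgnoreCase);
    }

    // Capturing group (no ?:): the matched host is available as Groups[1].
    public static string MatchedHost(string url)
    {
        Match m = Regex.Match(
            url,
            @"\?goto=https?%3a%2f%2f(" + Hosts + @")(?:%2f|&|$)",
            RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Because the match is case-insensitive, the captured value keeps whatever casing the incoming URL used, so any later comparison against the white list should ignore case as well.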

newbieweb (Sr. Software Engineer), Author, commented:
I tried both RegEx's with different results.

One returns false when I think it should be true. The other threw an exception.

Here's my code:

            string testURL = "";

            string regEx1 = @"https://sso\.mydoc\.com/\?goto=http://($";
            string regEx2 = @"^[A - Za - z0 - 9.+ -] *://(?:[^@]*@)?(?:[^.:/?]+\.)*(?<>[^.:/?]+\.[^.:/?]+)(?!\.)";

            bool match1 = Regex.IsMatch(testURL, regEx1);
            bool match2 = Regex.IsMatch(testURL, regEx2);

The regEx1 returned false and regEx2 threw the following exception:

Message = "parsing \"^[A - Za - z0 - 9.+ -] *://(?:[^@]*@)?(?:[^.:/?]+\\.)*(?<>[^.:/?]+\\.[^.:/?]+)(?!\\.)\" - Invalid group name: Group names must begin with a word character."

I assume I missed a character on regEx2, but have no idea why regEx1 failed to return true.

Also, the following even failed to match my testURL:
            string regEx3 = @"https://sso\.mydoc\.com/$";
I'd recommend not including the http/https prefix in your white list.  The escaped (URL-encoded) characters in the test URL caused the mismatch.


newbieweb (Sr. Software Engineer), Author, commented:
I like the idea of using the pipe as the delimiter, but am missing the context to run your test.

Using C#, do I need any prefix to:

like the following?

string myRegEx = @"\?goto=https?%3a%2f%2f(?||";

also, if I do not include the https://

how can I be sure to block a hacker who inserts a malicious domain BEFORE a white listed URL?
In this case, the white list items must immediately follow the protocol prefix.  If you want to prevent appended data, then we need to add the $ terminating character at the end.
newbieweb (Sr. Software Engineer), Author, commented:
Thanks. And can I assume that if I remove "?:" at the start of the white list, this will ensure white list items immediately follow the protocol prefix?
For my regex, the idea is to return what is matched and check if it's one of your domains.
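Match match = Regex.Match(testURL, @"^[A-Za-z0-9.+ -]*://(?:[^@]*@)?(?:[^.:/?]+\.)*(?<domain>[^.:/?]+\.[^.:/?]+)(?!\.)");
if (match.Success)
{
    // The named group holds the last two labels of the host, e.g. "mydom.com".
    string domain = match.Groups["domain"].Value;
    // Accept only if the captured domain is one of the white-list entries.
    foreach (string url in redirectWhitelist.Split(';'))
        if (domain == url) return true;
}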


You seem to be overcomplicating this. Why not extract the goto data, which is presumably what needs to be validated? If the URL wound up on your site, the rest of it will already match...
Process the query string, which is where ?goto= lives.
The goto value can then be broken down further to deal with only the referral....

out of
which are the items of interest to you?
Focus on extracting those items and matching against them.
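A minimal sketch of that approach, without any regex, might look like the following. It pulls goto out of the query string, decodes it, lets System.Uri parse the result, and looks the host up in a hash set. The ReturnUrlCheck class, the GetQueryValue helper and the example.com hosts are all assumptions made for illustration, and restricting the scheme to http/https is an extra check that was not discussed in the thread.

```csharp
using System;
using System.Collections.Generic;

static class ReturnUrlCheck
{
    // Hypothetical white list; hosts are compared case-insensitively.
    static readonly HashSet<string> AllowedHosts =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "www.example.com",
            "app.example.com"
        };

    // Returns the decoded value of the named query parameter, or null.
    static string GetQueryValue(Uri uri, string name)
    {
        foreach (string pair in uri.Query.TrimStart('?').Split('&'))
        {
            int eq = pair.IndexOf('=');
            if (eq <= 0) continue;
            if (string.Equals(pair.Substring(0, eq), name, StringComparison.OrdinalIgnoreCase))
                return Uri.UnescapeDataString(pair.Substring(eq + 1));
        }
        return null;
    }

    public static bool IsAllowed(string requestUrl)
    {
        Uri request;
        if (!Uri.TryCreate(requestUrl, UriKind.Absolute, out request)) return false;

        string target = GetQueryValue(request, "goto");
        if (target == null) return false;

        Uri destination;
        if (!Uri.TryCreate(target, UriKind.Absolute, out destination)) return false;

        // Only plain web redirects are accepted in this sketch.
        if (destination.Scheme != Uri.UriSchemeHttp && destination.Scheme != Uri.UriSchemeHttps)
            return false;

        // Uri.Host excludes userinfo and port, so "good@evil" tricks resolve to "evil".
        return AllowedHosts.Contains(destination.Host);
    }
}
```

The design choice here is to let the URI parser decide what the host is, so the white list only has to answer a set-membership question.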
newbieweb (Sr. Software Engineer), Author, commented:
louisfr, your idea is interesting, but please include a testURL that it matches. I cannot adapt a RegEx without a working one as a starting point.

For example, the following returns true
           string regEx4 = @"\?goto=https?%3a%2f%2f(?|||";

when testing the string:
            string testURL = "";

so I can modify it pretty easily, with little enhancements.

Are you using the white list item that matches?  If so, then remove the ?:
newbieweb (Sr. Software Engineer), Author, commented:
and, removing  the ?: will force the developer to explicitly list all sub-domains in that white list, which is ideal.
For flexibility, I encourage you to use an external file or DB for the whitelist information versus hardcoding it into the code.

With a file, the change is effective on the next request.
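One way the external-file idea could look, assuming a plain text file with one host per line, is sketched below. The WhitelistFile class, the file layout and the "#" comment convention are assumptions for illustration only. Reading the file on each call is what makes an edit take effect on the next request; a busy site might prefer to cache the result.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class WhitelistFile
{
    // Loads one host per line, skipping blank lines and lines starting with '#'.
    public static HashSet<string> Load(string path)
    {
        IEnumerable<string> hosts = File.ReadLines(path)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0
                        && !line.StartsWith("#", StringComparison.Ordinal));

        // Case-insensitive set, since host names are not case-sensitive.
        return new HashSet<string>(hosts, StringComparer.OrdinalIgnoreCase);
    }
}
```

A caller would then test a host with Load(path).Contains(host), where path points at wherever the configuration file is deployed.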
newbieweb (Sr. Software Engineer), Author, commented:
Yes, hard-coding was a hack for quick testing. It'll go in a config file. Thanks.