Modifying this regular expression

This is a related question, and we're close to getting it resolved.

The first regular expression allows leaving out ".com", but it shouldn't. For example, "http://www.test" gets validated, but it shouldn't.

The second one requires "www" to be in the URL. For example, http://test.com doesn't get validated, but it should.

How can I fix the first one to disallow leaving out the ".com" or ".org", etc.?
---If I leave out ".com", it validates, but it should say invalid
ValidationExpression="([nN][oO][nN][eE]|(((http|https):\/\/)?((([a-zA-Z0-9]+\.)?[a-zA-Z0-9\-]+(\.[a-zA-Z]+){1,2})|((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])))+(:[1-9][0-9]*)?)+((\/([a-zA-Z0-9_\-\%\~\+]+)?)*)?(\.([a-zA-Z0-9_]+))?(\?([a-zA-Z0-9_\-]+\=[a-z-A-Z0-9_\-\%\~\+]+)?(\&([a-zA-Z0-9_\-]+\=[a-z-A-Z0-9_\-\%\~\+]+)?)*)?)" 

------------ This one requires a "www.", so how can I change this to accept URLs without it?

 ValidationExpression="([nN][oO][nN][eE]|(((http|https):\/\/)?(([a-zA-Z0-9]+\.[a-zA-Z0-9\-]+(\.[a-zA-Z]+){1,2})|((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])))+(:[1-9][0-9]*)?)+((\/([a-zA-Z0-9_\-\%\~\+]+)?)*)?(\.([a-zA-Z0-9_]+))?(\?([a-zA-Z0-9_\-]+\=[a-z-A-Z0-9_\-\%\~\+]+)?(\&([a-zA-Z0-9_\-]+\=[a-z-A-Z0-9_\-\%\~\+]+)?)*)?)"

Camillia Asked:

HonorGod (Software Engineer) Commented:
Using stuff like this: "[nN][oO][nN][eE]"  is rarely valuable.  It is generally better to add a "case insensitive" flag to the regular expression.

You can also simplify: "(http|https)" as "https?" (i.e., make the trailing "s" optional)
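
Both simplifications can be checked in isolation. This is a minimal sketch, assuming Python's `re` module as a stand-in for the .NET engine; the sample strings are made up for illustration:

```python
import re

# Spelling out every letter case by hand ...
verbose_none = re.compile(r"[nN][oO][nN][eE]")
# ... versus a plain literal plus the case-insensitive flag.
flagged_none = re.compile(r"none", re.IGNORECASE)

for text in ("none", "None", "NONE", "nOnE", "nine"):
    assert bool(verbose_none.fullmatch(text)) == bool(flagged_none.fullmatch(text))

# "(http|https)" and "https?" accept the same two schemes.
long_scheme = re.compile(r"(http|https):\/\/")
short_scheme = re.compile(r"https?:\/\/")

for text in ("http://", "https://", "httpss://", "ftp://"):
    assert bool(long_scheme.fullmatch(text)) == bool(short_scheme.fullmatch(text))
```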

If you want to require specific suffixes, you could have something like:

\.(com|org|gov|edu)$

at the end of the RegExp
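
A small sketch of that suffix check, again assuming Python's `re` in place of the .NET engine. The helper name `has_known_suffix` is hypothetical. Note that a trailing "$" only suits a bare host name; a URL that continues with a path no longer ends in the suffix.

```python
import re

# Suffix check from the comment above: the text must end in one of the listed suffixes.
suffix = re.compile(r"\.(com|org|gov|edu)$")

def has_known_suffix(host: str) -> bool:
    # search() scans the string; the trailing "$" pins the match to the end.
    return suffix.search(host) is not None

print(has_known_suffix("www.test.com"))            # True
print(has_known_suffix("www.test"))                # False
print(has_known_suffix("www.test.com/page.aspx"))  # False: the path follows the suffix
```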
HonorGod (Software Engineer) Commented:
By the way, the second doesn't "require" www. specifically; it requires some leading label followed by a period, and "www" is just one value that fits.

This part: "[a-zA-Z0-9]+" indicates 1 or more alphanumeric characters (without an upper limit).
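
To see the difference, the host-name portions of the two expressions can be compared side by side. This is a sketch only: the fragments are copied from the two expressions in the question, Python's `re` stands in for the .NET engine, and the helper `accepted` and the sample hosts are made up.

```python
import re

# Host-name portion of the first expression: the leading "label." is optional.
first_host = re.compile(r"([a-zA-Z0-9]+\.)?[a-zA-Z0-9\-]+(\.[a-zA-Z]+){1,2}")
# Host-name portion of the second expression: the leading "label." is mandatory.
second_host = re.compile(r"[a-zA-Z0-9]+\.[a-zA-Z0-9\-]+(\.[a-zA-Z]+){1,2}")

def accepted(pattern: re.Pattern, host: str) -> bool:
    # fullmatch() requires the whole host to fit the pattern.
    return pattern.fullmatch(host) is not None

for host in ("www.test.com", "abc.test.com", "test.com", "www.test"):
    print(host, accepted(first_host, host), accepted(second_host, host))
```

Any alphanumeric first label passes the second fragment ("abc" works as well as "www"), while "test.com" fails it because there is nothing left for the mandatory leading label.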
Camillia (Author) Commented:
HonorGod, thanks, but this doesn't answer my question. I've been at this for a week now. If you look at the related question, you can see the thread...

I have this  "[nN][oO][nN][eE]"  because user can enter None or none, etc.

I just need to change the first one to require ".com" or ".org". As it is now, it validates "www.test".
I don't know how to modify it to require that.

The second one doesn't validate "http://test.com", but it doesn't matter. If I can get the first one going, I'll be done with this.
HonorGod (Software Engineer) Commented:
> I have this  "[nN][oO][nN][eE]"  because user can enter None or none, etc.
   I understand that completely.  My comment was made to simplify your RegExp.

   My comment was just trying to say that matching "none" with a case insensitive match is much easier to read and understand.

   In order to be able to modify a RegExp, we first need to dissect it and understand how it works.

   I find that in order to do that, it first helps to match parentheses:
 
ValidationExpression="([nN][oO][nN][eE]|(((http|https):\/\/)?((([a-zA-Z0-9]+\.)?[a-zA-Z0-9\-]+(\.[a-zA-Z]+){1,2})|((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])))+(:[1-9][0-9]*)?)+((\/([a-zA-Z0-9_\-\%\~\+]+)?)*)?(\.([a-zA-Z0-9_]+))?(\?([a-zA-Z0-9_\-]+\=[a-z-A-Z0-9_\-\%\~\+]+)?(\&([a-zA-Z0-9_\-]+\=[a-z-A-Z0-9_\-\%\~\+]+)?)*)?)" 
                      |                 |||          |     | |||              |               |           |     | ||                                                           |  |                                                             |  |                                                             |  |                                                           ||| |            | | ||  |                     | | | |  |             || |  |                                       | |  |                                       | | | |
                      |                 ||+----------+     | ||+--------------+               +-----------+     | |+-----------------------------------------------------------+  +-------------------------------------------------------------+  +-------------------------------------------------------------+  +-----------------------------------------------------------+|| +------------+ | ||  +---------------------+ | | |  +-------------+| |  +---------------------------------------+ |  +---------------------------------------+ | | |
                      |                 |+-----------------+ |+-------------------------------------------------+ +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+|                | |+--------------------------+ | +-----------------+ |                                            +--------------------------------------------+ | |
                      |                 |                    +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+                | +-----------------------------+                     +-------------------------------------------------------------------------------------------+ |
                      ||
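
The same pairing can be done mechanically. The following is a rough sketch in Python, not a full regex parser: the function name `match_parens` is hypothetical, and it assumes that escaped characters and character classes are the only places where a parenthesis does not open or close a group.

```python
def match_parens(pattern: str) -> list[tuple[int, int]]:
    """Return (open_index, close_index) pairs for every group in the pattern."""
    pairs = []
    stack = []          # indexes of "(" that are still waiting for their ")"
    in_class = False    # True while inside a [...] character class
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2      # skip the escaped character, e.g. "\(" or "\-"
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at index {i}")
            pairs.append((stack.pop(), i))
        i += 1
    if stack:
        raise ValueError(f"unmatched '(' at index {stack[-1]}")
    return pairs

print(match_parens(r"(https?:\/\/)?(www\.)?"))
```

A ValueError from this helper points at the first parenthesis that has no partner, which is a quick way to find where an edited expression went out of balance.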




  Then, we can look for repeating, or similar patterns within the RegExp, for example:
 
(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])



 Which matches 1..255 (and, through the [0-1][0-9]{2} branch, zero-padded three-digit forms such as 001), but could be simplified to:
 
(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])



 And the adjacent patterns to match 0..255 following a period.  It's interesting that the 3 patterns aren't identical.  The first 2 look like:
 
(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)



  And the last like this:
 
(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])



  Let me come back to this after I get something to eat...
  But they could all be simplified to:
 
(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[0-9])
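
The claim that the simplified octet patterns accept exactly what the originals accept can be checked by brute force. This is a sketch in Python (a stand-in for the .NET engine); the constant names and the helper `accepted` are made up for the example:

```python
import re
from itertools import product

ORIG_FIRST = r"(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])"
SIMPLE_FIRST = r"(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])"
ORIG_MIDDLE = r"(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)"
ORIG_LAST = r"(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])"
SIMPLE_REST = r"(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[0-9])"

# Every digit string of length 1 to 3, including zero-padded ones like "007".
CANDIDATES = ["".join(d) for n in (1, 2, 3) for d in product("0123456789", repeat=n)]

def accepted(pattern: str) -> set[str]:
    compiled = re.compile(pattern)
    return {s for s in CANDIDATES if compiled.fullmatch(s)}

assert accepted(ORIG_FIRST) == accepted(SIMPLE_FIRST)
assert accepted(ORIG_MIDDLE) == accepted(ORIG_LAST) == accepted(SIMPLE_REST)

# The only candidate that separates the two simplified forms is "0".
print(accepted(SIMPLE_REST) - accepted(SIMPLE_FIRST))  # → {'0'}
```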



  Which is nearly identical to the first simplified pattern to match 1..255.

  Since the last three octets are identical (i.e., 0..255), we can simplify this IP address pattern to be:
 
(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])(\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)){3}
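
As a quick check that the {3} form behaves like the written-out form, here is a sketch in Python (again standing in for the .NET engine). The names FIRST, REST, long_ip and short_ip and the sample addresses are assumptions made for this example:

```python
import re

FIRST = r"(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])"
REST = r"(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)"

# Long form: the "\.octet" part written out three times.
long_ip = re.compile(FIRST + (r"\." + REST) * 3)
# Compact form from the comment above: one "\.octet" group repeated with {3}.
short_ip = re.compile(FIRST + r"(\." + REST + r"){3}")

for text in ("192.168.0.1", "10.0.0.255", "0.1.2.3", "10.0.0.256", "1.2.3"):
    assert bool(long_ip.fullmatch(text)) == bool(short_ip.fullmatch(text))
    print(text, short_ip.fullmatch(text) is not None)
```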


HonorGod (Software Engineer) Commented:
For a more detailed explanation of using a RegExp to match an IP Address, take a look at this article I wrote: http://www.experts-exchange.com/A_1074.html
Camillia (Author) Commented:
OK, but how can I change it to check for ".com", ".org", etc.? Which part of it bypasses the validation of ".com", etc.?

I got it from here http://regexlib.com/Search.aspx?k=web+url&c=-1&m=-1&ps=20


In the related question, Terry tests it here http://regexlib.com/RETester.aspx


HonorGod (Software Engineer) Commented:
While dissecting the RegExp, I found this group:
 
ValidationExpression="([nN][oO][nN][eE]|(((http|https):\/\/)?((([a-zA-Z0-9]+\.)?[a-zA-Z0-9\-]+(\.[a-zA-Z]+){1,2})|((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])))+(:[1-9][0-9]*)?)+((\/([a-zA-Z0-9_\-\%\~\+]+)?)*)?(\.([a-zA-Z0-9_]+))?(\?([a-zA-Z0-9_\-]+\=[a-z-A-Z0-9_\-\%\~\+]+)?(\&([a-zA-Z0-9_\-]+\=[a-z-A-Z0-9_\-\%\~\+]+)?)*)?)"
                                        |----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|



Which is followed by a '+' meaning that it is allowed to occur 1 or more times...

This makes the RegExp wrong... (sigh)
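
One visible consequence of that '+' can be shown with a reduced stand-in for the highlighted group. This sketch uses only the IP-address branch (with the simplified octets), and Python's `re` rather than the .NET engine; the names `once` and `one_or_more` are made up:

```python
import re

FIRST = r"(?:25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])"
REST = r"(?:25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[0-9])"
IP = FIRST + r"(?:\." + REST + r"){3}"

once = re.compile(r"(?:" + IP + r")")          # the group exactly one time
one_or_more = re.compile(r"(?:" + IP + r")+")  # the group followed by '+'

glued = "1.2.3.41.2.3.4"  # two addresses with nothing between them
print(once.fullmatch(glued) is not None)         # False
print(one_or_more.fullmatch(glued) is not None)  # True
```

With the '+', two addresses glued together pass as one value, which is probably not what a URL validator should allow.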
HonorGod (Software Engineer) Commented:
Still working on understanding it... before I can fix (extend) it... sorry it is taking so long.
Camillia (Author) Commented:
I think if we compare the first one to the second one, maybe it could tell us why one validates "www" and the other doesn't, and why one validates ".xxx" and the other doesn't.
HonorGod (Software Engineer) Commented:
What groups (i.e., "com", "edu", "gov", "xxx", "org") do you want to match?
HonorGod (Software Engineer) Commented:
I think that if you replace this portion:
((([a-zA-Z0-9]+\.)?[a-zA-Z0-9\-]+(\.[a-zA-Z]+){1,2})



with this:
(([a-zA-Z0-9]+\.)?[a-zA-Z0-9\-]+(\.(com|org|net|edu|gov|biz|info|name|museum|[a-z]{2})))



That you'll be ok.
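
The effect of the replacement can be tried on its own before pasting it into the full expression. A sketch in Python (standing in for the .NET engine), using the replacement fragment exactly as given above; the sample hosts are made up:

```python
import re

# Replacement host fragment: the last label must be a listed suffix
# or any two lowercase letters (country-code style).
host = re.compile(
    r"(([a-zA-Z0-9]+\.)?[a-zA-Z0-9\-]+"
    r"(\.(com|org|net|edu|gov|biz|info|name|museum|[a-z]{2})))"
)

for text in ("www.test.com", "test.org", "test.co.uk", "www.test"):
    print(text, host.fullmatch(text) is not None)
```

"www.test" is now rejected because "test" is neither in the list nor two letters long.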

HonorGod (Software Engineer) Commented:
If you want to allow "xxx", you could also add that:
(([a-zA-Z0-9]+\.)?[a-zA-Z0-9\-]+(\.(com|org|net|edu|gov|biz|info|name|museum|xxx|[a-z]{2})))


Camillia (Author) Commented:
how about if someone has ".aspx"? would that work?? any ".xyz" should match. Let me try it
Camillia (Author) Commented:
No, I get a parser error. How can I test it using this link? I'm not sure how Terry did it (in the related question).

http://regexlib.com/RETester.aspx
ValidationExpression="([nN][oO][nN][eE]|(((http|https):\/\/)?(([a-zA-Z0-9]+\.)?[a-zA-Z0-9\-]+(\.(com|org|net|edu|gov|biz|info|name|museum|[a-z]{2})))|((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])))+(:[1-9][0-9]*)?)+((\/([a-zA-Z0-9_\-\%\~\+]+)?)*)?(\.([a-zA-Z0-9_]+))?(\?([a-zA-Z0-9_\-]+\=[a-z-A-Z0-9_\-\%\~\+]+)?(\&([a-zA-Z0-9_\-]+\=[a-z-A-Z0-9_\-\%\~\+]+)?)*)?)"


HonorGod (Software Engineer) Commented:
> how about if someone has ".aspx"? would that work?? any ".xyz" should match. Let me try it
   On what?  Please provide an example URL

To test a RegExp on that site:

1. Paste the RegExp (i.e., the stuff between the double quotes) into the Regular Expression text area
2. You may want to uncheck the Multiline checkbox
3. You may want to uncheck the Case Insensitive checkbox
4. Paste (or type) an example to be matched into the "Source" text area
5. Click the "Submit" button

and I get:

Too many )'s.
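
The extra ")" can be traced by counting how many groups each fragment leaves open. The portion that was replaced leaves one group open (it is closed later, after the IP-address part), while the suggested replacement closes everything it opens. A sketch in Python; the helper `net_open_parens` is hypothetical and ignores escaped characters and character classes:

```python
def net_open_parens(fragment: str) -> int:
    """Count '(' minus ')' in a regex fragment, skipping escapes and [...] classes."""
    depth = 0
    in_class = False
    i = 0
    while i < len(fragment):
        ch = fragment[i]
        if ch == "\\":
            i += 2          # skip the escaped character
            continue
        if in_class:
            in_class = ch != "]"
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        i += 1
    return depth

replaced = r"((([a-zA-Z0-9]+\.)?[a-zA-Z0-9\-]+(\.[a-zA-Z]+){1,2})"
suggested = r"(([a-zA-Z0-9]+\.)?[a-zA-Z0-9\-]+(\.(com|org|net|edu|gov|biz|info|name|museum|[a-z]{2})))"

print(net_open_parens(replaced))   # → 1
print(net_open_parens(suggested))  # → 0
```

So after the swap the full expression has one ")" more than it has "(". Dropping one ")" at the end of the pasted fragment restores the balance.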
Camillia (Author) Commented:
Yeah, and I did what you have in ID: 37334275.

(I actually tried 2 things as far as the () go, but both gave me a parse error.)

Now, which ) should be removed?
Camillia (Author) Commented:
I think I got it:
ValidationExpression="([nN][oO][nN][eE]|(((http|https):\/\/)?(([a-zA-Z0-9]+\.)?[a-zA-Z0-9\-]+(\.(com|org|net|edu|gov|biz|info|name|museum|xxx|[a-z]{2}))|((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])))+(:[1-9][0-9]*)?)+((\/([a-zA-Z0-9_\-\%\~\+]+)?)*)?(\.([a-zA-Z0-9_]+))?(\?([a-zA-Z0-9_\-]+\=[a-z-A-Z0-9_\-\%\~\+]+)?(\&([a-zA-Z0-9_\-]+\=[a-z-A-Z0-9_\-\%\~\+]+)?)*)?)"


Camillia (Author) Commented:
As far as ".aspx"... can a site be like this:
http://www.mytest.com/mypage.aspx?
Camillia (Author) Commented:
Yes, I tried it and .aspx works: http://www.test.com/test.aspx

Thanks so much for your help.
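
For anyone who wants to re-check the final expression outside the online tester, here is a sketch in Python. It assumes Python's `re` behaves like the .NET engine for this pattern, and it uses fullmatch() on the assumption that the validator requires the whole input to match. The expression is the one from the "I think I got it" post, split across lines only for readability; the name FINAL is made up.

```python
import re

FINAL = re.compile(
    r"([nN][oO][nN][eE]|(((http|https):\/\/)?"
    r"(([a-zA-Z0-9]+\.)?[a-zA-Z0-9\-]+(\.(com|org|net|edu|gov|biz|info|name|museum|xxx|[a-z]{2}))"
    r"|((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])"
    r"\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)"
    r"\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)"
    r"\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])))+(:[1-9][0-9]*)?)+"
    r"((\/([a-zA-Z0-9_\-\%\~\+]+)?)*)?(\.([a-zA-Z0-9_]+))?"
    r"(\?([a-zA-Z0-9_\-]+\=[a-z-A-Z0-9_\-\%\~\+]+)?"
    r"(\&([a-zA-Z0-9_\-]+\=[a-z-A-Z0-9_\-\%\~\+]+)?)*)?)"
)

# Inputs taken from this thread.
samples = [
    "none",
    "http://www.test.com",
    "http://test.com",
    "http://www.test.com/test.aspx",
    "http://www.test",
    "www.test",
]
for text in samples:
    print(text, FINAL.fullmatch(text) is not None)
```

The first four inputs should be accepted and the last two rejected, which matches what was reported above.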
HonorGod (Software Engineer) Commented:
Thanks for the grade & points... you put me over 3M ;-)

Good luck, have a great day, and Merry Christmas.
Camillia (Author) Commented:
Merry Christmas to you too. Kamila.