Solved

My regex is very inefficient and slow... help with improving speed

Posted on 2008-10-17
Last Modified: 2012-05-05
I'm using two regexes. I was trying to let any wildcard characters in the domain portion of an email address pass the first one and be caught by the second one. My app hangs on the first one if the domain is longer than about 8 characters.

emailReg = /^[a-z0-9_-]+((.*?[a-z0-9_-])*)@[a-z0-9]+(.*?[a-z0-9])*\.([a-z]{2,}|[0-9]+)$/i

Also:

Should I be using .exec or .test? Which is faster?

Does the /i flag slow things down? Or is it the .*? in the second group after the @, since I'm basically allowing strings of unlimited length?

This is my more thorough check:

/^[a-z0-9_-]+(([.]?[a-z0-9_-])*)@([a-z0-9])+([\.\-]?[a-z0-9]*)*\.([a-z]{2,}|[0-9]+)$/i

If an address passes the first check, it goes through this one. Not my idea, but the business wants two separate error messages.

Any help improving the efficiency and speed of both of these would be much appreciated. Thanks!
Question by:Bishork
LVL 84

Accepted Solution

by: ozo (earned 500 total points)
ID: 22745162
(.*?[a-z0-9])*
can match the same text in many different ways, and when the overall match fails, the engine may take exponentially long to try all of them.
([^a-z0-9]*[a-z0-9])*
can match a given text in only one way, so it should be much more efficient.
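To see why the first group blows up, the sketch below counts how many ways each repeated group can carve an all-letter domain label into iterations. It is a simplified model, assuming the engine may have to revisit every split when the rest of the pattern fails; `countSplits`, `lazyAny` and `strict` are hypothetical helper names, not part of any library.

```javascript
// Count how many ways a repeated group can carve `s` into consecutive
// iterations, where each iteration must satisfy `isIteration`. A simplified
// model of the choices a backtracking engine may revisit after a failure.
function countSplits(s, isIteration) {
  const ways = new Array(s.length + 1).fill(0);
  ways[0] = 1; // the empty prefix: exactly one way (zero iterations)
  for (let end = 1; end <= s.length; end++) {
    for (let start = 0; start < end; start++) {
      if (ways[start] > 0 && isIteration(s.slice(start, end))) {
        ways[end] += ways[start];
      }
    }
  }
  return ways[s.length];
}

// One iteration of (.*?[a-z0-9]): anything, then one letter or digit.
const lazyAny = (piece) => /^.*?[a-z0-9]$/i.test(piece);
// One iteration of ([^a-z0-9]*[a-z0-9]): non-alphanumerics, then one letter or digit.
const strict = (piece) => /^[^a-z0-9]*[a-z0-9]$/i.test(piece);

for (const n of [4, 8, 16]) {
  const label = "a".repeat(n);
  console.log(n, countSplits(label, lazyAny), countSplits(label, strict));
}
// 4 8 1
// 8 128 1
// 16 32768 1
```

The first count doubles with every extra character (2^(n-1) splits for n letters), which is consistent with the hang appearing somewhere past 8 characters; the second stays at exactly one.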
LVL 84

Expert Comment

by:ozo
ID: 22745211
in fact
((.*?[a-z0-9_-])*)
is equivalent to just
(.*([a-z0-9_-]))
but without all the backtracking
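A quick way to check this simplification is to anchor both groups and run them over a few sample strings. This sketch assumes the groups are tested on their own, outside the full email pattern; the sample strings are arbitrary.

```javascript
// The group from the original pattern and the simplified one, each anchored
// with ^...$ so the whole sample string must match.
const nested = /^((.*?[a-z0-9_-])*)$/i;
const flat = /^(.*([a-z0-9_-]))$/i;

const samples = ["j", "joe", "j.o.e", "######F", "joe.", "#", ""];
for (const s of samples) {
  // Columns: sample, nested result, flat result.
  console.log(JSON.stringify(s), nested.test(s), flat.test(s));
}
```

The two columns agree on every non-empty sample, including ######F, so .* followed by one character does allow an arbitrary run in front of it. The one disagreement is the empty string, which the starred group accepts and the flat group rejects.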

Author Comment

by:Bishork
ID: 22745362
(.*([a-z0-9_-]))

won't let you have a run of arbitrary characters the way .*? would. .*? allows ######F, but .? requires #F#F#F#F#F.

Author Comment

by:Bishork
ID: 22745378
---
in fact
((.*?[a-z0-9_-])*)
is equivalent to just
(.*([a-z0-9_-]))
----

I thought ? meant lookahead, and that without it the pattern wouldn't look ahead.
Oh wait, I see what you did there. Never mind, let me test that out.
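For reference, a ? placed right after * or + makes that quantifier lazy (match as little as possible); it is not a lookahead. JavaScript writes lookahead as (?=...). The sample string below is arbitrary.

```javascript
const sample = "a.b.c@x.com";

// Greedy: .* grabs as much as possible, then gives back until "\." can match.
const greedy = sample.match(/^(.*)\./);
// Lazy: .*? grabs as little as possible, then extends until "\." can match.
const lazy = sample.match(/^(.*?)\./);
// Lookahead: (?=@) checks that "@" follows, without consuming it.
const ahead = /^[a-z.]+(?=@)/.exec(sample);

console.log(greedy[1]); // a.b.c@x
console.log(lazy[1]);   // a
console.log(ahead[0]);  // a.b.c
```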

Author Comment

by:Bishork
ID: 22745391
Do you prefer .test or .exec? And do you prefer /i?
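The thread never settles the .test/.exec question directly. As a rough guide (no benchmark numbers are claimed here): .test returns only true or false, which is all a pass/fail validator needs, while .exec builds a match array with the captured groups. The patterns and addresses below are illustrative only.

```javascript
// Illustrative pattern only, not a complete email validator.
const emailReg = /^[a-z0-9._-]+@[a-z0-9.-]+\.[a-z]{2,}$/i;

// .test returns a boolean: enough when you only need pass/fail.
const ok = emailReg.test("joe@example.com"); // true

// .exec returns a match array (captures, index) or null.
const m = /^([^@]+)@(.+)$/.exec("joe@example.com");
const local = m ? m[1] : null;  // "joe"
const domain = m ? m[2] : null; // "example.com"

// Caution: with the /g flag both methods update lastIndex, so reusing one
// global regex for several inputs can give surprising results.
const g = /a/g;
const first = g.test("a");  // true, lastIndex is now 1
const second = g.test("a"); // false, the search started at index 1
```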

Author Comment

by:Bishork
ID: 22745409
Last question: why did you use a ^ in

([^a-z0-9]*[a-z0-9])*

If I put that part in the domain portion, I don't need the ^, since it isn't the start of the string, right?
LVL 84

Expert Comment

by:ozo
ID: 22745516
If you don't use /i, you'd probably want to change [a-z] to [a-zA-Z].
Either way should make little difference in efficiency.
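With /i, the class [a-z] also matches A-Z, so for plain ASCII input the two spellings accept exactly the same strings. A small check, assuming ASCII-only samples:

```javascript
// With /i, the class [a-z] also matches A-Z.
const withFlag = /^[a-z0-9]+$/i;
// Without /i, the uppercase range must be listed explicitly.
const explicit = /^[a-zA-Z0-9]+$/;

const inputs = ["Joe", "JOE", "joe42", "jo_e", "j@e", ""];
for (const s of inputs) {
  // Columns: sample, /i result, [a-zA-Z] result; they should always agree.
  console.log(JSON.stringify(s), withFlag.test(s), explicit.test(s));
}
```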

Author Comment

by:Bishork
ID: 22745554
Last question, then I'll give you the points right now:

Which is better, this approach using what you gave me above:

/^[a-z0-9_-]+(.*([a-z0-9_-]))@[a-z0-9]+([^a-z0-9]*[a-z0-9])*\.([a-z]{2,}|[0-9]+)/i

Then, if the address passes that, it goes to

/^[a-z0-9_-]+(([.]?[a-z0-9_-])*)@([a-z0-9])+([\.\-]?[a-z0-9]*)\.([a-z]{2,}|[0-9]+)$/i

Or this simpler way:

[@.][@.]+;

as the first check: if the test is true, you get an error.

If it is false, the address goes through

/^([a-z0-9._%+-])+@[a-z0-9.-]+\.[a-z0-9]{2,4}$/i
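A sketch of the simpler two-stage approach, so each stage can return its own error message as the business requires. The function name `checkEmail` and the message texts are hypothetical; the two patterns are the ones proposed above (the trailing semicolon after [@.][@.]+ is treated as JavaScript statement syntax, not part of the pattern). Neither pattern nests one unbounded quantifier inside another, so neither should show the exponential slowdown of the original.

```javascript
// Stage 1: two or more adjacent "@"/"." characters (@@, @., .@, ..).
const BAD_FORM = /[@.][@.]+/;
// Stage 2: only the allowed symbols, one "@", and a 2-4 character ending.
const ALLOWED = /^([a-z0-9._%+-])+@[a-z0-9.-]+\.[a-z0-9]{2,4}$/i;

// Hypothetical message texts; replace with the business's wording.
const FORM_ERROR = "Please check the format of your email address.";
const SYMBOL_ERROR = "Your email address contains characters that are not allowed.";

// Returns the first error message that applies, or null if both stages pass.
function checkEmail(address) {
  if (BAD_FORM.test(address)) return FORM_ERROR;
  if (!ALLOWED.test(address)) return SYMBOL_ERROR;
  return null;
}

console.log(checkEmail("j...@joe.com")); // prints FORM_ERROR
console.log(checkEmail("j#ck@joe.com")); // prints SYMBOL_ERROR
console.log(checkEmail("jack@joe.com")); // null
```

One gap worth knowing: a leading dot in the local part (for example .jack@joe.com) gets through both stages.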

LVL 84

Expert Comment

by:ozo
ID: 22745563
Inside a character class, a leading ^ negates the class:
[^...] matches any character not in [...]
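So the ^ in ([^a-z0-9]*[a-z0-9])* is still needed when the group sits after the @: there it negates the class rather than anchoring anything. The three meanings of ^ side by side (sample strings are arbitrary):

```javascript
// Outside a class, ^ anchors the match to the start of the string.
const startsWithDigit = /^[0-9]/;
// Inside a class, a leading ^ negates it: any one character NOT listed.
const nonAlnum = /[^a-z0-9]/i;
// Inside a class but not first, ^ is just a literal caret.
const caretOrDot = /[.^]/;

console.log(startsWithDigit.test("9lives"), startsWithDigit.test("lives9")); // true false
console.log(nonAlnum.test("joe"), nonAlnum.test("j#ck"));                   // false true
console.log(caretOrDot.test("a^b"), caretOrDot.test("ab"));                 // true false
```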

Author Comment

by:Bishork
ID: 22745735
j@cnn.com won't work with (.*([a-z0-9_-])) because it requires a second character before the @.

Will it do all that backtracking if I make it (.*([a-z0-9_-]))* instead?


LVL 84

Expert Comment

by:ozo
ID: 22745818
Adding that * can potentially make it spend a lot of time backtracking, and it doesn't change what is matched (it just allows many different ways of matching the same thing).
Why do you think you need it?

Author Comment

by:Bishork
ID: 22754070
(.*([a-z0-9_-]))* = look for zero or more occurrences of anything followed by a character in [].

(.*([a-z0-9_-])) = look for one or more (or one and only one), right? Although I don't have a {1} or a +, I thought leaving the quantifier off means it defaults to one of those. When I run it in my test, it requires two characters before the @.


Author Comment

by:Bishork
ID: 22754164
Are ([^a-z0-9]*[a-z0-9])* and (.*([a-z0-9])) equally efficient?
LVL 84

Expert Comment

by:ozo
ID: 22755050
Not exactly equal, but neither will blow up exponentially.
They differ in what they match: the first can match an empty string, while the second cannot.
They also differ in what the parentheses capture.
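Both differences are easy to see on a sample domain label. The patterns are anchored with ^...$ for this sketch only, and cnn.com is just a sample string.

```javascript
const strictRepeat = /^([^a-z0-9]*[a-z0-9])*$/i;
const greedyLast = /^(.*([a-z0-9]))$/i;

// Difference 1: only the starred group can match the empty string.
console.log(strictRepeat.test(""), greedyLast.test("")); // true false

// Difference 2: what the parentheses capture.
// A repeated group keeps only its LAST iteration.
console.log(strictRepeat.exec("cnn.com")[1]); // m
// The outer group captures everything; the inner one the final character.
const parts = greedyLast.exec("cnn.com");
console.log(parts[1], parts[2]); // cnn.com m
```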
LVL 84

Expert Comment

by:ozo
ID: 22755057
Leaving the quantifier off means it defaults to {1}:
(.*([a-z0-9_-])) requires at least one character,
(.*([a-z0-9_-]))* requires at least 0 characters,
(.*([a-z0-9_-]))? also requires at least 0 characters.
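Applied to the local part of the combined pattern, this explains why j@cnn.com was rejected. The sketch keeps only the part up to the @ (omitting the domain portion is an assumption made for brevity) and varies just the quantifier on the middle group.

```javascript
// Local-part prefix from the combined pattern, middle group quantified
// three different ways (domain portion omitted for brevity).
const once     = /^[a-z0-9_-]+(.*([a-z0-9_-]))@/i;  // exactly once
const optional = /^[a-z0-9_-]+(.*([a-z0-9_-]))?@/i; // zero or one time
const starred  = /^[a-z0-9_-]+(.*([a-z0-9_-]))*@/i; // zero or more (adds ambiguity)

for (const addr of ["j@cnn.com", "jo@cnn.com", "j#ck@cnn.com"]) {
  console.log(addr, once.test(addr), optional.test(addr), starred.test(addr));
}
// j@cnn.com false true true
// jo@cnn.com true true true
// j#ck@cnn.com true true true
```

The ? form accepts single-character local parts like the * form does, without adding the extra ways of matching that the * brings.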

Author Comment

by:Bishork
ID: 22755065
So, last question: would you prefer I just do the first check as
[.@][.@]+ and test for that? If they have @@, @., .@ or .., they get an error. If they pass that, then just have

/^([a-z0-9._-])+@[a-z0-9.-]+\.[a-z0-9]{2,4}$/i

That makes it a lot simpler, right?
LVL 84

Expert Comment

by:ozo
ID: 22755101
Exactly what strings do you want to match, and what strings do you want to not match?

Author Comment

by:Bishork
ID: 22755112
The first one should catch bad form, and the second should catch invalid symbols in the address.

For example, j...@joe.com should be caught by the first one, while j#ck@joe.com should pass the first one and be caught by the second. See what I was trying to do by letting anything after the first letter pass the first check?
LVL 84

Expert Comment

by:ozo
ID: 22755260
Although I don't know whether j... or j#ck actually exist on joe.com, those are perfectly valid email addresses.
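For a concrete reference point, RFC 5322 treats # as an ordinary character in an unquoted (dot-atom) local part, so j#ck@joe.com is well-formed. Consecutive dots, however, are only allowed when the local part is quoted, as in "j..."@joe.com. The sketch below checks just the local part; the quoted-string branch is simplified (it ignores backslash escapes), and the helper name `localPartLooksValid` is hypothetical.

```javascript
// RFC 5322 "atext": characters allowed in an unquoted local part.
const ATEXT = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]";
// dot-atom: runs of atext separated by single dots
// (no leading, trailing, or consecutive dots).
const DOT_ATOM_LOCAL = new RegExp(`^${ATEXT}+(?:\\.${ATEXT}+)*$`);
// Quoted local part, simplified: printable ASCII except " and \ between quotes.
const QUOTED_LOCAL = /^"[\x20\x21\x23-\x5B\x5D-\x7E]*"$/;

// Hypothetical helper: true if `local` is a dot-atom or a simple quoted string.
function localPartLooksValid(local) {
  return DOT_ATOM_LOCAL.test(local) || QUOTED_LOCAL.test(local);
}

for (const local of ["j#ck", "j...", '"j..."', ".jack"]) {
  console.log(local, localPartLooksValid(local));
}
```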

Author Closing Comment

by:Bishork
ID: 31507299
Thanks.