• Status: Solved

My regex is very inefficient and slow...help with improving speed

I'm using two regexes. I was trying to let any wildcard in the domain portion of an email address pass through the first one and be caught by the second one. My app hangs on the first one if the domain is longer than about 8 chars.

emailReg = /^[a-z0-9_-]+((.*?[a-z0-9_-])*)@[a-z0-9]+(.*?[a-z0-9])*\.([a-z]{2,}|[0-9]+)$/i

Also,

should I be using .exec or .test? Which is faster?

Does the /i slow things down? Or is it the .*? in the second group after the @, since I'm basically allowing unlimited strings?

This is my more thorough check

/^[a-z0-9_-]+(([.]?[a-z0-9_-])*)@([a-z0-9])+([\.\-]?[a-z0-9]*)*\.([a-z]{2,}|[0-9]+)$/i

If the address passes the form check in the first one, it goes through this one. Not my idea, but the business wants 2 error messages.

Any help with efficiency and improving speed on both of these would be most appreciated.  Thanks
Bishork
Asked:
1 Solution
 
ozoCommented:
(.*?[a-z0-9])*
can match in multiple ways, and it may take exponentially long to find all possible ways it might match
([^a-z0-9]*[a-z0-9])*
should match only one way so it should be more efficient
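A minimal sketch of the difference, assuming a backtracking engine such as the one JavaScript normally uses. The two sub-patterns are taken from the comment above and anchored so that a failing input forces the engine to try every way of splitting the text; the `timeTest` helper and the sample input are illustrative, not part of the original question.

```javascript
// Sub-patterns from the thread, anchored so a failed match must exhaust every path.
const ambiguous = /^(.*?[a-z0-9])*$/i;          // many ways to split the same text
const unambiguous = /^([^a-z0-9]*[a-z0-9])*$/i; // only one way to split it

// Hypothetical helper: run one .test() call and report the elapsed milliseconds.
function timeTest(re, input) {
  const start = Date.now();
  const matched = re.test(input);
  return { matched, ms: Date.now() - start };
}

// A run of letters followed by a character that makes the overall match fail.
// Kept short on purpose: each extra letter roughly doubles the work for `ambiguous`.
const input = "a".repeat(18) + "!";

const slow = timeTest(ambiguous, input);
const fast = timeTest(unambiguous, input);
console.log("ambiguous:  ", slow);
console.log("unambiguous:", fast);
```

Both patterns reject this input; only the time taken differs. Lengthening the run of letters by a few characters should make the gap obvious, so increase it cautiously.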
 
ozoCommented:
in fact
((.*?[a-z0-9_-])*)
is equivalent to just
(.*([a-z0-9_-]))
but without all the backtracking
 
BishorkAuthor Commented:
(.*([a-z0-9_-]))

won't let you have a run of arbitrary chars like .*? would. .*? allows ######F but .? requires #F#F#F#F#F
 
BishorkAuthor Commented:
---
in fact
((.*?[a-z0-9_-])*)
is equivalent to just
(.*([a-z0-9_-]))
----

I thought ? meant lookahead, and that without it the regex wouldn't look ahead.
Oh wait, I see what you did there, never mind. Let me test that out.
 
BishorkAuthor Commented:
do you prefer .test or .exec  and do you prefer /i?
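For reference, a short sketch of what the two methods return. The pattern is one of the simpler ones quoted elsewhere in this thread and the sample addresses are illustrative. `.test` returns a boolean, while `.exec` returns a match array (or `null`), so `.test` is the natural choice when only a yes/no answer is needed. Any speed difference between them is usually small next to the cost of the pattern itself.

```javascript
// Pattern quoted in this thread; sample addresses are made up for illustration.
const emailReg = /^([a-z0-9._%+-])+@[a-z0-9.-]+\.[a-z0-9]{2,4}$/i;

// .test answers only "did it match?"
const ok = emailReg.test("jack@joe.com");

// .exec returns the match details: index 0 is the whole match,
// index 1 is the last character captured by the repeated first group.
const match = emailReg.exec("jack@joe.com");

// .exec returns null when there is no match.
const noMatch = emailReg.exec("j#ck@joe.com");

console.log(ok, match[0], match[1], noMatch);
```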
 
BishorkAuthor Commented:
last q - why'd you use a ^ in

([^a-z0-9]*[a-z0-9])*

If I put that part in the domain portion, I don't need the ^ since it isn't the start of the string, right?
 
ozoCommented:
if you don't have /i
you'd probably want to change [a-z] to [a-zA-Z]
either way should make little difference in efficiency
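A small sketch of that equivalence, assuming plain ASCII input and no `u` flag. The sample strings are illustrative.

```javascript
// Same character set written two ways.
const withFlag = /^[a-z0-9_-]+$/i;        // case handled by the /i flag
const withoutFlag = /^[a-zA-Z0-9_-]+$/;   // case spelled out in the class

const samples = ["Jack", "JACK_01", "jack-smith", "j#ck", ""];

// Each row: [input, result with /i, result with explicit A-Z]
const results = samples.map((s) => [s, withFlag.test(s), withoutFlag.test(s)]);
console.log(results);
```

For these inputs the two columns agree on every row.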
 
BishorkAuthor Commented:
Last Q, then ill give you the pts right now -

which is superior, the way using what you gave me above:

/^[a-z0-9_-]+(.*([a-z0-9_-]))@[a-z0-9]+([^a-z0-9]*[a-z0-9])*\.([a-z]{2,}|[0-9]+)/i

then if you pass through that you go to

/^[a-z0-9_-]+(([.]?[a-z0-9_-])*)@([a-z0-9])+([\.\-]?[a-z0-9]*)\.([a-z]{2,}|[0-9]+)$/i

or this simpler way:

[@.][@.]+;

for the first check, then if test is true, you get an error

if that is false, you go through

/^([a-z0-9._%+-])+@[a-z0-9.-]+\.[a-z0-9]{2,4}$/i

 
ozoCommented:
in a character class
[^...] matches any character not in [...]
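To illustrate the two meanings of `^`, here is a brief sketch with made-up sample strings. Outside brackets it anchors to the start of the string; as the first character inside brackets it negates the set.

```javascript
// ^ outside brackets: the match must begin at the start of the string.
const anchor = /^abc/;

// ^ as the first character inside brackets: any ONE character not in the set.
const negated = /[^a-z0-9]/i;

const anchorHit = anchor.test("abcdef");   // starts with abc
const anchorMiss = anchor.test("xabc");    // abc is present but not at the start

const cleanHit = negated.test("joe");      // every character is in the set
const found = "j#ck".match(negated);       // finds the first character outside the set

console.log(anchorHit, anchorMiss, cleanHit, found[0]);
```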
 
BishorkAuthor Commented:
j@cnn.com won't work with (.*([a-z0-9_-])) because it requires a second char before the @.

Will it do all the backtracking again if I make it (.*([a-z0-9_-]))*


 
ozoCommented:
Adding that * can potentially cause it to spend a lot of time backtracking, and doesn't change what is matched (it just lets there be a lot of different ways of matching the same thing).
Why do you think you need it?
 
BishorkAuthor Commented:
(.*([a-z0-9_-]))* = look for 0 or many of anything thats followed by what's in []

(.*([a-z0-9_-])) = look for one or more (or one and only one), right? Although I don't have a {1} or a +, I thought leaving the quantifier off means it defaults to one of those. When I run it in my test, it requires two chars before the @.

 
BishorkAuthor Commented:
Are ([^a-z0-9]*[a-z0-9])* and (.*([a-z0-9])) equally efficient?
 
ozoCommented:
not exactly equal, but neither will blow up exponentially.
They differ in what they will match in that the first can match an empty string while the second will not
They also differ in what the parentheses will capture
 
ozoCommented:
leaving the quantifier off means it defaults to {1}
(.*([a-z0-9_-])) requires at least one character,
(.*([a-z0-9_-]))* requires at least 0 characters
(.*([a-z0-9_-]))? also requires at least 0 characters
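A short sketch of the effect on the local part, using only the portion of the pattern up to the @. The names `required` and `optional` are illustrative. It shows why j@cnn.com fails when the group has no quantifier and passes once the group is made optional with `?`.

```javascript
// Group with no quantifier: it must match once, so at least one more
// character is needed between the first character and the @.
const required = /^[a-z0-9_-]+(.*([a-z0-9_-]))@/i;

// Same group made optional with ?: a one-character local part is allowed.
const optional = /^[a-z0-9_-]+(.*([a-z0-9_-]))?@/i;

const shortRequired = required.test("j@cnn.com");
const shortOptional = optional.test("j@cnn.com");
const longRequired = required.test("jo@cnn.com");

// When the optional group does not take part in the match, its capture is undefined.
const skippedGroup = optional.exec("j@cnn.com")[1];

console.log(shortRequired, shortOptional, longRequired, skippedGroup);
```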
 
BishorkAuthor Commented:
So, last question: would you prefer I just do the first check as
[.@][.@]+ and test for that? If the address has @@, @., .@ or .. in it, you get an error. If it passes that, then just use

/^([a-z0-9._-])+@[a-z0-9.-]+\.[a-z0-9]{2,4}$/i

Would that make it a lot simpler?
 
ozoCommented:
Exactly what strings do you want to match, and what strings do you want to not match?
 
BishorkAuthor Commented:
With the first one I want to catch bad form, and with the second I want to catch invalid symbols in the address.

For example, j...@joe.com should be caught by the first one, while j#ck@joe.com should pass through the first one and be caught by the second. See what I was trying to do by allowing anything to pass through the first one after the first letter?
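One way the two-message flow could be wired up, sketched with the simpler pair of patterns proposed earlier in the thread. The function name `checkEmail` and the message strings are hypothetical, and the sketch mirrors the business rule described here rather than any standard for which addresses are valid.

```javascript
// First check: two or more of . or @ in a row counts as bad form.
const badForm = /[.@][.@]+/;

// Second check: only the listed characters are allowed, in user@domain.tld shape.
const allowedChars = /^([a-z0-9._-])+@[a-z0-9.-]+\.[a-z0-9]{2,4}$/i;

// Hypothetical helper returning which of the two error messages applies, if any.
function checkEmail(address) {
  if (badForm.test(address)) {
    return "bad form";
  }
  if (!allowedChars.test(address)) {
    return "invalid symbols";
  }
  return "ok";
}

console.log(checkEmail("j...@joe.com")); // caught by the first check
console.log(checkEmail("j#ck@joe.com")); // passes the first, caught by the second
console.log(checkEmail("jack@joe.com")); // passes both
```

Neither pattern contains a nested quantifier, so neither should suffer the exponential backtracking discussed above. Note that the second message also fires for anything else the second pattern rejects, such as a missing @.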
 
ozoCommented:
Although I don't know whether j... or j#ck happen to exist on joe.com, those happen to be perfectly valid email addresses
 
BishorkAuthor Commented:
thanks