Solved

My regex is very inefficient and slow...help with improving speed

Posted on 2008-10-17
Last Modified: 2012-05-05
I'm using two regexes. I was trying to let any wildcard character in the domain portion of an email address pass the first one and be caught by the second one. My app hangs on the first one if the domain is more than about 8 chars long.

emailReg = /^[a-z0-9_-]+((.*?[a-z0-9_-])*)@[a-z0-9]+(.*?[a-z0-9])*\.([a-z]{2,}|[0-9]+)$/i

Also,

should I be using .exec or .test?  What's faster?  

Does the /i slow things down? Or is it the .*? in the second group after the @, since I'm basically allowing unlimited strings?

This is my more thorough check

/^[a-z0-9_-]+(([.]?[a-z0-9_-])*)@([a-z0-9])+([\.\-]?[a-z0-9]*)*\.([a-z]{2,}|[0-9]+)$/i

If an address passes the form check in the first one, it goes through this one. Not my idea, but the business wants 2 error messages.

Any help with efficiency and improving speed on both of these would be most appreciated.  Thanks
Question by:Bishork
20 Comments
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 22745162
(.*?[a-z0-9])*
can match in multiple ways, and it may take exponentially long to find all possible ways it might match
([^a-z0-9]*[a-z0-9])*
should match only one way so it should be more efficient
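To make "can match in multiple ways" concrete, here is a rough sketch that counts how many ways a string can be cut into pieces that each satisfy one repetition of the group. The helper `countSplits` and the sample strings are made up for illustration; it is a counting exercise, not how the regex engine is implemented, but a backtracking engine may have to try a comparable number of alternatives before it can report a failed match.

```javascript
// Count how many ways `s` can be cut into consecutive pieces that each
// match `piece` in full. ways[i] = number of ways to cut s.slice(0, i).
// (countSplits is a hypothetical helper, not part of any library.)
function countSplits(s, piece) {
  const whole = new RegExp("^(?:" + piece + ")$", "i");
  const ways = new Array(s.length + 1).fill(0);
  ways[0] = 1;
  for (let end = 1; end <= s.length; end++) {
    for (let start = 0; start < end; start++) {
      if (whole.test(s.slice(start, end))) {
        ways[end] += ways[start];
      }
    }
  }
  return ways[s.length];
}

const ambiguous = ".*?[a-z0-9]";          // one repetition of (.*?[a-z0-9])*
const unambiguous = "[^a-z0-9]*[a-z0-9]"; // one repetition of ([^a-z0-9]*[a-z0-9])*

const sample = "abcdefgh"; // an 8-character domain label
const ambiguousWays = countSplits(sample, ambiguous);
const unambiguousWays = countSplits(sample, unambiguous);
console.log(ambiguousWays, unambiguousWays); // prints: 128 1
```

The count for the ambiguous piece doubles with each extra letter, which fits the report that the app hangs once the domain gets to around 8 characters.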
 
LVL 84

Expert Comment

by:ozo
ID: 22745211
in fact
((.*?[a-z0-9_-])*)
is equivalent to just
(.*([a-z0-9_-]))
but without all the backtracking
 

Author Comment

by:Bishork
ID: 22745362
(.*([a-z0-9_-]))

won't let you have a run of arbitrary chars like .*? would. .*? allows ######F, but .? requires #F#F#F#F#F

 

Author Comment

by:Bishork
ID: 22745378
---
in fact
((.*?[a-z0-9_-])*)
is equivalent to just
(.*([a-z0-9_-]))
----

I thought ? meant lookahead, and that without it the regex wouldn't look ahead.
Oh wait, I see what you did there. Never mind, let me test that out.
 

Author Comment

by:Bishork
ID: 22745391
do you prefer .test or .exec  and do you prefer /i?
 

Author Comment

by:Bishork
ID: 22745409
last q - why'd you use a ^ in

([^a-z0-9]*[a-z0-9])*

If I put that part in the domain portion, I don't need the ^ since it isn't the start of the string, right?
 
LVL 84

Expert Comment

by:ozo
ID: 22745516
if you don't have /i
you'd probably want to change [a-z] to [a-zA-Z]
either way should make little difference in efficiency
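A small sketch of the two spellings, plus what `.test` and `.exec` hand back, since that question came up earlier in the thread. The sample strings are invented for illustration, and nothing here measures speed; for a plain yes/no validation a boolean from `.test` is usually all you need.

```javascript
const withFlag = /^[a-z0-9]+$/i;   // case-insensitive flag
const explicit = /^[a-zA-Z0-9]+$/; // same letters spelled out, no flag

// For these sample strings both spellings give the same answer.
const samples = ["CNN", "cnn", "Cnn9", "c_n"];
const agree = samples.every((s) => withFlag.test(s) === explicit.test(s));

// test() returns a boolean; exec() returns a match array or null.
const asBoolean = withFlag.test("Cnn9");
const asMatch = withFlag.exec("Cnn9"); // asMatch[0] is the matched text
const noMatch = withFlag.exec("c_n");  // "_" is not in the class

console.log(agree, asBoolean, asMatch[0], noMatch);
```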
 

Author Comment

by:Bishork
ID: 22745554
Last Q, then I'll give you the pts right now -

which is superior, the way using what you gave me above:

/^[a-z0-9_-]+(.*([a-z0-9_-]))@[a-z0-9]+([^a-z0-9]*[a-z0-9])*\.([a-z]{2,}|[0-9]+)/i

then if you pass through that you go to

/^[a-z0-9_-]+(([.]?[a-z0-9_-])*)@([a-z0-9])+([\.\-]?[a-z0-9]*)\.([a-z]{2,}|[0-9]+)$/i

or this simpler way:

[@.][@.]+;

for the first check, then if test is true, you get an error

if that is false, you go through

/^([a-z0-9._%+-])+@[a-z0-9.-]+\.[a-z0-9]{2,4}$/i

 
LVL 84

Expert Comment

by:ozo
ID: 22745563
in a character class
[^...] matches any character not in [...]
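As a quick illustration of the two meanings of `^`, here is a sketch with made-up sample strings. Outside brackets it anchors the match to the start of the string; as the first character inside brackets it negates the class, which is why it is still needed in the domain portion.

```javascript
const anchored = /^[a-z0-9]/i; // ^ outside brackets: match must begin at the start
const negated = /[^a-z0-9]/i;  // ^ first inside brackets: any char NOT in the set

const startsAlnum = anchored.test("cnn.com");   // true
const startsSymbol = anchored.test("#cnn.com"); // false
const hasOther = negated.test("cnn.com");       // true, because of the "."
const onlyAlnum = negated.test("cnn");          // false

// The repeated piece from the accepted answer: skip any non-alphanumerics,
// then take exactly one alphanumeric, as many times as needed.
const pieces = /^(?:[^a-z0-9]*[a-z0-9])*$/i;
const endsAlnum = pieces.test("c-n.n");  // true
const endsSymbol = pieces.test("cnn.");  // false, trailing "." is left over

console.log(startsAlnum, startsSymbol, hasOther, onlyAlnum, endsAlnum, endsSymbol);
```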
 

Author Comment

by:Bishork
ID: 22745735
j@cnn.com won't work with (.*([a-z0-9_-])) because it's requiring that as a second char.

Will it do all the backtracking if I make it (.*([a-z0-9_-]))*


 
LVL 84

Expert Comment

by:ozo
ID: 22745818
Adding that * can potentially cause it to spend a lot of time backtracking, and doesn't change what is matched (it just lets there be a lot of different ways of matching the same thing)
Why do you think you need it?
 

Author Comment

by:Bishork
ID: 22754070
(.*([a-z0-9_-]))* = look for 0 or many of anything thats followed by what's in []

(.*([a-z0-9_-])) = look for one or more (or one and only one). Right? Although I don't have a {1} or a +, I thought leaving any delimiter off means it defaults to one of those. When I run it in my test it's requiring I have two chars before the @

 

Author Comment

by:Bishork
ID: 22754164
Are ([^a-z0-9]*[a-z0-9])* and (.*([a-z0-9])) equally efficient?
 
LVL 84

Expert Comment

by:ozo
ID: 22755050
not exactly equal, but neither will blow up exponentially.
They differ in what they will match in that the first can match an empty string while the second will not
They also differ in what the parentheses will capture
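Both differences can be checked directly. This is a sketch with anchors added around each group and an invented sample string `ab#c`; the variable names are only for illustration.

```javascript
const repeated = /^([^a-z0-9]*[a-z0-9])*$/i; // the group may repeat zero times
const single = /^(.*([a-z0-9]))$/i;          // the group must match exactly once

// Difference 1: the empty string.
const emptyRepeated = repeated.test(""); // true
const emptySingle = single.test("");     // false

// Difference 2: what the parentheses capture.
const r = repeated.exec("ab#c");
const s = single.exec("ab#c");
// A repeated group keeps only its last repetition.
console.log(r[1]);       // prints: #c
console.log(s[1], s[2]); // prints: ab#c c
```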
 
LVL 84

Expert Comment

by:ozo
ID: 22755057
leaving the quantifier off means it defaults to {1}
(.*([a-z0-9_-])) requires at least one character,
(.*([a-z0-9_-]))* requires at least 0 characters
(.*([a-z0-9_-]))? also requires at least 0 characters
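Applied to the one-letter local part from earlier in the thread, a sketch might look like this. Only the part up to the @ is tested, and the inner capture group is dropped for brevity; both simplifications are mine, not part of the original patterns.

```javascript
// Group with no quantifier: it has to match once, so at least two
// characters are needed before the @.
const required = /^[a-z0-9_-]+(.*[a-z0-9_-])@/i;
// Group followed by ?: it may be skipped, so one character is enough.
const optional = /^[a-z0-9_-]+(.*[a-z0-9_-])?@/i;

const shortRequired = required.test("j@cnn.com");  // false
const longRequired = required.test("jo@cnn.com");  // true
const shortOptional = optional.test("j@cnn.com");  // true
const longOptional = optional.test("jo@cnn.com");  // true

console.log(shortRequired, longRequired, shortOptional, longOptional);
```

Unlike `*`, the `?` lets the group be skipped without allowing it to repeat, so it does not bring back the many-ways-to-match problem.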
 

Author Comment

by:Bishork
ID: 22755065
so last q - would you prefer i just do the first check as
[.@][.@]+ and test for that: if the address has @@, @., .@ or .., you'll get an error. If you pass that, then just have

/^([a-z0-9._-])+@[a-z0-9.-]+\.[a-z0-9]{2,4}$/i

to make it a lot simpler?
 
LVL 84

Expert Comment

by:ozo
ID: 22755101
Exactly what strings do you want to match, and what strings do you want to not match?
 

Author Comment

by:Bishork
ID: 22755112
the first one i want to catch bad form and the second i want to catch invalid symbols in the address.

like - j...@joe.com should be caught by the first one, j#ck@joe.com should pass through the first one and be caught by the second.  see what i was trying to do by allowing anything to pass through the first one after the first letter?
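The simpler two-step idea can be sketched like this, using the two patterns proposed earlier in the thread. The function name `checkEmail` and the returned labels are hypothetical stand-ins for the two error messages the business wants; this encodes that business rule only and does not claim to decide whether an address is valid.

```javascript
const badForm = /[.@][.@]+/; // two or more of "." / "@" in a row
const allowed = /^([a-z0-9._-])+@[a-z0-9.-]+\.[a-z0-9]{2,4}$/i;

// Returns a label saying which check (if any) rejected the address.
function checkEmail(address) {
  if (badForm.test(address)) {
    return "bad form";
  }
  if (!allowed.test(address)) {
    return "invalid symbols";
  }
  return "ok";
}

console.log(checkEmail("j...@joe.com")); // prints: bad form
console.log(checkEmail("j#ck@joe.com")); // prints: invalid symbols
console.log(checkEmail("jack@joe.com")); // prints: ok
```

One thing to watch: an address with no @ at all passes the first check and is rejected by the second, so it would get the "invalid symbols" message.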
 
LVL 84

Expert Comment

by:ozo
ID: 22755260
Although I don't know whether j... or j#ck happen to exist on joe.com, those happen to be perfectly valid email addresses
 

Author Closing Comment

by:Bishork
ID: 31507299
thanks