Solved

My regex is very inefficient and slow...help with improving speed

Posted on 2008-10-17
20
258 Views
Last Modified: 2012-05-05
I'm using two regexes.  I was trying to let any wildcard in the domain portion of an email address pass the first one and be caught by the second one.  My app hangs on the first one if the domain is more than about 8 chars.

emailReg = /^[a-z0-9_-]+((.*?[a-z0-9_-])*)@[a-z0-9]+(.*?[a-z0-9])*\.([a-z]{2,}|[0-9]+)$/i

Also,

should I be using .exec or .test?  What's faster?  

does the /i slow things up?  Is it because of the .*? in the 2nd substring after the @ because I'm basically allowing unlimited strings?

This is my more thorough check

/^[a-z0-9_-]+(([.]?[a-z0-9_-])*)@([a-z0-9])+([\.\-]?[a-z0-9]*)*\.([a-z]{2,}|[0-9]+)$/i

which, if it passes form on the first one, goes through this one.  not my idea but the business wants 2 error messages.

Any help with efficiency and improving speed on both of these would be most appreciated.  Thanks
0
Comment
Question by:Bishork
  • 11
  • 9
20 Comments
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 22745162
(.*?[a-z0-9])*
can match in multiple ways, and it may take exponentially long to find all possible ways it might match
([^a-z0-9]*[a-z0-9])*
should match only one way so it should be more efficient
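
A small sketch of that rewrite, assuming a cut-down, domain-only pattern (the names `original` and `rewritten` are made up for illustration). On these single-line inputs both patterns should give the same answer; the rewritten group simply leaves the engine far fewer ways to split the input while it searches. The inputs are kept short on purpose, since the original form can take very long to fail on longer strings.

```javascript
// Domain-only versions of the two patterns; names are illustrative only.
const original = /^[a-z0-9]+(.*?[a-z0-9])*\.([a-z]{2,}|[0-9]+)$/i;
const rewritten = /^[a-z0-9]+([^a-z0-9]*[a-z0-9])*\.([a-z]{2,}|[0-9]+)$/i;

const samples = ["cnn.com", "my-site.co.uk", "cnn", "cnn.c", "a..b.org"];

// Each row: [input, result from original, result from rewritten]
const results = samples.map((s) => [s, original.test(s), rewritten.test(s)]);

for (const [s, a, b] of results) {
  console.log(s, a, b);
}
```

If the two columns ever disagree for an input you care about, the rewrite is not a drop-in replacement for your case.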
0
 
LVL 84

Expert Comment

by:ozo
ID: 22745211
in fact
((.*?[a-z0-9_-])*)
is equivalent to just
(.*([a-z0-9_-]))
but without all the backtracking
0
 

Author Comment

by:Bishork
ID: 22745362
(.*([a-z0-9_-]))

wont let you have a string of any char like .*? would.  .*? allows ######F but .? requires #F#F#F#F#F
0

 

Author Comment

by:Bishork
ID: 22745378
---
in fact
((.*?[a-z0-9_-])*)
is equivalent to just
(.*([a-z0-9_-]))
----

i thought ? meant lookahead, and without it it wouldnt look ahead.  
oh wait i see what you did thar, nm lemme test that out
0
 

Author Comment

by:Bishork
ID: 22745391
do you prefer .test or .exec  and do you prefer /i?
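
For reference, a minimal sketch of how the two methods differ in what they return, using the simpler pattern from this thread (the variable names are made up). `.test` gives back only a boolean, while `.exec` gives back a match array or `null`, so for a plain pass/fail check `.test` is usually the more direct fit. This says nothing about which is faster.

```javascript
// Simpler pattern quoted from the thread; variable names are illustrative.
const emailLike = /^([a-z0-9._-])+@[a-z0-9.-]+\.[a-z0-9]{2,4}$/i;

// test() answers only "did it match?"
const ok = emailLike.test("jack@joe.com");

// exec() returns a match array (index 0 is the whole match) or null.
const hit = emailLike.exec("jack@joe.com");
const miss = emailLike.exec("j#ck@joe.com");

console.log(ok, hit && hit[0], miss);
```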
0
 

Author Comment

by:Bishork
ID: 22745409
last q - why'd you use a ^ in

([^a-z0-9]*[a-z0-9])*

if i put that part in the domain portion, i dont need the ^ since it isnt the start of the string right
0
 
LVL 84

Expert Comment

by:ozo
ID: 22745516
if you don't have /i
you'd probably want to change [a-z] to [a-zA-Z]
either way should make little difference in efficiency
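
A quick way to convince yourself the two spellings accept the same ASCII input, as a sketch (the names `withFlag` and `withoutFlag` are assumptions, and the pattern is just the TLD piece):

```javascript
// Same check written with the /i flag and with an explicit class.
const withFlag = /^[a-z]{2,}$/i;
const withoutFlag = /^[a-zA-Z]{2,}$/;

const inputs = ["com", "COM", "Com", "c0m", "c"];

// True when both patterns give the same answer for every input.
const agree = inputs.every((s) => withFlag.test(s) === withoutFlag.test(s));

console.log(agree);
```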
0
 

Author Comment

by:Bishork
ID: 22745554
Last Q, then ill give you the pts right now -

which is superior, the way using what you gave me above:

/^[a-z0-9_-]+(.*([a-z0-9_-]))@[a-z0-9]+([^a-z0-9]*[a-z0-9])*\.([a-z]{2,}|[0-9]+)/i

then if you pass through that you go to

/^[a-z0-9_-]+(([.]?[a-z0-9_-])*)@([a-z0-9])+([\.\-]?[a-z0-9]*)\.([a-z]{2,}|[0-9]+)$/i

or this simpler way:

[@.][@.]+;

for the first check, then if test is true, you get an error

if that is false, you go through

/^([a-z0-9._%+-])+@[a-z0-9.-]+\.[a-z0-9]{2,4}$/i

0
 
LVL 84

Expert Comment

by:ozo
ID: 22745563
in a character class
[^...] matches any character not in [...]
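
To make the two meanings of ^ concrete, a short sketch (names are illustrative): outside brackets it anchors to the start of the input, and as the first character inside brackets it negates the class.

```javascript
// ^ outside brackets: the match must begin at the start of the input.
const anchored = /^[a-z0-9]/i;

// ^ first inside brackets: match any one character NOT in the list.
const negated = /[^a-z0-9]/i;

console.log(anchored.test("abc"), anchored.test("#abc"));
console.log(negated.test("abc"), negated.test("a#c"));
```

So in ([^a-z0-9]*[a-z0-9])* the ^ has nothing to do with the start of the string, and it is needed wherever the group is placed.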
0
 

Author Comment

by:Bishork
ID: 22745735
j@cnn.com wont work with (.*([a-z0-9_-])) because its requiring that as a second char.

will it make it do all the backtracking if i make it (.*([a-z0-9_-]))*


0
 
LVL 84

Expert Comment

by:ozo
ID: 22745818
Adding that * can potentially cause it to spend a lot of time backtracking, and doesn't change what is matched (it just lets there be a lot of different ways of matching the same thing)
Why do you think you need it?
0
 

Author Comment

by:Bishork
ID: 22754070
(.*([a-z0-9_-]))* = look for 0 or many of anything thats followed by what's in []

(.*([a-z0-9_-])) = look for one or more (or one and only one).  right?  although I don't have a {1} or a + I thought leaving any delimiter off means it defaults to one of those.  When I run it in my test it's requiring i have two chars before the @

0
 

Author Comment

by:Bishork
ID: 22754164
Are ([^a-z0-9]*[a-z0-9])* and (.*([a-z0-9])) equally efficient?
0
 
LVL 84

Expert Comment

by:ozo
ID: 22755050
not exactly equal, but neither will blow up exponentially.
They differ in what they will match in that the first can match an empty string while the second will not
They also differ in what the parentheses will capture
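
Both differences can be seen in a short sketch, assuming the two groups are anchored on their own (the names `starred` and `single` are made up):

```javascript
// The two groups from the question, each anchored to the whole input.
const starred = /^([^a-z0-9]*[a-z0-9])*$/i;
const single = /^(.*([a-z0-9]))$/i;

// Empty input: the starred group may repeat zero times, the other may not.
const emptyStarred = starred.test("");
const emptySingle = single.test("");

// Captures: a repeated group keeps only its last iteration.
const starredMatch = starred.exec("a-b");
const singleMatch = single.exec("a-b");

console.log(emptyStarred, emptySingle);
console.log(starredMatch[1], singleMatch[1], singleMatch[2]);
```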
0
 
LVL 84

Expert Comment

by:ozo
ID: 22755057
leaving any delimiter off means it defaults to {1}
(.*([a-z0-9_-])) requires at least one character,
(.*([a-z0-9_-]))* requires at least 0 characters
(.*([a-z0-9_-]))? also requires at least 0 characters
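
This is why j@cnn.com was rejected earlier in the thread. A sketch using only the local-part prefix of the pattern (the names `once` and `optional` are made up):

```javascript
// Group with no quantifier: it must match exactly once, so at least
// one more character is needed after the first one.
const once = /^[a-z0-9_-]+(.*([a-z0-9_-]))@/i;

// Same group made optional with ?: zero extra characters are allowed.
const optional = /^[a-z0-9_-]+(.*([a-z0-9_-]))?@/i;

console.log(once.test("j@cnn.com"), once.test("jo@cnn.com"));
console.log(optional.test("j@cnn.com"), optional.test("j.k@cnn.com"));
```

Using ? rather than * on the group permits the one-character local part without reintroducing a repeated group that can match in many ways.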
0
 

Author Comment

by:Bishork
ID: 22755065
so last q - would you prefer i just do the first check as
[.@][.@]+ and test for that; if they have @@ @. .@ or .. in it, then you'll get an error.  if you pass that, then just have

/^([a-z0-9._-])+@[a-z0-9.-]+\.[a-z0-9]{2,4}$/i

make it a lot simpler?
0
 
LVL 84

Expert Comment

by:ozo
ID: 22755101
Exactly what strings do you want to match, and what strings do you want to not match?
0
 

Author Comment

by:Bishork
ID: 22755112
the first one i want to catch bad form and the second i want to catch invalid symbols in the address.

like - j...@joe.com should be caught by the first one, j#ck@joe.com should pass through the first one and be caught by the second.  see what i was trying to do by allowing anything to pass through the first one after the first letter?
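
The simpler two-step check proposed earlier in the thread could be sketched like this. The function name `checkEmail` and the returned message strings are hypothetical, and this only mirrors the two-error-message rule described here; it is not a full address validator.

```javascript
// Step 1: adjacent dots and/or @ signs count as "bad form".
const badForm = /[.@][.@]+/;

// Step 2: the simpler whitelist pattern quoted from the thread.
const allowed = /^([a-z0-9._-])+@[a-z0-9.-]+\.[a-z0-9]{2,4}$/i;

// Hypothetical helper: returns which of the two errors applies, or "ok".
function checkEmail(address) {
  if (badForm.test(address)) return "bad form";
  if (!allowed.test(address)) return "invalid characters";
  return "ok";
}

console.log(checkEmail("j...@joe.com"));
console.log(checkEmail("j#ck@joe.com"));
console.log(checkEmail("jack@joe.com"));
```

Neither pattern contains a nested repeated group, so neither should run into the exponential backtracking of the original first check.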
0
 
LVL 84

Expert Comment

by:ozo
ID: 22755260
Although I don't know whether j... or j#ck happen to exist on joe.com, those happen to be perfectly valid email addresses
0
 

Author Closing Comment

by:Bishork
ID: 31507299
thanks
0


Question has a verified solution.
