Regular Expressions

A regular expression ("regex") is a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. "find and replace"-like operations. Regular expression processors are found in several search engines, in the search and replace dialogs of several word processors and text editors, and in the command lines of text processing utilities such as sed and AWK. Many programming languages provide regular expression capabilities, some built-in, for example Perl, JavaScript, Ruby, AWK, and Tcl, and others via a standard library, for example .NET languages, Java, Python and C++ (since C++11). Most other languages offer regular expressions via a library.

Whatever the reason, if you work in web development you will need everyday validation code, such as email, date, IP address, and phone validation, on almost any edit page, or at the very least at registration time.

You can do this on the client side with JavaScript or on the back end with a language such as PHP or Perl. Since I am working in Perl right now, my validation code is in Perl only.

Date Validation Using a Regular Expression in Perl
A date consists of three main parts: the day of the month, the month of the year, and the year itself. One more thing is required, the separator (/ or -); you may choose any other separator. Combining these gives a date such as MM/DD/YYYY, DD/MM/YYYY, or YYYY/MM/DD (you can use - or anything else instead of the / that I used here), where DD is the day, MM the month, and YYYY the year.

So, to accomplish the task, we have to check that:
a. DD is between 1 and 31,
b. MM is between 1 and 12,
c. YYYY runs from 1900 up to today's date (instead of today's date you can use any boundary date),
d. if MM is 02, i.e. February, DD is between 1 and 28, or up to 29 in a leap year.
So, here is the magic line that does all of the validation by itself, using a regular expression:

It matches the dd-mm-yyyy or dd/mm/yyyy pattern; for the rest of the patterns you have to …
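The article's Perl one-liner is not included in this excerpt. As a rough sketch of checks a to d, written in Python rather than Perl, the following combines a regex for the shape of the date with a calendar check for February and leap years. The names DATE_RE and is_valid_date and the `latest` parameter are illustrative assumptions, not part of the original article.

```python
import re
from datetime import date

# dd-mm-yyyy or dd/mm/yyyy; \2 forces the second separator to equal the first.
DATE_RE = re.compile(
    r"(0?[1-9]|[12][0-9]|3[01])"   # day 1-31
    r"([-/])"                      # separator
    r"(0?[1-9]|1[0-2])"            # month 1-12
    r"\2"
    r"((?:19|20)[0-9]{2})"         # four-digit year starting 19 or 20
)

def is_valid_date(text, latest=None):
    """Return True if text is a real calendar date between 1900-01-01 and latest."""
    latest = latest or date.today()
    m = DATE_RE.fullmatch(text)
    if m is None:
        return False
    day, month, year = int(m.group(1)), int(m.group(3)), int(m.group(4))
    try:
        # date() rejects impossible days such as 31 April or 29 February 1900.
        candidate = date(year, month, day)
    except ValueError:
        return False
    return date(1900, 1, 1) <= candidate <= latest
```

Here the regex only checks the format and the coarse ranges; the month-length and leap-year rules are left to the date constructor, which is the "full logic" approach one of the comments below recommends.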

Author Comment by: Sanjeev Jaiswal
Yes, you are right. Thanks for your review.
I just tried to keep it as simple as I could. Otherwise, validating everything in a single regex would make it more complex and less preferable.

Expert Comment by: käµfm³d 👽
I see time and time again posts for people asking how to validate dates and other strange values that would be better serviced by full logic, yet people still want to use Regex to do the validation. I'm guessing it's because they don't fully understand what Regex is or is useful for. I think the article is well intentioned and useful to those looking for that sort of thing. Keep at it  :)

Do you hate spam? I do, and I am willing to bet you do as well. I often wonder, though, "if people hate spam so much, why do they still post their email addresses on the web?" I'm not talking about a plain-text posting here. I am referring to the fancy obfuscations of said addresses that I see posted on profiles (and message boards). Don't get me wrong, I think security-through-obscurity is a good approach to enabling communication with another party via your established email address. What gets me is how simple the obfuscations appear to be. In this article I am going to demonstrate how simple it is to de-obfuscate some of the simpler patterns I have seen using regular expressions (regex). I will also offer some alternative obfuscation methods. Though I am going to focus on email, the ideas presented here could be applied to any online moniker that could be used for spamming purposes (e.g. Twitter, Facebook, etc.). Although some of the discussion will be technical, the audience of this article is anyone who actively posts their email to public web sites.
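To make the point concrete, here is a minimal sketch of how little code it takes to undo one simple obfuscation style. The bracketed "[at]" and "(dot)" forms and the function name deobfuscate are assumptions chosen for illustration; the excerpt does not list the specific patterns the article examines.

```python
import re

# Bracketed "at" / "dot" tokens, e.g. "[at]", "(dot)", with optional spaces around them.
AT_RE = re.compile(r"\s*[\[(]\s*at\s*[\])]\s*", re.IGNORECASE)
DOT_RE = re.compile(r"\s*[\[(]\s*dot\s*[\])]\s*", re.IGNORECASE)

def deobfuscate(text):
    """Turn 'name [at] example [dot] com' back into 'name@example.com'."""
    text = AT_RE.sub("@", text)
    return DOT_RE.sub(".", text)
```

Two substitutions are enough for this style, which is why it offers little protection against an automated scraper.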


Disclaimer

I am not advocating you go out and "screen-scrape" anyone's profile or any message boards. The intent of my article is to demonstrate the inherent vulnerability of certain obfuscation patterns, not to instigate some "script-kiddie" to go address fishing. If you are a script-kiddie, or just a genuine degenerate, please don't hold me responsible for your actions. You have been warned!


Prerequisites


Author Comment by: käµfm³d 👽
tfewster,

Interesting angle. Not foolproof, at least not to someone actively reviewing the addresses returned by their scraper, but a good approach. I just hope you don't have to change your address too often  ; )

Expert Comment by: mbizup
Nice article!

Voted "yes" above.
As most anyone who uses or has come across them can attest, regular expressions (regex) are a complicated bit of magic. Packed succinctly within their cryptic syntax lies a great deal of power. It's not the "take over the world" kind of power, at least not to the average programmer, but it is the kind of power that can save numerous lines of code. One of the more complicated regex tools I'd like to describe to you is lookaround. When executed properly, lookaround can supercharge your patterns and give you pattern-matching capabilities otherwise achieved through numerous procedures and even more numerous lines of code.

Regular expression lookaround is not a glaringly simple concept when you first see it. For this reason, readers of this article should at least be familiar with regular expressions in general. EE contributor BatuhanCetin has written a nice introduction to regular expressions here: Regular Expressions Starter Guide.

Outside of its complexity, another thing to be mindful of is that not every regex engine supports lookaround. If you plan on experimenting with any of the patterns demonstrated in this article, you should confirm that your editor or language supports lookaround. As described in the section Types of Lookaround, the two directions of lookaround are lookahead and lookbehind.  Regex engines can implement none, one, or both directions. Be sure …
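As a small illustration of the two directions, here is a sketch using Python's re module, which supports lookahead and fixed-width lookbehind. The names HAS_DIGIT, AFTER_DOLLAR and lookaround_supported are illustrative assumptions; the lookahead pattern is the alphanumeric-with-a-digit example discussed in the comments on this article.

```python
import re

# Lookahead: the string may hold only letters and digits,
# and (?=.*?[0-9]) asserts that at least one digit appears somewhere.
HAS_DIGIT = re.compile(r"^(?=.*?[0-9])[a-zA-Z0-9]+$")

# Lookbehind: match digits only when a "$" sits directly before them.
# The "$" itself is checked but not included in the match.
AFTER_DOLLAR = re.compile(r"(?<=\$)[0-9]+")

def lookaround_supported():
    """Probe the engine by compiling a pattern that uses both directions."""
    try:
        re.compile(r"(?<=a)b(?=c)")
        return True
    except re.error:
        return False
```

Compiling a tiny probe pattern like this is one quick way to confirm support in a given language before relying on lookaround in larger patterns.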

Expert Comment by: footech
Nice article!  One point, near the start, in the "Lookahead" section, you have a regex which is
^(?=.*[0-9])[a-zA-Z0-9]+$
and you talk about the dot-star being non-greedy...  Shouldn't this then be like the following?
^(?=.*?[0-9])[a-zA-Z0-9]+$

Hah!  I'm getting a headache and dizzy trying to work through this... though I'm pretty sure I came up with the same thing once when banging my head against the keyboard. :)
\s+|(?<=\w)(?=\W)(?!(?<=\d)(?=([-/])\d\d?\1(?:\d\d){1,2}))(?!(?<=\d([-/])\d\d?)(?=\2(?:\d\d){1,2}))|(?<=\W)(?=\W)|(?<=\W)(?=\w)(?!(?<=\d([-/]))(?=\d\d?\3(?:\d\d){1,2}))(?!(?<=\d\4\d\d?([-/]))(?=(?:\d\d){1,2}))


Author Comment by: käµfm³d 👽
@footech

Ah the difference a single character can make  = )

Yes, you are correct. I have put in for the correction to be made.

(Sorry for the delay; I'm terrible about checking my email [for EE notices]!)
I have been reconstructing a PHP-based application that, over the last ten years, grew into a full-blown interface system under a developer who has since gone into business for himself building websites. I am not incredibly fond of writing PHP code on a daily basis, but I have been working on migrating the system to an up-to-date installation of PHP 5.3.1, and I've run across some issues in the migration that I thought warranted documenting.

Problem 1
The application currently runs on a Linux box with PHP 4.4.2, which allows you to use variables without pre-defining them. So if you want to write a loop that steps through a query and adds things to a variable named $var, you don't need to pre-define the variable; you just put $var .= "new conditions" in the loop and the new string gets appended.

The problem, of course, is security, and the most recent implementations do not allow a variable to be appended to unless it already exists. So I needed to devise a way to find every occurrence of
   $var .=
no matter what the variable was called, and then rewrite the code so that it now says
    if (!isset($variable)) { $variable=""; } $variable .= "

Solution:
Adobe Dreamweaver (or any other editor or IDE whose find and replace supports regular expressions). I just like using Dreamweaver because I've been using it for so long. I am sure you can do the same thing in many other editors like …
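The article's actual Dreamweaver search and replace strings are cut off in this excerpt. As a hedged sketch of the same idea, the rewrite can be expressed with a capture group and a backreference; it is shown here in Python operating on PHP source text, and the names APPEND_RE and guard_appends are illustrative assumptions.

```python
import re

# A PHP variable name followed by the append operator ".=".
# Group 1 captures the variable, including its leading "$".
APPEND_RE = re.compile(r"(\$[A-Za-z_][A-Za-z0-9_]*)\s*\.=")

def guard_appends(php_source):
    """Prefix every '$name .=' with an isset() guard for the same variable."""
    # \1 re-inserts whatever variable name the pattern captured.
    return APPEND_RE.sub(r'if (!isset(\1)) { \1 = ""; } \1 .=', php_source)
```

This is deliberately naive: it does not handle array elements or object properties, and inserting a statement in front of an append that is the sole body of a brace-less loop or conditional would change the control flow, so the result should be reviewed by hand.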
by Batuhan Cetin

A regular expression is a small language that we use to edit a string or to retrieve the sub-strings of a text that meet specific rules. A regular expression can be applied to a set of string variables.

There are many RegEx engines in use, and these engines differ in syntax and in how they compile patterns. Perl 5 syntax, which runs on an NFA engine, is the most popular. There are three main types of engines: NFA, POSIX and DFA. Please see the references section at the end of the article for detailed information.

Regular expressions are hard to explain in words and look frightening. But if you have the patience and courage to jump in, this is one of the most useful and fun languages you may ever learn. So, here are the most commonly used special characters, with examples.

Special Characters Used in Regular Expressions

"()" character

Matches the pattern between the parentheses; also used to logically group patterns or characters together.

RegEx: (exchange)
Match: exchange in expertsexchange
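
As a quick check of the grouping example, here is a minimal sketch using Python's re module (the variable name m is illustrative). The parentheses both group the pattern and capture what it matched.

```python
import re

# Search for the grouped pattern anywhere in the text.
m = re.search(r"(exchange)", "expertsexchange")

captured = m.group(1)   # text captured by the first pair of parentheses
position = m.span(1)    # (start, end) offsets of that capture in the input
```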

"." character

The "dot" matches a single character. Note that it does not match line breaks unless the engine is operating in single line mode.

RegEx: experts.
Match: experts1, expertsa, experts!, ... (but not experts on its own, because the dot needs one character to match)
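
The dot's behaviour, including the line-break rule, can be sketched in Python, where single line mode is spelled re.DOTALL. The helper name matches_whole is an illustrative assumption.

```python
import re

def matches_whole(pattern, text, flags=0):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text, flags) is not None

one_extra_char = matches_whole(r"experts.", "experts1")          # dot consumes "1"
nothing_extra = matches_whole(r"experts.", "experts")            # dot has nothing to consume
newline_default = matches_whole(r"experts.", "experts\n")        # dot skips line breaks
newline_dotall = matches_whole(r"experts.", "experts\n", re.DOTALL)
```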

"*" character

Matches zero or more occurrences of the character or group before it. For example:

RegEx: experts*
Match: expert, experts, expertss, expertsss, ...

RegEx: exper(ts)*
Match: exper, experts, expertsts, expertststs ...
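
The difference between starring a single character and starring a group can be sketched in Python as follows. The helper name whole is an illustrative assumption.

```python
import re

def whole(pattern, text):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# "s*" repeats only the letter s.
star_on_char = [whole(r"experts*", t) for t in ("expert", "experts", "expertsss")]

# "(ts)*" repeats the two-letter group ts as a unit.
star_on_group = [whole(r"exper(ts)*", t) for t in ("exper", "expertsts", "expert")]
```

The last entry of star_on_group is False: "expert" ends in a lone t, and the group can only repeat complete "ts" pairs.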

"?" character

Expert Comment by: käµfm³d 👽
Hello BatuhanCetin,

It seems I'm now the one who has been away for some time! Thanks. I'm finishing up one now  :)

Expert Comment by: xenium
Thanks a lot, this guide is proving useful, having come from Google Docs' complete lack of help on the topic. I hadn't even heard of "Regular expression", which must be one of the biggest misnomers in programming!

I've still a way to go... if anyone can help, I've got a question open on the topic:
http://www.experts-exchange.com/Programming/Languages/Regular_Expressions/Q_28531374.html

Thanks a lot