Regular Expressions Starter Guide

by Batuhan Cetin

A regular expression is a pattern language that we use to edit a string or to retrieve the sub-strings of a text that meet specific rules. A regular expression can be applied to any set of string variables.

There are many RegEx engines, and they differ in syntax and in how they compile and run a pattern. Perl 5 syntax is the most popular, and it runs on an NFA engine. There are three main types of engines: DFA, traditional NFA and POSIX NFA. Please see the references section at the end of the article for detailed information.

Regular expressions are hard to explain in words and look frightening. But if you have the patience and courage to jump in, they are one of the most useful and fun languages you may ever learn. So, here are the most used special characters, with examples.

Special Characters Used in Regular Expressions

"()" character

Matches the pattern between the parentheses and captures it as a group. Parentheses are also used to logically group patterns or characters together.

RegEx: (exchange)
Match: exchange in expertsexchange
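As a minimal sketch of the example above, assuming Python's built-in re module (the article itself is engine-neutral, so the function names here are Python-specific):

```python
import re

# The parentheses create capture group 1 around "exchange".
match = re.search(r"(exchange)", "expertsexchange")

captured = match.group(1)  # text captured by the group
start = match.start(1)     # index where the captured text begins
print(captured, start)
```

The group can then be read back by its number, which is what makes parentheses useful for extracting part of a larger match.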

"." character

The "dot" matches a single character. Note that it does not match line breaks unless the engine is operating in single line mode.

RegEx: experts.
Match: experts1, expertsa, experts!, ... (but not experts alone, because the dot needs one character to match)
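The line-break behaviour can be checked with a short sketch. It assumes Python's re module, where single line mode is called re.DOTALL:

```python
import re

# fullmatch() succeeds only if the whole string matches the pattern.
plain = re.compile(r"experts.")
with_digit = plain.fullmatch("experts1")      # "." consumes "1"
nothing_left = plain.fullmatch("experts")     # no character left for "."
with_newline = plain.fullmatch("experts\n")   # "." skips line breaks by default

# re.DOTALL lets "." match a line break as well.
dotall = re.compile(r"experts.", re.DOTALL)
with_newline_dotall = dotall.fullmatch("experts\n")

print(bool(with_digit), bool(nothing_left), bool(with_newline), bool(with_newline_dotall))
```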

"*" character

This matches zero or more occurrences of the character or group before it. For example:

RegEx: experts*
Match: expert, experts, expertss, expertsss, ...

RegEx: exper(ts)*
Match: exper, experts, expertsts, expertststs, ...
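The difference between repeating one character and repeating a group can be sketched as follows (Python's re module is assumed; the word list is made up for illustration):

```python
import re

words = ["expert", "experts", "expertsss", "exper", "expertsts"]

# "*" applies only to the "s" directly before it.
star_on_char = [w for w in words if re.fullmatch(r"experts*", w)]

# With parentheses, "*" repeats the whole group "ts".
star_on_group = [w for w in words if re.fullmatch(r"exper(ts)*", w)]

print(star_on_char)
print(star_on_group)
```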

"?" character

This makes the character or group before it optional, so the pattern matches with or without it.

RegEx: experts?
Match: expert, experts

RegEx: exper(ts)?
Match: exper, experts
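A small sketch of the optional match, again assuming Python's re module and an illustrative word list:

```python
import re

words = ["exper", "expert", "experts", "expertss"]

# The trailing "s" may appear once or not at all.
optional_char = [w for w in words if re.fullmatch(r"experts?", w)]

# The whole group "ts" may appear once or not at all.
optional_group = [w for w in words if re.fullmatch(r"exper(ts)?", w)]

print(optional_char)
print(optional_group)
```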

"-" character

This character defines a range of characters in a character class. If you want either "-" or "]" itself to be a member of a class, put it at the start of the list (possibly after a "^"), or escape it with a backslash. "-" is also taken literally when it is at the end of the list, just before the closing "]". The following all specify the same class of three characters: [-az], [az-] and [a\-z]. All are different from [a-z], which specifies a class containing twenty-six characters, even on EBCDIC-based character sets. Also, if you try to use the character classes \w, \W, \s, \S, \d or \D as endpoints of a range, the "-" is understood literally.

RegEx: [1-9]
Match: Any single digit from 1 to 9

RegEx: [a-z]
Match: Any single lowercase letter from a to z

RegEx: [A-Z]
Match: Any single uppercase letter from A to Z
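To see the literal hyphen and the range side by side, here is a brief sketch assuming Python's re module (the sample text is made up):

```python
import re

text = "a-z b"

# In these three classes the hyphen is a literal member of the class.
literal_results = [re.findall(p, text) for p in (r"[-az]", r"[az-]", r"[a\-z]")]

# Here the hyphen defines the range a through z.
range_result = re.findall(r"[a-z]", text)

print(literal_results)
print(range_result)
```

All three literal forms return the same list, while the range form picks up "b" and skips the hyphen.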

"[]" character

This defines a character class, which matches any one of the enclosed characters.

RegEx: expert[SAB]
Match: expertS, expertA, expertB

RegEx: expert[A-Z]
Match: expertA, expertB, expertC, ...., expertZ

RegEx: expert[1-9]
Match: expert1, expert2, ...., expert9
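The three classes above can be tried against a few candidates. This is a sketch using Python's re module, with candidate strings chosen only for illustration:

```python
import re

candidates = ["expertA", "expertS", "expertZ", "expert7", "experta"]

def accepted(pattern):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

listed = accepted(r"expert[SAB]")      # one of S, A or B
uppercase = accepted(r"expert[A-Z]")   # any uppercase letter
digits = accepted(r"expert[1-9]")      # any digit except 0

print(listed, uppercase, digits)
```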

"^" or "\A" character

This matches the start of a string. The two forms behave differently in multi-line mode: "^" then also matches at the start of each line (immediately after a line break), while "\A" still matches only at the very start of the string.

RegEx: ^(experts) or \A(experts)
Match: experts in expertsexchange

RegEx: ^. or \A.
Match: E in Experts

When "^" is the first character inside brackets, it negates the class, which then matches any character other than the enclosed ones. This applies only to "^", not to "\A".

RegEx: [^exp]
Match: r, t, s in experts

RegEx: [^1-9]
Match: m, s, t, r in m1st3r
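The following sketch shows the anchor with and without multi-line mode, plus the negated class. It assumes Python's re module, where multi-line mode is the re.MULTILINE flag:

```python
import re

text = "experts\nexchange"

# Default mode: "^" matches only at the start of the string.
default_mode = re.findall(r"^e\w+", text)

# Multi-line mode: "^" also matches right after each line break.
multiline_mode = re.findall(r"^e\w+", text, re.MULTILINE)

# "\A" ignores multi-line mode and stays at the start of the string.
string_start_only = re.findall(r"\Ae\w+", text, re.MULTILINE)

# Inside brackets, a leading "^" negates the class.
negated = re.findall(r"[^exp]", "experts")

print(default_mode, multiline_mode, string_start_only, negated)
```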

"$" or "\Z" character

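This matches the end of the string. The two forms behave differently in multi-line mode: "$" then also matches at the end of each line (immediately before a line break), while "\Z" still matches only at the end of the string. Whether these anchors also match just before a single trailing line break depends on the engine.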

RegEx: (exchange)$ or (exchange)\Z
Match: exchange in expertsexchange

RegEx: .$ or .\Z
Match: s in experts
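A sketch of the end anchors, assuming Python's re module. Note that Python's "\Z" matches only at the very end of the string, which is one of the engine differences mentioned in this article:

```python
import re

text = "experts\nexchange"

# Default mode: "$" matches at the end of the string.
default_mode = re.findall(r"\w+$", text)

# Multi-line mode: "$" also matches right before each line break.
multiline_mode = re.findall(r"\w+$", text, re.MULTILINE)

# "\Z" ignores multi-line mode and stays at the end of the string.
string_end_only = re.findall(r"\w+\Z", text, re.MULTILINE)

print(default_mode, multiline_mode, string_end_only)
```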

"{n}" character

This matches the character or group before it exactly n times.

RegEx: experts{5}
Match: expertsssss
This pattern will not match "expertss" or "experts".

"{n,}" character

This matches the character or group before it at least n times.

RegEx: experts{2,}
Match: expertss or expertssssss
This will not match "experts".

"{n,m}" character

In this pattern, the integer "n" must be smaller than or equal to "m". This matches the character or group before it at least "n" times and at most "m" times.

RegEx: experts{2,4}
Match: expertss, expertsss, expertssss
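The three counted quantifiers can be compared in one sketch. It assumes Python's re module; matching_counts is a hypothetical helper written for this illustration:

```python
import re

def matching_counts(pattern):
    """Return which numbers of trailing "s" (1 to 6) the pattern accepts."""
    return [n for n in range(1, 7) if re.fullmatch(pattern, "expert" + "s" * n)]

exactly_five = matching_counts(r"experts{5}")
at_least_two = matching_counts(r"experts{2,}")
two_to_four = matching_counts(r"experts{2,4}")

print(exactly_five, at_least_two, two_to_four)
```

The check uses a full match on purpose: with a plain search, experts{2,4} would still find the first four "s" characters inside a longer run.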

"\" character

The "backslash" is the escape character: it removes the special meaning of the special character that follows it, so that character is matched literally.

RegEx: expert\^
Match: expert^

RegEx: [\^\(]
Match: ^ or (
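Here is a brief sketch of what escaping changes, assuming Python's re module (raw strings, written r"...", keep the backslash intact):

```python
import re

# Escaped: "^" is an ordinary character.
escaped = re.search(r"expert\^", "expert^")

# Unescaped: "^" is an anchor, which cannot match after "expert".
unescaped = re.search(r"expert^", "expert^")

# Escapes also work inside a character class.
in_class = re.findall(r"[\^\(]", "a^b(c)")

print(escaped.group(), unescaped, in_class)
```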

"\d", "\w" and "\s" characters

These match digits, word characters (letters, digits, underscores) and whitespace (tabs, spaces, line breaks), respectively.

RegEx: \dm1st3r\d
Match: 5m1st3r2

"\D", "\W" and "\S" characters

The negated versions of the above.

RegEx: 1\D2\W3\S4
Match: 1g2%354
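Both examples, along with each shorthand on its own, can be checked with a sketch like this one (Python's re module assumed; the extra sample strings are made up):

```python
import re

# The two examples from the text.
digits_example = re.fullmatch(r"\dm1st3r\d", "5m1st3r2")
negated_example = re.fullmatch(r"1\D2\W3\S4", "1g2%354")

# Each shorthand on its own.
digits = re.findall(r"\d", "m1st3r")
word_chars = re.findall(r"\w", "a_1 %")
whitespace = re.findall(r"\s", "a b\tc\n")

print(bool(digits_example), bool(negated_example))
print(digits, word_chars, whitespace)
```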

"\b" character

Inside a character class, this matches a backspace character (ASCII 8).

RegEx: [\b]
Match: a backspace character

Outside a character class, it is a word boundary. It matches at the position between a word character (anything matched by \w) and a non-word character (anything matched by [^\w] or \W), as well as at the start and/or end of the string if the first and/or last characters in the string are word characters.

RegEx: .\b
Match: s in experts

"\B" character

Matches at the position between two word characters (the position between \w\w) as well as at the position between two non-word characters (\W\W).

RegEx: \B.\B
Match: x in exp
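The boundary examples can be verified with a short sketch. It assumes Python's re module, which also treats "\b" inside a character class as a backspace; the sentence used for the whole-word search is made up:

```python
import re

# A character followed by a word boundary: only the final "s" qualifies.
before_boundary = re.findall(r".\b", "experts")

# A character with no boundary on either side: only the middle "x".
no_boundary = re.findall(r"\B.\B", "exp")

# Typical use: find "expert" as a whole word only.
whole_words = re.findall(r"\bexpert\b", "expert experts expert-exchange")

# Inside a character class, "\b" is the backspace character.
backspace = re.search(r"[\b]", "a\x08b")

print(before_boundary, no_boundary, len(whole_words), backspace is not None)
```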

"|" character

The OR character matches either the pattern on its left side or the pattern on its right side.

RegEx: x|rts
Match: x or rts in experts

RegEx: (x|r)ts
Match: xts or rts (rts in experts)

It can also be used to combine expression patterns:

RegEx: ((0?[1-9])|(1[0-9])|(2[0-9]))
Match: Numbers from 1 to 29
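The scope of the OR character, with and without grouping, can be sketched like this. Python's re module is assumed, and (?:...) is a non-capturing group used here so that findall returns the whole match:

```python
import re

# Without grouping, "|" splits the whole pattern: "x" or "rts".
ungrouped = re.findall(r"x|rts", "experts")

# With grouping, only "x" and "r" alternate, and "ts" must follow either.
grouped = re.findall(r"(?:x|r)ts", "experts")

# The combined pattern accepts 1 through 29 and nothing outside that range.
number = re.compile(r"((0?[1-9])|(1[0-9])|(2[0-9]))")
in_range = all(number.fullmatch(str(n)) for n in range(1, 30))
out_of_range = [s for s in ("0", "30", "100") if number.fullmatch(s)]

print(ungrouped, grouped, in_range, out_of_range)
```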

"\Q...\E" character

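This matches the characters between "\Q" and "\E" literally, suppressing the meaning of special characters. Not every engine supports this syntax: Perl, PCRE and Java do, while JavaScript and Python do not.

RegEx: \Q+*/\E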
Match: +*/
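Python's re module does not implement \Q...\E, so the sketch below swaps in re.escape, a different mechanism that gives a comparable result by escaping the special characters of a literal string:

```python
import re

literal = "+*/"

# re.escape() backslash-escapes the special characters, so the
# resulting pattern matches the original text character by character.
pattern = re.escape(literal)
found = re.search(pattern, "1+*/2")

print(found.group())
```

In an engine that supports the syntax, writing \Q+*/\E directly in the pattern serves the same purpose.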

Now let's put these into use to understand it better:

Building a Date Expression

Let's build a date expression that will catch the dates in a text in the format of MM/DD/YYYY.

We will start with the MM part:

(0?[1-9])

Defining months written as: 1, 2, .., 9, 01, 02, .., 09

(1[0-2])

Defining months written as: 10, 11, 12

When we combine these with the "OR (|)" operator, we get the month part:

((0?[1-9])|(1[0-2]))

Now let's define the DD part:

(0?[1-9])

Defining days written as: 1, 2, .., 9, 01, 02, ..., 09

([12][0-9])

Defining days written as: 10, 11, ..., 29

(3[01])

Defining days written as: 30, 31

Combining these with the OR operator, we'll get:

((0?[1-9])|([12][0-9])|(3[01]))

Now the YYYY part:

([12][0-9][0-9][0-9])

Defines years between 1000 and 2999

Finally, let's combine them to get our MM/DD/YYYY result:

((0?[1-9])|(1[0-2]))(/)((0?[1-9])|([12][0-9])|(3[01]))(/)([12][0-9][0-9][0-9])
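To try the combined expression on real text, here is a sketch assuming Python's re module. It uses the month, day and year parts built above, written with balanced parentheses; the sample sentence and the variable names are made up for illustration:

```python
import re

DATE_PATTERN = re.compile(
    r"((0?[1-9])|(1[0-2]))"             # group 1: month
    r"(/)"                              # group 4: separator
    r"((0?[1-9])|([12][0-9])|(3[01]))"  # group 5: day
    r"(/)"                              # group 9: separator
    r"([12][0-9][0-9][0-9])"            # group 10: year
)

text = "Released 12/25/2020, patched 1/5/2021."

# Collect (month, day, year) for every date found in the text.
dates = [(m.group(1), m.group(5), m.group(10)) for m in DATE_PATTERN.finditer(text)]
print(dates)

# Caveat: the pattern is not anchored, so it can match inside a longer token.
partial = DATE_PATTERN.search("13/01/2020").group(0)
print(partial)
```

The pattern checks only the shape of a date, so it also accepts impossible dates such as 02/31/2020. If partial matches like the last one are a problem, one option is to wrap the pattern in word boundaries (\b), which is an addition beyond what the article builds.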

References

Before using RegEx, I recommend deciding which engine you will use and reading the resources about that engine, because engines differ in how they interpret the special characters. Please review the references below if you want to dive deeper into regular expressions.

http://www.regular-expressions.info/
http://regexlib.com/

The following is a great reference for learning more about how a regex engine (NFA) works:
http://www.foo.be/docs/tpj/issues/vol1_2/tpj0102-0006.html

Here is a detailed list of POSIX/NFA/DFA differences:
http://www.51773.com/tools/Mastering%20Regular%20ExpressionsEdition/0596528124/regex3-CHP-4-SECT-6.html

See you in another article
6 Comments
 
LVL 35

Expert Comment

by:Terry Woods
Updated examples for 2 sections that seem incorrect:

== "\d", "\w" and "\s" characters ==

Matches digits, word characters (letters, digits, underscores) and whitespaces (tabs, spaces, line breaks) relatively.

RegEx: \dm1st3r\d
Match: 5m1st3r2

== "\D", "\W" and "\S" characters ==

The negated versions of the above.

RegEx: 1\D2\W3\S4
Match: 1g2%354
 
LVL 11

Author Comment

by:Batuhan Cetin
Thanks Terry, I edited the article as your correction. Thanks for reading.
 
LVL 75

Expert Comment

by:käµfm³d 👽
BatuhanCetin,

Would it be OK with you if I link to your article from mine? I will not be presenting any information you covered; I will merely be pointing newcomers your way  :)

 
LVL 11

Author Comment

by:Batuhan Cetin
Hi kaufmed,

Sorry I was away for some time and just read your post. If you're still interested, you can link to my article.
 
LVL 75

Expert Comment

by:käµfm³d 👽
Hello BatuhanCetin,

It seems I'm now the one who has been away for some time! Thanks. I'm finishing up one now  :)
 

Expert Comment

by:xenium
Thanks a lot this guide is proving useful having come from google docs complete lack of help on the topic. I hadn't even heard of "Regular expression" which must be one of the biggest misnomers in programming!

I've still a way to go...if anyone can help i've got a question open on the topic..
http://www.experts-exchange.com/Programming/Languages/Regular_Expressions/Q_28531374.html

Thanks a lot