Regular Expressions with JavaScript

By Ivo Stoykov

A regular expression is a very useful object that lets developers apply a variety of patterns to text. Combining metacharacters and modifiers makes it possible to build complex patterns that reduce the code needed to process strings from different sources, user input being one example. Regular expressions can often replace string-handling code that would otherwise need complicated loops. In JavaScript, the built-in RegExp object is always available; nothing has to be loaded or initialized first.


Syntax

A regular expression consists of two parts.

The first part is the pattern, which is mandatory. Its syntax follows Perl regular expressions but implements fewer features than Perl does. The pattern acts as a template describing what to look for in the text.

The second part consists of optional modifiers, which describe how the pattern is applied. Modifiers can be combined in any order.

In JavaScript, regular expressions are handled by the RegExp object. Like any object, it has a constructor for explicit creation:
var re = new RegExp("pattern", "modifiers");


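The RegExp constructor accepts two string parameters: the first for the pattern and the second for the modifiers. Because the parameters are strings, a quotation mark of the same kind as the string delimiters must be escaped with a backslash, and so must every backslash in the pattern. For example, the pattern \d is written as "\\d".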
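A minimal sketch, runnable in a browser console or Node.js, comparing the two ways of creating a RegExp. It shows that the string form needs doubled backslashes and that the constructor is the form to use when a pattern is built at runtime. The sample strings and the variable `word` are illustrative only.

```javascript
// Both objects hold the same pattern: one or more digits.
// In the string form the backslash has to be doubled.
const fromLiteral = /\d+/g;
const fromString = new RegExp("\\d+", "g");

console.log(fromLiteral.source === fromString.source); // true
console.log("a1b22c333".match(fromString));            // ["1", "22", "333"]

// Building a pattern at runtime from a variable (illustrative input).
const word = "cat";
const dynamic = new RegExp("\\b" + word + "\\b", "i");
console.log(dynamic.test("The Cat sat")); // true: whole word, case ignored
```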

A RegExp can also be created implicitly with literal syntax, by assigning a pattern enclosed in slashes, followed by optional modifiers, to a variable:
var re = /pattern/modifiers;


The statement above assigns a RegExp object to the variable re. Note that the right side of the assignment contains no quotation marks. They are not required, although they may appear as part of the pattern.

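The basic modifiers in JavaScript are:
g - global match (find all matches instead of stopping after the first one)
i - ignore case in search
m - multiline search (^ and $ also match at the beginning and end of each line)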
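The effect of each modifier can be seen on a small made-up string that contains a line break. This is a sketch; the comments show the value each call returns.

```javascript
const text = "Cat cat\ncat";

// Without g, match() stops at the first match (case-sensitive here).
console.log(text.match(/cat/).index);  // 4
// g returns every match; i ignores case.
console.log(text.match(/cat/gi));      // ["Cat", "cat", "cat"]
// Without m, ^ matches only at the start of the whole string.
console.log(text.match(/^cat/gi));     // ["Cat"]
// With m, ^ also matches right after each line break.
console.log(text.match(/^cat/gim));    // ["Cat", "cat"]
```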


Properties

The RegExp object has the following Boolean properties, which correspond to the modifiers above and tell whether a particular modifier has been set:
RegExpObject.global - true if the g modifier is set, otherwise false
RegExpObject.ignoreCase - true if the i modifier is set, otherwise false
RegExpObject.multiline - true if the m modifier is set, otherwise false

There are two more properties:

RegExpObject.lastIndex - the index at which the next match starts. It is used when the g modifier is set: it holds the character position immediately after the last match found by the exec() or test() methods, and it is reset to 0 when no further match is found.
RegExpObject.source - contains the text of the RegExp pattern.
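A sketch of these properties in use. The loop relies on exec() starting each search at lastIndex when the g modifier is set; the pattern and input string are arbitrary examples.

```javascript
const re = /o/gi;
console.log(re.global, re.ignoreCase, re.multiline); // true true false
console.log(re.source);                              // "o"

// Each exec() call starts at lastIndex and moves it past the match.
const str = "Foo boo";
const positions = [];
let m;
while ((m = re.exec(str)) !== null) {
  positions.push(m.index + "->" + re.lastIndex);
}
console.log(positions);    // ["1->2", "2->3", "5->6", "6->7"]
console.log(re.lastIndex); // 0: reset after the final failed search
```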


Patterns

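Patterns may consist of any combination of ordinary characters and metacharacters. A backslash changes the meaning of the character that follows it: in front of certain letters it creates a metacharacter (see the list below), and in front of a character that already has a special meaning, such as . * + ? ( ) [ ] { } | ^ $ / or \, it turns that character into an ordinary literal. Avoid escaping letters that are not in the list below; depending on the engine and modifiers, such escapes are either a syntax error or treated as the plain letter.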

Because the backslash is what gives characters their special meaning, it must be doubled (escaped) when the backslash itself should be treated as an ordinary character. For instance:
var re = /\w/


searches for any word character, i.e. a letter (a-z or A-Z), a digit or the underscore, while
var re = /\\w/


searches for a backslash followed by the letter “w”.
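A few quick checks illustrating the difference between a metacharacter and an escaped character. Inside a JavaScript string literal the backslash must itself be doubled, so "a\\w" below is the three characters a, \ and w.

```javascript
console.log(/\w/.test("w"));      // true: \w is any word character
console.log(/\\w/.test("w"));     // false: a backslash must come first
console.log(/\\w/.test("a\\w"));  // true: the string contains \ followed by w

// Escaping a metacharacter makes it literal: \. matches only a dot.
console.log(/1.5/.test("125"));   // true: . matches any character
console.log(/1\.5/.test("125"));  // false
console.log(/1\.5/.test("1.5"));  // true
```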


Brackets

Round Brackets

Round brackets group part of the pattern into a unit and capture the text matched by that part. If a round bracket is sought as a literal, it must be escaped with a backslash. The captured text is available as $1, $2 and so on in the replacement string of String.replace(), in the array returned by exec(), and in the legacy, deprecated static properties RegExp.$1 … RegExp.$9:
var re = /(\w+)\s(\w+)/;
var name = "John Doe";
name = name.replace(re, "$2, $1");  // name value now is "Doe, John"
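Besides replace(), the captured text can be read from the array returned by exec(). The sketch below also shows that brackets let a quantifier apply to a whole group instead of a single character; the inputs are illustrative.

```javascript
const re = /(\w+)\s(\w+)/;
const result = re.exec("John Doe");
// result[0] is the whole match; result[1] and result[2] are the groups.
console.log(result[0]); // "John Doe"
console.log(result[1]); // "John"
console.log(result[2]); // "Doe"

// A quantifier after a group repeats the whole group.
console.log(/^(ha)+$/.test("hahaha")); // true
console.log(/^ha+$/.test("hahaha"));   // false: + applies only to "a"
```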



A group can also start with one of the following special sequences:

?:      Non-capturing group: the text is matched but not stored for later use.
var str = "community industries source Regexp";
var res = str.match(/\w+(?:y|ies)/gi);
// res value now is: community,industries


?=      Positive lookahead: the text must be followed by the bracketed pattern, which is neither stored nor part of the match.
str.match(/JavaScript (?=\d)/gi);


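matches “JavaScript ” (with its trailing space) only where it is followed by a digit, as in “JavaScript 2”. The digit itself is not part of the match.
?!      Negative lookahead: the text must not be followed by the bracketed pattern. Nothing is stored for later use.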
str.match(/JavaScript (?!\d)/gi);


Just the opposite: this matches “JavaScript ” only where no digit follows, so it finds a match in “JavaScript rocks” but not in “JavaScript 2”.
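The sketch below runs both lookaheads, and a non-capturing group, against one made-up string, using the index property of the exec() result to show where each match starts.

```javascript
const str = "JavaScript 2 and JavaScript rocks";

// Positive lookahead: "JavaScript " only when a digit follows.
const pos = /JavaScript (?=\d)/.exec(str);
console.log(JSON.stringify(pos[0]), pos.index); // "JavaScript " 0 (no digit in the match)

// Negative lookahead: "JavaScript " only when no digit follows.
const neg = /JavaScript (?!\d)/.exec(str);
console.log(neg.index); // 17: the second occurrence

// (?:...) groups without capturing, so the result has no element [1].
const nc = /(?:Java)Script/.exec(str);
console.log(nc.length); // 1
```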

Square Brackets

Square brackets define a character set, which matches any one of the characters it contains. Putting the “^” character at the beginning negates the set, so it matches any character except those listed. A run of consecutive characters can be abbreviated with a hyphen, so the following two sets are equivalent:
[abcd] and [a-d]. Both match any one of the characters “a”, “b”, “c” and “d”.
Be aware that a range must be in ascending order: [a-d] is valid, while [d-a] throws an error.
[^0-9]      matches anything but digits.
[a-z]      matches any lowercase letter from “a” to “z”.
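A short sketch applying the sets above to an illustrative string. It also shows that a hyphen placed at the end of a set is treated as a literal, and that an out-of-order range is reported as a SyntaxError (the constructor is used so the error happens at runtime and can be caught; a /[d-a]/ literal would stop the whole script from parsing).

```javascript
const s = "a1-b2_C3";
console.log(s.match(/[a-d]/g));   // ["a", "b"]
console.log(s.match(/[^0-9]/g));  // ["a", "-", "b", "_", "C"]
console.log(s.match(/[a-z]/gi));  // ["a", "b", "C"]
// A hyphen at the end of a set is an ordinary character.
console.log(s.match(/[0-9-]/g));  // ["1", "-", "2", "3"]

let outOfOrderThrows = false;
try {
  new RegExp("[d-a]"); // range in descending order
} catch (e) {
  outOfOrderThrows = e instanceof SyntaxError;
}
console.log(outOfOrderThrows); // true
```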

Curly brackets

Curly brackets are used to define quantifiers. They can be used in one of the following three forms:
{x} Matches exactly x repetitions of the item preceding the brackets.
/\d{3}/


searches for exactly three consecutive digits.
{x,y} Matches at least x but not more than y repetitions of the item preceding the brackets.
/\d{3,5}/


searches for at least three but not more than five consecutive digits.
{x,} Matches at least x repetitions of the item preceding the brackets.
/\d{3,}/


searches for at least three consecutive digits.

The numbers in curly brackets must be non-negative integers. When two values are supplied (min and max), the minimum must come first and must not be larger than the maximum.
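The three forms applied to one illustrative string. Because quantifiers are greedy, {3,5} takes as many digits as it can, up to five, and the leftover digit is not enough for another match.

```javascript
const digits = "12 1234 123456";
console.log(digits.match(/\d{3}/g));   // ["123", "123", "456"]
console.log(digits.match(/\d{3,5}/g)); // ["1234", "12345"]
console.log(digits.match(/\d{3,}/g));  // ["1234", "123456"]
console.log(/^\d{0,2}$/.test(""));     // true: zero repetitions are allowed
```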

Metacharacters

.      Search for any single character except a line terminator (such as a newline)
\w      Search for any word character (letters, digits and the underscore)
\W      Search for any non-word character
\d      Search for any digit
\D      Search for any non-digit character
\s      Search for any whitespace character (including tabs, form-feed, etc)
\S      Search for any non-whitespace character
\b      Search for a match at a word boundary (the beginning or end of a word)
\B      Search for a match that is not at a word boundary
\0      Search for any NUL character
\n      Search for any new line character
\f      Search for any form feed character
\r      Search for any carriage return character
\t      Search for any tab character
\v      Search for any vertical tab character
\xxx      Search for the character specified by the octal number xxx (a legacy form; prefer \xdd or \uxxxx)
\xdd      Search for the character specified by the hexadecimal number dd
\uxxxx      Search for the Unicode character specified by the hexadecimal number xxxx
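A sketch exercising several of these metacharacters on a made-up string that contains both a space and a tab.

```javascript
const line = "Order 66\tshipped";
console.log(line.match(/\d+/)[0]);  // "66"
console.log(line.match(/\s+/g));    // [" ", "\t"]
console.log(/\bship/.test(line));   // true: "ship" begins a word
console.log(/\Bped/.test(line));    // true: "ped" is inside "shipped"
console.log(/\bped/.test(line));    // false: no word starts with "ped"
console.log(/\x41/.test("A"));      // true: hexadecimal 41 is "A"
console.log(/\u0041/.test("A"));    // true: Unicode U+0041 is "A"
```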

Quantifiers

Quantifiers define how many times the preceding item may repeat. (See also Curly brackets.)

+      Matches the preceding item one or more times
*      Matches the preceding item zero or more times
?      Matches the preceding item zero or one time
$      Not a quantifier but an anchor: matches at the end of the string (or of each line with the m modifier)
^      Not a quantifier but an anchor: matches at the beginning of the string (or of each line with the m modifier)
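A sketch showing the quantifiers and the two anchors on small illustrative inputs.

```javascript
console.log("color colour colouur".match(/colou?r/g)); // ["color", "colour"]
console.log("ac abc abbc".match(/ab*c/g));              // ["ac", "abc", "abbc"]
console.log("ac abc abbc".match(/ab+c/g));              // ["abc", "abbc"]

// ^ and $ tie the pattern to the start and end of the string.
console.log(/^abc$/.test("abc"));  // true
console.log(/^abc$/.test("xabc")); // false
console.log(/c$/.test("abc"));     // true
```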


Methods

compile()


This method recompiles a regular expression while a script runs, replacing the pattern and modifiers of an existing RegExp object. It takes the new pattern and, optionally, new modifiers. Note that compile() is deprecated; creating a new RegExp object with the constructor is the recommended way to build patterns dynamically.
var str = "Everywhere in the world!";
var patt = /every/ig;
var str2 = str.replace(patt, "Any");    // str2 = "Anywhere in the world!"
patt.compile("any", "ig");              // same object, new pattern
var str3 = str2.replace(patt, "Every"); // str3 = "Everywhere in the world!"



match()


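This is a method of String, not of RegExp. It takes a regular expression as input parameter and searches the string for it. With the g modifier it returns an array of all matches; without g it returns the same result as exec(). In both cases it returns null if no match is found.
var str = "Hello World!";
var res = str.match(/\wo/gi);   // res value is: lo,Wo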



test()


This RegExp method takes a string as input parameter and returns true or false, depending on whether the pattern is found in the string.
var str = "Test string";
if (/st/.test(str)) { /* true because "st" is found at index 2 */ }
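A sketch of test() with and without the g modifier. With g, test() updates lastIndex, so calling it twice on the same string can return different results, which is a common source of surprises.

```javascript
const re = /st/;
console.log(re.test("Test string")); // true
console.log(re.test("Tea"));         // false

// With g, each call continues from lastIndex.
const g = /st/g;
console.log(g.test("Test")); // true  (lastIndex is now 4)
console.log(g.test("Test")); // false (search starts at 4; lastIndex reset to 0)
```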



exec()


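This RegExp method searches the string supplied as input parameter for the pattern. It returns an array whose first element is the matched text, followed by the text captured by any round brackets, or null if no match is found. The array also has an index property holding the position of the match.
var patt1 = new RegExp("re", "i");
var res = patt1.exec("RegExp is a great option");
// res[0] is "Re": the first two letters match because of the i modifier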
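A sketch contrasting match() and exec() when the g modifier is set: match() returns only the whole matches, while each exec() call returns the next match together with its groups and position. The pattern and input are illustrative.

```javascript
const re = /(\w+)=(\d+)/g;
const input = "a=1, b=22";

// match() with g: whole matches only, no groups.
console.log(input.match(re)); // ["a=1", "b=22"]

// exec(): one match per call, with groups and position.
const first = re.exec(input);
console.log(first[0], first[1], first[2], first.index); // a=1 a 1 0
const second = re.exec(input);
console.log(second[1], second.index);                    // b 5
console.log(re.exec(input));                             // null
```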


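A RegExp object can also be passed to the String methods replace(), search() and split(). replace() returns a copy of the string with the replacements made and leaves the original string unchanged; search() returns the index of the first match, or -1 if there is none.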
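A sketch of these String methods on a made-up string with mixed separators.

```javascript
const s = "one, two;three  four";
console.log(s.split(/[,;\s]+/));   // ["one", "two", "three", "four"]
console.log(s.search(/t/));        // 5: index of the first "t"
console.log(s.search(/x/));        // -1: no match
console.log(s.replace(/o/g, "0")); // "0ne, tw0;three  f0ur"
console.log(s);                    // unchanged: "one, two;three  four"
```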


Samples

Check whether a string is not blank:
var res = "string_to_test".search(/\S/); // -1 means the string is blank


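Check a string for a number (search() returns 0 on a match here and -1 otherwise):
var res1 = "12345".search(/^\s*\d+\s*$/);      // integer, optionally surrounded by whitespace
var res2 = "123.45".search(/^-?\d+(\.\d+)?$/); // optional minus sign, digits, optional decimal part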


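Email check (a simplified pattern; it rejects some valid addresses, for example those with a hyphen in the domain):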
/^(\w+\.)*\w+@(\w+\.)+[A-Za-z]+$/;


Password check (6 to 8 letters or digits):
/^[A-Za-z\d]{6,8}$/


Validate a URL:
/^[A-Za-z]+:\/\/[A-Za-z0-9_-]+\.[A-Za-z0-9_%&?\/.=-]+$/
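A sketch collecting the sample patterns into one object, with the decimal-number and URL patterns written in a corrected form (the decimal pattern also accepts single digits such as "5", and the slashes in the URL pattern are escaped so the literal parses). The helper isValid() is hypothetical, and these patterns are simple checks rather than complete validators.

```javascript
// Sample patterns; decimal and url are corrected forms.
const patterns = {
  notBlank: /\S/,
  integer: /^\s*\d+\s*$/,
  decimal: /^-?\d+(\.\d+)?$/,
  email: /^(\w+\.)*\w+@(\w+\.)+[A-Za-z]+$/,
  password: /^[A-Za-z\d]{6,8}$/,
  url: /^[A-Za-z]+:\/\/[A-Za-z0-9_-]+\.[A-Za-z0-9_%&?\/.=-]+$/,
};

// Hypothetical helper: true when value matches the named pattern.
function isValid(kind, value) {
  return patterns[kind].test(value);
}

console.log(isValid("notBlank", "   "));                      // false
console.log(isValid("decimal", "-123.45"));                   // true
console.log(isValid("email", "john.doe@example.com"));        // true
console.log(isValid("password", "abc12"));                   // false: too short
console.log(isValid("url", "http://www.example.com/a?b=1"));  // true
```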




Compatibility

Regular expressions are part of the ECMA-262 language standard. They are supported by JavaScript (from version 1.2), JScript and ActionScript (from version 3). In browsers, support starts with Internet Explorer 4 and Netscape 4.


Useful links

http://www.w3schools.com/jsref/jsref_obj_regexp.asp
http://www.regular-expressions.info
http://msdn.microsoft.com/en-us/library/9dthzd08%28VS.85%29.aspx
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf