
regex tutorial help

Hi,

I'm just trying to practice regexes. I made a dummy string containing an age and an address, and I want to pull out the age and the building number. So I'm really looking for a sequence of 3 digits, then another sequence of 3 digits. Here's the script:

use strict;
use warnings;

my $str = "hello I am 500 years old and my address is 123 Main Street.";

# Try to find the age and the address.
if ($str =~ m/ (\d{3})[\w\s]+(\d{3})/) {
    print("Yeah it matched and the extracted stuff is: $1, $2", "\n");
    print($1, "\n");    # the age
    print($2, "\n");    # the building number
}

I get the first extraction:

(\d{3}) - look for a sequence of 3 digits.

I don't get this part:

[\w\s]+

How do you read the [] brackets? I just gave it a shot because I saw it somewhere else, but what I was trying to express is:

Look for a sequence of 3 digits, then some spaces and characters, then another sequence of 3 digits.

So the [\w\s]+ is the "some spaces and characters" part; I just don't understand technically what it is saying.

Thanks
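A small sketch of what the brackets mean: [\w\s] is a character class that matches exactly one character, as long as that character is a word character (\w: letters, digits, underscore) or whitespace (\s). The + then allows one or more such characters in a row. The sample string below is made up purely for illustration.

```perl
use strict;
use warnings;

# [\w\s] is a character class: it matches exactly ONE character that is
# either a word character (\w: letters, digits, underscore) or whitespace (\s).
my $sample = "a1_ .-";    # hypothetical sample string

for my $ch (split //, $sample) {
    my $verdict = ($ch =~ /^[\w\s]$/) ? "in the class" : "not in the class";
    print "'$ch' is $verdict\n";
}

# Adding + means "one or more characters from the class in a row"; the run
# starting at the beginning of $sample stops at the "." (not in the class).
my ($run) = $sample =~ /^([\w\s]+)/;
print "longest run from the start: '$run'\n";
```

Note that the digit "1" counts as a word character, so [\w\s]+ is not limited to "spaces and letters".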

ASKER CERTIFIED SOLUTION

hielo

SOLUTION

FishMonger

DJ_AM_Juicebox

ASKER

Ah ok yeah this makes more sense to me:

m/ (\d{3})\D+(\d{3})/

so that says look for 3 digits, followed by one or more non-digit characters (so this includes alphabet chars and whitespaces), then 3 digits, right?

SOLUTION

ozo

SOLUTION

Adam314

hielo

>>so that says look for 3 digits, followed by one or more non-digit characters (so this includes alphabet chars and whitespaces), then 3 digits, right?
Exactly! Wasn't that easy? :)
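The \D+ reading can be checked against the original sentence with a short script. This is a sketch; it uses a list-context match instead of $1/$2, which is simply one common Perl idiom for collecting the captures.

```perl
use strict;
use warnings;

my $str = "hello I am 500 years old and my address is 123 Main Street.";

# \D is "any character that is not a digit", so \D+ can only span the text
# BETWEEN the two numbers; it can never swallow a digit.
my ($age, $number) = $str =~ m/ (\d{3})\D+(\d{3})/;

print "age: $age, building number: $number\n";
```

For this string it prints "age: 500, building number: 123".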

ozo

perl -MYAPE::Regex::Explain -e "print YAPE::Regex::Explain->new(qr/ (\d{3})[\w\s]+(\d{3})/)->explain"
The regular expression:

(?-imsx: (\d{3})[\w\s]+(\d{3}))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ' '
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \d{3}                    digits (0-9) (3 times)
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  [\w\s]+                  any character of: word characters (a-z, A-
                           Z, 0-9, _), whitespace (\n, \r, \t, \f,
                           and " ") (1 or more times (matching the
                           most amount possible))
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    \d{3}                    digits (0-9) (3 times)
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
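The explanation lists 0-9 among the word characters and notes that [\w\s]+ matches "the most amount possible". Together, these mean the class can swallow digits and then backtrack only as far as needed. The sketch below uses a hypothetical four-digit building number (1234, not from the original post) to show where the two patterns diverge.

```perl
use strict;
use warnings;

# Hypothetical input: same sentence, but with a four-digit building number.
my $str = "hello I am 500 years old and my address is 1234 Main Street.";

# [\w\s]+ is greedy and \w includes digits, so it runs to the end of the
# word/space run and then backtracks only as far as needed to fit \d{3}.
my @with_class = $str =~ m/ (\d{3})[\w\s]+(\d{3})/;

# \D+ stops at the first digit, so the second capture starts at "1234".
my @with_nondigit = $str =~ m/ (\d{3})\D+(\d{3})/;

print '[\w\s]+ version: ', "@with_class\n";     # second number is 234
print '\D+ version:     ', "@with_nondigit\n";  # second number is 123
```

Neither result is the full "1234", because \d{3} asks for exactly three digits in both patterns.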

SOLUTION

ghostdog74

ozo

if there is no requirement that the sequences have 3 digits
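The rest of this comment is not available, so the following is only an assumed illustration of the idea, not ozo's actual suggestion: when the numbers may have any length, \d+ (one or more digits) can replace \d{3}, or a global match in list context can collect every run of digits. The four-digit address is again a hypothetical input.

```perl
use strict;
use warnings;

# Hypothetical input with numbers of different lengths.
my $str = "hello I am 500 years old and my address is 1234 Main Street.";

# \d+ matches one or more digits, so numbers of any length are captured whole.
my ($age, $number) = $str =~ m/(\d+)\D+(\d+)/;

# Alternative: /g in list context returns every run of digits in the string.
my @all_numbers = $str =~ m/(\d+)/g;

print "age: $age, building number: $number\n";
print "all numbers: @all_numbers\n";
```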