regex tutorial help

Hi,

I'm just trying to practice regexes. I made a dummy string with an age and an address in it, and I want to pull out the age and the building number. So I'm really looking for a sequence of 3 digits, then another sequence of 3 digits. Here's the script:


use strict;

my $str = "hello I am 500 years old and my address is 123 Main Street.";


# Try to find the age and the address.
if ($str =~ m/ (\d{3})[\w\s]+(\d{3})/) {
    print("Yeah it matched and the extracted stuff is: $1, $2", "\n");
    print($1, "\n"); # the age
    print($2, "\n"); # the building number
}


I get the first extraction:

    (\d{3})   - look for a sequence of 3 digits.

I don't get this part:

    [\w\s]+

How do you read the [] brackets? I just gave it a shot because I saw it somewhere else, but what I was trying to express is:

    look for a sequence of 3 digits, then some spaces and characters, then another sequence of 3 digits.

so the [\w\s]+ is the (some spaces and characters) part, I just don't understand technically what it is saying.

Thanks
Asked by: DJ_AM_Juicebox
 
hielo commented:
>> look for a sequence of 3 digits, then some spaces and characters, then another sequence of 3 digits.
The catch is that \w is a shortcut for [a-zA-Z0-9_], so the class [\w\s] can match digits too. When the regex reaches your second set of digits, those digits can also be matched by the [].

If you take a step back and look at your input string again, another way to look at it is some digits followed by non-digits followed by digits:
if ($str =~ m/ (\d{3})\D+(\d{3})/) {
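To make the bracket syntax concrete: [...] is a character class, which matches exactly one character out of the set listed inside it, and the + after it repeats that one-character match. Here is a small sketch that tests a few single characters against [\w\s] and against \D. The helper name matches_class is made up for this example; it is not part of Perl or of any module.

```perl
use strict;
use warnings;

# Hypothetical helper (not a built-in): true if the single character $ch
# is matched by the one-character pattern $class_re.
sub matches_class {
    my ($class_re, $ch) = @_;
    return $ch =~ /\A$class_re\z/ ? 1 : 0;
}

# Try a letter, a digit, an underscore, a space and a full stop.
for my $ch ('a', '7', '_', ' ', '.') {
    printf "%-5s [\\w\\s]: %s   \\D: %s\n",
        "'$ch'",
        matches_class(qr/[\w\s]/, $ch) ? 'yes' : 'no',
        matches_class(qr/\D/,     $ch) ? 'yes' : 'no';
}
```

The digit '7' is accepted by [\w\s] but rejected by \D, which is why \D+ can never run into the second number.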
 
FishMonger commented:
I'd take an additional step and use named vars instead of $1 and $2.

my $str = "hello I am 500 years old and my address is 123 Main Street.";

if ( my ($age, $number) = $str =~ /(\d{3})\D+(\d{3})/ ) {
    print "Yeah it matched and the extracted stuff is:\n";
    print "$age\n";
    print "$number\n";
}

 
DJ_AM_Juicebox (author) commented:
Ah ok yeah this makes more sense to me:

    m/ (\d{3})\D+(\d{3})/


so that says look for 3 digits, followed by one or more non-digit characters (so this includes alphabet chars and whitespaces), then 3 digits, right?
 
ozo commented:
Yes, although it would also match the 123 in
hello I am 500 years old and my address is 1234 Main Street
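A quick way to see this, and one possible way to guard against it, is sketched below. The (?!\d) negative lookahead is one option among several, not something proposed elsewhere in this thread, and the helper name extract_pair and the variable names are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured (age, number) pair in list
# context, or an empty list when the pattern does not match.
sub extract_pair {
    my ($str, $re) = @_;
    return $str =~ $re;
}

my $str = "hello I am 500 years old and my address is 1234 Main Street";

# The pattern from this thread: happily takes the first 3 digits of 1234.
my $loose = qr/ (\d{3})\D+(\d{3})/;

# Assumed variant: (?!\d) insists that no further digit follows each group.
my $guarded = qr/ (\d{3})(?!\d)\D+(\d{3})(?!\d)/;

my @loose   = extract_pair($str, $loose);
my @guarded = extract_pair($str, $guarded);

print "loose:   ", (@loose   ? "@loose"   : "no match"), "\n";
print "guarded: ", (@guarded ? "@guarded" : "no match"), "\n";
```

With this input the loose pattern reports 500 and 123, while the guarded one reports no match because 1234 is not a 3-digit number.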
 
Adam314 commented:
Your original regex worked. The [\w\s]+ didn't swallow the second number because, if it had, the overall regex couldn't match: there would be no digits left for the second (\d{3}) to match. Each +, *, or {min,max} takes as many characters as possible while still allowing the overall regex to match. If you use +?, *?, or {min,max}? instead, it takes as few as possible, while still allowing the overall regex to match.
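The difference only shows up when there is more than one place the second group could match, so here is a small sketch with a made-up input string that contains three 3-digit numbers. The string and variable names are just for illustration.

```perl
use strict;
use warnings;

# Made-up input with three 3-digit runs, so greedy and non-greedy differ.
my $str = "I am 500 and 123 or 456 ok";

# Greedy +: [\w\s]+ grabs as much as it can, then backs off just far
# enough for the last (\d{3}) to match.
my ($g1, $g2) = $str =~ / (\d{3})[\w\s]+(\d{3})/;

# Non-greedy +?: [\w\s]+? grows one character at a time and stops at the
# first place where (\d{3}) can match.
my ($l1, $l2) = $str =~ / (\d{3})[\w\s]+?(\d{3})/;

print "greedy:     $g1, $g2\n";   # 500, 456
print "non-greedy: $l1, $l2\n";   # 500, 123
```

On the string from the original question there is only one candidate for the second number, so both forms give the same answer there.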
 
hielo commented:
>>so that says look for 3 digits, followed by one or more non-digit characters (so this includes alphabet chars and whitespaces), then 3 digits, right?
Exactly! Wasn't that easy? :)
 
ozo commented:
$ perl -MYAPE::Regex::Explain -e "print YAPE::Regex::Explain->new(qr/ (\d{3})[\w\s]+(\d{3})/)->explain"
The regular expression:

(?-imsx: (\d{3})[\w\s]+(\d{3}))

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
                           ' '
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \d{3}                    digits (0-9) (3 times)
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  [\w\s]+                  any character of: word characters (a-z, A-
                           Z, 0-9, _), whitespace (\n, \r, \t, \f,
                           and " ") (1 or more times (matching the
                           most amount possible))
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    \d{3}                    digits (0-9) (3 times)
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
 
ghostdog74 commented:
don't need that much regexp.
you can use split to split by non digits.
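my $str = "hello I am 500 years old and my address is 123 Main Street.";
# The string starts with non-digits, so split returns an empty first
# field; grep { length } drops it.
my @array = grep { length } split( /\D+/, $str );
print "$_\n" for @array;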

 
ozo commented:
That works if there is no requirement that the sequences have exactly 3 digits.
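If the 3-digit requirement does matter, one possible sketch is a global match in list context instead of split. The input string below is made up, and \b is only one way to say "exactly 3 digits": it treats letters and underscores as word characters too, so it would not pick the 123 out of something like abc123.

```perl
use strict;
use warnings;

# Made-up input mixing 2-, 3- and 5-digit runs.
my $str = "flat 12, I am 500 years old, zip 12345, number 123";

# In list context, a /g match returns every match in the string.
my @any_digits   = $str =~ /\d+/g;         # every run of digits
my @three_digits = $str =~ /\b\d{3}\b/g;   # only stand-alone 3-digit runs

print "any length: @any_digits\n";
print "3 digits:   @three_digits\n";
```

Here the first list contains all four numbers, while the second keeps only 500 and 123.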