regex tutorial help

Posted on 2008-06-25
Last Modified: 2010-03-05

Hi,

I'm just trying to practice regexes. I made a dummy string with an age and an address in it. I want to pull out the age and the building number from the string, so I'm really looking for a sequence of 3 digits, then another sequence of 3 digits. Here's the script:


use strict;
use warnings;

my $str = "hello I am 500 years old and my address is 123 Main Street.";


# Try to find the age and the address.
if ($str =~ m/ (\d{3})[\w\s]+(\d{3})/) {
    print("Yeah it matched and the extracted stuff is: $1, $2", "\n");
    print($1, "\n"); # the age
    print($2, "\n"); # the building number
}


I get the first extraction:

    (\d{3})   - look for a sequence of 3 digits.

I don't get this part:

    [\w\s]+

How do the [] brackets work? I just gave it a shot because I saw it somewhere else, but what I was trying to express is:

    look for a sequence of 3 digits, then some spaces and characters, then another sequence of 3 digits.

So the [\w\s]+ is the "some spaces and characters" part; I just don't understand technically what it is saying.

Thanks

Question by:DJ_AM_Juicebox
9 Comments

Accepted Solution

by:hielo
ID: 21867482
>> look for a sequence of 3 digits, then some spaces and characters, then another sequence of 3 digits.
The problem is that \w is shorthand for [a-zA-Z0-9_] (for plain ASCII text), so it matches digits too. When you get to your second set of digits, they also qualify as members of the [\w\s] class.
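To see that point in isolation, this short sketch tests a few strings against the class on its own. The sample strings are made up for illustration; the check anchors the pattern so that a sample only passes when every character belongs to [\w\s].

```perl
use strict;
use warnings;

# [\w\s] matches ONE character that is a word character (\w: letters,
# digits, underscore in ASCII text) or whitespace (\s); + means "one or more".
# ^ and $ anchor the pattern so the WHOLE sample must consist of such characters.
my @samples = ('abc', '123', 'a_1 b', '500 years', 'Street.');
my %only_word_or_space;
for my $s (@samples) {
    $only_word_or_space{$s} = $s =~ /^[\w\s]+$/ ? 1 : 0;
    print "'$s' => $only_word_or_space{$s}\n";
}
```

Digits pass the test just like letters do, which is why the class by itself cannot tell the second number apart from the words in front of it; only the period in 'Street.' fails.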

If you take a step back and look at your input string again, another way to describe it is: some digits, followed by non-digits, followed by digits. So the if line of your script becomes:
if ($str =~ m/ (\d{3})\D+(\d{3})/) {
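For reference, here is a complete, runnable version of that suggestion. The variable names $age and $building are illustrative additions, not from the thread.

```perl
use strict;
use warnings;

my $str = "hello I am 500 years old and my address is 123 Main Street.";

# \D matches any single character that is NOT a digit (the complement of \d),
# so \D+ stops right before the next digit and cannot swallow the second number.
my ($age, $building);
if ($str =~ m/ (\d{3})\D+(\d{3})/) {
    ($age, $building) = ($1, $2);
    print "age: $age, building number: $building\n";
}
```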

Assisted Solution

by:FishMonger
ID: 21867737
I'd take an additional step and use named variables instead of $1 and $2.

my $str = "hello I am 500 years old and my address is 123 Main Street.";

if ( my ($age, $number) = $str =~ /(\d{3})\D+(\d{3})/ ) {
    print "Yeah it matched and the extracted stuff is:\n";
    print "$age\n";
    print "$number\n";
}
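Why that if works: a match in list context returns the captured groups, and a list assignment used as a condition evaluates to the number of elements on its right-hand side. The sketch below wraps this in a helper; the name extract_pair is hypothetical and used only for illustration.

```perl
use strict;
use warnings;

# extract_pair is a hypothetical helper name used only for this illustration.
sub extract_pair {
    my ($text) = @_;
    # In list context the match returns ($1, $2) on success or () on failure.
    # The list assignment, used as a condition, yields the number of elements
    # on its right-hand side: 2 on success, 0 on failure.
    if ( my ($age, $number) = $text =~ /(\d{3})\D+(\d{3})/ ) {
        return "$age/$number";
    }
    return 'no match';
}

my $hit  = extract_pair("hello I am 500 years old and my address is 123 Main Street.");
my $miss = extract_pair("hello, no numbers here");
print "$hit\n$miss\n";    # prints 500/123, then no match
```

Declaring the variables with my inside the condition also scopes them to the if block, so they cannot leak stale values from an earlier match.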

Author Comment

by:DJ_AM_Juicebox
ID: 21867989
Ah ok yeah this makes more sense to me:

    m/ (\d{3})\D+(\d{3})/


So that says: look for 3 digits, followed by one or more non-digit characters (which includes letters and whitespace), then 3 digits, right?

Assisted Solution

by:ozo
ID: 21868071
Yes, although it would also match the 123 (the first three digits of 1234) in

    hello I am 500 years old and my address is 1234 Main Street
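One way to reject that case, not suggested in the thread, is to add lookarounds so that neither number may touch another digit. This is a sketch under the assumption that each number must be exactly three digits long.

```perl
use strict;
use warnings;

# Assumption: each number must be EXACTLY three digits.
# (?<!\d) is a negative lookbehind: the first number may not follow a digit.
# (?!\d)  is a negative lookahead: the second number may not precede a digit.
# The \D+ in the middle already keeps digits away from the inner edges.
my $exactly_three = qr/(?<!\d)(\d{3})\D+(\d{3})(?!\d)/;

my @results;
for my $str ("hello I am 500 years old and my address is 123 Main Street",
             "hello I am 500 years old and my address is 1234 Main Street") {
    push @results, $str =~ $exactly_three ? "$1, $2" : 'no match';
    print "$results[-1]\n";
}
```

With the lookahead in place, the 1234 string no longer matches at all instead of silently yielding 123.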

Assisted Solution

by:Adam314
ID: 21868728
Your original regex worked. The [\w\s]+ didn't swallow the second number, because if it had, the overall regex wouldn't match: there would be no digits left for the second \d{3} to match. Each +, *, or {min,max} quantifier matches as many characters as possible (greedy) while still allowing the overall regex to match. If you use +?, *?, or {min,max}?, it matches as few as possible (lazy) while still allowing the overall regex to match.
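The difference between greedy and lazy only becomes visible when there are more than two candidate numbers. This sketch uses a made-up string with three three-digit runs and runs the original pattern in both forms.

```perl
use strict;
use warnings;

# Made-up string with THREE numbers, so greedy and lazy quantifiers differ.
my $text = "hello 500 abc 123 def 456 end";

# Greedy: [\w\s]+ grabs as much as possible, then backtracks only until
# (\d{3}) can match, so the second capture is the LAST three-digit run.
my ($g1, $g2) = $text =~ / (\d{3})[\w\s]+(\d{3})/;

# Lazy: [\w\s]+? grabs as little as possible, so the second capture is the
# NEXT three-digit run.
my ($l1, $l2) = $text =~ / (\d{3})[\w\s]+?(\d{3})/;

print "greedy: $g1, $g2\n";   # greedy: 500, 456
print "lazy:   $l1, $l2\n";   # lazy:   500, 123
```

With only two numbers in the string, as in the original question, both forms capture 500 and 123.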

Expert Comment

by:hielo
ID: 21869001
>>so that says look for 3 digits, followed by one or more non-digit characters (so this includes alphabet chars and whitespaces), then 3 digits, right?
Exactly! Wasn't that easy? :)

Expert Comment

by:ozo
ID: 21870806
perl -MYAPE::Regex::Explain -e "print YAPE::Regex::Explain->new(qr/ (\d{3})[\w\s]+(\d{3})/)->explain"
The regular expression:

(?-imsx: (\d{3})[\w\s]+(\d{3}))

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
                           ' '
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \d{3}                    digits (0-9) (3 times)
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  [\w\s]+                  any character of: word characters (a-z, A-
                           Z, 0-9, _), whitespace (\n, \r, \t, \f,
                           and " ") (1 or more times (matching the
                           most amount possible))
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    \d{3}                    digits (0-9) (3 times)
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

Assisted Solution

by:ghostdog74
ID: 21871218
You don't need that much regexp.
You can use split to split on non-digits:
my $str = "hello I am 500 years old and my address is 123 Main Street.";
my @array = grep { length } split( /\D+/, $str );   # drop the leading empty field
print "@array\n";

Expert Comment

by:ozo
ID: 21872012
That works if there is no requirement that the sequences have exactly 3 digits.
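If the 3-digit requirement does apply, the split approach can be kept by filtering the pieces. This is a sketch, not from the thread: grep keeps only runs of exactly three digits, which also discards the empty leading field that split produces when the string starts with a non-digit.

```perl
use strict;
use warnings;

my @strings = (
    "hello I am 500 years old and my address is 123 Main Street.",
    "hello I am 500 years old and my address is 1234 Main Street.",
);

my @found;
for my $str (@strings) {
    # split /\D+/ yields the digit runs, plus a leading "" for these strings.
    my @runs = split /\D+/, $str;
    # \A and \z anchor the whole piece, so 1234 is rejected, not truncated.
    push @found, join ' ', grep { /\A\d{3}\z/ } @runs;
}
print "$_\n" for @found;
```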