Solved

understanding regex expressions in perl

Posted on 2006-06-19
13
202 Views
Last Modified: 2011-10-03
hi

i have come across this code in perl (written by someone else who is no longer around ) which has a bunch of statements like


1) $l =~ /(.*)^V\#REPLACEME^X^Y^W(.*)/;
 
 2)if ($1 =~ /^V/) {
                         do some stuff
                 }
3)if ($2 =~ /^V/) {
                        do more stuff
                }


 4)$l =~ s/^V\#REPLACEME^X^Y^W/$somevariable/;


Could you help me understand what he's trying to do here?

thanks

0
Comment
Question by:Vlearns
13 Comments
 
LVL 19

Expert Comment

by:Kim Ryan
ID: 16938747
Basically regex is identifying patterns and optionally substituting them with other values.  Rather than a detailed decoding of these examples (which are not written in the best style anyhow), I think you could gain a good understanding by studying this tutorial  http://search.cpan.org/dist/perl/pod/perlre.pod
0
 
LVL 84

Assisted Solution

by:ozo
ozo earned 100 total points
ID: 16939040
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new($_)->explain for qr/(.*)^V\#REPLACEME^X^Y^W(.*)/, qr/^V/, qr/^V\#REPLACEME^X^Y^W/'
The regular expression:

(?-imsx:(.*)^V\#REPLACEME^X^Y^W(.*))

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  V                        'V'
----------------------------------------------------------------------
  \#                       '#'
----------------------------------------------------------------------
  REPLACEME                'REPLACEME'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  X                        'X'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  Y                        'Y'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  W                        'W'
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
The regular expression:

(?-imsx:^V)

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  V                        'V'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
The regular expression:

(?-imsx:^V\#REPLACEME^X^Y^W)

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  V                        'V'
----------------------------------------------------------------------
  \#                       '#'
----------------------------------------------------------------------
  REPLACEME                'REPLACEME'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  X                        'X'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  Y                        'Y'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  W                        'W'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
0
 
LVL 22

Accepted Solution

by:
pjedmond earned 400 total points
ID: 16939119
1) $l =~ /(.*)^V\#REPLACEME^X^Y^W(.*)/;
 
In this case, the match is for . (any char), followed by * (0 or more times), followed by the string "^V#REPLACEME^X^Y^W", followed by any character any number of times.
Effectively, any string containing "^V#REPLACEME^X^Y^W" is matched. ^ only means the beginning of a string at the beginning of a regex!


 2)if ($1 =~ /^V/) {
                         do some stuff
                 }
If $1 starts (^)with a "V" then do some stuff.

3)if ($2 =~ /^V/) {
                        do more stuff
                }


 4)$l =~ s/^V\#REPLACEME^X^Y^W/$somevariable/;

If the string starts with "V#REPLACEME^X^Y^W", then replace that matched text with the value of $somevariable. The s indicates that a substitution is required.

Worth having a look here, and adding these to your bookmarks:

http://www.itlab.musc.edu/docs/perl_regexp/
http://www.perl.com/doc/manual/html/pod/perlre.html
http://www.anaesthetist.com/mnm/perl/regex.htm

HTH:)
0

 
LVL 22

Expert Comment

by:pjedmond
ID: 16939123
The =~ is a 'matches a regex' directive.

...and obviously 3 is the same as 2!

HTH:)
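
As a rough illustration of what =~ does, the sketch below binds one string to a match, a substitution, a transliteration, and a pattern held in a plain scalar. The sample text and all variable names are made up for the example; none of them come from the original program.

```perl
use strict;
use warnings;

# Hypothetical sample text, not taken from the original code.
my $text = 'Vlearns asks about Perl';

# m//: does the string start with V?
my $matched = ($text =~ m/^V/) ? 1 : 0;

# s///: substitute on a copy so $text itself stays unchanged.
(my $swapped = $text) =~ s/Perl/regexes/;

# tr///: transliterate lowercase to uppercase on a copy.
(my $upper = $text) =~ tr/a-z/A-Z/;

# A plain expression on the right is treated as a search pattern.
my $pattern  = '^V';
my $via_expr = ($text =~ $pattern) ? 1 : 0;

print "$matched $via_expr\n$swapped\n$upper\n";
```

The parenthesised `(my $copy = $text) =~ ...` form is just one common way to leave the original string untouched.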
0
 
LVL 84

Expert Comment

by:ozo
ID: 16939239
Since matching the beginning of the string several separate times within the string makes no sense, was $* set anywhere in the program?
0
 
LVL 22

Expert Comment

by:pjedmond
ID: 16939269
ozo - No-one said that this programmer that is no longer around had 'sense';)
0
 
LVL 84

Expert Comment

by:ozo
ID: 16939339
Even if $* is set to non-zero, XY has no beginning of a line between them, so it still makes no sense.
Perhaps the programmer thought ^Y meant \cY.
What is expected to be in $l when 1) is executed?
=~ is a binding operator; the right argument can be an m//, s/// or tr///,
or an expression, which is interpreted as a search pattern.
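
If the guess about control characters is right, the pattern would need \cV, \cX, \cY and \cW rather than carets followed by letters. The sketch below is only an illustration of that hypothesis: the sample line in $l is invented, and nothing in the thread confirms that the real data contains control characters.

```perl
use strict;
use warnings;

# Hypothetical input containing real control characters:
# Ctrl-V (0x16), Ctrl-X (0x18), Ctrl-Y (0x19), Ctrl-W (0x17).
my $l = "head\x16#REPLACEME\x18\x19\x17tail";

# \cV etc. match the control characters themselves.
# In list context the match returns the two captured groups.
my ($before, $after) = $l =~ /(.*)\cV\#REPLACEME\cX\cY\cW(.*)/;

# The pattern as posted, with ^ anchors in the middle, does not match this line.
my $as_posted = ($l =~ /(.*)^V\#REPLACEME^X^Y^W(.*)/) ? 1 : 0;

printf "before=[%s] after=[%s] as_posted=%d\n",
    $before // '', $after // '', $as_posted;
```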
0
 
LVL 22

Expert Comment

by:pjedmond
ID: 16939351
The ^ only means the beginning of a line at the beginning of the regex. Elsewhere it means a literal ^
Although it is a bizarre string to be checking for, it is valid, as I understand it.
0
 
LVL 22

Expert Comment

by:pjedmond
ID: 16939361
The Regex::Explain module that you used has its transistors in a twist! Would you trust a machine to get it right all the time? As they say:

To err is human...
but to foul things up completely requires a computer.

Remember computers are still programmed by humans;)
0
 
LVL 84

Expert Comment

by:ozo
ID: 16939392
> The ^ only means the beginning of a line at the beginning of the regex. Elsewhere it means a literal ^
that is incorrect
try
print "X^Y^Z" if "X^Y^Z" =~ /X^Y^Z/;

print "X\nY\nZ" if "X\nY\nZ" =~ /X\n^Y\n^Z/m;

print "XYZ:$1" if "XYZ" =~ /(?!^Z)((?<=^X|^Y).)/;
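
A minimal sketch of why the first test above prints nothing while the second one does, with each result kept in a variable so it can be inspected. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $literal = 'X^Y^Z';

# Without /m, ^ can only match at position 0, so a ^ that follows
# other characters makes the whole pattern unmatchable.
my $plain = ($literal =~ /X^Y^Z/) ? 1 : 0;

# Escaping the carets turns them into ordinary characters.
my $escaped = ($literal =~ /X\^Y\^Z/) ? 1 : 0;

# With /m, ^ also matches immediately after a newline.
my $multi = ("X\nY\nZ" =~ /X\n^Y\n^Z/m) ? 1 : 0;

print "unescaped: $plain, escaped: $escaped, multiline: $multi\n";
# prints "unescaped: 0, escaped: 1, multiline: 1"
```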

0
 

Author Comment

by:Vlearns
ID: 16940395
pjedmond  was correct about the intent of that crazy expression
ozo showed a new method of understanding  using that perl library

now i know why perl is a "write only" language :)

thanks experts..u guys are the best!
0
 
LVL 84

Expert Comment

by:ozo
ID: 16940410
To do a literal match for '^V#REPLACEME^X^Y^W', you could use
\Q^V#REPLACEME^X^Y^W\E
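
As a sketch of how \Q...\E could be applied to statements 1) and 4) from the question, assuming the marker really is the literal text with carets: the sample line and the replacement value below are invented, and $marker, $pre and $post are names chosen only for this example.

```perl
use strict;
use warnings;

my $marker       = '^V#REPLACEME^X^Y^W';      # literal text, carets included
my $line         = "before ${marker} after";  # hypothetical sample line
my $somevariable = 'NEWVALUE';                # hypothetical replacement

# Statement 1) with the marker quoted: capture the text on either side.
my ($pre, $post) = $line =~ /(.*)\Q$marker\E(.*)/;

# Statement 4) with the marker quoted: replace it on a copy of the line.
(my $replaced = $line) =~ s/\Q$marker\E/$somevariable/;

# The marker can also be written inline between \Q and \E.
my $inline = ($line =~ /\Q^V#REPLACEME^X^Y^W\E/) ? 1 : 0;

print "pre=[$pre] post=[$post] inline=$inline\n$replaced\n";
```

Keeping the marker in one variable means the match and the substitution cannot drift apart, and \Q takes care of escaping the carets and the #.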
0
 
LVL 22

Expert Comment

by:pjedmond
ID: 16940506
Looks like I'm getting my sed regex expressions mixed up with Perl!
0


Question has a verified solution.
