  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 209

Understanding regular expressions in Perl

Hi,

I have come across this code in Perl (written by someone else who is no longer around), which has a bunch of statements like:


1) $l =~ /(.*)^V\#REPLACEME^X^Y^W(.*)/;

2) if ($1 =~ /^V/) {
       do some stuff
   }

3) if ($2 =~ /^V/) {
       do more stuff
   }

4) $l =~ s/^V\#REPLACEME^X^Y^W/$somevariable/;


Could you help me understand what he's trying to do here?

Thanks

Asked by: Vlearns
 
Kim Ryan (IT Consultant) commented:
Basically, a regex identifies patterns in text and optionally substitutes other values for them. Rather than decoding these examples in detail (they are not written in the best style anyhow), I think you could gain a good understanding by studying this tutorial: http://search.cpan.org/dist/perl/pod/perlre.pod
 
ozo commented:
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new($_)->explain for qr/(.*)^V\#REPLACEME^X^Y^W(.*)/, qr/^V/, qr/^V\#REPLACEME^X^Y^W/'
The regular expression:

(?-imsx:(.*)^V\#REPLACEME^X^Y^W(.*))

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  V                        'V'
----------------------------------------------------------------------
  \#                       '#'
----------------------------------------------------------------------
  REPLACEME                'REPLACEME'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  X                        'X'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  Y                        'Y'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  W                        'W'
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
The regular expression:

(?-imsx:^V)

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  V                        'V'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
The regular expression:

(?-imsx:^V\#REPLACEME^X^Y^W)

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  V                        'V'
----------------------------------------------------------------------
  \#                       '#'
----------------------------------------------------------------------
  REPLACEME                'REPLACEME'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  X                        'X'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  Y                        'Y'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  W                        'W'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
 
pjedmond commented:
1) $l =~ /(.*)^V\#REPLACEME^X^Y^W(.*)/;

In this case, the match is for . (any character) followed by * (zero or more times), followed by the string "^V#REPLACEME^X^Y^W", followed by any character any number of times.
Effectively, any string containing "^V#REPLACEME^X^Y^W" is matched. ^ only means the beginning of a string at the beginning of a regex!


2) if ($1 =~ /^V/) {
       do some stuff
   }
If $1 starts (^) with a "V", then do some stuff.

3) if ($2 =~ /^V/) {
       do more stuff
   }


4) $l =~ s/^V\#REPLACEME^X^Y^W/$somevariable/;

If the string starts with "V#REPLACEME^X^Y^W", then replace that matched text with the value of $somevariable. The s indicates that a substitution is required.
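
To make the four statements concrete, here is a minimal sketch of what they appear intended to do together. It assumes the marker is literal text with real caret characters, so the carets are backslash-escaped, which is not how the original code is written. The helper name process_line and the sample input are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line around the literal marker and report
# what statements 1-4 would see. Escaping the carets is an assumption
# about the original author's intent, not the posted code.
sub process_line {
    my ($line, $replacement) = @_;
    my %seen = (matched => 0, before_starts_with_V => 0, after_starts_with_V => 0);

    # Statement 1: capture the text before and after the marker.
    if ($line =~ /(.*)\^V\#REPLACEME\^X\^Y\^W(.*)/) {
        my ($before, $after) = ($1, $2);
        $seen{matched} = 1;
        # Statements 2 and 3: ^ at the start of the pattern is an anchor.
        $seen{before_starts_with_V} = 1 if $before =~ /^V/;
        $seen{after_starts_with_V}  = 1 if $after  =~ /^V/;
    }

    # Statement 4: replace the first occurrence of the marker, on a copy.
    (my $result = $line) =~ s/\^V\#REPLACEME\^X\^Y\^W/$replacement/;

    return ($result, \%seen);
}

my ($out, $info) = process_line('Value: ^V#REPLACEME^X^Y^W Version', '42');
print "$out\n";
print "before starts with V: $info->{before_starts_with_V}\n";
print "after starts with V: $info->{after_starts_with_V}\n";
```

Copying $1 and $2 into named variables straight after the match keeps later matches from overwriting them.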

These are worth a look, and worth bookmarking:

http://www.itlab.musc.edu/docs/perl_regexp/
http://www.perl.com/doc/manual/html/pod/perlre.html
http://www.anaesthetist.com/mnm/perl/regex.htm

HTH:)
pjedmond commented:
The =~ is a 'matches a regex' directive.

...and obviously 3 is the same as 2!

HTH:)
 
ozo commented:
Since matching the beginning of the string several separate times within the string makes no sense, was $* set anywhere in the program?
 
pjedmond commented:
ozo - No one said that this programmer who is no longer around had 'sense' ;)
 
ozo commented:
Even if $* is set to non-zero, X and Y have no beginning of a line between them, so it still makes no sense.
Perhaps the programmer thought ^Y meant \cY.
What is expected to be in $l when 1) is executed?
=~ is a binding operator; the right argument can be an m//, s/// or tr///,
or an expression, which is interpreted as a search pattern.
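
As an illustration of both points, the sketch below assumes the data contains real control characters (Ctrl-V, Ctrl-X, Ctrl-Y, Ctrl-W), which many editors display as ^V, ^X, ^Y, ^W. That is only a guess about the original data. It then binds the same string with =~ to each kind of right-hand argument. All variable names and the sample line are made up for the example.

```perl
use strict;
use warnings;

# Assumption: the marker uses real control characters, written \cV etc.
my $marker = "\cV#REPLACEME\cX\cY\cW";
my $line   = "before${marker}after";

# m//: the default operator; \cV in a pattern matches the Ctrl-V character.
my $found = $line =~ /\cV\#REPLACEME\cX\cY\cW/ ? 1 : 0;

# s///: substitute on a copy so $line stays intact for the later checks.
(my $replaced = $line) =~ s/\cV\#REPLACEME\cX\cY\cW/[value]/;

# tr///: with an empty replacement list it only counts the listed characters.
my $control_count = ($line =~ tr/\x01-\x1F//);

# An expression on the right-hand side is interpreted as a pattern at run time.
my $pattern        = quotemeta $marker;
my $found_via_expr = $line =~ $pattern ? 1 : 0;

print "found=$found replaced=$replaced controls=$control_count expr=$found_via_expr\n";
```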
 
pjedmond commented:
The ^ only means the beginning of a line at the beginning of the regex. Elsewhere it means a literal ^.
Although it is a bizarre string to be checking for, it is valid as I understand it.
 
pjedmond commented:
The Regex::Explain module that you used has its transistors in a twist! Would you trust a machine to get it right all the time? As they say:

To err is human...
but to foul things up completely requires a computer.

Remember computers are still programmed by humans;)
 
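ozo commented:
> The ^ only means the beginning of a line at the beginning of the regex. Elsewhere it means a literal ^
That is incorrect.
Try:
# Prints nothing: each ^ is an anchor, not a literal caret.
print "X^Y^Z" if "X^Y^Z" =~ /X^Y^Z/;

# Prints the string: with /m, ^ also matches just after a newline.
print "X\nY\nZ" if "X\nY\nZ" =~ /X\n^Y\n^Z/m;

# Prints XYZ:Y, showing ^ acting as an anchor inside lookarounds too.
print "XYZ:$1" if "XYZ" =~ /(?!^Z)((?<=^X|^Y).)/;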
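
The same reasoning can be applied to the pattern from the question. This is a small sketch with hypothetical input lines, picked only to cover the obvious ways the marker might be spelled. The posted pattern matches none of them, because an anchor in the middle of a line can never be satisfied.

```perl
use strict;
use warnings;

# The pattern exactly as posted in the question (statement 1).
my $posted = qr/(.*)^V\#REPLACEME^X^Y^W(.*)/;

# Hypothetical input lines, chosen only for illustration.
my @candidates = (
    'abc^V#REPLACEME^X^Y^Wdef',   # marker spelled with literal carets
    'V#REPLACEMEXYW',             # marker with the carets dropped
    "V#REPLACEME\nX\nY\nW",       # pieces on separate lines
);

my $matches = 0;
for my $text (@candidates) {
    $matches++ if $text =~ $posted;
}
print "posted pattern matched $matches of ", scalar(@candidates), " lines\n";

# With /m the anchors can match after a newline, so a variant that spells
# out the newlines does match the third line.
my $multiline    = qr/^V\#REPLACEME\n^X\n^Y\n^W/m;
my $multiline_ok = $candidates[2] =~ $multiline ? 1 : 0;
print "multiline variant matched: $multiline_ok\n";
```

When a match fails, Perl does not reset $1 and $2, so statements 2 and 3 would be testing whatever an earlier successful match left behind.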

 
Vlearns (author) commented:
pjedmond was correct about the intent of that crazy expression.
ozo showed a new method of understanding it, using that Perl library.

Now I know why Perl is a "write only" language :)

Thanks, experts. You guys are the best!
 
ozo commented:
To do a literal match for '^V#REPLACEME^X^Y^W', you could use
\Q^V#REPLACEME^X^Y^W\E
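
A short sketch of that suggestion, with a hypothetical input line, compares the unquoted pattern with the \Q...\E form. It also shows quotemeta, the built-in function that does the same quoting for a pattern held in a variable.

```perl
use strict;
use warnings;

my $text = 'abc^V#REPLACEME^X^Y^Wdef';   # hypothetical input line

# Unquoted: every ^ is an anchor, so the literal text is not found.
my $unquoted = $text =~ /^V\#REPLACEME^X^Y^W/ ? 1 : 0;

# \Q...\E quotes the metacharacters between them.
my $quoted = $text =~ /\Q^V#REPLACEME^X^Y^W\E/ ? 1 : 0;

# quotemeta does the same job for a pattern held in a variable.
my $marker  = '^V#REPLACEME^X^Y^W';
my $escaped = quotemeta $marker;
(my $result = $text) =~ s/$escaped/VALUE/;

print "unquoted=$unquoted quoted=$quoted result=$result\n";
```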
 
pjedmond commented:
Looks like I'm getting my sed regex expressions mixed up with Perl!