Solved

Understanding regular expressions in Perl

Posted on 2006-06-19
Last Modified: 2011-10-03
Hi,

I have come across this code in Perl (written by someone else who is no longer around) which has a bunch of statements like:


1) $l =~ /(.*)^V\#REPLACEME^X^Y^W(.*)/;

2) if ($1 =~ /^V/) {
       do some stuff
   }

3) if ($2 =~ /^V/) {
       do more stuff
   }

4) $l =~ s/^V\#REPLACEME^X^Y^W/$somevariable/;


Could you help me understand what he's trying to do here?

Thanks

Question by:Vlearns
13 Comments
 
LVL 19

Expert Comment

by:Kim Ryan
ID: 16938747
Basically, a regex identifies patterns and optionally substitutes them with other values. Rather than decoding these examples in detail (they are not written in the best style anyhow), I think you could gain a good understanding by studying this tutorial: http://search.cpan.org/dist/perl/pod/perlre.pod
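As a rough illustration of those two uses (identifying a pattern, and substituting it), here is a minimal sketch. The sample text and variable names are made up for the example and have nothing to do with the code in the question.

```perl
use strict;
use warnings;

# Hypothetical sample text; not taken from the original program.
my $line = "colour: red";

# Matching: in list context a successful match returns what the
# parentheses captured.
my ($value) = $line =~ /^colour:\s*(\w+)/;

# Substituting: copy the string, then replace the first occurrence
# of the pattern in the copy.
(my $american = $line) =~ s/colour/color/;

print "captured:  $value\n";
print "rewritten: $american\n";
```

The match only reads the string, while s/// changes the variable it is bound to, which is why the sketch works on a copy.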
 
LVL 84

Assisted Solution

by:ozo
ozo earned 100 total points
ID: 16939040
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new($_)->explain for qr/(.*)^V\#REPLACEME^X^Y^W(.*)/, qr/^V/, qr/^V\#REPLACEME^X^Y^W/'
The regular expression:

(?-imsx:(.*)^V\#REPLACEME^X^Y^W(.*))

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  V                        'V'
----------------------------------------------------------------------
  \#                       '#'
----------------------------------------------------------------------
  REPLACEME                'REPLACEME'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  X                        'X'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  Y                        'Y'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  W                        'W'
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
The regular expression:

(?-imsx:^V)

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  V                        'V'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
The regular expression:

(?-imsx:^V\#REPLACEME^X^Y^W)

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  V                        'V'
----------------------------------------------------------------------
  \#                       '#'
----------------------------------------------------------------------
  REPLACEME                'REPLACEME'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  X                        'X'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  Y                        'Y'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  W                        'W'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
 
LVL 22

Accepted Solution

by:
pjedmond earned 400 total points
ID: 16939119
1) $l =~ /(.*)^V\#REPLACEME^X^Y^W(.*)/;
 
In this case, the match is for . (any character), followed by * (0 or more times), followed by the string "^V#REPLACEME^X^Y^W", followed by any character any number of times.
Effectively, any string containing "^V#REPLACEME^X^Y^W" is matched. ^ only means the beginning of a string at the beginning of a regex!


2) if ($1 =~ /^V/) {
       do some stuff
   }
If $1 starts (^) with a "V", then do some stuff.

3) if ($2 =~ /^V/) {
       do more stuff
   }

4) $l =~ s/^V\#REPLACEME^X^Y^W/$somevariable/;

If the string starts with "V#REPLACEME^X^Y^W", then that matched text is replaced with the contents of $somevariable. The s indicates that a substitution is required.
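The capture-and-test steps in 1) to 3) can be sketched as follows. This is only a sketch under assumptions: the marker is simplified to a plain '#REPLACEME' (the ^V, ^X, ^Y, ^W parts are left out), and the sub name and sample input are hypothetical.

```perl
use strict;
use warnings;

# Simplified, hypothetical version of statement 1): the marker here is
# just '#REPLACEME', without the ^V, ^X, ^Y, ^W parts of the original.
sub split_around_marker {
    my ($l) = @_;
    # Check that the match succeeded before using $1 and $2; after a
    # failed match they still hold values from an earlier successful one.
    if ($l =~ /(.*)\#REPLACEME(.*)/) {
        return ($1, $2);
    }
    return;
}

my ($before, $after) = split_around_marker("Value=#REPLACEME;Version");
print "before: '$before'\n";
print "after:  '$after'\n";

# Statements 2) and 3): does either captured part start with V?
print "before starts with V\n" if $before =~ /^V/;
print "after starts with V\n"  if $after  =~ /^V/;
```

Note that the original code reads $1 and $2 without checking whether the match in 1) succeeded, which the sketch avoids by testing the match first.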

It is worth having a look at these and adding them to your bookmarks:

http://www.itlab.musc.edu/docs/perl_regexp/
http://www.perl.com/doc/manual/html/pod/perlre.html
http://www.anaesthetist.com/mnm/perl/regex.htm

HTH:)
 
LVL 22

Expert Comment

by:pjedmond
ID: 16939123
The =~ is a 'matches a regex' directive.

...and obviously 3 is the same as 2!

HTH:)
 
LVL 84

Expert Comment

by:ozo
ID: 16939239
Since matching the beginning of the string several separate times within the string makes no sense, was $* set anywhere in the program?
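For context, $* was the old switch for multi-line matching. Modern Perl no longer supports it, and the /m modifier is the documented replacement. A minimal sketch of the difference, using a made-up two-line string:

```perl
use strict;
use warnings;

# Hypothetical two-line input; the second line starts with the marker.
my $text = "first line\nV#REPLACEME rest";

# Without /m, ^ can only match at the very start of the string.
my $plain = $text =~ /^V\#REPLACEME/  ? "match" : "no match";

# With /m, ^ also matches immediately after each newline.
my $multi = $text =~ /^V\#REPLACEME/m ? "match" : "no match";

print "without /m: $plain\n";
print "with /m:    $multi\n";
```

So a ^ in the middle of a pattern can only succeed in multi-line mode, and only when the character just before that point is a newline.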
 
LVL 22

Expert Comment

by:pjedmond
ID: 16939269
ozo - No-one said that this programmer that is no longer around had 'sense';)
 
LVL 84

Expert Comment

by:ozo
ID: 16939339
Even if $* is set to non-zero, XY has no beginning of a line between them, so it still makes no sense.
Perhaps the programmer thought ^Y meant \cY.
What is expected to be in $l when 1) is executed?
=~ is a binding operator; the right argument can be a m//, s/// or tr///,
or an expression, which is interpreted as a search pattern.
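The forms of the binding operator listed above, plus the \cY notation, can be sketched like this. The strings and variable names are invented for the example.

```perl
use strict;
use warnings;

my $s = "hello world";

# m// : match (the leading m is optional when the delimiter is /).
my $found = $s =~ m/world/ ? 1 : 0;

# s/// : substitute; applied to a copy so $s stays unchanged.
(my $swapped = $s) =~ s/world/perl/;

# tr/// : transliterate characters one for one, again on a copy.
(my $upper = $s) =~ tr/a-z/A-Z/;

# An expression on the right side is evaluated and its value is
# used as the search pattern.
my $pattern = "wor.d";
my $dynamic = $s =~ $pattern ? 1 : 0;

# \cY in a pattern is the single character Control-Y (code 25),
# which is not the same as a caret followed by the letter Y.
my $ctrl = chr(25) =~ /\cY/ ? 1 : 0;

print "$found | $swapped | $upper | $dynamic | $ctrl\n";
```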
 
LVL 22

Expert Comment

by:pjedmond
ID: 16939351
The ^ only means the beginning of a line at the beginning of the regex. Elsewhere it means a literal ^.
Although it is a bizarre string to be checking for, it is valid as I understand it.
 
LVL 22

Expert Comment

by:pjedmond
ID: 16939361
The Regex::Explain module that you used has its transistors in a twist! Would you trust a machine to get it right all the time? As they say:

To err is human...
but to foul things up completely requires a computer.

Remember computers are still programmed by humans;)
 
LVL 84

Expert Comment

by:ozo
ID: 16939392
> The ^ only means the beginning of a line at the beginning of the regex. Elsewhere it means a literal ^
That is incorrect.
Try:
print "X^Y^Z" if "X^Y^Z" =~ /X^Y^Z/;

print "X\nY\nZ" if "X\nY\nZ" =~ /X\n^Y\n^Z/m;

print "XYZ:$1" if "XYZ" =~ /(?!^Z)((?<=^X|^Y).)/;
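The three tests above can be wrapped so that each result is visible. The fourth check, with the carets escaped, is an addition that is not part of the original tests; the variable names are arbitrary.

```perl
use strict;
use warnings;

# Test 1: the ^ between the letters is an anchor, not a literal caret,
# so the literal string "X^Y^Z" does not match.
my $t1 = "X^Y^Z" =~ /X^Y^Z/ ? "match" : "no match";

# Test 2: with /m, ^ matches right after each newline.
my $t2 = "X\nY\nZ" =~ /X\n^Y\n^Z/m ? "match" : "no match";

# Test 3: ^ keeps its anchor meaning inside lookarounds as well;
# the capture is the character preceded by X at the string start.
my $t3 = "XYZ" =~ /(?!^Z)((?<=^X|^Y).)/ ? "captured $1" : "no match";

# Extra check (not in the original tests): an escaped caret is literal.
my $t4 = "X^Y^Z" =~ /X\^Y\^Z/ ? "match" : "no match";

print "test 1: $t1\n";
print "test 2: $t2\n";
print "test 3: $t3\n";
print "test 4: $t4\n";
```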

 

Author Comment

by:Vlearns
ID: 16940395
pjedmond was correct about the intent of that crazy expression.
ozo showed a new method of understanding it using that Perl library.

Now I know why Perl is a "write only" language :)

Thanks, experts. You guys are the best!
 
LVL 84

Expert Comment

by:ozo
ID: 16940410
To do a literal match for '^V#REPLACEME^X^Y^W', you could use
\Q^V#REPLACEME^X^Y^W\E
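Applied to statement 4), that could look like the sketch below. It assumes the marker appears in $l as literal caret characters (not control characters); the sample input and the replacement value are made up.

```perl
use strict;
use warnings;

# Hypothetical input containing the marker as literal text.
my $l = 'prefix ^V#REPLACEME^X^Y^W suffix';

# Placeholder replacement value, invented for this sketch.
my $somevariable = 'VALUE';

# \Q...\E backslash-escapes the regex metacharacters in between,
# so each ^ is matched as a literal caret instead of an anchor.
(my $result = $l) =~ s/\Q^V#REPLACEME^X^Y^W\E/$somevariable/;

print "$result\n";
```

Only the matched marker is replaced; the text before and after it is left alone.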
 
LVL 22

Expert Comment

by:pjedmond
ID: 16940506
Looks like I'm getting my sed regular expressions mixed up with Perl!