Solved

Understanding regular expressions in Perl

Posted on 2006-06-19
Last Modified: 2011-10-03
Hi,

I have come across this code in Perl (written by someone else who is no longer around) which has a bunch of statements like:


1) $l =~ /(.*)^V\#REPLACEME^X^Y^W(.*)/;

2) if ($1 =~ /^V/) {
       # do some stuff
   }

3) if ($2 =~ /^V/) {
       # do more stuff
   }

4) $l =~ s/^V\#REPLACEME^X^Y^W/$somevariable/;


Could you help me understand what he's trying to do here?

Thanks

Question by:Vlearns
13 Comments
 
LVL 19

Expert Comment

by:Kim Ryan
Basically, a regex identifies patterns and optionally substitutes them with other values. Rather than decoding these examples in detail (they are not written in the best style anyhow), I think you could gain a good understanding by studying this tutorial: http://search.cpan.org/dist/perl/pod/perlre.pod
 
LVL 84

Assisted Solution

by:ozo
ozo earned 100 total points
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new($_)->explain for qr/(.*)^V\#REPLACEME^X^Y^W(.*)/, qr/^V/, qr/^V\#REPLACEME^X^Y^W/'
The regular expression:

(?-imsx:(.*)^V\#REPLACEME^X^Y^W(.*))

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  V                        'V'
----------------------------------------------------------------------
  \#                       '#'
----------------------------------------------------------------------
  REPLACEME                'REPLACEME'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  X                        'X'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  Y                        'Y'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  W                        'W'
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
The regular expression:

(?-imsx:^V)

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  V                        'V'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
The regular expression:

(?-imsx:^V\#REPLACEME^X^Y^W)

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  V                        'V'
----------------------------------------------------------------------
  \#                       '#'
----------------------------------------------------------------------
  REPLACEME                'REPLACEME'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  X                        'X'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  Y                        'Y'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  W                        'W'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
 
LVL 22

Accepted Solution

by:
pjedmond earned 400 total points
1) $l =~ /(.*)^V\#REPLACEME^X^Y^W(.*)/;

Here the match is for . (any character), followed by * (zero or more times), followed by the string "^V#REPLACEME^X^Y^W", followed by any character any number of times.
Effectively, any string containing "^V#REPLACEME^X^Y^W" is matched. ^ only means the beginning of a string at the beginning of a regex!


2) if ($1 =~ /^V/) {
       # do some stuff
   }

If $1 starts (^) with a "V", then do some stuff.

3) if ($2 =~ /^V/) {
       # do more stuff
   }

4) $l =~ s/^V\#REPLACEME^X^Y^W/$somevariable/;

If the string starts with "V#REPLACEME^X^Y^W", then replace that matched text with $somevariable. The s indicates that a substitution is required.
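A minimal sketch of the capture-then-substitute flow in statements 1) to 4). To keep the caret question out of it, this assumes a simplified marker, just "#REPLACEME"; the input line and the replacement value are made up for illustration and do not come from the original program.

```perl
use strict;
use warnings;

# Hypothetical input line and replacement value (not from the original code).
my $l            = 'Vfoo #REPLACEME bar';
my $somevariable = 'new text';

# Capture everything before and after the simplified marker.
if ($l =~ /(.*)\#REPLACEME(.*)/) {
    # Copy the captures first: a later successful match overwrites $1 and $2.
    my ($before, $after) = ($1, $2);

    print "before starts with V\n" if $before =~ /^V/;
    print "after starts with V\n"  if $after  =~ /^V/;
}

# s/// replaces only the text that matched, not the whole string.
$l =~ s/\#REPLACEME/$somevariable/;
print "$l\n";
```

Copying $1 and $2 into named variables before testing them avoids surprises, since any later successful match resets the capture variables.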

Worth having a look here, and adding these to your bookmarks:

http://www.itlab.musc.edu/docs/perl_regexp/
http://www.perl.com/doc/manual/html/pod/perlre.html
http://www.anaesthetist.com/mnm/perl/regex.htm

HTH:)
 
LVL 22

Expert Comment

by:pjedmond
The =~ is a 'matches a regex' directive.

...and obviously 3 is the same as 2!

HTH:)
 
LVL 84

Expert Comment

by:ozo
Since matching the beginning of the string several separate times within the string makes no sense, was $* set anywhere in the program?
 
LVL 22

Expert Comment

by:pjedmond
ozo - No-one said that this programmer that is no longer around had 'sense';)
 
LVL 84

Expert Comment

by:ozo
Even if $* is set to non-zero, XY has no beginning of a line between them, so it still makes no sense.
Perhaps the programmer thought ^Y meant \cY.
What is expected to be in $l when 1) is executed?
=~ is a binding operator; the right argument can be an m//, s/// or tr///,
or an expression, which is interpreted as a search pattern.
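A rough sketch of both points, under the assumption (only a guess, as said above) that the marker was meant to contain real control characters, which many editors display as ^V, ^X, ^Y, ^W. The sample data is invented. It also shows =~ binding each of m//, s/// and tr///.

```perl
use strict;
use warnings;

# Hypothetical data: the marker written with Ctrl-V, Ctrl-X, Ctrl-Y, Ctrl-W.
my $l = "left\cV#REPLACEME\cX\cY\cWright";

# m// on the right of =~ : match, returning both captures in list context.
my ($before, $after) = $l =~ /(.*)\cV\#REPLACEME\cX\cY\cW(.*)/;

# s/// on the right of =~ : substitute, here on a copy of $l.
(my $replaced = $l) =~ s/\cV\#REPLACEME\cX\cY\cW/-/;

# tr/// on the right of =~ : with an empty replacement list it only counts.
my $uppercase = ($l =~ tr/A-Z//);

print "$before|$after|$replaced|$uppercase\n";
```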
 
LVL 22

Expert Comment

by:pjedmond
The ^ only means the beginning of a line at the beginning of the regex. Elsewhere it means a literal ^.
Although it is a bizarre string to be checking for, it is valid as I understand it.
 
LVL 22

Expert Comment

by:pjedmond
The Regex::Explain module that you used has its transistors in a twist! Would you trust a machine to get it right all the time? As they say:

To err is human...
but to foul things up completely requires a computer.

Remember computers are still programmed by humans;)
 
LVL 84

Expert Comment

by:ozo
> The ^ only means the beginning of a line at the beginning of the regex. Elsewhere it means a literal ^
that is incorrect
try
print "X^Y^Z" if "X^Y^Z" =~ /X^Y^Z/;

print "X\nY\nZ" if "X\nY\nZ" =~ /X\n^Y\n^Z/m;

print "XYZ:$1" if "XYZ" =~ /(?!^Z)((?<=^X|^Y).)/;
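Applied to the pattern from the question, the same check can be sketched like this. The test string is hypothetical and assumes the marker appears in the data with literal caret characters.

```perl
use strict;
use warnings;

# Hypothetical test string containing literal caret characters.
my $text = 'foo ^V#REPLACEME^X^Y^W bar';

# Unescaped: every ^ is a start-of-string anchor, and nothing that follows
# "REPLACEME" can still be at the start, so this pattern cannot match.
my $unescaped = $text =~ /(.*)^V\#REPLACEME^X^Y^W(.*)/ ? 'match' : 'no match';

# Escaped: \^ is a literal caret, so the marker is found.
my $escaped = $text =~ /(.*)\^V\#REPLACEME\^X\^Y\^W(.*)/ ? 'match' : 'no match';

print "unescaped: $unescaped\n";
print "escaped:   $escaped\n";
```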

 

Author Comment

by:Vlearns
pjedmond was correct about the intent of that crazy expression.
ozo showed a new method of understanding it using that Perl library.

Now I know why Perl is a "write only" language :)

Thanks experts, you guys are the best!
 
LVL 84

Expert Comment

by:ozo
To do a literal match for '^V#REPLACEME^X^Y^W', you could use
\Q^V#REPLACEME^X^Y^W\E
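A short sketch of that in a substitution, assuming the marker is held in a variable. $marker and the sample line are invented for illustration; $l and $somevariable follow the names in the question.

```perl
use strict;
use warnings;

# Hypothetical marker, input and replacement, for illustration only.
my $marker       = '^V#REPLACEME^X^Y^W';   # single quotes: nothing interpolated
my $somevariable = 'new text';
my $l            = "foo ${marker} bar";

# \Q...\E backslash-escapes the metacharacters in the interpolated value,
# so ^ and # are matched literally.
(my $replaced = $l) =~ s/\Q$marker\E/$somevariable/;

# quotemeta() returns the escaped form that \Q applies.
my $quoted = quotemeta($marker);

print "$quoted\n";
print "$replaced\n";
```

Keeping the marker in a single-quoted variable means the literal text is written once and reused for both the match and any later substitution.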
 
LVL 22

Expert Comment

by:pjedmond
Looks like I'm getting my sed regex expressions mixed up with Perl!