Solved

Understanding regular expressions in Perl

Posted on 2006-06-19
Last Modified: 2011-10-03
Hi,

I have come across this code in Perl (written by someone who is no longer around), which has a bunch of statements like:


1) $l =~ /(.*)^V\#REPLACEME^X^Y^W(.*)/;

2) if ($1 =~ /^V/) {
       do some stuff
   }

3) if ($2 =~ /^V/) {
       do more stuff
   }

4) $l =~ s/^V\#REPLACEME^X^Y^W/$somevariable/;


Could you help me understand what he's trying to do here?

thanks

Question by:Vlearns
13 Comments
 
LVL 19

Expert Comment

by:Kim Ryan
ID: 16938747
Basically regex is identifying patterns and optionally substituting them with other values.  Rather than a detailed decoding of these examples (which are not written in the best style anyhow), I think you could gain a good understanding by studying this tutorial  http://search.cpan.org/dist/perl/pod/perlre.pod
 
LVL 84

Assisted Solution

by:ozo
ozo earned 100 total points
ID: 16939040
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new($_)->explain for qr/(.*)^V\#REPLACEME^X^Y^W(.*)/, qr/^V/, qr/^V\#REPLACEME^X^Y^W/'
The regular expression:

(?-imsx:(.*)^V\#REPLACEME^X^Y^W(.*))

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  V                        'V'
----------------------------------------------------------------------
  \#                       '#'
----------------------------------------------------------------------
  REPLACEME                'REPLACEME'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  X                        'X'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  Y                        'Y'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  W                        'W'
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
The regular expression:

(?-imsx:^V)

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  V                        'V'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
The regular expression:

(?-imsx:^V\#REPLACEME^X^Y^W)

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  V                        'V'
----------------------------------------------------------------------
  \#                       '#'
----------------------------------------------------------------------
  REPLACEME                'REPLACEME'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  X                        'X'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  Y                        'Y'
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  W                        'W'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
 
LVL 22

Accepted Solution

by:
pjedmond earned 400 total points
ID: 16939119
1) $l =~ /(.*)^V\#REPLACEME^X^Y^W(.*)/;
 
In this case, the match is for . (any character), followed by * (zero or more times), followed by the string "^V#REPLACEME^X^Y^W", followed by any character any number of times.
Effectively, any string containing "^V#REPLACEME^X^Y^W" is matched. ^ only means the beginning of a string at the beginning of a regex!


 2)if ($1 =~ /^V/) {
                         do some stuff
                 }
If $1 starts (^)with a "V" then do some stuff.

3)if ($2 =~ /^V/) {
                        do more stuff
                }


 4)$l =~ s/^V\#REPLACEME^X^Y^W/$somevariable/;

If the string starts with "V#REPLACEME^X^Y^W", then replace that matched text with the value of $somevariable. The s indicates that a substitution is required.
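
A rough sketch of how steps 1 to 4 fit together, assuming the carets inside the marker are meant as literal characters. That is only a guess at the original author's intent, so each caret is written as \^ here; a bare ^ in a Perl pattern is an anchor. The sample contents of $l and the value of $somevariable are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input; the thread never shows what $l really contains.
my $l = 'Vfoo ^V#REPLACEME^X^Y^W Vbar';
my $somevariable = 'NEWTEXT';   # hypothetical replacement value

my ($before_starts_with_v, $after_starts_with_v) = (0, 0);

# Step 1: capture the text before and after the marker (carets escaped).
if ($l =~ /(.*)\^V\#REPLACEME\^X\^Y\^W(.*)/) {
    # Copy the captures first: a later successful match resets $1 and $2.
    my ($before, $after) = ($1, $2);

    # Steps 2 and 3: a leading ^ anchors the test to the start of the string.
    $before_starts_with_v = 1 if $before =~ /^V/;
    $after_starts_with_v  = 1 if $after  =~ /^V/;
}

# Step 4: s/// replaces only the matched marker, not the whole string.
$l =~ s/\^V\#REPLACEME\^X\^Y\^W/$somevariable/;

print "before starts with V: $before_starts_with_v\n";
print "after starts with V:  $after_starts_with_v\n";
print "$l\n";
```

Copying $1 and $2 into lexical variables before running the /^V/ tests keeps the captures from step 1 intact while the later matches run.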

Worth having a look here, and adding these to your bookmarks!:

http://www.itlab.musc.edu/docs/perl_regexp/
http://www.perl.com/doc/manual/html/pod/perlre.html
http://www.anaesthetist.com/mnm/perl/regex.htm

HTH:)
 
LVL 22

Expert Comment

by:pjedmond
ID: 16939123
The =~ is a 'matches a regex' directive.

...and obviously 3 is the same as 2!

HTH:)
 
LVL 84

Expert Comment

by:ozo
ID: 16939239
Since matching the beginning of the string several separate times within the string makes no sense, was $* set anywhere in the program?
 
LVL 22

Expert Comment

by:pjedmond
ID: 16939269
ozo - No-one said that this programmer that is no longer around had 'sense';)
 
LVL 84

Expert Comment

by:ozo
ID: 16939339
Even if $* is set to non-zero, X and Y have no beginning of a line between them, so it still makes no sense.
Perhaps the programmer thought ^Y meant \cY.
What is expected to be in $l when 1) is executed?
=~ is a binding operator; the right argument can be an m//, s/// or tr///,
or an expression, which is interpreted as a search pattern.
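
A small sketch of the forms the right-hand side of =~ can take, plus the \cY escape mentioned above. The sample text and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = 'hello world';   # hypothetical sample data

# m// : test for a match.
my $has_world = ($text =~ m/world/) ? 1 : 0;

# s/// : substitute, here applied to a copy so $text stays unchanged.
(my $swapped = $text) =~ s/world/perl/;

# tr/// : transliterate character by character, again on a copy.
(my $upper = $text) =~ tr/a-z/A-Z/;

# A plain expression on the right is interpreted as a search pattern.
my $pattern    = 'wor';
my $expr_match = ($text =~ $pattern) ? 1 : 0;

# \cY is the single Control-Y character (code 25), not a caret followed by Y.
my $ctrl_y_code  = ord("\cY");
my $ctrl_y_match = ("a\cYb" =~ /a\cYb/) ? 1 : 0;

print "$has_world $swapped $upper $expr_match $ctrl_y_code $ctrl_y_match\n";
```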
 
LVL 22

Expert Comment

by:pjedmond
ID: 16939351
The ^ only means the beginning of a line at the beginning of the regex. Elsewhere it means a literal ^.
Although it is a bizarre string to be checking for, it is valid, as I understand it.
 
LVL 22

Expert Comment

by:pjedmond
ID: 16939361
The Regex::Explain module that you used has its transistors in a twist! Would you trust a machine to get it right all the time? As they say:

To err is human...
but to foul things up completely requires a computer.

Remember computers are still programmed by humans;)
 
LVL 84

Expert Comment

by:ozo
ID: 16939392
> The ^ only means the beginning of a line at the beginning of the regex. Elsewhere it means a literal ^
that is incorrect
try
print "X^Y^Z" if "X^Y^Z" =~ /X^Y^Z/;

print "X\nY\nZ" if "X\nY\nZ" =~ /X\n^Y\n^Z/m;

print "XYZ:$1" if "XYZ" =~ /(?!^Z)((?<=^X|^Y).)/;
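
The one-liners above can be restated as a short script that records each outcome. This is only a sketch: the variable names are made up, and the escaped \^ case and the case without /m are added here for contrast.

```perl
use strict;
use warnings;

# A bare ^ in mid-pattern is still an anchor, so this cannot match.
my $bare_caret  = ('X^Y^Z' =~ /X^Y^Z/) ? 1 : 0;

# Escaped as \^ it is a literal caret, and the same string matches.
my $escaped     = ('X^Y^Z' =~ /X\^Y\^Z/) ? 1 : 0;

# Under /m, ^ also matches just after a newline.
my $multiline   = ("X\nY\nZ" =~ /X\n^Y\n^Z/m) ? 1 : 0;

# Without /m, ^ matches only at the very start of the string.
my $single_line = ("X\nY\nZ" =~ /X\n^Y\n^Z/) ? 1 : 0;

# ^ inside a lookbehind: match the character that follows an X
# (or Y) sitting at the start of the string.
my ($after_first) = 'XYZ' =~ /(?!^Z)((?<=^X|^Y).)/;

print "bare=$bare_caret escaped=$escaped m=$multiline no_m=$single_line\n";
print "captured=$after_first\n";
```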

 

Author Comment

by:Vlearns
ID: 16940395
pjedmond  was correct about the intent of that crazy expression
ozo showed a new method of understanding  using that perl library

now i know why perl is a "write only" language :)

thanks experts..u guys are the best!
 
LVL 84

Expert Comment

by:ozo
ID: 16940410
To do a literal match for '^V#REPLACEME^X^Y^W', you could use
\Q^V#REPLACEME^X^Y^W\E
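
A minimal sketch of what \Q...\E does, using the built-in quotemeta function to show the escaped form. The sample line and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $marker = '^V#REPLACEME^X^Y^W';   # the literal text to look for

# quotemeta backslash-escapes every character outside [A-Za-z0-9_].
my $escaped = quotemeta $marker;

# Hypothetical sample line, made up for illustration.
my $line = 'keep ^V#REPLACEME^X^Y^W this';

# \Q...\E applies the same escaping to an interpolated variable.
my $literal_hit  = ($line =~ /\Q$marker\E/) ? 1 : 0;

# Interpolated without \Q, each ^ is an anchor again, so the match fails.
my $unquoted_hit = ($line =~ /$marker/) ? 1 : 0;

print "$escaped\n";
print "literal=$literal_hit unquoted=$unquoted_hit\n";
```

Keeping the marker in one variable and quoting it with \Q at each use avoids repeating the hand-escaped pattern in both the match and the substitution.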
 
LVL 22

Expert Comment

by:pjedmond
ID: 16940506
Looks like I'm getting my sed regexes mixed up with Perl!