How to parse a file with Perl

Posted on 2008-10-06
Last Modified: 2012-08-13
I am trying to parse a message log file and extract the messages that meet certain criteria, namely certain message types. For example, I would like to pull out messages that have CA, DC, OP, etc. in the field after the ORC segment name, as in the file example below. I only want to pull out a message when it finds the two-letter code between the pipes, which is usually on the ORC line. It should match |CA| in the ORC segment and not the CA in CAMPING.

2nd part: How can I pull out a message that falls between two Ignored: lines when the line right after the first Ignored: is *** Message Ignored, Incorrect Order Type? I want everything after that first Ignored: line up to the next Ignored: line. Hopefully this isn't too confusing. My code can pull out info, but it also matches the CA in CAMPING. I am also counting each occurrence, and the count should include only the |CA| instances, so that part does not work correctly either.

*** Message Ignored, Incorrect Order Type
RXE|1^BID&0800,2000^INDEF^200809220010^^R^^11111110^|8246300^CAMPING|50||mg|TAP|^|||1|EACH||100022||||||||VEND|||||||||F|24240000^Description|||||||||||||M| ^

*** Message Ignored, Incorrect Order Type:
Next message in similar structure as above....

Any help would be really appreciated.
#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;

# Copy every line that contains CA to resultsCA.txt and count the occurrences.
open(my $in,  '<', 'rxp.ign')       or die "Can't open rxp.ign: $!";
open(my $out, '>', 'resultsCA.txt') or die "Can't open resultsCA.txt: $!";
my $cnt = 0;
while (my $line = <$in>) {
    # In list context //g returns every match, so this counts all of them.
    my $hits = () = $line =~ /\bCA\b/gi;
    next unless $hits;
    $cnt += $hits;
    print {$out} $line;
}
close $out;
close $in;
print "Number of instances of 'CA' found: $cnt\n\n";
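A small sketch of how the candidate patterns differ, using made-up sample strings; loose_match, strict_match and orc_code_is are hypothetical helper names. \b is a boundary between a word character and a non-word character, so \bCA\b accepts CA wherever it stands alone between punctuation such as ^ or spaces. Requiring the pipes on both sides, or comparing the ORC field itself, is stricter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# \bCA\b: CA with a word/non-word boundary on each side
sub loose_match  { return $_[0] =~ /\bCA\b/ ? 1 : 0 }

# \|CA\|: CA as a complete pipe-delimited field
sub strict_match { return $_[0] =~ /\|CA\|/ ? 1 : 0 }

# Field 1 of an ORC segment (the order-control code) equals $code exactly
sub orc_code_is {
    my ($line, $code) = @_;
    my @f = split /\|/, $line;
    return (@f >= 2 && $f[0] eq 'ORC' && $f[1] eq $code) ? 1 : 0;
}

# Hypothetical sample strings, shaped like the RXE/ORC segments in the question
for my $s ("8246300^CAMPING|50", "8246300^CA^X", "ORC|CA|12345|") {
    printf "%-20s loose=%d strict=%d orc=%d\n",
        $s, loose_match($s), strict_match($s), orc_code_is($s, 'CA');
}
```

The second sample is accepted by \bCA\b but rejected by the pipe-delimited test, which is the kind of false hit the stricter patterns avoid.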


Question by: simadownnow
LVL 51

Expert Comment

ID: 22657596
awk '/^Ignored:/{if(f==1){print x};f==0;}/^ORC\|(CA|DC|OP)\|/{f=1}{x=sprintf("%s\n%s",x,$0)}END{if(f==1){print x}}'  you-file

Author Comment

ID: 22658824
I've never used awk before; I'm new to Perl. Could you explain a little more what this is doing? Do I need to declare any variables in awk? Also, when you write you-file, does that mean the file I want to parse, rxp.ign? In quotes or anything? Does this also count the instances of each message type CA, DC, OP?
LVL 51

Expert Comment

ID: 22658986
> .. explain a little
when a line starts with Ignored:, print the collected lines if the flag is set, then clear the flag
set the flag when an ORC line has CA, DC or OP as its first field
collect every line that is read
after the whole file is read, print the collected lines if the flag is still set, since no further Ignored: line will trigger the print
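The same flag-and-buffer idea can be written as a Perl script file. This is a sketch, assuming every message starts with a line beginning with Ignored:; extract_blocks and extract.pl are hypothetical names. Unlike the awk version, the buffer is emptied at each Ignored: line, so each printed block holds exactly one message.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the messages whose ORC segment carries one of the codes in $codes
# (a regex alternation such as 'CA|DC|OP').
sub extract_blocks {
    my ($lines, $codes) = @_;
    my @blocks;
    my $buf;                 # undef until the first "Ignored:" line is seen
    my $flag = 0;
    for my $line (@$lines) {
        if ($line =~ /^Ignored:/) {
            push @blocks, $buf if $flag;   # previous message matched: keep it
            ($buf, $flag) = ('', 0);       # start collecting a new message
        }
        next unless defined $buf;          # skip anything before the first "Ignored:"
        $buf .= $line;
        $flag = 1 if $line =~ /^ORC\|(?:$codes)\|/;
    }
    push @blocks, $buf if $flag;           # the last message has no closing "Ignored:"
    return @blocks;
}

if (@ARGV) {                               # e.g. perl extract.pl rxp.ign
    my @lines = <>;
    print for extract_blocks(\@lines, 'CA|DC|OP');
}
```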

> Do I need to declare any variables,
no, all variables default to 0 (numeric) or '' (empty string)

> does that mean my file name that I want to parse?
yes, e.g. rxp.ign

> in quotes or anything?
depends on your shell (e.g. no quotes are needed if the filename does not contain metacharacters)

> Does this also count the instances of each type of message CA, DC, OP?
to do that, use something like:

awk '/^Ignored:/{if(f==1){print x};f=0;}/^ORC\|(CA|DC|OP)\|/{f=1}{x=sprintf("%s\n%s",x,$0);if(/\|CA\|/){c++};if(/\|DC\|/){d++};if(/\|OP\|/){o++}}END{if(f==1){print x};print "CA: ",c;print "DC: ",d;print "OP: ",o}'  rxp.ign

(note that my first post contains an error: f==0 must be f=0)

that's quick & dirty with awk; if you need more text processing, it's probably better to start with Perl right away
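Starting with Perl, the counting part could look like this sketch. It counts lines that contain each code as a whole pipe-delimited field, as the awk counters above do, but keeps the counts in one hash instead of separate variables; count_codes and count.pl are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Codes to count; adding a code only means editing this list.
my @codes = qw(CA DC OP);

# Count lines containing |CODE| for each code (a line can count for several).
sub count_codes {
    my ($lines) = @_;
    my %count = map { $_ => 0 } @codes;
    for my $line (@$lines) {
        for my $code (@codes) {
            $count{$code}++ if $line =~ /\|\Q$code\E\|/;
        }
    }
    return \%count;
}

if (@ARGV) {                                    # e.g. perl count.pl rxp.ign
    open my $in, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";
    my $count = count_codes([<$in>]);
    close $in;
    print "$_: $count->{$_}\n" for @codes;
}
```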


Author Comment

ID: 22659222
This log file is created on a Windows box, so I am using ActivePerl for Windows XP, which is what my workstation runs.
Thanks for the quick reply and explanation. I don't know how to insert this into my Perl code and execute the awk command. I've tried, but I get errors. Should there be a BEGIN statement to go with the END that is in the code?
LVL 51

Expert Comment

ID: 22659366
Same thing as Perl code (quick & dirty conversion from awk):

perl -ane 'm/^Ignored:/&&do{if($f==1){print $x};$f=0;};m/^ORC\|(CA|DC|OP)\|/&&do{$f=1};{$x.=$_;if(/\|CA\|/){$c++};if(/\|DC\|/){$d++};if(/\|OP\|/){$o++}}END{if($f==1){print $x};print "CA: ",$c;print ", DC: ",$d;print ", OP: ",$o;}' rxp.ign

Author Comment

ID: 22659987
What is the -ane on the command line? Usually I run a Perl script from the command line, i.e. perl  Have you got this to parse the example I put up? You can duplicate the message back to back to increase the message instances. I can't get this to run, and I am probably executing it wrong. Man, I am a layman with Perl; I need to pick up a book. Sorry to keep asking, this must seem mundane to you.
LVL 51

Expert Comment

ID: 22661337
> Have you got this to parse the example I put up?
simply put everything between the single quotes ' into your .pl file and execute it

> I can't get this to run ..
are you on unreliable systems like windoze? Bad luck: you have to use a file for the script or fiddle around M$'s strange handling of any kind of quotes.
Get any reliable shell and it works as posted, or use a script file. Sorry, I'm not responsible for stupid systems :)

> Man I am a lamen with PERL,
we fix that ;-)

> What is the -ane ...
man perl
man perlrun

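-a  autosplit mode (awk-like: each line is split into @F)
-n  loop over the input lines without printing them
-e  execute the commands given on the command line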

or in more detail (shamelessly stolen from perl's man pages):

  -a   turns on autosplit mode when used with a -n or -p.  An implicit split command to the @F array is done as
       the first thing inside the implicit while loop produced by the -n or -p.

  -e commandline
       may be used to enter one line of program.  If -e is given, Perl will not look for a filename in the
       argument list.  Multiple -e commands may be given to build up a multi-line script.  Make sure to use
       semicolons where you would in a normal program.

  -n   causes Perl to assume the following loop around your program, which makes it iterate over filename
       arguments somewhat like sed -n or awk:

           while (<>) {
               ...             # your program goes here
           }

       Note that the lines are not printed by default.  See -p to have lines printed.  If a file named by an
       argument cannot be opened for some reason, Perl warns you about it and moves on to the next file.

       Here is an efficient way to delete all files older than a week:

           find . -mtime +7 -print | perl -nle unlink

       This is faster than using the -exec switch of find because you don't have to start a process on every
       filename found.  It does suffer from the bug of mishandling newlines in pathnames, which you can fix if
       you follow the example under -0.

       "BEGIN" and "END" blocks may be used to capture control before or after the implicit program loop, just
       as in awk.
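Since -n only wraps the code in a while (<>) { ... } loop and END runs after that loop, the one-liner can also be saved as an ordinary script, which sidesteps shell quoting on Windows. A sketch, assuming a hypothetical script name such as yourscript.pl run as perl yourscript.pl rxp.ign; run_filter is a made-up name, and the file is read through an explicit handle so the sub can be exercised without command-line arguments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The perl -ane one-liner as a script. The while loop is what -n adds; the
# code after the loop plays the role of the END block. -a is not needed
# because @F is never used.
sub run_filter {
    my ($in, $out) = @_;              # input and output filehandles
    my ($c, $d, $o, $f) = (0, 0, 0, 0);
    my $x = '';
    while (<$in>) {                   # $_ holds the current line
        if (/^Ignored:/) {
            print {$out} $x if $f;
            $f = 0;                   # like the one-liner, $x is not emptied here;
                                      # add $x = '' to print each message only once
        }
        $f = 1 if /^ORC\|(CA|DC|OP)\|/;
        $x .= $_;
        $c++ if /\|CA\|/;
        $d++ if /\|DC\|/;
        $o++ if /\|OP\|/;
    }
    print {$out} $x if $f;
    print {$out} "CA: $c, DC: $d, OP: $o\n";
    return ($c, $d, $o);
}

if (@ARGV) {                          # e.g. perl yourscript.pl rxp.ign
    open my $in, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";
    run_filter($in, \*STDOUT);
    close $in;
}
```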

Author Comment

ID: 22689892
I am still confused about how to get this to run. I am using Windows XP and that is what I need to run the script on. Unfortunately I cannot use a more reliable shell. I want to run the command against a file; I don't want to have to copy and paste data between quotes. So I would like to run a command line such as perl ( and have the script execute the commands within it, which will then open the file with the data and parse it. This is how I usually get commands to run.

So am I supposed to put  'm/^Ignored:/&&do{if($f==1){print $x};$f=0;};m/^ORC\|(CA|DC|OP)\|/&&do{$f=1};{$x.=$_;if(/\|CA\|/){$c++};if(/\|DC\|/){$d++};if(/\|OP\|/){$o++}}END{if($f==1){print $x};print "CA: ",$c;print ", DC: ",$d;print ", OP: ",$o;}' rxp.ign  in the  so that it can run?

LVL 51

Expert Comment

ID: 22691821
> .. would like to run a cmd line  such as perl (  ..
simply write the code between the single quotes in your file ( and run it like
  perl rxp.ign

Author Comment

ID: 22692380
I did and I got this

C:\Logs>perl rxp.ign
Useless use of a constant in void context at line 6 (#1)
    (W void) You did something without a side effect in a context that does
    nothing with the return value, such as a statement that doesn't return a
    value from a block, or the left side of a scalar comma operator.  Very
    often this points not to stupidity on your part, but a failure of Perl
    to parse your program the way you thought it would.  For example, you'd
    get this if you mixed up your C precedence with Python precedence and

        $one, $two = 1, 2;

    when you meant to say

        ($one, $two) = (1, 2);

    Another common error is to use ordinary parentheses to construct a list
    reference when you should be using square or curly brackets, for
    example, if you say

        $array = (1,2);

    when you should have said

        $array = [1,2];

    The square brackets explicitly turn a list value into a scalar value,
    while parentheses do not.  So when a parenthesized list is evaluated in
    a scalar context, the comma is treated like C's comma operator, which
    throws away the left argument, which is not what you want.  See
    perlref for more on this.

    This warning will not be issued for numerical constants equal to 0 or 1
    since they are often used in statements like

        1 while sub_with_side_effects();

    String constants that would normally evaluate to 0 or 1 are warned
LVL 84

Assisted Solution

ozo earned 20 total points
ID: 22692761
In a DOS shell (cmd.exe) command line, you have to swap the quotes:
perl -ane  "m/^Ignored:/&&do{if($f==1){print $x};$f=0;};m/^ORC\|(CA|DC|OP)\|/&&do{$f=1};{$x.=$_;if(/\|CA\|/){$c++};if(/\|DC\|/){$d++};if(/\|OP\|/){$o++}}END{if($f==1){print $x};print 'CA: ',$c;print ', DC: ',$d;print ', OP: ',$o;}" rxp.ign
LVL 51

Expert Comment

ID: 22698189
> I did and I got this
Use the following as the first line of your script:

my ($c, $d, $o, $f) = (0, 0, 0, 0); my $x = '';

Author Comment

ID: 22710967
Okay, I ran the exact command that ozo posted and added the declarations that ahoffmann put up. It works for counting the instances, but I want to extract each of those types of messages once the locator field is found, as I mentioned in the question, and put them in a file for each type, so CA.txt, DC.txt. If a CA occurs, I want the entire message from the end of the first Ignored: to the beginning of the next Ignored:. I don't want it to pull out CA if it finds it inside a word in the message, like "CALL", which I believe this is doing, just like the script I wrote and pasted. Hopefully we can make the searching and matching a little stricter. Almost there...
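One way to get one file per message type is to split the log into messages first and then look at the ORC segment of each message. This is a sketch under two assumptions: a message runs from a line starting with Ignored: up to the next such line, and its type is field 1 of its ORC segment (ORC|CA|...). Comparing the whole field means CA inside CALL or CAMPING never counts. group_messages, write_groups and split_orders.pl are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Message types to extract.
my %wanted = map { $_ => 1 } qw(CA DC OP);

# Split the lines into messages and group the wanted ones by ORC code.
sub group_messages {
    my ($lines) = @_;
    my (%by_code, @msg);
    my $flush = sub {
        return unless @msg;
        my ($orc) = grep { /^ORC\|/ } @msg;       # first ORC segment, if any
        if (defined $orc) {
            my @field = split /\|/, $orc;
            my $code  = defined $field[1] ? $field[1] : '';
            push @{ $by_code{$code} }, join('', @msg) if $wanted{$code};
        }
        @msg = ();
    };
    for my $line (@$lines) {
        $flush->() if $line =~ /^Ignored:/;     # a new message starts here
        push @msg, $line;
    }
    $flush->();                                 # the last message
    return \%by_code;
}

# Write each group to its own file (CA.txt, DC.txt, ...), replacing old
# contents, and report how many messages went into each.
sub write_groups {
    my ($by_code) = @_;
    for my $code (sort keys %$by_code) {
        open my $fh, '>', "$code.txt" or die "Can't open $code.txt: $!";
        print {$fh} @{ $by_code->{$code} };
        close $fh;
        printf "%s: %d message(s)\n", $code, scalar @{ $by_code->{$code} };
    }
}

if (@ARGV) {                                    # e.g. perl split_orders.pl rxp.ign
    open my $in, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";
    my @lines = <$in>;
    close $in;
    write_groups(group_messages(\@lines));
}
```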

Author Comment

ID: 22713699
What if I wanted what was just on the second line for each message and possibly count those i.e
*** Message Ignored, Incorrect Order Type:
*** Message Ignored, Multi-component order not supported.
*** Message Ignored, Incorrect Order Type: OP
*** Message Ignored, Incorrect Order Type: NW
*** Message Ignored, Incorrect Order Type: DC

# of OP messages = $OP

So would it be     if(/\|***Message Ignored,Incorrect Order Type: OP\|/){$OP++};
I think the *** creates issues because * acts as a wildcard; I'm not sure how to use it as a search character within Perl.

I would like it to capture anything that appears after the *** on the line after Ignored: and before the next MSH line. If these could also be totalled, that would be great. This may help me adapt the script to what I need in other cases.
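Capturing and totalling everything after the *** can be done with one capture group and a hash keyed by the captured text. A sketch, assuming the reason lines are the only lines that start with ***; tally_reasons and reasons.pl are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tally the text after "***" on each reason line, e.g.
#   *** Message Ignored, Incorrect Order Type: OP
sub tally_reasons {
    my ($lines) = @_;
    my %count;
    for my $line (@$lines) {
        # \* is a literal asterisk; the text after it (minus surrounding blanks) is captured
        next unless $line =~ /^\*\*\*\s*(.*?)\s*$/;
        $count{$1}++;
    }
    return \%count;
}

if (@ARGV) {                                    # e.g. perl reasons.pl rxp.ign
    open my $in, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";
    my $count = tally_reasons([<$in>]);
    close $in;
    printf "%5d  %s\n", $count->{$_}, $_ for sort keys %$count;
}
```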
LVL 51

Expert Comment

ID: 22715073
if(/\*\*\* Message Ignored, Incorrect Order Type: OP/){$OP++};
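Instead of escaping each asterisk, Perl's \Q...\E (equivalent to quotemeta) quotes every regex metacharacter between the two markers, so the reason text can be written as-is. A sketch; is_op_reason is a hypothetical name, and the sample lines are the reason lines quoted earlier in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# \Q...\E escapes the asterisks (and any other metacharacters) in $reason.
my $reason = '*** Message Ignored, Incorrect Order Type: OP';

sub is_op_reason {
    my ($line) = @_;
    # ^ and \s*$ make the whole line match, so ": OPX" would not count
    return $line =~ /^\Q$reason\E\s*$/ ? 1 : 0;
}

my @sample = (
    "*** Message Ignored, Incorrect Order Type:\n",
    "*** Message Ignored, Multi-component order not supported.\n",
    "*** Message Ignored, Incorrect Order Type: OP\n",
    "*** Message Ignored, Incorrect Order Type: NW\n",
    "*** Message Ignored, Incorrect Order Type: DC\n",
);
my $OP = grep { is_op_reason($_) } @sample;     # grep in scalar context counts
print "# of OP messages = $OP\n";
```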

Author Comment

ID: 22716198
Okay, I am going to work this in to see if it helps. So let's say this line matches, the if statement is true, and it increments $OP by one. Can I put the message that follows into an OP file and append all other OP messages whenever that statement matches, i.e. from the MSH to the end of that message or the next Ignored:?
LVL 51

Expert Comment

ID: 22718445
> .. can I put the message that follows into a OP file ..
assuming you opened that file at the beginning, something like:

 open(OP, ">>path/to/OP-file")||die"ERROR: cannot open OP-file: $!";

then you can write to that file like:
   print OP $_; # assuming that $_ contains your current line

Author Comment

ID: 22720832
But how does it know where to start the message and end the message for output into the OP file after it matches the current line I am searching on?
LVL 51

Accepted Solution

ahoffmann earned 230 total points
ID: 22726002
> .. how does it know where to start the message and end the message for output
it prints one line (the one that contains the match)
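Printing $_ writes only the line that matched. To write the whole message, the lines have to be collected first and written out once the next Ignored: line shows that the message is complete. A sketch, assuming each message starts at a line beginning with Ignored: and its type is the two-letter code at the end of the *** Message Ignored, Incorrect Order Type: XX line; messages_by_reason and by_reason.pl are made-up names, and the per-type files are opened in append mode like the OP-file example above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect complete messages and group them by the order type named on
# their "*** Message Ignored, Incorrect Order Type: XX" line.
sub messages_by_reason {
    my ($lines) = @_;
    my (%out, @msg, $type);
    my $finish = sub {
        push @{ $out{$type} }, join('', @msg) if defined $type && @msg;
        @msg  = ();
        $type = undef;
    };
    for my $line (@$lines) {
        $finish->() if $line =~ /^Ignored:/;    # previous message ends here
        push @msg, $line;
        $type = $1 if $line =~ /^\*\*\* Message Ignored, Incorrect Order Type: (\w\w)\b/;
    }
    $finish->();                                # the last message
    return \%out;
}

if (@ARGV) {                                    # e.g. perl by_reason.pl rxp.ign
    open my $in, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";
    my $out = messages_by_reason([<$in>]);
    close $in;
    for my $type (sort keys %$out) {
        open my $fh, '>>', "$type.txt" or die "Can't open $type.txt: $!";
        print {$fh} @{ $out->{$type} };
        close $fh;
    }
}
```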

Author Comment

ID: 22758465
thanks for the help!
