simadownnow
How to parse file with Perl

Hi,
  I am trying to parse a message file and extract the messages that meet certain criteria.  I am looking for certain types of messages that are logged in the file.  For example, I would like to pull out messages that have CA, DC, OP, etc. from the sample below.  The code comes right after the ORC segment name.  I only want to pull out a message when the two-letter code is found between the pipes, usually on the ORC line.  I want it to match the |CA| in ORC and not consider the CA in CAMPING a match.

2nd Part:  How can I pull out a message that falls between two Ignored: lines if the line after Ignored: is *** Message Ignored, Incorrect Order Type?  I want everything that follows the first Ignored: until the next Ignored: appears.  Hopefully this isn't too confusing and describes exactly what I want.  My code can pull out info, but it also matches on the CA in CAMPING.  I am also counting each time the code appears, and I only want the count to include the |CA| instances, so that does not work correctly either.

Ignored:
*** Message Ignored, Incorrect Order Type
MSH|^~\&|RP|GJH|ALL|ASD|20080922001707||RDE^O01|20080922001707008697|P|2.3|||||||||
PID|||1234567^^^^^||Person||number|L||||||||||||||||||||||||||||||
PV1||H|B12^B1266^A^K||||100022^James^James,|||||||||||I|947258034|||||||||||||||||||||||||||||||||
ORC|CA|In63708710|947258034-24-1|20315933|||1^BID&0800,2000^INDEF^200809220010^^R^^11111110^||200809220017|1965435^FFR^SHN|1965435^FFR^SHN|100022^James^James,|||||||196u86735^FFR^SHN
RXO|5132^Product^SEQNO||||||||||||||||
RXE|1^BID&0800,2000^INDEF^200809220010^^R^^11111110^|8246300^CAMPING|50||mg|TAP|^|||1|EACH||100022||||||||VEND|||||||||F|24240000^Description|||||||||||||M| ^
RXG|1||1^20080922001000
RXG|2||1^20080922080000

Ignored:
*** Message Ignored, Incorrect Order Type:
MSH|^~\&|RX|GJH|ALL_..........
Next message in similar structure as above....


Any help would be really appreciated.
#! /usr/bin/perl
use warnings;
use strict;
use diagnostics;
 
open(INFILE,  "rxp.ign")   or die "Can't open rxp.ign: $!";
open(OUTFILE, ">resultsCA.txt") or die "Can't open resultsCA.txt: $!";
 
while (<INFILE>) {
 
     if( /\bCA\b/ig ) {
         print OUTFILE $_;
     }
 }
 
close OUTFILE;
close INFILE;
 
my $file = "rxp2.ign";
my $cnt  = 0;
 
open(HNDL, $file) || die "Can't open $file: $!";
while (my $line = <HNDL>) {
    # count every stand-alone CA on the line
    while ($line =~ /\bCA\b/ig) {
        ++$cnt;
    }
}
close HNDL;
print "Number of instances of 'CA' found: $cnt\n\n";


ahoffmann

awk '/^Ignored:/{if(f==1){print x};f==0;}/^ORC\|(CA|DC|OP)\|/{f=1}{x=sprintf("%s\n%s",x,$0)}END{if(f==1){print x}}'  you-file
simadownnow

ASKER

I've never used the awk command before because I'm new to Perl.  Could you explain a little better what this is doing?  Do I need to declare any variables, awk?  Also when you write you-file, does that mean my file name that I want to parse?  rxp.ign?  in quotes or anything?  Does this also count the instances of each type of message CA, DC, OP?  
> .. explain a little
collect every line into a buffer
set a flag if an ORC line carrying |CA|, |DC| or |OP| is found
print the collected lines if the flag is set when the string Ignored: is found at the beginning of a line (and again after reading the file, as there is no more such line but probably a collected message)

> Do I need to declare any variables,
no, as all variables are 0 (integer) or '' (empty string) by default

> does that mean my file name that I want to parse?
yes

> in quotes or anything?
depends on your shell (e.g. without quotes if the filename does not contain meta characters)

> Does this also count the instances of each type of message CA, DC, OP?
no
to do that, use something like:

awk '/^Ignored:/{if(f==1){print x};f=0;}/^ORC\|(CA|DC|OP)\|/{f=1}{x=sprintf("%s\n%s",x,$0);if(/\|CA\|/){c++};if(/\|DC\|/){d++};if(/\|OP\|/){o++}}END{if(f==1){print x};print "CA: ",c;print "DC: ",d;print "OP: ",o}'  rxp.ign

(note that my first post contains an error: f==0 must be f=0)

---
that's quick&dirty with awk, if you need more text processing it's probably better to start with perl right away
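
As a starting point in Perl, the sketch below shows one way to test only the field right after ORC, so that the CA inside CAMPING never counts. The helper name orc_code and the shortened sample lines are made up for illustration; they are not from the posts above.

```perl
use strict;
use warnings;

# Return the order code (e.g. "CA") if $line is an ORC segment whose
# first field is one of the wanted codes; otherwise return undef.
# orc_code is a hypothetical helper name.
sub orc_code {
    my ($line, @wanted) = @_;
    my @fields = split /\|/, $line;     # fields are pipe-separated
    return unless @fields > 1 && $fields[0] eq 'ORC';
    my %want = map { $_ => 1 } @wanted;
    return $want{ $fields[1] } ? $fields[1] : undef;
}

# Shortened sample lines modelled on the message in the question.
my @sample = (
    'ORC|CA|In63708710|947258034-24-1|20315933',
    'RXE|1^BID|8246300^CAMPING|50||mg',
);
for my $line (@sample) {
    my $code = orc_code($line, qw(CA DC OP));
    print defined $code ? "match $code\n" : "no match\n";
}
```

Because the comparison is made against a whole field rather than a substring, CA, CALL and CAMPING are all distinct values.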
This log file is created on a windows box so I am using active perl for winXP which is what my workstation is.  
Thanks for the quick reply and explanation.  I don't know how to insert this in my perl code and execute the awk command.  I've tried but I get errors.  Should there be a BEGIN statement to go with the END that is in the code?  
same as perl code (quick&dirty converted from awk)

perl -ane 'm/^Ignored:/&&do{if($f==1){print $x};$f=0;};m/^ORC\|(CA|DC|OP)\|/&&do{$f=1};{$x.=$_;if(/\|CA\|/){$c++};if(/\|DC\|/){$d++};if(/\|OP\|/){$o++}}END{if($f==1){print $x};print "CA: ",$c;print ", DC: ",$d;print ", OP: ",$o;}' rxp.ign
Hoffman,
  What is the -ane in the command line?  Usually when I run a perl script I run it from the command line i.e perl whatever.pl.  Have you got this to parse the example I put up?  You can duplicate the message back to back to increase the message instances.  I can't get this to run, and I am probably executing it wrong.  Man I am a lamen with PERL, I need to pick up a book.  Sorry to keep asking, this must seem mundane to you..  
> Have you got this to parse the example I put up?
simply put everything between the single quotes ' into your .pl file and execute it

> I can't get this to run ..
are you on unreliable systems like windoze? bad luck, you have to use a file for the script or fiddle around M$'s strange handling of any kind of quotes.
Get any reliable shell and it works as posted. or use a script file. Sorry, I'm not responsible for stupid systems :)

> Man I am a lamen with PERL,
we fix that ;-)

> What is the -ane ...
man perl
man perlrun

-a  awk mode
-n  no print
-e  execute these commands

or more detailed (shamelessly stolen from perl's man-pages):

  -a   turns on autosplit mode when used with a -n or -p.  An implicit split command to the @F array is done as
       the first thing inside the implicit while loop produced by the -n or -p.

  -e commandline
       may be used to enter one line of program.  If -e is given, Perl will not look for a filename in the
       argument list.  Multiple -e commands may be given to build up a multi-line script.  Make sure to use
       semicolons where you would in a normal program.

  -n   causes Perl to assume the following loop around your program, which makes it iterate over filename
       arguments somewhat like sed -n or awk:

         LINE:
           while (<>) {
               ...             # your program goes here
           }

       Note that the lines are not printed by default.  See -p to have lines printed.  If a file named by an
       argument cannot be opened for some reason, Perl warns you about it and moves on to the next file.

       Here is an efficient way to delete all files older than a week:

           find . -mtime +7 -print | perl -nle unlink

       This is faster than using the -exec switch of find because you don't have to start a process on every
       filename found.  It does suffer from the bug of mishandling newlines in pathnames, which you can fix if
       you follow the example under -0.

       "BEGIN" and "END" blocks may be used to capture control before or after the implicit program loop, just
       as in awk.
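
Written out in a script file, the loop that -n adds implicitly might look like the sketch below. It only counts the ORC codes. The sub name count_codes and the in-memory sample text are assumptions for the demo; in a real script the loop would read while (<>) so that perl extract.pl rxp.ign picks up the file named on the command line.

```perl
use strict;
use warnings;

# Count ORC order codes read from a filehandle.  This is the explicit
# form of the loop that -n wraps around a one-liner.
# count_codes is a hypothetical helper name.
sub count_codes {
    my ($fh) = @_;
    my %count = (CA => 0, DC => 0, OP => 0);
    while (my $line = <$fh>) {
        # Anchored at the start of the line, so CAMPING cannot match.
        if ($line =~ /^ORC\|(CA|DC|OP)\|/) {
            $count{$1}++;
        }
    }
    return %count;
}

# Demo on an in-memory filehandle with made-up sample lines.
my $text = "ORC|CA|1\nRXE|1|8246300^CAMPING\nORC|DC|2\nORC|CA|3\n";
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
my %count = count_codes($fh);
close $fh;
print "CA: $count{CA}, DC: $count{DC}, OP: $count{OP}\n";
```

Saving this as a .pl file avoids the quoting problems of the Windows command line, because nothing has to be passed through -e.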
I am still confused about how to get this to run.  I am using Windows XP and that is what I need to run the script on.  Unfortunately I cannot use a more reliable shell.  I want to run the command against a file; I don't want to have to copy and paste data between quotes.  So I would like to run a command line such as perl (extract.pl) and have the script execute the commands from within extract.pl, which will then open the file with the data and parse it.  This is how I usually get commands to run.

So am I supposed to put  'm/^Ignored:/&&do{if($f==1){print $x};$f=0;};m/^ORC\|(CA|DC|OP)\|/&&do{$f=1};{$x.=$_;if(/\|CA\|/){$c++};if(/\|DC\|/){$d++};if(/\|OP\|/){$o++}}END{if($f==1){print $x};print "CA: ",$c;print ", DC: ",$d;print ", OP: ",$o;}' rxp.ign  in extract.pl  so that it can run?  

> .. would like to run a cmd line  such as perl (extract.pl)  ..
simply write the code between the single quotes in your file (extract.pl) and run it like
  perl extract.pl rxp.ign
I did and I got this

C:\Logs>perl extract.pl rxp.ign
Useless use of a constant in void context at extract.pl line 6 (#1)
    (W void) You did something without a side effect in a context that does
    nothing with the return value, such as a statement that doesn't return a
    value from a block, or the left side of a scalar comma operator.  Very
    often this points not to stupidity on your part, but a failure of Perl
    to parse your program the way you thought it would.  For example, you'd
    get this if you mixed up your C precedence with Python precedence and
    said

        $one, $two = 1, 2;

    when you meant to say

        ($one, $two) = (1, 2);

    Another common error is to use ordinary parentheses to construct a list
    reference when you should be using square or curly brackets, for
    example, if you say

        $array = (1,2);

    when you should have said

        $array = [1,2];

    The square brackets explicitly turn a list value into a scalar value,
    while parentheses do not.  So when a parenthesized list is evaluated in
    a scalar context, the comma is treated like C's comma operator, which
    throws away the left argument, which is not what you want.  See
    perlref for more on this.

    This warning will not be issued for numerical constants equal to 0 or 1
    since they are often used in statements like

        1 while sub_with_side_effects();

    String constants that would normally evaluate to 0 or 1 are warned
SOLUTION
ozo
> I did and I got this
use following as first line of your script

my ($c, $d, $o, $f) = (0, 0, 0, 0); my $x = '';
Okay, I ran the exact command that ozo gave and added the declarations that ahoffmann put up.  It works for counting the instances, but I also want to extract each of those message types once the locator field is found, as I mentioned in the question, and put them in a file for each type, e.g. CA.txt, DC.txt.  If a CA occurs I want the entire message, from the end of the first Ignored: to the beginning of the next Ignored:, if that makes it easy enough.  I also don't want it to pull out CA when it is found inside a word anywhere in the message, like "CALL", which I believe this is doing, just like the script I wrote and pasted.  Hopefully we can make the searching and matching a little stricter.  Almost there....
What if I wanted what was just on the second line for each message and possibly count those i.e
*** Message Ignored, Incorrect Order Type:
*** Message Ignored, Multi-component order not supported.
*** Message Ignored, Incorrect Order Type: OP
*** Message Ignored, Incorrect Order Type: NW
*** Message Ignored, Incorrect Order Type: DC

# of OP messages = $OP

SO would it be     if(/\|***Message Ignored,Incorrect Order Type: OP\|/){$OP++};
I think the *** creates issues because * is a wildcard; I am not sure how to use it as a search character within Perl.

I would like it to capture anything that appears after the *** on that line after the Ignored and before the MSH next line.  If these could also be totalled, that would be great as well.  This may help me to be able to change the script to what I need in other cases.
if(/\*\*\*\s*Message Ignored,\s*Incorrect Order Type: OP/){$OP++};
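
To total every kind of reason line without writing one if per type, a hash keyed on the captured text is one option. This is only a sketch: count_reasons is a made-up helper name, and the sample lines are shortened stand-ins for the log.

```perl
use strict;
use warnings;

# Tally whatever follows "***" on each reason line.
# count_reasons is a hypothetical helper name.
sub count_reasons {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        # \* matches a literal asterisk; \s* tolerates optional spaces.
        if ($line =~ /^\*\*\*\s*(.+?)\s*$/) {
            $count{$1}++;
        }
    }
    return %count;
}

# Shortened, made-up sample lines in the shape of the log file.
my @lines = (
    "Ignored:\n",
    "*** Message Ignored, Incorrect Order Type: OP\n",
    "MSH|...\n",
    "Ignored:\n",
    "*** Message Ignored, Incorrect Order Type: OP\n",
    "MSH|...\n",
    "Ignored:\n",
    "*** Message Ignored, Multi-component order not supported.\n",
);
my %count = count_reasons(@lines);
print "$count{$_}  $_\n" for sort keys %count;
```

Each distinct reason text becomes its own key, so a new order type in the log shows up in the totals without any change to the code.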
Okay, I am going to work this in to see if this helps.  So let's say this line matches, so the IF statement is TRUE and it increments OP by one: can I put the message that follows into an OP file and append all other OP messages when they match that statement, so from the MSH to the end of that message or the next Ignored:?
> .. can I put the message that follows into a OP file ..
assuming you opened that file at the beginning, somehow like:

 open(OP, ">>path/to/OP-file")||die"ERROR: cannot open OP-file: $!";

then you can write to that file like:
   print OP $_; # assuming that $_ contains your current line
But how does it know where to start the message and end the message for output into the OP file after it matches the current line I am searching on?
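
One possible way to find the start and end, sketched here under the assumption that every message begins with an Ignored: line: keep appending lines to a buffer, remember the code seen on the ORC line, and hand the buffer over when the next Ignored: line (or the end of the input) is reached. The name group_messages and the sample lines are hypothetical, and this is not necessarily the approach of the accepted solution.

```perl
use strict;
use warnings;

# Group complete messages by the code in the ORC segment.
# A message runs from one "Ignored:" line up to the next one.
# Returns a hash: code => array ref of message strings.
sub group_messages {
    my ($wanted, @lines) = @_;          # $wanted is an array ref of codes
    my %want = map { $_ => 1 } @$wanted;
    my (%by_code, $code);
    my $buffer = '';

    # Store the finished message if it carried a wanted code, then reset.
    my $flush = sub {
        push @{ $by_code{$code} }, $buffer if defined $code;
        ($buffer, $code) = ('', undef);
    };

    for my $line (@lines) {
        $flush->() if $line =~ /^Ignored:/;   # previous message ends here
        $buffer .= $line;
        # Only the field right after ORC decides the message type.
        if ($line =~ /^ORC\|([^|]*)\|/ && $want{$1}) {
            $code = $1;
        }
    }
    $flush->();                               # last message in the input
    return %by_code;
}

# Made-up sample lines in the shape of the log file.
my @lines = (
    "Ignored:\n",
    "*** Message Ignored, Incorrect Order Type\n",
    "ORC|CA|In63708710\n",
    "RXE|1|8246300^CAMPING|50\n",
    "Ignored:\n",
    "*** Message Ignored, Incorrect Order Type\n",
    "ORC|NW|In63708711\n",
    "RXE|1|8246300^CAMPING|50\n",
);
my %by_code = group_messages([qw(CA DC OP)], @lines);
for my $code (sort keys %by_code) {
    # In a real script, open ">>$code.txt" here and print to that handle.
    print "== $code: ", scalar @{ $by_code{$code} }, " message(s)\n";
    print @{ $by_code{$code} };
}
```

The buffer includes the Ignored: line and the reason line; dropping them would only need a check before the append.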
ASKER CERTIFIED SOLUTION
thanks for the help!